I am fond of Wikipedia, so I wanted to find out how to use it when I have no Internet connection. On my mobile phone, I already use the free and open-source Aard Dictionary, but for space reasons I only have the French Wikipedia, without images, and with some formatting glitches. On my computers, where I have more space and where it is easier to set things up, I wanted to have more, so that I can have a usable Wikipedia when I'm on the train, on a plane, or in a place with a crappy Internet connection.
This blog post explains how to set up a local Wikipedia mirror using
Kiwix. Start by
downloading Kiwix and
unpacking it. Kiwix ships with a special GUI to browse Wikipedia offline, but I
prefer to use my usual Web browser. Fortunately, Kiwix also includes the
kiwix-serve program that can serve dumps as a regular Web server.
Next, download the dumps for the projects that you want. Ideally I would like to generate my own dumps, but I haven't looked into this yet. I used the French and English editions of Wikipedia and Wiktionary, totalling about 60 GB.
Next, to be able to perform full-text search, you must index the dumps. The
pre-indexed dumps may work too, but I haven't tried them; I indexed mine manually
instead. For each file
a.zim, I ran
kiwix-index -v a.zim a.zim.idx, where
kiwix-index is provided by Kiwix. The process takes a long time (10-30 hours
or so) but does not require any interaction. The indexes take another 30 GB.
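Since indexing runs unattended, it is convenient to loop over all the dumps at once. Here is a minimal sketch, assuming kiwix-index is on your PATH and takes the arguments shown above; the function name index_all and the KIWIX_INDEX override variable are my own inventions for illustration, not part of Kiwix.

```shell
# Index every .zim file in a directory, skipping dumps that already have an index.
# KIWIX_INDEX is a hypothetical override, useful for dry runs (KIWIX_INDEX=echo).
index_all() {
    dir=${1:-.}
    for zim in "$dir"/*.zim; do
        [ -e "$zim" ] || continue        # the glob matched nothing
        if [ -e "$zim.idx" ]; then
            echo "skipping $zim (index exists)"
            continue
        fi
        ${KIWIX_INDEX:-kiwix-index} -v "$zim" "$zim.idx"
    done
}
```

Running KIWIX_INDEX=echo index_all /path/to/dumps prints the commands without executing them, which is a cheap way to check the file names before committing to many hours of indexing.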
To serve all dumps with
kiwix-serve, you need to build a library. First, rename the
.zim and the
.zim.idx files (or symlink them) to have shorter names;
this will make the URLs shorter afterwards. I use names such as
wten.zim. Now run, for each file
a.zim in the working directory:
kiwix-manage `pwd`/wiki.xml add `pwd`/a.zim
Here wiki.xml is the name of the library file that will be created. It is safer to use absolute
paths. This should have created the library file
wiki.xml.
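Adding the dumps one by one can also be scripted. The sketch below assumes the kiwix-manage invocation shown above (library file, then add, then the dump) and resolves the directory to an absolute path first; build_library and the KIWIX_MANAGE override variable are hypothetical names used only for this example.

```shell
# Add every .zim file in a directory to a single library file, with absolute paths.
# KIWIX_MANAGE is a hypothetical override, useful for dry runs (KIWIX_MANAGE=echo).
build_library() {
    dir=$(cd "${1:-.}" && pwd) || return 1   # absolute path of the dump directory
    library=$dir/${2:-wiki.xml}              # library file name defaults to wiki.xml
    for zim in "$dir"/*.zim; do
        [ -e "$zim" ] || continue            # the glob matched nothing
        ${KIWIX_MANAGE:-kiwix-manage} "$library" add "$zim"
    done
}
```

For instance, build_library . would add all dumps of the current directory to wiki.xml in that same directory.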
Now, to start the server, choose a port number (say
4242) and run
kiwix-serve --port=4242 --library /where/you/put/wiki.xml. Test it by browsing to
http://localhost:4242 and checking that it works. If it does, you probably want to
arrange for this command to be run at startup.
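If you launch the server from a startup script, a small wrapper can catch the most common mistake, a wrong library path, before the server starts. This is only a sketch: start_kiwix and the KIWIX_SERVE override variable are names I made up, and the kiwix-serve options are the ones used above.

```shell
# Start kiwix-serve on the given port, refusing to start if the library is missing.
# KIWIX_SERVE is a hypothetical override, useful for dry runs (KIWIX_SERVE=echo).
start_kiwix() {
    library=$1
    port=${2:-4242}
    if [ ! -f "$library" ]; then
        echo "library file not found: $library" >&2
        return 1
    fi
    ${KIWIX_SERVE:-kiwix-serve} --port="$port" --library "$library"
}
```

You would then call start_kiwix /where/you/put/wiki.xml from whatever startup mechanism your system provides.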
Note that kiwix-serve accepts connections from other machines too, not just from localhost. I couldn't find a way to change this behavior, so for now I use iptables to reject incoming connections from other hosts:
sudo iptables -A INPUT -p tcp --dport 4242 -s 127.0.0.0/8 -j ACCEPT
sudo iptables -A INPUT -p tcp --dport 4242 -j REJECT
sudo ip6tables -A INPUT -p tcp --dport 4242 -s ::1 -j ACCEPT
sudo ip6tables -A INPUT -p tcp --dport 4242 -j REJECT
sudo iptables-save | sudo tee /etc/iptables/rules.v4
sudo ip6tables-save | sudo tee /etc/iptables/rules.v6
The last step, if you use Firefox, is to use Smart
Keywords to be able to reach your local dumps efficiently. To do this, for each of the dumps
that you have on
localhost:4242, right-click on the search text field and add
a keyword for it. As I use Firefox
Sync, those bookmarks are
synchronized across my different machines.
These smart keywords work for full-text search in the dumps. If you want to have
bookmarks that directly reach articles and will never perform full-text search
(because it is faster), you can edit the bookmarks in Firefox to set "Location"
to, e.g., "
http://localhost:4242/wen/A/%s.html". However, this technique does
not work with all dumps (it depends on the URL structure), and when you use it you
must remember that the first letter of the search term must be uppercase.
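The same direct URLs can be built from the command line, which takes care of the uppercase first letter for you. This is a rough sketch: article_url is a hypothetical helper, the URL layout is the one from the bookmark example above (so it only applies to dumps that follow it), the port defaults to the 4242 chosen earlier, and the capitalization only handles ASCII letters.

```shell
# Print the direct article URL for a dump name and a search term,
# capitalizing the first letter of the term (ASCII letters only).
article_url() {
    book=$1
    term=$2
    port=${3:-4242}
    first=$(printf '%s' "$term" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$term" | cut -c2-)
    printf 'http://localhost:%s/%s/A/%s%s.html\n' "$port" "$book" "$first" "$rest"
}
```

You could then open an article with something like firefox "$(article_url wen wikipedia)".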
In terms of formatting, the Kiwix dumps are fairly OK. There are occasional
glitches (e.g., with
<span>) but not many of them, and certainly far fewer
than with Aard Dictionary. Equations are supported, and images are there if you
pick the right dumps. Most templates, formatting, etc., are fine. The HTML
interface added by
kiwix-serve is not perfect but it's mostly unobtrusive.