Downloading All of Wikipedia
March 25, 2022
Why?
I recently came across this article explaining why people in Russia are downloading copies of Wikipedia “just in case”.
I thought it might be a good idea to capture all of the information on Wikipedia because... (gestures broadly around me). Think of it as being a digital prepper.
Also, this is more of an exercise in “why not?” than anything else. I’ll probably delete the 87GB when this 14TB drive gets close to running out of space.
Try It Out:
Go to https://wikipedia.michaellunzer.com to test out what it's like browsing my offline copy of Wikipedia.
How:
This project leaned heavily on this Homelab Wiki page, which includes the docker-compose file used to spin up the Kiwix server in a container:
https://thehomelab.wiki/books/docker/page/setup-and-install-kiwix-serve-on-debian-systems
The Homelab Wiki does a great job explaining each .zim file's contents:
What do mini, nopic and maxi mean in the Wikipedia zim files?
File size is always an issue when downloading such big content, so Kiwix produces each Wikipedia file in three flavours:
- Mini: only the introduction of each article, plus the infobox. Saves about 95% of space vs. the full version.
- Nopic: full articles, but no images. About 75% smaller than the full version.
- Maxi: the default full version.
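Once you've picked a flavour, the setup comes down to three steps:
1. cd into the data directory that is mapped into the Docker container and download the zim files there using wget.
2. Add the name of each zim file to the command section of your stack (the docker-compose file).
3. Start or deploy the docker stack and enjoy!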
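The steps above can be sketched as a small shell script. This is a rough sketch rather than the exact commands I ran: the helper names zim_url and download_zim are made up for illustration, the data directory is the host path from my compose file, and the download and docker-compose lines are left commented out so nothing large is fetched by accident.

```shell
#!/usr/bin/env bash
# Assumption: DATA_DIR is the host directory mapped to /data in the container.
DATA_DIR="${DATA_DIR:-/srv/dev-disk-by-label-Atlantic14TB/wikipedia}"
ZIM_BASE_URL="http://download.kiwix.org/zim/wikipedia"

# Build the download URL for a given zim file name (hypothetical helper).
zim_url() {
  printf '%s/%s\n' "$ZIM_BASE_URL" "$1"
}

# Download one zim file into the data directory (hypothetical helper).
# wget -c resumes a partial download, which is handy for very large files.
# The subshell keeps the cd from changing the caller's working directory.
download_zim() {
  ( cd "$DATA_DIR" && wget -c "$(zim_url "$1")" )
}

# Uncomment to actually run the steps:
# download_zim wikipedia_en_top_mini_2022-02.zim
# download_zim wikipedia_en_all_maxi_2021-12.zim
# docker-compose up -d
```

Calling zim_url wikipedia_en_all_maxi_2021-12.zim just prints the full download URL, which is a cheap way to check the file name before committing to the download.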
Zim Files:
All of Wikipedia is contained in a single .zim file. Browse the available files here:
http://download.kiwix.org/zim/wikipedia/
Please note that the zim file that contains all of Wikipedia was last updated in December 2021. I’m not sure of the update cadence, but I’ll check in a few months and update my file accordingly. I’m also not aware of a way to capture incremental changes rather than downloading the whole file again.
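Since the zim file names end in a YYYY-MM date, checking for a newer dump can be reduced to comparing names. Here is a minimal sketch, assuming the naming pattern seen in my files (prefix, underscore, YYYY-MM, .zim) holds for future dumps. The function name latest_zim is made up, and the commented curl line assumes the download page lists the file names as plain text in its HTML, which I haven't verified.

```shell
#!/usr/bin/env bash
# Read zim file names on stdin and print the newest one for a given prefix.
# $1 is the file-name prefix, e.g. wikipedia_en_all_maxi (hypothetical helper).
latest_zim() {
  # Keep only names of the form <prefix>_YYYY-MM.zim; because the date is
  # zero-padded, a plain lexical sort puts the newest name last.
  grep -E "^${1}_[0-9]{4}-[0-9]{2}\.zim$" | sort | tail -n 1
}

# Assumption: the listing page contains the file names in its HTML.
# curl -s http://download.kiwix.org/zim/wikipedia/ \
#   | grep -oE 'wikipedia_en_all_maxi_[0-9]{4}-[0-9]{2}\.zim' \
#   | sort -u \
#   | latest_zim wikipedia_en_all_maxi
```

If the name it prints is later than wikipedia_en_all_maxi_2021-12.zim, it's time to download again.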
Docker Compose:
version: "3.9"
services:
  kiwix-serve:
    image: kiwix/kiwix-serve
    volumes:
      - /srv/dev-disk-by-label-Atlantic14TB/wikipedia:/data
    ports:
      - '8411:80'
    command:
      wikipedia_en_top_mini_2022-02.zim
      wikipedia_en_all_maxi_2021-12.zim
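To bring the stack up and confirm it is serving, something like the following should work from the Docker host. It is only a sketch: the port 8411 comes from the ports mapping in the compose file, the helper name kiwix_local_url is made up, and the docker-compose and curl lines are commented out so the snippet is safe to paste.

```shell
#!/usr/bin/env bash
# Host port from the compose file ('8411:80' maps host 8411 to container 80).
KIWIX_HOST_PORT="${KIWIX_HOST_PORT:-8411}"

# URL where the Kiwix library should be reachable from the Docker host
# (hypothetical helper).
kiwix_local_url() {
  printf 'http://localhost:%s/\n' "$KIWIX_HOST_PORT"
}

# Uncomment to start the stack in the background and probe it:
# docker-compose up -d
# curl -sI "$(kiwix_local_url)" | head -n 1
```

If the curl line prints an HTTP status line, the container is up and the zim files listed under command should show up in the browser at the same address.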