

MediaWiki XML Dumps contain the content of a wiki (wiki pages with all their revisions), without the site-related data. An XML dump is not a full backup of the wiki database: it does not contain user accounts, images, edit logs, etc.
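To make the page/revision structure concrete, here is a minimal sketch that uses only Python's standard `xml.etree.ElementTree`. The inline dump fragment is hypothetical and heavily simplified (real dumps also carry a `siteinfo` header and per-revision fields such as timestamps). It assumes the schema 0.11 namespace URI `http://www.mediawiki.org/xml/export-0.11/` and counts how many revisions each page carries.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified dump fragment in the export-0.11 namespace (not real wiki data).
SAMPLE_DUMP = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.11/" version="0.11">
  <page>
    <title>Main Page</title>
    <ns>0</ns>
    <id>1</id>
    <revision><id>10</id><text>First version</text></revision>
    <revision><id>11</id><text>Second version</text></revision>
  </page>
  <page>
    <title>Help:Contents</title>
    <ns>12</ns>
    <id>2</id>
    <revision><id>20</id><text>Help text</text></revision>
  </page>
</mediawiki>"""

# Prefix mapping so ElementTree can match the namespaced tags.
NS = {"mw": "http://www.mediawiki.org/xml/export-0.11/"}


def revision_counts(xml_text: str) -> dict:
    """Map each page title to the number of <revision> elements stored for it."""
    root = ET.fromstring(xml_text)
    counts = {}
    for page in root.findall("mw:page", NS):
        title = page.findtext("mw:title", namespaces=NS)
        counts[title] = len(page.findall("mw:revision", NS))
    return counts


print(revision_counts(SAMPLE_DUMP))
```

For large real dumps, a streaming parser (such as `mwxml`, installed below) is preferable to loading the whole file into memory as this sketch does.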

Installation and Setup

We need to install several Python packages.

The mediawiki-utilities packages support XML schema 0.11 only in unmerged branches.

pip install -qU git+

The mediawiki-utilities `mwxml` package has a bug; a fix PR is pending.

pip install -qU git+
pip install -qU mwparserfromhell
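After installing, a quick way to confirm the packages are importable is a small standard-library check. This is a sketch only: it checks the module names `mwxml` and `mwparserfromhell` mentioned in the install steps, and it cannot tell whether the installed `mwxml` is the patched branch.

```python
from importlib.util import find_spec

# Module names taken from the install steps; presence only, not version or branch.
REQUIRED_MODULES = ("mwxml", "mwparserfromhell")


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found on the current Python path."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```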

Document Loader

See a usage example.

from langchain_community.document_loaders import MWDumpLoader
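A minimal loading sketch follows. It assumes the `MWDumpLoader` constructor parameters `file_path`, `encoding`, `namespaces`, `skip_redirects` and `stop_on_error` found in recent `langchain_community` releases; check them against your installed version. The wrapper name `load_wiki_dump` is hypothetical, and the import sits inside the function only so the snippet can live in a module that is imported before the optional dependencies are installed.

```python
def load_wiki_dump(file_path, namespaces=None):
    """Load the pages of a MediaWiki XML dump as LangChain Documents."""
    # Deferred import: only required when the function is actually called.
    from langchain_community.document_loaders import MWDumpLoader

    loader = MWDumpLoader(
        file_path=file_path,
        encoding="utf8",
        namespaces=namespaces,  # e.g. [0] for articles only; None loads all namespaces
        skip_redirects=True,    # skip pages that only redirect to another page
        stop_on_error=False,    # skip pages that fail to parse instead of raising
    )
    return loader.load()


# Usage (hypothetical path; point it at your own dump file):
# docs = load_wiki_dump("path/to/dump.xml")
# print(f"Loaded {len(docs)} document(s)")
```

Setting `stop_on_error=False` trades completeness for robustness: a single malformed page no longer aborts the whole load.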