Wikipedia is a multilingual free online encyclopedia written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and using a wiki-based editing system called MediaWiki.
Wikipedia is the largest and most-read reference work in history.
This notebook shows how to load wiki pages from wikipedia.org into the Document format that we use downstream.
First, you need to install the wikipedia Python package.
#!pip install wikipedia
WikipediaLoader has these arguments:
query: free text used to find documents in Wikipedia.
lang: default="en". Use it to search in a specific language edition of Wikipedia.
load_max_docs: default=100. Use it to limit the number of downloaded documents. Downloading all 100 documents takes time, so use a small number for experiments. There is currently a hard limit of 300.
load_all_available_meta: default=False. By default, only the most important fields are downloaded: Published (date when the document was published or last updated) and Summary. If True, the other fields are also downloaded.
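The arguments above can be collected and checked before the loader is created. The following is a minimal sketch, not part of the library: build_loader_kwargs and MAX_DOCS_HARD_LIMIT are hypothetical names introduced here, and clamping to 300 is an assumption based on the hard limit described above. It runs offline; the actual loader call is left as a comment because it needs the installed packages and network access.

```python
# Hypothetical constant mirroring the hard limit described above.
MAX_DOCS_HARD_LIMIT = 300


def build_loader_kwargs(query, lang="en", load_max_docs=100,
                        load_all_available_meta=False):
    """Validate the four arguments and return them as a keyword dict.

    This helper is illustrative only; it is not provided by the library.
    """
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    if load_max_docs < 1:
        raise ValueError("load_max_docs must be at least 1")
    return {
        "query": query.strip(),
        "lang": lang,
        # Assumption: cap the request at the documented hard limit.
        "load_max_docs": min(load_max_docs, MAX_DOCS_HARD_LIMIT),
        "load_all_available_meta": load_all_available_meta,
    }


kwargs = build_loader_kwargs("HUNTER X HUNTER", load_max_docs=2,
                             load_all_available_meta=True)
print(kwargs)

# With the packages installed and network access available:
# from langchain.document_loaders import WikipediaLoader
# docs = WikipediaLoader(**kwargs).load()
```

Keeping load_max_docs small, as in this call, keeps experiments fast.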
from langchain.document_loaders import WikipediaLoader
docs = WikipediaLoader(query='HUNTER X HUNTER', load_max_docs=2).load()
len(docs)
docs[0].metadata  # metadata of the first Document
docs[0].page_content[:400]  # first 400 characters of the first Document's content
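Because load() returns a list, inspecting every result means looping over it. The sketch below shows one way to do that; preview_documents is a hypothetical helper, and the SimpleNamespace object is a stand-in with placeholder values so the example runs without network access. In practice the input would be the docs list returned by the loader.

```python
from types import SimpleNamespace


def preview_documents(docs, n_chars=400):
    """Return (metadata field names, leading text) for each document-like object."""
    previews = []
    for doc in docs:
        keys = sorted(doc.metadata)            # field names present in the metadata dict
        snippet = doc.page_content[:n_chars]   # first n_chars characters of the text
        previews.append((keys, snippet))
    return previews


# Stand-in object with placeholder values; not real Wikipedia data.
sample = [SimpleNamespace(metadata={"Summary": "placeholder"},
                          page_content="x" * 1000)]
result = preview_documents(sample)
print(result[0][0], len(result[0][1]))
```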