ArxivLoader
- class langchain_community.document_loaders.arxiv.ArxivLoader(query: str, doc_content_chars_max: int | None = None, **kwargs: Any)
Load a query result from Arxiv. The loader converts the original PDF into plain text.
- Setup:
Install the arxiv and PyMuPDF packages. PyMuPDF transforms PDF files downloaded from the arxiv.org site into text format.

    pip install -U arxiv pymupdf
- Instantiate:
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query="reasoning",
        # load_max_docs=2,
        # load_all_available_meta=False
    )
- Load:
    docs = loader.load()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)
- Lazy load:
    docs = []
    docs_lazy = loader.lazy_load()

    # async variant (alazy_load returns an async iterator, so iterate with `async for`):
    # async for doc in loader.alazy_load():
    #     docs.append(doc)

    for doc in docs_lazy:
        docs.append(doc)
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

Output:

    Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggre
    {
        'Published': '2024-02-29',
        'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang',
        'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning...'
    }
- Async load:
    docs = await loader.aload()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

Output:

    Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggre
    {
        'Published': '2024-02-29',
        'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang',
        'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning...'
    }
- Use summaries of articles as docs:
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query="reasoning"
    )

    docs = loader.get_summaries_as_docs()
    print(docs[0].page_content[:100])
    print(docs[0].metadata)

Output:

    Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning
    {
        'Entry ID': 'http://arxiv.org/abs/2402.03268v2',
        'Published': datetime.date(2024, 2, 29),
        'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
        'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang'
    }
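In the sample outputs above, 'Published' appears as the string '2024-02-29' when loading full documents and as datetime.date(2024, 2, 29) when loading summaries. Code that handles both kinds of documents may want to normalize the field first. The following is a minimal sketch of one way to do that; published_as_date is a hypothetical helper, not part of ArxivLoader, and it assumes string values use the ISO YYYY-MM-DD form shown in the samples.

```python
import datetime
from typing import Union


def published_as_date(value: Union[str, datetime.date]) -> datetime.date:
    """Hypothetical helper: return the 'Published' metadata value as a date.

    Assumes string values are ISO formatted (YYYY-MM-DD), as in the sample output.
    """
    if isinstance(value, datetime.datetime):
        # datetime is a subclass of date; drop the time component.
        return value.date()
    if isinstance(value, datetime.date):
        return value
    return datetime.date.fromisoformat(value)


# Both representations from the sample outputs normalize to the same date.
print(published_as_date("2024-02-29"))
print(published_as_date(datetime.date(2024, 2, 29)))
```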
Initialize with search query to find documents in the Arxiv. Supports all arguments of ArxivAPIWrapper.
- Parameters:
query (str) – free text used to find documents on Arxiv
doc_content_chars_max (int | None) – cut limit for the length of a document's content
kwargs (Any) – additional arguments, passed through to ArxivAPIWrapper
Methods
__init__(query[, doc_content_chars_max]) – Initialize with search query to find documents in the Arxiv.
alazy_load() – A lazy loader for Documents.
aload() – Load data into Document objects.
get_summaries_as_docs() – Uses paper summaries as documents rather than the source Arxiv papers.
lazy_load() – Lazy load Arxiv documents.
load() – Load data into Document objects.
load_and_split([text_splitter]) – Load Documents and split into chunks.
- __init__(query: str, doc_content_chars_max: int | None = None, **kwargs: Any)
Initialize with search query to find documents in the Arxiv. Supports all arguments of ArxivAPIWrapper.
- Parameters:
query (str) – free text used to find documents on Arxiv
doc_content_chars_max (int | None) – cut limit for the length of a document's content
kwargs (Any) – additional arguments, passed through to ArxivAPIWrapper
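The effect of doc_content_chars_max can be pictured as a simple character cut. The sketch below is an illustration only: cut_content is a hypothetical helper, and the assumption that the limit is applied as a plain prefix slice (with None meaning "no limit") is mine rather than something this page states.

```python
from typing import Optional


def cut_content(text: str, doc_content_chars_max: Optional[int] = None) -> str:
    """Hypothetical helper: keep at most doc_content_chars_max characters.

    Assumption: None disables the limit and the full text is returned.
    """
    if doc_content_chars_max is None:
        return text
    return text[:doc_content_chars_max]


sample = "Pre-trained language models (LMs) are able to perform complex reasoning"
print(cut_content(sample, 11))        # Pre-trained
print(len(cut_content(sample)))       # full length, no limit applied
```

With the real loader, the equivalent would be passing doc_content_chars_max when constructing ArxivLoader and checking len(doc.page_content) on the results.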
- async alazy_load() → AsyncIterator[Document]
A lazy loader for Documents.
- Return type:
AsyncIterator[Document]
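Because the return type is AsyncIterator[Document], the result is consumed with `async for` rather than awaited directly. The sketch below shows that pattern without network access: fake_alazy_load is a hypothetical stand-in that yields plain strings instead of Document objects.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_alazy_load() -> AsyncIterator[str]:
    """Hypothetical stand-in for loader.alazy_load(); yields strings, not Documents."""
    for title in ("paper A", "paper B", "paper C"):
        await asyncio.sleep(0)  # yield control to the event loop, as real I/O would
        yield title


async def collect() -> List[str]:
    docs: List[str] = []
    # An async iterator is consumed with `async for`; it cannot be awaited.
    async for doc in fake_alazy_load():
        docs.append(doc)
    return docs


collected = asyncio.run(collect())
print(collected)  # ['paper A', 'paper B', 'paper C']
```

With a real loader, the loop body would be the same with loader.alazy_load() in place of the stand-in.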
- get_summaries_as_docs() → List[Document]
Uses paper summaries as documents rather than the source Arxiv papers.
- Return type:
List[Document]
- load_and_split(text_splitter: TextSplitter | None = None) → List[Document]
Load Documents and split into chunks. Chunks are returned as Documents.
Do not override this method. It should be considered to be deprecated!
- Parameters:
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns:
List of Documents.
- Return type:
List[Document]
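To make "chunks are returned as Documents" concrete, here is a simplified sketch of the idea. It does not use the library: Doc is a hypothetical stand-in for Document, and split_fixed uses naive fixed-size slicing, which is a deliberate simplification and not the RecursiveCharacterTextSplitter default named above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_fixed(docs: List[Doc], chunk_size: int) -> List[Doc]:
    """Split each document into fixed-size chunks that copy the parent's metadata."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: List[Doc] = []
    for doc in docs:
        for start in range(0, len(doc.page_content), chunk_size):
            piece = doc.page_content[start:start + chunk_size]
            # Each chunk is itself a Doc, carrying its own copy of the metadata.
            chunks.append(Doc(piece, dict(doc.metadata)))
    return chunks


parts = split_fixed([Doc("abcdefgh", {"Title": "T"})], chunk_size=3)
print([p.page_content for p in parts])  # ['abc', 'def', 'gh']
```

Every chunk keeps the source paper's metadata, so a chunk retrieved later can still be traced back to its title and authors.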