ArxivLoader
- class langchain_community.document_loaders.arxiv.ArxivLoader(
- query: str,
- doc_content_chars_max: int | None = None,
- **kwargs: Any,
- )
Load a query result from arXiv. The loader converts each paper's original PDF into plain text.
- Setup:
Install the arxiv and PyMuPDF packages. PyMuPDF converts PDF files downloaded from the arxiv.org site into text.
pip install -U arxiv pymupdf
- Instantiate:
from langchain_community.document_loaders import ArxivLoader

loader = ArxivLoader(
    query="reasoning",
    # load_max_docs=2,
    # load_all_available_meta=False
)
- Load:
docs = loader.load()
print(docs[0].page_content[:100])
print(docs[0].metadata)
- Lazy load:
docs = []
docs_lazy = loader.lazy_load()

# async variant:
# docs_lazy = await loader.alazy_load()

for doc in docs_lazy:
    docs.append(doc)
print(docs[0].page_content[:100])
print(docs[0].metadata)

Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggre
{
    'Published': '2024-02-29',
    'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
    'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang',
    'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning...'
}
- Async load:
docs = await loader.aload()
print(docs[0].page_content[:100])
print(docs[0].metadata)

Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggre
{
    'Published': '2024-02-29',
    'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
    'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang',
    'Summary': 'Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning...'
}
- Use summaries of articles as docs:
from langchain_community.document_loaders import ArxivLoader

loader = ArxivLoader(
    query="reasoning"
)

docs = loader.get_summaries_as_docs()
print(docs[0].page_content[:100])
print(docs[0].metadata)

Pre-trained language models (LMs) are able to perform complex reasoning without explicit fine-tuning
{
    'Entry ID': 'http://arxiv.org/abs/2402.03268v2',
    'Published': datetime.date(2024, 2, 29),
    'Title': 'Understanding the Reasoning Ability of Language Models From the Perspective of Reasoning Paths Aggregation',
    'Authors': 'Xinyi Wang, Alfonso Amayuelas, Kexun Zhang, Liangming Pan, Wenhu Chen, William Yang Wang'
}
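Lazy loading is mainly useful when only the first few results are needed, since later documents are never produced. The sketch below illustrates that with a hypothetical stand-in loader (`FakeLoader` and `Doc` are not part of LangChain) so that it runs without network access; with a real ArxivLoader, `loader.lazy_load()` would take the place of the fake generator.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Iterator


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeLoader:
    """Hypothetical loader that counts how many documents it has produced."""

    def __init__(self, titles: list) -> None:
        self.titles = titles
        self.produced = 0

    def lazy_load(self) -> Iterator[Doc]:
        for title in self.titles:
            self.produced += 1  # stands in for one download plus PDF parse
            yield Doc(page_content=f"Text of {title}", metadata={"Title": title})


def take(loader, n: int) -> list:
    """Consume at most n documents from a loader's lazy_load() iterator."""
    return list(islice(loader.lazy_load(), n))


loader = FakeLoader(["Paper A", "Paper B", "Paper C"])
first_two = take(loader, 2)
print([d.metadata["Title"] for d in first_two])
print(loader.produced)  # only the consumed documents were produced
```

Because `islice` stops pulling from the generator once it has `n` items, the third document is never built.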
Initialize with a search query used to find documents on arXiv. Supports all arguments of ArxivAPIWrapper.
- Parameters:
query (str) – free text used to find documents on arXiv
doc_content_chars_max (int | None) – cut limit for the length of a document's content
kwargs (Any) – additional keyword arguments; all arguments of ArxivAPIWrapper are supported
Methods
__init__(query[, doc_content_chars_max]): Initialize with a search query to find documents on arXiv.
alazy_load(): A lazy loader for Documents.
aload(): Load data into Document objects.
get_summaries_as_docs(): Use paper summaries as documents rather than the source arXiv papers.
lazy_load(): Lazy load arXiv documents.
load(): Load data into Document objects.
load_and_split([text_splitter]): Load Documents and split into chunks.
- __init__(
- query: str,
- doc_content_chars_max: int | None = None,
- **kwargs: Any,
- )
Initialize with a search query used to find documents on arXiv. Supports all arguments of ArxivAPIWrapper.
- Parameters:
query (str) – free text used to find documents on arXiv
doc_content_chars_max (int | None) – cut limit for the length of a document's content
kwargs (Any) – additional keyword arguments; all arguments of ArxivAPIWrapper are supported
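To make the effect of doc_content_chars_max concrete, here is a minimal sketch of a character cut limit. It assumes the limit is applied as a plain slice on the extracted text and that None means "no limit"; the helper name `cut_content` is hypothetical and this is an illustration, not the library's code.

```python
from typing import Optional


def cut_content(text: str, doc_content_chars_max: Optional[int] = None) -> str:
    """Return text cut to at most doc_content_chars_max characters.

    Assumption: None disables the limit and the cut is a simple slice.
    """
    if doc_content_chars_max is None:
        return text
    return text[:doc_content_chars_max]


sample = "Pre-trained language models (LMs) are able to perform complex reasoning"
print(cut_content(sample, 20))              # first 20 characters only
print(cut_content(sample) == sample)        # no limit: text is unchanged
```

A cut like this counts characters, not tokens, so a limit chosen for a model's context window may need some headroom.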
- async alazy_load() → AsyncIterator[Document]
A lazy loader for Documents.
- Yields:
the documents.
- Return type:
AsyncIterator[Document]
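Since alazy_load() returns an AsyncIterator, it is consumed with `async for` rather than a plain `for` loop. The following sketch shows that pattern with a hypothetical `FakeAsyncLoader` (not a LangChain class) that yields strings instead of Documents, so it can run offline.

```python
import asyncio
from typing import AsyncIterator


class FakeAsyncLoader:
    """Hypothetical stand-in exposing an alazy_load() async generator."""

    def __init__(self, texts: list) -> None:
        self.texts = texts

    async def alazy_load(self) -> AsyncIterator[str]:
        for text in self.texts:
            await asyncio.sleep(0)  # hand control back to the loop, as real I/O would
            yield text


async def collect(loader, limit: int) -> list:
    """Collect up to `limit` items from an async lazy loader."""
    items: list = []
    if limit <= 0:
        return items
    async for item in loader.alazy_load():
        items.append(item)
        if len(items) >= limit:
            break  # stop early; remaining items are never produced
    return items


result = asyncio.run(collect(FakeAsyncLoader(["doc 1", "doc 2", "doc 3"]), limit=2))
print(result)
```

In a notebook or any other context that already runs an event loop, `await collect(...)` would be used directly instead of `asyncio.run`.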
- async aload() → list[Document]
Load data into Document objects.
- Returns:
the documents.
- Return type:
list[Document]
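Because aload() is a coroutine, several loaders can in principle be awaited together. The sketch below shows one way to do that with `asyncio.gather`, using a hypothetical `FakeLoader` whose aload() simulates latency; whether concurrent requests are appropriate against a real service depends on that service's usage limits.

```python
import asyncio


class FakeLoader:
    """Hypothetical stand-in with an awaitable aload(), one instance per query."""

    def __init__(self, query: str) -> None:
        self.query = query

    async def aload(self) -> list:
        await asyncio.sleep(0.01)  # simulated I/O latency
        return [f"{self.query}: result {i}" for i in range(2)]


async def load_many(queries: list) -> dict:
    """Run one aload() per query concurrently and map each query to its documents."""
    loaders = [FakeLoader(q) for q in queries]
    # gather preserves the order of its arguments, so zip lines results up with queries
    results = await asyncio.gather(*(loader.aload() for loader in loaders))
    return dict(zip(queries, results))


docs_by_query = asyncio.run(load_many(["reasoning", "retrieval"]))
print(docs_by_query["reasoning"])
```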
- get_summaries_as_docs() → List[Document]
Use paper summaries as documents rather than the source arXiv papers.
- Return type:
List[Document]
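Summary documents carry metadata such as 'Published' (a datetime.date in the example output earlier on this page) and 'Title', which makes simple post-filtering possible. The sketch below filters by publication date; `Doc` and `published_since` are hypothetical names, and the sample documents are made-up placeholders rather than real query results.

```python
import datetime
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def published_since(docs: list, cutoff: datetime.date) -> list:
    """Keep documents whose 'Published' metadata is on or after cutoff."""
    return [d for d in docs if d.metadata["Published"] >= cutoff]


# Placeholder data standing in for loader.get_summaries_as_docs()
docs = [
    Doc("Summary of an older paper",
        {"Title": "Older paper", "Published": datetime.date(2021, 5, 1)}),
    Doc("Summary of a newer paper",
        {"Title": "Newer paper", "Published": datetime.date(2024, 2, 29)}),
]

recent = published_since(docs, datetime.date(2024, 1, 1))
print([d.metadata["Title"] for d in recent])
```

Note that the load() example output shows 'Published' as a string, so a filter like this would need to parse the date first when applied to full documents.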
- load() → list[Document]
Load data into Document objects.
- Returns:
the documents.
- Return type:
list[Document]
- load_and_split(
- text_splitter: TextSplitter | None = None,
- ) → list[Document]
Load Documents and split into chunks. Chunks are returned as Documents.
Do not override this method; it should be considered deprecated.
- Parameters:
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Raises:
ImportError – If langchain-text-splitters is not installed and no text_splitter is provided.
- Returns:
List of Documents.
- Return type:
list[Document]
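To illustrate what "split into chunks, returned as Documents" means, here is a deliberately simple fixed-size splitter with overlap. It is not RecursiveCharacterTextSplitter, which uses a different strategy; `Doc`, `split_text`, and `split_documents` are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def split_documents(docs: list, chunk_size: int, chunk_overlap: int = 0) -> list:
    """Return one Doc per chunk, each with a copy of its parent's metadata."""
    return [
        Doc(chunk, dict(doc.metadata))
        for doc in docs
        for chunk in split_text(doc.page_content, chunk_size, chunk_overlap)
    ]


chunks = split_documents([Doc("abcdefghij", {"Title": "T"})], chunk_size=4, chunk_overlap=1)
print([c.page_content for c in chunks])
```

Copying the metadata onto every chunk keeps fields such as the title available after splitting, at the cost of some duplication.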