BibtexLoader

class langchain_community.document_loaders.bibtex.BibtexLoader(file_path: str, *, parser: BibtexparserWrapper | None = None, max_docs: int | None = None, max_content_chars: int | None = 4000, load_extra_metadata: bool = False, file_pattern: str = '[^:]+\\.pdf')

Load a bibtex file.

Each document represents one entry from the bibtex file.

If a bibtex entry's file field references a PDF, the text of that PDF becomes the document text. If the entry has no file field, its abstract field is used instead.
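A minimal usage sketch, assuming `langchain-community` and the packages the loader depends on for parsing BibTeX and reading PDFs (likely `bibtexparser` and `pymupdf`) are installed. The sample entry and the helper names `write_sample_bib` and `load_bibtex_documents` are illustrative and not part of the library.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Illustrative entry; real files usually come from a reference manager export.
SAMPLE_BIBTEX = """@article{doe2024example,
  title = {An Example Article},
  author = {Doe, Jane},
  year = {2024},
  abstract = {A short abstract for the example entry.}
}
"""


def write_sample_bib(directory: str) -> Path:
    """Write the sample entry to <directory>/sample.bib and return its path."""
    path = Path(directory) / "sample.bib"
    path.write_text(SAMPLE_BIBTEX, encoding="utf-8")
    return path


def load_bibtex_documents(bib_path: Path, max_docs: Optional[int] = None) -> list:
    """Hypothetical helper: build a BibtexLoader and return its documents."""
    # Imported lazily so this module can be imported without langchain-community.
    from langchain_community.document_loaders import BibtexLoader

    loader = BibtexLoader(str(bib_path), max_docs=max_docs, max_content_chars=4000)
    return loader.load()


with tempfile.TemporaryDirectory() as tmp_dir:
    bib_path = write_sample_bib(tmp_dir)
    sample_text = bib_path.read_text(encoding="utf-8")
    # With the dependencies installed, the documents could be loaded here:
    # docs = load_bibtex_documents(bib_path, max_docs=10)
```

Each returned `Document` corresponds to one entry; its `page_content` depends on whether a matching PDF is found for that entry.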

Initialize the BibtexLoader.

Parameters:

file_path (str) – Path to the bibtex file.
parser (BibtexparserWrapper | None) – The parser to use. If None, a default parser is used.
max_docs (int | None) – Maximum number of associated documents to load. Use -1 for no limit.
max_content_chars (int | None) – Maximum number of characters to load from the PDF.
load_extra_metadata (bool) – Whether to load extra metadata from the PDF.
file_pattern (str) – Regex pattern to match the file name in the bibtex.
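To make the default `file_pattern` concrete, the sketch below applies it with Python's `re` module to two sample `file` field values. The field formats are assumptions modeled on common reference manager exports, and using `re.findall` is an assumption about how the loader applies the pattern; `extract_pdf_paths` is a hypothetical helper.

```python
import re

# Default pattern from the signature above; the doubled backslash there is
# string escaping, so the effective regex is [^:]+\.pdf.
DEFAULT_FILE_PATTERN = r"[^:]+\.pdf"


def extract_pdf_paths(file_field: str, pattern: str = DEFAULT_FILE_PATTERN) -> list[str]:
    """Return every substring of a bibtex file field that matches the pattern."""
    return re.findall(pattern, file_field)


# Assumed field formats: colon-separated description, path, and type.
zotero_style = "Full Text:files/123/paper.pdf:application/pdf"
jabref_style = ":papers/smith2020.pdf:PDF"

print(extract_pdf_paths(zotero_style))  # ['files/123/paper.pdf']
print(extract_pdf_paths(jabref_style))  # ['papers/smith2020.pdf']
print(extract_pdf_paths("notes.txt"))   # []
```

Because the pattern excludes colons, it isolates the path segment between the colon-separated parts of the field. A custom `file_pattern` may be needed for exports that use a different layout.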

Methods

`__init__`(file_path, *[, parser, max_docs, ...])	Initialize the BibtexLoader.
`alazy_load`()	A lazy loader for Documents.
`aload`()	Load data into Document objects.
`lazy_load`()	Load bibtex file using bibtexparser and get the article texts plus the article metadata.
`load`()	Load data into Document objects.
`load_and_split`([text_splitter])	Load Documents and split into chunks.

__init__(file_path: str, *, parser: BibtexparserWrapper | None = None, max_docs: int | None = None, max_content_chars: int | None = 4000, load_extra_metadata: bool = False, file_pattern: str = '[^:]+\\.pdf')

Initialize the BibtexLoader.

Parameters:

file_path (str) – Path to the bibtex file.
parser (BibtexparserWrapper | None) – The parser to use. If None, a default parser is used.
max_docs (int | None) – Maximum number of associated documents to load. Use -1 for no limit.
max_content_chars (int | None) – Maximum number of characters to load from the PDF.
load_extra_metadata (bool) – Whether to load extra metadata from the PDF.
file_pattern (str) – Regex pattern to match the file name in the bibtex.

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Yields: the documents.
Return type: AsyncIterator[Document]
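Since `alazy_load` returns an async iterator, it is consumed with `async for`. The sketch below shows that pattern with a stand-in loader so it runs without LangChain installed. `collect_async` and `FakeLoader` are hypothetical names, and the stand-in yields strings rather than `Document` objects.

```python
import asyncio
from typing import Any, AsyncIterator


async def collect_async(loader: Any, limit: int) -> list:
    """Gather at most `limit` items from loader.alazy_load()."""
    items: list = []
    if limit <= 0:
        return items
    async for item in loader.alazy_load():
        items.append(item)
        if len(items) >= limit:
            break  # stop early instead of draining the iterator
    return items


class FakeLoader:
    """Stand-in with the same alazy_load() shape; not part of LangChain."""

    async def alazy_load(self) -> AsyncIterator[str]:
        for name in ("entry-1", "entry-2", "entry-3"):
            yield name


first_two = asyncio.run(collect_async(FakeLoader(), limit=2))
print(first_two)  # ['entry-1', 'entry-2']
```

The same helper should accept a real `BibtexLoader` instance, since it only relies on the `alazy_load()` method documented here.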

async aload() → list[Document]

Load data into Document objects.

Returns: the documents.
Return type: list[Document]

lazy_load() → Iterator[Document]

Load bibtex file using bibtexparser and get the article texts plus the article metadata. See https://bibtexparser.readthedocs.io/en/master/

Returns: an iterator over documents, each with document.page_content in text format
Return type: Iterator[Document]
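Because `lazy_load` returns an iterator, a caller can stop after the first few documents. The sketch below uses `itertools.islice` with a stand-in loader that counts how many items it produced. `take_documents` and `CountingLoader` are hypothetical names, and how much work early stopping saves with a real loader depends on how it produces documents.

```python
from itertools import islice
from typing import Any, Iterator


def take_documents(loader: Any, n: int) -> list:
    """Return the first n items from loader.lazy_load() without requesting more."""
    return list(islice(loader.lazy_load(), n))


class CountingLoader:
    """Stand-in loader; records how many items were actually produced."""

    def __init__(self) -> None:
        self.produced = 0

    def lazy_load(self) -> Iterator[str]:
        for i in range(100):
            self.produced += 1
            yield f"doc-{i}"


stub = CountingLoader()
head = take_documents(stub, 3)
print(head)           # ['doc-0', 'doc-1', 'doc-2']
print(stub.produced)  # 3
```

By contrast, `load()` returns a full list, so every document is built before the caller sees any of them.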

load() → list[Document]

Load data into Document objects.

Returns: the documents.
Return type: list[Document]

load_and_split(text_splitter: TextSplitter | None = None) → list[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated.

Parameters: text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Raises: ImportError – If langchain-text-splitters is not installed and no text_splitter is provided.
Returns: List of Documents.
Return type: list[Document]

Examples using BibtexLoader

BibTeX