GrobidParser

class langchain_community.document_loaders.parsers.grobid.GrobidParser(segment_sentences: bool, grobid_server: str = 'http://localhost:8070/api/processFulltextDocument')

Load article PDF files using Grobid.

Methods

`__init__`(segment_sentences[, grobid_server])
`lazy_parse`(blob)	Lazy parsing interface.
`parse`(blob)	Eagerly parse the blob into a document or documents.
`process_xml`(file_path, xml_data, ...)	Process the XML file from Grobid.

Parameters:

segment_sentences (bool)
grobid_server (str)
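
The constructor takes the sentence-segmentation flag and the full URL of the Grobid endpoint. A minimal sketch, assuming the langchain-community package is installed and a Grobid server is reachable at the chosen address; `grobid_endpoint` and `build_parser` are hypothetical helpers introduced here for illustration, not part of LangChain.

```python
from typing import Any


def grobid_endpoint(host: str = "localhost", port: int = 8070) -> str:
    """Build the full-text processing URL (hypothetical helper)."""
    return f"http://{host}:{port}/api/processFulltextDocument"


def build_parser(
    segment_sentences: bool = False,
    host: str = "localhost",
    port: int = 8070,
) -> Any:
    """Create a GrobidParser pointed at the given server (hypothetical helper)."""
    # Imported here so grobid_endpoint stays usable without the optional
    # langchain-community dependency installed.
    from langchain_community.document_loaders.parsers import GrobidParser

    return GrobidParser(
        segment_sentences=segment_sentences,
        grobid_server=grobid_endpoint(host, port),
    )
```

With no arguments, `grobid_endpoint()` reproduces the default URL shown in the class signature, so `build_parser(segment_sentences=True)` would target a local server.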

__init__(segment_sentences: bool, grobid_server: str = 'http://localhost:8070/api/processFulltextDocument') → None

Parameters:

segment_sentences (bool)
grobid_server (str)

Return type:

None

lazy_parse(blob: Blob) → Iterator[Document]

Lazy parsing interface.

Subclasses are required to implement this method.

Parameters: blob (Blob) – Blob instance
Returns: Generator of documents
Return type: Iterator[Document]
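
Because `lazy_parse` returns an iterator, a caller can stop after a few documents instead of collecting all of them. The sketch below shows that consumption pattern with the standard library only. `take_documents` and `CountingParser` are hypothetical names; `CountingParser` is a stand-in with the same `lazy_parse(blob)` shape, not GrobidParser itself, and how much work a real parser defers until iteration depends on its implementation.

```python
from itertools import islice
from typing import Any, Iterator, List


def take_documents(parser: Any, blob: Any, limit: int) -> List[Any]:
    """Return at most `limit` items from parser.lazy_parse(blob)."""
    return list(islice(parser.lazy_parse(blob), limit))


class CountingParser:
    """Stand-in parser that records how many items it has produced."""

    def __init__(self) -> None:
        self.produced = 0

    def lazy_parse(self, blob: str) -> Iterator[str]:
        for i in range(100):
            self.produced += 1
            yield f"{blob}-chunk-{i}"
```

With a real parser, the blob would typically come from `Blob.from_path("paper.pdf")`, where the file name is a placeholder.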

parse(blob: Blob) → list[Document]

Eagerly parse the blob into a document or documents.

This is a convenience method for interactive development environments.

Production applications should favor the lazy_parse method instead.

Subclasses should generally not override this parse method.

Parameters: blob (Blob) – Blob instance
Returns: List of documents
Return type: list[Document]
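
The relationship between the eager and lazy methods can be pictured as a thin wrapper that drains the iterator into a list. This is a toy sketch under that assumption; `SketchParser` is hypothetical, works on plain strings rather than Blob and Document objects, and is not a LangChain class.

```python
from typing import Iterator, List


class SketchParser:
    """Toy parser mirroring the lazy/eager split."""

    def lazy_parse(self, blob: str) -> Iterator[str]:
        # Yield one "document" per non-empty line of the input text.
        for line in blob.splitlines():
            if line.strip():
                yield line.strip()

    def parse(self, blob: str) -> List[str]:
        # Eager variant: materialize everything lazy_parse yields.
        return list(self.lazy_parse(blob))
```

Seen this way, subclasses only need to supply `lazy_parse`, and the eager method trades memory for the convenience of having the full list at once.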

process_xml(file_path: str, xml_data: str, segment_sentences: bool) → Iterator[Document]

Process the XML file from Grobid.

Parameters:

file_path (str)
xml_data (str)
segment_sentences (bool)

Return type:

Iterator[Document]
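
The reference does not spell out the effect of `segment_sentences`. Our understanding, stated here as an assumption, is that it selects between one document per sentence and one document per paragraph of the TEI XML returned by Grobid. The sketch below illustrates that idea with the standard library only. `iter_tei_chunks` and `SAMPLE_TEI` are hypothetical, the function yields plain strings rather than Document objects, and it is not the library's implementation.

```python
import xml.etree.ElementTree as ET
from typing import Iterator

# TEI namespace in ElementTree's "{uri}tag" notation.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Tiny hand-written sample, assumed to resemble sentence-segmented TEI output.
SAMPLE_TEI = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div>'
    "<head>Intro</head><p><s>First.</s><s>Second.</s></p>"
    "</div></body></text></TEI>"
)


def iter_tei_chunks(xml_data: str, segment_sentences: bool) -> Iterator[str]:
    """Yield sentence-level or paragraph-level text from TEI XML."""
    root = ET.fromstring(xml_data)
    for paragraph in root.iter(f"{TEI}p"):
        sentences = [
            "".join(s.itertext()).strip() for s in paragraph.iter(f"{TEI}s")
        ]
        if not sentences:
            continue  # skip paragraphs without <s> elements
        if segment_sentences:
            yield from sentences  # one chunk per sentence
        else:
            yield " ".join(sentences)  # one chunk per paragraph
```

Calling `list(iter_tei_chunks(SAMPLE_TEI, True))` gives two sentence-level chunks, while passing `False` joins them into a single paragraph-level chunk.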

Examples using GrobidParser

Grobid