GrobidParser

class langchain_community.document_loaders.parsers.grobid.GrobidParser(segment_sentences: bool, grobid_server: str = 'http://localhost:8070/api/processFulltextDocument')

Load article PDF files using Grobid.

Methods

`__init__`(segment_sentences[, grobid_server])
`lazy_parse`(blob)	Lazy parsing interface.
`parse`(blob)	Eagerly parse the blob into a document or documents.
`process_xml`(file_path, xml_data, ...)	Process the XML file from Grobid.

Parameters:

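segment_sentences (bool) – Whether Grobid should segment paragraphs into sentences.
grobid_server (str) – URL of the Grobid full-text processing endpoint.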

__init__(segment_sentences: bool, grobid_server: str = 'http://localhost:8070/api/processFulltextDocument') → None

Parameters:

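segment_sentences (bool) – Whether Grobid should segment paragraphs into sentences.
grobid_server (str) – URL of the Grobid full-text processing endpoint.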

Return type:

None
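
A minimal usage sketch, assuming the `langchain_community` package is installed and a Grobid server is listening at the default URL from the signature. `GenericLoader.from_filesystem` is a loader commonly paired with this parser; the helper name `build_grobid_loader` and its `pdf_dir` argument are illustrative and not part of the API.

```python
def build_grobid_loader(pdf_dir: str, segment_sentences: bool = False):
    """Return a loader that sends every PDF under pdf_dir to Grobid.

    Hypothetical helper for illustration. Imports are deferred so that
    defining the function does not itself require langchain_community.
    """
    from langchain_community.document_loaders.generic import GenericLoader
    from langchain_community.document_loaders.parsers import GrobidParser

    parser = GrobidParser(
        segment_sentences=segment_sentences,
        # Default endpoint from the signature; change it if the Grobid
        # server runs on another host or port.
        grobid_server="http://localhost:8070/api/processFulltextDocument",
    )
    # Only files ending in .pdf directly under pdf_dir are handed to the parser.
    return GenericLoader.from_filesystem(
        pdf_dir, glob="*", suffixes=[".pdf"], parser=parser
    )
```

Calling `build_grobid_loader("papers/").lazy_load()` would then yield documents one at a time, provided the Grobid server is reachable.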

lazy_parse(blob: Blob) → Iterator[Document]

Lazy parsing interface.

Subclasses are required to implement this method.

Parameters: blob (Blob) – Blob instance
Returns: Generator of documents
Return type: Iterator[Document]

parse(blob: Blob) → List[Document]

Eagerly parse the blob into a document or documents.

This is a convenience method for interactive development environments.

Production applications should favor the lazy_parse method instead.

Subclasses should generally not override this parse method.

Parameters: blob (Blob) – Blob instance
Returns: List of documents
Return type: List[Document]
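
The difference between the lazy and eager interfaces can be sketched with a standard-library stand-in. `SketchParser` below is a hypothetical class, not the LangChain base class: it treats a plain string as the "blob" and form-feed characters as page breaks, purely to show that `parse` amounts to collecting everything `lazy_parse` yields.

```python
from typing import Iterator, List


class SketchParser:
    """Hypothetical stand-in illustrating lazy versus eager parsing."""

    def __init__(self) -> None:
        self.pages_parsed = 0  # counts how much work has actually been done

    def lazy_parse(self, blob: str) -> Iterator[str]:
        # Work happens one page at a time, only when the caller asks for it.
        for page in blob.split("\f"):
            self.pages_parsed += 1
            yield page.strip()

    def parse(self, blob: str) -> List[str]:
        # Eager variant: exhaust the generator and hold all results in memory.
        return list(self.lazy_parse(blob))


parser = SketchParser()
blob = "page one\fpage two\fpage three"

# Pulling a single item from the generator parses a single page.
first = next(parser.lazy_parse(blob))
pages_after_first = parser.pages_parsed

# The eager call parses every page before returning.
all_pages = parser.parse(blob)
```

With large inputs the lazy form lets a caller stop early or stream results, which is why it is the one recommended for production use.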

process_xml(file_path: str, xml_data: str, segment_sentences: bool) → Iterator[Document]

Process the XML file from Grobid.

Parameters:

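file_path (str) – Path of the file the XML was produced from.
xml_data (str) – XML returned by Grobid.
segment_sentences (bool) – Whether to process the text sentence by sentence.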

Return type:

Iterator[Document]
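
Grobid returns TEI XML, so processing it means walking sections and paragraphs. The sketch below is a simplified stand-in and not the library's implementation: it uses the standard-library `xml.etree.ElementTree`, yields plain dictionaries rather than `Document` objects, and the names `iter_chunks`, `SAMPLE_TEI`, and the dictionary keys are hypothetical. The sample XML is hand-written for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample in the shape of a TEI document (illustrative only).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>A Sample Paper</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <div><head n="1">Introduction</head>
      <p><s>First sentence.</s> <s>Second sentence.</s></p>
    </div>
  </body></text>
</TEI>"""


def iter_chunks(
    file_path: str, xml_data: str, segment_sentences: bool
) -> Iterator[Dict[str, str]]:
    """Yield one chunk per sentence or per paragraph (hypothetical helper)."""
    root = ET.fromstring(xml_data)
    title_el = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    paper_title = title_el.text if title_el is not None else ""
    for div in root.iterfind(".//tei:body/tei:div", TEI_NS):
        head = div.find("tei:head", TEI_NS)
        section_title = head.text if head is not None else ""
        for para in div.iterfind("tei:p", TEI_NS):
            # Sentence mode emits each <s> element; otherwise the whole <p>.
            units = para.iterfind("tei:s", TEI_NS) if segment_sentences else [para]
            for unit in units:
                # Join nested text and collapse runs of whitespace.
                text = " ".join("".join(unit.itertext()).split())
                if text:
                    yield {
                        "text": text,
                        "section_title": section_title,
                        "paper_title": paper_title,
                        "file_path": file_path,
                    }


sentences = list(iter_chunks("paper.pdf", SAMPLE_TEI, segment_sentences=True))
paragraphs = list(iter_chunks("paper.pdf", SAMPLE_TEI, segment_sentences=False))
```

Here `sentences` holds one chunk per `<s>` element and `paragraphs` holds one chunk per `<p>` element, each carrying its section and paper title as metadata.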

Examples using GrobidParser

Grobid