MimeTypeBasedParser

class langchain_community.document_loaders.parsers.generic.MimeTypeBasedParser(handlers: Mapping[str, BaseBlobParser], *, fallback_parser: BaseBlobParser | None = None)

Parser that uses mime-types to parse a blob.

This parser is useful for simple pipelines where the mime-type is sufficient to determine how to parse a blob.

To use, configure handlers based on mime-types and pass them to the initializer.

Example

from langchain_community.document_loaders.parsers.generic import MimeTypeBasedParser

parser = MimeTypeBasedParser(
    handlers={
        "application/pdf": ...,
    },
    fallback_parser=...,
)

Define a parser that uses mime-types to determine how to parse a blob.

Parameters:
  • handlers (Mapping[str, BaseBlobParser]) – A mapping from mime-types to the parsers that handle them. Each parser takes a blob and parses it into documents.

  • fallback_parser (BaseBlobParser | None) – A fallback parser to use if the blob's mime-type is not found in handlers. If provided, it parses every blob whose mime-type has no handler. If omitted, a ValueError is raised for such blobs.
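
The dispatch rule described by these parameters can be sketched without the library installed. Every class below (SketchBlob, UpperCaseParser, EchoParser, MimeDispatchSketch) is a hypothetical stand-in written for illustration, not the langchain_community implementation. The sketch only mirrors the documented behavior: look up the blob's mime-type in handlers, otherwise use fallback_parser, otherwise raise ValueError.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Mapping, Optional


@dataclass
class SketchBlob:
    """Hypothetical stand-in for Blob: raw text plus a mime-type."""

    data: str
    mimetype: str


class UpperCaseParser:
    """Hypothetical handler: yields the blob text upper-cased."""

    def lazy_parse(self, blob: SketchBlob) -> Iterator[str]:
        yield blob.data.upper()


class EchoParser:
    """Hypothetical fallback: yields the blob text unchanged."""

    def lazy_parse(self, blob: SketchBlob) -> Iterator[str]:
        yield blob.data


class MimeDispatchSketch:
    """Assumed dispatch logic, modeled on the parameter descriptions above."""

    def __init__(
        self,
        handlers: Mapping[str, Any],
        *,
        fallback_parser: Optional[Any] = None,
    ) -> None:
        self.handlers = handlers
        self.fallback_parser = fallback_parser

    def lazy_parse(self, blob: SketchBlob) -> Iterator[str]:
        # Prefer the handler registered for this mime-type, else the fallback.
        handler = self.handlers.get(blob.mimetype, self.fallback_parser)
        if handler is None:
            raise ValueError(f"Unsupported mime type: {blob.mimetype}")
        yield from handler.lazy_parse(blob)


strict = MimeDispatchSketch({"text/plain": UpperCaseParser()})
lenient = MimeDispatchSketch(
    {"text/plain": UpperCaseParser()},
    fallback_parser=EchoParser(),
)

plain = SketchBlob("hello", "text/plain")
other = SketchBlob("hello", "application/x-unknown")

matched = list(strict.lazy_parse(plain))     # routed to UpperCaseParser
fell_back = list(lenient.lazy_parse(other))  # routed to EchoParser
```

Because lazy_parse is written as a generator in this sketch, the ValueError for an unhandled mime-type surfaces only once the result is iterated, for example via list(strict.lazy_parse(other)).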

Methods

__init__(handlers, *[, fallback_parser])

Define a parser that uses mime-types to determine how to parse a blob.

lazy_parse(blob)

Load documents from a blob.

parse(blob)

Eagerly parse the blob into a document or documents.

__init__(handlers: Mapping[str, BaseBlobParser], *, fallback_parser: BaseBlobParser | None = None) → None

Define a parser that uses mime-types to determine how to parse a blob.

Parameters:
  • handlers (Mapping[str, BaseBlobParser]) – A mapping from mime-types to the parsers that handle them. Each parser takes a blob and parses it into documents.

  • fallback_parser (BaseBlobParser | None) – A fallback parser to use if the blob's mime-type is not found in handlers. If provided, it parses every blob whose mime-type has no handler. If omitted, a ValueError is raised for such blobs.

Return type:

None

lazy_parse(blob: Blob) → Iterator[Document]

Load documents from a blob.

Parameters:

blob (Blob) – The blob to parse

Return type:

Iterator[Document]

parse(blob: Blob) → List[Document]

Eagerly parse the blob into a document or documents.

This is a convenience method for interactive development environments.

Production applications should favor the lazy_parse method instead.

Subclasses should generally not override this parse method.

Parameters:

blob (Blob) – Blob instance

Returns:

List of documents

Return type:

List[Document]
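
The difference between the two methods can be illustrated with a small stand-in. CountingParser below is hypothetical and not part of langchain_community. It assumes, as the descriptions above suggest, that parse simply collects everything lazy_parse yields into a list, and it uses plain strings in place of Blob and Document to stay dependency-free.

```python
from typing import Iterator, List


class CountingParser:
    """Hypothetical parser that records how many items it has produced."""

    def __init__(self) -> None:
        self.produced = 0

    def lazy_parse(self, text: str) -> Iterator[str]:
        # Yield one "document" per line, only as the caller asks for it.
        for line in text.splitlines():
            self.produced += 1
            yield line

    def parse(self, text: str) -> List[str]:
        # Eager variant: exhaust the lazy iterator into a list.
        return list(self.lazy_parse(text))


text = "first\nsecond\nthird"

lazy = CountingParser()
first_only = next(lazy.lazy_parse(text))  # pulls a single item
lazy_count = lazy.produced                # work done so far

eager = CountingParser()
everything = eager.parse(text)            # materializes all items at once
eager_count = eager.produced
```

In this sketch the lazy path does only as much work as the caller consumes, which is the reason the text above steers production code toward lazy_parse.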