DocugamiLoader

class langchain_community.document_loaders.docugami.DocugamiLoader

Bases: BaseLoader, BaseModel

Deprecated since version 0.0.24: Use docugami_langchain.DocugamiLoader instead. It will not be removed until langchain-community==1.0.

Load from Docugami.

To use, you should have the dgml-utils Python package installed.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

param access_token: str | None = None – The Docugami API access token to use.

param api: str = 'https://api.docugami.com/v1preview1' – The Docugami API endpoint to use.

param document_ids: Sequence[str] | None = None – The Docugami API document IDs to use.

param file_paths: Sequence[Path | str] | None [Required] – The local file paths to use.

param include_project_metadata_in_doc_metadata: bool = True – Set to True to include the project metadata in the document metadata.

param include_xml_tags: bool = False – Set to True to include XML tags in the chunk output text.

param max_metadata_length: int = 512 – Maximum length of the metadata text returned.

param min_text_length: int = 32 – Threshold below which a chunk is appended to the next chunk, to avoid over-chunking.

param parent_hierarchy_levels: int = 0 – Set appropriately to get parent chunks using the chunk hierarchy.

param sub_chunk_tables: bool = False – Set to True to return sub-chunks within tables.

param whitespace_normalize_text: bool = True – Set to False to keep the full whitespace formatting of the original XML document, including indentation.
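The effect of the two text-shaping options, whitespace_normalize_text and min_text_length, can be illustrated with a small standalone sketch. It does not call Docugami and does not reproduce the loader's actual implementation. The helper names normalize_whitespace and merge_short_chunks are hypothetical, and the merge rule (a too-short chunk is joined to the chunk that follows it) is an assumption based only on the parameter description.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace (newlines, tabs, indentation) into single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def merge_short_chunks(chunks: list[str], min_text_length: int = 32) -> list[str]:
    """Join chunks shorter than min_text_length to the chunk that follows them.

    Assumed behavior for illustration; the real loader may merge differently.
    """
    merged: list[str] = []
    pending = ""  # short text waiting to be joined with the next chunk
    for chunk in chunks:
        candidate = f"{pending} {chunk}".strip() if pending else chunk
        if len(candidate) < min_text_length:
            pending = candidate
        else:
            merged.append(candidate)
            pending = ""
    if pending:
        # Nothing follows the last short chunk, so it is kept as its own chunk.
        merged.append(pending)
    return merged


raw_chunks = [
    "  Section 1  ",
    "This agreement is made\n    between the two parties named below.",
    "Signed.",
]
chunks = merge_short_chunks([normalize_whitespace(c) for c in raw_chunks])
print(chunks)
```

In this sketch the short heading is folded into the paragraph after it, while the trailing short chunk has nothing to merge into and is kept on its own.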

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

async aload() → list[Document]

Load data into Document objects.
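Consuming the two asynchronous methods might look roughly like the sketch below. To stay runnable without Docugami credentials, it uses a hypothetical stand-in class, FakeAsyncLoader, and plain strings in place of Document objects. Only the method names alazy_load and aload come from this reference.

```python
import asyncio
from typing import AsyncIterator


class FakeAsyncLoader:
    """Hypothetical stand-in that mimics the async method names of a loader."""

    def __init__(self, texts: list[str]) -> None:
        self.texts = texts

    async def alazy_load(self) -> AsyncIterator[str]:
        # Yield one item at a time so callers can start work before everything is loaded.
        for text in self.texts:
            await asyncio.sleep(0)  # hand control back to the event loop
            yield text

    async def aload(self) -> list[str]:
        # Eager variant: drain the lazy iterator into a list.
        return [text async for text in self.alazy_load()]


async def main() -> tuple[list[str], list[str]]:
    loader = FakeAsyncLoader(["chunk one", "chunk two"])
    streamed = []
    async for text in loader.alazy_load():  # stream items as they arrive
        streamed.append(text)
    everything = await loader.aload()  # or collect them all at once
    return streamed, everything


streamed, everything = asyncio.run(main())
print(streamed, everything)
```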

lazy_load() → Iterator[Document]

A lazy loader for Documents.

load() → list[Document]

Load documents.
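The practical difference between lazy and eager loading can be shown with a minimal sketch. FakeLoader is a hypothetical stand-in that yields strings instead of Document objects, and its produced counter exists only to make the laziness visible. It is not how DocugamiLoader is implemented.

```python
from typing import Iterator


class FakeLoader:
    """Hypothetical stand-in exposing lazy_load() and load() like a document loader."""

    def __init__(self, texts: list[str]) -> None:
        self.texts = texts
        self.produced = 0  # counts how many items were actually generated

    def lazy_load(self) -> Iterator[str]:
        # Generator: each item is produced only when the caller asks for it.
        for text in self.texts:
            self.produced += 1
            yield text

    def load(self) -> list[str]:
        # Eager variant: materialize every item in memory.
        return list(self.lazy_load())


loader = FakeLoader(["a", "b", "c"])

first = next(loader.lazy_load())        # only one item is generated here
produced_after_first = loader.produced

all_items = loader.load()               # generates all three items
print(first, produced_after_first, all_items)
```

Lazy iteration suits large document sets where processing can stop early or run item by item, while the eager call is convenient when the whole list is needed anyway.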

load_and_split(text_splitter: TextSplitter | None = None) → list[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered to be deprecated!

Parameters: text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Returns: List of Documents.
Return type: list[Document]
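The load-then-split flow can be sketched without any dependencies as follows. Doc is a hypothetical stand-in for Document, and split_fixed_width is a naive fixed-width splitter swapped in for RecursiveCharacterTextSplitter, so the chunk boundaries here will differ from what the real default produces. Copying the parent's metadata onto each piece is an assumption of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: text plus metadata."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def split_fixed_width(text: str, chunk_size: int) -> list[str]:
    """Naive splitter: cut text into consecutive pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def load_and_split(docs: list[Doc], chunk_size: int = 10) -> list[Doc]:
    """Split every loaded document and return each piece as its own Doc."""
    pieces: list[Doc] = []
    for doc in docs:
        for part in split_fixed_width(doc.page_content, chunk_size):
            # Each piece gets its own copy of the parent's metadata (assumed behavior).
            pieces.append(Doc(page_content=part, metadata=dict(doc.metadata)))
    return pieces


loaded = [Doc("abcdefghijklmno", {"source": "example.xml"})]
pieces = load_and_split(loaded, chunk_size=10)
print([p.page_content for p in pieces])
```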