
Doctran

Doctran is a Python package that uses LLMs and open-source NLP libraries to transform raw text into clean, structured, information-dense documents optimized for vector space retrieval. You can think of Doctran as a black box: messy strings go in, and clean, labelled strings come out.

Installation and Setup

pip install doctran

Document Transformers

Document Interrogator

See a usage example for DoctranQATransformer.

from langchain_community.document_transformers import DoctranQATransformer
API Reference: DoctranQATransformer
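
As a rough illustration of how the interrogator might be used, the sketch below wraps `DoctranQATransformer` in a hypothetical helper named `interrogate`. It assumes that `doctran`, `langchain-community`, and `langchain-core` are installed, that `OPENAI_API_KEY` is set, and that the generated pairs are stored under the `questions_and_answers` metadata key, which may differ between versions. The model name is left to the caller.

```python
import os


def interrogate(text: str, model: str) -> list:
    """Return the Q&A pairs Doctran generates for `text` (calls the OpenAI API).

    `interrogate` is a hypothetical helper for illustration, not part of LangChain.
    """
    # Fail early with a clear message if no API key is configured.
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("Set OPENAI_API_KEY before calling interrogate().")

    # Imported lazily so this module still loads when the optional packages are missing.
    from langchain_community.document_transformers import DoctranQATransformer
    from langchain_core.documents import Document

    transformer = DoctranQATransformer(openai_api_model=model)
    docs = transformer.transform_documents([Document(page_content=text)])

    # Assumption: results are stored under the "questions_and_answers" metadata key.
    return docs[0].metadata.get("questions_and_answers") or []
```

Calling `interrogate(raw_text, model="<your model name>")` sends the text to the OpenAI API, so it incurs cost and needs network access.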

Property Extractor

See a usage example for DoctranPropertyExtractor.

from langchain_community.document_transformers import DoctranPropertyExtractor
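
The extractor is driven by a list of property definitions. The following is a minimal sketch, not a definitive recipe: the property names (`category`, `mentions`) and both helper functions are made up for illustration. It assumes a `langchain-community` version in which `transform_documents` is synchronous and writes its results to the `extracted_properties` metadata key.

```python
import os


def build_properties() -> list:
    """Return illustrative property definitions (the names and values are made up)."""
    return [
        {
            "name": "category",
            "description": "What type of document this is.",
            "type": "string",
            "enum": ["update", "action_item", "announcement", "other"],
            "required": True,
        },
        {
            "name": "mentions",
            "description": "People or companies mentioned in the text.",
            "type": "array",
            "items": {"type": "string"},
            "required": True,
        },
    ]


def extract(text: str, model: str) -> dict:
    """Run the extractor on `text` (calls the OpenAI API); hypothetical helper."""
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("Set OPENAI_API_KEY before calling extract().")

    # Imported lazily so build_properties() works without the optional packages.
    from langchain_community.document_transformers import DoctranPropertyExtractor
    from langchain_core.documents import Document

    extractor = DoctranPropertyExtractor(
        properties=build_properties(), openai_api_model=model
    )
    docs = extractor.transform_documents([Document(page_content=text)])

    # Assumption: results are stored under the "extracted_properties" metadata key.
    return docs[0].metadata.get("extracted_properties") or {}
```

Keeping the schema in its own function makes it easy to inspect or adjust the property list without touching the API call.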

Document Translator

See a usage example for DoctranTextTranslator.

from langchain_community.document_transformers import DoctranTextTranslator
API Reference: DoctranTextTranslator
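
A translation call might look like the sketch below. The `translate` helper is hypothetical, and the sketch assumes that the translator accepts `language` and `openai_api_model` keyword arguments and returns new documents whose `page_content` holds the translated text. Check this against the version you have installed.

```python
import os


def translate(text: str, language: str, model: str) -> str:
    """Translate `text` into `language` with Doctran (calls the OpenAI API).

    `translate` is a hypothetical helper for illustration, not part of LangChain.
    """
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("Set OPENAI_API_KEY before calling translate().")

    # Imported lazily so this module still loads when the optional packages are missing.
    from langchain_community.document_transformers import DoctranTextTranslator
    from langchain_core.documents import Document

    translator = DoctranTextTranslator(language=language, openai_api_model=model)
    docs = translator.transform_documents([Document(page_content=text)])

    # Assumption: the translated text replaces page_content in the returned documents.
    return docs[0].page_content
```

For example, `translate(raw_text, language="spanish", model="<your model name>")` would request a Spanish version of the document.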
