document_transformers
Document Transformers are classes that transform Documents.
They are usually used to transform many Documents in a single run.
Class hierarchy:
BaseDocumentTransformer --> <name> # Examples: DoctranQATransformer, DoctranTextTranslator
Main helpers:
Document
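As a rough illustration of the hierarchy above, the sketch below shows what a subclass of BaseDocumentTransformer can look like. To stay runnable without the library installed, it defines small stand-ins for Document and BaseDocumentTransformer that mirror only the parts used here (page_content, metadata, and a transform_documents method). WhitespaceNormalizer is a hypothetical name invented for this example and is not one of the classes listed on this page.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Sequence


@dataclass
class Document:
    """Stand-in for the library's Document: text plus a metadata dict."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class BaseDocumentTransformer(ABC):
    """Stand-in base class: subclasses implement transform_documents."""

    @abstractmethod
    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        """Take a batch of Documents and return the transformed batch."""


class WhitespaceNormalizer(BaseDocumentTransformer):
    """Hypothetical transformer that collapses runs of whitespace."""

    def transform_documents(
        self, documents: Sequence[Document], **kwargs: Any
    ) -> Sequence[Document]:
        # Build new Documents so the inputs are left unmodified.
        return [
            Document(" ".join(doc.page_content.split()), dict(doc.metadata))
            for doc in documents
        ]


# One call handles the whole batch, matching the "single run" usage.
docs = [
    Document("Hello   world\n", {"source": "a.txt"}),
    Document("  two\tdocs "),
]
cleaned = WhitespaceNormalizer().transform_documents(docs)
```

With the real library, the same subclass would import Document and BaseDocumentTransformer instead of defining them locally.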
Classes
Transform HTML content by extracting specific tags and removing unwanted ones.
Extract properties from text documents using doctran.
Extract QA from text documents using doctran.
Translate text documents using doctran.
Perform K-means clustering on document vectors.
Filter that drops redundant documents by comparing their embeddings.
Replace occurrences of a particular search pattern with a replacement string.
Reorder long context.
Converts HTML documents to Markdown format with customizable options for handling links, images, other tags and heading styles using the markdownify library.
Nuclia Text Transformer.
Extract metadata tags from document contents using OpenAI functions.
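To make the idea of dropping redundant documents by comparing their embeddings more concrete, here is a minimal pure-Python sketch. It is not the library's implementation: the function names, the greedy keep-first strategy, the 0.95 threshold, and the two-dimensional vectors are all assumptions chosen for illustration. In practice the vectors would come from an embedding model.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_redundant(
    texts: Sequence[str],
    vectors: Sequence[Sequence[float]],
    threshold: float = 0.95,  # assumed cutoff, not taken from the library
) -> List[str]:
    """Keep a text only if it is not too similar to any text already kept."""
    kept_texts: List[str] = []
    kept_vectors: List[Sequence[float]] = []
    for text, vector in zip(texts, vectors):
        if all(cosine_similarity(vector, kept) < threshold for kept in kept_vectors):
            kept_texts.append(text)
            kept_vectors.append(vector)
    return kept_texts


# Made-up embeddings: the first two point in nearly the same direction.
texts = ["cats purr", "cats purr loudly", "stocks fell"]
vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
unique = filter_redundant(texts, vectors)
```

Here the second text is dropped because its vector is almost parallel to the first, while the third is kept because it is orthogonal to it.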
Functions
Get all navigable strings from a BeautifulSoup element.
Convert a list of documents to a list of documents with state.
Create a DocumentTransformer that uses an OpenAI function chain to automatically |
Deprecated classes
Deprecated since version 0.0.32: Use |