OpenAIMetadataTagger

class langchain_community.document_transformers.openai_functions.OpenAIMetadataTagger

Bases: BaseDocumentTransformer, BaseModel

Extract metadata tags from document contents using OpenAI functions.

Example:

from langchain.chains import create_tagging_chain
from langchain_community.chat_models import ChatOpenAI
from langchain_community.document_transformers import OpenAIMetadataTagger
from langchain_core.documents import Document

schema = {
    "properties": {
        "movie_title": { "type": "string" },
        "critic": { "type": "string" },
        "tone": {
            "type": "string",
            "enum": ["positive", "negative"]
        },
        "rating": {
            "type": "integer",
            "description": "The number of stars the critic rated the movie"
        }
    },
    "required": ["movie_title", "critic", "tone"]
}

# Must be an OpenAI model that supports functions
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
tagging_chain = create_tagging_chain(schema, llm)
document_transformer = OpenAIMetadataTagger(tagging_chain=tagging_chain)
original_documents = [
    Document(
        page_content="Review of The Bee Movie\nBy Roger Ebert\n\nThis is the greatest movie ever made. 4 out of 5 stars."
    ),
    Document(
        page_content="Review of The Godfather\nBy Anonymous\n\nThis movie was super boring. 1 out of 5 stars.",
        metadata={"reliable": False},
    ),
]

enhanced_documents = document_transformer.transform_documents(original_documents)

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

param tagging_chain: Any [Required]

The chain used to extract metadata from each document.

async atransform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

Asynchronously transform a list of documents.

Parameters:

documents (Sequence[Document]) – A sequence of Documents to be transformed.
kwargs (Any) – Additional keyword arguments.

Returns:

A sequence of transformed Documents.

Return type:

Sequence[Document]
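
Calling a document transformer from async code can be sketched as follows. This is illustrative only: `SyncOnlyTransformer` and `transform_in_thread` are hypothetical names, and plain strings stand in for Document objects. Whether `atransform_documents` is fully implemented for this class may depend on the installed version, which is worth checking; where only the synchronous method is usable, offloading it to a worker thread is one option.

```python
import asyncio
from typing import Any, List, Sequence


class SyncOnlyTransformer:
    """Hypothetical transformer that exposes only a blocking transform_documents."""

    def transform_documents(self, documents: Sequence[str], **kwargs: Any) -> List[str]:
        # Stand-in work: a real tagger would call an LLM here.
        return [doc.upper() for doc in documents]


async def transform_in_thread(transformer: Any, documents: Sequence[str]) -> List[str]:
    # asyncio.to_thread (Python 3.9+) runs the blocking call in a worker thread,
    # so the event loop stays free while the transformer works.
    return await asyncio.to_thread(transformer.transform_documents, documents)


output = asyncio.run(
    transform_in_thread(SyncOnlyTransformer(), ["a review", "another review"])
)
```

The same wrapper pattern should apply to any object with a blocking `transform_documents` method, though thread offloading does not make the underlying network calls themselves asynchronous.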

transform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

Automatically extract and populate metadata for each document according to the provided schema.

Parameters:

documents (Sequence[Document]) – A sequence of Documents whose metadata should be populated.
kwargs (Any) – Additional keyword arguments.

Return type:

Sequence[Document]
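
The tagging step can be sketched without calling OpenAI. The following is a minimal illustration, not the library's implementation: `SimpleDoc`, `FakeTaggingChain`, and `tag_documents` are hypothetical stand-ins. It assumes the chain exposes a `run(text)` method that returns a dict of extracted fields, and that metadata already present on a document takes precedence over extracted values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Sequence


@dataclass
class SimpleDoc:
    """Hypothetical stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class FakeTaggingChain:
    """Hypothetical chain that returns canned tags instead of calling an LLM."""

    def run(self, text: str) -> Dict[str, Any]:
        tone = "negative" if "boring" in text else "positive"
        return {"tone": tone, "reliable": True}


def tag_documents(chain: Any, documents: Sequence[SimpleDoc]) -> List[SimpleDoc]:
    tagged = []
    for doc in documents:
        extracted = chain.run(doc.page_content)
        # Assumption: on a key clash, the document's existing metadata wins.
        merged = {**extracted, **doc.metadata}
        # Build a new document so the input is left unmodified.
        tagged.append(SimpleDoc(page_content=doc.page_content, metadata=merged))
    return tagged


docs = [
    SimpleDoc("This is the greatest movie ever made."),
    SimpleDoc("This movie was super boring.", {"reliable": False}),
]
result = tag_documents(FakeTaggingChain(), docs)
```

In this sketch the second document keeps its original `reliable` value of `False` even though the fake chain reports `True`, which shows the precedence assumption in action.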