EmbeddingsRedundantFilter

class langchain_community.document_transformers.embeddings_redundant_filter.EmbeddingsRedundantFilter

Bases: BaseDocumentTransformer, BaseModel

Filter that drops redundant documents by comparing their embeddings.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only, so that self can also be used as a field name.

param embeddings: Embeddings [Required]

Embeddings to use for embedding document contents.

param similarity_fn: Callable = <function cosine_similarity>

Similarity function for comparing documents. The function is expected to take two matrices (List[List[float]]) as input and return a matrix of scores, where higher values indicate greater similarity.
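The shape of such a function can be sketched in plain Python. This is a minimal illustration of the contract described above (two matrices in, one score matrix out), not the library's own cosine_similarity; the name cosine_similarity_matrix and the zero-vector handling are assumptions made for this example.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_similarity_matrix(x: Matrix, y: Matrix) -> Matrix:
    """Return scores where scores[i][j] is the cosine similarity of x[i] and y[j].

    Hypothetical helper for illustration only; not part of langchain_community.
    """

    def norm(v: List[float]) -> float:
        return math.sqrt(sum(a * a for a in v))

    scores: Matrix = []
    for u in x:
        row: List[float] = []
        for v in y:
            dot = sum(a * b for a, b in zip(u, v))
            denom = norm(u) * norm(v)
            # Assumption: treat similarity with a zero vector as 0.0.
            row.append(dot / denom if denom else 0.0)
        scores.append(row)
    return scores


# Two row vectors compared against one row vector gives a 2 x 1 score matrix.
scores = cosine_similarity_matrix([[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0]])
print(scores)
```

Any callable with the same input and output shapes could, in principle, be passed as similarity_fn, provided that higher scores mean greater similarity.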

param similarity_threshold: float = 0.95

Threshold for determining when two documents are similar enough to be considered redundant.
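To illustrate how a threshold of this kind separates redundant from distinct documents, the following is a simplified sketch that works on precomputed embedding vectors. It is not the library's implementation: the function filter_redundant and its keep-the-earlier-document rule are assumptions chosen for clarity, and the actual class may resolve similar pairs in a different order.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    denom = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / denom if denom else 0.0


def filter_redundant(vectors: Sequence[Sequence[float]], threshold: float = 0.95) -> List[int]:
    """Return indices of vectors to keep.

    Hypothetical helper: a vector is dropped when its similarity to any
    already-kept vector exceeds the threshold.
    """
    kept: List[int] = []
    for i, v in enumerate(vectors):
        if all(cosine(v, vectors[j]) <= threshold for j in kept):
            kept.append(i)
    return kept


# The second vector points in almost the same direction as the first.
vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
print(filter_redundant(vectors, threshold=0.95))   # near-duplicate is dropped
print(filter_redundant(vectors, threshold=0.999))  # stricter threshold keeps all
```

Raising the threshold toward 1.0 keeps more near-duplicates, while lowering it removes documents that are only loosely related, so the default of 0.95 targets documents that are almost identical in embedding space.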

async atransform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

Asynchronously transform a list of documents.

Parameters:
  • documents (Sequence[Document]) – A sequence of Documents to be transformed.

  • kwargs (Any)

Returns:

A sequence of transformed Documents.

Return type:

Sequence[Document]

transform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

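Filter the documents, dropping those whose embeddings are similar enough to another document's to be considered redundant.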

Parameters:
  • documents (Sequence[Document]) – A sequence of Documents to be filtered.

  • kwargs (Any)

Return type:

Sequence[Document]
