EmbeddingsRedundantFilter

class langchain_community.document_transformers.embeddings_redundant_filter.EmbeddingsRedundantFilter

Bases: BaseDocumentTransformer, BaseModel

Filter that drops redundant documents by comparing their embeddings.

Create a new model by parsing and validating input data from keyword arguments.

Raises ValidationError if the input data cannot be parsed to form a valid model.

param embeddings: Embeddings [Required]

Embeddings to use for embedding document contents.

param similarity_fn: Callable = <function cosine_similarity>

Similarity function for comparing documents. The function is expected to take two matrices (List[List[float]]) as input and return a matrix of scores, where higher values indicate greater similarity.
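The input and output shapes described above can be illustrated with a minimal pure-Python sketch. The name `cosine_similarity_sketch` and the `Matrix` alias are hypothetical, introduced only for illustration; this is not the library's own `cosine_similarity`, just a function that follows the same two-matrices-in, score-matrix-out contract.

```python
import math
from typing import List

Matrix = List[List[float]]  # hypothetical alias: one row per embedding


def cosine_similarity_sketch(X: Matrix, Y: Matrix) -> Matrix:
    """Return scores where scores[i][j] compares row X[i] with row Y[j]."""

    def norm(v: List[float]) -> float:
        return math.sqrt(sum(c * c for c in v))

    scores: Matrix = []
    for x in X:
        row: List[float] = []
        for y in Y:
            dot = sum(a * b for a, b in zip(x, y))
            denom = norm(x) * norm(y)
            # Treat a zero-length vector as having no similarity to anything.
            row.append(dot / denom if denom else 0.0)
        scores.append(row)
    return scores


# Two rows compared against one row gives a 2 x 1 score matrix.
scores = cosine_similarity_sketch([[1.0, 0.0], [1.0, 1.0]], [[1.0, 0.0]])
```

Any replacement passed as `similarity_fn` would presumably need to keep this orientation (higher means more similar), since the threshold is compared against these scores.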

param similarity_threshold: float = 0.95

Threshold for determining when two documents are similar enough to be considered redundant.
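To make the role of the threshold concrete, here is a minimal sketch of threshold-based deduplication over raw embedding vectors. It uses a simple greedy rule (keep a vector only if it is not too similar to any vector already kept), which is an assumption made for illustration and not necessarily the order in which the library resolves redundant pairs. The names `filter_redundant_sketch` and `_cosine` are hypothetical.

```python
import math
from typing import List, Sequence


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    denom = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / denom if denom else 0.0


def filter_redundant_sketch(
    vectors: Sequence[Sequence[float]], threshold: float = 0.95
) -> List[int]:
    """Return indices of vectors to keep (greedy, first occurrence wins)."""
    kept: List[int] = []
    for i, vec in enumerate(vectors):
        # Keep this vector only if no already-kept vector exceeds the threshold.
        if all(_cosine(vec, vectors[j]) <= threshold for j in kept):
            kept.append(i)
    return kept


vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept_default = filter_redundant_sketch(vectors)                  # threshold 0.95
kept_strict = filter_redundant_sketch(vectors, threshold=0.999)  # nearly identical only
```

With the default threshold the second vector is treated as redundant with the first; raising the threshold to 0.999 keeps all three, so a higher value means fewer documents are dropped.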

async atransform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

Asynchronously transform a list of documents.

Parameters:
  • documents (Sequence[Document]) – A sequence of Documents to be transformed.

  • kwargs (Any) – Additional keyword arguments.

Returns:

A sequence of transformed Documents.

Return type:

Sequence[Document]
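The calling pattern for an async transform can be sketched as follows. `ToyFilter` is a hypothetical stand-in that only mirrors the two method names; it works on plain strings and drops exact duplicates. The sketch assumes the async method simply runs the synchronous one in a worker thread, which may differ from what the library actually does.

```python
import asyncio
from typing import Any, List, Sequence


class ToyFilter:
    """Hypothetical stand-in with the same two method names as the filter."""

    def transform_documents(self, documents: Sequence[str], **kwargs: Any) -> List[str]:
        # Drop exact duplicates while preserving first-seen order.
        return list(dict.fromkeys(documents))

    async def atransform_documents(self, documents: Sequence[str], **kwargs: Any) -> List[str]:
        # Assumption: delegate to the synchronous method in a worker thread.
        return await asyncio.to_thread(self.transform_documents, documents, **kwargs)


async def main() -> List[str]:
    # Inside a coroutine, the async method is awaited like any other.
    return await ToyFilter().atransform_documents(["a", "b", "a"])


# From synchronous code, asyncio.run drives the coroutine to completion.
result = asyncio.run(main())
```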

transform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

Filter the documents, dropping those whose embeddings are similar enough to another document's to be considered redundant.

Parameters:
  • documents (Sequence[Document]) – A sequence of Documents to be filtered.

  • kwargs (Any) – Additional keyword arguments.

Return type:

Sequence[Document]

Examples using EmbeddingsRedundantFilter