EmbeddingsRedundantFilter
- class langchain_community.document_transformers.embeddings_redundant_filter.EmbeddingsRedundantFilter
Bases: BaseDocumentTransformer, BaseModel
Filter that drops redundant documents by comparing their embeddings.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- param embeddings: Embeddings [Required]
Embeddings to use for embedding document contents.
- param similarity_fn: Callable = <function cosine_similarity>
Similarity function for comparing documents. The function is expected to take two matrices (List[List[float]]) as input and return a matrix of scores in which higher values indicate greater similarity.
- param similarity_threshold: float = 0.95
Threshold for determining when two documents are similar enough to be considered redundant.
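The following is a minimal, pure-Python sketch of how a similarity function with the shape described above and a similarity threshold could work together. It is not the library's implementation: the names cosine_similarity_matrix and drop_redundant are hypothetical, the greedy "keep the earlier document" policy is an assumption, and so is treating a score strictly greater than the threshold as redundant.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def cosine_similarity_matrix(x: Matrix, y: Matrix) -> Matrix:
    """Hypothetical similarity function: scores[i][j] is the cosine
    similarity between row i of x and row j of y."""

    def norm(v: List[float]) -> float:
        return math.sqrt(sum(a * a for a in v))

    scores: Matrix = []
    for u in x:
        row = []
        for v in y:
            denom = norm(u) * norm(v)
            dot = sum(a * b for a, b in zip(u, v))
            # Treat zero-length vectors as having no similarity.
            row.append(dot / denom if denom else 0.0)
        scores.append(row)
    return scores


def drop_redundant(
    vectors: Matrix,
    similarity_fn: Callable[[Matrix, Matrix], Matrix] = cosine_similarity_matrix,
    similarity_threshold: float = 0.95,
) -> List[int]:
    """Return the indices of vectors to keep (illustrative greedy policy).

    Assumption: a vector is redundant when its score against an
    already-kept vector is strictly greater than the threshold.
    """
    scores = similarity_fn(vectors, vectors)
    kept: List[int] = []
    for i in range(len(vectors)):
        if all(scores[i][j] <= similarity_threshold for j in kept):
            kept.append(i)
    return kept


# Toy embeddings: the second vector is nearly parallel to the first.
vectors = [[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]]
kept = drop_redundant(vectors)  # kept == [0, 2]
```

With the default threshold of 0.95, the second vector (cosine similarity of roughly 0.9987 to the first) is dropped. Raising the threshold to 0.999 keeps all three, which shows how a higher threshold makes the filter less aggressive.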
Examples using EmbeddingsRedundantFilter