Vector Stores#
Wrappers on top of vector stores.
- class langchain.vectorstores.AnalyticDB(connection_string: str, embedding_function: langchain.embeddings.base.Embeddings, collection_name: str = 'langchain', collection_metadata: Optional[dict] = None, pre_delete_collection: bool = False, logger: Optional[logging.Logger] = None)[source]#
VectorStore implementation using AnalyticDB. AnalyticDB is a distributed, cloud-native database with full PostgreSQL syntax support.
- connection_string is a Postgres connection string.
- embedding_function is any embedding function implementing the langchain.embeddings.base.Embeddings interface.
- collection_name is the name of the collection to use. (default: langchain)
- NOTE: This is not the name of the table, but the name of the collection. The tables will be created when initializing the store (if they do not exist), so make sure the user has the right permissions to create tables.
- pre_delete_collection if True, will delete the collection if it exists. (default: False) Useful for testing.
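Below is a minimal setup sketch, assuming a locally reachable Postgres-compatible instance; the connection values are placeholders:

    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.vectorstores import AnalyticDB

    embeddings = OpenAIEmbeddings()
    # Placeholder DSN; alternatively set the PGVECTOR_CONNECTION_STRING
    # environment variable and rely on from_documents / from_texts.
    conn_str = "postgresql+psycopg2://user:password@localhost:5432/postgres"
    store = AnalyticDB(connection_string=conn_str, embedding_function=embeddings)
    store.add_texts(["hello analyticdb"], metadatas=[{"source": "demo"}])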
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
kwargs – vectorstore specific parameters
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod connection_string_from_db_params(driver: str, host: str, port: int, database: str, user: str, password: str) str [source]#
Return connection string from database parameters.
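For illustration, a hedged sketch of assembling the connection string from parts; the driver name and credentials are assumptions:

    conn_str = AnalyticDB.connection_string_from_db_params(
        driver="psycopg2",  # assumed driver name
        host="localhost",
        port=5432,
        database="postgres",
        user="postgres",
        password="postgres",
    )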
- classmethod from_documents(documents: List[langchain.schema.Document], embedding: langchain.embeddings.base.Embeddings, collection_name: str = 'langchain', ids: Optional[List[str]] = None, pre_delete_collection: bool = False, **kwargs: Any) langchain.vectorstores.analyticdb.AnalyticDB [source]#
Return VectorStore initialized from documents and embeddings. A Postgres connection string is required: either pass it as a parameter or set the PGVECTOR_CONNECTION_STRING environment variable.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, collection_name: str = 'langchain', ids: Optional[List[str]] = None, pre_delete_collection: bool = False, **kwargs: Any) langchain.vectorstores.analyticdb.AnalyticDB [source]#
Return VectorStore initialized from texts and embeddings. A Postgres connection string is required: either pass it as a parameter or set the PGVECTOR_CONNECTION_STRING environment variable.
- get_collection(session: sqlalchemy.orm.session.Session) Optional[langchain.vectorstores.analyticdb.CollectionStore] [source]#
- similarity_search(query: str, k: int = 4, filter: Optional[dict] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Run similarity search with AnalyticDB with distance.
- Parameters
query (str) – Query text to search for.
k (int) – Number of results to return. Defaults to 4.
filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.
- Returns
List of Documents most similar to the query.
- similarity_search_by_vector(embedding: List[float], k: int = 4, filter: Optional[dict] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to embedding vector.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.
- Returns
List of Documents most similar to the query vector.
- similarity_search_with_score(query: str, k: int = 4, filter: Optional[dict] = None) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.
- Returns
List of Documents most similar to the query and score for each
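A hedged usage sketch of scored search with a metadata filter, reusing the store from the setup example above; the filter key is illustrative:

    docs_and_scores = store.similarity_search_with_score(
        "what is AnalyticDB?", k=4, filter={"source": "demo"}
    )
    for doc, score in docs_and_scores:
        print(score, doc.page_content)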
- class langchain.vectorstores.Annoy(embedding_function: Callable, index: Any, metric: str, docstore: langchain.docstore.base.Docstore, index_to_docstore_id: Dict[int, str])[source]#
Wrapper around Annoy vector database.
To use, you should have the annoy python package installed.

Example

    from langchain import Annoy
    db = Annoy(embedding_function, index, metric, docstore, index_to_docstore_id)
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
kwargs – vectorstore specific parameters
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_embeddings(text_embeddings: List[Tuple[str, List[float]]], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, metric: str = 'angular', trees: int = 100, n_jobs: int = - 1, **kwargs: Any) langchain.vectorstores.annoy.Annoy [source]#
Construct Annoy wrapper from embeddings.
- Parameters
text_embeddings – List of tuples of (text, embedding)
embedding – Embedding function to use.
metadatas – List of metadata dictionaries to associate with documents.
metric – Metric to use for indexing. Defaults to “angular”.
trees – Number of trees to use for indexing. Defaults to 100.
n_jobs – Number of jobs to use for indexing. Defaults to -1
- This is a user-friendly interface that:
Creates an in-memory docstore with the provided embeddings.
Initializes the Annoy database.
This is intended to be a quick way to get started.
Example
    from langchain import Annoy
    from langchain.embeddings import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    text_embeddings = embeddings.embed_documents(texts)
    text_embedding_pairs = list(zip(texts, text_embeddings))
    db = Annoy.from_embeddings(text_embedding_pairs, embeddings)
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, metric: str = 'angular', trees: int = 100, n_jobs: int = - 1, **kwargs: Any) langchain.vectorstores.annoy.Annoy [source]#
Construct Annoy wrapper from raw documents.
- Parameters
texts – List of documents to index.
embedding – Embedding function to use.
metadatas – List of metadata dictionaries to associate with documents.
metric – Metric to use for indexing. Defaults to “angular”.
trees – Number of trees to use for indexing. Defaults to 100.
n_jobs – Number of jobs to use for indexing. Defaults to -1.
- This is a user-friendly interface that:
Embeds documents.
Creates an in-memory docstore.
Initializes the Annoy database.
This is intended to be a quick way to get started.
Example
    from langchain import Annoy
    from langchain.embeddings import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    index = Annoy.from_texts(texts, embeddings)
- classmethod load_local(folder_path: str, embeddings: langchain.embeddings.base.Embeddings) langchain.vectorstores.annoy.Annoy [source]#
Load Annoy index, docstore, and index_to_docstore_id from disk.
- Parameters
folder_path – folder path to load index, docstore, and index_to_docstore_id from.
embeddings – Embeddings to use when generating queries.
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
embedding – Embedding to look up documents similar to.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
k – Number of Documents to return. Defaults to 4.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- process_index_results(idxs: List[int], dists: List[float]) List[Tuple[langchain.schema.Document, float]] [source]#
Turn Annoy results into a list of documents and scores.
- Parameters
idxs – List of indices of the documents in the index.
dists – List of distances of the documents in the index.
- Returns
List of Documents and scores.
- save_local(folder_path: str, prefault: bool = False) None [source]#
Save Annoy index, docstore, and index_to_docstore_id to disk.
- Parameters
folder_path – folder path to save index, docstore, and index_to_docstore_id to.
prefault – Whether to pre-load the index into memory.
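A hedged sketch of a save/load round trip, reusing the db and embeddings from the examples above; the folder name is a placeholder, and the same embeddings object must be supplied on reload:

    db.save_local("annoy_index")
    restored = Annoy.load_local("annoy_index", embeddings)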
- similarity_search(query: str, k: int = 4, search_k: int = - 1, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
search_k – Inspect up to search_k nodes, which defaults to n_trees * n if not provided.
- Returns
List of Documents most similar to the query.
- similarity_search_by_index(docstore_index: int, k: int = 4, search_k: int = - 1, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to docstore_index.
- Parameters
docstore_index – Index of document in docstore
k – Number of Documents to return. Defaults to 4.
search_k – Inspect up to search_k nodes, which defaults to n_trees * n if not provided.
- Returns
List of Documents most similar to the document at the given index.
- similarity_search_by_vector(embedding: List[float], k: int = 4, search_k: int = - 1, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to embedding vector.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
search_k – Inspect up to search_k nodes, which defaults to n_trees * n if not provided.
- Returns
List of Documents most similar to the embedding.
- similarity_search_with_score(query: str, k: int = 4, search_k: int = - 1) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
search_k – Inspect up to search_k nodes, which defaults to n_trees * n if not provided.
- Returns
List of Documents most similar to the query and score for each
- similarity_search_with_score_by_index(docstore_index: int, k: int = 4, search_k: int = - 1) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs most similar to the document at docstore_index.
- Parameters
docstore_index – Index of document in docstore.
k – Number of Documents to return. Defaults to 4.
search_k – Inspect up to search_k nodes, which defaults to n_trees * n if not provided.
- Returns
List of Documents most similar to the indexed document, with a score for each.
- similarity_search_with_score_by_vector(embedding: List[float], k: int = 4, search_k: int = - 1) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs most similar to the embedding vector.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
search_k – Inspect up to search_k nodes, which defaults to n_trees * n if not provided.
- Returns
List of Documents most similar to the query vector, with a score for each.
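A hedged sketch of the search_k accuracy/latency trade-off on a scored query; the value 200 is illustrative:

    # Larger search_k inspects more nodes: slower but more accurate.
    results = db.similarity_search_with_score("my query", k=4, search_k=200)
    for doc, score in results:
        print(score, doc.page_content)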
- class langchain.vectorstores.AtlasDB(name: str, embedding_function: Optional[langchain.embeddings.base.Embeddings] = None, api_key: Optional[str] = None, description: str = 'A description for your project', is_public: bool = True, reset_project_if_exists: bool = False)[source]#
Wrapper around Atlas: Nomic’s neural database and rhizomatic instrument.
To use, you should have the nomic python package installed.

Example

    from langchain.vectorstores import AtlasDB
    from langchain.embeddings.openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    vectorstore = AtlasDB("my_project", embeddings.embed_query)
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, refresh: bool = True, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts (Iterable[str]) – Texts to add to the vectorstore.
metadatas (Optional[List[dict]], optional) – Optional list of metadatas.
ids (Optional[List[str]]) – An optional list of ids.
refresh (bool) – Whether or not to refresh indices with the updated data. Default True.
- Returns
List of IDs of the added texts.
- Return type
List[str]
- create_index(**kwargs: Any) Any [source]#
Creates an index in your project.
See https://docs.nomic.ai/atlas_api.html#nomic.project.AtlasProject.create_index for full detail.
- classmethod from_documents(documents: List[langchain.schema.Document], embedding: Optional[langchain.embeddings.base.Embeddings] = None, ids: Optional[List[str]] = None, name: Optional[str] = None, api_key: Optional[str] = None, persist_directory: Optional[str] = None, description: str = 'A description for your project', is_public: bool = True, reset_project_if_exists: bool = False, index_kwargs: Optional[dict] = None, **kwargs: Any) langchain.vectorstores.atlas.AtlasDB [source]#
Create an AtlasDB vectorstore from a list of documents.
- Parameters
name (str) – Name of the collection to create.
api_key (str) – Your nomic API key.
documents (List[Document]) – List of documents to add to the vectorstore.
embedding (Optional[Embeddings]) – Embedding function. Defaults to None.
ids (Optional[List[str]]) – Optional list of document IDs. If None, ids will be auto-created.
description (str) – A description for your project.
is_public (bool) – Whether your project is publicly accessible. True by default.
reset_project_if_exists (bool) – Whether to reset this project if it already exists. Default False. Generally useful during development and testing.
index_kwargs (Optional[dict]) – Dict of kwargs for index creation. See https://docs.nomic.ai/atlas_api.html
- Returns
Nomic’s neural database and finest rhizomatic instrument
- Return type
AtlasDB
- classmethod from_texts(texts: List[str], embedding: Optional[langchain.embeddings.base.Embeddings] = None, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, name: Optional[str] = None, api_key: Optional[str] = None, description: str = 'A description for your project', is_public: bool = True, reset_project_if_exists: bool = False, index_kwargs: Optional[dict] = None, **kwargs: Any) langchain.vectorstores.atlas.AtlasDB [source]#
Create an AtlasDB vectorstore from raw texts.
- Parameters
texts (List[str]) – The list of texts to ingest.
name (str) – Name of the project to create.
api_key (str) – Your nomic API key.
embedding (Optional[Embeddings]) – Embedding function. Defaults to None.
metadatas (Optional[List[dict]]) – List of metadatas. Defaults to None.
ids (Optional[List[str]]) – Optional list of document IDs. If None, ids will be auto-created.
description (str) – A description for your project.
is_public (bool) – Whether your project is publicly accessible. True by default.
reset_project_if_exists (bool) – Whether to reset this project if it already exists. Default False. Generally useful during development and testing.
index_kwargs (Optional[dict]) – Dict of kwargs for index creation. See https://docs.nomic.ai/atlas_api.html
- Returns
Nomic’s neural database and finest rhizomatic instrument
- Return type
AtlasDB
- similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Run similarity search with AtlasDB
- Parameters
query (str) – Query text to search for.
k (int) – Number of results to return. Defaults to 4.
- Returns
List of documents most similar to the query text.
- Return type
List[Document]
- class langchain.vectorstores.Chroma(collection_name: str = 'langchain', embedding_function: Optional[Embeddings] = None, persist_directory: Optional[str] = None, client_settings: Optional[chromadb.config.Settings] = None, collection_metadata: Optional[Dict] = None, client: Optional[chromadb.Client] = None)[source]#
Wrapper around ChromaDB embeddings platform.
To use, you should have the chromadb python package installed.

Example

    from langchain.vectorstores import Chroma
    from langchain.embeddings.openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    vectorstore = Chroma("langchain_store", embeddings)
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts (Iterable[str]) – Texts to add to the vectorstore.
metadatas (Optional[List[dict]], optional) – Optional list of metadatas.
ids (Optional[List[str]], optional) – Optional list of IDs.
- Returns
List of IDs of the added texts.
- Return type
List[str]
- classmethod from_documents(documents: List[Document], embedding: Optional[Embeddings] = None, ids: Optional[List[str]] = None, collection_name: str = 'langchain', persist_directory: Optional[str] = None, client_settings: Optional[chromadb.config.Settings] = None, client: Optional[chromadb.Client] = None, **kwargs: Any) Chroma [source]#
Create a Chroma vectorstore from a list of documents.
If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.
- Parameters
collection_name (str) – Name of the collection to create.
persist_directory (Optional[str]) – Directory to persist the collection.
ids (Optional[List[str]]) – List of document IDs. Defaults to None.
documents (List[Document]) – List of documents to add to the vectorstore.
embedding (Optional[Embeddings]) – Embedding function. Defaults to None.
client_settings (Optional[chromadb.config.Settings]) – Chroma client settings
- Returns
Chroma vectorstore.
- Return type
Chroma
- classmethod from_texts(texts: List[str], embedding: Optional[Embeddings] = None, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, collection_name: str = 'langchain', persist_directory: Optional[str] = None, client_settings: Optional[chromadb.config.Settings] = None, client: Optional[chromadb.Client] = None, **kwargs: Any) Chroma [source]#
Create a Chroma vectorstore from raw texts.
If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.
- Parameters
texts (List[str]) – List of texts to add to the collection.
collection_name (str) – Name of the collection to create.
persist_directory (Optional[str]) – Directory to persist the collection.
embedding (Optional[Embeddings]) – Embedding function. Defaults to None.
metadatas (Optional[List[dict]]) – List of metadatas. Defaults to None.
ids (Optional[List[str]]) – List of document IDs. Defaults to None.
client_settings (Optional[chromadb.config.Settings]) – Chroma client settings
- Returns
Chroma vectorstore.
- Return type
Chroma
- get(include: Optional[List[str]] = None) Dict[str, Any] [source]#
Gets the collection.
- Parameters
include (Optional[List[str]]) – List of fields to include from db. Defaults to None.
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, filter: Optional[Dict[str, str]] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.
- Returns
List of Documents selected by maximal marginal relevance.
- max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, filter: Optional[Dict[str, str]] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.
- Returns
List of Documents selected by maximal marginal relevance.
- persist() None [source]#
Persist the collection.
This can be used to explicitly persist the data to disk. It will also be called automatically when the object is destroyed.
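A hedged sketch of a persisted collection, reusing the embeddings object from the example above; the directory name is a placeholder:

    vectorstore = Chroma.from_texts(
        ["hello chroma"], embeddings, persist_directory="./chroma_db"
    )
    vectorstore.persist()  # flush to disk explicitly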
- similarity_search(query: str, k: int = 4, filter: Optional[Dict[str, str]] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Run similarity search with Chroma.
- Parameters
query (str) – Query text to search for.
k (int) – Number of results to return. Defaults to 4.
filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.
- Returns
List of documents most similar to the query text.
- Return type
List[Document]
- similarity_search_by_vector(embedding: List[float], k: int = 4, filter: Optional[Dict[str, str]] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to embedding vector.
- Parameters
embedding (List[float]) – Embedding to look up documents similar to.
k (int) – Number of Documents to return. Defaults to 4.
filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.
- Returns
List of Documents most similar to the query vector.
- similarity_search_with_score(query: str, k: int = 4, filter: Optional[Dict[str, str]] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Run similarity search with Chroma with distance.
- Parameters
query (str) – Query text to search for.
k (int) – Number of results to return. Defaults to 4.
filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.
- Returns
List of documents most similar to the query text and cosine distance in float for each. Lower score represents more similarity.
- Return type
List[Tuple[Document, float]]
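A hedged usage sketch; the metadata key is illustrative:

    docs_and_scores = vectorstore.similarity_search_with_score(
        "query text", k=4, filter={"source": "demo"}
    )
    # Lower scores mean more similar documents.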
- class langchain.vectorstores.Clickhouse(embedding: langchain.embeddings.base.Embeddings, config: Optional[langchain.vectorstores.clickhouse.ClickhouseSettings] = None, **kwargs: Any)[source]#
Wrapper around ClickHouse vector database.
You need the clickhouse-connect python package and a valid account to connect to ClickHouse.
ClickHouse can not only search with simple vector indexes; it also supports complex queries with multiple conditions, constraints, and even sub-queries.
For more information, please visit the ClickHouse official site: https://clickhouse.com/clickhouse
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, batch_size: int = 32, ids: Optional[Iterable[str]] = None, **kwargs: Any) List[str] [source]#
Insert more texts through the embeddings and add to the VectorStore.
- Parameters
texts – Iterable of strings to add to the VectorStore.
ids – Optional list of ids to associate with the texts.
batch_size – Batch size of insertion
metadata – Optional column data to be inserted
- Returns
List of ids from adding the texts into the VectorStore.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[Dict[Any, Any]]] = None, config: Optional[langchain.vectorstores.clickhouse.ClickhouseSettings] = None, text_ids: Optional[Iterable[str]] = None, batch_size: int = 32, **kwargs: Any) langchain.vectorstores.clickhouse.Clickhouse [source]#
Create ClickHouse wrapper with existing texts
- Parameters
embedding_function (Embeddings) – Function to extract text embedding
texts (Iterable[str]) – List or tuple of strings to be added
config (ClickHouseSettings, Optional) – ClickHouse configuration
text_ids (Optional[Iterable], optional) – IDs for the texts. Defaults to None.
batch_size (int, optional) – Batch size when transmitting data to ClickHouse. Defaults to 32.
metadata (List[dict], optional) – Metadata for the texts. Defaults to None.
Other keyword arguments will be passed into clickhouse-connect: https://clickhouse.com/docs/en/integrations/python#clickhouse-connect-driver-api
- Returns
ClickHouse Index
- property metadata_column: str#
- similarity_search(query: str, k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Perform a similarity search with ClickHouse
- Parameters
query (str) – query string
k (int, optional) – Top K neighbors to retrieve. Defaults to 4.
where_str (Optional[str], optional) – where condition string. Defaults to None.
NOTE – Please do not let end users fill this in, and always be aware of SQL injection. When dealing with metadata, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for it is metadata.
- Returns
List of Documents
- Return type
List[Document]
- similarity_search_by_vector(embedding: List[float], k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Perform a similarity search with ClickHouse by vectors
- Parameters
embedding (List[float]) – Embedding to look up documents similar to.
k (int, optional) – Top K neighbors to retrieve. Defaults to 4.
where_str (Optional[str], optional) – where condition string. Defaults to None.
NOTE – Please do not let end users fill this in, and always be aware of SQL injection. When dealing with metadata, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for it is metadata.
- Returns
List of Documents most similar to the embedding vector.
- Return type
List[Document]
- similarity_search_with_relevance_scores(query: str, k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Perform a similarity search with ClickHouse
- Parameters
query (str) – query string
k (int, optional) – Top K neighbors to retrieve. Defaults to 4.
where_str (Optional[str], optional) – where condition string. Defaults to None.
NOTE – Please do not let end users fill this in, and always be aware of SQL injection. When dealing with metadata, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for it is metadata.
- Returns
List of documents most similar to the query, with a relevance score for each.
- Return type
List[Tuple[Document, float]]
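A hedged sketch of a filtered search on a store built via from_texts above. The where_str becomes server-side SQL, so per the NOTE above it must come from trusted code, never end users; the metadata attribute is illustrative:

    meta = vectorstore.metadata_column
    docs = vectorstore.similarity_search(
        "query text",
        k=4,
        where_str=f"{meta}.source = 'demo'",  # trusted, not user-supplied
    )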
- pydantic settings langchain.vectorstores.ClickhouseSettings[source]#
ClickHouse Client Configuration
- Attributes:
clickhouse_host (str) : A URL to connect to the ClickHouse backend. Defaults to 'localhost'.
clickhouse_port (int) : URL port to connect with HTTP. Defaults to 8123.
username (str) : Username to log in. Defaults to None.
password (str) : Password to log in. Defaults to None.
index_type (str) : Index type string. Defaults to 'annoy'.
index_param (list) : Index build parameters.
index_query_params (dict) : Index query parameters.
database (str) : Database name to find the table. Defaults to 'default'.
table (str) : Table name to operate on. Defaults to 'langchain'.
metric (str) : Metric to compute distance; supported values are 'angular', 'euclidean', 'manhattan', 'hamming', and 'dot'. Defaults to 'angular'. See spotify/annoy for details.
column_map (Dict) : Column-type map projecting column names onto langchain semantics. Must have the keys id, uuid, document, embedding, and metadata, matching the table's columns. For example:

    {
        'id': 'text_id',
        'uuid': 'global_unique_id',
        'embedding': 'text_embedding',
        'document': 'text_plain',
        'metadata': 'metadata_dictionary_in_json',
    }

Defaults to the identity map.
- Config
env_file: str = .env
env_file_encoding: str = utf-8
env_prefix: str = clickhouse_
- Fields
column_map (Dict[str, str])
database (str)
host (str)
index_param (Optional[Union[List, Dict]])
index_query_params (Dict[str, str])
index_type (str)
metric (str)
password (Optional[str])
port (int)
table (str)
username (Optional[str])
- field column_map: Dict[str, str] = {'document': 'document', 'embedding': 'embedding', 'id': 'id', 'metadata': 'metadata', 'uuid': 'uuid'}#
- field database: str = 'default'#
- field host: str = 'localhost'#
- field index_param: Optional[Union[List, Dict]] = [100, "'L2Distance'"]#
- field index_query_params: Dict[str, str] = {}#
- field index_type: str = 'annoy'#
- field metric: str = 'angular'#
- field password: Optional[str] = None#
- field port: int = 8123#
- field table: str = 'langchain'#
- field username: Optional[str] = None#
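A hedged configuration sketch; the values are placeholders, and each field can equivalently come from a clickhouse_-prefixed environment variable or a .env file per the Config above:

    from langchain.vectorstores import Clickhouse, ClickhouseSettings

    settings = ClickhouseSettings(
        host="localhost",  # or set the clickhouse_host environment variable
        port=8123,
        table="langchain",
        metric="angular",
    )
    vectorstore = Clickhouse(embedding=embeddings, config=settings)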
- class langchain.vectorstores.DeepLake(dataset_path: str = './deeplake/', token: Optional[str] = None, embedding_function: Optional[langchain.embeddings.base.Embeddings] = None, read_only: Optional[bool] = False, ingestion_batch_size: int = 1024, num_workers: int = 0, verbose: bool = True, **kwargs: Any)[source]#
Wrapper around Deep Lake, a data lake for deep learning applications.
We implement naive similarity search and filtering for fast prototyping, but it can be extended with Tensor Query Language (TQL) for production use cases over billions of rows.
Why Deep Lake?
- Not only stores embeddings, but also the original data with version control.
- Serverless; doesn't require another service and can be used with major cloud providers (S3, GCS, etc.).
- More than just a multi-modal vector store: you can use the dataset to fine-tune your own LLM models.
To use, you should have the deeplake python package installed.

Example

    from langchain.vectorstores import DeepLake
    from langchain.embeddings.openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    vectorstore = DeepLake("langchain_store", embeddings.embed_query)
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts (Iterable[str]) – Texts to add to the vectorstore.
metadatas (Optional[List[dict]], optional) – Optional list of metadatas.
ids (Optional[List[str]], optional) – Optional list of IDs.
- Returns
List of IDs of the added texts.
- Return type
List[str]
- delete(ids: Any[List[str], None] = None, filter: Any[Dict[str, str], None] = None, delete_all: Any[bool, None] = None) bool [source]#
Delete the entities in the dataset
- Parameters
ids (Optional[List[str]], optional) – The document_ids to delete. Defaults to None.
filter (Optional[Dict[str, str]], optional) – The filter to delete by. Defaults to None.
delete_all (Optional[bool], optional) – Whether to drop the dataset. Defaults to None.
- classmethod from_texts(texts: List[str], embedding: Optional[langchain.embeddings.base.Embeddings] = None, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, dataset_path: str = './deeplake/', **kwargs: Any) langchain.vectorstores.deeplake.DeepLake [source]#
Create a Deep Lake dataset from raw texts.
If a dataset_path is specified, the dataset will be persisted in that location; otherwise it defaults to ./deeplake/.
- Parameters
path (str, pathlib.Path) – The full path to the dataset. Can be:
- A Deep Lake cloud path of the form hub://username/dataset_name. To write to Deep Lake cloud datasets, ensure that you are logged in to Deep Lake (use 'activeloop login' from the command line).
- An AWS S3 path of the form s3://bucketname/path/to/dataset. Credentials are required in either the environment or passed explicitly.
- A Google Cloud Storage path of the form gcs://bucketname/path/to/dataset. Credentials are required in either the environment or passed explicitly.
- A local file system path of the form ./path/to/dataset, ~/path/to/dataset, or path/to/dataset.
- An in-memory path of the form mem://path/to/dataset, which doesn't save the dataset but keeps it in memory instead. Should be used only for testing as it does not persist.
texts (List[str]) – List of texts to add.
embedding (Optional[Embeddings]) – Embedding function. Defaults to None.
metadatas (Optional[List[dict]]) – List of metadatas. Defaults to None.
ids (Optional[List[str]]) – List of document IDs. Defaults to None.
- Returns
Deep Lake dataset.
- Return type
DeepLake
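A hedged sketch of building a local dataset; the path is a placeholder, and a hub:// path would persist to Deep Lake cloud instead:

    db = DeepLake.from_texts(
        ["hello deep lake"],
        embedding=embeddings,
        dataset_path="./my_deeplake/",
    )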
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
embedding – Embedding function to use. Defaults to None.
distance_metric – L2 for Euclidean, L1 for Nuclear, max for L-infinity distance, cos for cosine similarity, dot for dot product. Defaults to L2.
filter – Attribute filter by metadata, e.g. {'key': 'value'}. Defaults to None.
maximal_marginal_relevance – Whether to use maximal marginal relevance. Defaults to False.
fetch_k – Number of Documents to fetch to pass to MMR algorithm. Defaults to 20.
return_score – Whether to return the score. Defaults to False.
- Returns
List of Documents most similar to the query vector.
- similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to embedding vector.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query vector.
- similarity_search_with_score(query: str, distance_metric: str = 'L2', k: int = 4, filter: Optional[Dict[str, str]] = None) List[Tuple[langchain.schema.Document, float]] [source]#
Run similarity search with Deep Lake with distance returned.
- Parameters
query (str) – Query text to search for.
distance_metric – L2 for Euclidean, L1 for Nuclear, max for L-infinity distance, cos for cosine similarity, dot for dot product. Defaults to L2.
k (int) – Number of results to return. Defaults to 4.
filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.
- Returns
List of documents most similar to the query text, with distance in float.
- Return type
List[Tuple[Document, float]]
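A hedged usage sketch reusing the db from the example above; the metric and filter are illustrative:

    results = db.similarity_search_with_score(
        "query text", distance_metric="cos", k=4, filter={"source": "demo"}
    )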
- class langchain.vectorstores.DocArrayHnswSearch(doc_index: BaseDocIndex, embedding: langchain.embeddings.base.Embeddings)[source]#
Wrapper around HnswLib storage.
To use it, you should have the docarray package with version >=0.32.0 installed. You can install it with pip install "langchain[docarray]".
- classmethod from_params(embedding: langchain.embeddings.base.Embeddings, work_dir: str, n_dim: int, dist_metric: Literal['cosine', 'ip', 'l2'] = 'cosine', max_elements: int = 1024, index: bool = True, ef_construction: int = 200, ef: int = 10, M: int = 16, allow_replace_deleted: bool = True, num_threads: int = 1, **kwargs: Any) langchain.vectorstores.docarray.hnsw.DocArrayHnswSearch [source]#
Initialize DocArrayHnswSearch store.
- Parameters
embedding (Embeddings) – Embedding function.
work_dir (str) – path to the location where all the data will be stored.
n_dim (int) – dimension of an embedding.
dist_metric (str) – Distance metric for DocArrayHnswSearch can be one of: “cosine”, “ip”, and “l2”. Defaults to “cosine”.
max_elements (int) – Maximum number of vectors that can be stored. Defaults to 1024.
index (bool) – Whether an index should be built for this field. Defaults to True.
ef_construction (int) – defines a construction time/accuracy trade-off. Defaults to 200.
ef (int) – parameter controlling query time/accuracy trade-off. Defaults to 10.
M (int) – parameter that defines the maximum number of outgoing connections in the graph. Defaults to 16.
allow_replace_deleted (bool) – Enables replacing of deleted elements with new added ones. Defaults to True.
num_threads (int) – Sets the number of cpu threads to use. Defaults to 1.
**kwargs – Other keyword arguments to be passed to the get_doc_cls method.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, work_dir: Optional[str] = None, n_dim: Optional[int] = None, **kwargs: Any) langchain.vectorstores.docarray.hnsw.DocArrayHnswSearch [source]#
Create a DocArrayHnswSearch store and insert data.
- Parameters
texts (List[str]) – Text data.
embedding (Embeddings) – Embedding function.
metadatas (Optional[List[dict]]) – Metadata for each text if it exists. Defaults to None.
work_dir (str) – path to the location where all the data will be stored.
n_dim (int) – dimension of an embedding.
**kwargs – Other keyword arguments to be passed to the __init__ method.
- Returns
DocArrayHnswSearch Vector Store
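A hedged sketch; the work_dir is a placeholder, and n_dim must match the embedding dimensionality (1536 here is an assumption about the embedding model in use):

    store = DocArrayHnswSearch.from_params(
        embedding=embeddings,
        work_dir="./hnswlib_store/",
        n_dim=1536,  # must equal the size of your embedding vectors
    )
    store.add_texts(["hello hnsw"])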
- class langchain.vectorstores.DocArrayInMemorySearch(doc_index: BaseDocIndex, embedding: langchain.embeddings.base.Embeddings)[source]#
Wrapper around in-memory storage for exact search.
To use it, you should have the docarray package with version >=0.32.0 installed. You can install it with pip install "langchain[docarray]".
- classmethod from_params(embedding: langchain.embeddings.base.Embeddings, metric: Literal['cosine_sim', 'euclidian_dist', 'sgeuclidean_dist'] = 'cosine_sim', **kwargs: Any) langchain.vectorstores.docarray.in_memory.DocArrayInMemorySearch [source]#
Initialize DocArrayInMemorySearch store.
- Parameters
embedding (Embeddings) – Embedding function.
metric (str) – metric for exact nearest-neighbor search. Can be one of: “cosine_sim”, “euclidean_dist” and “sqeuclidean_dist”. Defaults to “cosine_sim”.
**kwargs – Other keyword arguments to be passed to the get_doc_cls method.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[Dict[Any, Any]]] = None, **kwargs: Any) langchain.vectorstores.docarray.in_memory.DocArrayInMemorySearch [source]#
Create a DocArrayInMemorySearch store and insert data.
- Parameters
texts (List[str]) – Text data.
embedding (Embeddings) – Embedding function.
metadatas (Optional[List[Dict[Any, Any]]]) – Metadata for each text if it exists. Defaults to None.
metric (str) – metric for exact nearest-neighbor search. Can be one of: “cosine_sim”, “euclidean_dist” and “sqeuclidean_dist”. Defaults to “cosine_sim”.
- Returns
DocArrayInMemorySearch Vector Store
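A hedged usage sketch, reusing the embeddings object from earlier examples:

    store = DocArrayInMemorySearch.from_texts(["hello in-memory"], embeddings)
    docs = store.similarity_search("hello", k=4)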
- class langchain.vectorstores.ElasticVectorSearch(elasticsearch_url: str, index_name: str, embedding: langchain.embeddings.base.Embeddings, *, ssl_verify: Optional[Dict[str, Any]] = None)[source]#
Wrapper around Elasticsearch as a vector database.
To connect to an Elasticsearch instance that does not require login credentials, pass the Elasticsearch URL and index name along with the embedding object to the constructor.
Example
    from langchain import ElasticVectorSearch
    from langchain.embeddings import OpenAIEmbeddings

    embedding = OpenAIEmbeddings()
    elastic_vector_search = ElasticVectorSearch(
        elasticsearch_url="http://localhost:9200",
        index_name="test_index",
        embedding=embedding,
    )
To connect to an Elasticsearch instance that requires login credentials, including Elastic Cloud, use the Elasticsearch URL format https://username:password@es_host:9243. For example, to connect to Elastic Cloud, create the Elasticsearch URL with the required authentication details and pass it to the ElasticVectorSearch constructor as the named parameter elasticsearch_url.
You can obtain your Elastic Cloud URL and login credentials by logging in to the Elastic Cloud console at https://cloud.elastic.co, selecting your deployment, and navigating to the “Deployments” page.
To obtain your Elastic Cloud password for the default “elastic” user:
Log in to the Elastic Cloud console at https://cloud.elastic.co
Go to “Security” > “Users”
Locate the “elastic” user and click “Edit”
Click “Reset password”
Follow the prompts to reset the password
The format for Elastic Cloud URLs is https://username:password@cluster_id.region_id.gcp.cloud.es.io:9243.
Example
    from langchain import ElasticVectorSearch
    from langchain.embeddings import OpenAIEmbeddings

    embedding = OpenAIEmbeddings()
    elastic_host = "cluster_id.region_id.gcp.cloud.es.io"
    elasticsearch_url = f"https://username:password@{elastic_host}:9243"
    elastic_vector_search = ElasticVectorSearch(
        elasticsearch_url=elasticsearch_url,
        index_name="test_index",
        embedding=embedding,
    )
- Parameters
elasticsearch_url (str) – The URL for the Elasticsearch instance.
index_name (str) – The name of the Elasticsearch index for the embeddings.
embedding (Embeddings) – An object that provides the ability to embed text. It should be an instance of a class that subclasses the Embeddings abstract base class, such as OpenAIEmbeddings()
- Raises
ValueError – If the elasticsearch python package is not installed.
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, refresh_indices: bool = True, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
refresh_indices – Whether to refresh Elasticsearch indices.
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, elasticsearch_url: Optional[str] = None, index_name: Optional[str] = None, refresh_indices: bool = True, **kwargs: Any) langchain.vectorstores.elastic_vector_search.ElasticVectorSearch [source]#
Construct ElasticVectorSearch wrapper from raw documents.
- This is a user-friendly interface that:
Embeds documents.
Creates a new index for the embeddings in the Elasticsearch instance.
Adds the documents to the newly created Elasticsearch index.
This is intended to be a quick way to get started.
Example
    from langchain import ElasticVectorSearch
    from langchain.embeddings import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    elastic_vector_search = ElasticVectorSearch.from_texts(
        texts,
        embeddings,
        elasticsearch_url="http://localhost:9200",
    )
- similarity_search(query: str, k: int = 4, filter: Optional[dict] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query.
- similarity_search_with_score(query: str, k: int = 4, filter: Optional[dict] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query, with a score for each.
- class langchain.vectorstores.FAISS(embedding_function: typing.Callable, index: typing.Any, docstore: langchain.docstore.base.Docstore, index_to_docstore_id: typing.Dict[int, str], relevance_score_fn: typing.Optional[typing.Callable[[float], float]] = <function _default_relevance_score_fn>, normalize_L2: bool = False)[source]#
Wrapper around FAISS vector database.
To use, you should have the faiss python package installed.

Example

    from langchain import FAISS
    faiss = FAISS(embedding_function, index, docstore, index_to_docstore_id)
- add_embeddings(text_embeddings: Iterable[Tuple[str, List[float]]], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
text_embeddings – Iterable pairs of string and embedding to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
ids – Optional list of unique IDs.
- Returns
List of ids from adding the texts into the vectorstore.
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
ids – Optional list of unique IDs.
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_embeddings(text_embeddings: List[Tuple[str, List[float]]], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) langchain.vectorstores.faiss.FAISS [source]#
Construct FAISS wrapper from raw documents.
- This is a user-friendly interface that:
Embeds documents.
Creates an in-memory docstore.
Initializes the FAISS database.
This is intended to be a quick way to get started.
Example
    from langchain import FAISS
    from langchain.embeddings import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    text_embeddings = embeddings.embed_documents(texts)
    text_embedding_pairs = list(zip(texts, text_embeddings))
    faiss = FAISS.from_embeddings(text_embedding_pairs, embeddings)
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) langchain.vectorstores.faiss.FAISS [source]#
Construct FAISS wrapper from raw documents.
- This is a user-friendly interface that:
Embeds documents.
Creates an in-memory docstore.
Initializes the FAISS database.
This is intended to be a quick way to get started.
Example
    from langchain import FAISS
    from langchain.embeddings import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()
    faiss = FAISS.from_texts(texts, embeddings)
- classmethod load_local(folder_path: str, embeddings: langchain.embeddings.base.Embeddings, index_name: str = 'index') langchain.vectorstores.faiss.FAISS [source]#
Load FAISS index, docstore, and index_to_docstore_id from disk.
- Parameters
folder_path – folder path to load index, docstore, and index_to_docstore_id from.
embeddings – Embeddings to use when generating queries
index_name – for saving with a specific index file name
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- merge_from(target: langchain.vectorstores.faiss.FAISS) None [source]#
Merge another FAISS object with the current one.
Add the target FAISS to the current one.
- Parameters
target – FAISS object you wish to merge into the current one
- Returns
None.
- save_local(folder_path: str, index_name: str = 'index') None [source]#
Save FAISS index, docstore, and index_to_docstore_id to disk.
- Parameters
folder_path – folder path to save index, docstore, and index_to_docstore_id to.
index_name – for saving with a specific index file name
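A hedged sketch of merging two stores and persisting the result; the folder name is a placeholder:

    db1 = FAISS.from_texts(["foo"], embeddings)
    db2 = FAISS.from_texts(["bar"], embeddings)
    db1.merge_from(db2)  # db1 now holds both indexes and docstores
    db1.save_local("faiss_index")
    restored = FAISS.load_local("faiss_index", embeddings)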
- similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query.
- similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to embedding vector.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the embedding.
- similarity_search_with_score(query: str, k: int = 4) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of documents most similar to the query text with L2 distance in float. Lower score represents more similarity.
- similarity_search_with_score_by_vector(embedding: List[float], k: int = 4) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs most similar to the embedding vector.
- Parameters
embedding – Embedding vector to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of documents most similar to the query text and L2 distance in float for each. Lower score represents more similarity.
- class langchain.vectorstores.LanceDB(connection: Any, embedding: langchain.embeddings.base.Embeddings, vector_key: Optional[str] = 'vector', id_key: Optional[str] = 'id', text_key: Optional[str] = 'text')[source]#
Wrapper around LanceDB vector database.
To use, you should have the lancedb python package installed.

Example

    db = lancedb.connect('./lancedb')
    table = db.open_table('my_table')
    vectorstore = LanceDB(table, embedding_function)
    vectorstore.add_texts(['text1', 'text2'])
    result = vectorstore.similarity_search('text1')
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str] [source]#
Turn texts into embeddings and add them to the database.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
ids – Optional list of ids to associate with the texts.
- Returns
List of ids of the added texts.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, connection: Any = None, vector_key: Optional[str] = 'vector', id_key: Optional[str] = 'id', text_key: Optional[str] = 'text', **kwargs: Any) langchain.vectorstores.lancedb.LanceDB [source]#
Return VectorStore initialized from texts and embeddings.
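A hedged sketch, assuming a pre-created LanceDB table as in the constructor example above; the paths and table name are placeholders:

    import lancedb

    db = lancedb.connect("./lancedb")
    table = db.open_table("my_table")
    vectorstore = LanceDB.from_texts(["text1", "text2"], embeddings, connection=table)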
- class langchain.vectorstores.MatchingEngine(project_id: str, index: MatchingEngineIndex, endpoint: MatchingEngineIndexEndpoint, embedding: Embeddings, gcs_client: storage.Client, gcs_bucket_name: str, credentials: Optional[Credentials] = None)[source]#
Vertex Matching Engine implementation of the vector store.
While the embeddings are stored in the Matching Engine, the embedded documents will be stored in GCS.
An existing Index and corresponding Endpoint are preconditions for using this module.
See usage in docs/modules/indexes/vectorstores/examples/matchingengine.ipynb
Note that this implementation is mostly meant for reading if you are planning a real-time application. While reading is a real-time operation, updating the index takes close to one hour.
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
kwargs – vectorstore specific parameters.
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_components(project_id: str, region: str, gcs_bucket_name: str, index_id: str, endpoint_id: str, credentials_path: Optional[str] = None, embedding: Optional[langchain.embeddings.base.Embeddings] = None) langchain.vectorstores.matching_engine.MatchingEngine [source]#
Takes the object creation out of the constructor.
- Parameters
project_id – The GCP project id.
region – The default location making the API calls. It must have the same location as the GCS bucket and must be regional.
gcs_bucket_name – The location where the vectors will be stored in order for the index to be created.
index_id – The id of the created index.
endpoint_id – The id of the created endpoint.
credentials_path – (Optional) The path of the Google credentials on the local file system.
embedding – The Embeddings that will be used for embedding the texts.
- Returns
A configured MatchingEngine with the texts added to the index.
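A hedged sketch of wiring up from_components; the project, region, bucket, and resource ids below are placeholders, and the index and endpoint must already exist (see the preconditions above):
from langchain.vectorstores import MatchingEngine
from langchain.embeddings.openai import OpenAIEmbeddings

vector_store = MatchingEngine.from_components(
    project_id="my-project",       # placeholder
    region="us-central1",          # placeholder; must match the GCS bucket location
    gcs_bucket_name="my-bucket",   # placeholder
    index_id="1234567890",         # placeholder id of an existing index
    endpoint_id="0987654321",      # placeholder id of an existing endpoint
    embedding=OpenAIEmbeddings(),
)
vector_store.add_texts(["some text"])
docs = vector_store.similarity_search("some text", k=4)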
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) langchain.vectorstores.matching_engine.MatchingEngine [source]#
Use from_components instead.
- similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
- Parameters
query – The string that will be used to search for similar documents.
k – The amount of neighbors that will be retrieved.
- Returns
A list of k matching documents.
- class langchain.vectorstores.Milvus(embedding_function: langchain.embeddings.base.Embeddings, collection_name: str = 'LangChainCollection', connection_args: Optional[dict[str, Any]] = None, consistency_level: str = 'Session', index_params: Optional[dict] = None, search_params: Optional[dict] = None, drop_old: Optional[bool] = False)[source]#
Wrapper around the Milvus vector database.
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, timeout: Optional[int] = None, batch_size: int = 1000, **kwargs: Any) List[str] [source]#
Insert text data into Milvus.
Inserting data when the collection has not been made yet will result in creating a new Collection. The data of the first entity decides the schema of the new collection: the dim is extracted from the first embedding and the columns are decided by the first metadata dict. Metadata keys will need to be present for all inserted values, as there is currently no None equivalent in Milvus.
- Parameters
texts (Iterable[str]) – The texts to embed, it is assumed that they all fit in memory.
metadatas (Optional[List[dict]]) – Metadata dicts attached to each of the texts. Defaults to None.
timeout (Optional[int]) – Timeout for each batch insert. Defaults to None.
batch_size (int, optional) – Batch size to use for insertion. Defaults to 1000.
- Raises
MilvusException – Failure to add texts
- Returns
The resulting keys for each inserted element.
- Return type
List[str]
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, collection_name: str = 'LangChainCollection', connection_args: dict[str, Any] = {'host': 'localhost', 'password': '', 'port': '19530', 'secure': False, 'user': ''}, consistency_level: str = 'Session', index_params: Optional[dict] = None, search_params: Optional[dict] = None, drop_old: bool = False, **kwargs: Any) langchain.vectorstores.milvus.Milvus [source]#
Create a Milvus collection, index it with HNSW, and insert data.
- Parameters
texts (List[str]) – Text data.
embedding (Embeddings) – Embedding function.
metadatas (Optional[List[dict]]) – Metadata for each text if it exists. Defaults to None.
collection_name (str, optional) – Collection name to use. Defaults to “LangChainCollection”.
connection_args (dict[str, Any], optional) – Connection args to use. Defaults to DEFAULT_MILVUS_CONNECTION.
consistency_level (str, optional) – Which consistency level to use. Defaults to “Session”.
index_params (Optional[dict], optional) – Which index_params to use. Defaults to None.
search_params (Optional[dict], optional) – Which search params to use. Defaults to None.
drop_old (Optional[bool], optional) – Whether to drop the collection with that name if it exists. Defaults to False.
- Returns
Milvus Vector Store
- Return type
Milvus
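A minimal sketch of from_texts against a local Milvus instance; the connection values mirror the defaults in the signature above, and the texts are placeholders:
from langchain.vectorstores import Milvus
from langchain.embeddings.openai import OpenAIEmbeddings

vector_db = Milvus.from_texts(
    ["text1", "text2"],
    OpenAIEmbeddings(),
    connection_args={"host": "localhost", "port": "19530"},
)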
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Perform a search and return results that are reordered by MMR.
- Parameters
query (str) – The text being searched.
k (int, optional) – How many results to give. Defaults to 4.
fetch_k (int, optional) – Total results to select k from. Defaults to 20.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results, with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
param (dict, optional) – The search params for the specified index. Defaults to None.
expr (str, optional) – Filtering expression. Defaults to None.
timeout (int, optional) – How long to wait before timeout error. Defaults to None.
kwargs – Collection.search() keyword arguments.
- Returns
Document results for search.
- Return type
List[Document]
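A short usage sketch of the MMR search above; the expr value filters on a hypothetical metadata key named source, using Milvus boolean expression syntax:
docs = vector_db.max_marginal_relevance_search(
    "some query",
    k=4,
    fetch_k=20,
    lambda_mult=0.5,
    expr='source == "news"',  # hypothetical metadata key
)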
- max_marginal_relevance_search_by_vector(embedding: list[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Perform a search and return results that are reordered by MMR.
- Parameters
embedding (List[float]) – The embedding vector being searched.
k (int, optional) – How many results to give. Defaults to 4.
fetch_k (int, optional) – Total results to select k from. Defaults to 20.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results, with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
param (dict, optional) – The search params for the specified index. Defaults to None.
expr (str, optional) – Filtering expression. Defaults to None.
timeout (int, optional) – How long to wait before timeout error. Defaults to None.
kwargs – Collection.search() keyword arguments.
- Returns
Document results for search.
- Return type
List[Document]
- similarity_search(query: str, k: int = 4, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Perform a similarity search against the query string.
- Parameters
query (str) – The text to search.
k (int, optional) – How many results to return. Defaults to 4.
param (dict, optional) – The search params for the index type. Defaults to None.
expr (str, optional) – Filtering expression. Defaults to None.
timeout (int, optional) – How long to wait before timeout error. Defaults to None.
kwargs – Collection.search() keyword arguments.
- Returns
Document results for search.
- Return type
List[Document]
- similarity_search_by_vector(embedding: List[float], k: int = 4, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Perform a similarity search against an embedding vector.
- Parameters
embedding (List[float]) – The embedding vector to search.
k (int, optional) – How many results to return. Defaults to 4.
param (dict, optional) – The search params for the index type. Defaults to None.
expr (str, optional) – Filtering expression. Defaults to None.
timeout (int, optional) – How long to wait before timeout error. Defaults to None.
kwargs – Collection.search() keyword arguments.
- Returns
Document results for search.
- Return type
List[Document]
- similarity_search_with_score(query: str, k: int = 4, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Perform a search on a query string and return results with score.
For more information about the search parameters, take a look at the pymilvus documentation found here: https://milvus.io/api-reference/pymilvus/v2.2.6/Collection/search().md
- Parameters
query (str) – The text being searched.
k (int, optional) – The number of results to return. Defaults to 4.
param (dict) – The search params for the specified index. Defaults to None.
expr (str, optional) – Filtering expression. Defaults to None.
timeout (int, optional) – How long to wait before timeout error. Defaults to None.
kwargs – Collection.search() keyword arguments.
- Return type
List[Tuple[Document, float]]
- similarity_search_with_score_by_vector(embedding: List[float], k: int = 4, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Perform a search on an embedding vector and return results with score.
For more information about the search parameters, take a look at the pymilvus documentation found here: https://milvus.io/api-reference/pymilvus/v2.2.6/Collection/search().md
- Parameters
embedding (List[float]) – The embedding vector being searched.
k (int, optional) – The number of results to return. Defaults to 4.
param (dict) – The search params for the specified index. Defaults to None.
expr (str, optional) – Filtering expression. Defaults to None.
timeout (int, optional) – How long to wait before timeout error. Defaults to None.
kwargs – Collection.search() keyword arguments.
- Returns
Result doc and score.
- Return type
List[Tuple[Document, float]]
- class langchain.vectorstores.MongoDBAtlasVectorSearch(collection: Collection[MongoDBDocumentType], embedding: Embeddings, *, index_name: str = 'default', text_key: str = 'text', embedding_key: str = 'embedding')[source]#
Wrapper around MongoDB Atlas Vector Search.
To use, you should have both:
- the pymongo python package installed
- a connection string associated with a MongoDB Atlas Cluster having deployed an Atlas Search index
Example
from langchain.vectorstores import MongoDBAtlasVectorSearch
from langchain.embeddings.openai import OpenAIEmbeddings
from pymongo import MongoClient

mongo_client = MongoClient("<YOUR-CONNECTION-STRING>")
collection = mongo_client["<db_name>"]["<collection_name>"]
embeddings = OpenAIEmbeddings()
vectorstore = MongoDBAtlasVectorSearch(collection, embeddings)
- add_texts(texts: Iterable[str], metadatas: Optional[List[Dict[str, Any]]] = None, **kwargs: Any) List [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_connection_string(connection_string: str, namespace: str, embedding: langchain.embeddings.base.Embeddings, **kwargs: Any) langchain.vectorstores.mongodb_atlas.MongoDBAtlasVectorSearch [source]#
- classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, collection: Optional[Collection[MongoDBDocumentType]] = None, **kwargs: Any) MongoDBAtlasVectorSearch [source]#
Construct MongoDBAtlasVectorSearch wrapper from raw documents.
- This is a user-friendly interface that:
Embeds documents.
Adds the documents to a provided MongoDB Atlas Vector Search index (Lucene).
This is intended to be a quick way to get started.
Example
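A minimal sketch, assuming a reachable Atlas cluster (the connection string and database/collection names are placeholders):
from langchain.vectorstores import MongoDBAtlasVectorSearch
from langchain.embeddings.openai import OpenAIEmbeddings
from pymongo import MongoClient

mongo_client = MongoClient("<YOUR-CONNECTION-STRING>")
collection = mongo_client["<db_name>"]["<collection_name>"]
vectorstore = MongoDBAtlasVectorSearch.from_texts(
    ["text1", "text2"],
    OpenAIEmbeddings(),
    collection=collection,
)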
- similarity_search(query: str, k: int = 4, pre_filter: Optional[dict] = None, post_filter_pipeline: Optional[List[Dict]] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Return MongoDB documents most similar to query.
Uses the knnBeta Operator available in MongoDB Atlas Search. This feature is in early access and available only for evaluation purposes, to validate functionality, and to gather feedback from a small closed group of early access users. It is not recommended for production deployments as we may introduce breaking changes. For more: https://www.mongodb.com/docs/atlas/atlas-search/knn-beta
- Parameters
query – Text to look up documents similar to.
k – Optional Number of Documents to return. Defaults to 4.
pre_filter – Optional Dictionary of argument(s) to prefilter on document fields.
post_filter_pipeline – Optional Pipeline of MongoDB aggregation stages following the knnBeta search.
- Returns
List of Documents most similar to the query.
- similarity_search_with_score(query: str, *, k: int = 4, pre_filter: Optional[dict] = None, post_filter_pipeline: Optional[List[Dict]] = None) List[Tuple[langchain.schema.Document, float]] [source]#
Return MongoDB documents most similar to query, along with scores.
Uses the knnBeta Operator available in MongoDB Atlas Search. This feature is in early access and available only for evaluation purposes, to validate functionality, and to gather feedback from a small closed group of early access users. It is not recommended for production deployments as we may introduce breaking changes. For more: https://www.mongodb.com/docs/atlas/atlas-search/knn-beta
- Parameters
query – Text to look up documents similar to.
k – Optional Number of Documents to return. Defaults to 4.
pre_filter – Optional Dictionary of argument(s) to prefilter on document fields.
post_filter_pipeline – Optional Pipeline of MongoDB aggregation stages following the knnBeta search.
- Returns
List of Documents most similar to the query and score for each
- class langchain.vectorstores.MyScale(embedding: langchain.embeddings.base.Embeddings, config: Optional[langchain.vectorstores.myscale.MyScaleSettings] = None, **kwargs: Any)[source]#
Wrapper around MyScale vector database.
You need the clickhouse-connect python package and a valid account to connect to MyScale.
MyScale can not only search with simple vector indexes; it also supports complex queries with multiple conditions, constraints, and even sub-queries.
For more information, please visit the [myscale official site](https://docs.myscale.com/en/overview/).
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, batch_size: int = 32, ids: Optional[Iterable[str]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
ids – Optional list of ids to associate with the texts.
batch_size – Batch size for insertion.
metadatas – Optional column data to be inserted.
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[Dict[Any, Any]]] = None, config: Optional[langchain.vectorstores.myscale.MyScaleSettings] = None, text_ids: Optional[Iterable[str]] = None, batch_size: int = 32, **kwargs: Any) langchain.vectorstores.myscale.MyScale [source]#
Create MyScale wrapper from existing texts.
- Parameters
embedding (Embeddings) – Function to extract text embeddings.
texts (Iterable[str]) – List or tuple of strings to be added.
config (MyScaleSettings, Optional) – MyScale configuration.
text_ids (Optional[Iterable], optional) – IDs for the texts. Defaults to None.
batch_size (int, optional) – Batch size when transmitting data to MyScale. Defaults to 32.
metadatas (List[dict], optional) – Metadata for the texts. Defaults to None.
Other keyword arguments will pass into [clickhouse-connect](https://clickhouse.com/docs/en/integrations/python#clickhouse-connect-driver-api).
- Returns
MyScale Index
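A minimal sketch, assuming valid MyScale credentials (the host and login values are placeholders):
from langchain.vectorstores import MyScale, MyScaleSettings
from langchain.embeddings.openai import OpenAIEmbeddings

config = MyScaleSettings(
    host="your-myscale-host",  # placeholder
    port=8443,
    username="user",           # placeholder
    password="passwd",         # placeholder
)
vectorstore = MyScale.from_texts(
    ["text1", "text2"],
    OpenAIEmbeddings(),
    config=config,
)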
- property metadata_column: str#
- similarity_search(query: str, k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Perform a similarity search with MyScale
- Parameters
query (str) – query string
k (int, optional) – Top K neighbors to retrieve. Defaults to 4.
where_str (Optional[str], optional) – where condition string. Defaults to None.
NOTE – Please do not let end users fill this in, and always be aware of SQL injection. When dealing with metadata, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for the metadata column is metadata.
- Returns
List of Documents
- Return type
List[Document]
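A short sketch of filtering with where_str, following the NOTE above by prefixing with the metadata_column property; source is a hypothetical metadata key:
where = f"{vectorstore.metadata_column}.source = 'news'"  # hypothetical metadata key
docs = vectorstore.similarity_search("some query", k=4, where_str=where)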
- similarity_search_by_vector(embedding: List[float], k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Perform a similarity search with MyScale by vector.
- Parameters
embedding (List[float]) – Embedding vector to look up documents similar to.
k (int, optional) – Top K neighbors to retrieve. Defaults to 4.
where_str (Optional[str], optional) – where condition string. Defaults to None.
NOTE – Please do not let end users fill this in, and always be aware of SQL injection. When dealing with metadata, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for the metadata column is metadata.
- Returns
List of Documents most similar to the embedding vector.
- Return type
List[Document]
- similarity_search_with_relevance_scores(query: str, k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Perform a similarity search with MyScale
- Parameters
query (str) – query string
k (int, optional) – Top K neighbors to retrieve. Defaults to 4.
where_str (Optional[str], optional) – where condition string. Defaults to None.
NOTE – Please do not let end users fill this in, and always be aware of SQL injection. When dealing with metadata, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for the metadata column is metadata.
- Returns
List of documents most similar to the query text and cosine distance in float for each. Lower score represents more similarity.
- Return type
List[Tuple[Document, float]]
- pydantic settings langchain.vectorstores.MyScaleSettings[source]#
MyScale Client Configuration
- Attributes:
myscale_host (str) : A URL to connect to the MyScale backend. Defaults to 'localhost'.
myscale_port (int) : URL port to connect with HTTP. Defaults to 8443.
username (str) : Username to login. Defaults to None.
password (str) : Password to login. Defaults to None.
index_type (str) : Index type string. Defaults to 'IVFFLAT'.
index_param (dict) : Index build parameters. Defaults to None.
database (str) : Database name to find the table. Defaults to 'default'.
table (str) : Table name to operate on. Defaults to 'langchain'.
metric (str) : Metric to compute distance, supported are ('l2', 'cosine', 'ip'). Defaults to 'cosine'.
column_map (Dict) : Column type map to project column names onto langchain semantics. Must have keys text, id, vector, and must be the same size as the number of columns. For example:
{
'id': 'text_id',
'vector': 'text_embedding',
'text': 'text_plain',
'metadata': 'metadata_dictionary_in_json',
}
Defaults to the identity map.
- Config
env_file: str = .env
env_file_encoding: str = utf-8
env_prefix: str = myscale_
- Fields
column_map (Dict[str, str])
database (str)
host (str)
index_param (Optional[Dict[str, str]])
index_type (str)
metric (str)
password (Optional[str])
port (int)
table (str)
username (Optional[str])
- field column_map: Dict[str, str] = {'id': 'id', 'metadata': 'metadata', 'text': 'text', 'vector': 'vector'}#
- field database: str = 'default'#
- field host: str = 'localhost'#
- field index_param: Optional[Dict[str, str]] = None#
- field index_type: str = 'IVFFLAT'#
- field metric: str = 'cosine'#
- field password: Optional[str] = None#
- field port: int = 8443#
- field table: str = 'langchain'#
- field username: Optional[str] = None#
- class langchain.vectorstores.OpenSearchVectorSearch(opensearch_url: str, index_name: str, embedding_function: langchain.embeddings.base.Embeddings, **kwargs: Any)[source]#
Wrapper around OpenSearch as a vector database.
Example
from langchain import OpenSearchVectorSearch

opensearch_vector_search = OpenSearchVectorSearch(
    "http://localhost:9200",
    "embeddings",
    embedding_function
)
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, bulk_size: int = 500, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
bulk_size – Bulk API request count; Default: 500
- Returns
List of ids from adding the texts into the vectorstore.
- Optional Args:
vector_field: Document field embeddings are stored in. Defaults to “vector_field”.
text_field: Document field the text of the document is stored in. Defaults to “text”.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, bulk_size: int = 500, **kwargs: Any) langchain.vectorstores.opensearch_vector_search.OpenSearchVectorSearch [source]#
Construct OpenSearchVectorSearch wrapper from raw documents.
Example
from langchain import OpenSearchVectorSearch
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
opensearch_vector_search = OpenSearchVectorSearch.from_texts(
    texts,
    embeddings,
    opensearch_url="http://localhost:9200"
)
OpenSearch by default supports Approximate Search powered by the nmslib, faiss and lucene engines, which is recommended for large datasets. It also supports brute force search through Script Scoring and Painless Scripting.
- Optional Args:
vector_field: Document field embeddings are stored in. Defaults to “vector_field”.
text_field: Document field the text of the document is stored in. Defaults to “text”.
- Optional Keyword Args for Approximate Search:
engine: “nmslib”, “faiss”, “lucene”; default: “nmslib”
space_type: “l2”, “l1”, “cosinesimil”, “linf”, “innerproduct”; default: “l2”
ef_search: Size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches; default: 512
ef_construction: Size of the dynamic list used during k-NN graph creation. Higher values lead to more accurate graph but slower indexing speed; default: 512
m: Number of bidirectional links created for each new element. Large impact on memory consumption. Between 2 and 100; default: 16
- Keyword Args for Script Scoring or Painless Scripting:
is_appx_search: False
- similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
By default supports Approximate Search. Also supports Script Scoring and Painless Scripting.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query.
- Optional Args:
vector_field: Document field embeddings are stored in. Defaults to “vector_field”.
text_field: Document field the text of the document is stored in. Defaults to “text”.
metadata_field: Document field that metadata is stored in. Defaults to “metadata”. Can be set to a special value “*” to include the entire document.
- Optional Args for Approximate Search:
search_type: “approximate_search”; default: “approximate_search”
boolean_filter: A Boolean filter consists of a Boolean query that contains a k-NN query and a filter.
subquery_clause: Query clause on the knn vector field; default: “must”
lucene_filter: the Lucene algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering.
- Optional Args for Script Scoring Search:
search_type: “script_scoring”; default: “approximate_search”
space_type: “l2”, “l1”, “linf”, “cosinesimil”, “innerproduct”, “hammingbit”; default: “l2”
pre_filter: script_score query to pre-filter documents before identifying nearest neighbors; default: {“match_all”: {}}
- Optional Args for Painless Scripting Search:
search_type: “painless_scripting”; default: “approximate_search”
space_type: “l2Squared”, “l1Norm”, “cosineSimilarity”; default: “l2Squared”
pre_filter: script_score query to pre-filter documents before identifying nearest neighbors; default: {“match_all”: {}}
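A hedged sketch of a Script Scoring search using the optional args documented above; the instance comes from the earlier constructor example:
docs = opensearch_vector_search.similarity_search(
    "some query",
    k=4,
    search_type="script_scoring",
    space_type="cosinesimil",
    pre_filter={"match_all": {}},
)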
- similarity_search_with_score(query: str, k: int = 4, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs and their scores most similar to query.
By default supports Approximate Search. Also supports Script Scoring and Painless Scripting.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents, along with their scores, most similar to the query.
- Optional Args:
same as similarity_search
- class langchain.vectorstores.Pinecone(index: Any, embedding_function: Callable, text_key: str, namespace: Optional[str] = None)[source]#
Wrapper around Pinecone vector database.
To use, you should have the pinecone-client python package installed.
Example
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
import pinecone

# The environment should be the one specified next to the API key
# in your Pinecone console
pinecone.init(api_key="***", environment="...")
index = pinecone.Index("langchain-demo")
embeddings = OpenAIEmbeddings()
vectorstore = Pinecone(index, embeddings.embed_query, "text")
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, namespace: Optional[str] = None, batch_size: int = 32, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
ids – Optional list of ids to associate with the texts.
namespace – Optional pinecone namespace to add the texts to.
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_existing_index(index_name: str, embedding: langchain.embeddings.base.Embeddings, text_key: str = 'text', namespace: Optional[str] = None) langchain.vectorstores.pinecone.Pinecone [source]#
Load pinecone vectorstore from index name.
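A minimal sketch, assuming an index named langchain-demo already exists (the API key and environment are placeholders):
import pinecone
from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings

pinecone.init(api_key="***", environment="...")
vectorstore = Pinecone.from_existing_index("langchain-demo", OpenAIEmbeddings())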
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, batch_size: int = 32, text_key: str = 'text', index_name: Optional[str] = None, namespace: Optional[str] = None, **kwargs: Any) langchain.vectorstores.pinecone.Pinecone [source]#
Construct Pinecone wrapper from raw documents.
- This is a user-friendly interface that:
Embeds documents.
Adds the documents to a provided Pinecone index
This is intended to be a quick way to get started.
Example
from langchain import Pinecone
from langchain.embeddings import OpenAIEmbeddings
import pinecone

# The environment should be the one specified next to the API key
# in your Pinecone console
pinecone.init(api_key="***", environment="...")
embeddings = OpenAIEmbeddings()
pinecone = Pinecone.from_texts(
    texts,
    embeddings,
    index_name="langchain-demo"
)
- similarity_search(query: str, k: int = 4, filter: Optional[dict] = None, namespace: Optional[str] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Return pinecone documents most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
filter – Dictionary of argument(s) to filter on metadata
namespace – Namespace to search in. Default will search in ‘’ namespace.
- Returns
List of Documents most similar to the query.
- similarity_search_with_score(query: str, k: int = 4, filter: Optional[dict] = None, namespace: Optional[str] = None) List[Tuple[langchain.schema.Document, float]] [source]#
Return pinecone documents most similar to query, along with scores.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
filter – Dictionary of argument(s) to filter on metadata
namespace – Namespace to search in. Default will search in ‘’ namespace.
- Returns
List of Documents most similar to the query and score for each
- class langchain.vectorstores.Qdrant(client: Any, collection_name: str, embeddings: Optional[langchain.embeddings.base.Embeddings] = None, content_payload_key: str = 'page_content', metadata_payload_key: str = 'metadata', embedding_function: Optional[Callable] = None)[source]#
Wrapper around Qdrant vector database.
To use, you should have the qdrant-client package installed.
Example
from qdrant_client import QdrantClient
from langchain import Qdrant

client = QdrantClient()
collection_name = "MyCollection"
qdrant = Qdrant(client, collection_name, embedding_function)
- CONTENT_KEY = 'page_content'#
- METADATA_KEY = 'metadata'#
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[Sequence[str]] = None, batch_size: int = 64, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
ids – Optional list of ids to associate with the texts. Ids have to be uuid-like strings.
batch_size – How many vectors upload per-request. Default: 64
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[Sequence[str]] = None, location: Optional[str] = None, url: Optional[str] = None, port: Optional[int] = 6333, grpc_port: int = 6334, prefer_grpc: bool = False, https: Optional[bool] = None, api_key: Optional[str] = None, prefix: Optional[str] = None, timeout: Optional[float] = None, host: Optional[str] = None, path: Optional[str] = None, collection_name: Optional[str] = None, distance_func: str = 'Cosine', content_payload_key: str = 'page_content', metadata_payload_key: str = 'metadata', batch_size: int = 64, shard_number: Optional[int] = None, replication_factor: Optional[int] = None, write_consistency_factor: Optional[int] = None, on_disk_payload: Optional[bool] = None, hnsw_config: Optional[common_types.HnswConfigDiff] = None, optimizers_config: Optional[common_types.OptimizersConfigDiff] = None, wal_config: Optional[common_types.WalConfigDiff] = None, quantization_config: Optional[common_types.QuantizationConfig] = None, init_from: Optional[common_types.InitFrom] = None, **kwargs: Any) Qdrant [source]#
Construct Qdrant wrapper from a list of texts.
- Parameters
texts – A list of texts to be indexed in Qdrant.
embedding – A subclass of Embeddings, responsible for text vectorization.
metadatas – An optional list of metadata. If provided it has to be of the same length as a list of texts.
ids – Optional list of ids to associate with the texts. Ids have to be uuid-like strings.
location – If :memory: - use in-memory Qdrant instance. If str - use it as a url parameter. If None - fallback to relying on host and port parameters.
url – either host or str of “Optional[scheme], host, Optional[port], Optional[prefix]”. Default: None
port – Port of the REST API interface. Default: 6333
grpc_port – Port of the gRPC interface. Default: 6334
prefer_grpc – If true - use gRPC interface whenever possible in custom methods. Default: False
https – If true - use HTTPS(SSL) protocol. Default: None
api_key – API key for authentication in Qdrant Cloud. Default: None
prefix – If not None - add prefix to the REST URL path. Example: service/v1 will result in http://localhost:6333/service/v1/{qdrant-endpoint} for REST API. Default: None
timeout – Timeout for REST and gRPC API requests. Default: 5.0 seconds for REST and unlimited for gRPC
host – Host name of Qdrant service. If url and host are None, set to ‘localhost’. Default: None
path – Path in which the vectors will be stored while using local mode. Default: None
collection_name – Name of the Qdrant collection to be used. If not provided, it will be created randomly. Default: None
distance_func – Distance function. One of: “Cosine” / “Euclid” / “Dot”. Default: “Cosine”
content_payload_key – A payload key used to store the content of the document. Default: “page_content”
metadata_payload_key – A payload key used to store the metadata of the document. Default: “metadata”
batch_size – How many vectors upload per-request. Default: 64
shard_number – Number of shards in collection. Default is 1, minimum is 1.
replication_factor – Replication factor for collection. Default is 1, minimum is 1. Defines how many copies of each shard will be created. Has effect only in distributed mode.
write_consistency_factor – Write consistency factor for collection. Default is 1, minimum is 1. Defines how many replicas should apply the operation for us to consider it successful. Increasing this number will make the collection more resilient to inconsistencies, but will also make it fail if not enough replicas are available. Does not have any performance impact. Has effect only in distributed mode.
on_disk_payload – If true - the point's payload will not be stored in memory. It will be read from disk every time it is requested. This setting saves RAM by (slightly) increasing response time. Note: payload values that are involved in filtering and are indexed remain in RAM.
hnsw_config – Params for HNSW index
optimizers_config – Params for optimizer
wal_config – Params for Write-Ahead-Log
quantization_config – Params for quantization, if None - quantization will be disabled
init_from – Use data stored in another collection to initialize this collection
**kwargs – Additional arguments passed directly into REST client initialization
This is a user-friendly interface that:
1. Creates embeddings, one for each text
2. Initializes the Qdrant database as an in-memory docstore by default (and overridable to a remote docstore)
3. Adds the text embeddings to the Qdrant database
This is intended to be a quick way to get started.
Example
from langchain import Qdrant
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
qdrant = Qdrant.from_texts(texts, embeddings, "localhost")
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm. Defaults to 20.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- similarity_search(query: str, k: int = 4, filter: Optional[MetadataFilter] = None, search_params: Optional[common_types.SearchParams] = None, offset: int = 0, score_threshold: Optional[float] = None, consistency: Optional[common_types.ReadConsistency] = None, **kwargs: Any) List[Document] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
filter – Filter by metadata. Defaults to None.
search_params – Additional search params
offset – Offset of the first result to return. May be used to paginate results. Note: large offset values may cause performance issues.
score_threshold – Define a minimal score threshold for the result. If defined, less similar results will not be returned. Score of the returned result might be higher or smaller than the threshold depending on the Distance function used. E.g. for cosine similarity only higher scores will be returned.
consistency – Read consistency of the search. Defines how many replicas should be queried before returning the result. Values:
- int - number of replicas to query, values should be present in all queried replicas
- 'majority' - query all replicas, but return values present in the majority of replicas
- 'quorum' - query the majority of replicas, return values present in all of them
- 'all' - query all replicas, and return values present in all replicas
- Returns
List of Documents most similar to the query.
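A short sketch combining pagination and a score threshold; the qdrant instance comes from the earlier example, and the threshold value is an arbitrary assumption:
docs = qdrant.similarity_search(
    "some query",
    k=4,
    offset=0,              # first page of results
    score_threshold=0.8,   # arbitrary; interpretation depends on the distance function
)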
- similarity_search_with_score(query: str, k: int = 4, filter: Optional[MetadataFilter] = None, search_params: Optional[common_types.SearchParams] = None, offset: int = 0, score_threshold: Optional[float] = None, consistency: Optional[common_types.ReadConsistency] = None, **kwargs: Any) List[Tuple[Document, float]] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
filter – Filter by metadata. Defaults to None.
search_params – Additional search params
offset – Offset of the first result to return. May be used to paginate results. Note: large offset values may cause performance issues.
score_threshold – Define a minimal score threshold for the result. If defined, less similar results will not be returned. Score of the returned result might be higher or smaller than the threshold depending on the Distance function used. E.g. for cosine similarity only higher scores will be returned.
consistency – Read consistency of the search. Defines how many replicas should be queried before returning the result. Values:
- int - number of replicas to query, values should be present in all queried replicas
- 'majority' - query all replicas, but return values present in the majority of replicas
- 'quorum' - query the majority of replicas, return values present in all of them
- 'all' - query all replicas, and return values present in all replicas
- Returns
List of documents most similar to the query text, with a score for each. Whether lower or higher scores indicate greater similarity depends on the distance function configured for the collection.
- class langchain.vectorstores.Redis(redis_url: str, index_name: str, embedding_function: typing.Callable, content_key: str = 'content', metadata_key: str = 'metadata', vector_key: str = 'content_vector', relevance_score_fn: typing.Optional[typing.Callable[[float], float]] = <function _default_relevance_score>, **kwargs: typing.Any)[source]#
Wrapper around Redis vector database.
To use, you should have the redis python package installed.
Example
from langchain.vectorstores import Redis
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = Redis(
    redis_url="redis://username:password@localhost:6379",
    index_name="my-index",
    embedding_function=embeddings.embed_query,
)
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, embeddings: Optional[List[List[float]]] = None, keys: Optional[List[str]] = None, batch_size: int = 1000, **kwargs: Any) List[str] [source]#
Add more texts to the vectorstore.
- Parameters
texts (Iterable[str]) – Iterable of strings/text to add to the vectorstore.
metadatas (Optional[List[dict]], optional) – Optional list of metadatas. Defaults to None.
embeddings (Optional[List[List[float]]], optional) – Optional pre-generated embeddings. Defaults to None.
keys (Optional[List[str]], optional) – Optional key values to use as ids. Defaults to None.
batch_size (int, optional) – Batch size to use for writes. Defaults to 1000.
- Returns
List of ids added to the vectorstore
- Return type
List[str]
- static drop_index(index_name: str, delete_documents: bool, **kwargs: Any) bool [source]#
Drop a Redis search index.
- Parameters
index_name (str) – Name of the index to drop.
delete_documents (bool) – Whether to drop the associated documents.
- Returns
Whether or not the drop was successful.
- Return type
bool
- classmethod from_existing_index(embedding: langchain.embeddings.base.Embeddings, index_name: str, content_key: str = 'content', metadata_key: str = 'metadata', vector_key: str = 'content_vector', **kwargs: Any) langchain.vectorstores.redis.Redis [source]#
Connect to an existing Redis index.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, index_name: Optional[str] = None, content_key: str = 'content', metadata_key: str = 'metadata', vector_key: str = 'content_vector', **kwargs: Any) langchain.vectorstores.redis.Redis [source]#
Create a Redis vectorstore from raw documents. This is a user-friendly interface that:
Embeds documents.
Creates a new index for the embeddings in Redis.
Adds the documents to the newly created Redis index.
This is intended to be a quick way to get started.
Example
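A minimal sketch, assuming a reachable Redis instance with the search module enabled (the URL is a placeholder):
from langchain.vectorstores import Redis
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
redisearch = Redis.from_texts(
    ["text1", "text2"],
    embeddings,
    redis_url="redis://username:password@localhost:6379",  # placeholder
    index_name="my-index",
)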
- classmethod from_texts_return_keys(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, index_name: Optional[str] = None, content_key: str = 'content', metadata_key: str = 'metadata', vector_key: str = 'content_vector', distance_metric: Literal['COSINE', 'IP', 'L2'] = 'COSINE', **kwargs: Any) Tuple[langchain.vectorstores.redis.Redis, List[str]] [source]#
Create a Redis vectorstore from raw documents. This is a user-friendly interface that:
Embeds documents.
Creates a new index for the embeddings in Redis.
Adds the documents to the newly created Redis index.
This is intended to be a quick way to get started.
Example
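Same pattern as from_texts, but the generated document keys are returned as well; a sketch under the same assumptions as above:
redisearch, keys = Redis.from_texts_return_keys(
    ["text1", "text2"],
    embeddings,
    redis_url="redis://username:password@localhost:6379",  # placeholder
    index_name="my-index",
)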
- similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Returns the most similar indexed documents to the query text.
- Parameters
query (str) – The query text for which to find similar documents.
k (int) – The number of documents to return. Default is 4.
- Returns
A list of documents that are most similar to the query text.
- Return type
List[Document]
- similarity_search_limit_score(query: str, k: int = 4, score_threshold: float = 0.2, **kwargs: Any) List[langchain.schema.Document] [source]#
Returns the most similar indexed documents to the query text within the score_threshold range.
- Parameters
query (str) – The query text for which to find similar documents.
k (int) – The number of documents to return. Default is 4.
score_threshold (float) – The minimum matching score required for a document to be considered a match. Defaults to 0.2. Because the similarity calculation algorithm is based on cosine similarity, the smaller the angle, the higher the similarity.
- Returns
A list of documents that are most similar to the query text, including the match score for each document.
- Return type
List[Document]
Note
If there are no documents that satisfy the score_threshold value, an empty list is returned.
- similarity_search_with_score(query: str, k: int = 4) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query and score for each
- class langchain.vectorstores.SKLearnVectorStore(embedding: langchain.embeddings.base.Embeddings, *, persist_path: Optional[str] = None, serializer: Literal['json', 'bson', 'parquet'] = 'json', metric: str = 'cosine', **kwargs: Any)[source]#
A simple in-memory vector store based on the scikit-learn library NearestNeighbors implementation.
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
kwargs – vectorstore specific parameters
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, persist_path: Optional[str] = None, **kwargs: Any) langchain.vectorstores.sklearn.SKLearnVectorStore [source]#
Return VectorStore initialized from texts and embeddings.
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- class langchain.vectorstores.SingleStoreDB(embedding: langchain.embeddings.base.Embeddings, *, table_name: str = 'embeddings', content_field: str = 'content', metadata_field: str = 'metadata', vector_field: str = 'vector', pool_size: int = 5, max_overflow: int = 10, timeout: float = 30, **kwargs: Any)[source]#
This class serves as a Pythonic interface to the SingleStore DB database. The prerequisite for using this class is the installation of the singlestoredb Python package.
The SingleStoreDB vectorstore can be created by providing an embedding function and the relevant parameters for the database connection, connection pool, and, optionally, the names of the table and the fields to use.
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, embeddings: Optional[List[List[float]]] = None, **kwargs: Any) List[str] [source]#
Add more texts to the vectorstore.
- Parameters
texts (Iterable[str]) – Iterable of strings/text to add to the vectorstore.
metadatas (Optional[List[dict]], optional) – Optional list of metadatas. Defaults to None.
embeddings (Optional[List[List[float]]], optional) – Optional pre-generated embeddings. Defaults to None.
- Returns
empty list
- Return type
List[str]
- connection_kwargs#
Create connection pool.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, table_name: str = 'embeddings', content_field: str = 'content', metadata_field: str = 'metadata', vector_field: str = 'vector', pool_size: int = 5, max_overflow: int = 10, timeout: float = 30, **kwargs: Any) langchain.vectorstores.singlestoredb.SingleStoreDB [source]#
Create a SingleStoreDB vectorstore from raw documents. This is a user-friendly interface that:
Embeds documents.
Creates a new table for the embeddings in SingleStoreDB.
Adds the documents to the newly created table.
This is intended to be a quick way to get started.
Example
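A minimal sketch; the connection parameters are assumptions passed through as connection kwargs to the singlestoredb client:
from langchain.vectorstores import SingleStoreDB
from langchain.embeddings import OpenAIEmbeddings

vectorstore = SingleStoreDB.from_texts(
    ["text1", "text2"],
    OpenAIEmbeddings(),
    host="127.0.0.1",   # placeholder connection kwargs for singlestoredb
    port=3306,
    user="user",
    password="passwd",
    database="db",
)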
- similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Returns the most similar indexed documents to the query text.
Uses cosine similarity.
- Parameters
query (str) – The query text for which to find similar documents.
k (int) – The number of documents to return. Default is 4.
- Returns
A list of documents that are most similar to the query text.
- Return type
List[Document]
- similarity_search_with_score(query: str, k: int = 4) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs most similar to query. Uses cosine similarity.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query and score for each
- vector_field#
Pass the rest of the kwargs to the connection.
- class langchain.vectorstores.SupabaseVectorStore(client: supabase.client.Client, embedding: Embeddings, table_name: str, query_name: Union[str, None] = None)[source]#
VectorStore for a Supabase postgres database. Assumes you have the pgvector extension installed and a match_documents (or similar) function. For more details: https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/supabase
You can implement your own match_documents function in order to limit the search space to a subset of documents based on your own authorization or business logic.
Note that the Supabase Python client does not yet support async operations.
If you’d like to use max_marginal_relevance_search, please review the instructions below on modifying the match_documents function to return matched embeddings.
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict[Any, Any]]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
kwargs – vectorstore specific parameters
- Returns
List of ids from adding the texts into the vectorstore.
- add_vectors(vectors: List[List[float]], documents: List[langchain.schema.Document]) List[str] [source]#
- classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, client: Optional[supabase.client.Client] = None, table_name: Optional[str] = 'documents', query_name: Union[str, None] = 'match_documents', **kwargs: Any) SupabaseVectorStore [source]#
Return VectorStore initialized from texts and embeddings.
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
max_marginal_relevance_search requires that query_name returns matched embeddings alongside the matched documents. The following function demonstrates how to do this:
```sql
CREATE FUNCTION match_documents_embeddings(query_embedding vector(1536),
                                           match_count int)
RETURNS TABLE(
    id bigint,
    content text,
    metadata jsonb,
    embedding vector(1536),
    similarity float)
LANGUAGE plpgsql
AS $$
#variable_conflict use_column
BEGIN
    RETURN query
    SELECT
        id,
        content,
        metadata,
        embedding,
        1 - (docstore.embedding <=> query_embedding) AS similarity
    FROM
        docstore
    ORDER BY
        docstore.embedding <=> query_embedding
    LIMIT match_count;
END;
$$;
```
- max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- query_name: str#
- similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
- similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to embedding vector.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query vector.
- similarity_search_by_vector_returning_embeddings(query: List[float], k: int) List[Tuple[langchain.schema.Document, float, numpy.ndarray[numpy.float32, Any]]] [source]#
- similarity_search_by_vector_with_relevance_scores(query: List[float], k: int) List[Tuple[langchain.schema.Document, float]] [source]#
- similarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs and relevance scores in the range [0, 1].
0 is dissimilar, 1 is most similar.
- Parameters
query – input text
k – Number of Documents to return. Defaults to 4.
**kwargs –
kwargs to be passed to similarity search. Should include: score_threshold: Optional, a floating point value between 0 and 1 to filter the resulting set of retrieved docs.
- Returns
List of Tuples of (doc, similarity_score)
- table_name: str#
- class langchain.vectorstores.Tair(embedding_function: langchain.embeddings.base.Embeddings, url: str, index_name: str, content_key: str = 'content', metadata_key: str = 'metadata', search_params: Optional[dict] = None, **kwargs: Any)[source]#
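A minimal usage sketch based on the constructor signature above (the URL and index name are placeholders):
from langchain.vectorstores import Tair
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = Tair(embeddings, url="redis://localhost:6379", index_name="langchain")
docs = vectorstore.similarity_search("some query", k=4)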
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str] [source]#
Add text data to an existing index.
- create_index_if_not_exist(dim: int, distance_type: str, index_type: str, data_type: str, **kwargs: Any) bool [source]#
- static drop_index(index_name: str = 'langchain', **kwargs: Any) bool [source]#
Drop an existing index.
- Parameters
index_name (str) – Name of the index to drop.
- Returns
True if the index is dropped successfully.
- Return type
bool
- classmethod from_documents(documents: List[langchain.schema.Document], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, index_name: str = 'langchain', content_key: str = 'content', metadata_key: str = 'metadata', **kwargs: Any) langchain.vectorstores.tair.Tair [source]#
Return VectorStore initialized from documents and embeddings.
- classmethod from_existing_index(embedding: langchain.embeddings.base.Embeddings, index_name: str = 'langchain', content_key: str = 'content', metadata_key: str = 'metadata', **kwargs: Any) langchain.vectorstores.tair.Tair [source]#
Connect to an existing Tair index.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, index_name: str = 'langchain', content_key: str = 'content', metadata_key: str = 'metadata', **kwargs: Any) langchain.vectorstores.tair.Tair [source]#
Return VectorStore initialized from texts and embeddings.
- similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Returns the most similar indexed documents to the query text.
- Parameters
query (str) – The query text for which to find similar documents.
k (int) – The number of documents to return. Default is 4.
- Returns
A list of documents that are most similar to the query text.
- Return type
List[Document]
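A minimal end-to-end sketch for Tair; the tair_url keyword (or a TAIR_URL environment variable) is assumed to carry the connection string, and the endpoint and texts are illustrative:
```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Tair

texts = ["Tair is a Redis-compatible in-memory database."]

# Build an index from raw texts; the index is created if it does not exist.
tair = Tair.from_texts(
    texts,
    OpenAIEmbeddings(),
    index_name="langchain",
    tair_url="redis://localhost:6379",
)
docs = tair.similarity_search("What is Tair?", k=1)

# Drop the index once it is no longer needed.
Tair.drop_index(index_name="langchain", tair_url="redis://localhost:6379")
```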
- class langchain.vectorstores.Tigris(client: TigrisClient, embeddings: Embeddings, index_name: str)[source]#
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
ids – Optional list of ids for documents. Ids will be autogenerated if not provided.
kwargs – vectorstore specific parameters
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, client: Optional[TigrisClient] = None, index_name: Optional[str] = None, **kwargs: Any) Tigris [source]#
Return VectorStore initialized from texts and embeddings.
- property search_index: TigrisVectorStore#
- similarity_search(query: str, k: int = 4, filter: Optional[TigrisFilter] = None, **kwargs: Any) List[Document] [source]#
Return docs most similar to query.
- similarity_search_with_score(query: str, k: int = 4, filter: Optional[TigrisFilter] = None) List[Tuple[Document, float]] [source]#
Run similarity search with Tigris with distance.
- Parameters
query (str) – Query text to search for.
k (int) – Number of results to return. Defaults to 4.
filter (Optional[TigrisFilter]) – Filter by metadata. Defaults to None.
- Returns
List of documents most similar to the query text, with distance as a float.
- Return type
List[Tuple[Document, float]]
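A minimal sketch for Tigris; leaving client as None is assumed to fall back to environment-based configuration (project and credentials), and the index name is illustrative:
```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Tigris

# client=None is assumed to build a TigrisClient from environment settings.
store = Tigris.from_texts(
    ["hello world"],
    OpenAIEmbeddings(),
    index_name="my_index",
)
docs = store.similarity_search("hello", k=4)
```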
- class langchain.vectorstores.Typesense(typesense_client: Client, embedding: Embeddings, *, typesense_collection_name: Optional[str] = None, text_key: str = 'text')[source]#
Wrapper around Typesense vector search.
To use, you should have the typesense python package installed.
Example
```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Typesense
import typesense

node = {
    "host": "localhost",  # For Typesense Cloud use xxx.a1.typesense.net
    "port": "8108",       # For Typesense Cloud use 443
    "protocol": "http",   # For Typesense Cloud use https
}
typesense_client = typesense.Client(
    {
        "nodes": [node],
        "api_key": "<API_KEY>",
        "connection_timeout_seconds": 2,
    }
)
typesense_collection_name = "langchain-memory"
embedding = OpenAIEmbeddings()
vectorstore = Typesense(
    typesense_client,
    embedding,
    typesense_collection_name=typesense_collection_name,
    text_key="text",
)
```
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embedding and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
ids – Optional list of ids to associate with the texts.
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_client_params(embedding: langchain.embeddings.base.Embeddings, *, host: str = 'localhost', port: Union[str, int] = '8108', protocol: str = 'http', typesense_api_key: Optional[str] = None, connection_timeout_seconds: int = 2, **kwargs: Any) langchain.vectorstores.typesense.Typesense [source]#
Initialize Typesense directly from client parameters.
Example
```python
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Typesense

# Pass in typesense_api_key as kwarg or set env var "TYPESENSE_API_KEY".
vectorstore = Typesense.from_client_params(
    OpenAIEmbeddings(),
    host="localhost",
    port="8108",
    protocol="http",
    typesense_collection_name="langchain-memory",
)
```
- classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, typesense_client: Optional[Client] = None, typesense_client_params: Optional[dict] = None, typesense_collection_name: Optional[str] = None, text_key: str = 'text', **kwargs: Any) Typesense [source]#
Construct Typesense wrapper from raw text.
- similarity_search(query: str, k: int = 4, filter: Optional[str] = '', **kwargs: Any) List[langchain.schema.Document] [source]#
Return typesense documents most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
filter – typesense filter_by expression to filter documents on
- Returns
List of Documents most similar to the query.
- similarity_search_with_score(query: str, k: int = 4, filter: Optional[str] = '') List[Tuple[langchain.schema.Document, float]] [source]#
Return typesense documents most similar to query, along with scores.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
filter – typesense filter_by expression to filter documents on
- Returns
List of Documents most similar to the query and score for each
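The filter argument is passed through as a Typesense filter_by expression; a short hedged example (the category field is an assumption about the stored metadata):
```python
# Only return documents whose "category" metadata field equals "news".
docs = vectorstore.similarity_search("query", k=4, filter="category:=news")
```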
- class langchain.vectorstores.Vectara(vectara_customer_id: Optional[str] = None, vectara_corpus_id: Optional[str] = None, vectara_api_key: Optional[str] = None)[source]#
Implementation of Vector Store using Vectara (https://vectara.com).
Example
```python
from langchain.vectorstores import Vectara

vectorstore = Vectara(
    vectara_customer_id=vectara_customer_id,
    vectara_corpus_id=vectara_corpus_id,
    vectara_api_key=vectara_api_key,
)
```
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
- Returns
List of ids from adding the texts into the vectorstore.
- classmethod from_texts(texts: List[str], embedding: Optional[langchain.embeddings.base.Embeddings] = None, metadatas: Optional[List[dict]] = None, **kwargs: Any) langchain.vectorstores.vectara.Vectara [source]#
Construct Vectara wrapper from raw documents. This is intended to be a quick way to get started.
Example
```python
from langchain import Vectara

vectara = Vectara.from_texts(
    texts,
    vectara_customer_id=customer_id,
    vectara_corpus_id=corpus_id,
    vectara_api_key=api_key,
)
```
- similarity_search(query: str, k: int = 5, alpha: float = 0.025, filter: Optional[str] = None, **kwargs: Any) List[langchain.schema.Document] [source]#
Return Vectara documents most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 5.
alpha – parameter for hybrid search (called “lambda” in Vectara documentation).
filter – A filter string to apply on metadata. For example, a filter can be “doc.rating > 3.0 and part.lang = ‘deu’”; see https://docs.vectara.com/docs/search-apis/sql/filter-overview for more details.
- Returns
List of Documents most similar to the query
- similarity_search_with_score(query: str, k: int = 5, alpha: float = 0.025, filter: Optional[str] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Return Vectara documents most similar to query, along with scores.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 5.
alpha – parameter for hybrid search (called “lambda” in Vectara documentation).
filter – A filter string to apply on metadata. For example, a filter can be “doc.rating > 3.0 and part.lang = ‘deu’”; see https://docs.vectara.com/docs/search-apis/sql/filter-overview for more details.
- Returns
List of Documents most similar to the query and score for each.
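A hedged usage sketch combining the hybrid-search weight and a metadata filter, reusing the filter expression from the parameter description above:
```python
# alpha blends exact-match and semantic scoring; filter uses Vectara's
# SQL-like metadata syntax.
results = vectorstore.similarity_search_with_score(
    "query",
    k=5,
    alpha=0.025,
    filter="doc.rating > 3.0 and part.lang = 'deu'",
)
for doc, score in results:
    print(score, doc.page_content[:80])
```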
- class langchain.vectorstores.VectorStore[source]#
Interface for vector stores.
- async aadd_documents(documents: List[langchain.schema.Document], **kwargs: Any) List[str] [source]#
Run more documents through the embeddings and add to the vectorstore.
- Parameters
documents (List[Document]) – Documents to add to the vectorstore.
- Returns
List of IDs of the added texts.
- Return type
List[str]
- async aadd_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- add_documents(documents: List[langchain.schema.Document], **kwargs: Any) List[str] [source]#
Run more documents through the embeddings and add to the vectorstore.
- Parameters
documents (List[Document]) – Documents to add to the vectorstore.
- Returns
List of IDs of the added texts.
- Return type
List[str]
- abstract add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str] [source]#
Run more texts through the embeddings and add to the vectorstore.
- Parameters
texts – Iterable of strings to add to the vectorstore.
metadatas – Optional list of metadatas associated with the texts.
kwargs – vectorstore specific parameters
- Returns
List of ids from adding the texts into the vectorstore.
- async classmethod afrom_documents(documents: List[langchain.schema.Document], embedding: langchain.embeddings.base.Embeddings, **kwargs: Any) langchain.vectorstores.base.VST [source]#
Return VectorStore initialized from documents and embeddings.
- async classmethod afrom_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) langchain.vectorstores.base.VST [source]#
Return VectorStore initialized from texts and embeddings.
- async amax_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
- async amax_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
- async asearch(query: str, search_type: str, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query using specified search type.
- async asimilarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
- async asimilarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to embedding vector.
- async asimilarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs most similar to query, along with relevance scores.
- classmethod from_documents(documents: List[langchain.schema.Document], embedding: langchain.embeddings.base.Embeddings, **kwargs: Any) langchain.vectorstores.base.VST [source]#
Return VectorStore initialized from documents and embeddings.
- abstract classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) langchain.vectorstores.base.VST [source]#
Return VectorStore initialized from texts and embeddings.
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
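For reference, the selection rule both MMR methods describe can be sketched in a few lines; this is a simplified illustration using cosine similarity, not the library's exact implementation:
```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.5):
    """Pick k indices from doc_vecs balancing query similarity and diversity."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    sim_to_query = [cos(query_vec, d) for d in doc_vecs]
    selected = [int(np.argmax(sim_to_query))]  # start with the best match
    while len(selected) < min(k, len(doc_vecs)):
        best_score, best_idx = -np.inf, -1
        for i in range(len(doc_vecs)):
            if i in selected:
                continue
            # Penalize closeness to anything already selected.
            max_sim_selected = max(cos(doc_vecs[i], doc_vecs[j]) for j in selected)
            score = lambda_mult * sim_to_query[i] - (1 - lambda_mult) * max_sim_selected
            if score > best_score:
                best_score, best_idx = score, i
        selected.append(best_idx)
    return selected
```
With lambda_mult=1 this reduces to plain relevance ranking (minimum diversity); with lambda_mult=0 it maximizes dissimilarity to what has already been selected (maximum diversity), matching the parameter description above.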
- search(query: str, search_type: str, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query using specified search type.
- abstract similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
- similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to embedding vector.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query vector.
- similarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[langchain.schema.Document, float]] [source]#
Return docs and relevance scores in the range [0, 1].
0 is dissimilar, 1 is most similar.
- Parameters
query – input text
k – Number of Documents to return. Defaults to 4.
**kwargs –
kwargs to be passed to similarity search. Should include score_threshold, an optional floating point value between 0 and 1 used to filter the resulting set of retrieved docs.
- Returns
List of Tuples of (doc, similarity_score)
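To illustrate the interface, here is a minimal in-memory subclass; a sketch assuming brute-force cosine similarity, not a production store:
```python
from typing import Any, Iterable, List, Optional

import numpy as np

from langchain.embeddings.base import Embeddings
from langchain.schema import Document
from langchain.vectorstores.base import VectorStore


class InMemoryVectorStore(VectorStore):
    """Toy store: brute-force cosine similarity over a Python list."""

    def __init__(self, embedding: Embeddings) -> None:
        self._embedding = embedding
        self._vectors: List[List[float]] = []
        self._docs: List[Document] = []

    def add_texts(
        self,
        texts: Iterable[str],
        metadatas: Optional[List[dict]] = None,
        **kwargs: Any,
    ) -> List[str]:
        texts = list(texts)
        start = len(self._docs)
        self._vectors.extend(self._embedding.embed_documents(texts))
        for i, text in enumerate(texts):
            metadata = metadatas[i] if metadatas else {}
            self._docs.append(Document(page_content=text, metadata=metadata))
        return [str(i) for i in range(start, len(self._docs))]

    def similarity_search(
        self, query: str, k: int = 4, **kwargs: Any
    ) -> List[Document]:
        q = np.asarray(self._embedding.embed_query(query))
        scores = [
            float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
            for v in (np.asarray(vec) for vec in self._vectors)
        ]
        best = np.argsort(scores)[::-1][:k]
        return [self._docs[i] for i in best]

    @classmethod
    def from_texts(
        cls,
        texts: List[str],
        embedding: Embeddings,
        metadatas: Optional[List[dict]] = None,
        **kwargs: Any,
    ) -> "InMemoryVectorStore":
        store = cls(embedding)
        store.add_texts(texts, metadatas)
        return store
```
Only the three abstract members (add_texts, from_texts, similarity_search) must be implemented; the base class supplies the document-level and async wrappers around them.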
- class langchain.vectorstores.Weaviate(client: typing.Any, index_name: str, text_key: str, embedding: typing.Optional[langchain.embeddings.base.Embeddings] = None, attributes: typing.Optional[typing.List[str]] = None, relevance_score_fn: typing.Optional[typing.Callable[[float], float]] = <function _default_score_normalizer>, by_text: bool = True)[source]#
Wrapper around Weaviate vector database.
To use, you should have the weaviate-client python package installed.
Example
```python
import os

import weaviate

from langchain.vectorstores import Weaviate

client = weaviate.Client(url=os.environ["WEAVIATE_URL"], ...)
weaviate = Weaviate(client, index_name, text_key)
```
- add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str] [source]#
Upload texts with metadata (properties) to Weaviate.
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) langchain.vectorstores.weaviate.Weaviate [source]#
Construct Weaviate wrapper from raw documents.
- This is a user-friendly interface that:
1. Embeds documents.
2. Creates a new index for the embeddings in the Weaviate instance.
3. Adds the documents to the newly created Weaviate index.
This is intended to be a quick way to get started.
Example
```python
from langchain.vectorstores.weaviate import Weaviate
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
weaviate = Weaviate.from_texts(
    texts,
    embeddings,
    weaviate_url="http://localhost:8080",
)
```
- max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs selected using the maximal marginal relevance.
Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.
- Parameters
embedding – Embedding to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
fetch_k – Number of Documents to fetch to pass to MMR algorithm.
lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.
- Returns
List of Documents selected by maximal marginal relevance.
- similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query.
- similarity_search_by_text(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document] [source]#
Return docs most similar to query.
- Parameters
query – Text to look up documents similar to.
k – Number of Documents to return. Defaults to 4.
- Returns
List of Documents most similar to the query.
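The MMR methods above re-rank on raw vectors, so constructing the wrapper with an embedding is assumed to be required; the by_text=False setting (routing search through vectors rather than text) and the index name are likewise assumptions based on the constructor signature above:
```python
import os

import weaviate
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Weaviate

client = weaviate.Client(url=os.environ["WEAVIATE_URL"])
# An embedding is assumed to be needed so MMR can compare raw vectors.
store = Weaviate(
    client,
    index_name="LangChain",
    text_key="text",
    embedding=OpenAIEmbeddings(),
    by_text=False,
)
docs = store.max_marginal_relevance_search("query", k=4, fetch_k=20)
```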
- class langchain.vectorstores.Zilliz(embedding_function: langchain.embeddings.base.Embeddings, collection_name: str = 'LangChainCollection', connection_args: Optional[dict[str, Any]] = None, consistency_level: str = 'Session', index_params: Optional[dict] = None, search_params: Optional[dict] = None, drop_old: Optional[bool] = False)[source]#
- classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, collection_name: str = 'LangChainCollection', connection_args: dict[str, Any] = {}, consistency_level: str = 'Session', index_params: Optional[dict] = None, search_params: Optional[dict] = None, drop_old: bool = False, **kwargs: Any) langchain.vectorstores.zilliz.Zilliz [source]#
Create a Zilliz collection, index it with HNSW, and insert data.
- Parameters
texts (List[str]) – Text data.
embedding (Embeddings) – Embedding function.
metadatas (Optional[List[dict]]) – Metadata for each text if it exists. Defaults to None.
collection_name (str, optional) – Collection name to use. Defaults to “LangChainCollection”.
connection_args (dict[str, Any], optional) – Connection args to use. Defaults to DEFAULT_MILVUS_CONNECTION.
consistency_level (str, optional) – Which consistency level to use. Defaults to “Session”.
index_params (Optional[dict], optional) – Which index_params to use. Defaults to None.
search_params (Optional[dict], optional) – Which search params to use. Defaults to None.
drop_old (Optional[bool], optional) – Whether to drop the collection with that name if it exists. Defaults to False.
- Returns
Zilliz Vector Store
- Return type