Vector Stores#

Wrappers on top of vector stores.

class langchain.vectorstores.AnalyticDB(connection_string: str, embedding_function: langchain.embeddings.base.Embeddings, collection_name: str = 'langchain', collection_metadata: Optional[dict] = None, pre_delete_collection: bool = False, logger: Optional[logging.Logger] = None)[source]#

VectorStore implementation using AnalyticDB. AnalyticDB is a distributed, cloud-native database with full PostgreSQL syntax compatibility.

  • connection_string is a Postgres connection string.

  • embedding_function is any embedding function implementing the langchain.embeddings.base.Embeddings interface.

  • collection_name is the name of the collection to use. (default: langchain)
    • NOTE: This is not the name of the table, but the name of the collection. The tables will be created when initializing the store (if they do not already exist), so make sure the user has the right permissions to create tables.

  • pre_delete_collection if True, will delete the collection if it exists. (default: False) Useful for testing.
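A construction sketch; the DSN parts below (driver, host, port, database, credentials) are placeholders for your own instance, and OpenAIEmbeddings is just one possible Embeddings implementation:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import AnalyticDB

# Build a Postgres connection string from individual parameters.
# Every value here is a placeholder.
connection_string = AnalyticDB.connection_string_from_db_params(
    driver="psycopg2",
    host="localhost",
    port=5432,
    database="langchain",
    user="postgres",
    password="postgres",
)

store = AnalyticDB(
    connection_string=connection_string,
    embedding_function=OpenAIEmbeddings(),
    collection_name="langchain",
)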

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • kwargs – vectorstore-specific parameters.

Returns

List of ids from adding the texts into the vectorstore.

connect() sqlalchemy.engine.base.Connection[source]#
classmethod connection_string_from_db_params(driver: str, host: str, port: int, database: str, user: str, password: str) str[source]#

Return connection string from database parameters.

create_collection() None[source]#
create_tables_if_not_exists() None[source]#
delete_collection() None[source]#
drop_tables() None[source]#
classmethod from_documents(documents: List[langchain.schema.Document], embedding: langchain.embeddings.base.Embeddings, collection_name: str = 'langchain', ids: Optional[List[str]] = None, pre_delete_collection: bool = False, **kwargs: Any) langchain.vectorstores.analyticdb.AnalyticDB[source]#

Return VectorStore initialized from documents and embeddings. A Postgres connection string is required. Either pass it as a parameter or set the PGVECTOR_CONNECTION_STRING environment variable.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, collection_name: str = 'langchain', ids: Optional[List[str]] = None, pre_delete_collection: bool = False, **kwargs: Any) langchain.vectorstores.analyticdb.AnalyticDB[source]#

Return VectorStore initialized from texts and embeddings. A Postgres connection string is required. Either pass it as a parameter or set the PGVECTOR_CONNECTION_STRING environment variable.
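A minimal sketch, reusing connection_string from the construction example above (the sample text is illustrative):

db = AnalyticDB.from_texts(
    texts=["harrison worked at kensho"],
    embedding=OpenAIEmbeddings(),
    connection_string=connection_string,  # or set PGVECTOR_CONNECTION_STRING
)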

get_collection(session: sqlalchemy.orm.session.Session) Optional[langchain.vectorstores.analyticdb.CollectionStore][source]#
classmethod get_connection_string(kwargs: Dict[str, Any]) str[source]#

similarity_search(query: str, k: int = 4, filter: Optional[dict] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Run similarity search with AnalyticDB with distance.

Parameters
  • query (str) – Query text to search for.

  • k (int) – Number of results to return. Defaults to 4.

  • filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.

Returns

List of Documents most similar to the query.
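A query sketch against the store constructed above; the metadata key in filter is hypothetical:

docs = store.similarity_search(
    "what did harrison work on?",
    k=4,
    filter={"source": "notes"},  # hypothetical metadata key
)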

similarity_search_by_vector(embedding: List[float], k: int = 4, filter: Optional[dict] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to embedding vector.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.

Returns

List of Documents most similar to the query vector.

similarity_search_with_score(query: str, k: int = 4, filter: Optional[dict] = None) List[Tuple[langchain.schema.Document, float]][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.

Returns

List of Documents most similar to the query and score for each

similarity_search_with_score_by_vector(embedding: List[float], k: int = 4, filter: Optional[dict] = None) List[Tuple[langchain.schema.Document, float]][source]#
class langchain.vectorstores.Annoy(embedding_function: Callable, index: Any, metric: str, docstore: langchain.docstore.base.Docstore, index_to_docstore_id: Dict[int, str])[source]#

Wrapper around Annoy vector database.

To use, you should have the annoy python package installed.

Example

from langchain import Annoy
db = Annoy(embedding_function, index, metric, docstore, index_to_docstore_id)
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • kwargs – vectorstore-specific parameters.

Returns

List of ids from adding the texts into the vectorstore.

classmethod from_embeddings(text_embeddings: List[Tuple[str, List[float]]], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, metric: str = 'angular', trees: int = 100, n_jobs: int = -1, **kwargs: Any) langchain.vectorstores.annoy.Annoy[source]#

Construct Annoy wrapper from embeddings.

Parameters
  • text_embeddings – List of tuples of (text, embedding)

  • embedding – Embedding function to use.

  • metadatas – List of metadata dictionaries to associate with documents.

  • metric – Metric to use for indexing. Defaults to “angular”.

  • trees – Number of trees to use for indexing. Defaults to 100.

  • n_jobs – Number of jobs to use for indexing. Defaults to -1.

This is a user-friendly interface that:
  1. Creates an in-memory docstore with the provided embeddings

  2. Initializes the Annoy database

This is intended to be a quick way to get started.

Example

from langchain import Annoy
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
text_embeddings = embeddings.embed_documents(texts)
text_embedding_pairs = list(zip(texts, text_embeddings))
db = Annoy.from_embeddings(text_embedding_pairs, embeddings)
classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, metric: str = 'angular', trees: int = 100, n_jobs: int = -1, **kwargs: Any) langchain.vectorstores.annoy.Annoy[source]#

Construct Annoy wrapper from raw documents.

Parameters
  • texts – List of documents to index.

  • embedding – Embedding function to use.

  • metadatas – List of metadata dictionaries to associate with documents.

  • metric – Metric to use for indexing. Defaults to “angular”.

  • trees – Number of trees to use for indexing. Defaults to 100.

  • n_jobs – Number of jobs to use for indexing. Defaults to -1.

This is a user-friendly interface that:
  1. Embeds documents.

  2. Creates an in-memory docstore

  3. Initializes the Annoy database

This is intended to be a quick way to get started.

Example

from langchain import Annoy
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
index = Annoy.from_texts(texts, embeddings)
classmethod load_local(folder_path: str, embeddings: langchain.embeddings.base.Embeddings) langchain.vectorstores.annoy.Annoy[source]#

Load Annoy index, docstore, and index_to_docstore_id from disk.

Parameters
  • folder_path – folder path to load index, docstore, and index_to_docstore_id from.

  • embeddings – Embeddings to use when generating queries.

max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.
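A usage sketch over an Annoy store db (e.g. built with from_embeddings above); the query string is illustrative:

# Fetch 20 nearest candidates, then keep the 4 that best balance
# relevance and diversity (lambda_mult=0.5 weights them equally).
docs = db.max_marginal_relevance_search(
    "winter sports", k=4, fetch_k=20, lambda_mult=0.5
)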

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding – Embedding to look up documents similar to.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • k – Number of Documents to return. Defaults to 4.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

process_index_results(idxs: List[int], dists: List[float]) List[Tuple[langchain.schema.Document, float]][source]#

Turns Annoy results into a list of documents and scores.

Parameters
  • idxs – List of indices of the documents in the index.

  • dists – List of distances of the documents in the index.

Returns

List of Documents and scores.

save_local(folder_path: str, prefault: bool = False) None[source]#

Save Annoy index, docstore, and index_to_docstore_id to disk.

Parameters
  • folder_path – folder path to save index, docstore, and index_to_docstore_id to.

  • prefault – Whether to pre-load the index into memory.
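A round-trip sketch for an existing Annoy store db and the embeddings object used to build it; the folder name is illustrative:

db.save_local("my_annoy_index")  # writes the index, docstore, and id mapping
reloaded = Annoy.load_local("my_annoy_index", embeddings)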

similarity_search(query: str, k: int = 4, search_k: int = -1, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • search_k – inspect up to search_k nodes, which defaults to n_trees * n if not provided

Returns

List of Documents most similar to the query.

similarity_search_by_index(docstore_index: int, k: int = 4, search_k: int = -1, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to docstore_index.

Parameters
  • docstore_index – Index of document in docstore

  • k – Number of Documents to return. Defaults to 4.

  • search_k – inspect up to search_k nodes, which defaults to n_trees * n if not provided

Returns

List of Documents most similar to the document at the given index.

similarity_search_by_vector(embedding: List[float], k: int = 4, search_k: int = -1, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to embedding vector.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • search_k – inspect up to search_k nodes, which defaults to n_trees * n if not provided

Returns

List of Documents most similar to the embedding.

similarity_search_with_score(query: str, k: int = 4, search_k: int = -1) List[Tuple[langchain.schema.Document, float]][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • search_k – inspect up to search_k nodes, which defaults to n_trees * n if not provided

Returns

List of Documents most similar to the query, with a score for each

similarity_search_with_score_by_index(docstore_index: int, k: int = 4, search_k: int = -1) List[Tuple[langchain.schema.Document, float]][source]#

Return docs most similar to the document at docstore_index.

Parameters
  • docstore_index – Index of document in docstore

  • k – Number of Documents to return. Defaults to 4.

  • search_k – inspect up to search_k nodes, which defaults to n_trees * n if not provided

Returns

List of Documents most similar to the query, with a score for each

similarity_search_with_score_by_vector(embedding: List[float], k: int = 4, search_k: int = -1) List[Tuple[langchain.schema.Document, float]][source]#

Return docs most similar to the embedding vector.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • search_k – inspect up to search_k nodes, which defaults to n_trees * n if not provided

Returns

List of Documents most similar to the query, with a score for each

class langchain.vectorstores.AtlasDB(name: str, embedding_function: Optional[langchain.embeddings.base.Embeddings] = None, api_key: Optional[str] = None, description: str = 'A description for your project', is_public: bool = True, reset_project_if_exists: bool = False)[source]#

Wrapper around Atlas: Nomic’s neural database and rhizomatic instrument.

To use, you should have the nomic python package installed.

Example

from langchain.vectorstores import AtlasDB
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = AtlasDB("my_project", embeddings.embed_query)
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, refresh: bool = True, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts (Iterable[str]) – Texts to add to the vectorstore.

  • metadatas (Optional[List[dict]], optional) – Optional list of metadatas.

  • ids (Optional[List[str]]) – An optional list of ids.

  • refresh (bool) – Whether or not to refresh indices with the updated data. Default True.

Returns

List of IDs of the added texts.

Return type

List[str]

create_index(**kwargs: Any) Any[source]#

Creates an index in your project.

See https://docs.nomic.ai/atlas_api.html#nomic.project.AtlasProject.create_index for full detail.

classmethod from_documents(documents: List[langchain.schema.Document], embedding: Optional[langchain.embeddings.base.Embeddings] = None, ids: Optional[List[str]] = None, name: Optional[str] = None, api_key: Optional[str] = None, persist_directory: Optional[str] = None, description: str = 'A description for your project', is_public: bool = True, reset_project_if_exists: bool = False, index_kwargs: Optional[dict] = None, **kwargs: Any) langchain.vectorstores.atlas.AtlasDB[source]#

Create an AtlasDB vectorstore from a list of documents.

Parameters
  • name (str) – Name of the collection to create.

  • api_key (str) – Your nomic API key.

  • documents (List[Document]) – List of documents to add to the vectorstore.

  • embedding (Optional[Embeddings]) – Embedding function. Defaults to None.

  • ids (Optional[List[str]]) – Optional list of document IDs. If None, ids will be auto-created.

  • description (str) – A description for your project.

  • is_public (bool) – Whether your project is publicly accessible. True by default.

  • reset_project_if_exists (bool) – Whether to reset this project if it already exists. Default False. Generally useful during development and testing.

  • index_kwargs (Optional[dict]) – Dict of kwargs for index creation. See https://docs.nomic.ai/atlas_api.html

Returns

Nomic’s neural database and finest rhizomatic instrument

Return type

AtlasDB

classmethod from_texts(texts: List[str], embedding: Optional[langchain.embeddings.base.Embeddings] = None, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, name: Optional[str] = None, api_key: Optional[str] = None, description: str = 'A description for your project', is_public: bool = True, reset_project_if_exists: bool = False, index_kwargs: Optional[dict] = None, **kwargs: Any) langchain.vectorstores.atlas.AtlasDB[source]#

Create an AtlasDB vectorstore from raw documents.

Parameters
  • texts (List[str]) – The list of texts to ingest.

  • name (str) – Name of the project to create.

  • api_key (str) – Your nomic API key.

  • embedding (Optional[Embeddings]) – Embedding function. Defaults to None.

  • metadatas (Optional[List[dict]]) – List of metadatas. Defaults to None.

  • ids (Optional[List[str]]) – Optional list of document IDs. If None, ids will be auto-created.

  • description (str) – A description for your project.

  • is_public (bool) – Whether your project is publicly accessible. True by default.

  • reset_project_if_exists (bool) – Whether to reset this project if it already exists. Default False. Generally useful during development and testing.

  • index_kwargs (Optional[dict]) – Dict of kwargs for index creation. See https://docs.nomic.ai/atlas_api.html

Returns

Nomic’s neural database and finest rhizomatic instrument

Return type

AtlasDB

similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Run similarity search with AtlasDB.

Parameters
  • query (str) – Query text to search for.

  • k (int) – Number of results to return. Defaults to 4.

Returns

List of documents most similar to the query text.

Return type

List[Document]

class langchain.vectorstores.Chroma(collection_name: str = 'langchain', embedding_function: Optional[Embeddings] = None, persist_directory: Optional[str] = None, client_settings: Optional[chromadb.config.Settings] = None, collection_metadata: Optional[Dict] = None, client: Optional[chromadb.Client] = None)[source]#

Wrapper around ChromaDB embeddings platform.

To use, you should have the chromadb python package installed.

Example

from langchain.vectorstores import Chroma
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = Chroma("langchain_store", embeddings)
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts (Iterable[str]) – Texts to add to the vectorstore.

  • metadatas (Optional[List[dict]], optional) – Optional list of metadatas.

  • ids (Optional[List[str]], optional) – Optional list of IDs.

Returns

List of IDs of the added texts.

Return type

List[str]

delete_collection() None[source]#

Delete the collection.

classmethod from_documents(documents: List[Document], embedding: Optional[Embeddings] = None, ids: Optional[List[str]] = None, collection_name: str = 'langchain', persist_directory: Optional[str] = None, client_settings: Optional[chromadb.config.Settings] = None, client: Optional[chromadb.Client] = None, **kwargs: Any) Chroma[source]#

Create a Chroma vectorstore from a list of documents.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

Parameters
  • collection_name (str) – Name of the collection to create.

  • persist_directory (Optional[str]) – Directory to persist the collection.

  • ids (Optional[List[str]]) – List of document IDs. Defaults to None.

  • documents (List[Document]) – List of documents to add to the vectorstore.

  • embedding (Optional[Embeddings]) – Embedding function. Defaults to None.

  • client_settings (Optional[chromadb.config.Settings]) – Chroma client settings

Returns

Chroma vectorstore.

Return type

Chroma

classmethod from_texts(texts: List[str], embedding: Optional[Embeddings] = None, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, collection_name: str = 'langchain', persist_directory: Optional[str] = None, client_settings: Optional[chromadb.config.Settings] = None, client: Optional[chromadb.Client] = None, **kwargs: Any) Chroma[source]#

Create a Chroma vectorstore from raw documents.

If a persist_directory is specified, the collection will be persisted there. Otherwise, the data will be ephemeral in-memory.

Parameters
  • texts (List[str]) – List of texts to add to the collection.

  • collection_name (str) – Name of the collection to create.

  • persist_directory (Optional[str]) – Directory to persist the collection.

  • embedding (Optional[Embeddings]) – Embedding function. Defaults to None.

  • metadatas (Optional[List[dict]]) – List of metadatas. Defaults to None.

  • ids (Optional[List[str]]) – List of document IDs. Defaults to None.

  • client_settings (Optional[chromadb.config.Settings]) – Chroma client settings

Returns

Chroma vectorstore.

Return type

Chroma

get(include: Optional[List[str]] = None) Dict[str, Any][source]#

Gets the collection.

Parameters

include (Optional[List[str]]) – List of fields to include from db. Defaults to None.

max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, filter: Optional[Dict[str, str]] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.

Returns

List of Documents selected by maximal marginal relevance.

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, filter: Optional[Dict[str, str]] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.

Returns

List of Documents selected by maximal marginal relevance.

persist() None[source]#

Persist the collection.

This can be used to explicitly persist the data to disk. It will also be called automatically when the object is destroyed.
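A persistence sketch, assuming OpenAIEmbeddings and an illustrative directory:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

db = Chroma.from_texts(
    ["hello chroma"],
    OpenAIEmbeddings(),
    collection_name="langchain_store",
    persist_directory="./chroma_db",  # illustrative path
)
db.persist()  # flush the collection to disk explicitly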

similarity_search(query: str, k: int = 4, filter: Optional[Dict[str, str]] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Run similarity search with Chroma.

Parameters
  • query (str) – Query text to search for.

  • k (int) – Number of results to return. Defaults to 4.

  • filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.

Returns

List of documents most similar to the query text.

Return type

List[Document]

similarity_search_by_vector(embedding: List[float], k: int = 4, filter: Optional[Dict[str, str]] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to embedding vector.

Parameters
  • embedding (List[float]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.

Returns

List of Documents most similar to the query vector.

similarity_search_with_score(query: str, k: int = 4, filter: Optional[Dict[str, str]] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Run similarity search with Chroma with distance.

Parameters
  • query (str) – Query text to search for.

  • k (int) – Number of results to return. Defaults to 4.

  • filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.

Returns

List of documents most similar to the query text and cosine distance in float for each. Lower score represents more similarity.

Return type

List[Tuple[Document, float]]

update_document(document_id: str, document: langchain.schema.Document) None[source]#

Update a document in the collection.

Parameters
  • document_id (str) – ID of the document to update.

  • document (Document) – Document to update.

class langchain.vectorstores.Clickhouse(embedding: langchain.embeddings.base.Embeddings, config: Optional[langchain.vectorstores.clickhouse.ClickhouseSettings] = None, **kwargs: Any)[source]#

Wrapper around ClickHouse vector database.

You need the clickhouse-connect python package and a valid account to connect to ClickHouse.

ClickHouse can not only search with simple vector indexes; it also supports complex queries with multiple conditions, constraints, and even sub-queries.

For more information, please visit the ClickHouse official site: https://clickhouse.com/clickhouse
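A connection sketch; host, port, and table are placeholders for your own deployment:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Clickhouse, ClickhouseSettings

settings = ClickhouseSettings(host="localhost", port=8123, table="langchain")
store = Clickhouse(embedding=OpenAIEmbeddings(), config=settings)
store.add_texts(["hello clickhouse"])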

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, batch_size: int = 32, ids: Optional[Iterable[str]] = None, **kwargs: Any) List[str][source]#

Insert more texts through the embeddings and add to the VectorStore.

Parameters
  • texts – Iterable of strings to add to the VectorStore.

  • ids – Optional list of ids to associate with the texts.

  • batch_size – Batch size of insertion

  • metadatas – Optional column data to be inserted

Returns

List of ids from adding the texts into the VectorStore.

drop() None[source]#

Helper function: Drop data

escape_str(value: str) str[source]#
classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[Dict[Any, Any]]] = None, config: Optional[langchain.vectorstores.clickhouse.ClickhouseSettings] = None, text_ids: Optional[Iterable[str]] = None, batch_size: int = 32, **kwargs: Any) langchain.vectorstores.clickhouse.Clickhouse[source]#

Create ClickHouse wrapper with existing texts.

Parameters
  • texts (Iterable[str]) – List or tuple of strings to be added.

  • embedding (Embeddings) – Function to extract text embedding.

  • config (ClickhouseSettings, Optional) – ClickHouse configuration.

  • text_ids (Optional[Iterable], optional) – IDs for the texts. Defaults to None.

  • batch_size (int, optional) – Batch size when transmitting data to ClickHouse. Defaults to 32.

  • metadatas (List[dict], optional) – Metadata for the texts. Defaults to None.

  • Other keyword arguments will be passed into clickhouse-connect: https://clickhouse.com/docs/en/integrations/python#clickhouse-connect-driver-api

Returns

ClickHouse Index

property metadata_column: str#

similarity_search(query: str, k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Perform a similarity search with ClickHouse.

Parameters
  • query (str) – query string

  • k (int, optional) – Top K neighbors to retrieve. Defaults to 4.

  • where_str (Optional[str], optional) – where condition string. Defaults to None.

  • NOTE – Please do not let end users fill this, and always be aware of SQL injection. When dealing with metadatas, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for it is metadata.

Returns

List of Documents

Return type

List[Document]
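A filtered-search sketch against the store connected above. The metadata field in where_str is hypothetical, and per the NOTE the condition must be built from trusted values only, since it is interpolated into SQL:

docs = store.similarity_search(
    "release notes",
    k=4,
    where_str="metadata.source = 'docs'",  # hypothetical field; never user-supplied
)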

similarity_search_by_vector(embedding: List[float], k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Perform a similarity search with ClickHouse by vector.

Parameters
  • embedding (List[float]) – Embedding vector to look up documents similar to.

  • k (int, optional) – Top K neighbors to retrieve. Defaults to 4.

  • where_str (Optional[str], optional) – where condition string. Defaults to None.

  • NOTE – Please do not let end users fill this, and always be aware of SQL injection. When dealing with metadatas, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for it is metadata.

Returns

List of Documents

Return type

List[Document]

similarity_search_with_relevance_scores(query: str, k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Perform a similarity search with ClickHouse

Parameters
  • query (str) – query string

  • k (int, optional) – Top K neighbors to retrieve. Defaults to 4.

  • where_str (Optional[str], optional) – where condition string. Defaults to None.

  • NOTE – Please do not let end users fill this, and always be aware of SQL injection. When dealing with metadatas, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for it is metadata.

Returns

List of (Document, relevance score)

Return type

List[Tuple[Document, float]]

pydantic settings langchain.vectorstores.ClickhouseSettings[source]#

ClickHouse Client Configuration

Attributes:
  clickhouse_host (str) : A URL to connect to the ClickHouse backend. Defaults to 'localhost'.

  clickhouse_port (int) : URL port to connect with HTTP. Defaults to 8123.

  username (str) : Username to login. Defaults to None.

  password (str) : Password to login. Defaults to None.

  index_type (str) : Index type string. Defaults to 'annoy'.

  index_param (list) : Index build parameters.

  index_query_params (dict) : Index query parameters.

  database (str) : Database name to find the table. Defaults to 'default'.

  table (str) : Table name to operate on. Defaults to 'langchain'.

  metric (str) : Metric to compute distance; supported are 'angular', 'euclidean', 'manhattan', 'hamming', and 'dot'. Defaults to 'angular'. See https://github.com/spotify/annoy/blob/main/src/annoymodule.cc#L149-L169.

  column_map (Dict) : Column type map to project column names onto langchain semantics. Must have keys text, id, and vector, and must be the same size as the number of columns. For example:

      {
          'id': 'text_id',
          'uuid': 'global_unique_id',
          'embedding': 'text_embedding',
          'document': 'text_plain',
          'metadata': 'metadata_dictionary_in_json',
      }

      Defaults to the identity map.


Config
  • env_file: str = .env

  • env_file_encoding: str = utf-8

  • env_prefix: str = clickhouse_

Fields
  • column_map (Dict[str, str])

  • database (str)

  • host (str)

  • index_param (Optional[Union[List, Dict]])

  • index_query_params (Dict[str, str])

  • index_type (str)

  • metric (str)

  • password (Optional[str])

  • port (int)

  • table (str)

  • username (Optional[str])

field column_map: Dict[str, str] = {'document': 'document', 'embedding': 'embedding', 'id': 'id', 'metadata': 'metadata', 'uuid': 'uuid'}#
field database: str = 'default'#
field host: str = 'localhost'#
field index_param: Optional[Union[List, Dict]] = [100, "'L2Distance'"]#
field index_query_params: Dict[str, str] = {}#
field index_type: str = 'annoy'#
field metric: str = 'angular'#
field password: Optional[str] = None#
field port: int = 8123#
field table: str = 'langchain'#
field username: Optional[str] = None#
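A configuration sketch: with the clickhouse_ env_prefix above, any field may come from the environment or a .env file instead of code (the host is a placeholder):

from langchain.vectorstores import ClickhouseSettings

# e.g. export CLICKHOUSE_HOST=ch.example.com before running
settings = ClickhouseSettings()                  # reads environment / .env values
settings = ClickhouseSettings(table="my_table")  # or override fields in code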
class langchain.vectorstores.DeepLake(dataset_path: str = './deeplake/', token: Optional[str] = None, embedding_function: Optional[langchain.embeddings.base.Embeddings] = None, read_only: Optional[bool] = False, ingestion_batch_size: int = 1024, num_workers: int = 0, verbose: bool = True, **kwargs: Any)[source]#

Wrapper around Deep Lake, a data lake for deep learning applications.

We implement naive similarity search and filtering for fast prototyping, but it can be extended with Tensor Query Language (TQL) for production use cases over billions of rows.

Why Deep Lake?

  • Not only stores embeddings, but also the original data with version control.

  • Serverless; doesn’t require another service and can be used with major cloud providers (S3, GCS, etc.).

  • More than just a multi-modal vector store: you can use the dataset to fine-tune your own LLM models.

To use, you should have the deeplake python package installed.

Example

from langchain.vectorstores import DeepLake
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = DeepLake("langchain_store", embeddings.embed_query)
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts (Iterable[str]) – Texts to add to the vectorstore.

  • metadatas (Optional[List[dict]], optional) – Optional list of metadatas.

  • ids (Optional[List[str]], optional) – Optional list of IDs.

Returns

List of IDs of the added texts.

Return type

List[str]

delete(ids: Optional[List[str]] = None, filter: Optional[Dict[str, str]] = None, delete_all: Optional[bool] = None) bool[source]#

Delete the entities in the dataset.

Parameters
  • ids (Optional[List[str]], optional) – The document_ids to delete. Defaults to None.

  • filter (Optional[Dict[str, str]], optional) – The filter to delete by. Defaults to None.

  • delete_all (Optional[bool], optional) – Whether to drop the dataset. Defaults to None.

delete_dataset() None[source]#

Delete the collection.

classmethod force_delete_by_path(path: str) None[source]#

Force delete dataset by path

classmethod from_texts(texts: List[str], embedding: Optional[langchain.embeddings.base.Embeddings] = None, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, dataset_path: str = './deeplake/', **kwargs: Any) langchain.vectorstores.deeplake.DeepLake[source]#

Create a Deep Lake dataset from raw documents.

If a dataset_path is specified, the dataset will be persisted in that location; otherwise it defaults to ./deeplake.

Parameters
  • dataset_path (str, pathlib.Path) – The full path to the dataset. Can be:

    • Deep Lake cloud path of the form hub://username/dataset_name. To write to Deep Lake cloud datasets, ensure that you are logged in to Deep Lake (use ‘activeloop login’ from the command line).

    • AWS S3 path of the form s3://bucketname/path/to/dataset. Credentials are required in the environment.

    • Google Cloud Storage path of the form gcs://bucketname/path/to/dataset. Credentials are required in the environment.

    • Local file system path of the form ./path/to/dataset or ~/path/to/dataset or path/to/dataset.

    • In-memory path of the form mem://path/to/dataset, which doesn’t save the dataset but keeps it in memory instead. Should be used only for testing as it does not persist.

  • documents (List[Document]) – List of documents to add.

  • embedding (Optional[Embeddings]) – Embedding function. Defaults to None.

  • metadatas (Optional[List[dict]]) – List of metadatas. Defaults to None.

  • ids (Optional[List[str]]) – List of document IDs. Defaults to None.

Returns

Deep Lake dataset.

Return type

DeepLake
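A minimal sketch using the local default path; swap dataset_path for a hub://<org>/<dataset> placeholder path to persist to Deep Lake cloud:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DeepLake

db = DeepLake.from_texts(
    ["hello deep lake"],
    OpenAIEmbeddings(),
    dataset_path="./deeplake/",
)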

max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance. Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

persist() None[source]#

Persist the collection.

similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query.

Parameters
  • query – Text to embed and look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • embedding – Embedding function to use. Defaults to None.

  • distance_metric – L2 for Euclidean, L1 for Nuclear, max for L-infinity distance, cos for cosine similarity, dot for dot product. Defaults to L2.

  • filter – Attribute filter by metadata example {‘key’: ‘value’}. Defaults to None.

  • maximal_marginal_relevance – Whether to use maximal marginal relevance. Defaults to False.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm. Defaults to 20.

  • return_score – Whether to return the score. Defaults to False.

Returns

List of Documents most similar to the query vector.

similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to embedding vector.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query vector.

similarity_search_with_score(query: str, distance_metric: str = 'L2', k: int = 4, filter: Optional[Dict[str, str]] = None) List[Tuple[langchain.schema.Document, float]][source]#

Run similarity search with Deep Lake with distance returned.

Parameters
  • query (str) – Query text to search for.

  • distance_metric – L2 for Euclidean, L1 for Nuclear, max for L-infinity distance, cos for cosine similarity, dot for dot product. Defaults to L2.

  • k (int) – Number of results to return. Defaults to 4.

  • filter (Optional[Dict[str, str]]) – Filter by metadata. Defaults to None.

Returns

List of documents most similar to the query text, with distance in float.

Return type

List[Tuple[Document, float]]

class langchain.vectorstores.DocArrayHnswSearch(doc_index: BaseDocIndex, embedding: langchain.embeddings.base.Embeddings)[source]#

Wrapper around HnswLib storage.

To use it, you should have the docarray package with version >=0.32.0 installed. You can install it with pip install "langchain[docarray]".

classmethod from_params(embedding: langchain.embeddings.base.Embeddings, work_dir: str, n_dim: int, dist_metric: Literal['cosine', 'ip', 'l2'] = 'cosine', max_elements: int = 1024, index: bool = True, ef_construction: int = 200, ef: int = 10, M: int = 16, allow_replace_deleted: bool = True, num_threads: int = 1, **kwargs: Any) langchain.vectorstores.docarray.hnsw.DocArrayHnswSearch[source]#

Initialize DocArrayHnswSearch store.

Parameters
  • embedding (Embeddings) – Embedding function.

  • work_dir (str) – path to the location where all the data will be stored.

  • n_dim (int) – dimension of an embedding.

  • dist_metric (str) – Distance metric for DocArrayHnswSearch can be one of: “cosine”, “ip”, and “l2”. Defaults to “cosine”.

  • max_elements (int) – Maximum number of vectors that can be stored. Defaults to 1024.

  • index (bool) – Whether an index should be built for this field. Defaults to True.

  • ef_construction (int) – defines a construction time/accuracy trade-off. Defaults to 200.

  • ef (int) – parameter controlling query time/accuracy trade-off. Defaults to 10.

  • M (int) – parameter that defines the maximum number of outgoing connections in the graph. Defaults to 16.

  • allow_replace_deleted (bool) – Enables replacing of deleted elements with new added ones. Defaults to True.

  • num_threads (int) – Sets the number of cpu threads to use. Defaults to 1.

  • **kwargs – Other keyword arguments to be passed to the get_doc_cls method.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, work_dir: Optional[str] = None, n_dim: Optional[int] = None, **kwargs: Any) langchain.vectorstores.docarray.hnsw.DocArrayHnswSearch[source]#

Create a DocArrayHnswSearch store and insert data.

Parameters
  • texts (List[str]) – Text data.

  • embedding (Embeddings) – Embedding function.

  • metadatas (Optional[List[dict]]) – Metadata for each text if it exists. Defaults to None.

  • work_dir (str) – path to the location where all the data will be stored.

  • n_dim (int) – dimension of an embedding.

  • **kwargs – Other keyword arguments to be passed to the __init__ method.

Returns

DocArrayHnswSearch Vector Store
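A setup sketch; n_dim must match your embedding size (1536 for OpenAI’s text-embedding-ada-002), and work_dir is an illustrative path:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DocArrayHnswSearch

db = DocArrayHnswSearch.from_texts(
    ["hello hnsw"],
    OpenAIEmbeddings(),
    work_dir="./hnswlib_store",
    n_dim=1536,  # embedding dimensionality of the chosen model
)
docs = db.similarity_search("hello", k=1)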

class langchain.vectorstores.DocArrayInMemorySearch(doc_index: BaseDocIndex, embedding: langchain.embeddings.base.Embeddings)[source]#

Wrapper around in-memory storage for exact search.

To use it, you should have the docarray package with version >=0.32.0 installed. You can install it with pip install "langchain[docarray]".

classmethod from_params(embedding: langchain.embeddings.base.Embeddings, metric: Literal['cosine_sim', 'euclidean_dist', 'sqeuclidean_dist'] = 'cosine_sim', **kwargs: Any) langchain.vectorstores.docarray.in_memory.DocArrayInMemorySearch[source]#

Initialize DocArrayInMemorySearch store.

Parameters
  • embedding (Embeddings) – Embedding function.

  • metric (str) – metric for exact nearest-neighbor search. Can be one of: “cosine_sim”, “euclidean_dist” and “sqeuclidean_dist”. Defaults to “cosine_sim”.

  • **kwargs – Other keyword arguments to be passed to the get_doc_cls method.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[Dict[Any, Any]]] = None, **kwargs: Any) langchain.vectorstores.docarray.in_memory.DocArrayInMemorySearch[source]#

Create a DocArrayInMemorySearch store and insert data.

Parameters
  • texts (List[str]) – Text data.

  • embedding (Embeddings) – Embedding function.

  • metadatas (Optional[List[Dict[Any, Any]]]) – Metadata for each text if it exists. Defaults to None.

  • metric (str) – metric for exact nearest-neighbor search. Can be one of: “cosine_sim”, “euclidean_dist” and “sqeuclidean_dist”. Defaults to “cosine_sim”.

Returns

DocArrayInMemorySearch Vector Store
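An in-memory sketch; nothing is persisted between runs:

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

db = DocArrayInMemorySearch.from_texts(
    ["hello in-memory"], OpenAIEmbeddings()
)
docs = db.similarity_search("hello", k=1)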

class langchain.vectorstores.ElasticVectorSearch(elasticsearch_url: str, index_name: str, embedding: langchain.embeddings.base.Embeddings, *, ssl_verify: Optional[Dict[str, Any]] = None)[source]#

Wrapper around Elasticsearch as a vector database.

To connect to an Elasticsearch instance that does not require login credentials, pass the Elasticsearch URL and index name along with the embedding object to the constructor.

Example

from langchain import ElasticVectorSearch
from langchain.embeddings import OpenAIEmbeddings

embedding = OpenAIEmbeddings()
elastic_vector_search = ElasticVectorSearch(
    elasticsearch_url="http://localhost:9200",
    index_name="test_index",
    embedding=embedding
)

To connect to an Elasticsearch instance that requires login credentials, including Elastic Cloud, use the Elasticsearch URL format https://username:password@es_host:9243. For example, to connect to Elastic Cloud, create the Elasticsearch URL with the required authentication details and pass it to the ElasticVectorSearch constructor as the named parameter elasticsearch_url.

You can obtain your Elastic Cloud URL and login credentials by logging in to the Elastic Cloud console at https://cloud.elastic.co, selecting your deployment, and navigating to the “Deployments” page.

To obtain your Elastic Cloud password for the default “elastic” user:

  1. Log in to the Elastic Cloud console at https://cloud.elastic.co

  2. Go to “Security” > “Users”

  3. Locate the “elastic” user and click “Edit”

  4. Click “Reset password”

  5. Follow the prompts to reset the password

The format for Elastic Cloud URLs is https://username:password@cluster_id.region_id.gcp.cloud.es.io:9243.

Example

from langchain import ElasticVectorSearch
from langchain.embeddings import OpenAIEmbeddings

embedding = OpenAIEmbeddings()

elastic_host = "cluster_id.region_id.gcp.cloud.es.io"
elasticsearch_url = f"https://username:password@{elastic_host}:9243"
elastic_vector_search = ElasticVectorSearch(
    elasticsearch_url=elasticsearch_url,
    index_name="test_index",
    embedding=embedding
)
Parameters
  • elasticsearch_url (str) – The URL for the Elasticsearch instance.

  • index_name (str) – The name of the Elasticsearch index for the embeddings.

  • embedding (Embeddings) – An object that provides the ability to embed text. It should be an instance of a class that subclasses the Embeddings abstract base class, such as OpenAIEmbeddings()

Raises

ValueError – If the elasticsearch python package is not installed.

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, refresh_indices: bool = True, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • refresh_indices – bool to refresh Elasticsearch indices

Returns

List of ids from adding the texts into the vectorstore.

create_index(client: Any, index_name: str, mapping: Dict) None[source]#
classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, elasticsearch_url: Optional[str] = None, index_name: Optional[str] = None, refresh_indices: bool = True, **kwargs: Any) langchain.vectorstores.elastic_vector_search.ElasticVectorSearch[source]#

Construct ElasticVectorSearch wrapper from raw documents.

This is a user-friendly interface that:
  1. Embeds documents.

  2. Creates a new index for the embeddings in the Elasticsearch instance.

  3. Adds the documents to the newly created Elasticsearch index.

This is intended to be a quick way to get started.

Example

from langchain import ElasticVectorSearch
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
elastic_vector_search = ElasticVectorSearch.from_texts(
    texts,
    embeddings,
    elasticsearch_url="http://localhost:9200"
)

similarity_search(query: str, k: int = 4, filter: Optional[dict] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query.

similarity_search_with_score(query: str, k: int = 4, filter: Optional[dict] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query.

class langchain.vectorstores.FAISS(embedding_function: typing.Callable, index: typing.Any, docstore: langchain.docstore.base.Docstore, index_to_docstore_id: typing.Dict[int, str], relevance_score_fn: typing.Optional[typing.Callable[[float], float]] = <function _default_relevance_score_fn>, normalize_L2: bool = False)[source]#

Wrapper around FAISS vector database.

To use, you should have the faiss python package installed.

Example

from langchain import FAISS
faiss = FAISS(embedding_function, index, docstore, index_to_docstore_id)
add_embeddings(text_embeddings: Iterable[Tuple[str, List[float]]], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • text_embeddings – Iterable pairs of string and embedding to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • ids – Optional list of unique IDs.

Returns

List of ids from adding the texts into the vectorstore.

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • ids – Optional list of unique IDs.

Returns

List of ids from adding the texts into the vectorstore.

classmethod from_embeddings(text_embeddings: List[Tuple[str, List[float]]], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) langchain.vectorstores.faiss.FAISS[source]#

Construct FAISS wrapper from embeddings.

This is a user-friendly interface that:
  1. Creates an in-memory docstore with the provided embeddings

  2. Initializes the FAISS database

This is intended to be a quick way to get started.

Example

from langchain import FAISS
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
text_embeddings = embeddings.embed_documents(texts)
text_embedding_pairs = list(zip(texts, text_embeddings))
faiss = FAISS.from_embeddings(text_embedding_pairs, embeddings)
classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) langchain.vectorstores.faiss.FAISS[source]#

Construct FAISS wrapper from raw documents.

This is a user-friendly interface that:
  1. Embeds documents.

  2. Creates an in-memory docstore

  3. Initializes the FAISS database

This is intended to be a quick way to get started.

Example

from langchain import FAISS
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
faiss = FAISS.from_texts(texts, embeddings)
classmethod load_local(folder_path: str, embeddings: langchain.embeddings.base.Embeddings, index_name: str = 'index') langchain.vectorstores.faiss.FAISS[source]#

Load FAISS index, docstore, and index_to_docstore_id from disk.

Parameters
  • folder_path – folder path to load index, docstore, and index_to_docstore_id from.

  • embeddings – Embeddings to use when generating queries.

  • index_name – for loading an index saved with a specific index file name

max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

merge_from(target: langchain.vectorstores.faiss.FAISS) None[source]#

Merge another FAISS object with the current one.

Add the target FAISS to the current one.

Parameters

target – FAISS object you wish to merge into the current one

Returns

None.
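A merge sketch; both stores should be built with the same embeddings object so their vectors are comparable:

db1 = FAISS.from_texts(["foo"], embeddings)
db2 = FAISS.from_texts(["bar"], embeddings)
db1.merge_from(db2)  # db1 now contains documents from both stores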

save_local(folder_path: str, index_name: str = 'index') None[source]#

Save FAISS index, docstore, and index_to_docstore_id to disk.

Parameters
  • folder_path – folder path to save index, docstore, and index_to_docstore_id to.

  • index_name – for saving with a specific index file name
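A round-trip sketch for an existing FAISS store db, using a custom index file name:

db.save_local("faiss_index", index_name="my_index")
reloaded = FAISS.load_local("faiss_index", embeddings, index_name="my_index")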

similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query.

similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to embedding vector.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the embedding.

similarity_search_with_score(query: str, k: int = 4) List[Tuple[langchain.schema.Document, float]][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of documents most similar to the query text with L2 distance in float. Lower score represents more similarity.

similarity_search_with_score_by_vector(embedding: List[float], k: int = 4) List[Tuple[langchain.schema.Document, float]][source]#

Return docs most similar to query.

Parameters
  • embedding – Embedding vector to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of documents most similar to the query text and L2 distance in float for each. Lower score represents more similarity.

class langchain.vectorstores.LanceDB(connection: Any, embedding: langchain.embeddings.base.Embeddings, vector_key: Optional[str] = 'vector', id_key: Optional[str] = 'id', text_key: Optional[str] = 'text')[source]#

Wrapper around LanceDB vector database.

To use, you should have lancedb python package installed.

Example

import lancedb

db = lancedb.connect('./lancedb')
table = db.open_table('my_table')
vectorstore = LanceDB(table, embedding_function)
vectorstore.add_texts(['text1', 'text2'])
result = vectorstore.similarity_search('text1')
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str][source]#

Turn texts into embeddings and add them to the database.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • ids – Optional list of ids to associate with the texts.

Returns

List of ids of the added texts.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, connection: Any = None, vector_key: Optional[str] = 'vector', id_key: Optional[str] = 'id', text_key: Optional[str] = 'text', **kwargs: Any) langchain.vectorstores.lancedb.LanceDB[source]#

Return VectorStore initialized from texts and embeddings.

Return documents most similar to the query.

Parameters
  • query – String to query the vectorstore with.

  • k – Number of documents to return.

Returns

List of documents most similar to the query.

class langchain.vectorstores.MatchingEngine(project_id: str, index: MatchingEngineIndex, endpoint: MatchingEngineIndexEndpoint, embedding: Embeddings, gcs_client: storage.Client, gcs_bucket_name: str, credentials: Optional[Credentials] = None)[source]#

Vertex Matching Engine implementation of the vector store.

While the embeddings are stored in the Matching Engine, the embedded documents will be stored in GCS.

An existing Index and corresponding Endpoint are preconditions for using this module.

See usage in docs/modules/indexes/vectorstores/examples/matchingengine.ipynb

Note that this implementation is mostly suited to reading if you are planning a real-time application: while reading is a real-time operation, updating the index takes close to one hour.

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • kwargs – vectorstore specific parameters.

Returns

List of ids from adding the texts into the vectorstore.

classmethod from_components(project_id: str, region: str, gcs_bucket_name: str, index_id: str, endpoint_id: str, credentials_path: Optional[str] = None, embedding: Optional[langchain.embeddings.base.Embeddings] = None) langchain.vectorstores.matching_engine.MatchingEngine[source]#

Takes the object creation out of the constructor.

Parameters
  • project_id – The GCP project id.

  • region – The default location for making the API calls. It must be the same location as the GCS bucket and must be regional.

  • gcs_bucket_name – The GCS bucket where the vectors will be stored in order for the index to be created.

  • index_id – The id of the created index.

  • endpoint_id – The id of the created endpoint.

  • credentials_path – (Optional) The path of the Google credentials on the local file system.

  • embedding – The Embeddings that will be used for embedding the texts.

Returns

A configured MatchingEngine with the texts added to the index.
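A construction sketch; all ids and names below are placeholders for your own resources:

engine = MatchingEngine.from_components(
    project_id="my-project",          # placeholder
    region="us-central1",             # placeholder
    gcs_bucket_name="my-bucket",      # placeholder
    index_id="1234567890",            # placeholder
    endpoint_id="0987654321",         # placeholder
    embedding=embeddings,
)
docs = engine.similarity_search("query", k=4)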

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) langchain.vectorstores.matching_engine.MatchingEngine[source]#

Use from_components instead.

Return docs most similar to query.

Parameters
  • query – The string that will be used to search for similar documents.

  • k – The amount of neighbors that will be retrieved.

Returns

A list of k matching documents.

class langchain.vectorstores.Milvus(embedding_function: langchain.embeddings.base.Embeddings, collection_name: str = 'LangChainCollection', connection_args: Optional[dict[str, Any]] = None, consistency_level: str = 'Session', index_params: Optional[dict] = None, search_params: Optional[dict] = None, drop_old: Optional[bool] = False)[source]#

Wrapper around the Milvus vector database.

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, timeout: Optional[int] = None, batch_size: int = 1000, **kwargs: Any) List[str][source]#

Insert text data into Milvus.

Inserting data when the collection has not been made yet will result in creating a new collection. The data of the first entity decides the schema of the new collection: the dim is extracted from the first embedding, and the columns are decided by the first metadata dict. Metadata keys will need to be present for all inserted values, as at the moment there is no None equivalent in Milvus.

Parameters
  • texts (Iterable[str]) – The texts to embed, it is assumed that they all fit in memory.

  • metadatas (Optional[List[dict]]) – Metadata dicts attached to each of the texts. Defaults to None.

  • timeout (Optional[int]) – Timeout for each batch insert. Defaults to None.

  • batch_size (int, optional) – Batch size to use for insertion. Defaults to 1000.

Raises

MilvusException – Failure to add texts

Returns

The resulting keys for each inserted element.

Return type

List[str]

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, collection_name: str = 'LangChainCollection', connection_args: dict[str, Any] = {'host': 'localhost', 'password': '', 'port': '19530', 'secure': False, 'user': ''}, consistency_level: str = 'Session', index_params: Optional[dict] = None, search_params: Optional[dict] = None, drop_old: bool = False, **kwargs: Any) langchain.vectorstores.milvus.Milvus[source]#

Create a Milvus collection, index it with HNSW, and insert data.

Parameters
  • texts (List[str]) – Text data.

  • embedding (Embeddings) – Embedding function.

  • metadatas (Optional[List[dict]]) – Metadata for each text if it exists. Defaults to None.

  • collection_name (str, optional) – Collection name to use. Defaults to “LangChainCollection”.

  • connection_args (dict[str, Any], optional) – Connection args to use. Defaults to DEFAULT_MILVUS_CONNECTION.

  • consistency_level (str, optional) – Which consistency level to use. Defaults to “Session”.

  • index_params (Optional[dict], optional) – Which index_params to use. Defaults to None.

  • search_params (Optional[dict], optional) – Which search params to use. Defaults to None.

  • drop_old (Optional[bool], optional) – Whether to drop the collection with that name if it exists. Defaults to False.

Returns

Milvus Vector Store

Return type

Milvus
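A minimal sketch, assuming a Milvus server reachable on localhost:19530:

from langchain.vectorstores import Milvus
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vector_store = Milvus.from_texts(
    texts,
    embeddings,
    connection_args={"host": "localhost", "port": "19530"},
)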

Perform a search and return results that are reordered by MMR.

Parameters
  • query (str) – The text being searched.

  • k (int, optional) – How many results to give. Defaults to 4.

  • fetch_k (int, optional) – Total results to select k from. Defaults to 20.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • param (dict, optional) – The search params for the specified index. Defaults to None.

  • expr (str, optional) – Filtering expression. Defaults to None.

  • timeout (int, optional) – How long to wait before timeout error. Defaults to None.

  • kwargs – Collection.search() keyword arguments.

Returns

Document results for search.

Return type

List[Document]

max_marginal_relevance_search_by_vector(embedding: list[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Perform a search and return results that are reordered by MMR.

Parameters
  • embedding (str) – The embedding vector being searched.

  • k (int, optional) – How many results to give. Defaults to 4.

  • fetch_k (int, optional) – Total results to select k from. Defaults to 20.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • param (dict, optional) – The search params for the specified index. Defaults to None.

  • expr (str, optional) – Filtering expression. Defaults to None.

  • timeout (int, optional) – How long to wait before timeout error. Defaults to None.

  • kwargs – Collection.search() keyword arguments.

Returns

Document results for search.

Return type

List[Document]

Perform a similarity search against the query string.

Parameters
  • query (str) – The text to search.

  • k (int, optional) – How many results to return. Defaults to 4.

  • param (dict, optional) – The search params for the index type. Defaults to None.

  • expr (str, optional) – Filtering expression. Defaults to None.

  • timeout (int, optional) – How long to wait before timeout error. Defaults to None.

  • kwargs – Collection.search() keyword arguments.

Returns

Document results for search.

Return type

List[Document]
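A filtered-search sketch; the expr string uses Milvus boolean expressions, and the field name is an assumed metadata key:

docs = vector_store.similarity_search(
    "query",
    k=4,
    expr='source == "docs"',  # assumed metadata field
)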

similarity_search_by_vector(embedding: List[float], k: int = 4, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Perform a similarity search against the query string.

Parameters
  • embedding (List[float]) – The embedding vector to search.

  • k (int, optional) – How many results to return. Defaults to 4.

  • param (dict, optional) – The search params for the index type. Defaults to None.

  • expr (str, optional) – Filtering expression. Defaults to None.

  • timeout (int, optional) – How long to wait before timeout error. Defaults to None.

  • kwargs – Collection.search() keyword arguments.

Returns

Document results for search.

Return type

List[Document]

similarity_search_with_score(query: str, k: int = 4, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Perform a search on a query string and return results with score.

For more information about the search parameters, take a look at the pymilvus documentation found here: https://milvus.io/api-reference/pymilvus/v2.2.6/Collection/search().md

Parameters
  • query (str) – The text being searched.

  • k (int, optional) – The number of results to return. Defaults to 4.

  • param (dict) – The search params for the specified index. Defaults to None.

  • expr (str, optional) – Filtering expression. Defaults to None.

  • timeout (int, optional) – How long to wait before timeout error. Defaults to None.

  • kwargs – Collection.search() keyword arguments.

Return type

List[Tuple[Document, float]]

similarity_search_with_score_by_vector(embedding: List[float], k: int = 4, param: Optional[dict] = None, expr: Optional[str] = None, timeout: Optional[int] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Perform a search on an embedding vector and return results with score.

For more information about the search parameters, take a look at the pymilvus documentation found here: https://milvus.io/api-reference/pymilvus/v2.2.6/Collection/search().md

Parameters
  • embedding (List[float]) – The embedding vector being searched.

  • k (int, optional) – The number of results to return. Defaults to 4.

  • param (dict) – The search params for the specified index. Defaults to None.

  • expr (str, optional) – Filtering expression. Defaults to None.

  • timeout (int, optional) – How long to wait before timeout error. Defaults to None.

  • kwargs – Collection.search() keyword arguments.

Returns

Result doc and score.

Return type

List[Tuple[Document, float]]

class langchain.vectorstores.MongoDBAtlasVectorSearch(collection: Collection[MongoDBDocumentType], embedding: Embeddings, *, index_name: str = 'default', text_key: str = 'text', embedding_key: str = 'embedding')[source]#

Wrapper around MongoDB Atlas Vector Search.

To use, you should have both:
  • the pymongo python package installed

  • a connection string associated with a MongoDB Atlas Cluster that has an Atlas Search index deployed

Example

from langchain.vectorstores import MongoDBAtlasVectorSearch
from langchain.embeddings.openai import OpenAIEmbeddings
from pymongo import MongoClient

mongo_client = MongoClient("<YOUR-CONNECTION-STRING>")
collection = mongo_client["<db_name>"]["<collection_name>"]
embeddings = OpenAIEmbeddings()
vectorstore = MongoDBAtlasVectorSearch(collection, embeddings)
add_texts(texts: Iterable[str], metadatas: Optional[List[Dict[str, Any]]] = None, **kwargs: Any) List[source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

Returns

List of ids from adding the texts into the vectorstore.

classmethod from_connection_string(connection_string: str, namespace: str, embedding: langchain.embeddings.base.Embeddings, **kwargs: Any) langchain.vectorstores.mongodb_atlas.MongoDBAtlasVectorSearch[source]#
classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, collection: Optional[Collection[MongoDBDocumentType]] = None, **kwargs: Any) MongoDBAtlasVectorSearch[source]#

Construct MongoDBAtlasVectorSearch wrapper from raw documents.

This is a user-friendly interface that:
  1. Embeds documents.

  2. Adds the documents to a provided MongoDB Atlas Vector Search index

    (Lucene)

This is intended to be a quick way to get started.

Example
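A minimal sketch, assuming an existing Atlas collection and a deployed Atlas Search index (names in angle brackets are placeholders):

from pymongo import MongoClient
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import MongoDBAtlasVectorSearch

client = MongoClient("<YOUR-CONNECTION-STRING>")
collection = client["<db_name>"]["<collection_name>"]
embeddings = OpenAIEmbeddings()
vectorstore = MongoDBAtlasVectorSearch.from_texts(
    texts,
    embeddings,
    collection=collection,
)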

Return MongoDB documents most similar to query.

Uses the knnBeta Operator available in MongoDB Atlas Search. This feature is in early access and available only for evaluation purposes, to validate functionality, and to gather feedback from a small closed group of early access users. It is not recommended for production deployments, as breaking changes may be introduced. For more information: https://www.mongodb.com/docs/atlas/atlas-search/knn-beta

Parameters
  • query – Text to look up documents similar to.

  • k – Optional Number of Documents to return. Defaults to 4.

  • pre_filter – Optional Dictionary of argument(s) to prefilter on document fields.

  • post_filter_pipeline – Optional Pipeline of MongoDB aggregation stages following the knnBeta search.

Returns

List of Documents most similar to the query and score for each

similarity_search_with_score(query: str, *, k: int = 4, pre_filter: Optional[dict] = None, post_filter_pipeline: Optional[List[Dict]] = None) List[Tuple[langchain.schema.Document, float]][source]#

Return MongoDB documents most similar to query, along with scores.

Uses the knnBeta Operator available in MongoDB Atlas Search. This feature is in early access and available only for evaluation purposes, to validate functionality, and to gather feedback from a small closed group of early access users. It is not recommended for production deployments, as breaking changes may be introduced. For more information: https://www.mongodb.com/docs/atlas/atlas-search/knn-beta

Parameters
  • query – Text to look up documents similar to.

  • k – Optional Number of Documents to return. Defaults to 4.

  • pre_filter – Optional Dictionary of argument(s) to prefilter on document fields.

  • post_filter_pipeline – Optional Pipeline of MongoDB aggregation stages following the knnBeta search.

Returns

List of Documents most similar to the query and score for each
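A usage sketch; the post-filter stage shown is an ordinary MongoDB aggregation stage and only an illustration:

results = vectorstore.similarity_search_with_score(
    "query",
    k=4,
    post_filter_pipeline=[{"$limit": 2}],  # illustrative aggregation stage
)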

class langchain.vectorstores.MyScale(embedding: langchain.embeddings.base.Embeddings, config: Optional[langchain.vectorstores.myscale.MyScaleSettings] = None, **kwargs: Any)[source]#

Wrapper around the MyScale vector database.

To use, you need the clickhouse-connect python package and a valid account to connect to MyScale.

MyScale can not only search with simple vector indexes; it also supports complex queries with multiple conditions, constraints, and even sub-queries.

For more information, please visit the MyScale official site: https://docs.myscale.com/en/overview/

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, batch_size: int = 32, ids: Optional[Iterable[str]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • ids – Optional list of ids to associate with the texts.

  • batch_size – Batch size of insertion

  • metadata – Optional column data to be inserted

Returns

List of ids from adding the texts into the vectorstore.

drop() None[source]#

Helper function: Drop data

escape_str(value: str) str[source]#
classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[Dict[Any, Any]]] = None, config: Optional[langchain.vectorstores.myscale.MyScaleSettings] = None, text_ids: Optional[Iterable[str]] = None, batch_size: int = 32, **kwargs: Any) langchain.vectorstores.myscale.MyScale[source]#

Create MyScale wrapper with existing texts.

Parameters
  • embedding (Embeddings) – Function to generate text embeddings

  • texts (Iterable[str]) – List or tuple of strings to be added

  • config (MyScaleSettings, Optional) – Myscale configuration

  • text_ids (Optional[Iterable], optional) – IDs for the texts. Defaults to None.

  • batch_size (int, optional) – Batch size when transmitting data to MyScale. Defaults to 32.

  • metadata (List[dict], optional) – metadata to texts. Defaults to None.

  • Other keyword arguments will be passed into clickhouse-connect (https://clickhouse.com/docs/en/integrations/python#clickhouse-connect-driver-api).

Returns

MyScale Index
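A minimal sketch, assuming credentials supplied through MyScaleSettings (the host value is a placeholder):

from langchain.vectorstores import MyScale, MyScaleSettings
from langchain.embeddings.openai import OpenAIEmbeddings

config = MyScaleSettings(host="your-myscale-host", port=8443)  # placeholder host
embeddings = OpenAIEmbeddings()
vector_store = MyScale.from_texts(texts, embeddings, config=config)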

property metadata_column: str#

Perform a similarity search with MyScale

Parameters
  • query (str) – query string

  • k (int, optional) – Top K neighbors to retrieve. Defaults to 4.

  • where_str (Optional[str], optional) – where condition string. Defaults to None.

  • NOTE – Please do not let end users fill this, and always be aware of SQL injection. When dealing with metadata, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for it is metadata.

Returns

List of Documents

Return type

List[Document]
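A filtered-query sketch; where_str is raw SQL, and the attribute name is an assumption about your metadata:

docs = vector_store.similarity_search(
    "query",
    k=4,
    where_str=f"{vector_store.metadata_column}.source = 'docs'",  # assumed attribute
)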

similarity_search_by_vector(embedding: List[float], k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Perform a similarity search with MyScale by vector.

Parameters
  • embedding (List[float]) – The embedding vector to search with.

  • k (int, optional) – Top K neighbors to retrieve. Defaults to 4.

  • where_str (Optional[str], optional) – where condition string. Defaults to None.

  • NOTE – Please do not let end users fill this, and always be aware of SQL injection. When dealing with metadata, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for it is metadata.

Returns

List of Documents most similar to the query vector.

Return type

List[Document]

similarity_search_with_relevance_scores(query: str, k: int = 4, where_str: Optional[str] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Perform a similarity search with MyScale

Parameters
  • query (str) – query string

  • k (int, optional) – Top K neighbors to retrieve. Defaults to 4.

  • where_str (Optional[str], optional) – where condition string. Defaults to None.

  • NOTE – Please do not let end users fill this, and always be aware of SQL injection. When dealing with metadata, remember to use {self.metadata_column}.attribute instead of attribute alone. The default name for it is metadata.

Returns

List of documents most similar to the query text and cosine distance in float for each. Lower score represents more similarity.

Return type

List[Tuple[Document, float]]

pydantic settings langchain.vectorstores.MyScaleSettings[source]#

MyScale Client Configuration

Attributes:
  myscale_host (str) : A URL to connect to the MyScale backend. Defaults to 'localhost'.

  myscale_port (int) : URL port to connect with HTTP. Defaults to 8443.

  username (str) : Username to login. Defaults to None.

  password (str) : Password to login. Defaults to None.

  index_type (str) : Index type string.

  index_param (dict) : Index build parameter.

  database (str) : Database name to find the table. Defaults to 'default'.

  table (str) : Table name to operate on. Defaults to 'vector_table'.

  metric (str) : Metric to compute distance; supported are ('l2', 'cosine', 'ip'). Defaults to 'cosine'.

  column_map (Dict) : Column type map to project column names onto langchain semantics. Must have keys text, id, and vector, and must be the same size as the number of columns. For example:

    {
        'id': 'text_id',
        'vector': 'text_embedding',
        'text': 'text_plain',
        'metadata': 'metadata_dictionary_in_json',
    }

    Defaults to identity map.


Config
  • env_file: str = .env

  • env_file_encoding: str = utf-8

  • env_prefix: str = myscale_

Fields
  • column_map (Dict[str, str])

  • database (str)

  • host (str)

  • index_param (Optional[Dict[str, str]])

  • index_type (str)

  • metric (str)

  • password (Optional[str])

  • port (int)

  • table (str)

  • username (Optional[str])

field column_map: Dict[str, str] = {'id': 'id', 'metadata': 'metadata', 'text': 'text', 'vector': 'vector'}#
field database: str = 'default'#
field host: str = 'localhost'#
field index_param: Optional[Dict[str, str]] = None#
field index_type: str = 'IVFFLAT'#
field metric: str = 'cosine'#
field password: Optional[str] = None#
field port: int = 8443#
field table: str = 'langchain'#
field username: Optional[str] = None#
class langchain.vectorstores.OpenSearchVectorSearch(opensearch_url: str, index_name: str, embedding_function: langchain.embeddings.base.Embeddings, **kwargs: Any)[source]#

Wrapper around OpenSearch as a vector database.

Example

from langchain import OpenSearchVectorSearch
opensearch_vector_search = OpenSearchVectorSearch(
    "http://localhost:9200",
    "embeddings",
    embedding_function
)
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, bulk_size: int = 500, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • bulk_size – Bulk API request count; Default: 500

Returns

List of ids from adding the texts into the vectorstore.

Optional Args:

vector_field: Document field embeddings are stored in. Defaults to “vector_field”.

text_field: Document field the text of the document is stored in. Defaults to “text”.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, bulk_size: int = 500, **kwargs: Any) langchain.vectorstores.opensearch_vector_search.OpenSearchVectorSearch[source]#

Construct OpenSearchVectorSearch wrapper from raw documents.

Example

from langchain import OpenSearchVectorSearch
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
opensearch_vector_search = OpenSearchVectorSearch.from_texts(
    texts,
    embeddings,
    opensearch_url="http://localhost:9200"
)

By default, OpenSearch supports Approximate Search powered by the nmslib, faiss, and lucene engines, which is recommended for large datasets. It also supports brute-force search through Script Scoring and Painless Scripting.

Optional Args:

vector_field: Document field embeddings are stored in. Defaults to “vector_field”.

text_field: Document field the text of the document is stored in. Defaults to “text”.

Optional Keyword Args for Approximate Search:

engine: “nmslib”, “faiss”, “lucene”; default: “nmslib”

space_type: “l2”, “l1”, “cosinesimil”, “linf”, “innerproduct”; default: “l2”

ef_search: Size of the dynamic list used during k-NN searches. Higher values lead to more accurate but slower searches; default: 512

ef_construction: Size of the dynamic list used during k-NN graph creation. Higher values lead to more accurate graph but slower indexing speed; default: 512

m: Number of bidirectional links created for each new element. Large impact on memory consumption. Between 2 and 100; default: 16

Keyword Args for Script Scoring or Painless Scripting:

is_appx_search: False
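A construction sketch passing approximate-search kwargs (the values here mirror the defaults listed above):

opensearch_vector_search = OpenSearchVectorSearch.from_texts(
    texts,
    embeddings,
    opensearch_url="http://localhost:9200",
    engine="nmslib",
    space_type="l2",
    ef_construction=512,
    m=16,
)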

Return docs most similar to query.

By default supports Approximate Search. Also supports Script Scoring and Painless Scripting.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query.

Optional Args:

vector_field: Document field embeddings are stored in. Defaults to “vector_field”.

text_field: Document field the text of the document is stored in. Defaults to “text”.

metadata_field: Document field that metadata is stored in. Defaults to “metadata”. Can be set to a special value “*” to include the entire document.

Optional Args for Approximate Search:

search_type: “approximate_search”; default: “approximate_search”

boolean_filter: A Boolean filter consists of a Boolean query that contains a k-NN query and a filter.

subquery_clause: Query clause on the knn vector field; default: “must”

lucene_filter: the Lucene algorithm decides whether to perform an exact k-NN search with pre-filtering or an approximate search with modified post-filtering.

Optional Args for Script Scoring Search:

search_type: “script_scoring”; default: “approximate_search”

space_type: “l2”, “l1”, “linf”, “cosinesimil”, “innerproduct”, “hammingbit”; default: “l2”

pre_filter: script_score query to pre-filter documents before identifying nearest neighbors; default: {“match_all”: {}}

Optional Args for Painless Scripting Search:

search_type: “painless_scripting”; default: “approximate_search”

space_type: “l2Squared”, “l1Norm”, “cosineSimilarity”; default: “l2Squared”

pre_filter: script_score query to pre-filter documents before identifying nearest neighbors; default: {“match_all”: {}}
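A script-scoring sketch, combining the search_type, space_type, and pre_filter options listed above:

docs = opensearch_vector_search.similarity_search(
    "query",
    k=4,
    search_type="script_scoring",
    space_type="cosinesimil",
    pre_filter={"match_all": {}},
)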

similarity_search_with_score(query: str, k: int = 4, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Return docs and their scores most similar to the query.

By default supports Approximate Search. Also supports Script Scoring and Painless Scripting.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents, along with their scores, most similar to the query.

Optional Args:

same as similarity_search

class langchain.vectorstores.Pinecone(index: Any, embedding_function: Callable, text_key: str, namespace: Optional[str] = None)[source]#

Wrapper around Pinecone vector database.

To use, you should have the pinecone-client python package installed.

Example

from langchain.vectorstores import Pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
import pinecone

# The environment should be the one specified next to the API key
# in your Pinecone console
pinecone.init(api_key="***", environment="...")
index = pinecone.Index("langchain-demo")
embeddings = OpenAIEmbeddings()
vectorstore = Pinecone(index, embeddings.embed_query, "text")
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, namespace: Optional[str] = None, batch_size: int = 32, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • ids – Optional list of ids to associate with the texts.

  • namespace – Optional pinecone namespace to add the texts to.

Returns

List of ids from adding the texts into the vectorstore.

classmethod from_existing_index(index_name: str, embedding: langchain.embeddings.base.Embeddings, text_key: str = 'text', namespace: Optional[str] = None) langchain.vectorstores.pinecone.Pinecone[source]#

Load pinecone vectorstore from index name.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, batch_size: int = 32, text_key: str = 'text', index_name: Optional[str] = None, namespace: Optional[str] = None, **kwargs: Any) langchain.vectorstores.pinecone.Pinecone[source]#

Construct Pinecone wrapper from raw documents.

This is a user-friendly interface that:
  1. Embeds documents.

  2. Adds the documents to a provided Pinecone index

This is intended to be a quick way to get started.

Example

from langchain import Pinecone
from langchain.embeddings import OpenAIEmbeddings
import pinecone

# The environment should be the one specified next to the API key
# in your Pinecone console
pinecone.init(api_key="***", environment="...")
embeddings = OpenAIEmbeddings()
pinecone = Pinecone.from_texts(
    texts,
    embeddings,
    index_name="langchain-demo"
)

Return pinecone documents most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • filter – Dictionary of argument(s) to filter on metadata

  • namespace – Namespace to search in. Default will search in ‘’ namespace.

Returns

List of Documents most similar to the query and score for each

similarity_search_with_score(query: str, k: int = 4, filter: Optional[dict] = None, namespace: Optional[str] = None) List[Tuple[langchain.schema.Document, float]][source]#

Return pinecone documents most similar to query, along with scores.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • filter – Dictionary of argument(s) to filter on metadata

  • namespace – Namespace to search in. Default will search in ‘’ namespace.

Returns

List of Documents most similar to the query and score for each
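A filtered-search sketch; the metadata key and namespace are assumptions about your index:

results = vectorstore.similarity_search_with_score(
    "query",
    k=4,
    filter={"source": "docs"},     # assumed metadata key
    namespace="my-namespace",      # assumed namespace
)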

class langchain.vectorstores.Qdrant(client: Any, collection_name: str, embeddings: Optional[langchain.embeddings.base.Embeddings] = None, content_payload_key: str = 'page_content', metadata_payload_key: str = 'metadata', embedding_function: Optional[Callable] = None)[source]#

Wrapper around Qdrant vector database.

To use you should have the qdrant-client package installed.

Example

from qdrant_client import QdrantClient
from langchain import Qdrant

client = QdrantClient()
collection_name = "MyCollection"
qdrant = Qdrant(client, collection_name, embedding_function)
CONTENT_KEY = 'page_content'#
METADATA_KEY = 'metadata'#
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[Sequence[str]] = None, batch_size: int = 64, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • ids – Optional list of ids to associate with the texts. Ids have to be uuid-like strings.

  • batch_size – How many vectors to upload per request. Default: 64

Returns

List of ids from adding the texts into the vectorstore.

classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[Sequence[str]] = None, location: Optional[str] = None, url: Optional[str] = None, port: Optional[int] = 6333, grpc_port: int = 6334, prefer_grpc: bool = False, https: Optional[bool] = None, api_key: Optional[str] = None, prefix: Optional[str] = None, timeout: Optional[float] = None, host: Optional[str] = None, path: Optional[str] = None, collection_name: Optional[str] = None, distance_func: str = 'Cosine', content_payload_key: str = 'page_content', metadata_payload_key: str = 'metadata', batch_size: int = 64, shard_number: Optional[int] = None, replication_factor: Optional[int] = None, write_consistency_factor: Optional[int] = None, on_disk_payload: Optional[bool] = None, hnsw_config: Optional[common_types.HnswConfigDiff] = None, optimizers_config: Optional[common_types.OptimizersConfigDiff] = None, wal_config: Optional[common_types.WalConfigDiff] = None, quantization_config: Optional[common_types.QuantizationConfig] = None, init_from: Optional[common_types.InitFrom] = None, **kwargs: Any) Qdrant[source]#

Construct Qdrant wrapper from a list of texts.

Parameters
  • texts – A list of texts to be indexed in Qdrant.

  • embedding – A subclass of Embeddings, responsible for text vectorization.

  • metadatas – An optional list of metadata. If provided it has to be of the same length as a list of texts.

  • ids – Optional list of ids to associate with the texts. Ids have to be uuid-like strings.

  • location – If :memory: - use in-memory Qdrant instance. If str - use it as a url parameter. If None - fallback to relying on host and port parameters.

  • url – either host or str of “Optional[scheme], host, Optional[port], Optional[prefix]”. Default: None

  • port – Port of the REST API interface. Default: 6333

  • grpc_port – Port of the gRPC interface. Default: 6334

  • prefer_grpc – If true - use gRPC interface whenever possible in custom methods. Default: False

  • https – If true - use HTTPS(SSL) protocol. Default: None

  • api_key – API key for authentication in Qdrant Cloud. Default: None

  • prefix

    If not None - add prefix to the REST URL path. Example: service/v1 will result in

    http://localhost:6333/service/v1/{qdrant-endpoint} for REST API.

    Default: None

  • timeout – Timeout for REST and gRPC API requests. Default: 5.0 seconds for REST and unlimited for gRPC

  • host – Host name of Qdrant service. If url and host are None, set to ‘localhost’. Default: None

  • path – Path in which the vectors will be stored while using local mode. Default: None

  • collection_name – Name of the Qdrant collection to be used. If not provided, it will be created randomly. Default: None

  • distance_func – Distance function. One of: “Cosine” / “Euclid” / “Dot”. Default: “Cosine”

  • content_payload_key – A payload key used to store the content of the document. Default: “page_content”

  • metadata_payload_key – A payload key used to store the metadata of the document. Default: “metadata”

  • batch_size – How many vectors to upload per request. Default: 64

  • shard_number – Number of shards in collection. Default is 1, minimum is 1.

  • replication_factor – Replication factor for collection. Default is 1, minimum is 1. Defines how many copies of each shard will be created. Has effect only in distributed mode.

  • write_consistency_factor – Write consistency factor for collection. Default is 1, minimum is 1. Defines how many replicas should apply the operation for us to consider it successful. Increasing this number will make the collection more resilient to inconsistencies, but will also make it fail if not enough replicas are available. Does not have any performance impact. Has effect only in distributed mode.

  • on_disk_payload – If true - the point's payload will not be stored in memory. It will be read from disk every time it is requested. This setting saves RAM by (slightly) increasing the response time. Note: payload values that are involved in filtering and are indexed remain in RAM.

  • hnsw_config – Params for HNSW index

  • optimizers_config – Params for optimizer

  • wal_config – Params for Write-Ahead-Log

  • quantization_config – Params for quantization, if None - quantization will be disabled

  • init_from – Use data stored in another collection to initialize this collection

  • **kwargs – Additional arguments passed directly into REST client initialization

This is a user-friendly interface that:
  1. Creates embeddings, one for each text

  2. Initializes the Qdrant database as an in-memory docstore by default (overridable to a remote docstore)

  3. Adds the text embeddings to the Qdrant database

This is intended to be a quick way to get started.

Example

from langchain import Qdrant
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
qdrant = Qdrant.from_texts(texts, embeddings, host="localhost")

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm. Defaults to 20.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • filter – Filter by metadata. Defaults to None.

  • search_params – Additional search params

  • offset – Offset of the first result to return. May be used to paginate results. Note: large offset values may cause performance issues.

  • score_threshold – Define a minimal score threshold for the result. If defined, less similar results will not be returned. Score of the returned result might be higher or smaller than the threshold depending on the Distance function used. E.g. for cosine similarity only higher scores will be returned.

  • consistency

    Read consistency of the search. Defines how many replicas should be queried before returning the result. Values:

    • int - number of replicas to query; values should be present in all queried replicas

    • ’majority’ - query all replicas, but return values present in the majority of replicas

    • ’quorum’ - query the majority of replicas, return values present in all of them

    • ’all’ - query all replicas, and return values present in all replicas

Returns

List of Documents most similar to the query.
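A sketch of a filtered, score-thresholded query; the metadata key is an assumption about your payload:

docs = qdrant.similarity_search(
    "query",
    k=4,
    filter={"source": "docs"},  # assumed metadata key
    score_threshold=0.8,
)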

similarity_search_with_score(query: str, k: int = 4, filter: Optional[MetadataFilter] = None, search_params: Optional[common_types.SearchParams] = None, offset: int = 0, score_threshold: Optional[float] = None, consistency: Optional[common_types.ReadConsistency] = None, **kwargs: Any) List[Tuple[Document, float]][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • filter – Filter by metadata. Defaults to None.

  • search_params – Additional search params

  • offset – Offset of the first result to return. May be used to paginate results. Note: large offset values may cause performance issues.

  • score_threshold – Define a minimal score threshold for the result. If defined, less similar results will not be returned. Score of the returned result might be higher or smaller than the threshold depending on the Distance function used. E.g. for cosine similarity only higher scores will be returned.

  • consistency

    Read consistency of the search. Defines how many replicas should be queried before returning the result. Values:

    • int - number of replicas to query; values should be present in all queried replicas

    • ’majority’ - query all replicas, but return values present in the majority of replicas

    • ’quorum’ - query the majority of replicas, return values present in all of them

    • ’all’ - query all replicas, and return values present in all replicas

Returns

List of documents most similar to the query text and cosine distance in float for each. Lower score represents more similarity.

class langchain.vectorstores.Redis(redis_url: str, index_name: str, embedding_function: typing.Callable, content_key: str = 'content', metadata_key: str = 'metadata', vector_key: str = 'content_vector', relevance_score_fn: typing.Optional[typing.Callable[[float], float]] = <function _default_relevance_score>, **kwargs: typing.Any)[source]#

Wrapper around Redis vector database.

To use, you should have the redis python package installed.

Example

from langchain.vectorstores import Redis
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = Redis(
    redis_url="redis://username:password@localhost:6379",
    index_name="my-index",
    embedding_function=embeddings.embed_query,
)
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, embeddings: Optional[List[List[float]]] = None, keys: Optional[List[str]] = None, batch_size: int = 1000, **kwargs: Any) List[str][source]#

Add more texts to the vectorstore.

Parameters
  • texts (Iterable[str]) – Iterable of strings/text to add to the vectorstore.

  • metadatas (Optional[List[dict]], optional) – Optional list of metadatas. Defaults to None.

  • embeddings (Optional[List[List[float]]], optional) – Optional pre-generated embeddings. Defaults to None.

  • keys (Optional[List[str]], optional) – Optional key values to use as ids. Defaults to None.

  • batch_size (int, optional) – Batch size to use for writes. Defaults to 1000.

Returns

List of ids added to the vectorstore

Return type

List[str]

as_retriever(**kwargs: Any) langchain.vectorstores.redis.RedisVectorStoreRetriever[source]#
static drop_index(index_name: str, delete_documents: bool, **kwargs: Any) bool[source]#

Drop a Redis search index.

Parameters
  • index_name (str) – Name of the index to drop.

  • delete_documents (bool) – Whether to drop the associated documents.

Returns

Whether or not the drop was successful.

Return type

bool

classmethod from_existing_index(embedding: langchain.embeddings.base.Embeddings, index_name: str, content_key: str = 'content', metadata_key: str = 'metadata', vector_key: str = 'content_vector', **kwargs: Any) langchain.vectorstores.redis.Redis[source]#

Connect to an existing Redis index.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, index_name: Optional[str] = None, content_key: str = 'content', metadata_key: str = 'metadata', vector_key: str = 'content_vector', **kwargs: Any) langchain.vectorstores.redis.Redis[source]#

Create a Redis vectorstore from raw documents. This is a user-friendly interface that:

  1. Embeds documents.

  2. Creates a new index for the embeddings in Redis.

  3. Adds the documents to the newly created Redis index.

This is intended to be a quick way to get started.

Example
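A minimal sketch, assuming a Redis instance with the RediSearch module reachable at localhost:6379:

from langchain.vectorstores import Redis
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
rds = Redis.from_texts(
    texts,
    embeddings,
    redis_url="redis://localhost:6379",
    index_name="my-index",
)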

classmethod from_texts_return_keys(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, index_name: Optional[str] = None, content_key: str = 'content', metadata_key: str = 'metadata', vector_key: str = 'content_vector', distance_metric: Literal['COSINE', 'IP', 'L2'] = 'COSINE', **kwargs: Any) Tuple[langchain.vectorstores.redis.Redis, List[str]][source]#

Create a Redis vectorstore from raw documents. This is a user-friendly interface that:

  1. Embeds documents.

  2. Creates a new index for the embeddings in Redis.

  3. Adds the documents to the newly created Redis index.

This is intended to be a quick way to get started.

Returns the most similar indexed documents to the query text.

Parameters
  • query (str) – The query text for which to find similar documents.

  • k (int) – The number of documents to return. Default is 4.

Returns

A list of documents that are most similar to the query text.

Return type

List[Document]

similarity_search_limit_score(query: str, k: int = 4, score_threshold: float = 0.2, **kwargs: Any) List[langchain.schema.Document][source]#

Returns the most similar indexed documents to the query text within the score_threshold range.

Parameters
  • query (str) – The query text for which to find similar documents.

  • k (int) – The number of documents to return. Default is 4.

  • score_threshold (float) – The minimum matching score required for a document to be considered a match. Defaults to 0.2. Because the similarity calculation algorithm is based on cosine similarity, the smaller the angle, the higher the similarity.

Returns

A list of documents that are most similar to the query text, including the match score for each document.

Return type

List[Document]

Note

If there are no documents that satisfy the score_threshold value, an empty list is returned.
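A usage sketch, assuming the rds store created in the from_texts example above:

docs = rds.similarity_search_limit_score("query", k=4, score_threshold=0.2)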

similarity_search_with_score(query: str, k: int = 4) List[Tuple[langchain.schema.Document, float]][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query and score for each

class langchain.vectorstores.SKLearnVectorStore(embedding: langchain.embeddings.base.Embeddings, *, persist_path: Optional[str] = None, serializer: Literal['json', 'bson', 'parquet'] = 'json', metric: str = 'cosine', **kwargs: Any)[source]#

A simple in-memory vector store based on the scikit-learn library NearestNeighbors implementation.

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • kwargs – vectorstore specific parameters

Returns

List of ids from adding the texts into the vectorstore.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, persist_path: Optional[str] = None, **kwargs: Any) langchain.vectorstores.sklearn.SKLearnVectorStore[source]#

Return VectorStore initialized from texts and embeddings.
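A persistence sketch; the path is a placeholder, and 'json' is the default serializer from the class signature:

from langchain.vectorstores import SKLearnVectorStore
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
store = SKLearnVectorStore.from_texts(
    texts,
    embeddings,
    persist_path="./sklearn_store.json",  # placeholder path
)
store.persist()  # writes the store to persist_path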

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

persist() None[source]#

Return docs most similar to query.

similarity_search_with_score(query: str, *, k: int = 4, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#
class langchain.vectorstores.SingleStoreDB(embedding: langchain.embeddings.base.Embeddings, *, table_name: str = 'embeddings', content_field: str = 'content', metadata_field: str = 'metadata', vector_field: str = 'vector', pool_size: int = 5, max_overflow: int = 10, timeout: float = 30, **kwargs: Any)[source]#

This class serves as a Pythonic interface to the SingleStore DB database. The prerequisite for using this class is the installation of the singlestoredb Python package.

The SingleStoreDB vectorstore can be created by providing an embedding function and the relevant parameters for the database connection, connection pool, and optionally, the names of the table and the fields to use.

add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, embeddings: Optional[List[List[float]]] = None, **kwargs: Any) List[str][source]#

Add more texts to the vectorstore.

Parameters
  • texts (Iterable[str]) – Iterable of strings/text to add to the vectorstore.

  • metadatas (Optional[List[dict]], optional) – Optional list of metadatas. Defaults to None.

  • embeddings (Optional[List[List[float]]], optional) – Optional pre-generated embeddings. Defaults to None.

Returns

empty list

Return type

List[str]

as_retriever(**kwargs: Any) langchain.vectorstores.singlestoredb.SingleStoreDBRetriever[source]#
connection_kwargs#

Create connection pool.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, table_name: str = 'embeddings', content_field: str = 'content', metadata_field: str = 'metadata', vector_field: str = 'vector', pool_size: int = 5, max_overflow: int = 10, timeout: float = 30, **kwargs: Any) langchain.vectorstores.singlestoredb.SingleStoreDB[source]#

Create a SingleStoreDB vectorstore from raw documents. This is a user-friendly interface that:

  1. Embeds documents.

  2. Creates a new table for the embeddings in SingleStoreDB.

  3. Adds the documents to the newly created table.

This is intended to be a quick way to get started.

Example
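A minimal sketch; the connection parameters are placeholders passed through to the singlestoredb connection pool:

from langchain.vectorstores import SingleStoreDB
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = SingleStoreDB.from_texts(
    texts,
    embeddings,
    table_name="embeddings",
    host="user:password@localhost:3306/db",  # placeholder connection string
)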

Returns the most similar indexed documents to the query text.

Uses cosine similarity.

Parameters
  • query (str) – The query text for which to find similar documents.

  • k (int) – The number of documents to return. Default is 4.

Returns

A list of documents that are most similar to the query text.

Return type

List[Document]

similarity_search_with_score(query: str, k: int = 4) List[Tuple[langchain.schema.Document, float]][source]#

Return docs most similar to query. Uses cosine similarity.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query and score for each

vector_field#

Pass the rest of the kwargs to the connection.

class langchain.vectorstores.SupabaseVectorStore(client: supabase.client.Client, embedding: Embeddings, table_name: str, query_name: Union[str, None] = None)[source]#

VectorStore for a Supabase postgres database. Assumes you have the pgvector extension installed and a match_documents (or similar) function. For more details: https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/supabase

You can implement your own match_documents function in order to limit the search space to a subset of documents based on your own authorization or business logic.

Note that the Supabase Python client does not yet support async operations.

If you’d like to use max_marginal_relevance_search, please review the instructions below on modifying the match_documents function to return matched embeddings.
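A construction sketch, assuming SUPABASE_URL and SUPABASE_SERVICE_KEY environment variables and a match_documents function already present in your database:

import os
from supabase.client import Client, create_client
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import SupabaseVectorStore

supabase: Client = create_client(
    os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"]
)
embeddings = OpenAIEmbeddings()
vector_store = SupabaseVectorStore(
    client=supabase,
    embedding=embeddings,
    table_name="documents",
    query_name="match_documents",
)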

add_texts(texts: Iterable[str], metadatas: Optional[List[dict[Any, Any]]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • kwargs – vectorstore specific parameters

Returns

List of ids from adding the texts into the vectorstore.

add_vectors(vectors: List[List[float]], documents: List[langchain.schema.Document]) List[str][source]#
classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, client: Optional[supabase.client.Client] = None, table_name: Optional[str] = 'documents', query_name: Union[str, None] = 'match_documents', **kwargs: Any) SupabaseVectorStore[source]#

Return VectorStore initialized from texts and embeddings.

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

max_marginal_relevance_search requires that query_name returns matched embeddings alongside the match documents. The following function demonstrates how to do this:

```sql
CREATE FUNCTION match_documents_embeddings(query_embedding vector(1536),
                                           match_count int)
RETURNS TABLE(
    id bigint,
    content text,
    metadata jsonb,
    embedding vector(1536),
    similarity float)
LANGUAGE plpgsql
AS $$
#variable_conflict use_column
BEGIN
    RETURN query
    SELECT
        id,
        content,
        metadata,
        embedding,
        1 - (docstore.embedding <=> query_embedding) AS similarity
    FROM
        docstore
    ORDER BY
        docstore.embedding <=> query_embedding
    LIMIT match_count;
END;
$$;
```

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

query_name: str#

similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query.

similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to embedding vector.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query vector.

similarity_search_by_vector_returning_embeddings(query: List[float], k: int) List[Tuple[langchain.schema.Document, float, numpy.ndarray[numpy.float32, Any]]][source]#
similarity_search_by_vector_with_relevance_scores(query: List[float], k: int) List[Tuple[langchain.schema.Document, float]][source]#
similarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

Parameters
  • query – input text

  • k – Number of Documents to return. Defaults to 4.

  • **kwargs

    kwargs to be passed to similarity search. Should include score_threshold – an optional floating-point value between 0 and 1 used to filter the resulting set of retrieved docs.

Returns

List of Tuples of (doc, similarity_score)

table_name: str#
class langchain.vectorstores.Tair(embedding_function: langchain.embeddings.base.Embeddings, url: str, index_name: str, content_key: str = 'content', metadata_key: str = 'metadata', search_params: Optional[dict] = None, **kwargs: Any)[source]#
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str][source]#

Add text data to an existing index.

create_index_if_not_exist(dim: int, distance_type: str, index_type: str, data_type: str, **kwargs: Any) bool[source]#
static drop_index(index_name: str = 'langchain', **kwargs: Any) bool[source]#

Drop an existing index.

Parameters

index_name (str) – Name of the index to drop.

Returns

True if the index is dropped successfully.

Return type

bool

classmethod from_documents(documents: List[langchain.schema.Document], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, index_name: str = 'langchain', content_key: str = 'content', metadata_key: str = 'metadata', **kwargs: Any) langchain.vectorstores.tair.Tair[source]#

Return VectorStore initialized from documents and embeddings.

classmethod from_existing_index(embedding: langchain.embeddings.base.Embeddings, index_name: str = 'langchain', content_key: str = 'content', metadata_key: str = 'metadata', **kwargs: Any) langchain.vectorstores.tair.Tair[source]#

Connect to an existing Tair index.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, index_name: str = 'langchain', content_key: str = 'content', metadata_key: str = 'metadata', **kwargs: Any) langchain.vectorstores.tair.Tair[source]#

Return VectorStore initialized from texts and embeddings.
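
For illustration, a minimal usage sketch (hedged: the tair_url kwarg and redis-style URL are assumptions; setting a TAIR_URL environment variable is the usual alternative):

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Tair

# from_texts embeds the texts and creates the index if it does not exist.
vectorstore = Tair.from_texts(
    texts=["hello tair"],
    embedding=OpenAIEmbeddings(),
    tair_url="redis://localhost:6379",  # assumed connection kwarg
    index_name="langchain",
)
docs = vectorstore.similarity_search("hello", k=1)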

similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Returns the most similar indexed documents to the query text.

Parameters
  • query (str) – The query text for which to find similar documents.

  • k (int) – The number of documents to return. Default is 4.

Returns

A list of documents that are most similar to the query text.

Return type

List[Document]

class langchain.vectorstores.Tigris(client: TigrisClient, embeddings: Embeddings, index_name: str)[source]#
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • ids – Optional list of ids for documents. Ids will be autogenerated if not provided.

  • kwargs – vectorstore specific parameters

Returns

List of ids from adding the texts into the vectorstore.

classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, client: Optional[TigrisClient] = None, index_name: Optional[str] = None, **kwargs: Any) Tigris[source]#

Return VectorStore initialized from texts and embeddings.

property search_index: TigrisVectorStore#

similarity_search(query: str, k: int = 4, filter: Optional[TigrisFilter] = None, **kwargs: Any) List[Document][source]#

Return docs most similar to query.

similarity_search_with_score(query: str, k: int = 4, filter: Optional[TigrisFilter] = None) List[Tuple[Document, float]][source]#

Run similarity search with Tigris with distance.

Parameters
  • query (str) – Query text to search for.

  • k (int) – Number of results to return. Defaults to 4.

  • filter (Optional[TigrisFilter]) – Filter by metadata. Defaults to None.

Returns

List of documents most similar to the query text, each with a distance in float.

Return type

List[Tuple[Document, float]]

class langchain.vectorstores.Typesense(typesense_client: Client, embedding: Embeddings, *, typesense_collection_name: Optional[str] = None, text_key: str = 'text')[source]#

Wrapper around Typesense vector search.

To use, you should have the typesense python package installed.

Example

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Typesense
import typesense

node = {
    "host": "localhost",  # For Typesense Cloud use xxx.a1.typesense.net
    "port": "8108",       # For Typesense Cloud use 443
    "protocol": "http"    # For Typesense Cloud use https
}
typesense_client = typesense.Client(
    {
      "nodes": [node],
      "api_key": "<API_KEY>",
      "connection_timeout_seconds": 2
    }
)
typesense_collection_name = "langchain-memory"

embedding = OpenAIEmbeddings()
vectorstore = Typesense(
    typesense_client,
    embedding,
    typesense_collection_name=typesense_collection_name,
    text_key="text",
)
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embedding and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • ids – Optional list of ids to associate with the texts.

Returns

List of ids from adding the texts into the vectorstore.

classmethod from_client_params(embedding: langchain.embeddings.base.Embeddings, *, host: str = 'localhost', port: Union[str, int] = '8108', protocol: str = 'http', typesense_api_key: Optional[str] = None, connection_timeout_seconds: int = 2, **kwargs: Any) langchain.vectorstores.typesense.Typesense[source]#

Initialize Typesense directly from client parameters.

Example

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Typesense

# Pass in typesense_api_key as kwarg or set env var "TYPESENSE_API_KEY".
vectorstore = Typesense.from_client_params(
    OpenAIEmbeddings(),
    host="localhost",
    port="8108",
    protocol="http",
    typesense_collection_name="langchain-memory",
)
classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: Optional[List[dict]] = None, ids: Optional[List[str]] = None, typesense_client: Optional[Client] = None, typesense_client_params: Optional[dict] = None, typesense_collection_name: Optional[str] = None, text_key: str = 'text', **kwargs: Any) Typesense[source]#

Construct Typesense wrapper from raw text.

similarity_search(query: str, k: int = 4, filter: Optional[str] = '', **kwargs: Any) List[langchain.schema.Document][source]#

Return typesense documents most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • filter – typesense filter_by expression to filter documents on

Returns

List of Documents most similar to the query and score for each

similarity_search_with_score(query: str, k: int = 4, filter: Optional[str] = '') List[Tuple[langchain.schema.Document, float]][source]#

Return typesense documents most similar to query, along with scores.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • filter – typesense filter_by expression to filter documents on

Returns

List of Documents most similar to the query and score for each

class langchain.vectorstores.Vectara(vectara_customer_id: Optional[str] = None, vectara_corpus_id: Optional[str] = None, vectara_api_key: Optional[str] = None)[source]#

Implementation of Vector Store using Vectara (https://vectara.com).

Example

from langchain.vectorstores import Vectara

vectorstore = Vectara(
    vectara_customer_id=vectara_customer_id,
    vectara_corpus_id=vectara_corpus_id,
    vectara_api_key=vectara_api_key
)
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

Returns

List of ids from adding the texts into the vectorstore.

as_retriever(**kwargs: Any) langchain.vectorstores.vectara.VectaraRetriever[source]#
classmethod from_texts(texts: List[str], embedding: Optional[langchain.embeddings.base.Embeddings] = None, metadatas: Optional[List[dict]] = None, **kwargs: Any) langchain.vectorstores.vectara.Vectara[source]#

Construct Vectara wrapper from raw documents. This is intended to be a quick way to get started.

Example

from langchain import Vectara
vectara = Vectara.from_texts(
    texts,
    vectara_customer_id=customer_id,
    vectara_corpus_id=corpus_id,
    vectara_api_key=api_key,
)

similarity_search(query: str, k: int = 5, alpha: float = 0.025, filter: Optional[str] = None, **kwargs: Any) List[langchain.schema.Document][source]#

Return Vectara documents most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 5.

  • filter – Filter expression (SQL-like) to apply to the metadata; for example, "doc.rating > 3.0 and part.lang = 'deu'". See https://docs.vectara.com/docs/search-apis/sql/filter-overview for more details.

Returns

List of Documents most similar to the query

similarity_search_with_score(query: str, k: int = 5, alpha: float = 0.025, filter: Optional[str] = None, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Return Vectara documents most similar to query, along with scores.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 5.

  • alpha – parameter for hybrid search (called “lambda” in Vectara documentation).

  • filter – Filter expression (SQL-like) to apply to the metadata; for example, "doc.rating > 3.0 and part.lang = 'deu'". See https://docs.vectara.com/docs/search-apis/sql/filter-overview for more details.

Returns

List of Documents most similar to the query and score for each.
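
For illustration, a hedged usage sketch of the filter parameter (the metadata fields referenced below are assumptions about the corpus):

from langchain.vectorstores import Vectara

vectorstore = Vectara(
    vectara_customer_id="<CUSTOMER_ID>",
    vectara_corpus_id="<CORPUS_ID>",
    vectara_api_key="<API_KEY>",
)
results = vectorstore.similarity_search_with_score(
    "best sci-fi movie",
    k=5,
    filter="doc.rating > 3.0 and part.lang = 'deu'",
)
for doc, score in results:
    print(score, doc.page_content[:80])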

class langchain.vectorstores.VectorStore[source]#

Interface for vector stores.

async aadd_documents(documents: List[langchain.schema.Document], **kwargs: Any) List[str][source]#

Run more documents through the embeddings and add to the vectorstore.

Parameters

documents (List[Document]) – Documents to add to the vectorstore.

Returns

List of IDs of the added texts.

Return type

List[str]

async aadd_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

add_documents(documents: List[langchain.schema.Document], **kwargs: Any) List[str][source]#

Run more documents through the embeddings and add to the vectorstore.

Parameters

documents (List[Document]) – Documents to add to the vectorstore.

Returns

List of IDs of the added texts.

Return type

List[str]
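
For illustration, a minimal sketch (vectorstore stands for any concrete VectorStore subclass and is hypothetical here):

from langchain.schema import Document

# vectorstore: any concrete VectorStore instance (hypothetical).
ids = vectorstore.add_documents(
    [Document(page_content="hello world", metadata={"source": "example"})]
)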

abstract add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str][source]#

Run more texts through the embeddings and add to the vectorstore.

Parameters
  • texts – Iterable of strings to add to the vectorstore.

  • metadatas – Optional list of metadatas associated with the texts.

  • kwargs – vectorstore specific parameters

Returns

List of ids from adding the texts into the vectorstore.

async classmethod afrom_documents(documents: List[langchain.schema.Document], embedding: langchain.embeddings.base.Embeddings, **kwargs: Any) langchain.vectorstores.base.VST[source]#

Return VectorStore initialized from documents and embeddings.

async classmethod afrom_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) langchain.vectorstores.base.VST[source]#

Return VectorStore initialized from texts and embeddings.

async amax_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

async amax_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

as_retriever(**kwargs: Any) langchain.vectorstores.base.VectorStoreRetriever[source]#
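
For illustration, a hedged sketch of wrapping a store as a retriever (vectorstore is a hypothetical concrete instance; the search_kwargs shown are illustrative):

# vectorstore: any concrete VectorStore instance (hypothetical).
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
docs = retriever.get_relevant_documents("query text")
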
async asearch(query: str, search_type: str, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query using specified search type.

async asimilarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query.

async asimilarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to embedding vector.

async asimilarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Return docs most similar to query.

classmethod from_documents(documents: List[langchain.schema.Document], embedding: langchain.embeddings.base.Embeddings, **kwargs: Any) langchain.vectorstores.base.VST[source]#

Return VectorStore initialized from documents and embeddings.

abstract classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) langchain.vectorstores.base.VST[source]#

Return VectorStore initialized from texts and embeddings.

max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.
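
To make the k/fetch_k/lambda_mult interaction concrete, here is a simplified, illustrative sketch of an MMR selection loop over already-fetched candidate vectors (not LangChain's internal implementation):

import numpy as np
from typing import List

def mmr_select(query_vec: np.ndarray,
               doc_vecs: List[np.ndarray],
               k: int = 4,
               lambda_mult: float = 0.5) -> List[int]:
    """Greedily pick k candidate indices by maximal marginal relevance."""
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected: List[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def mmr_score(i: int) -> float:
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            # lambda_mult = 1 -> pure relevance; 0 -> pure diversity.
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected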

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

search(query: str, search_type: str, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query using specified search type.
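
For illustration (hedged: vectorstore is a hypothetical concrete instance; "similarity" and "mmr" are the search types this dispatches to):

# vectorstore: any concrete VectorStore instance (hypothetical).
docs = vectorstore.search("query text", search_type="mmr")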

abstract similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query.

similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to embedding vector.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query vector.

similarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

Parameters
  • query – input text

  • k – Number of Documents to return. Defaults to 4.

  • **kwargs

    kwargs to be passed to similarity search. Should include score_threshold – an optional floating-point value between 0 and 1 used to filter the resulting set of retrieved docs.

Returns

List of Tuples of (doc, similarity_score)
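
For illustration, a hedged sketch of score_threshold filtering (vectorstore is a hypothetical concrete instance):

# Keep only documents with a relevance score of at least 0.8.
docs_and_scores = vectorstore.similarity_search_with_relevance_scores(
    "query text",
    k=4,
    score_threshold=0.8,
)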

class langchain.vectorstores.Weaviate(client: typing.Any, index_name: str, text_key: str, embedding: typing.Optional[langchain.embeddings.base.Embeddings] = None, attributes: typing.Optional[typing.List[str]] = None, relevance_score_fn: typing.Optional[typing.Callable[[float], float]] = <function _default_score_normalizer>, by_text: bool = True)[source]#

Wrapper around Weaviate vector database.

To use, you should have the weaviate-client python package installed.

Example

import weaviate
from langchain.vectorstores import Weaviate
client = weaviate.Client(url=os.environ["WEAVIATE_URL"], ...)
weaviate = Weaviate(client, index_name, text_key)
add_texts(texts: Iterable[str], metadatas: Optional[List[dict]] = None, **kwargs: Any) List[str][source]#

Upload texts with metadata (properties) to Weaviate.

classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, **kwargs: Any) langchain.vectorstores.weaviate.Weaviate[source]#

Construct Weaviate wrapper from raw documents.

This is a user-friendly interface that:
  1. Embeds documents.

  2. Creates a new index for the embeddings in the Weaviate instance.

  3. Adds the documents to the newly created Weaviate index.

This is intended to be a quick way to get started.

Example

from langchain.vectorstores.weaviate import Weaviate
from langchain.embeddings import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
weaviate = Weaviate.from_texts(
    texts,
    embeddings,
    weaviate_url="http://localhost:8080"
)

max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters
  • embedding – Embedding to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

  • fetch_k – Number of Documents to fetch to pass to MMR algorithm.

  • lambda_mult – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

Returns

List of Documents selected by maximal marginal relevance.

similarity_search(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query.

similarity_search_by_text(query: str, k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Return docs most similar to query.

Parameters
  • query – Text to look up documents similar to.

  • k – Number of Documents to return. Defaults to 4.

Returns

List of Documents most similar to the query.

similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[langchain.schema.Document][source]#

Look up similar documents by embedding vector in Weaviate.

similarity_search_with_score(query: str, k: int = 4, **kwargs: Any) List[Tuple[langchain.schema.Document, float]][source]#

Return a list of documents most similar to the query text, along with a cosine distance (float) for each. A lower score represents greater similarity.

class langchain.vectorstores.Zilliz(embedding_function: langchain.embeddings.base.Embeddings, collection_name: str = 'LangChainCollection', connection_args: Optional[dict[str, Any]] = None, consistency_level: str = 'Session', index_params: Optional[dict] = None, search_params: Optional[dict] = None, drop_old: Optional[bool] = False)[source]#
classmethod from_texts(texts: List[str], embedding: langchain.embeddings.base.Embeddings, metadatas: Optional[List[dict]] = None, collection_name: str = 'LangChainCollection', connection_args: dict[str, Any] = {}, consistency_level: str = 'Session', index_params: Optional[dict] = None, search_params: Optional[dict] = None, drop_old: bool = False, **kwargs: Any) langchain.vectorstores.zilliz.Zilliz[source]#

Creates a Zilliz collection, indexes it with HNSW, and inserts data.

Parameters
  • texts (List[str]) – Text data.

  • embedding (Embeddings) – Embedding function.

  • metadatas (Optional[List[dict]]) – Metadata for each text if it exists. Defaults to None.

  • collection_name (str, optional) – Collection name to use. Defaults to “LangChainCollection”.

  • connection_args (dict[str, Any], optional) – Connection args to use. Defaults to DEFAULT_MILVUS_CONNECTION.

  • consistency_level (str, optional) – Which consistency level to use. Defaults to “Session”.

  • index_params (Optional[dict], optional) – Which index_params to use. Defaults to None.

  • search_params (Optional[dict], optional) – Which search params to use. Defaults to None.

  • drop_old (Optional[bool], optional) – Whether to drop the collection with that name if it exists. Defaults to False.

Returns

Zilliz Vector Store

Return type

Zilliz
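
For illustration, a hedged construction sketch (the connection_args keys and endpoint form are assumptions about a typical Zilliz Cloud instance):

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Zilliz

vectorstore = Zilliz.from_texts(
    texts=["hello zilliz"],
    embedding=OpenAIEmbeddings(),
    connection_args={
        "uri": "https://<instance>.zillizcloud.com",  # assumed endpoint form
        "user": "<USER>",
        "password": "<PASSWORD>",
        "secure": True,
    },
    collection_name="LangChainCollection",
)
docs = vectorstore.similarity_search("hello", k=1)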