InMemoryVectorStore#

class langchain_core.vectorstores.in_memory.InMemoryVectorStore(embedding: Embeddings)[source]#

In-memory vector store implementation.

Uses a dictionary, and computes cosine similarity for search using numpy.
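The idea of a dictionary-backed store ranked by cosine similarity can be sketched in plain Python. This is only an illustration, not the library's implementation (which uses numpy); the store layout, `cosine_similarity`, and `top_k` are hypothetical names chosen for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch only
    return dot / (norm_a * norm_b)


def top_k(store, query_vector, k=4):
    """Rank stored entries by cosine similarity to the query vector."""
    scored = [
        (doc_id, cosine_similarity(entry["vector"], query_vector))
        for doc_id, entry in store.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical layout: id -> {"text": ..., "vector": ...}
toy_store = {
    "1": {"text": "foo", "vector": [1.0, 0.0]},
    "2": {"text": "thud", "vector": [0.0, 1.0]},
}
best = top_k(toy_store, [0.1, 0.9], k=1)
```

Every query scans all stored vectors, which is acceptable for small collections and tests but scales linearly with the number of documents.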

Setup:

Install langchain-core.

pip install -U langchain-core
Key init args — indexing params:
embedding: Embeddings

Embedding function to use.

Instantiate:
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

vector_store = InMemoryVectorStore(OpenAIEmbeddings())
Add Documents:
from langchain_core.documents import Document

document_1 = Document(id="1", page_content="foo", metadata={"baz": "bar"})
document_2 = Document(id="2", page_content="thud", metadata={"bar": "baz"})
document_3 = Document(id="3", page_content="i will be deleted :(")

documents = [document_1, document_2, document_3]
vector_store.add_documents(documents=documents)
Delete Documents:
vector_store.delete(ids=["3"])
Search:
results = vector_store.similarity_search(query="thud", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* thud [{'bar': 'baz'}]
Search with filter:
def _filter_function(doc: Document) -> bool:
    return doc.metadata.get("bar") == "baz"

results = vector_store.similarity_search(
    query="thud", k=1, filter=_filter_function
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")
* thud [{'bar': 'baz'}]
Search with score:
results = vector_store.similarity_search_with_score(
    query="qux", k=1
)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.832268] foo [{'baz': 'bar'}]
Async:
# add documents
# await vector_store.aadd_documents(documents=documents)

# delete documents
# await vector_store.adelete(ids=["3"])

# search
# results = await vector_store.asimilarity_search(query="thud", k=1)

# search with score
results = await vector_store.asimilarity_search_with_score(query="qux", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")
* [SIM=0.832268] foo [{'baz': 'bar'}]
Use as Retriever:
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1, "fetch_k": 2, "lambda_mult": 0.5},
)
retriever.invoke("thud")
[Document(id='2', metadata={'bar': 'baz'}, page_content='thud')]

Initialize with the given embedding function.

Parameters:

embedding (Embeddings) – embedding function to use.

Attributes

embeddings

Access the query embedding object if available.

Methods

__init__(embedding)

Initialize with the given embedding function.

aadd_documents(documents[, ids])

Add documents to the store.

aadd_texts(texts[, metadatas])

Async run more texts through the embeddings and add to the vectorstore.

add_documents(documents[, ids])

Add documents to the store.

add_texts(texts[, metadatas])

Run more texts through the embeddings and add to the vectorstore.

adelete([ids])

Async delete by vector ID or other criteria.

afrom_documents(documents, embedding, **kwargs)

Async return VectorStore initialized from documents and embeddings.

afrom_texts(texts, embedding[, metadatas])

Async return VectorStore initialized from texts and embeddings.

aget_by_ids(ids, /)

Async get documents by their ids.

amax_marginal_relevance_search(query[, k, ...])

Async return docs selected using the maximal marginal relevance.

amax_marginal_relevance_search_by_vector(...)

Async return docs selected using the maximal marginal relevance.

as_retriever(**kwargs)

Return VectorStoreRetriever initialized from this VectorStore.

asearch(query, search_type, **kwargs)

Async return docs most similar to query using a specified search type.

asimilarity_search(query[, k])

Async return docs most similar to query.

asimilarity_search_by_vector(embedding[, k])

Async return docs most similar to embedding vector.

asimilarity_search_with_relevance_scores(query)

Async return docs and relevance scores in the range [0, 1].

asimilarity_search_with_score(query[, k])

Async run similarity search and return each document with its similarity score.

aupsert(items, /, **kwargs)

Deprecated since version langchain-core==0.2.29: This was a beta API that was added in 0.2.11. It'll be removed in 0.3.0. Use VectorStore.aadd_documents instead.

delete([ids])

Delete by vector ID or other criteria.

dump(path)

Dump the vector store to a file.

from_documents(documents, embedding, **kwargs)

Return VectorStore initialized from documents and embeddings.

from_texts(texts, embedding[, metadatas])

Return VectorStore initialized from texts and embeddings.

get_by_ids(ids, /)

Get documents by their ids.

load(path, embedding, **kwargs)

Load a vector store from a file.

max_marginal_relevance_search(query[, k, ...])

Return docs selected using the maximal marginal relevance.

max_marginal_relevance_search_by_vector(...)

Return docs selected using the maximal marginal relevance.

search(query, search_type, **kwargs)

Return docs most similar to query using a specified search type.

similarity_search(query[, k])

Return docs most similar to query.

similarity_search_by_vector(embedding[, k])

Return docs most similar to embedding vector.

similarity_search_with_relevance_scores(query)

Return docs and relevance scores in the range [0, 1].

similarity_search_with_score(query[, k])

Run similarity search and return each document with its similarity score.

similarity_search_with_score_by_vector(embedding)

upsert(items, /, **kwargs)

Deprecated since version langchain-core==0.2.29: This was a beta API that was added in 0.2.11. It'll be removed in 0.3.0. Use VectorStore.add_documents instead.

__init__(embedding: Embeddings) None[source]#

Initialize with the given embedding function.

Parameters:

embedding (Embeddings) – embedding function to use.

Return type:

None

async aadd_documents(documents: List[Document], ids: List[str] | None = None, **kwargs: Any) List[str][source]#

Add documents to the store.

Parameters:
  • documents (List[Document]) –

  • ids (List[str] | None) –

  • kwargs (Any) –

Return type:

List[str]

async aadd_texts(texts: Iterable[str], metadatas: List[dict] | None = None, **kwargs: Any) List[str]#

Async run more texts through the embeddings and add to the vectorstore.

Parameters:
  • texts (Iterable[str]) – Iterable of strings to add to the vectorstore.

  • metadatas (List[dict] | None) – Optional list of metadatas associated with the texts. Default is None.

  • **kwargs (Any) – vectorstore specific parameters.

Returns:

List of ids from adding the texts into the vectorstore.

Raises:
  • ValueError – If the number of metadatas does not match the number of texts.

  • ValueError – If the number of ids does not match the number of texts.

Return type:

List[str]

add_documents(documents: List[Document], ids: List[str] | None = None, **kwargs: Any) List[str][source]#

Add documents to the store.

Parameters:
  • documents (List[Document]) –

  • ids (List[str] | None) –

  • kwargs (Any) –

Return type:

List[str]

add_texts(texts: Iterable[str], metadatas: List[dict] | None = None, **kwargs: Any) List[str]#

Run more texts through the embeddings and add to the vectorstore.

Parameters:
  • texts (Iterable[str]) – Iterable of strings to add to the vectorstore.

  • metadatas (List[dict] | None) – Optional list of metadatas associated with the texts.

  • **kwargs (Any) – vectorstore specific parameters. One of the kwargs should be ids which is a list of ids associated with the texts.

Returns:

List of ids from adding the texts into the vectorstore.

Raises:
  • ValueError – If the number of metadatas does not match the number of texts.

  • ValueError – If the number of ids does not match the number of texts.

Return type:

List[str]
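The two ValueError conditions above amount to length checks on the optional lists. A minimal sketch, assuming ids default to random UUIDs and metadata to empty dicts (both assumptions of this example, as is the helper name `prepare_entries`):

```python
import uuid


def prepare_entries(texts, metadatas=None, ids=None):
    """Pair each text with an id and metadata, validating list lengths."""
    texts = list(texts)  # accept any iterable of strings
    if metadatas is not None and len(metadatas) != len(texts):
        raise ValueError("number of metadatas does not match number of texts")
    if ids is not None and len(ids) != len(texts):
        raise ValueError("number of ids does not match number of texts")
    if metadatas is None:
        metadatas = [{} for _ in texts]
    if ids is None:
        ids = [str(uuid.uuid4()) for _ in texts]
    return [
        {"id": i, "text": t, "metadata": m}
        for i, t, m in zip(ids, texts, metadatas)
    ]


entries = prepare_entries(["foo", "thud"], ids=["1", "2"])
```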

async adelete(ids: Sequence[str] | None = None, **kwargs: Any) None[source]#

Async delete by vector ID or other criteria.

Parameters:
  • ids (Sequence[str] | None) – List of ids to delete. If None, delete all. Default is None.

  • **kwargs (Any) – Other keyword arguments that subclasses might use.

Returns:

True if deletion is successful, False otherwise, None if not implemented.

Return type:

Optional[bool]

async classmethod afrom_documents(documents: List[Document], embedding: Embeddings, **kwargs: Any) VST#

Async return VectorStore initialized from documents and embeddings.

Parameters:
  • documents (List[Document]) – List of Documents to add to the vectorstore.

  • embedding (Embeddings) – Embedding function to use.

  • kwargs (Any) – Additional keyword arguments.

Returns:

VectorStore initialized from documents and embeddings.

Return type:

VectorStore

async classmethod afrom_texts(texts: List[str], embedding: Embeddings, metadatas: List[dict] | None = None, **kwargs: Any) InMemoryVectorStore[source]#

Async return VectorStore initialized from texts and embeddings.

Parameters:
  • texts (List[str]) – Texts to add to the vectorstore.

  • embedding (Embeddings) – Embedding function to use.

  • metadatas (List[dict] | None) – Optional list of metadatas associated with the texts. Default is None.

  • kwargs (Any) – Additional keyword arguments.

Returns:

VectorStore initialized from texts and embeddings.

Return type:

VectorStore

async aget_by_ids(ids: Sequence[str], /) List[Document][source]#

Async get documents by their ids.

Parameters:

ids (Sequence[str]) – The ids of the documents to get.

Returns:

A list of Document objects.

Return type:

List[Document]

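async amax_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document]#

Async return docs selected using the maximal marginal relevance.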

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters:
  • query (str) – Text to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm. Default is 20.

  • lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • kwargs (Any) –

Returns:

List of Documents selected by maximal marginal relevance.

Return type:

List[Document]

async amax_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document]#

Async return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters:
  • embedding (List[float]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm. Default is 20.

  • lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Documents selected by maximal marginal relevance.

Return type:

List[Document]

as_retriever(**kwargs: Any) VectorStoreRetriever#

Return VectorStoreRetriever initialized from this VectorStore.

Parameters:

**kwargs (Any) – Keyword arguments to pass to the search function. Can include:

  • search_type (Optional[str]) – Defines the type of search that the Retriever should perform. Can be “similarity” (default), “mmr”, or “similarity_score_threshold”.

  • search_kwargs (Optional[Dict]) – Keyword arguments to pass to the search function. Can include things like:

      • k – Number of documents to return (Default: 4).

      • score_threshold – Minimum relevance threshold for similarity_score_threshold.

      • fetch_k – Number of documents to pass to the MMR algorithm (Default: 20).

      • lambda_mult – Diversity of results returned by MMR; 1 for minimum diversity and 0 for maximum (Default: 0.5).

      • filter – Filter by document metadata.

Returns:

Retriever class for VectorStore.

Return type:

VectorStoreRetriever

Examples:

# Retrieve more documents with higher diversity
# Useful if your dataset has many similar documents
docsearch.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 6, 'lambda_mult': 0.25}
)

# Fetch more documents for the MMR algorithm to consider
# But only return the top 5
docsearch.as_retriever(
    search_type="mmr",
    search_kwargs={'k': 5, 'fetch_k': 50}
)

# Only retrieve documents that have a relevance score
# Above a certain threshold
docsearch.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': 0.8}
)

# Only get the single most similar document from the dataset
docsearch.as_retriever(search_kwargs={'k': 1})

# Use a filter to only retrieve documents from a specific paper
docsearch.as_retriever(
    search_kwargs={'filter': {'paper_title':'GPT-4 Technical Report'}}
)
async asearch(query: str, search_type: str, **kwargs: Any) List[Document]#

Async return docs most similar to query using a specified search type.

Parameters:
  • query (str) – Input text.

  • search_type (str) – Type of search to perform. Can be “similarity”, “mmr”, or “similarity_score_threshold”.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Documents most similar to the query.

Raises:

ValueError – If search_type is not one of “similarity”, “mmr”, or “similarity_score_threshold”.

Return type:

List[Document]

async asimilarity_search(query: str, k: int = 4, **kwargs: Any) List[Document]#

Async return docs most similar to query.

Parameters:
  • query (str) – Input text.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Documents most similar to the query.

Return type:

List[Document]

async asimilarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[Document][source]#

Async return docs most similar to embedding vector.

Parameters:
  • embedding (List[float]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Documents most similar to the query vector.

Return type:

List[Document]

async asimilarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[Document, float]]#

Async return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

Parameters:
  • query (str) – Input text.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) – kwargs to be passed to similarity search. Should include score_threshold: an optional floating point value between 0 and 1 used to filter the resulting set of retrieved docs.

Returns:

List of Tuples of (doc, similarity_score)

Return type:

List[Tuple[Document, float]]

async asimilarity_search_with_score(query: str, k: int = 4, **kwargs: Any) List[Tuple[Document, float]][source]#

Async run similarity search and return each document with its similarity score.

Parameters:
  • query (str) – Input text.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Tuples of (doc, similarity_score).

Return type:

List[Tuple[Document, float]]

async aupsert(items: Sequence[Document], /, **kwargs: Any) UpsertResponse[source]#

Deprecated since version langchain-core==0.2.29: This was a beta API that was added in 0.2.11. It’ll be removed in 0.3.0. Use VectorStore.aadd_documents instead.

Parameters:
  • items (Sequence[Document]) –

  • kwargs (Any) –

Return type:

UpsertResponse

delete(ids: Sequence[str] | None = None, **kwargs: Any) None[source]#

Delete by vector ID or other criteria.

Parameters:
  • ids (Sequence[str] | None) – List of ids to delete. If None, delete all. Default is None.

  • **kwargs (Any) – Other keyword arguments that subclasses might use.

Returns:

True if deletion is successful, False otherwise, None if not implemented.

Return type:

Optional[bool]

dump(path: str) None[source]#

Dump the vector store to a file.

Parameters:

path (str) – The path to dump the vector store to.

Return type:

None

classmethod from_documents(documents: List[Document], embedding: Embeddings, **kwargs: Any) VST#

Return VectorStore initialized from documents and embeddings.

Parameters:
  • documents (List[Document]) – List of Documents to add to the vectorstore.

  • embedding (Embeddings) – Embedding function to use.

  • kwargs (Any) – Additional keyword arguments.

Returns:

VectorStore initialized from documents and embeddings.

Return type:

VectorStore

classmethod from_texts(texts: List[str], embedding: Embeddings, metadatas: List[dict] | None = None, **kwargs: Any) InMemoryVectorStore[source]#

Return VectorStore initialized from texts and embeddings.

Parameters:
  • texts (List[str]) – Texts to add to the vectorstore.

  • embedding (Embeddings) – Embedding function to use.

  • metadatas (List[dict] | None) – Optional list of metadatas associated with the texts. Default is None.

  • kwargs (Any) – Additional keyword arguments.

Returns:

VectorStore initialized from texts and embeddings.

Return type:

VectorStore

get_by_ids(ids: Sequence[str], /) List[Document][source]#

Get documents by their ids.

Parameters:

ids (Sequence[str]) – The ids of the documents to get.

Returns:

A list of Document objects.

Return type:

List[Document]
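For a dictionary-backed store, id-based operations such as get_by_ids and delete reduce to key lookups. The sketch below assumes unknown ids are silently skipped; that behaviour, and the plain-dict `store`, are assumptions of this example rather than documented guarantees.

```python
def get_by_ids(store, ids):
    """Return stored values for the ids that exist, in request order."""
    return [store[doc_id] for doc_id in ids if doc_id in store]


def delete(store, ids):
    """Remove the given ids; ids that are not present are ignored."""
    for doc_id in ids:
        store.pop(doc_id, None)


store = {"1": "foo", "2": "thud", "3": "i will be deleted :("}
delete(store, ["3", "missing"])
found = get_by_ids(store, ["2", "3", "1"])
```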

classmethod load(path: str, embedding: Embeddings, **kwargs: Any) InMemoryVectorStore[source]#

Load a vector store from a file.

Parameters:
  • path (str) – The path to load the vector store from.

  • embedding (Embeddings) – The embedding to use.

  • kwargs (Any) – Additional arguments to pass to the constructor.

Returns:

A VectorStore object.

Return type:

InMemoryVectorStore
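A dump/load round trip can be pictured as serializing the underlying dictionary and reading it back. This sketch uses plain JSON and hypothetical helpers (`dump_store`, `load_store`); the library's actual on-disk format may differ. Only the stored data goes to disk here, which is one plausible reason load also takes an embedding argument.

```python
import json
import os
import tempfile


def dump_store(store, path):
    """Write the id -> entry dictionary to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(store, f, indent=2)


def load_store(path):
    """Read the dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


store = {
    "1": {"text": "foo", "vector": [0.1, 0.2], "metadata": {"baz": "bar"}},
    "2": {"text": "thud", "vector": [0.3, 0.4], "metadata": {"bar": "baz"}},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "store.json")
    dump_store(store, path)
    restored = load_store(path)
```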

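max_marginal_relevance_search(query: str, k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document]#

Return docs selected using the maximal marginal relevance.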

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters:
  • query (str) – Text to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm. Default is 20.

  • lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Documents selected by maximal marginal relevance.

Return type:

List[Document]

max_marginal_relevance_search_by_vector(embedding: List[float], k: int = 4, fetch_k: int = 20, lambda_mult: float = 0.5, **kwargs: Any) List[Document][source]#

Return docs selected using the maximal marginal relevance.

Maximal marginal relevance optimizes for similarity to query AND diversity among selected documents.

Parameters:
  • embedding (List[float]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • fetch_k (int) – Number of Documents to fetch to pass to MMR algorithm. Default is 20.

  • lambda_mult (float) – Number between 0 and 1 that determines the degree of diversity among the results with 0 corresponding to maximum diversity and 1 to minimum diversity. Defaults to 0.5.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Documents selected by maximal marginal relevance.

Return type:

List[Document]
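The trade-off controlled by lambda_mult can be sketched with the standard greedy MMR formulation: each step picks the candidate maximizing `lambda_mult * relevance - (1 - lambda_mult) * redundancy`. This is a plain-Python illustration under that assumption, not the library's code; `mmr_select` and `_cos` are hypothetical helpers, and `candidate_vecs` plays the role of the fetch_k pre-fetched embeddings.

```python
import math


def _cos(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr_select(query_vec, candidate_vecs, k=4, lambda_mult=0.5):
    """Return indices of up to k candidates chosen greedily by MMR."""
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = _cos(query_vec, candidate_vecs[i])
            # Redundancy: highest similarity to anything already selected.
            redundancy = max(
                (_cos(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


candidates = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]
query = [1.0, 0.1]
similar = mmr_select(query, candidates, k=2, lambda_mult=1.0)
diverse = mmr_select(query, candidates, k=2, lambda_mult=0.3)
```

With `lambda_mult=1.0` the two near-duplicate vectors are both returned; lowering it to 0.3 swaps the second pick for the more distinct third vector.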

search(query: str, search_type: str, **kwargs: Any) List[Document]#

Return docs most similar to query using a specified search type.

Parameters:
  • query (str) – Input text

  • search_type (str) – Type of search to perform. Can be “similarity”, “mmr”, or “similarity_score_threshold”.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Documents most similar to the query.

Raises:

ValueError – If search_type is not one of “similarity”, “mmr”, or “similarity_score_threshold”.

Return type:

List[Document]
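The search_type argument selects which search routine runs, and anything outside the three supported values raises ValueError. A minimal dispatch sketch, where `run_search` and the stub handlers are hypothetical stand-ins for the real search methods:

```python
def run_search(search_type, query, handlers, **kwargs):
    """Dispatch to the handler registered for search_type."""
    try:
        handler = handlers[search_type]
    except KeyError:
        raise ValueError(
            f"search_type of {search_type!r} not allowed. "
            f"Expected one of: {sorted(handlers)}"
        ) from None
    return handler(query, **kwargs)


# Stub handlers; each returns a labelled placeholder instead of documents.
handlers = {
    "similarity": lambda query, k=4: [f"similarity:{query}"][:k],
    "mmr": lambda query, k=4, fetch_k=20, lambda_mult=0.5: [f"mmr:{query}"][:k],
    "similarity_score_threshold": (
        lambda query, k=4, score_threshold=0.0: [f"threshold:{query}"][:k]
    ),
}
hits = run_search("mmr", "thud", handlers, k=1)
```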

similarity_search(query: str, k: int = 4, **kwargs: Any) List[Document]#

Return docs most similar to query.

Parameters:
  • query (str) – Input text.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Documents most similar to the query.

Return type:

List[Document]

similarity_search_by_vector(embedding: List[float], k: int = 4, **kwargs: Any) List[Document][source]#

Return docs most similar to embedding vector.

Parameters:
  • embedding (List[float]) – Embedding to look up documents similar to.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Documents most similar to the query vector.

Return type:

List[Document]

similarity_search_with_relevance_scores(query: str, k: int = 4, **kwargs: Any) List[Tuple[Document, float]]#

Return docs and relevance scores in the range [0, 1].

0 is dissimilar, 1 is most similar.

Parameters:
  • query (str) – Input text.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) – kwargs to be passed to similarity search. Should include score_threshold: an optional floating point value between 0 and 1 used to filter the resulting set of retrieved docs.

Returns:

List of Tuples of (doc, similarity_score).

Return type:

List[Tuple[Document, float]]
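The effect of score_threshold can be sketched as a filter over (doc, score) pairs. This example assumes scores are already normalized to [0, 1] and that a score equal to the threshold is kept; `apply_score_threshold` is a hypothetical helper, not a library function.

```python
def apply_score_threshold(docs_and_scores, score_threshold=None):
    """Keep only pairs whose relevance score meets the threshold."""
    if score_threshold is None:
        return list(docs_and_scores)
    return [
        (doc, score)
        for doc, score in docs_and_scores
        if score >= score_threshold
    ]


results = [("foo", 0.83), ("thud", 0.42)]
kept = apply_score_threshold(results, score_threshold=0.8)
```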

similarity_search_with_score(query: str, k: int = 4, **kwargs: Any) List[Tuple[Document, float]][source]#

Run similarity search and return each document with its similarity score.

Parameters:
  • query (str) – Input text.

  • k (int) – Number of Documents to return. Defaults to 4.

  • **kwargs (Any) – Arguments to pass to the search method.

Returns:

List of Tuples of (doc, similarity_score).

Return type:

List[Tuple[Document, float]]

similarity_search_with_score_by_vector(embedding: List[float], k: int = 4, filter: Callable[[Document], bool] | None = None, **kwargs: Any) List[Tuple[Document, float]][source]#
Parameters:
  • embedding (List[float]) –

  • k (int) –

  • filter (Callable[[Document], bool] | None) –

  • kwargs (Any) –

Return type:

List[Tuple[Document, float]]
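The filter parameter is a predicate over documents. A rough sketch of how such a predicate might combine with ranking, assuming it is applied before the top k are taken; `Doc` is a stand-in for Document and `filtered_top_k` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for Document, for illustration only."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def filtered_top_k(docs_and_scores, k=4, filter=None):
    """Drop docs rejected by the predicate, then return the k best by score."""
    # `filter` mirrors the documented parameter name (it shadows the builtin).
    kept = [
        (doc, score)
        for doc, score in docs_and_scores
        if filter is None or filter(doc)
    ]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:k]


scored = [
    (Doc("foo", {"baz": "bar"}), 0.83),
    (Doc("thud", {"bar": "baz"}), 0.61),
]
only_bar = filtered_top_k(
    scored, k=1, filter=lambda d: d.metadata.get("bar") == "baz"
)
```

Filtering before truncation means a restrictive predicate can still return up to k results, as long as enough documents pass it.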

upsert(items: Sequence[Document], /, **kwargs: Any) UpsertResponse[source]#

Deprecated since version langchain-core==0.2.29: This was a beta API that was added in 0.2.11. It’ll be removed in 0.3.0. Use VectorStore.add_documents instead.

Parameters:
  • items (Sequence[Document]) –

  • kwargs (Any) –

Return type:

UpsertResponse
