Skip to main content


Recall, understand, and extract data from chat histories. Power personalized AI experiences.

Zep is a long-term memory service for AI Assistant apps. With Zep, you can provide AI assistants with the ability to recall past conversations, no matter how distant, while also reducing hallucinations, latency, and cost.

How Zep works​

Zep persists and recalls chat histories, and automatically generates summaries and other artifacts from these chat histories. It also embeds messages and summaries, enabling you to search Zep for relevant context from past conversations. Zep does all of this asynchronously, ensuring these operations don't impact your user's chat experience. Data is persisted to database, allowing you to scale out when growth demands.

Zep also provides a simple, easy to use abstraction for document vector search called Document Collections. This is designed to complement Zep's core memory features, but is not designed to be a general purpose vector database.

Zep allows you to be more intentional about constructing your prompt:

  • automatically adding a few recent messages, with the number customized for your app;
  • a summary of recent conversations prior to the messages above;
  • and/or contextually relevant summaries or messages surfaced from the entire chat session.
  • and/or relevant Business data from Zep Document Collections.

What is Zep Cloud?​

Zep Cloud is a managed service with Zep Open Source at its core. In addition to Zep Open Source's memory management features, Zep Cloud offers:

  • Fact Extraction: Automatically build fact tables from conversations, without having to define a data schema upfront.
  • Dialog Classification: Instantly and accurately classify chat dialog. Understand user intent and emotion, segment users, and more. Route chains based on semantic context, and trigger events.
  • Structured Data Extraction: Quickly extract business data from chat conversations using a schema you define. Understand what your Assistant should ask for next in order to complete its task.

Interested in Zep Cloud? See Zep Cloud Installation Guide, Zep Cloud Message History Example, Zep Cloud Vector Store Example

Open Source Installation and Setup​

Zep Open Source project:

Zep Open Source Docs:

  1. Install the Zep service. See the Zep Quick Start Guide.

  2. Install the Zep Python SDK:

pip install zep_python


Zep's Memory API persists your app's chat history and metadata to a Session, enriches the memory, automatically generates summaries, and enables vector similarity search over historical chat messages and summaries.

There are two approaches to populating your prompt with chat history:

  1. Retrieve the most recent N messages (and potentionally a summary) from a Session and use them to construct your prompt.
  2. Search over the Session's chat history for messages that are relevant and use them to construct your prompt.

Both of these approaches may be useful, with the first providing the LLM with context as to the most recent interactions with a human. The second approach enables you to look back further in the chat history and retrieve messages that are relevant to the current conversation in a token-efficient manner.

from langchain.memory import ZepMemory

API Reference:

See a RAG App Example here.


Zep's Memory Retriever is a LangChain Retriever that enables you to retrieve messages from a Zep Session and use them to construct your prompt.

The Retriever supports searching over both individual messages and summaries of conversations. The latter is useful for providing rich, but succinct context to the LLM as to relevant past conversations.

Zep's Memory Retriever supports both similarity search and Maximum Marginal Relevance (MMR) reranking. MMR search is useful for ensuring that the retrieved messages are diverse and not too similar to each other

See a usage example.

from langchain_community.retrievers import ZepRetriever

API Reference:

Vector store​

Zep's Document VectorStore API enables you to store and retrieve documents using vector similarity search. Zep doesn't require you to understand distance functions, types of embeddings, or indexing best practices. You just pass in your chunked documents, and Zep handles the rest.

Zep supports both similarity search and Maximum Marginal Relevance (MMR) reranking. MMR search is useful for ensuring that the retrieved documents are diverse and not too similar to each other.

from langchain_community.vectorstores import ZepVectorStore

API Reference:

See a usage example.

Help us out by providing feedback on this documentation page: