Skip to main content


Recall, understand, and extract data from chat histories. Power personalized AI experiences.

Zep is a long-term memory service for AI Assistant apps. With Zep, you can provide AI assistants with the ability to recall past conversations, no matter how distant, while also reducing hallucinations, latency, and cost.

How Zep works​

Zep persists and recalls chat histories, and automatically generates summaries and other artifacts from these chat histories. It also embeds messages and summaries, enabling you to search Zep for relevant context from past conversations. Zep does all of this asynchronously, ensuring these operations don't impact your user's chat experience. Data is persisted to database, allowing you to scale out when growth demands.

Zep also provides a simple, easy to use abstraction for document vector search called Document Collections. This is designed to complement Zep's core memory features, but is not designed to be a general purpose vector database.

Zep allows you to be more intentional about constructing your prompt:

  • automatically adding a few recent messages, with the number customized for your app;
  • a summary of recent conversations prior to the messages above;
  • and/or contextually relevant summaries or messages surfaced from the entire chat session.
  • and/or relevant Business data from Zep Document Collections.

What is Zep Cloud?​

Zep Cloud is a managed service with Zep Open Source at its core. In addition to Zep Open Source's memory management features, Zep Cloud offers:

  • Fact Extraction: Automatically build fact tables from conversations, without having to define a data schema upfront.
  • Dialog Classification: Instantly and accurately classify chat dialog. Understand user intent and emotion, segment users, and more. Route chains based on semantic context, and trigger events.
  • Structured Data Extraction: Quickly extract business data from chat conversations using a schema you define. Understand what your Assistant should ask for next in order to complete its task.

Zep Open Source​

Zep offers an open source version with a self-hosted option. Please refer to the Zep Open Source repo for more information. You can also find Zep Open Source compatible Retriever, Vector Store and Memory examples

Zep Cloud Installation and Setup​

Zep Cloud Docs

  1. Install the Zep Cloud SDK:
pip install zep_cloud


poetry add zep_cloud


Zep's Memory API persists your users' chat history and metadata to a Session, enriches the memory, and enables vector similarity search over historical chat messages and dialog summaries.

Zep offers several approaches to populating prompts with context from historical conversations.

Perpetual Memory​

This is the default memory type. Salient facts from the dialog are extracted and stored in a Fact Table. This is updated in real-time as new messages are added to the Session. Every time you call the Memory API to get a Memory, Zep returns the Fact Table, the most recent messages (per your Message Window setting), and a summary of the most recent messages prior to the Message Window. The combination of the Fact Table, summary, and the most recent messages in a prompts provides both factual context and nuance to the LLM.

Summary Retriever Memory​

Returns the most recent messages and a summary of past messages relevant to the current conversation, enabling you to provide your Assistant with helpful context from past conversations

Message Window Buffer Memory​

Returns the most recent N messages from the current conversation.

Additionally, Zep enables vector similarity searches for Messages or Summaries stored within its system.

This feature lets you populate prompts with past conversations that are contextually similar to a specific query, organizing the results by a similarity Score.

ZepCloudChatMessageHistory and ZepCloudMemory classes can be imported to interact with Zep Cloud APIs.

ZepCloudChatMessageHistory is compatible with RunnableWithMessageHistory.

from langchain_community.chat_message_histories import ZepCloudChatMessageHistory

See a Perpetual Memory Example here.

You can use ZepCloudMemory together with agents that support Memory.

from langchain.memory import ZepCloudMemory

See a Memory RAG Example here.


Zep's Memory Retriever is a LangChain Retriever that enables you to retrieve messages from a Zep Session and use them to construct your prompt.

The Retriever supports searching over both individual messages and summaries of conversations. The latter is useful for providing rich, but succinct context to the LLM as to relevant past conversations.

Zep's Memory Retriever supports both similarity search and Maximum Marginal Relevance (MMR) reranking. MMR search is useful for ensuring that the retrieved messages are diverse and not too similar to each other

See a usage example.

from langchain_community.retrievers import ZepCloudRetriever
API Reference:ZepCloudRetriever

Vector store​

Zep's Document VectorStore API enables you to store and retrieve documents using vector similarity search. Zep doesn't require you to understand distance functions, types of embeddings, or indexing best practices. You just pass in your chunked documents, and Zep handles the rest.

Zep supports both similarity search and Maximum Marginal Relevance (MMR) reranking. MMR search is useful for ensuring that the retrieved documents are diverse and not too similar to each other.

from langchain_community.vectorstores import ZepCloudVectorStore
API Reference:ZepCloudVectorStore

See a usage example.

Was this page helpful?

You can also leave detailed feedback on GitHub.