Momento Cache is the world’s first truly serverless caching service. It provides instant elasticity, scale-to-zero capability, and blazing-fast performance.
With Momento Cache, you grab the SDK, you get an end point, input a few lines into your code, and you’re off and running.

This page covers how to use the Momento ecosystem within LangChain.

Installation and Setup#

  • Sign up for a free account here and get an auth token

  • Install the Momento Python SDK with pip install momento


The Cache wrapper allows for Momento to be used as a serverless, distributed, low-latency cache for LLM prompts and responses.

The standard cache is the go-to use case for Momento users in any environment.

Import the cache as follows:

from langchain.cache import MomentoCache

And set up like so:

from datetime import timedelta
from momento import CacheClient, Configurations, CredentialProvider
import langchain

# Instantiate the Momento client
cache_client = CacheClient(

# Choose a Momento cache name of your choice
cache_name = "langchain"

# Instantiate the LLM cache
langchain.llm_cache = MomentoCache(cache_client, cache_name)


Momento can be used as a distributed memory store for LLMs.

Chat Message History Memory#

See this notebook for a walkthrough of how to use Momento as a memory store for chat message history.