This guide provides an introduction to Graph RAG. For detailed documentation of all
supported features and configurations, refer to the
Graph RAG Project Page.
The GraphRetriever from the langchain-graph-retriever package provides a LangChain
retriever that combines unstructured similarity search
on vectors with structured traversal of metadata properties. This enables graph-based
retrieval over an existing vector store.
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
This section shows how to populate a variety of vector stores with the sample data.
For help on choosing one of the vector stores below, or to add support for your
vector store, consult the documentation about
Adapters and Supported Stores.
AstraDB
Apache Cassandra
OpenSearch
Chroma
InMemory
For AstraDB, install the langchain-graph-retriever package with the astra extra:
pip install "langchain-graph-retriever[astra]"
Then create a vector store and load the test documents:
Apache Cassandra doesn't support searching in nested metadata, so documents
must be passed through the ShreddingTransformer when they are inserted.
For OpenSearch, install the langchain-graph-retriever package with the opensearch extra:
pip install "langchain-graph-retriever[opensearch]"
Chroma doesn't support searching in nested metadata either, so documents
must be passed through the ShreddingTransformer when they are inserted.
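To make the idea of shredding concrete, here is a minimal pure-Python sketch. It is not the ShreddingTransformer's implementation or its key format: the function name `shred_metadata`, the `.` separator, the boolean marker values, and the sample metadata are all assumptions made for illustration. It only shows the general approach of flattening nested values into top-level keys so that a store limited to flat metadata lookups can still match on individual values.

```python
from typing import Any


def shred_metadata(metadata: dict[str, Any], sep: str = ".") -> dict[str, Any]:
    """Flatten nested metadata into top-level keys (illustrative only)."""
    flat: dict[str, Any] = {}
    for key, value in metadata.items():
        if isinstance(value, dict):
            # Nested mapping: recurse, then prefix child keys with the parent key.
            for child_key, child_value in shred_metadata(value, sep).items():
                flat[f"{key}{sep}{child_key}"] = child_value
        elif isinstance(value, (list, tuple, set)):
            # Collection: emit one marker key per item so each item is searchable.
            for item in value:
                flat[f"{key}{sep}{item}"] = True
        else:
            # Scalar values are already flat and are copied unchanged.
            flat[key] = value
    return flat


# Hypothetical metadata, not taken from the sample dataset.
nested = {
    "habitat": "wetlands",
    "keywords": ["rodent", "social"],
    "origin": {"continent": "south america"},
}
shredded = shred_metadata(nested)
print(shredded)
```

After shredding, a lookup such as "documents with the keyword rodent" becomes a plain equality check on the flat key `keywords.rodent`.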
For the in-memory store, install the langchain-graph-retriever package:
pip install "langchain-graph-retriever"
Then create a vector store and load the test documents:
from langchain_core.vectorstores import InMemoryVectorStore

vector_store = InMemoryVectorStore.from_documents(
    documents=animals,
    embedding=embeddings,
)
Tip: Using the InMemoryVectorStore is the fastest way to get started with Graph RAG,
but it isn't recommended for production use. Use AstraDB or OpenSearch instead.
This graph retriever starts with a single animal that best matches the query, then
traverses to other animals sharing the same habitat and/or origin.
from graph_retriever.strategies import Eager
from langchain_graph_retriever import GraphRetriever

traversal_retriever = GraphRetriever(
    store=vector_store,
    edges=[("habitat", "habitat"), ("origin", "origin")],
    strategy=Eager(k=5, start_k=1, max_depth=2),
)
The above creates a graph traversing retriever that starts with the nearest
animal (start_k=1), retrieves 5 documents (k=5) and limits the search to documents
that are at most 2 steps away from the first animal (max_depth=2).
The edges define how metadata values can be used for traversal. In this case, every
animal is connected to other animals with the same habitat and/or origin.
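The interaction between the edge definitions, k, and max_depth can be sketched in plain Python. This is a simplified toy, not the library's traversal algorithm or the Eager strategy's implementation: the `DOCS` data, the `neighbors` and `traverse` helpers, and the breadth-first ordering are assumptions for illustration, and the similarity search that picks the start node is replaced by a hard-coded starting id.

```python
from collections import deque

# Hypothetical documents (id -> metadata). Values are made up for illustration.
DOCS = {
    "capybara": {"habitat": "wetlands", "origin": "south america"},
    "heron": {"habitat": "wetlands", "origin": "north america"},
    "frog": {"habitat": "wetlands", "origin": "europe"},
    "jaguar": {"habitat": "rainforest", "origin": "south america"},
    "toucan": {"habitat": "rainforest", "origin": "central america"},
    "camel": {"habitat": "desert", "origin": "africa"},
}

# Each pair is (field on the source document, field on the target document).
EDGES = [("habitat", "habitat"), ("origin", "origin")]


def neighbors(doc_id, docs, edges):
    """Return ids of documents linked to doc_id through any edge definition."""
    source = docs[doc_id]
    linked = []
    for other_id, other in docs.items():
        if other_id == doc_id:
            continue
        if any(
            src in source and source[src] == other.get(dst)
            for src, dst in edges
        ):
            linked.append(other_id)
    return linked


def traverse(start_ids, docs, edges, k, max_depth):
    """Breadth-first traversal from start_ids, capped at k results."""
    visited = list(start_ids)
    queue = deque((doc_id, 0) for doc_id in start_ids)
    while queue and len(visited) < k:
        doc_id, depth = queue.popleft()
        if depth >= max_depth:
            # Nodes at the depth limit are kept but not expanded further.
            continue
        for other_id in neighbors(doc_id, docs, edges):
            if other_id not in visited and len(visited) < k:
                visited.append(other_id)
                queue.append((other_id, depth + 1))
    return visited[:k]


# Assume similarity search already chose "capybara" as the single start node.
result = traverse(["capybara"], DOCS, EDGES, k=5, max_depth=2)
print(result)
```

In this toy data, the toucan shares neither habitat nor origin with the capybara and is only reachable through the jaguar, so it appears with max_depth=2 but not with max_depth=1. The camel shares no metadata value with any other document and is never reached.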
results = traversal_retriever.invoke("what animals could be found near a capybara?")

for doc in results:
    print(f"{doc.id}: {doc.page_content}")
capybara: capybaras are the largest rodents in the world and are highly social animals.
heron: herons are wading birds known for their long legs and necks, often seen near water.
crocodile: crocodiles are large reptiles with powerful jaws and a long lifespan, often living over 70 years.
frog: frogs are amphibians known for their jumping ability and croaking sounds.
duck: ducks are waterfowl birds known for their webbed feet and quacking sounds.
Graph traversal improves retrieval quality by leveraging structured relationships in
the data. Unlike standard similarity search (see below), it provides a clear,
explainable rationale for why documents are selected.
In this case, the documents capybara, heron, crocodile, frog, and duck all
share the same habitat=wetlands, as defined by their metadata. This should increase
Document Relevance and the quality of the answer from the LLM.
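For comparison, a standard retriever can be built from the same GraphRetriever
by using the strategy Eager(k=5, start_k=5, max_depth=0) and assigning the result
to standard_retriever. This creates a retriever that starts with the nearest
5 animals (start_k=5) and returns them without any traversal (max_depth=0).
The edge definitions are ignored in this case.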
results = standard_retriever.invoke("what animals could be found near a capybara?")

for doc in results:
    print(f"{doc.id}: {doc.page_content}")
capybara: capybaras are the largest rodents in the world and are highly social animals.
iguana: iguanas are large herbivorous lizards often found basking in trees and near water.
guinea pig: guinea pigs are small rodents often kept as pets due to their gentle and social nature.
hippopotamus: hippopotamuses are large semi-aquatic mammals known for their massive size and territorial behavior.
boar: boars are wild relatives of pigs, known for their tough hides and tusks.
These documents are selected based on similarity alone. Any structural data that
exists in the store is ignored. Compared to graph retrieval, this can decrease
Document Relevance because the returned results are less likely to help answer
the query.
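Similarity-only selection can be illustrated with a small cosine-similarity ranking. This is a sketch under stated assumptions, not how any particular vector store ranks results: the 3-dimensional vectors, the `cosine_similarity` and `top_k` helpers, and the query vector are made up, whereas real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Made-up vectors standing in for real embeddings.
DOC_VECS = {
    "capybara": [0.9, 0.1, 0.0],
    "guinea pig": [0.8, 0.2, 0.1],
    "heron": [0.1, 0.9, 0.2],
    "camel": [0.0, 0.1, 0.9],
}

query = [1.0, 0.0, 0.0]
nearest = top_k(query, DOC_VECS, k=2)
print(nearest)
```

The ranking never consults metadata such as habitat or origin, so a document that is textually close to the query outranks one that is structurally related to the best match.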