CrateDB
CrateDB is a distributed and scalable SQL database for storing and analyzing massive amounts of data in near real-time, even with complex queries. It is PostgreSQL-compatible, based on Lucene, and inherits from Elasticsearch.
Installation and Setup
Setup CrateDB
There are two ways to get started with CrateDB quickly. Alternatively, choose other CrateDB installation options.
Start CrateDB on your local machine
Example: Run a single-node CrateDB instance with security disabled, using Docker or Podman. This is not recommended for production use.
docker run --name=cratedb --rm \
  --publish=4200:4200 --publish=5432:5432 --env=CRATE_HEAP_SIZE=2g \
  crate:latest -Cdiscovery.type=single-node
Deploy cluster on CrateDB Cloud
CrateDB Cloud is a managed CrateDB service. Sign up for a free trial.
Install Client
Install the most recent version of the langchain-cratedb package and a few others that are needed for this tutorial.
pip install --upgrade langchain-cratedb langchain-openai unstructured
Documentation
For a more detailed walkthrough of the CrateDB wrapper, see using LangChain with CrateDB. See also all features of CrateDB to learn about other functionality provided by CrateDB.
Features
The CrateDB adapter for LangChain provides APIs to use CrateDB as a vector store, a document loader, and storage for chat messages.
Vector Store
Use the CrateDB vector store functionality, built around the FLOAT_VECTOR data type and the KNN_MATCH function, for similarity search and other purposes. See also the CrateDBVectorStore tutorial.
Make sure you've configured a valid OpenAI API key.
export OPENAI_API_KEY=sk-XJZ...
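A missing key otherwise only surfaces once the embedding client is created or first used, so it can help to check for it up front. The following is a minimal sketch using only the standard library; `require_env` is a hypothetical helper name and not part of LangChain or langchain-cratedb.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Typical use, before creating the embedding client:
#   require_env("OPENAI_API_KEY")
```

Calling the helper at the top of a script makes a configuration problem fail fast, before any documents are downloaded or split.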
from langchain_community.document_loaders import UnstructuredURLLoader
from langchain_cratedb import CrateDBVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter

# Load the example document and split it into chunks of about 1000 characters.
loader = UnstructuredURLLoader(urls=["https://github.com/langchain-ai/langchain/raw/refs/tags/langchain-core==0.3.28/docs/docs/how_to/state_of_the_union.txt"])
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# The embedding client reads OPENAI_API_KEY from the environment.
embeddings = OpenAIEmbeddings()

# Connect to a self-managed CrateDB instance on localhost.
CONNECTION_STRING = "crate://?schema=testdrive"

# Embed the chunks and store them in the "state_of_the_union" collection.
store = CrateDBVectorStore.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="state_of_the_union",
    connection=CONNECTION_STRING,
)

# Run a similarity search; each result is a (document, score) pair.
query = "What did the president say about Ketanji Brown Jackson"
docs_with_score = store.similarity_search_with_score(query)
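In LangChain vector stores, `similarity_search_with_score` returns a list of `(Document, score)` tuples. A minimal sketch of rendering such a list is shown here; `format_results` is a hypothetical helper, and the sample data is a made-up stand-in so the snippet runs without a database. How to read the score (whether lower or higher means more similar) depends on the store's distance strategy, so check the CrateDBVectorStore documentation before ranking on it.

```python
from types import SimpleNamespace


def format_results(results, width=80):
    """Render (document, score) pairs as one line of text per hit."""
    lines = []
    for doc, score in results:
        # Collapse newlines and repeated whitespace, then truncate the snippet.
        snippet = " ".join(doc.page_content.split())[:width]
        lines.append(f"{score:.3f}  {snippet}")
    return lines


# Stand-in objects with a `page_content` attribute, shaped like LangChain
# documents. Replace `sample` with `docs_with_score` from the example above.
sample = [
    (SimpleNamespace(page_content="First placeholder chunk\nof text."), 0.25),
    (SimpleNamespace(page_content="Second placeholder chunk."), 0.5),
]

for line in format_results(sample):
    print(line)
```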
Document Loader
Load documents from a CrateDB database table using the document loader CrateDBLoader, which is based on SQLAlchemy. See also the CrateDBLoader tutorial.
To use the document loader in your applications:
import sqlalchemy as sa
from langchain_community.utilities import SQLDatabase
from langchain_cratedb import CrateDBLoader
# Connect to a self-managed CrateDB instance on localhost.
CONNECTION_STRING = "crate://?schema=testdrive"
db = SQLDatabase(engine=sa.create_engine(CONNECTION_STRING))
loader = CrateDBLoader(
    'SELECT * FROM sys.summits LIMIT 42',
    db=db,
)
documents = loader.load()
Chat Message History
Use CrateDB as the storage for your chat messages. See also CrateDBChatMessageHistory Tutorial.
To use the chat message history in your applications:
from langchain_cratedb import CrateDBChatMessageHistory
# Connect to a self-managed CrateDB instance on localhost.
CONNECTION_STRING = "crate://?schema=testdrive"
message_history = CrateDBChatMessageHistory(
    session_id="test-session",
    connection=CONNECTION_STRING,
)
message_history.add_user_message("hi!")
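The examples on this page all connect to a self-managed instance on localhost. Pointing them at a remote cluster, such as one on CrateDB Cloud, means putting the host, port, and credentials into the connection string. The following is a minimal sketch, assuming the `crate://user:password@host:port/?schema=...&ssl=true` URL layout used by the CrateDB SQLAlchemy dialect; `build_connection_string`, the host name, and the credentials are hypothetical placeholders and not part of langchain-cratedb.

```python
from urllib.parse import quote, urlencode


def build_connection_string(
    host: str = "",
    port: int = 4200,
    username: str = "",
    password: str = "",
    schema: str = "testdrive",
    ssl: bool = False,
) -> str:
    """Assemble a `crate://` URL; with no host it targets localhost defaults."""
    if username and not host:
        raise ValueError("credentials require an explicit host")

    credentials = ""
    if username:
        # Percent-encode so characters like "@" or ":" cannot break the URL.
        credentials = quote(username, safe="")
        if password:
            credentials += ":" + quote(password, safe="")
        credentials += "@"

    location = f"{host}:{port}/" if host else ""
    params = {"schema": schema}
    if ssl:
        params["ssl"] = "true"
    return f"crate://{credentials}{location}?{urlencode(params)}"


# Placeholder values; substitute the details of your own cluster.
CONNECTION_STRING = build_connection_string(
    host="example.invalid",
    username="admin",
    password="p@ss",
    ssl=True,
)
print(CONNECTION_STRING)
```

Called without arguments, the helper reproduces the localhost connection string used in the examples above, so the same code path serves both local and remote setups.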