Skip to main content

SingleStoreDB

SingleStoreDB is a high-performance distributed SQL database that supports deployment both in the cloud and on-premises. It provides vector storage, and vector functions including dot_product and euclidean_distance, thereby supporting AI applications that require text similarity matching.

This tutorial illustrates how to work with vector data in SingleStoreDB.

# Establishing a connection to the database is facilitated through the singlestoredb Python connector.
# Please ensure that this connector is installed in your working environment.
%pip install --upgrade --quiet singlestoredb
import getpass
import os

# We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import SingleStoreDB
from langchain_openai import OpenAIEmbeddings
# Load text samples
loader = TextLoader("../../modules/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()

There are several ways to establish a connection to the database. You can either set up environment variables or pass named parameters to the SingleStoreDB constructor. Alternatively, you may provide these parameters to the from_documents and from_texts methods.

# Setup connection url as environment variable
os.environ["SINGLESTOREDB_URL"] = "root:pass@localhost:3306/db"

# Load documents to the store
docsearch = SingleStoreDB.from_documents(
docs,
embeddings,
table_name="notebook", # use table with a custom name
)
query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query) # Find documents that correspond to the query
print(docs[0].page_content)

Enhance your search efficiency with SingleStore DB version 8.5 or above by leveraging ANN vector indexes. By setting use_vector_index=True during vector store object creation, you can activate this feature. Additionally, if your vectors differ in dimensionality from the default OpenAI embedding size of 1536, ensure to specify the vector_size parameter accordingly.

Multi-modal Example: Leveraging CLIP and OpenClip Embeddings​

In the realm of multi-modal data analysis, the integration of diverse information types like images and text has become increasingly crucial. One powerful tool facilitating such integration is CLIP, a cutting-edge model capable of embedding both images and text into a shared semantic space. By doing so, CLIP enables the retrieval of relevant content across different modalities through similarity search.

To illustrate, let’s consider an application scenario where we aim to effectively analyze multi-modal data. In this example, we harness the capabilities of OpenClip multimodal embeddings, which leverage CLIP’s framework. With OpenClip, we can seamlessly embed textual descriptions alongside corresponding images, enabling comprehensive analysis and retrieval tasks. Whether it’s identifying visually similar images based on textual queries or finding relevant text passages associated with specific visual content, OpenClip empowers users to explore and extract insights from multi-modal data with remarkable efficiency and accuracy.

%pip install -U langchain openai singlestoredb langchain-experimental # (newest versions required for multi-modal)
import os

from langchain_community.vectorstores import SingleStoreDB
from langchain_experimental.open_clip import OpenCLIPEmbeddings

os.environ["SINGLESTOREDB_URL"] = "root:pass@localhost:3306/db"

TEST_IMAGES_DIR = "../../modules/images"

docsearch = SingleStoreDB(OpenCLIPEmbeddings())

image_uris = sorted(
[
os.path.join(TEST_IMAGES_DIR, image_name)
for image_name in os.listdir(TEST_IMAGES_DIR)
if image_name.endswith(".jpg")
]
)

# Add images
docsearch.add_images(uris=image_uris)