Let's load the llamafile Embeddings class.


First, there are 3 setup steps:

  1. Download a llamafile. In this notebook, we use TinyLlama-1.1B-Chat-v1.0.Q5_K_M but there are many others available on HuggingFace.
  2. Make the llamafile executable.
  3. Start the llamafile in server mode.

You can run the following bash script to do all this:

# llamafile setup

# Step 1: Download a llamafile. The download may take several minutes.
wget -nv -nc

# Step 2: Make the llamafile executable. Note: if you're on Windows, just append '.exe' to the filename.
chmod +x TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile

# Step 3: Start llamafile server in background. All the server logs will be written to 'tinyllama.log'.
# Alternatively, you can just open a separate terminal outside this notebook and run:
# ./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding
./TinyLlama-1.1B-Chat-v1.0.Q5_K_M.llamafile --server --nobrowser --embedding > tinyllama.log 2>&1 &
pid=$! # capture the PID of the background server process started on the previous line
echo "${pid}" > .llamafile_pid # write the process pid to a file so we can terminate the server later

Embedding texts using LlamafileEmbeddings

Now, we can use the LlamafileEmbeddings class to interact with the llamafile server that's currently serving our TinyLlama model at http://localhost:8080.
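Because the server was started in the background, it may still be loading the model when the next cell runs. The following is a minimal sketch of a readiness check, assuming the default host and port mentioned above. The helper name `wait_for_server` and its parameters are hypothetical and not part of llamafile or LangChain. It only confirms that something accepts TCP connections on the port, not that the model has finished loading.

```python
import socket
import time


def wait_for_server(host="localhost", port=8080, timeout=60.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper: host, port, timeout, and interval are illustrative defaults.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

One way to use it is to call `wait_for_server()` before creating the embedder and to inspect `tinyllama.log` if it returns `False`.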

from langchain_community.embeddings import LlamafileEmbeddings
embedder = LlamafileEmbeddings()
text = "This is a test document."

To generate embeddings, you can either query an individual text or query a list of texts.

query_result = embedder.embed_query(text)
doc_result = embedder.embed_documents([text])
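`embed_query` returns a single vector (a list of floats), while `embed_documents` returns one vector per input text. A common next step is to compare two embeddings. The sketch below uses cosine similarity in plain Python; the helper `cosine_similarity` is an illustrative name introduced here, not part of LangChain or llamafile.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors of floats."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

For instance, `cosine_similarity(query_result, doc_result[0])` compares the two embeddings of the same test sentence produced above. The value would be expected to be close to 1.0 if the server embeds the text identically on both calls.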
# cleanup: kill the llamafile server process
kill $(cat .llamafile_pid)
rm .llamafile_pid

