Xorbits inference (Xinference)

This notebook shows how to use Xinference embeddings within LangChain.


Install Xinference through PyPI:

%pip install --upgrade --quiet  "xinference[all]"

Deploy Xinference Locally or in a Distributed Cluster

For local deployment, run xinference.

To deploy Xinference in a cluster, first start an Xinference supervisor using xinference-supervisor. Use the -p option to specify the port and -H to specify the host. The default port is 9997.

Then, start the Xinference workers using xinference-worker on each server you want to run them on.

You can consult the README file from Xinference for more information.
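Before launching a model, it can help to confirm that something is listening on the supervisor's port. The following is a minimal sketch using only the Python standard library. The helper name is_port_open is hypothetical and is not part of Xinference. A successful TCP connection only shows that a process accepts connections on that port, not that it is an Xinference endpoint.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper, not part of Xinference. It only checks that
    some process accepts connections on the given port.
    """
    try:
        # create_connection raises OSError if the connection is refused
        # or times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, is_port_open("127.0.0.1", 9997) checks the default port on the local machine, assuming the supervisor was started there without the -p option.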


To use Xinference with LangChain, you first need to launch a model. You can do so from the command-line interface (CLI):

!xinference launch -n vicuna-v1.3 -f ggmlv3 -q q4_0
Model uid: 915845ee-2a04-11ee-8ed4-d29396a3f064
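If you launch the model from a script rather than a notebook cell, you may want to capture the UID instead of copying it by hand. The sketch below assumes the CLI prints a line of the form "Model uid: <uid>", as in the output above. The exact format may differ between Xinference versions, and extract_model_uid is a hypothetical helper, not part of the library.

```python
import re
from typing import Optional

# Assumed output format, taken from the example output above:
#   Model uid: 915845ee-2a04-11ee-8ed4-d29396a3f064
_UID_PATTERN = re.compile(r"Model uid:\s*(\S+)")


def extract_model_uid(cli_output: str) -> Optional[str]:
    """Return the model UID found in the CLI output, or None if absent."""
    match = _UID_PATTERN.search(cli_output)
    return match.group(1) if match else None
```

Passing the captured standard output of the launch command to extract_model_uid returns the UID as a string, or None when no matching line is found.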

A model UID is returned for you to use. Now you can use Xinference embeddings with LangChain:

from langchain_community.embeddings import XinferenceEmbeddings

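xinference = XinferenceEmbeddings(
    # server_url is empty here; set it to the address of your Xinference endpoint (the default port is 9997).
    server_url="", model_uid="915845ee-2a04-11ee-8ed4-d29396a3f064"
)
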
query_result = xinference.embed_query("This is a test query")
doc_result = xinference.embed_documents(["text A", "text B"])
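A common next step is to compare the query embedding with the document embeddings. The sketch below is plain Python with no dependency on Xinference or LangChain. It assumes embed_query returns one list of floats and embed_documents returns a list of such lists. The helper names cosine_similarity and rank_documents are hypothetical.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[int]:
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

With the results above, rank_documents(query_result, doc_result) would return the indices of "text A" and "text B" ordered by similarity to the query.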

Finally, terminate the model when you no longer need it:

!xinference terminate --model-uid "915845ee-2a04-11ee-8ed4-d29396a3f064"
