DeepInfra

DeepInfra is a serverless inference-as-a-service platform that provides access to a variety of LLMs and embedding models. This notebook goes over how to use LangChain with DeepInfra for chat models.

Set the Environment API Key

Make sure to get your API key from DeepInfra. You have to log in and get a new token.

You are given 1 hour of free serverless GPU compute to test different models. You can print your token with deepctl auth token.

# get a new token: https://deepinfra.com/login?from=%2Fdash

from getpass import getpass

DEEPINFRA_API_TOKEN = getpass()
 ········
import os

# or pass deepinfra_api_token parameter to the ChatDeepInfra constructor
os.environ["DEEPINFRA_API_TOKEN"] = DEEPINFRA_API_TOKEN
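
When the same code runs both interactively and in scripts, it can help to prompt only when the token is not already set. The following is a minimal sketch under that assumption; `resolve_token` is a hypothetical helper, not part of LangChain or DeepInfra, and only the `DEEPINFRA_API_TOKEN` variable name comes from the text above.

```python
import os
from getpass import getpass


def resolve_token(env=None, prompt=getpass):
    """Return the DeepInfra token from the environment, prompting only if it is missing.

    Hypothetical helper for illustration; `env` and `prompt` are injectable
    so the function can be exercised without a terminal.
    """
    env = os.environ if env is None else env
    token = env.get("DEEPINFRA_API_TOKEN", "").strip()
    if not token:
        # Fall back to an interactive prompt (input is not echoed by getpass)
        token = prompt("DeepInfra API token: ").strip()
        if not token:
            raise ValueError("No DeepInfra API token provided")
        # Store it so later code that reads the environment can find it
        env["DEEPINFRA_API_TOKEN"] = token
    return token
```

Calling `resolve_token()` with no arguments reads and, if needed, updates `os.environ`, so a ChatDeepInfra instance created afterwards can pick the token up from the environment as described above.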
from langchain_community.chat_models import ChatDeepInfra
from langchain_core.messages import HumanMessage
chat = ChatDeepInfra(model="meta-llama/Llama-2-7b-chat-hf")
messages = [
    HumanMessage(
        content="Translate this sentence from English to French. I love programming."
    )
]
# invoke() is the current entry point; calling the model object directly is deprecated
chat.invoke(messages)
AIMessage(content=" J'aime la programmation.", additional_kwargs={}, example=False)

ChatDeepInfra also supports async and streaming functionality:

from langchain_core.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
await chat.agenerate([messages])
LLMResult(generations=[[ChatGeneration(text=" J'aime programmer.", generation_info=None, message=AIMessage(content=" J'aime programmer.", additional_kwargs={}, example=False))]], llm_output={}, run=[RunInfo(run_id=UUID('8cc8fb68-1c35-439c-96a0-695036a93652'))])
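
A bare `await` like the one above works in a notebook cell, but a plain Python script needs an event loop. The sketch below shows the usual pattern with `asyncio.run` and `asyncio.gather`. To stay runnable without network access it uses `fake_agenerate`, a hypothetical stand-in that only echoes its input; in real code the awaited call would be the chat model's async method instead.

```python
import asyncio


async def fake_agenerate(prompt):
    """Hypothetical stand-in for an async chat call; it only echoes the prompt."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"echo: {prompt}"


async def run_all(prompts):
    # gather runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(fake_agenerate(p) for p in prompts))


# asyncio.run creates the event loop that a notebook would otherwise provide
results = asyncio.run(run_all(["first", "second"]))
print(results)
```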
chat = ChatDeepInfra(
    streaming=True,
    verbose=True,
    # prints each token to stdout as it arrives
    callbacks=[StreamingStdOutCallbackHandler()],
)
chat.invoke(messages)
 J'aime la programmation.
AIMessage(content=" J'aime la programmation.", additional_kwargs={}, example=False)
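
The output above shows the translated text twice: once printed token by token by the callback, and once inside the returned AIMessage. The following sketch illustrates that mechanism in plain Python. `TokenPrinter`, `fake_token_stream`, and `run_stream` are hypothetical names, and the token boundaries are invented for illustration; only the `on_llm_new_token` hook name mirrors the LangChain callback interface.

```python
import sys


class TokenPrinter:
    """Hypothetical stand-in that mimics the token hook of a streaming callback handler."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stdout
        self.tokens = []

    def on_llm_new_token(self, token, **kwargs):
        # Record the token, then write it out immediately without a newline
        self.tokens.append(token)
        self.stream.write(token)
        self.stream.flush()


def fake_token_stream():
    # Invented token boundaries; a real model decides its own
    yield from [" J'", "aime", " la", " programmation", "."]


def run_stream(tokens, handler):
    """Feed tokens to the handler one by one and return the assembled text."""
    for token in tokens:
        handler.on_llm_new_token(token)
    return "".join(handler.tokens)
```

Passing an `io.StringIO()` as `stream` captures the printed tokens instead of writing them to the terminal, which makes the behavior easy to check.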