DeepInfra

DeepInfra is a serverless inference service that provides access to a variety of LLMs and embedding models. This notebook shows how to use LangChain with DeepInfra chat models.

Set the Environment API Key

Get your API key from DeepInfra: log in and create a new token.

You are given 1 hour of free serverless GPU compute to test different models. You can print your token with `deepctl auth token`.

# get a new token: https://deepinfra.com/login?from=%2Fdash

import os
from getpass import getpass

from langchain_community.chat_models import ChatDeepInfra
from langchain_core.messages import HumanMessage

# prompt for the token without echoing it to the screen
DEEPINFRA_API_TOKEN = getpass()

# or pass deepinfra_api_token parameter to the ChatDeepInfra constructor
os.environ["DEEPINFRA_API_TOKEN"] = DEEPINFRA_API_TOKEN
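When the notebook is rerun often, prompting for the token every time gets tedious. The following is a minimal sketch, assuming you want to reuse a token that is already in the environment and prompt only as a fallback. `load_deepinfra_token` is a hypothetical helper name, not part of LangChain or DeepInfra.

```python
import os
from getpass import getpass

ENV_VAR = "DEEPINFRA_API_TOKEN"  # the environment variable set above


def load_deepinfra_token() -> str:
    """Return the token from the environment, prompting only if it is missing."""
    token = os.environ.get(ENV_VAR, "").strip()
    if not token:
        # Hypothetical fallback: ask interactively without echoing the input,
        # then store the value so later cells can read it from the environment.
        token = getpass("DeepInfra API token: ").strip()
        os.environ[ENV_VAR] = token
    return token
```

Calling `load_deepinfra_token()` before constructing `ChatDeepInfra` leaves the environment variable set in both cases.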

# create a chat model backed by a Llama 2 chat model hosted on DeepInfra
chat = ChatDeepInfra(model="meta-llama/Llama-2-7b-chat-hf")

messages = [
    HumanMessage(
        content="Translate this sentence from English to French. I love programming."
    )
]
chat.invoke(messages)

`ChatDeepInfra` also supports async and streaming functionality:

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

# top-level await works in a notebook; in a script, wrap the call in asyncio.run()
await chat.agenerate([messages])
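The async interface is most useful when several requests run at once. The following is a minimal sketch, assuming `chat` exposes the async `ainvoke` method that LangChain chat models provide. `ainvoke_all` is a hypothetical helper name used only for illustration.

```python
import asyncio


async def ainvoke_all(chat, prompts):
    """Send every prompt concurrently and return the replies in input order."""
    # `chat` is assumed to be a LangChain chat model (for example ChatDeepInfra);
    # each prompt may be a string or a list of messages.
    return await asyncio.gather(*(chat.ainvoke(p) for p in prompts))
```

In a notebook this could be called as `replies = await ainvoke_all(chat, [messages, messages])`; in a script, pass the coroutine to `asyncio.run()`.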

# stream tokens to stdout as they are generated
chat = ChatDeepInfra(
    streaming=True,
    verbose=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)
chat.invoke(messages)
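Streaming can also be consumed directly in code rather than through a callback handler. The following is a minimal sketch, assuming `chat` provides the standard LangChain `stream` iterator and that each chunk carries plain text in a `.content` attribute. `stream_to_text` is a hypothetical helper name.

```python
def stream_to_text(chat, messages) -> str:
    """Print chunks as they arrive and return the full reply text."""
    parts = []
    # Assumption: every chunk exposes its text as a string in `.content`.
    for chunk in chat.stream(messages):
        print(chunk.content, end="", flush=True)
        parts.append(chunk.content)
    print()  # finish the streamed line
    return "".join(parts)
```

This keeps the complete reply available as a string after streaming finishes, which the stdout callback alone does not give you.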
