
ChatOllama

Ollama allows you to run open-source large language models, such as Llama 2, locally.

Ollama bundles model weights, configuration, and data into a single package, defined by a Modelfile.

It optimizes setup and configuration details, including GPU usage.

For a complete list of supported models and model variants, see the Ollama model library.

Setup

First, follow these instructions to set up and run a local Ollama instance:

  • Download and install Ollama on a supported platform (including Windows Subsystem for Linux)
  • Fetch a model via ollama pull <name-of-model>
    • View a list of available models via the model library
    • e.g., for Llama 2 7B: ollama pull llama2
  • This will download the default tagged version of the model. Typically, the default points to the latest model with the smallest parameter count.

On Mac, the models will be downloaded to ~/.ollama/models

On Linux (or WSL), the models will be stored at /usr/share/ollama/.ollama/models

  • To pull an exact version of a model, include its tag, e.g., ollama pull vicuna:13b-v1.5-16k-q4_0 (in this instance, view the various tags for the Vicuna model)
  • To view all pulled models, use ollama list
  • To chat directly with a model from the command line, use ollama run <name-of-model>
  • View the Ollama documentation for more commands, or run ollama help in the terminal to see the available commands.
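The name and tag conventions above can be sketched in Python. This is a minimal illustration, not part of Ollama or LangChain: the helper names split_model_ref and pull_command are hypothetical, and the sketch assumes that a reference without a tag resolves to a tag named latest.

```python
def split_model_ref(ref: str) -> tuple[str, str]:
    """Split a model reference such as 'vicuna:13b-v1.5-16k-q4_0' into (name, tag).

    Assumption: a reference without an explicit tag uses the tag 'latest'.
    """
    head, sep, tail = ref.rpartition(":")
    # Only treat the text after the last ':' as a tag if it is not part of a path.
    if sep and "/" not in tail:
        return head, tail
    return ref, "latest"


def pull_command(ref: str) -> list[str]:
    """Build the argument list for `ollama pull` with an explicit tag."""
    name, tag = split_model_ref(ref)
    return ["ollama", "pull", f"{name}:{tag}"]


# Example (requires the ollama CLI to be installed):
# import subprocess
# subprocess.run(pull_command("llama2"), check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting issues with unusual tag names.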

Usage

You can see a full list of supported parameters on the API reference page.

If you are using a LLaMA chat model (e.g., ollama pull llama2:7b-chat) then you can use the ChatOllama interface.

This interface includes the special tokens for the system message and user input.
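As a rough sketch of what passing a system message and user input looks like, the snippet below builds the message list as (role, content) tuples, which LangChain chat models are expected to accept as input. The helper names build_chat_messages and ask are hypothetical, and the sketch assumes a local Ollama instance with llama2:7b-chat already pulled.

```python
def build_chat_messages(system: str, user: str) -> list[tuple[str, str]]:
    """Return a system message followed by a user message as (role, content) tuples."""
    return [("system", system), ("human", user)]


def ask(system: str, user: str, model: str = "llama2:7b-chat") -> str:
    """Send the messages to a local Ollama model and return the reply text."""
    # Imported lazily so the message-building helper works without LangChain installed.
    from langchain_community.chat_models import ChatOllama

    llm = ChatOllama(model=model)
    return llm.invoke(build_chat_messages(system, user)).content


# Example (requires a running Ollama instance):
# print(ask("You are a concise assistant.", "Why is the sky blue?"))
```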

Interacting with Models

Here are a few ways to interact with pulled local models:

directly in the terminal:

  • All of your local models are automatically served on localhost:11434
  • Run ollama run <name-of-model> to start interacting via the command line directly

via an API

Send an application/json request to the API endpoint of Ollama to interact.

curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why is the sky blue?"
}'

See the Ollama API documentation for all endpoints.
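The same request can be sent from Python with only the standard library. The sketch below is an illustration under assumptions: the helper names are hypothetical, and it sets "stream": false so that the endpoint returns a single JSON object whose "response" field holds the generated text, rather than a stream of partial objects.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default local address mentioned above


def build_generate_request(model: str, prompt: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text (requires a running Ollama instance)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example:
# print(generate("llama2", "Why is the sky blue?"))
```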

via LangChain

Here is a basic example of using Ollama via the ChatOllama chat model in a LangChain application.

# LangChain supports many other chat models. Here, we're using Ollama
from langchain_community.chat_models import ChatOllama
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# supports many more optional parameters. Hover on your `ChatOllama(...)`
# class to view the latest available supported parameters
llm = ChatOllama(model="llama2")
prompt = ChatPromptTemplate.from_template("Tell me a short joke about {topic}")

# using LangChain Expression Language (LCEL) chain syntax
# learn more about LCEL at
# https://python.langchain.com/docs/expression_language/why
chain = prompt | llm | StrOutputParser()

# for brevity, response is printed in terminal
# You can use LangServe to deploy your application for
# production
print(chain.invoke({"topic": "Space travel"}))
 Sure, here's a fun space-themed joke for you:

Why don't astronauts like broccoli?
Because it has too many "crisps" in it!

Out of the box, LCEL chains provide extra functionality, such as response streaming and async support.

topic = {"topic": "Space travel"}

for chunks in chain.stream(topic):
    print(chunks)
 Sure
,
here
's
a
joke
:
Why
did
the
astronaut
break
up
with
his
girlfriend
?
Because
he
needed
more
space
to
explore
.
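Each chunk in the output above lands on its own line because print appends a newline after every chunk. A minimal sketch of writing the chunks contiguously is shown below; the helper name print_stream is hypothetical, and it works with any iterable of strings, such as the one returned by chain.stream(topic).

```python
import sys
from typing import Iterable, TextIO


def print_stream(chunks: Iterable[str], out: TextIO = sys.stdout) -> str:
    """Write chunks as they arrive, without newlines in between, and return the full text."""
    parts = []
    for chunk in chunks:
        out.write(chunk)
        out.flush()  # show each chunk immediately instead of waiting for the buffer to fill
        parts.append(chunk)
    out.write("\n")
    return "".join(parts)


# Example (requires the chain defined earlier):
# full_text = print_stream(chain.stream({"topic": "Space travel"}))
```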


The same chain also supports asynchronous streaming, as in this example:

topic = {"topic": "Space travel"}

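# Top-level `async for` works in a notebook; in a script, wrap it in an
# async function and run that function with asyncio.run()
async for chunks in chain.astream(topic):
    print(chunks)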
 Sure
,
here
's
a
little
one
:
Why
did
the
rocket
scientist
break
up
with
her
partner
?
Because
he
couldn
't
handle
all
her
"
space
y
"
jokes
.
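Outside a notebook, an async stream has to be consumed inside a coroutine. The sketch below shows one way to do that; collect_stream is a hypothetical helper that accepts any async iterator of strings, such as the one returned by chain.astream(topic).

```python
import asyncio
from typing import AsyncIterator


async def collect_stream(chunks: AsyncIterator[str]) -> str:
    """Consume an async stream of text chunks and return the concatenated text."""
    parts = []
    async for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


# Example in a script (requires the chain defined earlier):
# text = asyncio.run(collect_stream(chain.astream({"topic": "Space travel"})))
# print(text)
```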


Take a look at the LangChain Expression Language (LCEL) Interface for the other interfaces available once a chain is created.

Building from source

For up-to-date instructions on building from source, check the Ollama documentation on Building from Source.

Extraction

Use the latest version of Ollama and supply the format flag. The format flag will force the model to produce the response in JSON.

Note: You can also try out the experimental OllamaFunctions wrapper for convenience.

from langchain_community.chat_models import ChatOllama

llm = ChatOllama(model="llama2", format="json", temperature=0)
from langchain_core.messages import HumanMessage

messages = [
    HumanMessage(
        content="What color is the sky at different times of the day? Respond using JSON"
    )
]

chat_model_response = llm.invoke(messages)
print(chat_model_response)
content='{\n"morning": {\n"color": "light blue"\n},\n"noon": {\n"color": "blue"\n},\n"afternoon": {\n"color": "grayish-blue"\n},\n"evening": {\n"color": "pinkish-orange"\n}\n}'
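Because the response content is a JSON string, it can be parsed with the standard json module. The sketch below reuses the sample output above; colors_by_time is a hypothetical helper, and it assumes the model returned the same nested shape, which is not guaranteed for other prompts or runs.

```python
import json

# Sample content copied from the response shown above.
content = '{\n"morning": {\n"color": "light blue"\n},\n"noon": {\n"color": "blue"\n},\n"afternoon": {\n"color": "grayish-blue"\n},\n"evening": {\n"color": "pinkish-orange"\n}\n}'


def colors_by_time(raw: str) -> dict[str, str]:
    """Parse the JSON string and map each time of day to its color."""
    data = json.loads(raw)
    return {period: entry["color"] for period, entry in data.items()}


print(colors_by_time(content))
# With a live model: colors_by_time(chat_model_response.content)
```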
import json

from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

json_schema = {
    "title": "Person",
    "description": "Identifying information about a person.",
    "type": "object",
    "properties": {
        "name": {"title": "Name", "description": "The person's name", "type": "string"},
        "age": {"title": "Age", "description": "The person's age", "type": "integer"},
        "fav_food": {
            "title": "Fav Food",
            "description": "The person's favorite food",
            "type": "string",
        },
    },
    "required": ["name", "age"],
}

llm = ChatOllama(model="llama2")

# Use (role, template) tuples: ChatPromptTemplate only fills in {dumps} for
# message templates, while HumanMessage instances are passed through as literal text.
messages = [
    ("human", "Please tell me about a person using the following JSON schema:"),
    ("human", "{dumps}"),
    (
        "human",
        "Now, considering the schema, tell me about a person named John who is 35 years old and loves pizza.",
    ),
]

prompt = ChatPromptTemplate.from_messages(messages)
dumps = json.dumps(json_schema, indent=2)

chain = prompt | llm | StrOutputParser()

print(chain.invoke({"dumps": dumps}))

{
"name": "John",
"age": 35,
"interests": [
"pizza"
]
}
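Model output is not guaranteed to follow the schema, so it can help to check the result. The sketch below is a lightweight check and not full JSON Schema validation: check_against_schema is a hypothetical helper that only reports required keys that are missing and keys that the schema does not define. Applied to the sample output above, it flags interests, which the schema does not list.

```python
import json


def check_against_schema(raw: str, schema: dict) -> dict:
    """Report required keys missing from the output and keys not defined in the schema."""
    data = json.loads(raw)
    properties = schema.get("properties", {})
    return {
        "missing": [key for key in schema.get("required", []) if key not in data],
        "unexpected": [key for key in data if key not in properties],
    }


# Trimmed copy of the schema's structure and the sample output shown above.
person_schema = {
    "properties": {"name": {}, "age": {}, "fav_food": {}},
    "required": ["name", "age"],
}
sample_output = '{"name": "John", "age": 35, "interests": ["pizza"]}'

print(check_against_schema(sample_output, person_schema))
```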

Multi-modal

Ollama has support for multi-modal LLMs, such as bakllava and llava.

Browse the full set of versions for models with tags, such as Llava.

Download the desired LLM via ollama pull bakllava.

Be sure to update Ollama to the most recent version so that multi-modal models are supported.

The following example shows how to use ChatOllama's multi-modal support:

pip install --upgrade --quiet pillow
Note: you may need to restart the kernel to use updated packages.
import base64
from io import BytesIO

from IPython.display import HTML, display
from PIL import Image


def convert_to_base64(pil_image):
    """
    Convert PIL images to Base64 encoded strings

    :param pil_image: PIL image
    :return: Base64 encoded string
    """

    buffered = BytesIO()
    pil_image.save(buffered, format="JPEG")  # You can change the format if needed
    img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")
    return img_str


def plt_img_base64(img_base64):
    """
    Display base64 encoded string as image

    :param img_base64: Base64 string
    """
    # Create an HTML img tag with the base64 string as the source
    image_html = f'<img src="data:image/jpeg;base64,{img_base64}" />'
    # Display the image by rendering the HTML
    display(HTML(image_html))


file_path = "../../../static/img/ollama_example_img.jpg"
pil_image = Image.open(file_path)

image_b64 = convert_to_base64(pil_image)
plt_img_base64(image_b64)
from langchain_community.chat_models import ChatOllama
from langchain_core.messages import HumanMessage

llm = ChatOllama(model="bakllava", temperature=0)


def prompt_func(data):
    text = data["text"]
    image = data["image"]

    image_part = {
        "type": "image_url",
        "image_url": f"data:image/jpeg;base64,{image}",
    }

    content_parts = []

    text_part = {"type": "text", "text": text}

    content_parts.append(image_part)
    content_parts.append(text_part)

    return [HumanMessage(content=content_parts)]


from langchain_core.output_parsers import StrOutputParser

chain = prompt_func | llm | StrOutputParser()

query_chain = chain.invoke(
    {"text": "What is the Dollar-based gross retention rate?", "image": image_b64}
)

print(query_chain)
90%
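If the image is already a JPEG file on disk, the PIL round trip is optional. The sketch below builds the same content parts from raw bytes using only the standard library; to_data_url and image_message_parts are hypothetical helpers, and the sketch assumes the bytes are already in the format named by the MIME type.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def image_message_parts(text: str, image_bytes: bytes) -> list[dict]:
    """Build the image and text content parts in the same order as prompt_func above."""
    return [
        {"type": "image_url", "image_url": to_data_url(image_bytes)},
        {"type": "text", "text": text},
    ]


# Example (the file path is a placeholder):
# with open("example.jpg", "rb") as f:
#     parts = image_message_parts("What is shown in this image?", f.read())
# The parts can then be wrapped as HumanMessage(content=parts).
```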