
ChatAnthropic

This notebook provides a quick overview for getting started with Anthropic chat models. For detailed documentation of all ChatAnthropic features and configurations, head to the API reference.

Anthropic has several chat models. You can find information about their latest models and their costs, context windows, and supported input types in the Anthropic docs.

AWS Bedrock and Google VertexAI

Note that certain Anthropic models can also be accessed via AWS Bedrock and Google VertexAI. See the ChatBedrock and ChatVertexAI integrations to use Anthropic models via these services.

Overview

Integration details

Class | Package | Local | Serializable | JS support | Package downloads | Package latest
ChatAnthropic | langchain-anthropic | ❌ | beta | ✅ | PyPI - Downloads | PyPI - Version

Model features

Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs
✅ | ✅ | ❌ | ✅ | ❌ | ❌ | ✅ | ✅ | ✅ | ❌

Setup

To access Anthropic models, you'll need to create an Anthropic account, get an API key, and install the langchain-anthropic integration package.

Credentials

Head to https://console.anthropic.com/ to sign up for Anthropic and generate an API key. Once you've done this, set the ANTHROPIC_API_KEY environment variable:

import getpass
import os

if "ANTHROPIC_API_KEY" not in os.environ:
os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter your Anthropic API key: ")

If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"

Installation

The LangChain Anthropic integration lives in the langchain-anthropic package:

%pip install -qU langchain-anthropic
This guide requires langchain-anthropic>=0.3.10

Instantiation

Now we can instantiate our model object and generate chat completions:

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-3-5-sonnet-20240620",
    temperature=0,
    max_tokens=1024,
    timeout=None,
    max_retries=2,
    # other params...
)
API Reference:ChatAnthropic

Invocation

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg
AIMessage(content="J'adore la programmation.", response_metadata={'id': 'msg_018Nnu76krRPq8HvgKLW4F8T', 'model': 'claude-3-5-sonnet-20240620', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 29, 'output_tokens': 11}}, id='run-57e9295f-db8a-48dc-9619-babd2bedd891-0', usage_metadata={'input_tokens': 29, 'output_tokens': 11, 'total_tokens': 40})
print(ai_msg.content)
J'adore la programmation.
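
ChatAnthropic also supports token-level streaming and native async calls. Below is a minimal sketch reusing the messages above; note that, depending on the model and package version, chunk content may arrive as plain strings or as lists of content blocks:

for chunk in llm.stream(messages):
    # Print tokens as they arrive
    print(chunk.content, end="", flush=True)

# Async variant (inside an async context):
# ai_msg = await llm.ainvoke(messages)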

Chaining

We can chain our model with a prompt template like so:

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a helpful assistant that translates {input_language} to {output_language}.",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
chain.invoke(
    {
        "input_language": "English",
        "output_language": "German",
        "input": "I love programming.",
    }
)
API Reference:ChatPromptTemplate
AIMessage(content="Here's the German translation:\n\nIch liebe Programmieren.", response_metadata={'id': 'msg_01GhkRtQZUkA5Ge9hqmD8HGY', 'model': 'claude-3-5-sonnet-20240620', 'stop_reason': 'end_turn', 'stop_sequence': None, 'usage': {'input_tokens': 23, 'output_tokens': 18}}, id='run-da5906b4-b200-4e08-b81a-64d4453643b6-0', usage_metadata={'input_tokens': 23, 'output_tokens': 18, 'total_tokens': 41})

Content blocks

Content from a single Anthropic AI message can either be a single string or a list of content blocks. For example, when an Anthropic model invokes a tool, the tool invocation is part of the message content (as well as being exposed in the standardized AIMessage.tool_calls):

from pydantic import BaseModel, Field


class GetWeather(BaseModel):
    """Get the current weather in a given location"""

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")


llm_with_tools = llm.bind_tools([GetWeather])
ai_msg = llm_with_tools.invoke("Which city is hotter today: LA or NY?")
ai_msg.content
[{'text': "To answer this question, we'll need to check the current weather in both Los Angeles (LA) and New York (NY). I'll use the GetWeather function to retrieve this information for both cities.",
'type': 'text'},
{'id': 'toolu_01Ddzj5PkuZkrjF4tafzu54A',
'input': {'location': 'Los Angeles, CA'},
'name': 'GetWeather',
'type': 'tool_use'},
{'id': 'toolu_012kz4qHZQqD4qg8sFPeKqpP',
'input': {'location': 'New York, NY'},
'name': 'GetWeather',
'type': 'tool_use'}]
ai_msg.tool_calls
[{'name': 'GetWeather',
'args': {'location': 'Los Angeles, CA'},
'id': 'toolu_01Ddzj5PkuZkrjF4tafzu54A'},
{'name': 'GetWeather',
'args': {'location': 'New York, NY'},
'id': 'toolu_012kz4qHZQqD4qg8sFPeKqpP'}]
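
To complete the tool-calling loop, you can execute the tools yourself and pass the results back as ToolMessage objects before invoking the model again. A minimal sketch, where the hard-coded weather result stands in for a real lookup:

from langchain_core.messages import HumanMessage, ToolMessage

tool_messages = [HumanMessage("Which city is hotter today: LA or NY?"), ai_msg]
for tool_call in ai_msg.tool_calls:
    # Substitute a real weather lookup here; this result is illustrative only.
    result = f"It's 75°F and sunny in {tool_call['args']['location']}."
    tool_messages.append(ToolMessage(content=result, tool_call_id=tool_call["id"]))

final_response = llm_with_tools.invoke(tool_messages)
final_response.content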

Extended thinking

Claude 3.7 Sonnet supports an extended thinking feature, which will output the step-by-step reasoning process that led to its final answer.

To use it, specify the thinking parameter when initializing ChatAnthropic. It can also be passed in as a kwarg during invocation.

You will need to specify a token budget to use this feature. See usage example below:

import json

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-3-7-sonnet-latest",
    max_tokens=5000,
    thinking={"type": "enabled", "budget_tokens": 2000},
)

response = llm.invoke("What is the cube root of 50.653?")
print(json.dumps(response.content, indent=2))
API Reference:ChatAnthropic
[
{
"signature": "ErUBCkYIARgCIkCx7bIPj35jGPHpoVOB2y5hvPF8MN4lVK75CYGftmVNlI4axz2+bBbSexofWsN1O/prwNv8yPXnIXQmwT6zrJsKEgwJzvks0yVRZtaGBScaDOm9xcpOxbuhku1zViIw9WDgil/KZL8DsqWrhVpC6TzM0RQNCcsHcmgmyxbgG9g8PR0eJGLxCcGoEw8zMQu1Kh1hQ1/03hZ2JCOgigpByR9aNPTwwpl64fQUe6WwIw==",
"thinking": "To find the cube root of 50.653, I need to find the value of $x$ such that $x^3 = 50.653$.\n\nI can try to estimate this first. \n$3^3 = 27$\n$4^3 = 64$\n\nSo the cube root of 50.653 will be somewhere between 3 and 4, but closer to 4.\n\nLet me try to compute this more precisely. I can use the cube root function:\n\ncube root of 50.653 = 50.653^(1/3)\n\nLet me calculate this:\n50.653^(1/3) \u2248 3.6998\n\nLet me verify:\n3.6998^3 \u2248 50.6533\n\nThat's very close to 50.653, so I'm confident that the cube root of 50.653 is approximately 3.6998.\n\nActually, let me compute this more precisely:\n50.653^(1/3) \u2248 3.69981\n\nLet me verify once more:\n3.69981^3 \u2248 50.652998\n\nThat's extremely close to 50.653, so I'll say that the cube root of 50.653 is approximately 3.69981.",
"type": "thinking"
},
{
"text": "The cube root of 50.653 is approximately 3.6998.\n\nTo verify: 3.6998\u00b3 = 50.6530, which is very close to our original number.",
"type": "text"
}
]
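
As noted above, the thinking parameter can also be supplied at invocation time instead of at initialization. A minimal sketch of what this might look like, using the same token budget:

llm = ChatAnthropic(model="claude-3-7-sonnet-latest", max_tokens=5000)

response = llm.invoke(
    "What is the cube root of 50.653?",
    thinking={"type": "enabled", "budget_tokens": 2000},
)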

Prompt caching

Anthropic supports caching of elements of your prompts, including messages, tool definitions, tool results, images, and documents. This allows you to reuse large documents, instructions, few-shot examples, and other data to reduce latency and costs.

To enable caching on an element of a prompt, mark its associated content block using the cache_control key. See examples below:

Messages

import requests
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-7-sonnet-20250219")

# Pull LangChain readme
get_response = requests.get(
    "https://raw.githubusercontent.com/langchain-ai/langchain/master/README.md"
)
readme = get_response.text

messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": "You are a technology expert.",
            },
            {
                "type": "text",
                "text": f"{readme}",
                "cache_control": {"type": "ephemeral"},
            },
        ],
    },
    {
        "role": "user",
        "content": "What's LangChain, according to its README?",
    },
]

response_1 = llm.invoke(messages)
response_2 = llm.invoke(messages)

usage_1 = response_1.usage_metadata["input_token_details"]
usage_2 = response_2.usage_metadata["input_token_details"]

print(f"First invocation:\n{usage_1}")
print(f"\nSecond:\n{usage_2}")
API Reference:ChatAnthropic
First invocation:
{'cache_read': 0, 'cache_creation': 1458}

Second:
{'cache_read': 1458, 'cache_creation': 0}

Tools

from langchain_anthropic import convert_to_anthropic_tool
from langchain_core.tools import tool

# For demonstration purposes, we artificially expand the
# tool description.
description = (
    f"Get the weather at a location. By the way, check out this readme: {readme}"
)


@tool(description=description)
def get_weather(location: str) -> str:
    return "It's sunny."


# Enable caching on the tool
weather_tool = convert_to_anthropic_tool(get_weather)
weather_tool["cache_control"] = {"type": "ephemeral"}

llm = ChatAnthropic(model="claude-3-7-sonnet-20250219")
llm_with_tools = llm.bind_tools([weather_tool])
query = "What's the weather in San Francisco?"

response_1 = llm_with_tools.invoke(query)
response_2 = llm_with_tools.invoke(query)

usage_1 = response_1.usage_metadata["input_token_details"]
usage_2 = response_2.usage_metadata["input_token_details"]

print(f"First invocation:\n{usage_1}")
print(f"\nSecond:\n{usage_2}")
First invocation:
{'cache_read': 0, 'cache_creation': 1809}

Second:
{'cache_read': 1809, 'cache_creation': 0}

Incremental caching in conversational applications

Prompt caching can be used in multi-turn conversations to maintain context from earlier messages without redundant processing.

We can enable incremental caching by marking the final message with cache_control. Claude will automatically use the longest previously-cached prefix for follow-up messages.

Below, we implement a simple chatbot that incorporates this feature. We follow the LangChain chatbot tutorial, but add a custom reducer that automatically marks the last content block in each user message with cache_control. See below:

import requests
from langchain_anthropic import ChatAnthropic
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph, add_messages
from typing_extensions import Annotated, TypedDict

llm = ChatAnthropic(model="claude-3-7-sonnet-20250219")

# Pull LangChain readme
get_response = requests.get(
    "https://raw.githubusercontent.com/langchain-ai/langchain/master/README.md"
)
readme = get_response.text


def messages_reducer(left: list, right: list) -> list:
    # Update last user message
    for i in range(len(right) - 1, -1, -1):
        if right[i].type == "human":
            right[i].content[-1]["cache_control"] = {"type": "ephemeral"}
            break

    return add_messages(left, right)


class State(TypedDict):
    messages: Annotated[list, messages_reducer]


workflow = StateGraph(state_schema=State)


# Define the function that calls the model
def call_model(state: State):
    response = llm.invoke(state["messages"])
    return {"messages": [response]}


# Define the (single) node in the graph
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

# Add memory
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

from langchain_core.messages import HumanMessage

config = {"configurable": {"thread_id": "abc123"}}

query = "Hi! I'm Bob."

input_message = HumanMessage([{"type": "text", "text": query}])
output = app.invoke({"messages": [input_message]}, config)
output["messages"][-1].pretty_print()
print(f'\n{output["messages"][-1].usage_metadata["input_token_details"]}')
API Reference:HumanMessage
================================== Ai Message ==================================

Hello, Bob! It's nice to meet you. How are you doing today? Is there something I can help you with?

{'cache_read': 0, 'cache_creation': 0}
query = f"Check out this readme: {readme}"

input_message = HumanMessage([{"type": "text", "text": query}])
output = app.invoke({"messages": [input_message]}, config)
output["messages"][-1].pretty_print()
print(f'\n{output["messages"][-1].usage_metadata["input_token_details"]}')
================================== Ai Message ==================================

I can see you've shared the README from the LangChain GitHub repository. This is the documentation for LangChain, which is a popular framework for building applications powered by Large Language Models (LLMs). Here's a summary of what the README contains:

LangChain is:
- A framework for developing LLM-powered applications
- Helps chain together components and integrations to simplify AI application development
- Provides a standard interface for models, embeddings, vector stores, etc.

Key features/benefits:
- Real-time data augmentation (connect LLMs to diverse data sources)
- Model interoperability (swap models easily as needed)
- Large ecosystem of integrations

The LangChain ecosystem includes:
- LangSmith - For evaluations and observability
- LangGraph - For building complex agents with customizable architecture
- LangGraph Platform - For deployment and scaling of agents

The README also mentions installation instructions (`pip install -U langchain`) and links to various resources including tutorials, how-to guides, conceptual guides, and API references.

Is there anything specific about LangChain you'd like to know more about, Bob?

{'cache_read': 0, 'cache_creation': 1498}
query = "What was my name again?"

input_message = HumanMessage([{"type": "text", "text": query}])
output = app.invoke({"messages": [input_message]}, config)
output["messages"][-1].pretty_print()
print(f'\n{output["messages"][-1].usage_metadata["input_token_details"]}')
================================== Ai Message ==================================

Your name is Bob. You introduced yourself at the beginning of our conversation.

{'cache_read': 1498, 'cache_creation': 269}

In the LangSmith trace, toggling "raw output" will show exactly what messages are sent to the chat model, including cache_control keys.

Token-efficient tool use

Anthropic supports a (beta) token-efficient tool use feature. To use it, specify the relevant beta-headers when instantiating the model.

from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool

llm = ChatAnthropic(
    model="claude-3-7-sonnet-20250219",
    temperature=0,
    model_kwargs={
        "extra_headers": {"anthropic-beta": "token-efficient-tools-2025-02-19"}
    },
)


@tool
def get_weather(location: str) -> str:
    """Get the weather at a location."""
    return "It's sunny."


llm_with_tools = llm.bind_tools([get_weather])
response = llm_with_tools.invoke("What's the weather in San Francisco?")
print(response.tool_calls)
print(f'\nTotal tokens: {response.usage_metadata["total_tokens"]}')
API Reference:ChatAnthropic | tool
[{'name': 'get_weather', 'args': {'location': 'San Francisco'}, 'id': 'toolu_01EoeE1qYaePcmNbUvMsWtmA', 'type': 'tool_call'}]

Total tokens: 408

Citations

Anthropic supports a citations feature that lets Claude attach context to its answers based on source documents supplied by the user. When document content blocks with "citations": {"enabled": True} are included in a query, Claude may generate citations in its response.

Simple example

In this example we pass a plain text document. In the background, Claude automatically chunks the input text into sentences, which are used when generating citations.

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-5-haiku-latest")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "text",
                    "media_type": "text/plain",
                    "data": "The grass is green. The sky is blue.",
                },
                "title": "My Document",
                "context": "This is a trustworthy document.",
                "citations": {"enabled": True},
            },
            {"type": "text", "text": "What color is the grass and sky?"},
        ],
    }
]
response = llm.invoke(messages)
response.content
API Reference:ChatAnthropic
[{'text': 'Based on the document, ', 'type': 'text'},
{'text': 'the grass is green',
'type': 'text',
'citations': [{'type': 'char_location',
'cited_text': 'The grass is green. ',
'document_index': 0,
'document_title': 'My Document',
'start_char_index': 0,
'end_char_index': 20}]},
{'text': ', and ', 'type': 'text'},
{'text': 'the sky is blue',
'type': 'text',
'citations': [{'type': 'char_location',
'cited_text': 'The sky is blue.',
'document_index': 0,
'document_title': 'My Document',
'start_char_index': 20,
'end_char_index': 36}]},
{'text': '.', 'type': 'text'}]
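
To work with the citations programmatically, you can iterate over the returned content blocks and inspect their citations entries. A minimal sketch based on the response shape shown above:

for block in response.content:
    # Blocks without citations simply skip the inner loop
    for citation in block.get("citations") or []:
        print(f"{block['text']!r} cites {citation['cited_text']!r}")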

Using with text splitters

Anthropic also lets you specify your own splits using custom document types. LangChain text splitters can be used to generate meaningful splits for this purpose. See the example below, where we split the LangChain README (a markdown document) and pass it to Claude as context:

import requests
from langchain_anthropic import ChatAnthropic
from langchain_text_splitters import MarkdownTextSplitter


def format_to_anthropic_documents(documents: list[str]):
    return {
        "type": "document",
        "source": {
            "type": "content",
            "content": [{"type": "text", "text": document} for document in documents],
        },
        "citations": {"enabled": True},
    }


# Pull readme
get_response = requests.get(
    "https://raw.githubusercontent.com/langchain-ai/langchain/master/README.md"
)
readme = get_response.text

# Split into chunks
splitter = MarkdownTextSplitter(
    chunk_overlap=0,
    chunk_size=50,
)
documents = splitter.split_text(readme)

# Construct message
message = {
    "role": "user",
    "content": [
        format_to_anthropic_documents(documents),
        {"type": "text", "text": "Give me a link to LangChain's tutorials."},
    ],
}

# Query LLM
llm = ChatAnthropic(model="claude-3-5-haiku-latest")
response = llm.invoke([message])

response.content
[{'text': "You can find LangChain's tutorials at https://python.langchain.com/docs/tutorials/\n\nThe tutorials section is recommended for those looking to build something specific or who prefer a hands-on learning approach. It's considered the best place to get started with LangChain.",
'type': 'text',
'citations': [{'type': 'content_block_location',
'cited_text': "[Tutorials](https://python.langchain.com/docs/tutorials/):If you're looking to build something specific orare more of a hands-on learner, check out ourtutorials. This is the best place to get started.",
'document_index': 0,
'document_title': None,
'start_block_index': 243,
'end_block_index': 248}]}]

Built-in tools

Anthropic supports a variety of built-in tools, which can be bound to the model in the usual way. Claude will generate tool calls adhering to its internal schema for the tool:

from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(model="claude-3-7-sonnet-20250219")

tool = {"type": "text_editor_20250124", "name": "str_replace_editor"}
llm_with_tools = llm.bind_tools([tool])

response = llm_with_tools.invoke(
"There's a syntax error in my primes.py file. Can you help me fix it?"
)
print(response.text())
response.tool_calls
API Reference:ChatAnthropic
I'd be happy to help you fix the syntax error in your primes.py file. First, let's look at the current content of the file to identify the error.
[{'name': 'str_replace_editor',
'args': {'command': 'view', 'path': '/repo/primes.py'},
'id': 'toolu_01VdNgt1YV7kGfj9LFLm6HyQ',
'type': 'tool_call'}]

API reference

For detailed documentation of all ChatAnthropic features and configurations, head to the API reference: https://python.langchain.com/api_reference/anthropic/chat_models/langchain_anthropic.chat_models.ChatAnthropic.html

