ChatModelIntegrationTests#
- class langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests[source]#
Base class for chat model integration tests.
Test subclasses must implement the chat_model_class and chat_model_params properties to specify what model to test and its initialization parameters.

Example:

    from typing import Type

    from langchain_tests.integration_tests import ChatModelIntegrationTests
    from my_package.chat_models import MyChatModel


    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def chat_model_class(self) -> Type[MyChatModel]:
            # Return the chat model class to test here
            return MyChatModel

        @property
        def chat_model_params(self) -> dict:
            # Return initialization parameters for the model.
            return {"model": "model-001", "temperature": 0}
Note
API references for individual test methods include troubleshooting tips.
Test subclasses must implement the following two properties:
- chat_model_class
The chat model class to test, e.g., ChatParrotLink.

Example:

    @property
    def chat_model_class(self) -> Type[ChatParrotLink]:
        return ChatParrotLink
- chat_model_params
Initialization parameters for the chat model.
Example:
    @property
    def chat_model_params(self) -> dict:
        return {"model": "bird-brain-001", "temperature": 0}
In addition, test subclasses can control what features are tested (such as tool calling or multi-modality) by selectively overriding the following properties. Expand to see details:
has_tool_calling
Boolean property indicating whether the chat model supports tool calling.
By default, this is determined by whether the chat model’s bind_tools method is overridden. It typically does not need to be overridden on the test class.
Example override:
    @property
    def has_tool_calling(self) -> bool:
        return True
tool_choice_value
Value to use for tool choice when used in tests.
Some tests for tool calling features attempt to force tool calling via a tool_choice parameter. A common value for this parameter is “any”. Defaults to None.
Note: if the value is set to “tool_name”, the name of the tool used in each test will be set as the value for tool_choice.
Example:
    @property
    def tool_choice_value(self) -> Optional[str]:
        return "any"
has_structured_output
Boolean property indicating whether the chat model supports structured output.
By default, this is determined by whether the chat model’s with_structured_output method is overridden. If the base implementation is intended to be used, this method should be overridden.
See: https://python.langchain.com/docs/concepts/structured_outputs/
Example:
    @property
    def has_structured_output(self) -> bool:
        return True
supports_image_inputs
Boolean property indicating whether the chat model supports image inputs. Defaults to False.

If set to True, the chat model will be tested using content blocks of the form

    [
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ]
See https://python.langchain.com/docs/concepts/multimodality/
Example:
    @property
    def supports_image_inputs(self) -> bool:
        return True
supports_video_inputs
Boolean property indicating whether the chat model supports video inputs. Defaults to False. No current tests are written for this feature.

returns_usage_metadata
Boolean property indicating whether the chat model returns usage metadata on invoke and streaming responses.
usage_metadata is an optional dict attribute on AIMessages that tracks input and output tokens: https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.UsageMetadata.html

Example:

    @property
    def returns_usage_metadata(self) -> bool:
        return False
supports_anthropic_inputs
Boolean property indicating whether the chat model supports Anthropic-style inputs.
These inputs might feature “tool use” and “tool result” content blocks, e.g.,
[ {"type": "text", "text": "Hmm let me think about that"}, { "type": "tool_use", "input": {"fav_color": "green"}, "id": "foo", "name": "color_picker", }, ]
If set to
True
, the chat model will be tested using content blocks of this form.Example:
@property def supports_anthropic_inputs(self) -> bool: return False
supports_image_tool_message
Boolean property indicating whether the chat model supports ToolMessages that include image content, e.g.,
    ToolMessage(
        content=[
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
            },
        ],
        tool_call_id="1",
        name="random_image",
    )

If set to True, the chat model will be tested with message sequences that include ToolMessages of this form.

Example:

    @property
    def supports_image_tool_message(self) -> bool:
        return False
supported_usage_metadata_details
Property controlling what usage metadata details are emitted in both invoke and stream.
usage_metadata is an optional dict attribute on AIMessages that tracks input and output tokens: https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.UsageMetadata.html

It includes optional keys input_token_details and output_token_details that can track usage details associated with special types of tokens, such as cached, audio, or reasoning.

Only needs to be overridden if these details are supplied.
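Example override (a sketch; the keys shown for "invoke" and "stream" follow the configuration illustrated under test_usage_metadata below):

    @property
    def supported_usage_metadata_details(self) -> dict:
        return {
            "invoke": ["cache_read_input", "cache_creation_input"],
            "stream": ["cache_read_input", "cache_creation_input"],
        }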
Attributes
chat_model_class
    The chat model class to test, e.g., ChatParrotLink.

chat_model_params
    Initialization parameters for the chat model.

has_structured_output
    (bool) whether the chat model supports structured output.

has_tool_calling
    (bool) whether the model supports tool calling.

returns_usage_metadata
    (bool) whether the chat model returns usage metadata on invoke and streaming responses.

supported_usage_metadata_details
    (dict) what usage metadata details are emitted in invoke and stream.

supports_anthropic_inputs
    (bool) whether the chat model supports Anthropic-style inputs.

supports_image_inputs
    (bool) whether the chat model supports image inputs, defaults to False.

supports_image_tool_message
    (bool) whether the chat model supports ToolMessages that include image content.

supports_video_inputs
    (bool) whether the chat model supports video inputs, defaults to False.

tool_choice_value
    (None or str) to use for tool choice when used in tests.
Methods
test_abatch(model)
    Test to verify that await model.abatch([messages]) works.

test_ainvoke(model)
    Test to verify that await model.ainvoke(simple_message) works.

test_anthropic_inputs(model)
    Test that model can process Anthropic-style message histories.

test_astream(model)
    Test to verify that await model.astream(simple_message) works.

test_batch(model)
    Test to verify that model.batch([messages]) works.

test_bind_runnables_as_tools(model)
    Test that the model generates tool calls for tools that are derived from LangChain runnables.

test_conversation(model)
    Test to verify that the model can handle multi-turn conversations.

test_image_inputs(model)
    Test that the model can process image inputs.

test_image_tool_message(model)
    Test that the model can process ToolMessages with image inputs.

test_invoke(model)
    Test to verify that model.invoke(simple_message) works.

test_message_with_name(model)
    Test that HumanMessage with values for the name field can be handled.

test_stop_sequence(model)
    Test that model does not fail when invoked with the stop parameter, which is a standard parameter for stopping generation at a certain token.

test_stream(model)
    Test to verify that model.stream(simple_message) works.

test_structured_few_shot_examples(model, ...)
    Test that the model can process few-shot examples with tool calls.

test_structured_output(model)
    Test to verify structured output is generated both on invoke and stream.

test_structured_output_async(model)
    Test to verify structured output is generated both on invoke and stream.

test_structured_output_optional_param(model)
    Test to verify we can generate structured output that includes optional parameters.

test_structured_output_pydantic_2_v1(model)
    Test to verify we can generate structured output using pydantic.v1.BaseModel.

test_tool_calling(model)
    Test that the model generates tool calls.

test_tool_calling_async(model)
    Test that the model generates tool calls.

test_tool_calling_with_no_arguments(model)
    Test that the model generates tool calls for tools with no arguments.

test_tool_message_error_status(model, ...)
    Test that ToolMessage with status="error" can be handled.

test_tool_message_histories_list_content(model, ...)
    Test that message histories are compatible with list tool contents (e.g. Anthropic format).

test_tool_message_histories_string_content(model, ...)
    Test that message histories are compatible with string tool contents (e.g. OpenAI format).

test_usage_metadata(model)
    Test to verify that the model returns correct usage metadata.

test_usage_metadata_streaming(model)
    Test to verify that the model returns correct usage metadata in streaming mode.
- async test_abatch(model: BaseChatModel) → None [source]#
Test to verify that await model.abatch([messages]) works.
This should pass for all integrations. Tests the model’s ability to process multiple prompts in a single batch asynchronously.
Troubleshooting
First, debug test_batch() and test_ainvoke() because abatch has a default implementation that calls ainvoke for each message in the batch.

If those tests pass but not this one, you should make sure your abatch method does not raise any exceptions, and that it returns a list of valid AIMessage objects.
- Parameters:
model (BaseChatModel)
- Return type:
None
- async test_ainvoke(model: BaseChatModel) → None [source]#
Test to verify that await model.ainvoke(simple_message) works.
This should pass for all integrations. Passing this test does not indicate a “natively async” implementation, but rather that the model can be used in an async context.
Troubleshooting
First, debug test_invoke(), because ainvoke has a default implementation that calls invoke in an async context.

If that test passes but not this one, you should make sure your _agenerate method does not raise any exceptions, and that it returns a valid ChatResult like so:

    return ChatResult(
        generations=[ChatGeneration(
            message=AIMessage(content="Output text")
        )]
    )
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_anthropic_inputs(model: BaseChatModel) → None [source]#
Test that model can process Anthropic-style message histories.
These message histories will include AIMessage objects with tool_use content blocks, e.g.,

    AIMessage(
        [
            {"type": "text", "text": "Hmm let me think about that"},
            {
                "type": "tool_use",
                "input": {"fav_color": "green"},
                "id": "foo",
                "name": "color_picker",
            },
        ]
    )

as well as HumanMessage objects containing tool_result content blocks:

    HumanMessage(
        [
            {
                "type": "tool_result",
                "tool_use_id": "foo",
                "content": [
                    {
                        "type": "text",
                        "text": "green is a great pick! that's my sister's favorite color",  # noqa: E501
                    }
                ],
                "is_error": False,
            },
            {"type": "text", "text": "what's my sister's favorite color"},
        ]
    )
This test should be skipped if the model does not support messages of this form (or doesn’t support tool calling generally). See Configuration below.
Configuration
To disable this test, set supports_anthropic_inputs to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def supports_anthropic_inputs(self) -> bool:
            return False
Troubleshooting
If this test fails, check that:
- The model can correctly handle message histories that include message objects with list content.
- The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.
- HumanMessages with "tool_result" content blocks are correctly handled.

Otherwise, if Anthropic tool call and result formats are not supported, set the supports_anthropic_inputs property to False.
- Parameters:
model (BaseChatModel)
- Return type:
None
- async test_astream(model: BaseChatModel) → None [source]#
Test to verify that await model.astream(simple_message) works.
This should pass for all integrations. Passing this test does not indicate a “natively async” or “streaming” implementation, but rather that the model can be used in an async streaming context.
Troubleshooting
First, debug test_stream() and test_ainvoke(), because astream has a default implementation that calls _stream in an async context if it is implemented, or ainvoke and yields the result as a single chunk if not.

If those tests pass but not this one, you should make sure your _astream method does not raise any exceptions, and that it yields valid ChatGenerationChunk objects like so:

    yield ChatGenerationChunk(
        message=AIMessageChunk(content="chunk text")
    )
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_batch(model: BaseChatModel) → None [source]#
Test to verify that model.batch([messages]) works.
This should pass for all integrations. Tests the model’s ability to process multiple prompts in a single batch.
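For reference, the call under test looks roughly like this (the prompts are illustrative):

    results = model.batch(["hello", "how are you?"])
    # results should be a list of AIMessage objects, one per input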
Troubleshooting
First, debug test_invoke() because batch has a default implementation that calls invoke for each message in the batch.

If that test passes but not this one, you should make sure your batch method does not raise any exceptions, and that it returns a list of valid AIMessage objects.
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_bind_runnables_as_tools(model: BaseChatModel) → None [source]#
Test that the model generates tool calls for tools that are derived from LangChain runnables. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. If tool_choice is not supported and the model consistently fails this test, you can xfail the test:

    @pytest.mark.xfail(reason=("Does not support tool_choice."))
    def test_bind_runnables_as_tools(self, model: BaseChatModel) -> None:
        super().test_bind_runnables_as_tools(model)
Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_conversation(model: BaseChatModel) → None [source]#
Test to verify that the model can handle multi-turn conversations.
This should pass for all integrations. Tests the model’s ability to process a sequence of alternating human and AI messages as context for generating the next response.
Troubleshooting
First, debug test_invoke() because this test also uses model.invoke().

If that test passes but not this one, you should verify that:
1. Your model correctly processes the message history
2. The model maintains appropriate context from previous messages
3. The response is a valid AIMessage
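For illustration, the kind of multi-turn history this test sends looks roughly like the following (the message contents here are hypothetical):

    from langchain_core.messages import AIMessage, HumanMessage

    messages = [
        HumanMessage("hello"),
        AIMessage("Hello! How can I help you?"),
        HumanMessage("Please repeat my first message."),
    ]
    result = model.invoke(messages)  # should return a valid AIMessage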
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_image_inputs(model: BaseChatModel) → None [source]#
Test that the model can process image inputs.
This test should be skipped (see Configuration below) if the model does not support image inputs. These will take the form of messages with OpenAI-style image content blocks:

    [
        {"type": "text", "text": "describe the weather in this image"},
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ]
See https://python.langchain.com/docs/concepts/multimodality/
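For illustration, such content blocks are typically wrapped in a HumanMessage before invoking the model (here image_data is assumed to hold a base64-encoded JPEG):

    from langchain_core.messages import HumanMessage

    message = HumanMessage(
        content=[
            {"type": "text", "text": "describe the weather in this image"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
            },
        ]
    )
    result = model.invoke([message])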
Configuration
To disable this test, set supports_image_inputs to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def supports_image_inputs(self) -> bool:
            return False
Troubleshooting
If this test fails, check that the model can correctly handle messages with image content blocks in OpenAI format, including base64-encoded images. Otherwise, set the supports_image_inputs property to False.
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_image_tool_message(model: BaseChatModel) → None [source]#
Test that the model can process ToolMessages with image inputs.
This test should be skipped if the model does not support messages of the form:
    ToolMessage(
        content=[
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
            },
        ],
        tool_call_id="1",
        name="random_image",
    )

This test can be skipped by setting the supports_image_tool_message property to False (see Configuration below).
Configuration
To disable this test, set supports_image_tool_message to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def supports_image_tool_message(self) -> bool:
            return False
Troubleshooting
If this test fails, check that the model can correctly handle messages with image content blocks in ToolMessages, including base64-encoded images. Otherwise, set the supports_image_tool_message property to False.
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_invoke(model: BaseChatModel) → None [source]#
Test to verify that model.invoke(simple_message) works.
This should pass for all integrations.
Troubleshooting
If this test fails, you should make sure your _generate method does not raise any exceptions, and that it returns a valid ChatResult like so:

    return ChatResult(
        generations=[ChatGeneration(
            message=AIMessage(content="Output text")
        )]
    )
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_message_with_name(model: BaseChatModel) → None [source]#
Test that HumanMessage with values for the name field can be handled.

These messages may take the form:

    HumanMessage("hello", name="example_user")

If possible, the name field should be parsed and passed appropriately to the model. Otherwise, it should be ignored.
Troubleshooting
If this test fails, check that the name field on HumanMessage objects is either ignored or passed to the model appropriately.
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_stop_sequence(model: BaseChatModel) → None [source]#
Test that model does not fail when invoked with the stop parameter, which is a standard parameter for stopping generation at a certain token.

More on standard parameters here: https://python.langchain.com/docs/concepts/chat_models/#standard-parameters
This should pass for all integrations.
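For reference, the behavior under test is roughly the following (the stop value shown is illustrative):

    result = model.invoke("hello", stop=["you"])
    # The call should succeed and return an AIMessage rather than raising.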
Troubleshooting
If this test fails, check that the function signature for _generate (as well as _stream and async variants) accepts the stop parameter:

    def _generate(
        self,
        messages: List[BaseMessage],
        stop: Optional[List[str]] = None,
        run_manager: Optional[CallbackManagerForLLMRun] = None,
        **kwargs: Any,
    ) -> ChatResult:
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_stream(model: BaseChatModel) → None [source]#
Test to verify that model.stream(simple_message) works.
This should pass for all integrations. Passing this test does not indicate a “streaming” implementation, but rather that the model can be used in a streaming context.
Troubleshooting
First, debug test_invoke(), because stream has a default implementation that calls invoke and yields the result as a single chunk.

If that test passes but not this one, you should make sure your _stream method does not raise any exceptions, and that it yields valid ChatGenerationChunk objects like so:

    yield ChatGenerationChunk(
        message=AIMessageChunk(content="chunk text")
    )
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_structured_few_shot_examples(model: BaseChatModel, my_adder_tool: BaseTool) → None [source]#
Test that the model can process few-shot examples with tool calls.
These are represented as a sequence of messages of the following form:
- HumanMessage with string content;
- AIMessage with the tool_calls attribute populated;
- ToolMessage with string content;
- AIMessage with string content (an answer);
- HumanMessage with string content (a follow-up question).
This test should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
This test uses a utility function in langchain_core to generate a sequence of messages representing "few-shot" examples: https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.tool_example_to_messages.html

If this test fails, check that the model can correctly handle this sequence of messages.

You can xfail the test if tool calling is implemented but this format is not supported.

    @pytest.mark.xfail(reason=("Not implemented."))
    def test_structured_few_shot_examples(self, *args: Any) -> None:
        super().test_structured_few_shot_examples(*args)
- Parameters:
model (BaseChatModel)
my_adder_tool (BaseTool)
- Return type:
None
- test_structured_output(model: BaseChatModel) → None [source]#
Test to verify structured output is generated both on invoke and stream.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
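For illustration, a minimal sketch of the pattern this test exercises (the Joke schema and prompt are hypothetical):

    from pydantic import BaseModel, Field

    class Joke(BaseModel):
        """Joke to tell the user."""

        setup: str = Field(description="question to set up a joke")
        punchline: str = Field(description="answer to resolve the joke")

    structured_model = model.with_structured_output(Joke)
    result = structured_model.invoke("Tell me a joke about cats")
    # result should be a Joke instance (or an equivalent dict, depending on schema type)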
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, ensure that the model's bind_tools method properly handles both JSON Schema and Pydantic V2 models.

langchain_core implements a utility function that will accommodate most formats: https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.convert_to_openai_tool.html

See example implementation of with_structured_output here: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI.with_structured_output
- Parameters:
model (BaseChatModel)
- Return type:
None
- async test_structured_output_async(model: BaseChatModel) → None [source]#
Test to verify structured output is generated both on invoke and stream.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, ensure that the model's bind_tools method properly handles both JSON Schema and Pydantic V2 models.

langchain_core implements a utility function that will accommodate most formats: https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.convert_to_openai_tool.html

See example implementation of with_structured_output here: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI.with_structured_output
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_structured_output_optional_param(model: BaseChatModel) → None [source]#
Test to verify we can generate structured output that includes optional parameters.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, ensure that the model's bind_tools method properly handles Pydantic V2 models with optional parameters.

langchain_core implements a utility function that will accommodate most formats: https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.convert_to_openai_tool.html

See example implementation of with_structured_output here: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI.with_structured_output
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_structured_output_pydantic_2_v1(model: BaseChatModel) → None [source]#
Test to verify we can generate structured output using pydantic.v1.BaseModel.
pydantic.v1.BaseModel is available in the pydantic 2 package.
This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
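For illustration, a minimal sketch of what this test exercises (the Joke schema and prompt are hypothetical):

    from pydantic.v1 import BaseModel as BaseModelV1  # v1 namespace shipped with pydantic 2

    class Joke(BaseModelV1):
        setup: str
        punchline: str

    structured_model = model.with_structured_output(Joke)
    result = structured_model.invoke("Tell me a joke about cats")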
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, ensure that the model's bind_tools method properly handles both JSON Schema and Pydantic V1 models.

langchain_core implements a utility function that will accommodate most formats: https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.convert_to_openai_tool.html

See example implementation of with_structured_output here: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI.with_structured_output
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_tool_calling(model: BaseChatModel) → None [source]#
Test that the model generates tool calls. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
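For illustration, the behavior being verified is roughly the following (the tool and prompt are hypothetical, not the actual test internals):

    from langchain_core.tools import tool

    @tool
    def magic_function(input: int) -> int:
        """Apply a magic function to an input."""
        return input + 2

    model_with_tools = model.bind_tools([magic_function])
    ai_msg = model_with_tools.invoke("What is magic_function(3)?")
    assert ai_msg.tool_calls  # the model should emit at least one tool call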
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. If tool_choice is not supported and the model consistently fails this test, you can xfail the test:

    @pytest.mark.xfail(reason=("Does not support tool_choice."))
    def test_tool_calling(self, model: BaseChatModel) -> None:
        super().test_tool_calling(model)
Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.
- Parameters:
model (BaseChatModel)
- Return type:
None
- async test_tool_calling_async(model: BaseChatModel) → None [source]#
Test that the model generates tool calls. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. If tool_choice is not supported and the model consistently fails this test, you can xfail the test:

    @pytest.mark.xfail(reason=("Does not support tool_choice."))
    async def test_tool_calling_async(self, model: BaseChatModel) -> None:
        await super().test_tool_calling_async(model)
Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_tool_calling_with_no_arguments(model: BaseChatModel) → None [source]#
Test that the model generates tool calls for tools with no arguments. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model. It should correctly handle the case where a tool has no arguments.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. It may also fail if a provider does not support this form of tool. In these cases, you can xfail the test:

    @pytest.mark.xfail(reason=("Does not support tool_choice."))
    def test_tool_calling_with_no_arguments(self, model: BaseChatModel) -> None:
        super().test_tool_calling_with_no_arguments(model)
Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_tool_message_error_status(model: BaseChatModel, my_adder_tool: BaseTool) → None [source]#
Test that ToolMessage with status="error" can be handled.

These messages may take the form:

    ToolMessage(
        "Error: Missing required argument 'b'.",
        name="my_adder_tool",
        tool_call_id="abc123",
        status="error",
    )

If possible, the status field should be parsed and passed appropriately to the model.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, check that the status field on ToolMessage objects is either ignored or passed to the model appropriately.

Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.
- Parameters:
model (BaseChatModel)
my_adder_tool (BaseTool)
- Return type:
None
- test_tool_message_histories_list_content(model: BaseChatModel, my_adder_tool: BaseTool) → None [source]#
Test that message histories are compatible with list tool contents (e.g. Anthropic format).
These message histories will include AIMessage objects with list content containing "tool use" blocks, e.g.,

    [
        {"type": "text", "text": "Hmm let me think about that"},
        {
            "type": "tool_use",
            "input": {"fav_color": "green"},
            "id": "foo",
            "name": "color_picker",
        },
    ]
This test should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, check that:
- The model can correctly handle message histories that include AIMessage objects with list content.
- The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.
- The model can correctly handle ToolMessage objects with string content and arbitrary string values for tool_call_id.

You can xfail the test if tool calling is implemented but this format is not supported.

    @pytest.mark.xfail(reason=("Not implemented."))
    def test_tool_message_histories_list_content(self, *args: Any) -> None:
        super().test_tool_message_histories_list_content(*args)
- Parameters:
model (BaseChatModel)
my_adder_tool (BaseTool)
- Return type:
None
- test_tool_message_histories_string_content(model: BaseChatModel, my_adder_tool: BaseTool) → None [source]#
Test that message histories are compatible with string tool contents (e.g. OpenAI format). If a model passes this test, it should be compatible with messages generated from providers following OpenAI format.
This test should be skipped if the model does not support tool calling (see Configuration below).
Configuration
To disable tool calling tests, set has_tool_calling to False in your test class:

    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def has_tool_calling(self) -> bool:
            return False
Troubleshooting
If this test fails, check that:
- The model can correctly handle message histories that include AIMessage objects with "" content.
- The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.
- The model can correctly handle ToolMessage objects with string content and arbitrary string values for tool_call_id.

You can xfail the test if tool calling is implemented but this format is not supported.

    @pytest.mark.xfail(reason=("Not implemented."))
    def test_tool_message_histories_string_content(self, *args: Any) -> None:
        super().test_tool_message_histories_string_content(*args)
- Parameters:
model (BaseChatModel)
my_adder_tool (BaseTool)
- Return type:
None
- test_usage_metadata(model: BaseChatModel) → None [source]#
Test to verify that the model returns correct usage metadata.
This test is optional and should be skipped if the model does not return usage metadata (see Configuration below).
Configuration
By default, this test is run. To disable this feature, set returns_usage_metadata to False in your test class:
    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def returns_usage_metadata(self) -> bool:
            return False
This test can also check the format of specific kinds of usage metadata based on the supported_usage_metadata_details property. This property should be configured as follows with the types of tokens that the model supports tracking:
    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def supported_usage_metadata_details(self) -> dict:
            return {
                "invoke": [
                    "audio_input",
                    "audio_output",
                    "reasoning_output",
                    "cache_read_input",
                    "cache_creation_input",
                ],
                "stream": [
                    "audio_input",
                    "audio_output",
                    "reasoning_output",
                    "cache_read_input",
                    "cache_creation_input",
                ],
            }
Troubleshooting
If this test fails, first verify that your model returns UsageMetadata dicts attached to the returned AIMessage object in _generate:

    return ChatResult(
        generations=[ChatGeneration(
            message=AIMessage(
                content="Output text",
                usage_metadata={
                    "input_tokens": 350,
                    "output_tokens": 240,
                    "total_tokens": 590,
                    "input_token_details": {
                        "audio": 10,
                        "cache_creation": 200,
                        "cache_read": 100,
                    },
                    "output_token_details": {
                        "audio": 10,
                        "reasoning": 200,
                    },
                },
            )
        )]
    )
- Parameters:
model (BaseChatModel)
- Return type:
None
- test_usage_metadata_streaming(model: BaseChatModel) → None [source]#
Test to verify that the model returns correct usage metadata in streaming mode.
Configuration
By default, this test is run. To disable this feature, set returns_usage_metadata to False in your test class:
    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def returns_usage_metadata(self) -> bool:
            return False
This test can also check the format of specific kinds of usage metadata based on the supported_usage_metadata_details property. This property should be configured as follows with the types of tokens that the model supports tracking:
    class TestMyChatModelIntegration(ChatModelIntegrationTests):
        @property
        def supported_usage_metadata_details(self) -> dict:
            return {
                "invoke": [
                    "audio_input",
                    "audio_output",
                    "reasoning_output",
                    "cache_read_input",
                    "cache_creation_input",
                ],
                "stream": [
                    "audio_input",
                    "audio_output",
                    "reasoning_output",
                    "cache_read_input",
                    "cache_creation_input",
                ],
            }
Troubleshooting
If this test fails, first verify that your model yields UsageMetadata dicts attached to the returned AIMessage object in _stream that sum up to the total usage metadata.

Note that input_tokens should only be included on one of the chunks (typically the first or the last chunk), and the rest should have 0 or None to avoid counting input tokens multiple times.
output_tokens typically count the number of tokens in each chunk, not the sum. This test will pass as long as the sum of output_tokens across all chunks is not 0.
    # num_input_tokens and is_first_chunk are placeholders computed by your _stream implementation.
    yield ChatGenerationChunk(
        message=AIMessageChunk(
            content="Output text",
            usage_metadata={
                "input_tokens": (
                    num_input_tokens if is_first_chunk else 0
                ),
                "output_tokens": 11,
                "total_tokens": (
                    11 + num_input_tokens if is_first_chunk else 11
                ),
                "input_token_details": {
                    "audio": 10,
                    "cache_creation": 200,
                    "cache_read": 100,
                },
                "output_token_details": {
                    "audio": 10,
                    "reasoning": 200,
                },
            },
        )
    )
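As a rough sketch of how the totals can be checked from the consumer side, streamed chunks can be merged with +, which accumulates usage_metadata across chunks (assuming the model emits it):

    full = None
    for chunk in model.stream("hello"):
        # AIMessageChunk addition merges content and usage_metadata
        full = chunk if full is None else full + chunk

    assert full.usage_metadata is not None
    assert full.usage_metadata["total_tokens"] > 0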
- Parameters:
model (BaseChatModel)
- Return type:
None