ChatModelIntegrationTests#

class langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests[source]#

Base class for chat model integration tests.

Test subclasses must implement the chat_model_class and chat_model_params properties to specify what model to test and its initialization parameters.

Example:

from typing import Type

from langchain_tests.integration_tests import ChatModelIntegrationTests
from my_package.chat_models import MyChatModel


class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def chat_model_class(self) -> Type[MyChatModel]:
        # Return the chat model class to test here
        return MyChatModel

    @property
    def chat_model_params(self) -> dict:
        # Return initialization parameters for the model.
        return {"model": "model-001", "temperature": 0}

Note

API references for individual test methods include troubleshooting tips.

Test subclasses must implement the following two properties:

chat_model_class

The chat model class to test, e.g., ChatParrotLink.

Example:

@property
def chat_model_class(self) -> Type[ChatParrotLink]:
    return ChatParrotLink
chat_model_params

Initialization parameters for the chat model.

Example:

@property
def chat_model_params(self) -> dict:
    return {"model": "bird-brain-001", "temperature": 0}

In addition, test subclasses can control what features are tested (such as tool calling or multi-modality) by selectively overriding the following properties. Expand to see details:

has_tool_calling

Boolean property indicating whether the chat model supports tool calling.

By default, this is determined by whether the chat model’s bind_tools method is overridden. It typically does not need to be overridden on the test class.

Example override:

@property
def has_tool_calling(self) -> bool:
    return True
tool_choice_value

Value to use for tool choice when used in tests.

Some tests for tool calling features attempt to force tool calling via a tool_choice parameter. A common value for this parameter is “any”. Defaults to None.

Note: if the value is set to “tool_name”, the name of the tool used in each test will be set as the value for tool_choice.

Example:

@property
def tool_choice_value(self) -> Optional[str]:
    return "any"
has_structured_output

Boolean property indicating whether the chat model supports structured output.

By default, this is determined by whether the chat model’s with_structured_output method is overridden. If the base implementation is intended to be used, this method should be overridden.

See: https://python.langchain.com/docs/concepts/structured_outputs/

Example:

@property
def has_structured_output(self) -> bool:
    return True
structured_output_kwargs

Dict property that can be used to specify additional kwargs for with_structured_output. Useful for testing different models.

Example:

@property
def structured_output_kwargs(self) -> dict:
    return {"method": "function_calling"}
supports_json_mode

Boolean property indicating whether the chat model supports JSON mode in with_structured_output.

See: https://python.langchain.com/docs/concepts/structured_outputs/#json-mode

Example:

@property
def supports_json_mode(self) -> bool:
    return True
supports_image_inputs

Boolean property indicating whether the chat model supports image inputs. Defaults to False.

If set to True, the chat model will be tested using content blocks of the form

[
    {"type": "text", "text": "describe the weather in this image"},
    {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
    },
]

See https://python.langchain.com/docs/concepts/multimodality/

Example:

@property
def supports_image_inputs(self) -> bool:
    return True
supports_video_inputs

Boolean property indicating whether the chat model supports image inputs. Defaults to False. No current tests are written for this feature.

returns_usage_metadata

Boolean property indicating whether the chat model returns usage metadata on invoke and streaming responses.

usage_metadata is an optional dict attribute on AIMessages that track input and output tokens: https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.UsageMetadata.html

Example:

@property
def returns_usage_metadata(self) -> bool:
    return False
supports_anthropic_inputs

Boolean property indicating whether the chat model supports Anthropic-style inputs.

These inputs might feature “tool use” and “tool result” content blocks, e.g.,

[
    {"type": "text", "text": "Hmm let me think about that"},
    {
        "type": "tool_use",
        "input": {"fav_color": "green"},
        "id": "foo",
        "name": "color_picker",
    },
]

If set to True, the chat model will be tested using content blocks of this form.

Example:

@property
def supports_anthropic_inputs(self) -> bool:
    return False
supports_image_tool_message

Boolean property indicating whether the chat model supports ToolMessages that include image content, e.g.,

ToolMessage(
    content=[
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
    tool_call_id="1",
    name="random_image",
)

If set to True, the chat model will be tested with message sequences that include ToolMessages of this form.

Example:

@property
def supports_image_tool_message(self) -> bool:
    return False
supported_usage_metadata_details

Property controlling what usage metadata details are emitted in both invoke and stream.

usage_metadata is an optional dict attribute on AIMessages that track input and output tokens: https://python.langchain.com/api_reference/core/messages/langchain_core.messages.ai.UsageMetadata.html

It includes optional keys input_token_details and output_token_details that can track usage details associated with special types of tokens, such as cached, audio, or reasoning.

Only needs to be overridden if these details are supplied.

Attributes

chat_model_class

The chat model class to test, e.g., ChatParrotLink.

chat_model_params

Initialization parameters for the chat model.

has_structured_output

(bool) whether the chat model supports structured output.

has_tool_calling

(bool) whether the model supports tool calling.

returns_usage_metadata

(bool) whether the chat model returns usage metadata on invoke and streaming responses.

structured_output_kwargs

If specified, additional kwargs for with_structured_output.

supported_usage_metadata_details

(dict) what usage metadata details are emitted in invoke and stream.

supports_anthropic_inputs

(bool) whether the chat model supports Anthropic-style inputs.

supports_image_inputs

(bool) whether the chat model supports image inputs, defaults to False.

supports_image_tool_message

(bool) whether the chat model supports ToolMessages that include image content.

supports_json_mode

(bool) whether the chat model supports JSON mode.

supports_video_inputs

(bool) whether the chat model supports video inputs, efaults to False.

tool_choice_value

(None or str) to use for tool choice when used in tests.

Methods

test_abatch(model)

Test to verify that await model.abatch([messages]) works.

test_ainvoke(model)

Test to verify that await model.ainvoke(simple_message) works.

test_anthropic_inputs(model)

Test that model can process Anthropic-style message histories.

test_astream(model)

Test to verify that await model.astream(simple_message) works.

test_batch(model)

Test to verify that model.batch([messages]) works.

test_bind_runnables_as_tools(model)

Test that the model generates tool calls for tools that are derived from LangChain runnables.

test_conversation(model)

Test to verify that the model can handle multi-turn conversations.

test_image_inputs(model)

Test that the model can process image inputs.

test_image_tool_message(model)

Test that the model can process ToolMessages with image inputs.

test_invoke(model)

Test to verify that model.invoke(simple_message) works.

test_json_mode(model)

Test structured output via `JSON mode.

test_message_with_name(model)

Test that HumanMessage with values for the name field can be handled.

test_stop_sequence(model)

Test that model does not fail when invoked with the stop parameter, which is a standard parameter for stopping generation at a certain token.

test_stream(model)

Test to verify that model.stream(simple_message) works.

test_structured_few_shot_examples(model, ...)

Test that the model can process few-shot examples with tool calls.

test_structured_output(model)

Test to verify structured output is generated both on invoke and stream.

test_structured_output_async(model)

Test to verify structured output is generated both on invoke and stream.

test_structured_output_optional_param(model)

Test to verify we can generate structured output that includes optional parameters.

test_structured_output_pydantic_2_v1(model)

Test to verify we can generate structured output using pydantic.v1.BaseModel.

test_tool_calling(model)

Test that the model generates tool calls.

test_tool_calling_async(model)

Test that the model generates tool calls.

test_tool_calling_with_no_arguments(model)

Test that the model generates tool calls for tools with no arguments.

test_tool_message_error_status(model, ...)

Test that ToolMessage with status="error" can be handled.

test_tool_message_histories_list_content(...)

Test that message histories are compatible with list tool contents (e.g. Anthropic format).

test_tool_message_histories_string_content(...)

Test that message histories are compatible with string tool contents (e.g. OpenAI format).

test_usage_metadata(model)

Test to verify that the model returns correct usage metadata.

test_usage_metadata_streaming(model)

Test to verify that the model returns correct usage metadata in streaming mode.

async test_abatch(model: BaseChatModel) None[source]#

Test to verify that await model.abatch([messages]) works.

This should pass for all integrations. Tests the model’s ability to process multiple prompts in a single batch asynchronously.

Troubleshooting

First, debug test_batch() and test_ainvoke() because abatch has a default implementation that calls ainvoke for each message in the batch.

If those tests pass but not this one, you should make sure your abatch method does not raise any exceptions, and that it returns a list of valid AIMessage objects.

Parameters:

model (BaseChatModel)

Return type:

None

async test_ainvoke(model: BaseChatModel) None[source]#

Test to verify that await model.ainvoke(simple_message) works.

This should pass for all integrations. Passing this test does not indicate a “natively async” implementation, but rather that the model can be used in an async context.

Troubleshooting

First, debug test_invoke(). because ainvoke has a default implementation that calls invoke in an async context.

If that test passes but not this one, you should make sure your _agenerate method does not raise any exceptions, and that it returns a valid ChatResult like so:

return ChatResult(
    generations=[ChatGeneration(
        message=AIMessage(content="Output text")
    )]
)
Parameters:

model (BaseChatModel)

Return type:

None

test_anthropic_inputs(model: BaseChatModel) None[source]#

Test that model can process Anthropic-style message histories.

These message histories will include AIMessage objects with tool_use content blocks, e.g.,

AIMessage(
    [
        {"type": "text", "text": "Hmm let me think about that"},
        {
            "type": "tool_use",
            "input": {"fav_color": "green"},
            "id": "foo",
            "name": "color_picker",
        },
    ]
)

as well as HumanMessage objects containing tool_result content blocks:

HumanMessage(
    [
        {
            "type": "tool_result",
            "tool_use_id": "foo",
            "content": [
                {
                    "type": "text",
                    "text": "green is a great pick! that's my sister's favorite color",  # noqa: E501
                }
            ],
            "is_error": False,
        },
        {"type": "text", "text": "what's my sister's favorite color"},
    ]
)

This test should be skipped if the model does not support messages of this form (or doesn’t support tool calling generally). See Configuration below.

Configuration

To disable this test, set supports_anthropic_inputs to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_anthropic_inputs(self) -> bool:
        return False
Troubleshooting

If this test fails, check that:

  1. The model can correctly handle message histories that include message objects with list content.

  2. The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.

  3. HumanMessages with “tool_result” content blocks are correctly handled.

Otherwise, if Anthropic tool call and result formats are not supported, set the supports_anthropic_inputs property to False.

Parameters:

model (BaseChatModel)

Return type:

None

async test_astream(model: BaseChatModel) None[source]#

Test to verify that await model.astream(simple_message) works.

This should pass for all integrations. Passing this test does not indicate a “natively async” or “streaming” implementation, but rather that the model can be used in an async streaming context.

Troubleshooting

First, debug test_stream(). and test_ainvoke(). because astream has a default implementation that calls _stream in an async context if it is implemented, or ainvoke and yields the result as a single chunk if not.

If those tests pass but not this one, you should make sure your _astream method does not raise any exceptions, and that it yields valid ChatGenerationChunk objects like so:

yield ChatGenerationChunk(
    message=AIMessageChunk(content="chunk text")
)
Parameters:

model (BaseChatModel)

Return type:

None

test_batch(model: BaseChatModel) None[source]#

Test to verify that model.batch([messages]) works.

This should pass for all integrations. Tests the model’s ability to process multiple prompts in a single batch.

Troubleshooting

First, debug test_invoke() because batch has a default implementation that calls invoke for each message in the batch.

If that test passes but not this one, you should make sure your batch method does not raise any exceptions, and that it returns a list of valid AIMessage objects.

Parameters:

model (BaseChatModel)

Return type:

None

test_bind_runnables_as_tools(model: BaseChatModel) None[source]#

Test that the model generates tool calls for tools that are derived from LangChain runnables. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. If tool_choice is not supported and the model consistently fails this test, you can xfail the test:

@pytest.mark.xfail(reason=("Does not support tool_choice."))
def test_bind_runnables_as_tools(self, model: BaseChatModel) -> None:
    super().test_bind_runnables_as_tools(model)

Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.

Parameters:

model (BaseChatModel)

Return type:

None

test_conversation(model: BaseChatModel) None[source]#

Test to verify that the model can handle multi-turn conversations.

This should pass for all integrations. Tests the model’s ability to process a sequence of alternating human and AI messages as context for generating the next response.

Troubleshooting

First, debug test_invoke() because this test also uses model.invoke().

If that test passes but not this one, you should verify that: 1. Your model correctly processes the message history 2. The model maintains appropriate context from previous messages 3. The response is a valid AIMessage

Parameters:

model (BaseChatModel)

Return type:

None

test_image_inputs(model: BaseChatModel) None[source]#

Test that the model can process image inputs.

This test should be skipped (see Configuration below) if the model does not support image inputs These will take the form of messages with OpenAI-style image content blocks:

[
    {"type": "text", "text": "describe the weather in this image"},
    {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
    },
]

See https://python.langchain.com/docs/concepts/multimodality/

Configuration

To disable this test, set supports_image_inputs to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_image_inputs(self) -> bool:
        return False
Troubleshooting

If this test fails, check that the model can correctly handle messages with image content blocks in OpenAI format, including base64-encoded images. Otherwise, set the supports_image_inputs property to False.

Parameters:

model (BaseChatModel)

Return type:

None

test_image_tool_message(model: BaseChatModel) None[source]#

Test that the model can process ToolMessages with image inputs.

This test should be skipped if the model does not support messages of the form:

ToolMessage(
    content=[
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
    tool_call_id="1",
    name="random_image",
)

This test can be skipped by setting the supports_image_tool_message property to False (see Configuration below).

Configuration

To disable this test, set supports_image_tool_message to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_image_tool_message(self) -> bool:
        return False
Troubleshooting

If this test fails, check that the model can correctly handle messages with image content blocks in ToolMessages, including base64-encoded images. Otherwise, set the supports_image_tool_message property to False.

Parameters:

model (BaseChatModel)

Return type:

None

test_invoke(model: BaseChatModel) None[source]#

Test to verify that model.invoke(simple_message) works.

This should pass for all integrations.

Troubleshooting

If this test fails, you should make sure your _generate method does not raise any exceptions, and that it returns a valid ChatResult like so:

return ChatResult(
    generations=[ChatGeneration(
        message=AIMessage(content="Output text")
    )]
)
Parameters:

model (BaseChatModel)

Return type:

None

test_json_mode(model: BaseChatModel) None[source]#

Test structured output via JSON mode.

This test is optional and should be skipped if the model does not support the JSON mode feature (see Configuration below).

Configuration

To disable this test, set supports_json_mode to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supports_json_mode(self) -> bool:
        return False
Troubleshooting
Parameters:

model (BaseChatModel)

Return type:

None

test_message_with_name(model: BaseChatModel) None[source]#

Test that HumanMessage with values for the name field can be handled.

These messages may take the form:

HumanMessage("hello", name="example_user")

If possible, the name field should be parsed and passed appropriately to the model. Otherwise, it should be ignored.

Troubleshooting

If this test fails, check that the name field on HumanMessage objects is either ignored or passed to the model appropriately.

Parameters:

model (BaseChatModel)

Return type:

None

test_stop_sequence(model: BaseChatModel) None[source]#

Test that model does not fail when invoked with the stop parameter, which is a standard parameter for stopping generation at a certain token.

More on standard parameters here: https://python.langchain.com/docs/concepts/chat_models/#standard-parameters

This should pass for all integrations.

Troubleshooting

If this test fails, check that the function signature for _generate (as well as _stream and async variants) accepts the stop parameter:

def _generate(
    self,
    messages: List[BaseMessage],
    stop: Optional[List[str]] = None,
    run_manager: Optional[CallbackManagerForLLMRun] = None,
    **kwargs: Any,
) -> ChatResult:
Parameters:

model (BaseChatModel)

Return type:

None

test_stream(model: BaseChatModel) None[source]#

Test to verify that model.stream(simple_message) works.

This should pass for all integrations. Passing this test does not indicate a “streaming” implementation, but rather that the model can be used in a streaming context.

Troubleshooting

First, debug test_invoke(). because stream has a default implementation that calls invoke and yields the result as a single chunk.

If that test passes but not this one, you should make sure your _stream method does not raise any exceptions, and that it yields valid ChatGenerationChunk objects like so:

yield ChatGenerationChunk(
    message=AIMessageChunk(content="chunk text")
)
Parameters:

model (BaseChatModel)

Return type:

None

test_structured_few_shot_examples(model: BaseChatModel, my_adder_tool: BaseTool) None[source]#

Test that the model can process few-shot examples with tool calls.

These are represented as a sequence of messages of the following form:

  • HumanMessage with string content;

  • AIMessage with the tool_calls attribute populated;

  • ToolMessage with string content;

  • AIMessage with string content (an answer);

  • HuamnMessage with string content (a follow-up question).

This test should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

This test uses a utility function in langchain_core to generate a sequence of messages representing “few-shot” examples: https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.tool_example_to_messages.html

If this test fails, check that the model can correctly handle this sequence of messages.

You can xfail the test if tool calling is implemented but this format is not supported.

@pytest.mark.xfail(reason=("Not implemented."))
def test_structured_few_shot_examples(self, *args: Any) -> None:
    super().test_structured_few_shot_examples(*args)
Parameters:
Return type:

None

test_structured_output(model: BaseChatModel) None[source]#

Test to verify structured output is generated both on invoke and stream.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, ensure that the model’s bind_tools method properly handles both JSON Schema and Pydantic V2 models. langchain_core implements a utility function that will accommodate most formats: https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.convert_to_openai_tool.html

See example implementation of with_structured_output here: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI.with_structured_output

Parameters:

model (BaseChatModel)

Return type:

None

async test_structured_output_async(model: BaseChatModel) None[source]#

Test to verify structured output is generated both on invoke and stream.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, ensure that the model’s bind_tools method properly handles both JSON Schema and Pydantic V2 models. langchain_core implements a utility function that will accommodate most formats: https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.convert_to_openai_tool.html

See example implementation of with_structured_output here: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI.with_structured_output

Parameters:

model (BaseChatModel)

Return type:

None

test_structured_output_optional_param(model: BaseChatModel) None[source]#

Test to verify we can generate structured output that includes optional parameters.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, ensure that the model’s bind_tools method properly handles Pydantic V2 models with optional parameters. langchain_core implements a utility function that will accommodate most formats: https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.convert_to_openai_tool.html

See example implementation of with_structured_output here: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI.with_structured_output

Parameters:

model (BaseChatModel)

Return type:

None

test_structured_output_pydantic_2_v1(model: BaseChatModel) None[source]#

Test to verify we can generate structured output using pydantic.v1.BaseModel.

pydantic.v1.BaseModel is available in the pydantic 2 package.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, ensure that the model’s bind_tools method properly handles both JSON Schema and Pydantic V1 models. langchain_core implements a utility function that will accommodate most formats: https://python.langchain.com/api_reference/core/utils/langchain_core.utils.function_calling.convert_to_openai_tool.html

See example implementation of with_structured_output here: https://python.langchain.com/api_reference/_modules/langchain_openai/chat_models/base.html#BaseChatOpenAI.with_structured_output

Parameters:

model (BaseChatModel)

Return type:

None

test_tool_calling(model: BaseChatModel) None[source]#

Test that the model generates tool calls. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. If tool_choice is not supported and the model consistently fails this test, you can xfail the test:

@pytest.mark.xfail(reason=("Does not support tool_choice."))
def test_tool_calling(self, model: BaseChatModel) -> None:
    super().test_tool_calling(model)

Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.

Parameters:

model (BaseChatModel)

Return type:

None

async test_tool_calling_async(model: BaseChatModel) None[source]#

Test that the model generates tool calls. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. If tool_choice is not supported and the model consistently fails this test, you can xfail the test:

@pytest.mark.xfail(reason=("Does not support tool_choice."))
async def test_tool_calling_async(self, model: BaseChatModel) -> None:
    await super().test_tool_calling_async(model)

Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.

Parameters:

model (BaseChatModel)

Return type:

None

test_tool_calling_with_no_arguments(model: BaseChatModel) None[source]#

Test that the model generates tool calls for tools with no arguments. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, check that bind_tools is implemented to correctly translate LangChain tool objects into the appropriate schema for your chat model. It should correctly handle the case where a tool has no arguments.

This test may fail if the chat model does not support a tool_choice parameter. This parameter can be used to force a tool call. It may also fail if a provider does not support this form of tool. In these cases, you can xfail the test:

@pytest.mark.xfail(reason=("Does not support tool_choice."))
def test_tool_calling_with_no_arguments(self, model: BaseChatModel) -> None:
    super().test_tool_calling_with_no_arguments(model)

Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.

Parameters:

model (BaseChatModel)

Return type:

None

test_tool_message_error_status(model: BaseChatModel, my_adder_tool: BaseTool) None[source]#

Test that ToolMessage with status="error" can be handled.

These messages may take the form:

ToolMessage(
    "Error: Missing required argument 'b'.",
    name="my_adder_tool",
    tool_call_id="abc123",
    status="error",
)

If possible, the status field should be parsed and passed appropriately to the model.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, check that the status field on ToolMessage objects is either ignored or passed to the model appropriately.

Otherwise, ensure that the tool_choice_value property is correctly specified on the test class.

Parameters:
Return type:

None

test_tool_message_histories_list_content(model: BaseChatModel, my_adder_tool: BaseTool) None[source]#

Test that message histories are compatible with list tool contents (e.g. Anthropic format).

These message histories will include AIMessage objects with “tool use” and content blocks, e.g.,

[
    {"type": "text", "text": "Hmm let me think about that"},
    {
        "type": "tool_use",
        "input": {"fav_color": "green"},
        "id": "foo",
        "name": "color_picker",
    },
]

This test should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, check that:

  1. The model can correctly handle message histories that include AIMessage objects with list content.

  2. The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.

  3. The model can correctly handle ToolMessage objects with string content and arbitrary string values for tool_call_id.

You can xfail the test if tool calling is implemented but this format is not supported.

@pytest.mark.xfail(reason=("Not implemented."))
def test_tool_message_histories_list_content(self, *args: Any) -> None:
    super().test_tool_message_histories_list_content(*args)
Parameters:
Return type:

None

test_tool_message_histories_string_content(model: BaseChatModel, my_adder_tool: BaseTool) None[source]#

Test that message histories are compatible with string tool contents (e.g. OpenAI format). If a model passes this test, it should be compatible with messages generated from providers following OpenAI format.

This test should be skipped if the model does not support tool calling (see Configuration below).

Configuration

To disable tool calling tests, set has_tool_calling to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def has_tool_calling(self) -> bool:
        return False
Troubleshooting

If this test fails, check that:

  1. The model can correctly handle message histories that include AIMessage objects with "" content.

  2. The tool_calls attribute on AIMessage objects is correctly handled and passed to the model in an appropriate format.

  3. The model can correctly handle ToolMessage objects with string content and arbitrary string values for tool_call_id.

You can xfail the test if tool calling is implemented but this format is not supported.

@pytest.mark.xfail(reason=("Not implemented."))
def test_tool_message_histories_string_content(self, *args: Any) -> None:
    super().test_tool_message_histories_string_content(*args)
Parameters:
Return type:

None

test_usage_metadata(model: BaseChatModel) None[source]#

Test to verify that the model returns correct usage metadata.

This test is optional and should be skipped if the model does not return usage metadata (see Configuration below).

Configuration

By default, this test is run. To disable this feature, set returns_usage_metadata to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def returns_usage_metadata(self) -> bool:
        return False

This test can also check the format of specific kinds of usage metadata based on the supported_usage_metadata_details property. This property should be configured as follows with the types of tokens that the model supports tracking:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supported_usage_metadata_details(self) -> dict:
        return {
            "invoke": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
            "stream": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
        }
Troubleshooting

If this test fails, first verify that your model returns UsageMetadata dicts attached to the returned AIMessage object in _generate:

return ChatResult(
    generations=[ChatGeneration(
        message=AIMessage(
            content="Output text",
            usage_metadata={
                "input_tokens": 350,
                "output_tokens": 240,
                "total_tokens": 590,
                "input_token_details": {
                    "audio": 10,
                    "cache_creation": 200,
                    "cache_read": 100,
                },
                "output_token_details": {
                    "audio": 10,
                    "reasoning": 200,
                }
            }
        )
    )]
)
Parameters:

model (BaseChatModel)

Return type:

None

test_usage_metadata_streaming(model: BaseChatModel) None[source]#

Test to verify that the model returns correct usage metadata in streaming mode.

Configuration

By default, this test is run. To disable this feature, set returns_usage_metadata to False in your test class:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def returns_usage_metadata(self) -> bool:
        return False

This test can also check the format of specific kinds of usage metadata based on the supported_usage_metadata_details property. This property should be configured as follows with the types of tokens that the model supports tracking:

class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def supported_usage_metadata_details(self) -> dict:
        return {
            "invoke": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
            "stream": [
                "audio_input",
                "audio_output",
                "reasoning_output",
                "cache_read_input",
                "cache_creation_input",
            ],
        }
Troubleshooting

If this test fails, first verify that your model yields UsageMetadata dicts attached to the returned AIMessage object in _stream that sum up to the total usage metadata.

Note that input_tokens should only be included on one of the chunks (typically the first or the last chunk), and the rest should have 0 or None to avoid counting input tokens multiple times.

output_tokens typically count the number of tokens in each chunk, not the sum. This test will pass as long as the sum of output_tokens across all chunks is not 0.

yield ChatResult(
    generations=[ChatGeneration(
        message=AIMessage(
            content="Output text",
            usage_metadata={
                "input_tokens": (
                    num_input_tokens if is_first_chunk else 0
                ),
                "output_tokens": 11,
                "total_tokens": (
                    11+num_input_tokens if is_first_chunk else 11
                ),
                "input_token_details": {
                    "audio": 10,
                    "cache_creation": 200,
                    "cache_read": 100,
                },
                "output_token_details": {
                    "audio": 10,
                    "reasoning": 200,
                }
            }
        )
    )]
)
Parameters:

model (BaseChatModel)

Return type:

None