ChatModelIntegrationTests#

class langchain_tests.integration_tests.chat_models.ChatModelIntegrationTests[source]#

Base class for chat model integration tests.

Test subclasses must implement the chat_model_class and chat_model_params properties to specify what model to test and its initialization parameters.

Example:

from typing import Type

from langchain_tests.integration_tests import ChatModelIntegrationTests
from my_package.chat_models import MyChatModel


class TestMyChatModelIntegration(ChatModelIntegrationTests):
    @property
    def chat_model_class(self) -> Type[MyChatModel]:
        # Return the chat model class to test here
        return MyChatModel

    @property
    def chat_model_params(self) -> dict:
        # Return initialization parameters for the model.
        return {"model": "model-001", "temperature": 0}

Note

API references for individual test methods include troubleshooting tips.

Test subclasses must implement the following two properties:

chat_model_class

The chat model class to test, e.g., ChatParrotLink.

Example:

@property
def chat_model_class(self) -> Type[ChatParrotLink]:
    return ChatParrotLink

chat_model_params

Initialization parameters for the chat model.

Example:

@property
def chat_model_params(self) -> dict:
    return {"model": "bird-brain-001", "temperature": 0}

In addition, test subclasses can control what features are tested (such as tool calling or multi-modality) by selectively overriding the following properties. Expand to see details:

enable_vcr_tests

Property controlling whether to enable select tests that rely on VCR caching of HTTP calls, such as benchmarking tests.

To enable these tests, follow these steps:

Override the enable_vcr_tests property to return True:

@property
def enable_vcr_tests(self) -> bool:
    return True

Configure VCR to exclude sensitive headers and other information from cassettes.

Important

VCR will by default record authentication headers and other sensitive information in cassettes. Read below for how to configure what information is recorded in cassettes.

To add configuration to VCR, add a conftest.py file to the tests/ directory and implement the vcr_config fixture there.

langchain-tests excludes the headers 'authorization', 'x-api-key', and 'api-key' from VCR cassettes. To pick up this configuration, you will need to add conftest.py as shown below. You can also exclude additional headers, override the default exclusions, or apply other customizations to the VCR configuration. See example below:

tests/conftest.py#

import pytest
from langchain_tests.conftest import _base_vcr_config as _base_vcr_config

_EXTRA_HEADERS = [
    # Specify additional headers to redact
    ("user-agent", "PLACEHOLDER"),
]


def remove_response_headers(response: dict) -> dict:
    # If desired, remove or modify headers in the response.
    response["headers"] = {}
    return response


@pytest.fixture(scope="session")
def vcr_config(_base_vcr_config: dict) -> dict:  # noqa: F811
    """Extend the default configuration from langchain_tests."""
    config = _base_vcr_config.copy()
    config.setdefault("filter_headers", []).extend(_EXTRA_HEADERS)
    config["before_record_response"] = remove_response_headers

    return config

Run tests to generate VCR cassettes.
Example:
uv run python -m pytest tests/integration_tests/test_chat_models.py::TestMyModel::test_stream_time
This will generate a VCR cassette for the test in tests/integration_tests/cassettes/.

Important

You should inspect the generated cassette to ensure that it does not contain sensitive information. If it does, you can modify the vcr_config fixture to exclude headers or modify the response before it is recorded.

You can then commit the cassette to your repository. Subsequent test runs will use the cassette instead of making HTTP calls.

Attributes

`chat_model_class`	The chat model class to test, e.g., `ChatParrotLink`.
`chat_model_params`	Initialization parameters for the chat model.
`enable_vcr_tests`	(bool) whether to enable VCR tests for the chat model.
`has_structured_output`	(bool) whether the chat model supports structured output.
`has_tool_calling`	(bool) whether the model supports tool calling.
`has_tool_choice`	(bool) whether the model supports tool calling.
`returns_usage_metadata`	(bool) whether the chat model returns usage metadata on invoke and streaming responses.
`structured_output_kwargs`	If specified, additional kwargs for with_structured_output.
`supported_usage_metadata_details`	(dict) what usage metadata details are emitted in invoke and stream.
`supports_anthropic_inputs`	(bool) whether the chat model supports Anthropic-style inputs.
`supports_audio_inputs`	(bool) whether the chat model supports audio inputs, defaults to `False`.
`supports_image_inputs`	(bool) whether the chat model supports image inputs, defaults to `False`.
`supports_image_tool_message`	(bool) whether the chat model supports ToolMessages that include image content.
`supports_image_urls`	(bool) whether the chat model supports image inputs from URLs, defaults to `False`.
`supports_json_mode`	(bool) whether the chat model supports JSON mode.
`supports_pdf_inputs`	(bool) whether the chat model supports PDF inputs, defaults to `False`.
`supports_video_inputs`	(bool) whether the chat model supports video inputs, defaults to `False`.
`tool_choice_value`	(None or str) to use for tool choice when used in tests.

Methods

`test_abatch`(model)	Test to verify that await model.abatch([messages]) works.
`test_agent_loop`(model)	Test that the model supports a simple ReAct agent loop.
`test_ainvoke`(model)	Test to verify that await model.ainvoke(simple_message) works.
`test_anthropic_inputs`(model)	Test that model can process Anthropic-style message histories.
`test_astream`(model)	Test to verify that await model.astream(simple_message) works.
`test_audio_inputs`(model)	Test that the model can process audio inputs.
`test_batch`(model)	Test to verify that model.batch([messages]) works.
`test_bind_runnables_as_tools`(model)	Test that the model generates tool calls for tools that are derived from LangChain runnables.
`test_conversation`(model)	Test to verify that the model can handle multi-turn conversations.
`test_double_messages_conversation`(model)	Test to verify that the model can handle double-message conversations.
`test_image_inputs`(model)	Test that the model can process image inputs.
`test_image_tool_message`(model)	Test that the model can process ToolMessages with image inputs.
`test_invoke`(model)	Test to verify that model.invoke(simple_message) works.
`test_json_mode`(model)	Test structured output via `JSON mode.
`test_message_with_name`(model)	Test that HumanMessage with values for the `name` field can be handled.
`test_pdf_inputs`(model)	Test that the model can process PDF inputs.
`test_stop_sequence`(model)	Test that model does not fail when invoked with the `stop` parameter, which is a standard parameter for stopping generation at a certain token.
`test_stream`(model)	Test to verify that model.stream(simple_message) works.
`test_stream_time`(model, benchmark, vcr)	Test that streaming does not introduce undue overhead.
`test_structured_few_shot_examples`(model, ...)	Test that the model can process few-shot examples with tool calls.
`test_structured_output`(model, schema_type)	Test to verify structured output is generated both on invoke and stream.
`test_structured_output_async`(model, schema_type)	Test to verify structured output is generated both on invoke and stream.
`test_structured_output_optional_param`(model)	Test to verify we can generate structured output that includes optional parameters.
`test_structured_output_pydantic_2_v1`(model)	Test to verify we can generate structured output using pydantic.v1.BaseModel.
`test_tool_calling`(model)	Test that the model generates tool calls.
`test_tool_calling_async`(model)	Test that the model generates tool calls.
`test_tool_calling_with_no_arguments`(model)	Test that the model generates tool calls for tools with no arguments.
`test_tool_choice`(model)	Test that the model can force tool calling via the `tool_choice` parameter.
`test_tool_message_error_status`(model, ...)	Test that ToolMessage with `status="error"` can be handled.
`test_tool_message_histories_list_content`(...)	Test that message histories are compatible with list tool contents (e.g. Anthropic format).
`test_tool_message_histories_string_content`(...)	Test that message histories are compatible with string tool contents (e.g. OpenAI format).
`test_unicode_tool_call_integration`(model, *)	Generic integration test for Unicode characters in tool calls.
`test_usage_metadata`(model)	Test to verify that the model returns correct usage metadata.
`test_usage_metadata_streaming`(model)	Test usage metadata in streaming mode.

async test_abatch( model: BaseChatModel, ) → None[source]#

Test to verify that await model.abatch([messages]) works.

This should pass for all integrations. Tests the model’s ability to process multiple prompts in a single batch asynchronously.

Parameters:: model (BaseChatModel)
Return type:: None

test_agent_loop( model: BaseChatModel, ) → None[source]#

Test that the model supports a simple ReAct agent loop. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Parameters:: model (BaseChatModel)
Return type:: None

async test_ainvoke( model: BaseChatModel, ) → None[source]#

Test to verify that await model.ainvoke(simple_message) works.

This should pass for all integrations. Passing this test does not indicate a “natively async” implementation, but rather that the model can be used in an async context.

Parameters:: model (BaseChatModel)
Return type:: None

test_anthropic_inputs( model: BaseChatModel, ) → None[source]#

Test that model can process Anthropic-style message histories.

These message histories will include AIMessage objects with tool_use content blocks, e.g.,

AIMessage(
    [
        {"type": "text", "text": "Hmm let me think about that"},
        {
            "type": "tool_use",
            "input": {"fav_color": "green"},
            "id": "foo",
            "name": "color_picker",
        },
    ]
)

as well as HumanMessage objects containing tool_result content blocks:

HumanMessage(
    [
        {
            "type": "tool_result",
            "tool_use_id": "foo",
            "content": [
                {
                    "type": "text",
                    "text": "green is a great pick! that's my sister's favorite color",  # noqa: E501
                }
            ],
            "is_error": False,
        },
        {"type": "text", "text": "what's my sister's favorite color"},
    ]
)

This test should be skipped if the model does not support messages of this form (or doesn’t support tool calling generally). See Configuration below.

Parameters:: model (BaseChatModel)
Return type:: None

async test_astream( model: BaseChatModel, ) → None[source]#

Test to verify that await model.astream(simple_message) works.

This should pass for all integrations. Passing this test does not indicate a “natively async” or “streaming” implementation, but rather that the model can be used in an async streaming context.

Parameters:: model (BaseChatModel)
Return type:: None

test_audio_inputs( model: BaseChatModel, ) → None[source]#

Test that the model can process audio inputs.

This test should be skipped (see Configuration below) if the model does not support audio inputs. These will take the form:

{
    "type": "audio",
    "source_type": "base64",
    "data": "<base64 audio data>",
    "mime_type": "audio/wav",  # or appropriate mime-type
}

See https://python.langchain.com/docs/concepts/multimodality/

Parameters:: model (BaseChatModel)
Return type:: None

test_batch( model: BaseChatModel, ) → None[source]#

Test to verify that model.batch([messages]) works.

This should pass for all integrations. Tests the model’s ability to process multiple prompts in a single batch.

Parameters:: model (BaseChatModel)
Return type:: None

test_bind_runnables_as_tools( model: BaseChatModel, ) → None[source]#

Test that the model generates tool calls for tools that are derived from LangChain runnables. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Parameters:: model (BaseChatModel)
Return type:: None

test_conversation( model: BaseChatModel, ) → None[source]#

Test to verify that the model can handle multi-turn conversations.

This should pass for all integrations. Tests the model’s ability to process a sequence of alternating human and AI messages as context for generating the next response.

Parameters:: model (BaseChatModel)
Return type:: None

test_double_messages_conversation( model: BaseChatModel, ) → None[source]#

Test to verify that the model can handle double-message conversations.

This should pass for all integrations. Tests the model’s ability to process a sequence of double-system, double-human, and double-ai messages as context for generating the next response.

Parameters:: model (BaseChatModel)
Return type:: None

test_image_inputs( model: BaseChatModel, ) → None[source]#

Test that the model can process image inputs.

This test should be skipped (see Configuration below) if the model does not support image inputs. These will take the form:

{
    "type": "image",
    "source_type": "base64",
    "data": "<base64 image data>",
    "mime_type": "image/jpeg",  # or appropriate mime-type
}

For backward-compatibility, we must also support OpenAI-style image content blocks:

[
    {"type": "text", "text": "describe the weather in this image"},
    {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
    },
]

See https://python.langchain.com/docs/concepts/multimodality/

If the property supports_image_urls is set to True, the test will also check that we can process content blocks of the form:

{
    "type": "image",
    "source_type": "url",
    "url": "<url>",
}

Parameters:: model (BaseChatModel)
Return type:: None

test_image_tool_message( model: BaseChatModel, ) → None[source]#

Test that the model can process ToolMessages with image inputs.

This test should be skipped if the model does not support messages of the form:

ToolMessage(
    content=[
        {
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
        },
    ],
    tool_call_id="1",
    name="random_image",
)

containing image content blocks in OpenAI Chat Completions format, in addition to messages of the form:

ToolMessage(
    content=[
        {
            "type": "image",
            "source_type": "base64",
            "data": image_data,
            "mime_type": "image/jpeg",
        },
    ],
    tool_call_id="1",
    name="random_image",
)

containing image content blocks in standard format.

This test can be skipped by setting the supports_image_tool_message property to False (see Configuration below).

Parameters:: model (BaseChatModel)
Return type:: None

test_invoke( model: BaseChatModel, ) → None[source]#

Test to verify that model.invoke(simple_message) works.

This should pass for all integrations.

Parameters:: model (BaseChatModel)
Return type:: None

test_json_mode( model: BaseChatModel, ) → None[source]#

Test structured output via JSON mode..

This test is optional and should be skipped if the model does not support the JSON mode feature (see Configuration below).

Parameters:: model (BaseChatModel)
Return type:: None

test_message_with_name( model: BaseChatModel, ) → None[source]#

Test that HumanMessage with values for the name field can be handled.

These messages may take the form:

HumanMessage("hello", name="example_user")

If possible, the name field should be parsed and passed appropriately to the model. Otherwise, it should be ignored.

Parameters:: model (BaseChatModel)
Return type:: None

test_pdf_inputs( model: BaseChatModel, ) → None[source]#

Test that the model can process PDF inputs.

This test should be skipped (see Configuration below) if the model does not support PDF inputs. These will take the form:

{
    "type": "image",
    "source_type": "base64",
    "data": "<base64 image data>",
    "mime_type": "application/pdf",
}

See https://python.langchain.com/docs/concepts/multimodality/

Parameters:: model (BaseChatModel)
Return type:: None

test_stop_sequence( model: BaseChatModel, ) → None[source]#

Test that model does not fail when invoked with the stop parameter, which is a standard parameter for stopping generation at a certain token.

More on standard parameters here: https://python.langchain.com/docs/concepts/chat_models/#standard-parameters

This should pass for all integrations.

Parameters:: model (BaseChatModel)
Return type:: None

test_stream( model: BaseChatModel, ) → None[source]#

Test to verify that model.stream(simple_message) works.

This should pass for all integrations. Passing this test does not indicate a “streaming” implementation, but rather that the model can be used in a streaming context.

Parameters:: model (BaseChatModel)
Return type:: None

test_stream_time( model: BaseChatModel, benchmark: BenchmarkFixture, vcr: Cassette, ) → None[source]#

Test that streaming does not introduce undue overhead.

See enable_vcr_tests dropdown above for more information.

Parameters:

model (BaseChatModel)
benchmark (BenchmarkFixture)
vcr (Cassette)

Return type:

None

test_structured_few_shot_examples( model: BaseChatModel, my_adder_tool: BaseTool, ) → None[source]#

Test that the model can process few-shot examples with tool calls.

These are represented as a sequence of messages of the following form:

HumanMessage with string content;
AIMessage with the tool_calls attribute populated;
ToolMessage with string content;
AIMessage with string content (an answer);
HumanMessage with string content (a follow-up question).

This test should be skipped if the model does not support tool calling (see Configuration below).

Parameters:

model (BaseChatModel)
my_adder_tool (BaseTool)

Return type:

None

test_structured_output( model: BaseChatModel, schema_type: str, ) → None[source]#

Test to verify structured output is generated both on invoke and stream.

This test is optional and should be skipped if the model does not support structured output (see Configuration below).

Parameters:

model (BaseChatModel)
schema_type (str)

Return type:

None

async test_structured_output_async( model: BaseChatModel, schema_type: str, ) → None[source]#

Test to verify structured output is generated both on invoke and stream.

This test is optional and should be skipped if the model does not support structured output (see Configuration below).

Parameters:

model (BaseChatModel)
schema_type (str)

Return type:

None

test_structured_output_optional_param( model: BaseChatModel, ) → None[source]#

Test to verify we can generate structured output that includes optional parameters.

This test is optional and should be skipped if the model does not support structured output (see Configuration below).

Parameters:: model (BaseChatModel)
Return type:: None

test_structured_output_pydantic_2_v1( model: BaseChatModel, ) → None[source]#

Test to verify we can generate structured output using pydantic.v1.BaseModel.

pydantic.v1.BaseModel is available in the pydantic 2 package.

This test is optional and should be skipped if the model does not support structured output (see Configuration below).

Parameters:: model (BaseChatModel)
Return type:: None

test_tool_calling( model: BaseChatModel, ) → None[source]#

Test that the model generates tool calls. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Parameters:: model (BaseChatModel)
Return type:: None

async test_tool_calling_async( model: BaseChatModel, ) → None[source]#

Test that the model generates tool calls. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Parameters:: model (BaseChatModel)
Return type:: None

test_tool_calling_with_no_arguments( model: BaseChatModel, ) → None[source]#

Test that the model generates tool calls for tools with no arguments. This test is skipped if the has_tool_calling property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Parameters:: model (BaseChatModel)
Return type:: None

test_tool_choice( model: BaseChatModel, ) → None[source]#

Test that the model can force tool calling via the tool_choice parameter. This test is skipped if the has_tool_choice property on the test class is set to False.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Parameters:: model (BaseChatModel)
Return type:: None

test_tool_message_error_status( model: BaseChatModel, my_adder_tool: BaseTool, ) → None[source]#

Test that ToolMessage with status="error" can be handled.

These messages may take the form:

ToolMessage(
    "Error: Missing required argument 'b'.",
    name="my_adder_tool",
    tool_call_id="abc123",
    status="error",
)

If possible, the status field should be parsed and passed appropriately to the model.

This test is optional and should be skipped if the model does not support tool calling (see Configuration below).

Parameters:

model (BaseChatModel)
my_adder_tool (BaseTool)

Return type:

None

test_tool_message_histories_list_content( model: BaseChatModel, my_adder_tool: BaseTool, ) → None[source]#

Test that message histories are compatible with list tool contents (e.g. Anthropic format).

These message histories will include AIMessage objects with “tool use” and content blocks, e.g.,

[
    {"type": "text", "text": "Hmm let me think about that"},
    {
        "type": "tool_use",
        "input": {"fav_color": "green"},
        "id": "foo",
        "name": "color_picker",
    },
]

This test should be skipped if the model does not support tool calling (see Configuration below).

Parameters:

model (BaseChatModel)
my_adder_tool (BaseTool)

Return type:

None

test_tool_message_histories_string_content( model: BaseChatModel, my_adder_tool: BaseTool, ) → None[source]#

Test that message histories are compatible with string tool contents (e.g. OpenAI format). If a model passes this test, it should be compatible with messages generated from providers following OpenAI format.

This test should be skipped if the model does not support tool calling (see Configuration below).

assert tool_call.get(“type”) == “tool_call”

You can xfail the test if tool calling is implemented but this format is not supported.

@pytest.mark.xfail(reason=("Not implemented."))
def test_tool_message_histories_string_content(self, *args: Any) -> None:
    super().test_tool_message_histories_string_content(*args)

Parameters:

model (BaseChatModel)
my_adder_tool (BaseTool)

Return type:

None

test_unicode_tool_call_integration( model: BaseChatModel, *, tool_choice: str | None = None, force_tool_call: bool = True, ) → None[source]#

Generic integration test for Unicode characters in tool calls.

Parameters:

model (BaseChatModel) – The chat model to test
tool_choice (str | None) – Tool choice parameter to pass to bind_tools (provider-specific)
force_tool_call (bool) – Whether to force a tool call (use tool_choice=True if None)

Return type:

None

Tests that Unicode characters in tool call arguments are preserved correctly, not escaped as uXXXX sequences.

test_usage_metadata( model: BaseChatModel, ) → None[source]#

Test to verify that the model returns correct usage metadata.

This test is optional and should be skipped if the model does not return usage metadata (see Configuration below).

Changed in version 0.3.17: Additionally check for the presence of model_name in the response metadata, which is needed for usage tracking in callback handlers.

Parameters:: model (BaseChatModel)
Return type:: None

test_usage_metadata_streaming( model: BaseChatModel, ) → None[source]#

Test usage metadata in streaming mode.

Test to verify that the model returns correct usage metadata in streaming mode.

Changed in version 0.3.17: Additionally check for the presence of model_name in the response metadata, which is needed for usage tracking in callback handlers.

Parameters:: model (BaseChatModel)
Return type:: None