# LangChain

## High level

[Why LangChain?](https://python.langchain.com/docs/concepts/why_langchain/): considering using LangChain, when building complex AI applications, and when needing to evaluate AI applications This page discusses the main reasons to use LangChain: standardized component interfaces, orchestration capabilities, and observability/evaluation through LangSmith
[Architecture](https://python.langchain.com/docs/concepts/architecture/): needing an overview of the LangChain architecture, exploring the various packages and components, or deciding which parts to use for a specific application. Provides a high-level overview of the different packages that make up the LangChain framework, including langchain-core, langchain, integration packages, langchain-community, langgraph, langserve, and LangSmith.

## Concepts

[Chat Models](https://python.langchain.com/docs/concepts/chat_models/): building applications using chat models, learning about chat model interfaces and features, or interested in integrating chat models with external tools and services. Provides an overview of chat models in LangChain, including their features, integration options, interfaces, tool calling, structured outputs, multimodality, context windows, and advanced topics like rate-limiting and caching.
[Messages](https://python.langchain.com/docs/concepts/messages/): querying LangChain's chat message format, understanding different message types, building chat applications. Messages are the unit of communication in chat models, representing input/output with roles, content, metadata. Covers SystemMessage, HumanMessage, AIMessage, AIMessageChunk, ToolMessage, RemoveMessage, and legacy FunctionMessage.
[Chat history](https://python.langchain.com/docs/concepts/chat_history/): dealing with chat history, managing chat context, or understanding conversation patterns. Covers chat history structure, conversation patterns between user/assistant/tools, and guidelines for managing chat history to stay within context window.
[Tools](https://python.langchain.com/docs/concepts/tools/): needing an overview of tools in LangChain, wanting to create custom tools, or learning how to pass runtime values to tools. Tools are a way to encapsulate functions with schemas that can be passed to chat models supporting tool calling. The page covers the tool interface, creating tools using the @tool decorator, configuring tool schemas, tool artifacts, special type annotations like InjectedToolArg, and toolkits.
[tool calling](https://python.langchain.com/docs/concepts/tool_calling/): needing to understand how to enable tool calling functionality, how to create tools from functions, how to bind tools to a model that supports tool calling. The page covers the key concepts of tool calling, including tool creation using decorators, tool binding to models, tool calling by models, and tool execution. It provides an overview, recommended usage, and best practices.
[structured outputs](https://python.langchain.com/docs/concepts/structured_outputs/): it needs to return output in a structured format, when working with databases or APIs that require structured data, or when building applications with structured responses. Covers structured output concepts like schema definition and methods like tool calling and JSON mode, as well as helper functions, to instruct models to produce structured outputs conforming to a given schema.
[Memory](https://langchain-ai.github.io/langgraph/concepts/memory/): developing agents with memory capabilities, implementing memory management strategies, or learning about different types of memory for AI agents. Covers topics related to short-term and long-term memory for agents, techniques for managing conversation history and summarizing past conversations, different types of memory (semantic, episodic, procedural), and approaches for writing memories in the hot path or in the background.
[Multimodality](https://python.langchain.com/docs/concepts/multimodality/): needing to understand multimodal capabilities, using chat models with multimodal inputs, or using multimodal retrieval/embeddings. Discusses ability of LangChain components like chat models, embedding models, and vector stores to handle multimodal data like text, images, audio, video. Covers current status and limitations around multimodal inputs and outputs for chat models.
[invoke](https://python.langchain.com/docs/concepts/runnables/): learning how to use the Runnable interface, when working with custom Runnables, and when needing to configure Runnables at runtime. The page covers the Runnable interface, its methods for invocation, batching, streaming, inspecting schemas, and configuration. It explains RunnableConfig, custom Runnables, and configurable Runnables.
[stream](https://python.langchain.com/docs/concepts/streaming/): [building applications that use streaming, building applications that need to display partial results in real-time, building applications that need to provide updates on pipeline or workflow progress] 'This page covers streaming in LangChain, including what can be streamed in LLM applications, the streaming APIs available, how to write custom data to the stream, and how LangChain automatically enables streaming for chat models in certain cases.'
[LCEL](https://python.langchain.com/docs/concepts/lcel/): needing an overview of the LangChain Expression Language (LCEL), deciding whether to use LCEL or not, and understanding how to compose chains using LCEL primitives. Provides an overview of the LCEL, a declarative approach to building chains from existing Runnables, covering its benefits, composition primitives like RunnableSequence and RunnableParallel, the composition syntax, automatic type coercion, and guidance on when to use LCEL versus alternatives like LangGraph.
[Document Loaders](https://python.langchain.com/docs/concepts/document_loaders/): needing to load data from various sources like files, webpages, or databases, or when handling large datasets with lazy loading. Document loaders help load data from different sources into a standardized Document object format, with options for lazy loading of large datasets.
[Retrieval](https://python.langchain.com/docs/concepts/retrieval/): building retrieval systems, understanding query analysis, integrating with databases This page covers key concepts and techniques in retrieval systems, including query analysis (re-writing and construction), vector and lexical indexes, databases, and LangChain's unified retriever interface.
[Text Splitters](https://python.langchain.com/docs/concepts/text_splitters/): working with long documents, handling limited model input sizes, or optimizing retrieval systems This page discusses different strategies for splitting large texts into smaller chunks, including length-based, text structure-based, document structure-based, and semantic meaning-based approaches.
[Embedding Models](https://python.langchain.com/docs/concepts/embedding_models/): LLM should read this page when: 1) Working with text embeddings for search/retrieval 2) Comparing text similarity using embedding vectors 3) Selecting or integrating text embedding models It covers key concepts of embedding models: converting text to numerical vectors, measuring similarity between vectors, embedding models (historical context, interface, integrations), and common similarity metrics (cosine, Euclidean, dot product).
[Vector stores](https://python.langchain.com/docs/concepts/vectorstores/): LLM should read this page when: 1) Building applications that need to index and retrieve information based on semantic similarity 2) Integrating vector databases into their application 3) Exploring advanced vector search and retrieval techniques Vector stores are specialized data stores that enable indexing and retrieving information based on vector representations (embeddings) of data, allowing semantic similarity search over unstructured data like text, images, and audio. The page covers vector store integrations, the core interface, adding/deleting documents, basic and advanced similarity search techniques, and concepts like metadata filtering.
[Retrievers](https://python.langchain.com/docs/concepts/retrievers/): building a retrieval system, integrating different retrieval sources, or linking retrieved information to source documents. This page outlines the retriever interface in LangChain, common types of retrievers such as vector stores and search APIs, and advanced retrieval patterns like ensembling and retaining source document information.
[Retrieval Augmented Generation (RAG)](https://python.langchain.com/docs/concepts/rag/): developing applications that incorporate retrieval and generation, building question-answering systems with external data sources, or optimizing knowledge retrieval and integration into language models. Covers the concept of Retrieval Augmented Generation (RAG), which combines retrieval systems with language models to utilize external knowledge, access up-to-date information, leverage domain-specific expertise, reduce hallucination, and integrate knowledge cost-effectively.
[Agents](https://python.langchain.com/docs/concepts/agents/): building AI agents or systems that take high-level tasks and perform a series of actions to accomplish them, transitioning from the legacy AgentExecutor to the newer and more flexible LangGraph system. Provides an overview of agents in LangChain, the legacy AgentExecutor concept, resources for using AgentExecutor, and guidance on migrating to the preferred LangGraph architecture for building customizable agents.
[Prompt Templates](https://python.langchain.com/docs/concepts/prompt_templates/): creating prompts for language models, formatting chat messages, slotting messages into specific locations in a prompt. This page covers different types of prompt templates (string, chat, messages placeholder) for formatting prompts for language models and chat models.
[Output Parsers](https://python.langchain.com/docs/concepts/output_parsers/): looking for ways to extract structured data from model outputs, parsing model outputs into different formats, or handling errors in parsing. Covers various LangChain output parsers like JSON, XML, CSV, Pandas DataFrame, along with capabilities like output fixing, retrying, and using user-defined formats.
[Few-shot prompting](https://python.langchain.com/docs/concepts/few_shot_prompting/): needing to improve model performance, when deciding how to format few-shot examples, when selecting examples for few-shot prompting The page covers generating examples, number of examples, selecting examples, and formatting examples for few-shot prompting with language models.
[Example Selectors](https://python.langchain.com/docs/concepts/example_selectors/): selecting examples for few-shot prompting, dynamically choosing examples for prompts, or understanding different example selection techniques. The page covers example selectors, which are classes responsible for selecting and formatting examples to include as part of prompts for improved performance with few-shot learning.
[Async programming](https://python.langchain.com/docs/concepts/async/): building asynchronous applications with LangChain, working with async runnables, or handling async API calls. Explains LangChain's asynchronous APIs, delegation to sync methods, performance considerations, compatibility with asyncio, and usage in Jupyter notebooks.
[Callbacks](https://python.langchain.com/docs/concepts/callbacks/): [needing to log, monitor, or stream events in an LLM application] [This page covers LangChain's callback system, which allows hooking into various stages of an LLM application for logging, monitoring, streaming, and other purposes. It explains the different callback events, callback handlers, and how to pass callbacks.]
[Tracing](https://python.langchain.com/docs/concepts/tracing/): tracing the steps of a chain/agent for debugging, understanding the chain's flow, or inspecting intermediary outputs. Discusses the concept of tracing in LangChain, including that traces contain runs which are individual steps, and that tracing provides observability into chains/agents.
[Evaluation](https://python.langchain.com/docs/concepts/evaluation/): evaluating the performance of LLM-powered applications, creating or curating datasets, defining metrics for evaluation This page covers the concept of evaluation in LangChain, including using LangSmith to create datasets, define metrics, track results over time, and run evaluations automatically.
[Testing](https://python.langchain.com/docs/concepts/testing/): testing LangChain components, implementing unit tests, or setting up integration tests This page explains unit tests, integration tests, and standard tests in LangChain, including code examples

## How-to guides

### Installation

[How to: install LangChain packages](https://python.langchain.com/docs/how_to/installation/): installing LangChain packages, learning about the LangChain ecosystem packages, installing specific ecosystem packages This page explains how to install the main LangChain package, as well as different ecosystem packages like langchain-core, langchain-community, langchain-openai, langchain-experimental, langgraph, langserve, langchain-cli, and langsmith SDK.
[How to: use LangChain with different Pydantic versions](https://python.langchain.com/docs/how_to/pydantic_compatibility/): needing to use LangChain with different Pydantic versions, needing to install Pydantic 2 with LangChain, or avoiding using the pydantic.v1 namespace with LangChain APIs. The page explains that LangChain 0.3 uses Pydantic 2 internally and advises users to install Pydantic 2 and avoid using the pydantic.v1 namespace with LangChain APIs.
[How to: return structured data from a model](https://python.langchain.com/docs/how_to/structured_output/): LLM should read this page when: 1) wanting to return structured data from a model, 2) building applications that require structured outputs, 3) exploring techniques for parsing model outputs into objects or schemas. This page covers methods for obtaining structured outputs from language models, including using .with_structured_output(), prompting techniques with output parsers, and handling complex schemas with few-shot examples.
[How to: use chat models to call tools](https://python.langchain.com/docs/how_to/tool_calling/): needing to call tools from chat models, wanting to use chat models to generate structured output, or doing extraction from text using chat models. Explains how to define tool schemas as Python functions, Pydantic/TypedDict classes, or LangChain Tools; bind them to chat models; retrieve tool calls from LLM responses; and optionally parse tool calls into structured objects.
[How to: stream runnables](https://python.langchain.com/docs/how_to/streaming/): Line 1: 'wanting to learn how to stream LLM responses, stream intermediate steps, and configure streaming events.' Line 2: 'This page covers how to use the `stream` and `astream` methods to stream final outputs, how to use `astream_events` to stream both final outputs and intermediate steps, filtering events, propagating callbacks for streaming, and working with input streams.'
[How to: debug your LLM apps](https://python.langchain.com/docs/how_to/debugging/): debugging LLM applications, adding print statements, or logging events for tracing. Covers setting verbose mode to print important events, debug mode to print all events, and using LangSmith for visualizing event traces.

### Components

These are the core building blocks you can use when building applications.

#### Chat models

[Chat Models](https://python.langchain.com/docs/concepts/chat_models/): building applications using chat models, learning about chat model interfaces and features, or interested in integrating chat models with external tools and services. Provides an overview of chat models in LangChain, including their features, integration options, interfaces, tool calling, structured outputs, multimodality, context windows, and advanced topics like rate-limiting and caching.
[here](https://python.langchain.com/docs/integrations/chat/): integrating chat models into an application, using chat models for conversational AI tasks, or choosing between different chat model providers. Provides an overview of chat models integrated with LangChain, including OpenAI, Anthropic, Google, and others. Covers key features like tool calling, structured output, JSON mode, local usage, and multimodal support.

[How to: use chat models to call tools](https://python.langchain.com/docs/how_to/tool_calling/): needing to call tools from chat models, wanting to use chat models to generate structured output, or doing extraction from text using chat models. Explains how to define tool schemas as Python functions, Pydantic/TypedDict classes, or LangChain Tools; bind them to chat models; retrieve tool calls from LLM responses; and optionally parse tool calls into structured objects.
[How to: get models to return structured output](https://python.langchain.com/docs/how_to/structured_output/): wanting to obtain structured output from an LLM, needing to parse JSON/XML/YAML output from an LLM, or looking to use few-shot examples with structured outputs. This page covers using the `.with_structured_output()` method to obtain structured data from LLMs, prompting techniques to elicit structured outputs, and parsing structured outputs.
[How to: cache model responses](https://python.langchain.com/docs/how_to/chat_model_caching/): needing to cache ChatModel responses for efficiency, needing to reduce API calls for cost savings, or during development. This page covers how to use an in-memory cache or a SQLite database for caching ChatModel responses, which can improve performance and reduce costs.
[How to: get log probabilities](https://python.langchain.com/docs/how_to/logprobs/): Line 1: 'seeking to get token-level log probabilities from OpenAI chat models, when needing to understand how log probabilities are represented in LangChain' Line 2: 'Explains how to configure OpenAI chat models to return token log probabilities, and how these are included in the response metadata and streamed responses.'
[How to: create a custom chat model class](https://python.langchain.com/docs/how_to/custom_chat_model/): creating a custom chat model class, integrating a new language model as a chat model, or implementing streaming for a chat model. This page explains how to create a custom chat model class by inheriting from BaseChatModel, and implementing methods like _generate and _stream. It covers handling inputs, messages, streaming, identifying parameters, and contributing custom chat models.
[How to: stream a response back](https://python.langchain.com/docs/how_to/chat_streaming/): LLM should read this page when: 1) It needs to stream chat model responses token-by-token 2) It needs to understand how to use the astream() and astream_events() methods for chat models 3) It wants to see examples of streaming chat model responses synchronously and asynchronously This page explains how to stream chat model responses token-by-token using the astream() and astream_events() methods, and provides examples for synchronous and asynchronous streaming with chat models that support this feature.
[How to: track token usage](https://python.langchain.com/docs/how_to/chat_token_usage_tracking/): tracking token usage for chat models, determining costs of using chat models, implementing token usage tracking in applications. Provides methods to track token usage from OpenAI and Anthropic chat models through AIMessage.usage_metadata, callbacks, and using LangSmith. Covers streaming token usage and aggregating usage across multiple calls.
[How to: track response metadata across providers](https://python.langchain.com/docs/how_to/response_metadata/): needing to access metadata from model responses, wanting to get information like token usage or log probabilities, or checking safety ratings Explains how to access response metadata from various chat model providers like OpenAI, Anthropic, Vertex AI, etc. Shows code examples of retrieving metadata like token usage, log probabilities, and safety ratings.
[How to: use chat models to call tools](https://python.langchain.com/docs/how_to/tool_calling/): needing to call tools from chat models, wanting to use chat models to generate structured output, or doing extraction from text using chat models. Explains how to define tool schemas as Python functions, Pydantic/TypedDict classes, or LangChain Tools; bind them to chat models; retrieve tool calls from LLM responses; and optionally parse tool calls into structured objects.
[How to: stream tool calls](https://python.langchain.com/docs/how_to/tool_streaming/): Line 1: 'wanting to stream tool calls, when needing to handle partial tool call data, or when needing to accumulate tool call chunks' Line 2: 'This page explains how to stream tool calls, merge message chunks to accumulate tool call chunks, and parse tool calls from accumulated chunks, with code examples.'
[How to: handle rate limits](https://python.langchain.com/docs/how_to/chat_model_rate_limiting/): handling rate limits from model providers, running many parallel queries to a model, benchmarking a chat model. The page explains how to initialize and use an in-memory rate limiter with chat models to limit the number of requests made per unit time.
[How to: few shot prompt tool behavior](https://python.langchain.com/docs/how_to/tools_few_shot/): using few-shot examples to improve tool calling, demonstrating how to incorporate example queries and responses into the prompt. The page explains how to create few-shot prompts including examples of tool usage, allowing the model to learn from these demonstrations to improve its ability to correctly call tools for math operations or other tasks.
[How to: bind model-specific formatted tools](https://python.langchain.com/docs/how_to/tools_model_specific/): binding model-specific tools, binding OpenAI tool schemas, invoking model-specific tools This page explains how to bind model-specific tool schemas directly to an LLM, with an example using the OpenAI tool schema format.
[How to: force models to call a tool](https://python.langchain.com/docs/how_to/tool_choice/): needing to force an LLM to call a specific tool, needing to force an LLM to call at least one tool This page shows how to use the tool_choice parameter to force an LLM to call a specific tool or to call at least one tool from a set of available tools.
[How to: work with local models](https://python.langchain.com/docs/how_to/local_llms/): [running LLMs locally on a user's device, using open-source LLMs, utilizing custom prompts with LLMs] [Overview of open-source LLMs and frameworks for running inference locally, instructions for setting up and using local LLMs (Ollama, llama.cpp, GPT4All, llamafile), guidance on formatting prompts for specific LLMs, potential use cases for local LLMs.]
[How to: init any model in one line](https://python.langchain.com/docs/how_to/chat_models_universal_init/): initializing chat models for different model providers, creating a configurable chat model, inferring the model provider from the model name. The page explains how to initialize any LLM chat model integration in one line using the init_chat_model() helper, create a configurable chat model with default or custom parameters, and infer the model provider based on the model name.

#### Messages

[Messages](https://python.langchain.com/docs/concepts/messages/): querying LangChain's chat message format, understanding different message types, building chat applications. Messages are the unit of communication in chat models, representing input/output with roles, content, metadata. Covers SystemMessage, HumanMessage, AIMessage, AIMessageChunk, ToolMessage, RemoveMessage, and legacy FunctionMessage.

[How to: manage large chat history](https://python.langchain.com/docs/how_to/trim_messages/): working with long chat histories, when concerned about token limits for chat models, when implementing token management strategies. This page explains how to use the trim_messages utility to reduce the size of a chat message history to fit within token limits, covering trimming by token count or message count, and allowing customization of trimming strategies.
[How to: filter messages](https://python.langchain.com/docs/how_to/filter_messages/): needing to filter messages by type, id, or name when working with message histories, when using chains/agents that pass message histories between components. Provides instructions and examples for filtering message lists (e.g. to only include human messages) using the filter_messages utility, including basic usage, chaining with models, and API reference.
[How to: merge consecutive messages of the same type](https://python.langchain.com/docs/how_to/merge_message_runs/): it needs to merge consecutive messages of the same type for a particular model, when it wants to compose the merge_message_runs utility with other components in a chain, or when it needs to invoke the merge_message_runs utility imperatively. The page explains how to use the merge_message_runs utility to merge consecutive messages of the same type, provides examples of using it in chains or invoking it directly, and links to the API reference for more details.

#### Prompt templates

[Prompt Templates](https://python.langchain.com/docs/concepts/prompt_templates/): creating prompts for language models, formatting chat messages, slotting messages into specific locations in a prompt. This page covers different types of prompt templates (string, chat, messages placeholder) for formatting prompts for language models and chat models.

[How to: use few shot examples](https://python.langchain.com/docs/how_to/few_shot_examples/): creating few-shot prompts, using example selectors, providing examples to large language models This page explains how to use few-shot examples to provide context to language models, including creating formatters, constructing example sets, using example selectors like SemanticSimilarityExampleSelector, and creating FewShotPromptTemplates.
[How to: use few shot examples in chat models](https://python.langchain.com/docs/how_to/few_shot_examples_chat/): LLM should read this page when: 1) wanting to provide a few-shot example to fine-tune a chat model's output, 2) needing to dynamically select examples from a larger set based on semantic similarity to the input This page covers how to provide few-shot examples to chat models using either fixed examples or dynamically selecting examples from a vectorstore based on semantic similarity to the input.
[How to: partially format prompt templates](https://python.langchain.com/docs/how_to/prompts_partial/): needing to partially format prompt templates, wanting to pass partial strings to templates, or needing to pass functions returning strings to templates. Explains how to partially format prompt templates by passing in a subset of required values as strings or functions that return strings, to create a new template expecting only remaining values.
[How to: compose prompts together](https://python.langchain.com/docs/how_to/prompts_composition/): needing to compose prompts from various prompt components, working with chat prompts, or using the PipelinePromptTemplate class. This page explains how to concatenate different prompt templates together to build larger prompts, covering both string prompts and chat prompts, as well as using the PipelinePromptTemplate to reuse prompt components.

#### Example selectors

[Example Selectors](https://python.langchain.com/docs/concepts/example_selectors/): selecting examples for few-shot prompting, dynamically choosing examples for prompts, or understanding different example selection techniques. The page covers example selectors, which are classes responsible for selecting and formatting examples to include as part of prompts for improved performance with few-shot learning.

[How to: use example selectors](https://python.langchain.com/docs/how_to/example_selectors/): needing to select example prompts for few-shot learning, when having many examples to choose from, or when creating a custom example selector. Explains how to use example selectors in LangChain to select which examples to include in a prompt, covering built-in selectors like similarity and providing a custom example selector.
[How to: select examples by length](https://python.langchain.com/docs/how_to/example_selectors_length_based/): selecting examples for few-shot prompting, handling long examples that may exceed context window, and dynamically including the appropriate number of examples. This page explains how to use the LengthBasedExampleSelector to select examples based on their length, including fewer examples for longer inputs to avoid exceeding the context window.
[How to: select examples by semantic similarity](https://python.langchain.com/docs/how_to/example_selectors_similarity/): selecting relevant examples for few-shot prompting, building example-based systems, finding relevant reference cases This page covers how to select examples by similarity to the input using embedding-based semantic search over a vector store.
[How to: select examples by semantic ngram overlap](https://python.langchain.com/docs/how_to/example_selectors_ngram/): selecting relevant examples to include in few-shot prompts, determining relevancy through n-gram overlap scores, and customizing example selection thresholds. Explains how to use the NGramOverlapExampleSelector to select and order examples based on n-gram overlap with the input text, including setting thresholds and dynamically adding examples.
[How to: select examples by maximal marginal relevance](https://python.langchain.com/docs/how_to/example_selectors_mmr/): needing to select few-shot examples optimizing for both similarity to inputs and diversity from each other, working with example-based prompting for fewshot learning. Demonstrates how to use the MaxMarginalRelevanceExampleSelector, which selects examples by maximizing relevance to inputs while also optimizing for diversity between selected examples, contrasting it with just selecting by similarity.
[How to: select examples from LangSmith few-shot datasets](https://python.langchain.com/docs/how_to/example_selectors_langsmith/): [learning how to use LangSmith datasets for few-shot example selection, dynamically creating few-shot prompts from LangSmith data, integrating LangSmith with LangChain chains] [The page covers setting up LangSmith, querying LangSmith datasets for similar examples, and using those examples in a LangChain chain to create dynamic few-shot prompts for chat models.]

#### LLMs

[LLMs](https://python.langchain.com/docs/concepts/text_llms/): needing an overview of string-based language models, learning about legacy models in LangChain, or comparing string-based models to chat models. Covers LangChain's support for older language models that take strings as input and output, distinguishing them from newer chat models; advises using chat models where possible.

[How to: cache model responses](https://python.langchain.com/docs/how_to/llm_caching/): it needs to cache responses to save money and time, learn about caching in LangChain. LangChain provides an optional caching layer for LLMs to save money and time by reducing API calls for repeated requests. Examples show caching with InMemoryCache and SQLiteCache.
[How to: create a custom LLM class](https://python.langchain.com/docs/how_to/custom_llm/): creating a custom LLM class, wrapping their own LLM provider, integrating with a new language model not yet supported by LangChain. This page explains how to create a custom LLM class by implementing the required _call and _llm_type methods, as well as optional methods like _identifying_params, _acall, _stream, and _astream. It provides an example implementation, demonstrates testing and integration with LangChain APIs, and offers guidance for contributing custom LLM integrations.
[How to: stream a response back](https://python.langchain.com/docs/how_to/streaming_llm/): it needs to stream responses from an LLM, when it needs to work with async streaming from LLMs, when it needs to stream events from an LLM. This page shows how to stream responses token-by-token from LLMs using both sync and async methods, as well as how to stream events from LLMs asynchronously.
[How to: track token usage](https://python.langchain.com/docs/how_to/llm_token_usage_tracking/): tracking token usage for LLM calls, managing costs for an LLM application, or calculating costs based on token counts. The page covers how to track token usage using LangSmith, OpenAI callback handlers, and handling streaming contexts; it also summarizes limitations with legacy models for streaming.
[How to: work with local models](https://python.langchain.com/docs/how_to/local_llms/): [running LLMs locally on a user's device, using open-source LLMs, utilizing custom prompts with LLMs] [Overview of open-source LLMs and frameworks for running inference locally, instructions for setting up and using local LLMs (Ollama, llama.cpp, GPT4All, llamafile), guidance on formatting prompts for specific LLMs, potential use cases for local LLMs.]

#### Output parsers

[Output Parsers](https://python.langchain.com/docs/concepts/output_parsers/): looking for ways to extract structured data from model outputs, parsing model outputs into different formats, or handling errors in parsing. Covers various LangChain output parsers like JSON, XML, CSV, Pandas DataFrame, along with capabilities like output fixing, retrying, and using user-defined formats.

[How to: parse text from message objects](https://python.langchain.com/docs/how_to/output_parser_string/): needing to parse text from message objects, needing to extract text from chat model responses, or working with structured output formats. This page explains how to use the StrOutputParser to extract text from message objects, regardless of the underlying content format, such as text, multimodal data, or structured output.
[How to: use output parsers to parse an LLM response into structured format](https://python.langchain.com/docs/how_to/output_parser_structured/): [needing to parse LLM output into structured data, needing to stream partially parsed structured outputs, using LCEL with output parsers] 'Explains how to use output parsers like PydanticOutputParser to parse LLM text responses into structured formats like Python objects, and how to integrate them with prompts, models, and LCEL streaming.'
[How to: parse JSON output](https://python.langchain.com/docs/how_to/output_parser_json/): LLM should read this page when: 1) Prompting a language model to return JSON output 2) Parsing JSON output from a language model 3) Streaming partial JSON objects from a language model 'This page explains how to use the JsonOutputParser to specify a desired JSON schema, prompt a language model to generate output conforming to that schema, and parse the model's response as JSON. It covers using JsonOutputParser with and without Pydantic, streaming partial JSON objects, and provides code examples.'
[How to: parse XML output](https://python.langchain.com/docs/how_to/output_parser_xml/): needing to parse XML output from a model, when outputting prompts with XML formatting instructions for models, when streaming partial XML results This page shows how to use the XMLOutputParser to parse model output in XML format, including adding XML formatting instructions to prompts and streaming partial XML output
[How to: parse YAML output](https://python.langchain.com/docs/how_to/output_parser_yaml/): LLM should read this page when: 1) Needing to generate YAML output conforming to a specific schema 2) Incorporating YAML output into a larger prompt/chain 3) Parsing YAML output returned by an LLM 'This page explains how to use the YamlOutputParser to parse YAML output from language models, allowing the output to conform to a predefined schema. It covers setting up the parser, constructing prompts with formatting instructions, and chaining the parser with a model.'
[How to: retry when output parsing errors occur](https://python.langchain.com/docs/how_to/output_parser_retry/): [attempting to parse and handle partial or error LLM outputs, troubleshooting output parsing failures, implementing retry logic for parsing] [Explains how to use the RetryOutputParser to handle parsing errors by reprompting the LLM, provides examples for using it with OpenAI models and chaining it with other runnables.]
[How to: try to fix errors in output parsing](https://python.langchain.com/docs/how_to/output_parser_fixing/): needing to handle improperly formatted outputs, attempting to fix formatting issues using an LLM, or parsing outputs that do not conform to a predefined schema. Explains how to use the OutputFixingParser, which wraps another parser and attempts to fix formatting errors by consulting an LLM when the original parser fails.
[How to: write a custom output parser class](https://python.langchain.com/docs/how_to/output_parser_custom/): Line 1: 'creating a custom output parser, implementing a custom parser by inheriting from base classes, or parsing raw model outputs' Line 2: 'Covers how to create custom output parsers using runnable lambdas/generators (recommended) or by inheriting from base parser classes like BaseOutputParser and BaseGenerationOutputParser. Includes examples for simple and more complex parsing scenarios.'

#### Document loaders

[Document Loaders](https://python.langchain.com/docs/concepts/document_loaders/): needing to load data from various sources like files, webpages, or databases, or when handling large datasets with lazy loading. Document loaders help load data from different sources into a standardized Document object format, with options for lazy loading of large datasets.

- [How to: load PDF files](https://python.langchain.com/docs/how_to/document_loader_pdf/)
[How to: load web pages](https://python.langchain.com/docs/how_to/document_loader_web/): LLM should read this page when: - It needs to load and process web pages for question answering or other applications - It needs guidance on using web page content with LangChain 'The page covers how to load web pages into LangChain's Document format, including simple text extraction and advanced parsing of page structure. It demonstrates tools like WebBaseLoader and UnstructuredLoader, and shows how to perform operations like vector search over loaded web content.'
[How to: load CSV data](https://python.langchain.com/docs/how_to/document_loader_csv/): loading CSV files into a sequence of documents, customizing CSV parsing and loading, specifying a column to identify the document source This page explains how to load CSV files into a sequence of Document objects using LangChain's CSVLoader, including customizing the parsing, specifying a source column, and loading from a string.
[How to: load data from a directory](https://python.langchain.com/docs/how_to/document_loader_directory/): loading documents from a file system, handling various file encodings, or using custom document loaders. Shows how to load files from directories using the DirectoryLoader, handle encoding errors, use multithreading, and customize the loader class.
[How to: load HTML data](https://python.langchain.com/docs/how_to/document_loader_html/): loading HTML documents, parsing HTML files with specialized tools, or extracting text from HTML. This page covers how to load HTML documents into LangChain Document objects using Unstructured and BeautifulSoup4, with code examples and API references provided.
[How to: load JSON data](https://python.langchain.com/docs/how_to/document_loader_json/): loading JSON or JSON Lines data into LangChain Documents, or extracting metadata from JSON data. This page explains how to use the JSONLoader to convert JSON and JSONL data into LangChain Documents, including how to extract specific fields into the content and metadata, and provides examples for common JSON structures.
[How to: load Markdown data](https://python.langchain.com/docs/how_to/document_loader_markdown/): needing to load Markdown files, needing to retain Markdown elements, needing to parse Markdown into components This page covers how to load Markdown files into LangChain documents, including retaining elements like titles and lists, and parsing Markdown into components.
[How to: load Microsoft Office data](https://python.langchain.com/docs/how_to/document_loader_office_file/): loading Microsoft Office files (DOCX, XLSX, PPTX) into LangChain, when working with Azure AI Document Intelligence. It covers how to use the AzureAIDocumentIntelligenceLoader to load Office documents into LangChain Documents for further processing.
[How to: write a custom document loader](https://python.langchain.com/docs/how_to/document_loader_custom/): Line 1: 'creating a custom document loader, working with files, or using the GenericLoader abstraction' Line 2: 'This page explains how to create a custom document loader, work with files using BaseBlobParser and Blob, and use the GenericLoader to combine a BlobLoader with a BaseBlobParser.'

#### Text splitters

[Text Splitters](https://python.langchain.com/docs/concepts/text_splitters/): working with long documents, handling limited model input sizes, or optimizing retrieval systems This page discusses different strategies for splitting large texts into smaller chunks, including length-based, text structure-based, document structure-based, and semantic meaning-based approaches.

[How to: recursively split text](https://python.langchain.com/docs/how_to/recursive_text_splitter/): splitting long text into smaller chunks, processing text from languages without word boundaries like Chinese or Japanese, parsing documents for downstream tasks. Covers how to recursively split text by list of characters like newlines and spaces, and options to customize characters for different languages. Discusses chunk size, overlap, and creating LangChain Document objects.
[How to: split HTML](https://python.langchain.com/docs/how_to/split_html/): needing to split HTML content into chunks, preserving semantic structure for better context during processing Explains different techniques to split HTML pages like HTMLHeaderTextSplitter, HTMLSectionSplitter, HTMLSemanticPreservingSplitter; covers preserving tables, lists, custom handlers
[How to: split by character](https://python.langchain.com/docs/how_to/character_text_splitter/): needing to split text by individual characters, needing to control chunk size by character count, needing to handle text with differing chunk sizes. Explains how to split text into chunks by character count, using the CharacterTextSplitter. Covers setting chunk size, overlap, and passing metadata.
[How to: split code](https://python.langchain.com/docs/how_to/code_splitter/): needing to split code into logical chunks, working with code from specific programming languages, or creating language-specific text splitters. Provides examples of using the RecursiveCharacterTextSplitter to split code from various programming languages like Python, JavaScript, Markdown, and others into document chunks based on language-specific separators.
[How to: split Markdown by headers](https://python.langchain.com/docs/how_to/markdown_header_metadata_splitter/): splitting markdown files into chunks, handling headers and metadata in markdown files, constraining chunk sizes in markdown files. This page covers how to split markdown files by headers into chunks, handle metadata associated with headers, and constrain chunk sizes using other text splitters like RecursiveCharacterTextSplitter.
[How to: recursively split JSON](https://python.langchain.com/docs/how_to/recursive_json_splitter/): splitting JSON data into smaller chunks, managing chunk sizes from list content within JSON data. Explains how to split JSON data into smaller chunks while keeping nested objects intact, control chunk sizes, and handle JSON lists by converting them to dictionaries before splitting.
[How to: split text into semantic chunks](https://python.langchain.com/docs/how_to/semantic-chunker/): building an application that needs to split long text into smaller chunks based on semantic meaning, when working with large documents that need to be broken down into semantically coherent sections, or when needing to control the granularity of text splitting. This page explains how to use the SemanticChunker from LangChain to split text into semantically coherent chunks by leveraging embedding models, with options to control the splitting behavior based on percentile, standard deviation, interquartile range, or gradient of embedding distance.
[How to: split by tokens](https://python.langchain.com/docs/how_to/split_by_token/): LLM should read this page when: 1) Splitting long text into chunks while counting tokens 2) Handling non-English languages for text splitting 3) Comparing different tokenizers for text splitting 'The page covers how to split text into chunks based on token count using different tokenizers like tiktoken, spaCy, SentenceTransformers, NLTK, KoNLPY (for Korean), and Hugging Face tokenizers. It explains the approaches, usage, and API references for each tokenizer.'

#### Embedding models

[Embedding Models](https://python.langchain.com/docs/concepts/embedding_models/): LLM should read this page when: 1) Working with text embeddings for search/retrieval 2) Comparing text similarity using embedding vectors 3) Selecting or integrating text embedding models It covers key concepts of embedding models: converting text to numerical vectors, measuring similarity between vectors, embedding models (historical context, interface, integrations), and common similarity metrics (cosine, Euclidean, dot product).
[supported integrations](https://python.langchain.com/docs/integrations/text_embedding/): looking for integrations with embedding models, wanting to compare embedding providers, needing guidance on selecting an embedding model This page documents integrations with various model providers that allow using embeddings in LangChain, covering OpenAI, Azure, Google, AWS, HuggingFace, and other embedding services.

[How to: embed text data](https://python.langchain.com/docs/how_to/embed_text/): it needs to embed text into vectors, when it needs to use text embeddings for tasks like semantic search, and when it needs to understand the interface for text embedding models. This page explains how to use LangChain's Embeddings class to interface with various text embedding model providers, embed documents and queries, and work with the resulting vector representations of text.
[How to: cache embedding results](https://python.langchain.com/docs/how_to/caching_embeddings/): caching document embeddings to improve performance, caching query embeddings to improve performance, or choosing a data store for caching embeddings. This page covers how to use the CacheBackedEmbeddings class to cache document and query embeddings in a ByteStore, demonstrating its usage with a local file store and an in-memory store. It also explains how to specify the cache namespace to avoid collisions.
[How to: create a custom embeddings class](https://python.langchain.com/docs/how_to/custom_embeddings/): needing to use a custom text embedding model, integrating a new text embedding provider, or contributing a new text embedding integration. The page covers implementing custom text embedding models for LangChain by following the Embeddings interface, providing examples, testing, and contributing guidelines.

#### Vector stores

[Vector stores](https://python.langchain.com/docs/concepts/vectorstores/): LLM should read this page when: 1) Building applications that need to index and retrieve information based on semantic similarity 2) Integrating vector databases into their application 3) Exploring advanced vector search and retrieval techniques Vector stores are specialized data stores that enable indexing and retrieving information based on vector representations (embeddings) of data, allowing semantic similarity search over unstructured data like text, images, and audio. The page covers vector store integrations, the core interface, adding/deleting documents, basic and advanced similarity search techniques, and concepts like metadata filtering.
[supported integrations](https://python.langchain.com/docs/integrations/vectorstores/): Line 1: 'integrating vector stores into applications, deciding which vector store to use, or understanding the capabilities of different vector stores' Line 2: 'This page provides an overview of vector stores, which are used to store embedded data and perform similarity search. It lists the different vector stores integrated with LangChain, along with their key features and capabilities.'

[How to: use a vector store to retrieve data](https://python.langchain.com/docs/how_to/vectorstores/): building applications that require searching over large collections of text, when indexing and retrieving relevant information based on similarity between embeddings, and when working with vector databases and embeddings. The page covers how to create and query vector stores, which are used to store embedded vectors of text and search for similar embeddings. It explains how to initialize different vector store options like Chroma, FAISS, and LanceDB, and how to perform similarity searches on them. It also touches on asynchronous operations with vector stores.

#### Retrievers

[Retrievers](https://python.langchain.com/docs/concepts/retrievers/): building a retrieval system, integrating different retrieval sources, or linking retrieved information to source documents. This page outlines the retriever interface in LangChain, common types of retrievers such as vector stores and search APIs, and advanced retrieval patterns like ensembling and retaining source document information.

[How to: use a vector store to retrieve data](https://python.langchain.com/docs/how_to/vectorstore_retriever/): using vector stores for retrieval, implementing maximum marginal relevance retrieval, or specifying additional search parameters. This page explains how to create a retriever from a vector store, how to use maximum marginal relevance retrieval, and how to pass parameters like similarity score thresholds and top-k results.
[How to: generate multiple queries to retrieve data for](https://python.langchain.com/docs/how_to/MultiQueryRetriever/): Line 1: 'improving retrieval results for search queries, retrieving documents from a vector database, or using an LLM to generate multiple queries for a given input' Line 2: 'Explains how to use MultiQueryRetriever to automatically generate multiple queries from an input question using an LLM, retrieve documents for each query, and take the unique union of results to improve retrieval performance.'
[How to: use contextual compression to compress the data retrieved](https://python.langchain.com/docs/how_to/contextual_compression/): [it needs to retrieve relevant information from a large corpus of documents, it needs to filter out irrelevant content from retrieved documents, it needs to compress or shorten documents to focus on query-relevant content] This page discusses contextual compression, a technique that allows retrieving only relevant portions of documents given a query, using various methods like LLM-based extractors/filters, embedding similarity filters, or combinations thereof via pipelines.
[How to: write a custom retriever class](https://python.langchain.com/docs/how_to/custom_retriever/): learning how to create a custom retriever, when implementing custom retrieval logic, when adding retrieval capabilities to an application. Explains how to implement a custom Retriever class by extending BaseRetriever, including providing examples and guidelines for contributing custom retrievers.
[How to: add similarity scores to retriever results](https://python.langchain.com/docs/how_to/add_scores_retriever/): needing to incorporate similarity/relevance scores from retrievers, using vector or multi-vector retrievers, or propagating scores through custom retriever subclasses Shows how to add similarity scores from retrievers like Vector Store Retrievers, SelfQueryRetriever, and MultiVectorRetriever to the metadata of retrieved documents
[How to: combine the results from multiple retrievers](https://python.langchain.com/docs/how_to/ensemble_retriever/): combining results from multiple retriever algorithms, leveraging different retrieval strengths, or using a hybrid search approach. The page explains how to use the EnsembleRetriever to combine results from sparse and dense retrievers, outlines basic usage, and demonstrates runtime configuration of individual retrievers.
[How to: reorder retrieved results to mitigate the "lost in the middle" effect](https://python.langchain.com/docs/how_to/long_context_reorder/): looking to improve performance of RAG applications, mitigating the "lost in the middle" effect, reordering retrieved results for longer contexts. Explains how to reorder retrieved documents to position the most relevant at the beginning and end, with less relevant in the middle, helping surface important information for language models.
[How to: generate multiple embeddings per document](https://python.langchain.com/docs/how_to/multi_vector/): needing to retrieve documents using multiple vector embeddings per document, when working with long documents that need to be split into chunks, when using document summaries for retrieval. This page covers how to index documents using 1) document chunks, 2) summaries generated with an LLM, and 3) hypothetical questions generated with an LLM. It demonstrates the usage of the MultiVectorRetriever to retrieve parent documents based on vector embeddings of chunks/summaries/questions.
[How to: retrieve the whole document for a chunk](https://python.langchain.com/docs/how_to/parent_document_retriever/): [1) wanting to retrieve larger documents instead of just smaller chunks for context, 2) trying to balance keeping context while splitting long documents] [The page explains how to use the ParentDocumentRetriever, which first splits documents into small chunks for indexing but then retrieves the larger parent documents those chunks came from during retrieval. It shows code examples for retrieving full documents as well as larger chunks rather than full documents.]
[How to: generate metadata filters](https://python.langchain.com/docs/how_to/self_query/): needing to perform retrieval on documents based on semantic similarity to the query text and metadata filters, integrating the retrieval into a question-answering pipeline. Covers creating a Self Query Retriever which can perform semantic text retrieval and structured metadata filtering in one step, using an underlying vector store and a query constructor LLM chain to parse natural language queries into structured representations.
[How to: create a time-weighted retriever](https://python.langchain.com/docs/how_to/time_weighted_vectorstore/): it needs to retrieve documents from a vector store considering both semantic similarity and time decay, it needs to simulate time for testing purposes, or it needs to adjust the balance between semantic similarity and recency in retrieving documents. This page explains how to use the TimeWeightedVectorStoreRetriever, which combines semantic similarity scores from a vector store with a time decay factor that reduces the relevance of older documents over time, and provides examples of using different decay rates and mocking time for testing.
[How to: use hybrid vector and keyword retrieval](https://python.langchain.com/docs/how_to/hybrid/): LLM should read this page when: 1) It needs to perform hybrid search combining vector and other search techniques 2) It uses a vectorstore that supports hybrid search capabilities Explains how to configure and invoke LangChain chains to leverage hybrid search features of vectorstores like Astra DB, ElasticSearch, etc.

#### Indexing

Indexing is the process of keeping your vectorstore in-sync with the underlying data source.

[How to: reindex data to keep your vectorstore in-sync with the underlying data source](https://python.langchain.com/docs/how_to/indexing/): needing to index documents into a vector store, handling content deduplication and document mutations over time, or cleaning up old/deleted documents from the store. Covers the LangChain indexing API workflow, including deletion modes, using document loaders, and setting source metadata for documents to handle mutations and deletions properly.

#### Tools

[Tools](https://python.langchain.com/docs/concepts/tools/): needing an overview of tools in LangChain, wanting to create custom tools, or learning how to pass runtime values to tools. Tools are a way to encapsulate functions with schemas that can be passed to chat models supporting tool calling. The page covers the tool interface, creating tools using the @tool decorator, configuring tool schemas, tool artifacts, special type annotations like InjectedToolArg, and toolkits.

[How to: define a custom tool](https://python.langchain.com/docs/how_to/custom_tools/): creating custom tools for agents, converting functions or runnables to tools, or subclassing BaseTool. This page covers creating tools from functions using the @tool decorator or StructuredTool class, creating tools from Runnables, subclassing BaseTool for custom tools, creating async tools, handling tool errors, and returning artifacts from tool execution.
[How to: use built-in tools and toolkits](https://python.langchain.com/docs/how_to/tools_builtin/): needing to use built-in LangChain tools or toolkits, needing to customize built-in LangChain tools. This page covers how to use LangChain's built-in tools and toolkits, including customizing tool names, descriptions, and argument schemas. It also explains how to use LangChain toolkits, which are collections of tools for specific tasks.
[How to: use chat models to call tools](https://python.langchain.com/docs/how_to/tool_calling/): needing to call tools from chat models, wanting to use chat models to generate structured output, or doing extraction from text using chat models. Explains how to define tool schemas as Python functions, Pydantic/TypedDict classes, or LangChain Tools; bind them to chat models; retrieve tool calls from LLM responses; and optionally parse tool calls into structured objects.
[How to: pass tool outputs to chat models](https://python.langchain.com/docs/how_to/tool_results_pass_to_model/): 1) integrating tools with chat models, 2) implementing tool calling functionality, 3) passing tool outputs back to chat models. Demonstrates how to pass tool function outputs back to chat models as tool messages, allowing the model to incorporate tool results in generating a final response.
[How to: pass run time values to tools](https://python.langchain.com/docs/how_to/tool_runtime/): it needs to pass runtime values to tools, when it needs to prevent an LLM from generating certain tool arguments, and when it needs to inject arguments directly at runtime. This page explains how to use the InjectedToolArg annotation to mark certain parameters of a Tool as being injected at runtime, preventing the LLM from generating those arguments. It also shows how to inject the arguments at runtime and create a tool-executing chain.
[How to: add a human-in-the-loop for tools](https://python.langchain.com/docs/how_to/tools_human/): adding human approval to tool calling, allowing human intervention in a workflow, or setting up fail-safes for sensitive operations. This page demonstrates how to add a human-in-the-loop step to approve or reject tool calls made by an LLM in a tool-calling chain using LangChain.
[How to: handle tool errors](https://python.langchain.com/docs/how_to/tools_error/): needing to handle errors that occur when tools are called by an LLM, when building fault tolerance into tool-calling chains, or when enabling self-correction for tool calling errors. The page covers strategies like try/except for tool calls, fallbacks to different models, retrying with exceptions passed to the LLM, and creating custom tool exceptions.
[How to: force models to call a tool](https://python.langchain.com/docs/how_to/tool_choice/): needing to force an LLM to call a specific tool, needing to force an LLM to call at least one tool This page shows how to use the tool_choice parameter to force an LLM to call a specific tool or to call at least one tool from a set of available tools.
[How to: disable parallel tool calling](https://python.langchain.com/docs/how_to/tool_calling_parallel/): considering disabling parallel tool calling, when looking for examples on parallel vs. single tool calls, when trying to control the number of tool calls made. Explains how to disable parallel tool calling in LangChain so that only one tool is called at a time, providing code examples.
[How to: access the `RunnableConfig` from a tool](https://python.langchain.com/docs/how_to/tool_configure/): accessing or configuring runtime behavior of sub-runnables from a custom tool, streaming events from child runnables within a tool This page explains how to access the RunnableConfig from within a custom tool to configure sub-invocations and stream events from those sub-invocations
[How to: stream events from a tool](https://python.langchain.com/docs/how_to/tool_stream_events/): Line 1: 'it needs to stream events from a tool, when it needs to configure tools to access internal runnables, or when it needs to propagate configurations to child runnables in async environments' Line 2: 'Guide on how to stream events from tools that call chat models, retrievers, or other runnables, by accessing internal events and propagating configurations, with examples and explanations for compatibility across Python versions'
[How to: return artifacts from a tool](https://python.langchain.com/docs/how_to/tool_artifacts/): returning structured data from a tool, passing artifacts to downstream components, handling custom data types from tools This page explains how tools can return artifacts separate from model input, allowing custom objects, dataframes, or images to be passed to downstream components while limiting model exposure.
[How to: convert Runnables to tools](https://python.langchain.com/docs/how_to/convert_runnable_to_tool/): Line 1: 'needing to convert a Python function or Runnable into a LangChain tool, when building an agent that calls external tools, or when integrating a custom tool into a chat model' Line 2: 'Demonstrates how to use the Runnable.as_tool() method to convert a Runnable to a tool with a name, description, and arguments schema. Includes examples of agents calling tools created from Runnables.'
[How to: add ad-hoc tool calling capability to models](https://python.langchain.com/docs/how_to/tools_prompting/): LLM should read this page when: 1) Adding ad-hoc tool calling capability to chat models/LLMs, 2) Using models not fine-tuned for tool calling, 3) Invoking custom tools from LLMs 'This guide demonstrates how to create prompts that instruct LLMs to request tool invocations, parse the LLM output to extract tool and arguments, invoke the requested tool, and return the tool output.'
[How to: pass runtime secrets to a runnable](https://python.langchain.com/docs/how_to/runnable_runtime_secrets/): needing to pass sensitive data to a runnable, ensuring secrets remain hidden from tracing, or integrating secret values with runnables. Explains how to pass runtime secrets to runnables using RunnableConfig, allowing certain keys to be hidden from tracing while still being accessible during invocation.

#### Multimodal

[How to: pass multimodal data directly to models](https://python.langchain.com/docs/how_to/multimodal_inputs/): needing to pass multimodal data (images, videos, etc.) to models, when working with models that support multimodal input and tool calling capabilities, and when looking to understand how to encode and pass different types of multimodal data. This page demonstrates how to pass multimodal input like images directly to LLMs and chat models, covering encoding techniques, passing single/multiple images, and invoking models with image/multimodal content. It also shows how to use multimodal models for tool calling.
[How to: use multimodal prompts](https://python.langchain.com/docs/how_to/multimodal_prompts/): wanting to pass multimodal data like images to an LLM, when wanting to send multiple pieces of multimodal data to an LLM, when wanting instructions on how to format multimodal prompts. This shows how to use prompt templates to format multimodal inputs like images to models that support it, including sending multiple images, and comparing images.

#### Agents

:::note

[LangGraph](https://langchain-ai.github.io/langgraph/): learning about LangGraph, considering using LangGraph for an AI application, or deciding between LangGraph and alternatives. Overview of LangGraph as an open-source framework for building AI agents, its key features like reliability and customizability, its ecosystem integration with other LangChain products, and additional learning resources.

:::

[How to: use legacy LangChain Agents (AgentExecutor)](https://python.langchain.com/docs/how_to/agent_executor/): building agents with specific tools, when working with chat history, when using language models for tool calling. This page explains how to build agents with AgentExecutor that can call tools like search engines and retrievers, how to add chat history to agents, and how to use language models to determine which tools to call.
[How to: migrate from legacy LangChain agents to LangGraph](https://python.langchain.com/docs/how_to/migrate_agent/): LLM should read this page when: 1) Migrating from legacy LangChain agents to LangGraph 2) Comparing the functionality of LangChain and LangGraph agents This page provides a detailed guide on migrating from legacy LangChain agents to LangGraph agents, covering topics such as basic usage, prompt templates, memory handling, iterating through steps, dealing with intermediate steps, setting iteration and execution time limits, early stopping methods, and trimming intermediate steps.

#### Callbacks

[Callbacks](https://python.langchain.com/docs/concepts/callbacks/): [needing to log, monitor, or stream events in an LLM application] [This page covers LangChain's callback system, which allows hooking into various stages of an LLM application for logging, monitoring, streaming, and other purposes. It explains the different callback events, callback handlers, and how to pass callbacks.]

[How to: pass in callbacks at runtime](https://python.langchain.com/docs/how_to/callbacks_runtime/): needing to pass callback handlers at runtime to capture events, needing to attach handlers to nested objects This page explains how to pass callback handlers at runtime when invoking a runnable, which allows capturing events from all nested objects without manually attaching handlers.
[How to: attach callbacks to a module](https://python.langchain.com/docs/how_to/callbacks_attach/): attaching callbacks to a runnable, reusing callbacks across multiple executions, composing a chain of runnables This page explains how to attach callbacks to a runnable using the .with_config() method, allowing callbacks to be reused across multiple executions and propagated to child components in a chain of runnables.
[How to: pass callbacks into a module constructor](https://python.langchain.com/docs/how_to/callbacks_constructor/): LLM should read this page when: 1) Implementing callbacks in LangChain, 2) Understanding the scope of constructor callbacks, 3) Deciding whether to use constructor or runtime callbacks 'This page explains how to pass callbacks into the constructor of LangChain objects, and that constructor callbacks are scoped only to the object they are defined on, not inherited by child objects.'
[How to: create custom callback handlers](https://python.langchain.com/docs/how_to/custom_callbacks/): creating custom behavior for LangChain components, customizing callback events, implementing event handlers This page explains how to create custom callback handlers by implementing callback methods and attaching the handler to LangChain components
[How to: use callbacks in async environments](https://python.langchain.com/docs/how_to/callbacks_async/): needing to use callbacks in async environments, handling sync callbacks in async methods, using AsyncCallbackHandler Covers using callbacks with async APIs, avoiding blocking with AsyncCallbackHandler, propagating callbacks in async runnables, example of sync and async callback handlers
[How to: dispatch custom callback events](https://python.langchain.com/docs/how_to/callbacks_custom_events/): dispatching custom callback events, handling async or sync custom callback events, or consuming custom events via the astream events API. This page covers how to dispatch custom callback events from within a Runnable, consume these events via async/sync callback handlers, and access custom events through the astream events API.

#### Custom

All of LangChain components can easily be extended to support your own versions.

[How to: create a custom chat model class](https://python.langchain.com/docs/how_to/custom_chat_model/): creating a custom chat model class, integrating a new language model as a chat model, or implementing streaming for a chat model. This page explains how to create a custom chat model class by inheriting from BaseChatModel, and implementing methods like _generate and _stream. It covers handling inputs, messages, streaming, identifying parameters, and contributing custom chat models.
[How to: create a custom LLM class](https://python.langchain.com/docs/how_to/custom_llm/): creating a custom LLM class, wrapping their own LLM provider, integrating with a new language model not yet supported by LangChain. This page explains how to create a custom LLM class by implementing the required _call and _llm_type methods, as well as optional methods like _identifying_params, _acall, _stream, and _astream. It provides an example implementation, demonstrates testing and integration with LangChain APIs, and offers guidance for contributing custom LLM integrations.
[How to: create a custom embeddings class](https://python.langchain.com/docs/how_to/custom_embeddings/): needing to use a custom text embedding model, integrating a new text embedding provider, or contributing a new text embedding integration. The page covers implementing custom text embedding models for LangChain by following the Embeddings interface, providing examples, testing, and contributing guidelines.
[How to: write a custom retriever class](https://python.langchain.com/docs/how_to/custom_retriever/): learning how to create a custom retriever, when implementing custom retrieval logic, when adding retrieval capabilities to an application. Explains how to implement a custom Retriever class by extending BaseRetriever, including providing examples and guidelines for contributing custom retrievers.
[How to: write a custom document loader](https://python.langchain.com/docs/how_to/document_loader_custom/): Line 1: 'creating a custom document loader, working with files, or using the GenericLoader abstraction' Line 2: 'This page explains how to create a custom document loader, work with files using BaseBlobParser and Blob, and use the GenericLoader to combine a BlobLoader with a BaseBlobParser.'
[How to: write a custom output parser class](https://python.langchain.com/docs/how_to/output_parser_custom/): Line 1: 'creating a custom output parser, implementing a custom parser by inheriting from base classes, or parsing raw model outputs' Line 2: 'Covers how to create custom output parsers using runnable lambdas/generators (recommended) or by inheriting from base parser classes like BaseOutputParser and BaseGenerationOutputParser. Includes examples for simple and more complex parsing scenarios.'
[How to: create custom callback handlers](https://python.langchain.com/docs/how_to/custom_callbacks/): creating custom behavior for LangChain components, customizing callback events, implementing event handlers This page explains how to create custom callback handlers by implementing callback methods and attaching the handler to LangChain components
[How to: define a custom tool](https://python.langchain.com/docs/how_to/custom_tools/): creating custom tools for agents, converting functions or runnables to tools, or subclassing BaseTool. This page covers creating tools from functions using the @tool decorator or StructuredTool class, creating tools from Runnables, subclassing BaseTool for custom tools, creating async tools, handling tool errors, and returning artifacts from tool execution.
[How to: dispatch custom callback events](https://python.langchain.com/docs/how_to/callbacks_custom_events/): dispatching custom callback events, handling async or sync custom callback events, or consuming custom events via the astream events API. This page covers how to dispatch custom callback events from within a Runnable, consume these events via async/sync callback handlers, and access custom events through the astream events API.

#### Serialization

[How to: save and load LangChain objects](https://python.langchain.com/docs/how_to/serialization/): needing to save and reload LangChain objects, handle API keys securely when serializing/deserializing objects, and maintain compatibility when deserializing objects across different versions of LangChain. This page discusses how to save and load serializable LangChain objects like chains, messages, and documents using the dump/load functions, which separate API keys and ensure cross-version compatibility. Examples are provided for serializing/deserializing to JSON strings, Python dicts, and disk files.

## Use cases

These guides cover use-case specific details.

### Q&A with RAG

Retrieval Augmented Generation (RAG) is a way to connect LLMs to external sources of data.
[this guide](https://python.langchain.com/docs/tutorials/rag/): building a retrieval-augmented question-answering system, when needing to index and search through unstructured data sources, when learning about key concepts like document loaders, text splitters, vector stores, and retrievers. This tutorial covers how to build a Q&A application over textual data by loading documents, splitting them into chunks, embedding and storing the chunks in a vector store, retrieving relevant chunks for a user query, and generating an answer using a language model with the retrieved context.

[How to: add chat history](https://python.langchain.com/docs/how_to/qa_chat_history_how_to/): building a conversational question-answering application, incorporating chat history and retrieval from external knowledge sources, and deciding between using chains or agents for the application logic. Discusses building chat applications with LangChain by using chains for predictable retrieval steps or agents for more dynamic reasoning. Covers setting up components like embeddings and vector stores, constructing chains with tool calls for retrieval, and assembling LangGraph agents with a ReAct executor. Provides examples for testing the applications.
[How to: stream](https://python.langchain.com/docs/how_to/qa_streaming/): LLM should read this page when: 1) Building a RAG (Retrieval Augmented Generation) application that requires streaming final outputs or intermediate steps 2) Integrating streaming capabilities into an existing LLM-based application 'The page provides guidance on how to stream final outputs and intermediate steps from a RAG (Retrieval Augmented Generation) application built with LangChain and LangGraph. It covers setting up the necessary components, constructing the RAG application, and utilizing different streaming modes to stream tokens from the final output or individual state updates from each step.'
[How to: return sources](https://python.langchain.com/docs/how_to/qa_sources/): LLM should read this page when: 1) Building a question-answering (QA) application that needs to return the sources used to generate the answer. 2) Implementing a conversational QA system with retrieval-augmented generation (RAG). 3) Structuring model outputs to include sources or citations. 'This guide explains how to configure LangChain's QA and RAG workflows to retrieve and return the source documents or citations used to generate the final answer. It covers both basic RAG and conversational RAG architectures, and demonstrates techniques for structuring the model output to include source information.'
[How to: return citations](https://python.langchain.com/docs/how_to/qa_citations/): seeking to add citations to results from a Retrieval Augmented Generation (RAG) application, when wanting to justify an answer using source material, and when needing to provide evidence for generated outputs. The page covers various methods for getting a RAG application to cite sources used in generating answers, including tool-calling to return source IDs or text snippets, direct prompting to generate structured outputs with citations, retrieving and compressing context to minimize need for citations, and post-processing generated answers to annotate with citations.
[How to: do per-user retrieval](https://python.langchain.com/docs/how_to/qa_per_user/): needing to configure retrieval chains for per-user data access, wanting to limit document access for different users, or building retrieval applications with multi-tenant architectures. Explains how to configure retriever search kwargs to limit retrieved documents based on user, demonstrates code example using Pinecone namespace for multi-tenancy.


### Extraction

Extraction is when you use LLMs to extract structured information from unstructured text.
[this guide](https://python.langchain.com/docs/tutorials/extraction/): building information extraction applications, understanding how to use reference examples for improving extraction performance, or when needing to extract structured data from unstructured text. This tutorial covers building an information extraction chain using LangChain, defining schemas for extracting structured data, using reference examples to improve extraction quality, and extracting multiple entities from text.

[How to: use reference examples](https://python.langchain.com/docs/how_to/extraction_examples/): wanting to use reference examples to improve extraction quality, wanting to structure example inputs and outputs for extraction, wanting to test an extraction model with and without examples. This page explains how to define reference examples in the format expected for the LangChain tool calling API, how to incorporate these examples into prompts, and how using examples can improve extraction performance compared to not using examples.
[How to: handle long text](https://python.langchain.com/docs/how_to/extraction_long_text/): working with large documents or PDFs that exceed the context window of the LLM, when needing to extract structured information from text. This page covers strategies for handling long text when doing information extraction, including a brute force approach of chunking the text and extracting from each chunk, and a retrieval-augmented generation (RAG) approach of indexing the chunks and only extracting from relevant ones. It also discusses common issues with these approaches.
[How to: do extraction without using function calling](https://python.langchain.com/docs/how_to/extraction_parse/): looking to extract structured data from text, when needing to parse model outputs into objects, or when wanting to avoid using tool calling methods for extraction tasks. This page explains how to use prompting instructions to get LLMs to generate outputs in a structured format like JSON, and then use output parsers to convert the model responses into Python objects.

### Chatbots

Chatbots involve using an LLM to have a conversation.
[this guide](https://python.langchain.com/docs/tutorials/chatbot/): building a chatbot application, incorporating conversational history, or using prompt templates. This page demonstrates how to build a chatbot with LangChain, including adding message persistence, prompt templates, conversation history management, and response streaming.

[How to: manage memory](https://python.langchain.com/docs/how_to/chatbots_memory/): LLM should read this page when: 1) Building a chatbot and wants to incorporate memory (chat history) 2) Looking to add context from previous messages to improve responses 3) Needs techniques to handle long conversations by summarizing or trimming history 'The page covers different techniques to add memory capabilities to chatbots, including passing previous messages directly, automatic history management using LangGraph persistence, trimming messages to reduce context, and generating summaries of conversations. Examples in Python are provided for each approach.'
[How to: do retrieval](https://python.langchain.com/docs/how_to/chatbots_retrieval/): building a retrieval-augmented chatbot, adding conversational context to retrieval queries, or streaming responses from a chatbot. This page covers setting up a retriever over a document corpus, creating document chains and retrieval chains, transforming queries for better retrieval, and streaming responses from the retrieval chain.
[How to: use tools](https://python.langchain.com/docs/how_to/chatbots_tools/): looking to integrate tools into chatbots, when using agents with tools, when incorporating web search into conversational agents. The page covers how to create a conversational agent using LangChain that can interact with APIs and web search tools, while maintaining chat history. It demonstrates setting up a ReAct agent with a Tavily search tool, invoking the agent, handling conversational responses with chat history, and adding memory.
[How to: manage large chat history](https://python.langchain.com/docs/how_to/trim_messages/): working with long chat histories, when concerned about token limits for chat models, when implementing token management strategies. This page explains how to use the trim_messages utility to reduce the size of a chat message history to fit within token limits, covering trimming by token count or message count, and allowing customization of trimming strategies.

### Query analysis

Query Analysis is the task of using an LLM to generate a query to send to a retriever.
[this guide](https://python.langchain.com/docs/tutorials/rag/#query-analysis): LLM should read this page when: 1) Building a question-answering application over unstructured data 2) Learning about Retrieval Augmented Generation (RAG) architectures 3) Indexing data for use with LLMs 'This tutorial covers building a Retrieval Augmented Generation (RAG) application that can answer questions based on ingested data. It walks through loading data, chunking it, embedding and storing it in a vector store, retrieving relevant chunks for a given query, and generating an answer using an LLM. It also shows how to incorporate query analysis for improved retrieval.'

[How to: add examples to the prompt](https://python.langchain.com/docs/how_to/query_few_shot/): needing to guide an LLM to generate queries, when fine-tuning an LLM for query generation, when incorporating examples into few-shot prompts. This page covers how to add examples to prompts for query analysis in LangChain, including setting up the system, defining the query schema, generating queries, and tuning prompts by adding examples.
[How to: handle cases where no queries are generated](https://python.langchain.com/docs/how_to/query_no_queries/): querying for information, handling cases where no queries are generated, integrating query analysis with retrieval. Provides guidance on handling scenarios where query analysis techniques allow for no queries to be generated, including code examples for structuring the output, performing query analysis with an LLM, and integrating query analysis with a retriever in a chain.
[How to: handle multiple queries](https://python.langchain.com/docs/how_to/query_multiple_queries/): handling queries that generate multiple potential queries, combining retrieval results from multiple queries, and integrating query analysis with retrieval pipelines. Explains how to handle scenarios where a query analysis step produces multiple potential queries by running retrievals for each query and combining the results. Demonstrates this approach with code examples using LangChain components.
[How to: handle multiple retrievers](https://python.langchain.com/docs/how_to/query_multiple_retrievers/): needing to handle multiple retrievers for query analysis, when implementing a query analyzer that can select between different retrievers, when building a retrieval-augmented system that needs to choose between different data sources. This page explains how to handle scenarios where a query analysis step allows for selecting between multiple retrievers, showing an example implementation using LangChain's tools for structured output parsing, prompting, and chaining components together.
[How to: construct filters](https://python.langchain.com/docs/how_to/query_constructing_filters/): constructing filters for query analysis, translating filters to specific retriever formats, using LangChain's structured query objects. This page covers how to construct filters as Pydantic models and translate them into retriever-specific filters using LangChain's translators for Chroma and Elasticsearch.
[How to: deal with high cardinality categorical variables](https://python.langchain.com/docs/how_to/query_high_cardinality/): dealing with categorical data with high cardinality, handling potential misspellings of categorical values, and filtering based on categorical values. The page discusses techniques for handling high-cardinality categorical data in query analysis, such as adding all possible values to the prompt, using a vector store to find relevant values, and correcting user input to the closest valid categorical value.

### Q&A over SQL + CSV

You can use LLMs to do question answering over tabular data.
[this guide](https://python.langchain.com/docs/tutorials/sql_qa/): LLM should read this page when: 1. Building a question-answering system over a SQL database 2. Implementing agents or chains to interact with a SQL database 'This tutorial covers building question-answering systems over SQL databases using LangChain. It demonstrates creating chains and agents that can generate SQL queries from natural language, execute them against a database, and provide natural language responses. It covers techniques like schema exploration, query validation, and handling high-cardinality columns.'

[How to: use prompting to improve results](https://python.langchain.com/docs/how_to/sql_prompting/): 'querying SQL databases with a language model, when doing few-shot prompting for SQL queries, and when selecting relevant few-shot examples dynamically.' 'This page covers how to improve SQL query generation prompts by incorporating database schema information, providing few-shot examples, and dynamically selecting the most relevant few-shot examples using semantic similarity.'
[How to: do query validation](https://python.langchain.com/docs/how_to/sql_query_checking/): Line 1: 'working on SQL query generation, handling invalid SQL queries, or incorporating human approval for SQL queries' Line 2: 'This page covers strategies for validating SQL queries, such as appending a query validator step, prompt engineering, human-in-the-loop approval, and error handling.'
[How to: deal with large databases](https://python.langchain.com/docs/how_to/sql_large_db/): dealing with large databases in SQL question-answering, identifying relevant table schemas to include in prompts, and handling high-cardinality columns with proper nouns or other unique values. The page discusses methods to identify relevant tables and table schemas to include in prompts when dealing with large databases. It also covers techniques to handle high-cardinality columns containing proper nouns or other unique values, such as creating a vector store of distinct values and querying it to include relevant spellings in prompts.
[How to: deal with CSV files](https://python.langchain.com/docs/how_to/sql_csv/): needing to build question-answering systems over CSV data, wanting to understand the tradeoffs between using SQL or Python libraries like Pandas, and requiring guidance on securely executing code from language models. This page covers two main approaches to question answering over CSV data: using SQL by loading CSVs into a database, or giving an LLM access to Python environments to interact with CSV data using libraries like Pandas. It discusses the security implications of each approach and provides code examples for implementing question-answering chains and agents with both methods.

### Q&A over graph databases

You can use an LLM to do question answering over graph databases.
[this guide](https://python.langchain.com/docs/tutorials/graph/): LLM should read this page when: 1) Building a question-answering system over a graph database 2) Implementing text-to-query generation for graph databases 3) Learning techniques for query validation and error handling 'This page covers building a question-answering application over a graph database using LangChain. It provides a basic implementation using the GraphQACypherChain, followed by an advanced implementation with LangGraph. The latter includes techniques like few-shot prompting, query validation, and error handling for generating accurate Cypher queries from natural language.'

[How to: add a semantic layer over the database](https://python.langchain.com/docs/how_to/graph_semantic/): needing to add a semantic layer over a graph database, needing to use tools representing Cypher templates with an LLM, or needing to build a LangGraph Agent to interact with a Neo4j database. This page covers how to create custom tools with Cypher templates for a Neo4j graph database, bind those tools to an LLM, and build a LangGraph Agent that can invoke the tools to retrieve information from the graph database.
[How to: construct knowledge graphs](https://python.langchain.com/docs/how_to/graph_constructing/): constructing knowledge graphs from unstructured text, storing information in a graph database, using LLM Graph Transformer to extract knowledge from text. This page explains how to set up a Neo4j graph database, use LLMGraphTransformer to extract structured knowledge graph data from text, filter extracted nodes/relationships, and store the knowledge graph in Neo4j.

### Summarization

LLMs can summarize and otherwise distill desired information from text, including
[this guide](https://python.langchain.com/docs/tutorials/summarization/): needing to summarize long texts or documents, when building question-answering systems, when creating text analysis applications. This page covers summarizing texts using LangChain, including the "stuff" method (concatenating into single prompt), the "map-reduce" method (splitting into chunks for parallel summarization), and orchestrating these methods using LangGraph.

[How to: summarize text in a single LLM call](https://python.langchain.com/docs/how_to/summarize_stuff/): looking to summarize text, seeking a simple single-LLM summarization method, or exploring basic summarization chains in LangChain. This page outlines how to use LangChain's pre-built 'stuff' summarization chain, which stuffs text into a prompt for an LLM to summarize in a single call.
[How to: summarize text through parallelization](https://python.langchain.com/docs/how_to/summarize_map_reduce/): needing to summarize long text documents using parallelization, needing to optimize summarization for large volumes of text, and needing efficient summarization strategies. This page discusses using a map-reduce strategy to summarize text through parallelization, including breaking the text into subdocuments, generating summaries for each in parallel (map step), and then consolidating the summaries into a final summary (reduce step). It provides code examples using LangChain and LangGraph.
[How to: summarize text through iterative refinement](https://python.langchain.com/docs/how_to/summarize_refine/): LLM should read this page when: 1. Attempting to summarize long texts through iterative refinement 2. Learning about building applications with LangGraph 3. Seeking examples of streaming LLM outputs 'This guide demonstrates how to summarize text through iterative refinement using LangGraph. It involves splitting the text into documents, summarizing the first document, and then refining the summary based on subsequent documents until finished. The approach leverages LangGraph's streaming capabilities and modularity.'

## LangChain Expression Language (LCEL)

[LCEL](https://python.langchain.com/docs/concepts/lcel/): needing an overview of the LangChain Expression Language (LCEL), deciding whether to use LCEL or not, and understanding how to compose chains using LCEL primitives. Provides an overview of the LCEL, a declarative approach to building chains from existing Runnables, covering its benefits, composition primitives like RunnableSequence and RunnableParallel, the composition syntax, automatic type coercion, and guidance on when to use LCEL versus alternatives like LangGraph.

[**LCEL cheatsheet**](https://python.langchain.com/docs/how_to/lcel_cheatsheet/): 'needing a reference for interacting with Runnables in LangChain or building custom runnables and chains' 'This page provides a comprehensive cheatsheet with examples for key operations with Runnables such as invoking, batching, streaming, composing, configuring, and dynamically building runnables and chains'

[**Migration guide**](https://python.langchain.com/docs/versions/migrating_chains/): migrating older chains from LangChain v0.0, reimplementing legacy chains, or upgrading to use LCEL and LangGraph This page provides guidance on migrating from deprecated v0.0 chain implementations to using LCEL and LangGraph, including specific guides for various legacy chains like LLMChain, ConversationChain, RetrievalQA, and others.

[How to: chain runnables](https://python.langchain.com/docs/how_to/sequence/): chaining multiple LangChain components together, composing prompt templates with models, or combining runnables in a sequence. This page explains how to chain runnables (LangChain components) together using the pipe operator '|' or the .pipe() method, including chaining prompt templates with models and parsers, and how input/output formats are coerced during chaining.
[How to: stream runnables](https://python.langchain.com/docs/how_to/streaming/): Line 1: 'wanting to learn how to stream LLM responses, stream intermediate steps, and configure streaming events.' Line 2: 'This page covers how to use the `stream` and `astream` methods to stream final outputs, how to use `astream_events` to stream both final outputs and intermediate steps, filtering events, propagating callbacks for streaming, and working with input streams.'
[How to: invoke runnables in parallel](https://python.langchain.com/docs/how_to/parallel/): parallelizing steps in a chain, formatting data for chaining, or splitting inputs to run multiple runnables in parallel. Explains how to use RunnableParallel to execute runnables concurrently, format data between steps, and provides examples of parallelizing chains.
[How to: add default invocation args to runnables](https://python.langchain.com/docs/how_to/binding/): LLM should read this page when: 1) Wanting to invoke a Runnable with constant arguments not part of the preceding output or user input 2) Needing to bind provider-specific arguments like stop sequences or tools 'This page explains how to use the Runnable.bind() method to set default invocation arguments for a Runnable within a RunnableSequence. It covers binding stop sequences to language models and attaching OpenAI tools.'
[How to: turn any function into a runnable](https://python.langchain.com/docs/how_to/functions/): needing to use custom functions, needing to implement streaming, needing to pass metadata to runnables Covers how to use custom functions as Runnables, including constructors, decorators, coercion, passing metadata, and implementing streaming.
[How to: pass through inputs from one chain step to the next](https://python.langchain.com/docs/how_to/passthrough/): needing to pass data from one step to the next in a chain, when formatting inputs for prompts, when retrieving and preparing context for prompts. This page explains how to use RunnablePassthrough and RunnableParallel to pass data unchanged through chains, covering examples like formatting retrieval results and user inputs into prompts.
[How to: configure runnable behavior at runtime](https://python.langchain.com/docs/how_to/configure/): configuring chain internals at runtime, swapping models or prompts within a chain, or exploring different configurations of runnables. The page covers how to use .configurable_fields to configure parameters of a runnable at runtime, and .configurable_alternatives to swap out runnables with alternatives, including examples for chat models, prompts, and combinations thereof.
[How to: add message history (memory) to a chain](https://python.langchain.com/docs/how_to/message_history/): building a chatbot or multi-turn application, wanting to persist conversational state, wanting to manage message history This page explains how to add message history and persist conversational state using LangGraph, covering examples with chat models and prompt templates, and managing the message history.
[How to: route between sub-chains](https://python.langchain.com/docs/how_to/routing/): LLM should read this page when: - It needs to conditionally route between sub-chains based on previous outputs - It needs to use semantic similarity to choose the most relevant prompt for a given query 'The page covers how to route between sub-chains in LangChain, including using custom functions, RunnableBranch, and semantic similarity for prompt routing. It provides code examples for each method.'
[How to: create a dynamic (self-constructing) chain](https://python.langchain.com/docs/how_to/dynamic_chain/): developing dynamic chains, implementing conditional routing, returning runnables dynamically The page explains how to create a dynamic chain that constructs parts of itself at runtime by having Runnable Lambdas return other Runnables.
[How to: inspect runnables](https://python.langchain.com/docs/how_to/inspect/): inspecting internals of an LCEL chain, debugging chain logic, or retrieving chain prompts. Provides methods to visualize chain graphs, print prompts used in chains, and inspect chain steps programmatically.
[How to: add fallbacks to a runnable](https://python.langchain.com/docs/how_to/fallbacks/): needing to add fallback options in case of errors, processing long inputs, or wanting to use a better model. This page explains how to configure fallback chains for LLM APIs in case of rate limiting or errors, for handling long input texts exceeding context windows, and for defaulting to better models when parsing fails.
[How to: pass runtime secrets to a runnable](https://python.langchain.com/docs/how_to/runnable_runtime_secrets/): needing to pass sensitive data to a runnable, ensuring secrets remain hidden from tracing, or integrating secret values with runnables. Explains how to pass runtime secrets to runnables using RunnableConfig, allowing certain keys to be hidden from tracing while still being accessible during invocation.

Tracing gives you observability inside your chains and agents, and is vital in diagnosing issues.

[How to: trace with LangChain](https://docs.smith.langchain.com/how_to_guides/tracing/trace_with_langchain/): tracing LangChain applications with LangSmith, customizing trace metadata and run names, or integrating LangChain with the LangSmith SDK. Provides guides on integrating LangSmith tracing into LangChain applications, configuring trace metadata and run names, distributed tracing, interoperability between LangChain and LangSmith SDK, and tracing LangChain invocations without environment variables.
[How to: add metadata and tags to traces](https://docs.smith.langchain.com/how_to_guides/tracing/trace_with_langchain/#add-metadata-and-tags-to-traces): tracing LangChain applications with LangSmith, when logging metadata and tags to traces, and when customizing trace names and IDs. This page provides step-by-step guides on integrating LangSmith tracing with LangChain in Python and JS/TS, covering quick start instructions, selective tracing, logging to specific projects, adding metadata/tags, customizing run names/IDs, accessing run IDs, distributed tracing in Python, and interoperability with the LangSmith SDK.

[in this section of the LangSmith docs](https://docs.smith.langchain.com/how_to_guides/tracing/): configuring observability for LLM applications, accessing and managing traces, and setting up automation and monitoring. Guides on configuring tracing, using the UI/API for traces, creating dashboards, automating rules/alerts, and gathering human feedback for LLM applications.

## Integrations

### Featured Chat Model Providers

- [ChatAnthropic](https://python.langchain.com/docs/integrations/chat/anthropic/)
- [ChatMistralAI](https://python.langchain.com/docs/integrations/chat/mistralai/)
- [ChatFireworks](https://python.langchain.com/docs/integrations/chat/fireworks/)
- [AzureChatOpenAI](https://python.langchain.com/docs/integrations/chat/azure_chat_openai/)
- [ChatOpenAI](https://python.langchain.com/docs/integrations/chat/openai/)
- [ChatTogether](https://python.langchain.com/docs/integrations/chat/together/)
- [ChatVertexAI](https://python.langchain.com/docs/integrations/chat/google_vertex_ai_palm/)
- [ChatGoogleGenerativeAI](https://python.langchain.com/docs/integrations/chat/google_generative_ai/)
- [ChatGroq](https://python.langchain.com/docs/integrations/chat/groq/)
- [ChatCohere](https://python.langchain.com/docs/integrations/chat/cohere/)
- [ChatBedrock](https://python.langchain.com/docs/integrations/chat/bedrock/)
- [ChatHuggingFace](https://python.langchain.com/docs/integrations/chat/huggingface/)
- [ChatNVIDIA](https://python.langchain.com/docs/integrations/chat/nvidia_ai_endpoints/)
- [ChatOllama](https://python.langchain.com/docs/integrations/chat/ollama/)
- [ChatLlamaCpp](https://python.langchain.com/docs/integrations/chat/llamacpp/)
- [ChatAI21](https://python.langchain.com/docs/integrations/chat/ai21/)
- [ChatUpstage](https://python.langchain.com/docs/integrations/chat/upstage/)
- [ChatDatabricks](https://python.langchain.com/docs/integrations/chat/databricks/)
- [ChatWatsonx](https://python.langchain.com/docs/integrations/chat/ibm_watsonx/)
- [ChatXAI](https://python.langchain.com/docs/integrations/chat/xai/)

[All](https://python.langchain.com/docs/integrations/chat/): integrating chat models into an application, using chat models for conversational AI tasks, or choosing between different chat model providers. Provides an overview of chat models integrated with LangChain, including OpenAI, Anthropic, Google, and others. Covers key features like tool calling, structured output, JSON mode, local usage, and multimodal support.

## Glossary

[AIMessageChunk](https://python.langchain.com/docs/concepts/messages/#aimessagechunk): 'needing to understand messages and message structure for chat models, when working with chat history, and when integrating with chat model providers' Line 2: 'Detailed overview of the different message types used in LangChain for chat models, how messages are structured, and how to convert between LangChain and OpenAI message formats.'
[AIMessage](https://python.langchain.com/docs/concepts/messages/#aimessage): building chat applications, when implementing tool calling, or when working with chat model outputs. Messages are the units of communication in chat models, representing input, output and metadata; topics include message types, roles, content, metadata, conversation structure, and LangChain's unified message format.
[astream_events](https://python.langchain.com/docs/concepts/chat_models/#key-methods): LLM should read this page when: 1) Implementing an application that uses a chat model 2) Integrating chat models with other LangChain components 3) Planning for advanced chat model features like tool calling or structured outputs This page provides an overview of chat models in LangChain, including their key features, interfaces, integration options, tool calling, structured outputs, multimodality, context windows, and advanced topics like rate limiting and caching.
[BaseTool](https://python.langchain.com/docs/concepts/tools/#tool-interface): needing to understand LangChain tools, wanting to create custom tools, or looking for best practices for designing tools. The page covers the tool abstraction in LangChain, which associates a Python function with a schema for name, description, and arguments. It explains how to create tools using the @tool decorator, configure the schema, handle tool artifacts, use special type annotations (InjectedToolArg, RunnableConfig), and provides an overview of toolkits.
[invoke](https://python.langchain.com/docs/concepts/runnables/): learning how to use the Runnable interface, when working with custom Runnables, and when needing to configure Runnables at runtime. The page covers the Runnable interface, its methods for invocation, batching, streaming, inspecting schemas, and configuration. It explains RunnableConfig, custom Runnables, and configurable Runnables.
[bind_tools](https://python.langchain.com/docs/concepts/tool_calling/#tool-binding): building applications that require an LLM to directly interact with external systems or APIs, when integrating tools or functions into an LLM workflow, or when fine-tuning an LLM to better handle tool calling. This page provides an overview of tool calling, which allows LLMs to invoke external tools or APIs with specific input schemas. It covers key concepts like tool creation, binding tools to LLMs, initiating tool calls from LLMs, and executing the called tools. It also offers guidance on recommended usage and best practices.
[Caching](https://python.langchain.com/docs/concepts/chat_models/#caching): building chat applications, using LLMs for information extraction, or working with multimodal data This page discusses chat models, which are language models that operate on messages. It covers chat model interfaces, integrations, features like tool calling and structured outputs, multimodality, context windows, rate limiting, and caching.
[Chat models](https://python.langchain.com/docs/concepts/multimodality/#multimodality-in-chat-models): needing to understand multimodal capabilities in LangChain, when working with multimodal data like images/audio/video, and when determining if a specific LangChain component supports multimodality. Provides an overview of multimodality in chat models, embedding models, and vector stores. Discusses multimodal inputs/outputs for chat models and how they are formatted.
[Configurable runnables](https://python.langchain.com/docs/concepts/runnables/#configurable-runnables): trying to understand how to use Runnables, how to configure and compose Runnables, and how to inspect Runnable schemas. The Runnable interface is the foundation for working with LangChain components like language models, output parsers, and retrievers. It defines methods for invoking, batching, streaming, inspecting schemas, configuring, and composing Runnables.
[Context window](https://python.langchain.com/docs/concepts/chat_models/#context-window): getting an overview of chat models, understanding the key functionality of chat models, and determining if this concept is relevant for their application. Provides an overview of chat models (LLMs with a chat interface), their features, integrations, key methods like invoking/streaming, handling inputs/outputs, using tools/structured outputs, and advanced topics like rate limiting and caching.
[Conversation patterns](https://python.langchain.com/docs/concepts/chat_history/#conversation-patterns): managing conversation history in chatbots, implementing memory for chat models, understanding correct conversation structure. This page explains the concept of chat history, a record of messages exchanged between a user and a chat model. It covers conversation patterns, guidelines for managing chat history to avoid exceeding context window, and the importance of preserving conversation structure.
[Document](https://python.langchain.com/api_reference/core/documents/langchain_core.documents.base.Document.html/): working with document data, retrieving and processing text documents, integrating with text embedding and vector storage systems This page provides details on the Document class and its associated methods and properties, as well as examples of how to use it in various scenarios such as document loading, retrieval, and transformation
[Embedding models](https://python.langchain.com/docs/concepts/multimodality/#multimodality-in-embedding-models): needing to understand multimodal capabilities of LangChain components, wanting to work with non-text data like images/audio/video, or planning to incorporate multimodal data in chat interactions. Provides an overview of multimodality support in chat models (inputs and tools), embedding models, and vector stores; notes current limitations and expected future expansions to handle different data types.
[HumanMessage](https://python.langchain.com/docs/concepts/messages/#humanmessage): LLM should read this page when: 1) Understanding how to structure conversations with chat models, 2) Needing to work with different types of messages (user, assistant, system, tool), 3) Converting between LangChain and OpenAI message formats. Messages are the units of communication used by chat models, representing user input, assistant output, system instructions, and tool results. Key topics include message structure, types (HumanMessage, AIMessage, SystemMessage, ToolMessage), multimodal content support, and integration with OpenAI message format.
[InjectedState](https://python.langchain.com/docs/concepts/tools/#injectedstate): learning about LangChain's tools, creating custom tools, or integrating tools with chat models. Provides conceptual overview of tools - encapsulating functions with schemas for models to call. Covers creating tools with @tool decorator, tool interfaces, special type annotations, artifacts, best practices, and toolkits.
[InjectedStore](https://python.langchain.com/docs/concepts/tools/#injectedstore): needing to understand how to create and use tools in LangChain, when needing to pass runtime values to tools, and when needing to configure a tool's schema. Tools are a way to encapsulate functions and their schemas to be used with chat models that support tool calling. The page covers the tool interface, creating tools with the @tool decorator, using tools directly, configuring tool schemas, returning artifacts from tools, and special type annotations like InjectedToolArg and RunnableConfig.
[InjectedToolArg](https://python.langchain.com/docs/concepts/tools/#injectedtoolarg): trying to understand how to create and use tools in LangChain, when needing to configure tool schemas, and when wanting to return artifacts from tools. Tools provide a way to encapsulate Python functions and schemas to be passed to chat models for execution. The page covers creating tools with the @tool decorator, configuring tool schemas, special type annotations, and tool artifacts.
[input and output types](https://python.langchain.com/docs/concepts/runnables/#input-and-output-types): needing to interact with LangChain components, wanting to understand the core Runnable interface, or composing complex chains using LCEL. Covers the Runnable interface that defines a standard way to invoke, batch, stream and inspect components; the RunnableConfig for setting runtime options; creating custom Runnables; configurable Runnables; and how input/output types, schemas, and streaming work.
[Integration packages](https://python.langchain.com/docs/concepts/architecture/#integration-packages): determining the overall architecture of LangChain, understanding the different components and packages in the LangChain ecosystem, or deciding which packages to import for a specific use case. This page provides an overview of the different packages that make up the LangChain framework, including langchain-core, langchain, integration packages, langchain-community, langgraph, langserve, and LangSmith, and explains the purpose and contents of each package.
[Integration tests](https://python.langchain.com/docs/concepts/testing/#integration-tests): needing guidance on testing LangChain components, understanding different types of tests (unit, integration, standard), or wanting to contribute by adding tests to an integration. Provides an overview of unit tests, integration tests, and standard tests in the LangChain ecosystem, including definitions, examples, and how to implement them for new tools/integrations.
[invoke](https://python.langchain.com/docs/concepts/runnables/): learning how to use the Runnable interface, when working with custom Runnables, and when needing to configure Runnables at runtime. The page covers the Runnable interface, its methods for invocation, batching, streaming, inspecting schemas, and configuration. It explains RunnableConfig, custom Runnables, and configurable Runnables.
[JSON mode](https://python.langchain.com/docs/concepts/structured_outputs/#json-mode): LLM should read this page when: 1) It needs to return structured output that conforms to a specific schema, 2) It needs to store model output in a database, 3) It needs to ensure model output matches a predefined format. This page covers how to define an output schema, and techniques like tool calling and JSON mode that allow models to return structured output conforming to that schema, as well as a helper method to streamline the process.
[langchain-community](https://python.langchain.com/docs/concepts/architecture/#langchain-community): learning about the structure of LangChain, deploying LangChain applications, or needing an overview of the LangChain ecosystem. This page gives an overview of the different packages, components, and services that make up the LangChain framework, including langchain-core, langchain, integration packages, langchain-community, LangGraph, LangServe, and LangSmith.
[langchain-core](https://python.langchain.com/docs/concepts/architecture/#langchain-core): needing an overview of LangChain's architecture, when considering integrating external packages, or when exploring the LangChain ecosystem. Outlines the main components of LangChain (langchain-core, langchain, integration packages, langchain-community, langgraph, langserve, LangSmith) and their roles, providing a high-level architectural overview.
[langchain](https://python.langchain.com/docs/concepts/architecture/#langchain): looking to understand the overall architecture of LangChain, when trying to determine what LangChain packages to install, or when wanting an overview of the various LangChain projects. This page outlines the hierarchical structure of the LangChain framework, describing the purpose and contents of key packages like langchain-core, langchain, integration packages, langchain-community, langgraph, langserve, and LangSmith.
[langgraph](https://python.langchain.com/docs/concepts/architecture/#langgraph): developing applications with LangChain, seeking to understand the overall architecture of LangChain, planning to contribute to or integrate with LangChain The page outlines the layered architecture of LangChain, describing the core abstraction layer, the main LangChain package, integration packages, community integrations, LangGraph for stateful agents, LangServe for deployment, and LangSmith developer tools
[Managing chat history](https://python.langchain.com/docs/concepts/chat_history/#managing-chat-history): understanding and managing chat history, learning about conversation patterns, following correct chat history structure. Explains chat history concept, provides guidelines for managing chat history, discusses conversation patterns involving users, assistants, and tools.
[OpenAI format](https://python.langchain.com/docs/concepts/messages/#openai-format): building chat applications, working with chat models, or consuming message streams. This page covers the structure and components of messages used in chat models, including roles, content, usage metadata, and different message types like HumanMessage, AIMessage, and ToolMessage.
[Propagation of RunnableConfig](https://python.langchain.com/docs/concepts/runnables/#propagation-of-runnableconfig): LLM should read this page when: learning about the LangChain Runnable interface, working with Runnables in LangChain, understanding how to configure and execute Runnables. The page covers the Runnable interface in LangChain, including invoking/batching/streaming Runnables, input/output schemas, configuring Runnables, creating custom Runnables, and working with configurable Runnables.
[rate-limiting](https://python.langchain.com/docs/concepts/chat_models/#rate-limiting): 1) working with chat models, 2) integrating tool calling or structured outputs, 3) understanding chat model capabilities. Overview of chat model interface, inputs/outputs, standard parameters; tool calling and structured output support; multimodality; context window; advanced topics like rate limiting, caching.
[RemoveMessage](https://python.langchain.com/docs/concepts/messages/#removemessage): needing information on the structure of messages used in conversational AI models, wanting to understand how messages are represented in LangChain, or looking for details on specific message types like SystemMessage, HumanMessage, and AIMessage. Messages are the basic units of communication in conversational AI models, containing a role (e.g. user, assistant), content (text or multimodal data), and metadata; LangChain provides a standardized message format and different message types to represent various components of a conversation.
[role](https://python.langchain.com/docs/concepts/messages/#role): understanding how to structure messages for chat models, accessing details about different LangChain message types, or converting between LangChain and OpenAI message formats. Messages are the core unit of communication in chat models, representing input/output content and metadata; LangChain defines SystemMessage, HumanMessage, AIMessage, ToolMessage and others to standardize message format across providers.
[RunnableConfig](https://python.langchain.com/docs/concepts/runnables/#runnableconfig): needing to understand the Runnable interface, invoking and configuring Runnables, and creating custom Runnables. The page covers the Runnable interface's core concepts, methods like invoke, batch, and stream, input/output types, configuring Runnables with RunnableConfig, creating custom Runnables from functions, and using configurable Runnables.
[Standard parameters for chat models](https://python.langchain.com/docs/concepts/chat_models/#standard-parameters): building applications using chat models, working with chat models for tool calling, structured outputs or multimodal inputs/outputs. Covers overview of chat models, integrations, interfaces, tool calling, structured outputs, multimodality, context window, rate-limiting, and caching of chat models.
[Standard tests](https://python.langchain.com/docs/concepts/testing/#standard-tests): needing guidance on testing LangChain components, or wanting to understand the different types of tests used in LangChain. This page discusses unit tests for individual functions, integration tests for validating multiple components working together, and LangChain's standard tests for ensuring consistency across tools and integrations.
[stream](https://python.langchain.com/docs/concepts/streaming/): [building applications that use streaming, building applications that need to display partial results in real-time, building applications that need to provide updates on pipeline or workflow progress] 'This page covers streaming in LangChain, including what can be streamed in LLM applications, the streaming APIs available, how to write custom data to the stream, and how LangChain automatically enables streaming for chat models in certain cases.'
[Tokens](https://python.langchain.com/docs/concepts/tokens/): needing to understand tokens used by LLMs, when dealing with character/token counts, when working with multimodal inputs Tokens are the fundamental units processed by language models. A token can represent words, word parts, punctuation, and other units. Models tokenize inputs, process tokens sequentially, and generate new tokens as output. Tokens enable efficient and contextual language processing compared to characters.
[Tokens](https://python.langchain.com/docs/concepts/tokens/): needing to understand tokens used by LLMs, when dealing with character/token counts, when working with multimodal inputs Tokens are the fundamental units processed by language models. A token can represent words, word parts, punctuation, and other units. Models tokenize inputs, process tokens sequentially, and generate new tokens as output. Tokens enable efficient and contextual language processing compared to characters.
[Tool artifacts](https://python.langchain.com/docs/concepts/tools/#tool-artifacts): needing to understand what tools are, how to create and use them, and how they integrate with models. Explains what tools are in LangChain, how to create them using the @tool decorator, special type annotations for configuring runtime behavior, how to use tools directly or pass them to chat models, and best practices for designing tools.
[Tool binding](https://python.langchain.com/docs/concepts/tool_calling/#tool-binding): determining if tool calling functionality is appropriate for their application, understanding the key concepts and workflow of tool calling, and considering best practices for designing tools. This page covers an overview of tool calling, key concepts like tool creation/binding/calling/execution, recommended usage workflow, details on implementing each step, and best practices for designing effective tools.
[@tool](https://python.langchain.com/docs/concepts/tools/#create-tools-using-the-tool-decorator): needing to understand tools in LangChain, when creating custom tools, or when integrating tools into LangChain applications. Provides an overview of tools, how to create and configure tools using the @tool decorator, different tool types (e.g. with artifacts, injected arguments), and best practices for designing tools.
[Toolkits](https://python.langchain.com/docs/concepts/tools/#toolkits): creating custom Python functions to use with LangChain, configuring existing tools, or adding tools to chat models. Explains the tool abstraction for encapsulating Python functions, creating tools with the `@tool` decorator, configuring schemas, handling tool artifacts, special type annotations, and using toolkits that group related tools.
[ToolMessage](https://python.langchain.com/docs/concepts/messages/#toolmessage): understanding the communication protocol with chat models, working with chat history management, or understanding LangChain's Message object structure. Messages are the unit of communication in chat models and represent input/output along with metadata; LangChain provides a unified Message format with types like SystemMessage, HumanMessage, AIMessage to handle different roles, content types, tool calls.
[Unit tests](https://python.langchain.com/docs/concepts/testing/#unit-tests): developing unit or integration tests, or when contributing to LangChain integrations Provides an overview of unit tests, integration tests, and standard tests used in the LangChain ecosystem
[Vector stores](https://python.langchain.com/docs/concepts/vectorstores/): LLM should read this page when: 1) Building applications that need to index and retrieve information based on semantic similarity 2) Integrating vector databases into their application 3) Exploring advanced vector search and retrieval techniques Vector stores are specialized data stores that enable indexing and retrieving information based on vector representations (embeddings) of data, allowing semantic similarity search over unstructured data like text, images, and audio. The page covers vector store integrations, the core interface, adding/deleting documents, basic and advanced similarity search techniques, and concepts like metadata filtering.
[with_structured_output](https://python.langchain.com/docs/concepts/structured_outputs/#structured-output-method): [needing to return structured data like JSON or database rows, working with models that support structured output like tools or JSON modes, or integrating with helper functions to streamline structured output] [Overview of structured output concept, schema definition formats like JSON/dicts and Pydantic, model integration methods like tool calling and JSON modes, LangChain structured output helper method]
[with_types](https://python.langchain.com/docs/concepts/runnables/#with_types): learning about the Runnable interface in LangChain, understanding how to work with Runnables, and customizing or configuring Runnables. The page covers the Runnable interface, optimized parallel execution, streaming APIs, input/output types, inspecting schemas, RunnableConfig options, creating custom Runnables from functions, and configurable Runnables.