Skip to main content

LangSmith LLM Runs

This notebook demonstrates how to directly load data from LangSmith's LLM runs and fine-tune a model on that data. The process is simple and comprises 3 steps.

  1. Select the LLM runs to train on.
  2. Use the LangSmithRunChatLoader to load runs as chat sessions.
  3. Fine-tune your model.

Then you can use the fine-tuned model in your LangChain app.

Before diving in, let's install our prerequisites.


Ensure you've installed langchain >= 0.0.311 and have configured your environment with your LangSmith API key.

%pip install --upgrade --quiet  langchain langchain-openai
import os
import uuid

uid = uuid.uuid4().hex[:6]
project_name = f"Run Fine-tuning Walkthrough {uid}"
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = project_name

1. Select Runs

The first step is selecting which runs to fine-tune on. A common case would be to select LLM runs within traces that have received positive user feedback. You can find examples of this in theLangSmith Cookbook and in the docs.

For the sake of this tutorial, we will generate some runs for you to use here. Let's try fine-tuning a simple function-calling chain.

from enum import Enum

from langchain_core.pydantic_v1 import BaseModel, Field

class Operation(Enum):
add = "+"
subtract = "-"
multiply = "*"
divide = "/"

class Calculator(BaseModel):
"""A calculator function"""

num1: float
num2: float
operation: Operation = Field(..., description="+,-,*,/")

def calculate(self):
if self.operation == Operation.add:
return self.num1 + self.num2
elif self.operation == Operation.subtract:
return self.num1 - self.num2
elif self.operation == Operation.multiply:
return self.num1 * self.num2
elif self.operation == Operation.divide:
if self.num2 != 0:
return self.num1 / self.num2
return "Cannot divide by zero"
from pprint import pprint

from langchain_core.pydantic_v1 import BaseModel
from langchain_core.utils.function_calling import convert_pydantic_to_openai_function

openai_function_def = convert_pydantic_to_openai_function(Calculator)
{'description': 'A calculator function',
'name': 'Calculator',
'parameters': {'description': 'A calculator function',
'properties': {'num1': {'title': 'Num1', 'type': 'number'},
'num2': {'title': 'Num2', 'type': 'number'},
'operation': {'allOf': [{'description': 'An '
'enum': ['+',
'title': 'Operation'}],
'description': '+,-,*,/'}},
'required': ['num1', 'num2', 'operation'],
'title': 'Calculator',
'type': 'object'}}
from langchain_core.output_parsers.openai_functions import PydanticOutputFunctionsParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages(
("system", "You are an accounting assistant."),
("user", "{input}"),
chain = (
| ChatOpenAI().bind(functions=[openai_function_def])
| PydanticOutputFunctionsParser(pydantic_schema=Calculator)
| (lambda x: x.calculate())
math_questions = [
"What's 45/9?",
"What's 81/9?",
"What's 72/8?",
"What's 56/7?",
"What's 36/6?",
"What's 64/8?",
"What's 12*6?",
"What's 8*8?",
"What's 10*10?",
"What's 11*11?",
"What's 13*13?",
"What's 45+30?",
"What's 72+28?",
"What's 56+44?",
"What's 63+37?",
"What's 70-35?",
"What's 60-30?",
"What's 50-25?",
"What's 40-20?",
"What's 30-15?",
results = chain.batch([{"input": q} for q in math_questions], return_exceptions=True)

Load runs that did not error

Now we can select the successful runs to fine-tune on.

from langsmith.client import Client

client = Client()
successful_traces = {
for run in client.list_runs(

llm_runs = [
for run in client.list_runs(
if run.trace_id in successful_traces

2. Prepare data

Now we can create an instance of LangSmithRunChatLoader and load the chat sessions using its lazy_load() method.

from langchain_community.chat_loaders.langsmith import LangSmithRunChatLoader

loader = LangSmithRunChatLoader(runs=llm_runs)

chat_sessions = loader.lazy_load()

With the chat sessions loaded, convert them into a format suitable for fine-tuning.

from langchain_community.adapters.openai import convert_messages_for_finetuning

training_data = convert_messages_for_finetuning(chat_sessions)

3. Fine-tune the model

Now, initiate the fine-tuning process using the OpenAI library.

import json
import time
from io import BytesIO

import openai

my_file = BytesIO()
for dialog in training_data:
my_file.write((json.dumps({"messages": dialog}) + "\n").encode("utf-8"))
training_file = openai.files.create(file=my_file, purpose="fine-tune")

job =,

# Wait for the fine-tuning to complete (this may take some time)
status =
start_time = time.time()
while status != "succeeded":
print(f"Status=[{status}]... {time.time() - start_time:.2f}s", end="\r", flush=True)
status =

# Now your model is fine-tuned!
Status=[running]... 349.84s. 17.72s

4. Use in LangChain

After fine-tuning, use the resulting model ID with the ChatOpenAI model class in your LangChain app.

# Get the fine-tuned model ID
job =
model_id = job.fine_tuned_model

# Use the fine-tuned model in LangChain
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
API Reference:ChatOpenAI
(prompt | model).invoke({"input": "What's 56/7?"})
AIMessage(content='Let me calculate that for you.')

Now you have successfully fine-tuned a model using data from LangSmith LLM runs!

Was this page helpful?

You can also leave detailed feedback on GitHub.