langchain.smith
LangSmith utilities.
This module provides utilities for connecting to LangSmith (https://smith.langchain.com/). For more information, see the LangSmith documentation (https://docs.smith.langchain.com/).
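Connecting typically just means exporting your LangSmith credentials and instantiating a client. A minimal sketch, assuming the standard LANGCHAIN_* environment variables described in the LangSmith docs:

import os

from langsmith import Client

# Assumed environment variables; check the LangSmith docs for the current names.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-api-key>"

client = Client()  # picks up the API key and endpoint from the environment
datasets = list(client.list_datasets())  # simple connectivity check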
Evaluation
LangSmith helps you evaluate Chains and other language model application components using a number of LangChain evaluators.
An example of this is shown below, assuming you’ve created a LangSmith dataset called <my_dataset_name>:
from langsmith import Client
from langchain_community.chat_models import ChatOpenAI
from langchain.chains import LLMChain
from langchain.smith import RunEvalConfig, run_on_dataset

# Chains may have memory. Passing in a constructor function lets the
# evaluation framework avoid cross-contamination between runs.
def construct_chain():
    llm = ChatOpenAI(temperature=0)
    chain = LLMChain.from_string(
        llm,
        "What's the answer to {your_input_key}"
    )
    return chain

# Load off-the-shelf evaluators via config or the EvaluatorType (string or enum)
evaluation_config = RunEvalConfig(
    evaluators=[
        "qa",  # "Correctness" against a reference answer
        "embedding_distance",
        RunEvalConfig.Criteria("helpfulness"),
        RunEvalConfig.Criteria({
            "fifth-grader-score": "Do you have to be smarter than a fifth grader to answer this question?"
        }),
    ]
)

client = Client()
run_on_dataset(
    client,
    "<my_dataset_name>",
    construct_chain,
    evaluation=evaluation_config,
)
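The string names above ("qa", "embedding_distance") are the values of the EvaluatorType enum, so an equivalent config can use the enum members directly. A sketch, assuming EvaluatorType is imported from langchain.evaluation:

from langchain.evaluation import EvaluatorType

evaluation_config = RunEvalConfig(
    evaluators=[
        EvaluatorType.QA,                  # same evaluator as the string "qa"
        EvaluatorType.EMBEDDING_DISTANCE,  # same as "embedding_distance"
        RunEvalConfig.Criteria("helpfulness"),
    ]
)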
You can also create custom evaluators by subclassing the StringEvaluator or LangSmith’s RunEvaluator classes.
from typing import Optional

from langchain.evaluation import StringEvaluator


class MyStringEvaluator(StringEvaluator):

    @property
    def requires_input(self) -> bool:
        return False

    @property
    def requires_reference(self) -> bool:
        return True

    @property
    def evaluation_name(self) -> str:
        return "exact_match"

    def _evaluate_strings(self, prediction: str, reference: Optional[str] = None, input: Optional[str] = None, **kwargs) -> dict:
        return {"score": prediction == reference}
evaluation_config = RunEvalConfig(
    custom_evaluators=[MyStringEvaluator()],
)

run_on_dataset(
    client,
    "<my_dataset_name>",
    construct_chain,
    evaluation=evaluation_config,
)
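The StringEvaluator route above is usually enough; the RunEvaluator route instead hands you the raw Run and Example objects from LangSmith. Below is a minimal sketch using the langsmith package's RunEvaluator and EvaluationResult interfaces; the "text" output key (LLMChain's default) and the "answer" key in the dataset outputs are assumptions about your data, not fixed APIs:

from typing import Optional

from langsmith.evaluation import EvaluationResult, RunEvaluator
from langsmith.schemas import Example, Run


class ContainsAnswerEvaluator(RunEvaluator):
    """Score 1 if the run's output contains the reference answer."""

    def evaluate_run(self, run: Run, example: Optional[Example] = None) -> EvaluationResult:
        # "text" is LLMChain's default output key; "answer" is an assumed key
        # in the dataset example outputs. Adjust both for your data.
        prediction = str((run.outputs or {}).get("text", ""))
        reference = str((example.outputs or {}).get("answer", "")) if example else ""
        score = int(bool(reference) and reference.lower() in prediction.lower())
        return EvaluationResult(key="contains_answer", score=score)


evaluation_config = RunEvalConfig(
    custom_evaluators=[ContainsAnswerEvaluator()],
)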
Primary Functions
- arun_on_dataset: Asynchronous function to evaluate a chain, agent, or other LangChain component over a dataset (see the sketch below).
- run_on_dataset: Function to evaluate a chain, agent, or other LangChain component over a dataset.
- RunEvalConfig: Class representing the configuration for running evaluation. You can select evaluators by EvaluatorType or config, or you can pass in custom_evaluators.
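As a sketch of the asynchronous path, reusing the client, construct_chain, and evaluation_config objects defined earlier:

import asyncio

from langchain.smith import arun_on_dataset


async def main() -> None:
    # Same arguments as run_on_dataset, but awaitable.
    await arun_on_dataset(
        client,
        "<my_dataset_name>",
        construct_chain,
        evaluation=evaluation_config,
    )


asyncio.run(main())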
Classes
- smith.evaluation.config.EvalConfig: Configuration for a given run evaluator.
- smith.evaluation.config.RunEvalConfig: Configuration for a run evaluation.
- smith.evaluation.config.SingleKeyEvalConfig: Configuration for a run evaluator that only requires a single key.
- smith.evaluation.progress.ProgressBarCallback: A simple progress bar for the console.
- smith.evaluation.runner_utils.ChatModelInput: Input for a chat model.
- smith.evaluation.runner_utils.EvalError: Your architecture raised an error.
- smith.evaluation.runner_utils.InputFormatError: Raised when the input format is invalid (see the sketch after this list).
- smith.evaluation.runner_utils.TestResult: A dictionary of the results of a single test run.
- smith.evaluation.string_run_evaluator.ChainStringRunMapper: Extract items to evaluate from the run object from a chain.
- smith.evaluation.string_run_evaluator.LLMStringRunMapper: Extract items to evaluate from the run object.
- smith.evaluation.string_run_evaluator.StringExampleMapper: Map an example, or row in the dataset, to the inputs of an evaluation.
- smith.evaluation.string_run_evaluator.StringRunEvaluatorChain: Evaluate Run and optional examples.
- smith.evaluation.string_run_evaluator.StringRunMapper: Extract items to evaluate from the run object.
- smith.evaluation.string_run_evaluator.ToolStringRunMapper: Map an input to the tool.
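Most of these classes are internal plumbing for the test runner; the exception types are what you are most likely to meet directly. For example, InputFormatError signals that your dataset examples do not match the chain's expected input keys. A small sketch, importing from the runner_utils module listed above:

from langchain.smith.evaluation.runner_utils import InputFormatError

try:
    run_on_dataset(
        client,
        "<my_dataset_name>",
        construct_chain,
        evaluation=evaluation_config,
    )
except InputFormatError as err:
    # Usually means the example input keys don't line up with the chain's
    # expected keys (e.g. "your_input_key" in the prompt above).
    print(f"Dataset inputs don't match the chain's input keys: {err}")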
Functions
- smith.evaluation.name_generation.random_name: Generate a random name.
- smith.evaluation.runner_utils.arun_on_dataset: Asynchronously run the Chain or language model on a dataset and store traces to the specified project name.
- smith.evaluation.runner_utils.run_on_dataset: Run the Chain or language model on a dataset and store traces to the specified project name (see the sketch below).
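Both runners accept the same core arguments. The sketch below uses the optional project_name and concurrency_level keyword arguments to control where traces are stored and how many examples run in parallel; treat those keyword names as assumptions about the current signature:

results = run_on_dataset(
    client,
    "<my_dataset_name>",
    construct_chain,
    evaluation=evaluation_config,
    project_name="my-eval-project",  # assumed kwarg: the tracing project to write to
    concurrency_level=5,             # assumed kwarg: how many examples run concurrently
)
print(sorted(results))  # inspect the keys of the returned test-run summary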