RunEvalConfig#
- class langchain.smith.evaluation.config.RunEvalConfig[source]#
Bases:
BaseModel
Configuration for a run evaluation.
- Parameters:
evaluators (List[Union[EvaluatorType, EvalConfig, RunEvaluator, Callable]]) – Configurations for which evaluators to apply to the dataset run. Each can be an EvaluatorType member, such as EvaluatorType.QA, the evaluator type string (“qa”), or a configuration for a given evaluator (e.g., RunEvalConfig.QA).
custom_evaluators (Optional[List[Union[RunEvaluator, StringEvaluator]]]) – Custom evaluators to apply to the dataset run.
reference_key (Optional[str]) – The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
prediction_key (Optional[str]) – The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
input_key (Optional[str]) – The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
eval_llm (Optional[BaseLanguageModel]) – The language model to pass to any evaluators that use a language model.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param batch_evaluators: List[Callable[[Sequence[Run], Sequence[Example] | None], EvaluationResult | EvaluationResults | dict]] | None = None#
Evaluators that run on an aggregate/batch level.
These generate one or more metrics that are assigned to the full test run and are therefore not associated with individual traces.
- param custom_evaluators: List[Callable[[Run, Example | None], EvaluationResult | EvaluationResults | dict] | RunEvaluator | StringEvaluator] | None = None#
Custom evaluators to apply to the dataset run.
- param eval_llm: BaseLanguageModel | None = None#
The language model to pass to any evaluators that require one.
- param evaluators: List[EvaluatorType | str | EvalConfig | Callable[[Run, Example | None], EvaluationResult | EvaluationResults | dict] | RunEvaluator | StringEvaluator] [Optional]#
Configurations for which evaluators to apply to the dataset run. Each can be an EvaluatorType member, such as EvaluatorType.QA, the evaluator type string (“qa”), or a configuration for a given evaluator (e.g., RunEvalConfig.QA).
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
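Example: a minimal sketch of building a config and launching an evaluation. The dataset name and chain factory below are placeholders (not part of this API), and actually running it requires LangSmith credentials:

    from langchain.smith import RunEvalConfig, run_on_dataset
    from langsmith import Client

    # Evaluators may be given as type strings or as nested config classes.
    eval_config = RunEvalConfig(
        evaluators=[
            "qa",  # shorthand for EvaluatorType.QA
            RunEvalConfig.Criteria(criteria="conciseness"),
        ],
    )

    def make_chain():
        ...  # placeholder: return your own chain or runnable here

    run_on_dataset(
        client=Client(),
        dataset_name="my-dataset",  # placeholder dataset name
        llm_or_chain_factory=make_chain,
        evaluation=eval_config,
    )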
- class CoTQA[source]#
Bases:
SingleKeyEvalConfig
Configuration for a chain-of-thought (“CoT”) QA evaluator.
- Parameters:
prompt (Optional[BasePromptTemplate]) – The prompt template to use for the evaluation.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param evaluator_type: EvaluatorType = EvaluatorType.COT_QA#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param llm: BaseLanguageModel | None = None#
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param prompt: BasePromptTemplate | None = None#
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
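Example: a minimal sketch selecting this evaluator:

    from langchain.smith import RunEvalConfig

    # Grade correctness with chain-of-thought reasoning; the grading model
    # falls back to `eval_llm` on the parent config when `llm` is unset.
    config = RunEvalConfig(evaluators=[RunEvalConfig.CoTQA()])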
- class ContextQA[source]#
Bases:
SingleKeyEvalConfig
Configuration for a context-based QA evaluator.
- Parameters:
prompt (Optional[BasePromptTemplate]) – The prompt template to use for the evaluation.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param evaluator_type: EvaluatorType = EvaluatorType.CONTEXT_QA#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param llm: BaseLanguageModel | None = None#
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param prompt: BasePromptTemplate | None = None#
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
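Example: the evaluator type string and the nested config class are interchangeable ways to request this grader:

    from langchain.smith import RunEvalConfig

    # Equivalent spellings of the same evaluator:
    config = RunEvalConfig(evaluators=["context_qa"])
    config = RunEvalConfig(evaluators=[RunEvalConfig.ContextQA()])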
- class Criteria[source]#
Bases:
SingleKeyEvalConfig
Configuration for a reference-free criteria evaluator.
- Parameters:
criteria (Optional[CRITERIA_TYPE]) – The criteria to evaluate.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param criteria: Mapping[str, str] | Criteria | ConstitutionalPrinciple | None = None#
- param evaluator_type: EvaluatorType = EvaluatorType.CRITERIA#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param llm: BaseLanguageModel | None = None#
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
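Example: a custom criterion can be passed as a mapping of criterion name to description (the name and wording below are illustrative):

    from langchain.smith import RunEvalConfig

    # Reference-free grading against a single custom criterion.
    config = RunEvalConfig(
        evaluators=[
            RunEvalConfig.Criteria(
                criteria={"usefulness": "Is the response useful to a beginner?"}
            )
        ]
    )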
- class EmbeddingDistance[source]#
Bases:
SingleKeyEvalConfig
Configuration for an embedding distance evaluator.
- Parameters:
embeddings (Optional[Embeddings]) – The embeddings to use for computing the distance.
distance_metric (Optional[EmbeddingDistanceEnum]) – The distance metric to use for computing the distance.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param distance_metric: EmbeddingDistance | None = None#
- param embeddings: Embeddings | None = None#
- param evaluator_type: EvaluatorType = EvaluatorType.EMBEDDING_DISTANCE#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
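Example: a sketch selecting cosine distance via the EmbeddingDistance enum from langchain.evaluation (distinct from this nested config class of the same name):

    from langchain.evaluation import EmbeddingDistance
    from langchain.smith import RunEvalConfig

    # Score semantic closeness of the prediction and reference embeddings.
    config = RunEvalConfig(
        evaluators=[
            RunEvalConfig.EmbeddingDistance(
                distance_metric=EmbeddingDistance.COSINE
            )
        ]
    )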
- class ExactMatch[source]#
Bases:
SingleKeyEvalConfig
Configuration for an exact match string evaluator.
- Parameters:
ignore_case (bool) – Whether to ignore case when comparing strings.
ignore_punctuation (bool) – Whether to ignore punctuation when comparing strings.
ignore_numbers (bool) – Whether to ignore numbers when comparing strings.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param evaluator_type: EvaluatorType = EvaluatorType.EXACT_MATCH#
- param ignore_case: bool = False#
- param ignore_numbers: bool = False#
- param ignore_punctuation: bool = False#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
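Example: a relaxed exact match that ignores casing and punctuation:

    from langchain.smith import RunEvalConfig

    # With these flags, "Hello, World!" and "hello world" compare equal.
    config = RunEvalConfig(
        evaluators=[
            RunEvalConfig.ExactMatch(ignore_case=True, ignore_punctuation=True)
        ]
    )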
- class JsonEqualityEvaluator[source]#
Bases:
EvalConfig
Configuration for a JSON equality evaluator.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param evaluator_type: EvaluatorType = EvaluatorType.JSON_EQUALITY#
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
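Example:

    from langchain.smith import RunEvalConfig

    # Parse the prediction and reference as JSON and compare for equality,
    # so key order and whitespace differences do not matter.
    config = RunEvalConfig(evaluators=[RunEvalConfig.JsonEqualityEvaluator()])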
- class JsonValidity[source]#
Bases:
SingleKeyEvalConfig
Configuration for a JSON validity evaluator.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param evaluator_type: EvaluatorType = EvaluatorType.JSON_VALIDITY#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
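Example:

    from langchain.smith import RunEvalConfig

    # Check only that the prediction parses as valid JSON; no reference
    # from the dataset is needed.
    config = RunEvalConfig(evaluators=[RunEvalConfig.JsonValidity()])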
- class LabeledCriteria[source]#
Bases:
SingleKeyEvalConfig
Configuration for a labeled (with references) criteria evaluator.
- Parameters:
criteria (Optional[CRITERIA_TYPE]) – The criteria to evaluate.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param criteria: Mapping[str, str] | Criteria | ConstitutionalPrinciple | None = None#
- param evaluator_type: EvaluatorType = EvaluatorType.LABELED_CRITERIA#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param llm: BaseLanguageModel | None = None#
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
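Example: a sketch using a built-in criterion; the string is coerced to the corresponding Criteria enum member during validation:

    from langchain.smith import RunEvalConfig

    # Unlike the reference-free Criteria config, this grades the
    # prediction against the dataset's reference output.
    config = RunEvalConfig(
        evaluators=[RunEvalConfig.LabeledCriteria(criteria="correctness")]
    )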
- class LabeledScoreString[source]#
Bases:
ScoreString
Configuration for a labeled (with references) score string evaluator.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param criteria: Mapping[str, str] | Criteria | ConstitutionalPrinciple | None = None#
- param evaluator_type: EvaluatorType = EvaluatorType.LABELED_SCORE_STRING#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param llm: BaseLanguageModel | None = None#
- param normalize_by: float | None = None#
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param prompt: BasePromptTemplate | None = None#
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
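Example: a sketch of reference-aware 1-10 scoring, normalized to a 0-1 scale:

    from langchain.smith import RunEvalConfig

    # A raw score of 7 is reported as 0.7 when normalize_by=10.
    config = RunEvalConfig(
        evaluators=[RunEvalConfig.LabeledScoreString(normalize_by=10)]
    )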
- class QA[source]#
Bases:
SingleKeyEvalConfig
Configuration for a QA evaluator.
- Parameters:
prompt (Optional[BasePromptTemplate]) – The prompt template to use for the evaluation.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param evaluator_type: EvaluatorType = EvaluatorType.QA#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param llm: BaseLanguageModel | None = None#
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param prompt: BasePromptTemplate | None = None#
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
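Example: a sketch pinning the grading model via eval_llm on the parent config; the ChatOpenAI import assumes the langchain-openai package is installed, and the model name is illustrative:

    from langchain.smith import RunEvalConfig
    from langchain_openai import ChatOpenAI  # assumed to be installed

    config = RunEvalConfig(
        evaluators=[RunEvalConfig.QA()],
        eval_llm=ChatOpenAI(model="gpt-4", temperature=0),  # example model
    )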
- class RegexMatch[source]#
Bases:
SingleKeyEvalConfig
Configuration for a regex match string evaluator.
- Parameters:
flags (int) – The flags to pass to the regex. Example: re.IGNORECASE.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param evaluator_type: EvaluatorType = EvaluatorType.REGEX_MATCH#
- param flags: int = 0#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
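Example: standard re module flags are passed through as-is:

    import re

    from langchain.smith import RunEvalConfig

    # Match the reference pattern against the prediction, ignoring case.
    config = RunEvalConfig(
        evaluators=[RunEvalConfig.RegexMatch(flags=re.IGNORECASE)]
    )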
- class ScoreString[source]#
Bases:
SingleKeyEvalConfig
Configuration for a score string evaluator. This is like the criteria evaluator, but it is configured by default to return a score on a scale of 1 to 10.
It is recommended to normalize these scores by setting normalize_by to 10.
- Parameters:
criteria (Optional[CRITERIA_TYPE]) – The criteria to evaluate.
llm (Optional[BaseLanguageModel]) – The language model to use for the evaluation chain.
normalize_by (Optional[float]) – The denominator to use if you want to normalize the score. If not provided, the score will be on the default scale of 1 to 10.
prompt (Optional[BasePromptTemplate]) – The prompt template to use for the evaluation.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param criteria: Mapping[str, str] | Criteria | ConstitutionalPrinciple | None = None#
- param evaluator_type: EvaluatorType = EvaluatorType.SCORE_STRING#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param llm: BaseLanguageModel | None = None#
- param normalize_by: float | None = None#
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param prompt: BasePromptTemplate | None = None#
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
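Example: a sketch following the normalize_by=10 recommendation above; the criterion string is coerced to the corresponding Criteria enum member:

    from langchain.smith import RunEvalConfig

    # Raw 1-10 helpfulness scores are rescaled to (0, 1].
    config = RunEvalConfig(
        evaluators=[
            RunEvalConfig.ScoreString(criteria="helpfulness", normalize_by=10)
        ]
    )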
- class StringDistance[source]#
Bases:
SingleKeyEvalConfig
Configuration for a string distance evaluator.
- Parameters:
distance (Optional[StringDistanceEnum]) – The string distance metric to use.
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param distance: StringDistance | None = None#
The string distance metric to use: damerau_levenshtein (Damerau-Levenshtein distance), levenshtein (Levenshtein distance), jaro (Jaro distance), or jaro_winkler (Jaro-Winkler distance).
- param evaluator_type: EvaluatorType = EvaluatorType.STRING_DISTANCE#
- param input_key: str | None = None#
The key from the traced run’s inputs dictionary to use to represent the input. If not provided, it will be inferred automatically.
- param normalize_score: bool = True#
Whether to normalize the distance to between 0 and 1. Applies only to the Levenshtein and Damerau-Levenshtein distances.
- param prediction_key: str | None = None#
The key from the traced run’s outputs dictionary to use to represent the prediction. If not provided, it will be inferred automatically.
- param reference_key: str | None = None#
The key in the dataset run to use as the reference string. If not provided, it will be inferred automatically.
- get_kwargs() Dict[str, Any] #
Get the keyword arguments for the load_evaluator call.
- Returns:
The keyword arguments for the load_evaluator call.
- Return type:
Dict[str, Any]
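Example: a sketch using the StringDistance enum from langchain.evaluation (distinct from this nested config class of the same name):

    from langchain.evaluation import StringDistance
    from langchain.smith import RunEvalConfig

    # Normalized Damerau-Levenshtein distance between prediction and reference.
    config = RunEvalConfig(
        evaluators=[
            RunEvalConfig.StringDistance(
                distance=StringDistance.DAMERAU_LEVENSHTEIN,
                normalize_score=True,
            )
        ]
    )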