PairwiseStringEvaluator

class langchain.evaluation.schema.PairwiseStringEvaluator

Compare the output of two models (or two outputs of the same model).

Attributes

`requires_input`	Whether this evaluator requires an input string.
`requires_reference`	Whether this evaluator requires a reference label.
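
As a rough illustration of how these two flags might be used, the sketch below defines a hypothetical stand-in class (`LengthPreference`, not part of LangChain) that declares both attributes and rejects calls that omit a required argument. The `check_args` helper and its error messages are assumptions made for this example; the library's own validation may behave differently.

```python
from typing import Optional


class LengthPreference:
    """Hypothetical stand-in evaluator; not LangChain code."""

    # Mirror the two documented attributes.
    requires_input = False
    requires_reference = True

    def check_args(self, reference: Optional[str], input: Optional[str]) -> None:
        # Hypothetical helper: reject calls that omit a required argument.
        if self.requires_input and input is None:
            raise ValueError("this evaluator requires an input string")
        if self.requires_reference and reference is None:
            raise ValueError("this evaluator requires a reference label")


evaluator = LengthPreference()
# Passes: only the reference is required by this stand-in.
evaluator.check_args(reference="4", input=None)
```

Declaring the requirements as class-level flags lets calling code inspect an evaluator before deciding which arguments to supply.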

Methods

`aevaluate_string_pairs`(*, prediction, ...[, ...])	Asynchronously evaluate the output string pairs.
`evaluate_string_pairs`(*, prediction, ...[, ...])	Evaluate the output string pairs.

async aevaluate_string_pairs(
    *,
    prediction: str,
    prediction_b: str,
    reference: str | None = None,
    input: str | None = None,
    **kwargs: Any,
) → dict

Asynchronously evaluate the output string pairs.

Parameters:

prediction (str) – The output string from the first model.
prediction_b (str) – The output string from the second model.
reference (str | None, optional) – The expected output or reference string. Defaults to None.
input (str | None, optional) – The input string. Defaults to None.
kwargs (Any) – Additional keyword arguments, such as callbacks and optional reference strings.

Returns:

A dictionary containing the preference, scores, and/or other information.

Return type:

dict
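
A minimal sketch of calling an evaluator with this async signature follows. `ExactMatchPairwise` is a hypothetical stand-in written for this example, not a LangChain class. Its exact-match scoring rule and the `value` and `score` keys in the returned dictionary are assumptions; the documentation above only promises "the preference, scores, and/or other information".

```python
import asyncio
from typing import Any, Optional


class ExactMatchPairwise:
    """Hypothetical stand-in with the documented async signature; not LangChain code."""

    async def aevaluate_string_pairs(
        self,
        *,
        prediction: str,
        prediction_b: str,
        reference: Optional[str] = None,
        input: Optional[str] = None,
        **kwargs: Any,
    ) -> dict:
        # Assumed rule: a prediction is preferred if it equals the reference.
        a_ok = prediction == reference
        b_ok = prediction_b == reference
        if a_ok == b_ok:
            # Both match or both miss: no preference.
            return {"value": None, "score": 0.5}
        return {"value": "A" if a_ok else "B", "score": 1.0 if a_ok else 0.0}


async def main() -> list:
    evaluator = ExactMatchPairwise()
    # Independent pairs can be awaited concurrently.
    return await asyncio.gather(
        evaluator.aevaluate_string_pairs(prediction="4", prediction_b="5", reference="4"),
        evaluator.aevaluate_string_pairs(prediction="3", prediction_b="4", reference="4"),
    )


results = asyncio.run(main())
```

Because every argument is keyword-only, call sites name `prediction` and `prediction_b` explicitly, which makes it harder to swap the two outputs by accident.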

evaluate_string_pairs(
    *,
    prediction: str,
    prediction_b: str,
    reference: str | None = None,
    input: str | None = None,
    **kwargs: Any,
) → dict

Evaluate the output string pairs.

Parameters:

prediction (str) – The output string from the first model.
prediction_b (str) – The output string from the second model.
reference (str | None, optional) – The expected output or reference string. Defaults to None.
input (str | None, optional) – The input string. Defaults to None.
kwargs (Any) – Additional keyword arguments, such as callbacks and optional reference strings.

Returns:

A dictionary containing the preference, scores, and/or other information.

Return type:

dict
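
The synchronous call can be sketched the same way. `ShorterIsBetter` below is a hypothetical stand-in that copies the documented signature; it is not LangChain code. The "prefer the shorter output" rule and the `value` and `score` keys are assumptions chosen only to keep the example small.

```python
from typing import Any, Optional


class ShorterIsBetter:
    """Hypothetical stand-in with the documented signature; not LangChain code."""

    def evaluate_string_pairs(
        self,
        *,
        prediction: str,
        prediction_b: str,
        reference: Optional[str] = None,
        input: Optional[str] = None,
        **kwargs: Any,
    ) -> dict:
        # Assumed rule for illustration: prefer the shorter output.
        len_a, len_b = len(prediction), len(prediction_b)
        if len_a == len_b:
            return {"value": None, "score": 0.5}
        a_wins = len_a < len_b
        return {"value": "A" if a_wins else "B", "score": 1.0 if a_wins else 0.0}


evaluator = ShorterIsBetter()
result = evaluator.evaluate_string_pairs(
    prediction="Paris.",
    prediction_b="The capital of France is Paris.",
    input="What is the capital of France?",
)
```

The leading `*` in the signature makes all arguments keyword-only, so a positional call such as `evaluator.evaluate_string_pairs("a", "b")` raises a `TypeError`.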