SyntheticDataGenerator

class langchain_experimental.tabular_synthetic_data.base.SyntheticDataGenerator

Bases: BaseModel

Generate synthetic data using the given LLM and few-shot template.

Uses the provided LLM to produce synthetic data from the few-shot prompt template.

template

Template for few-shot prompting.

Type:

FewShotPromptTemplate

llm

Large Language Model to use for generation.

Type:

Optional[BaseLanguageModel]

llm_chain

LLM chain with the LLM and few-shot template.

Type:

Optional[Chain]

example_input_key

Key to use for storing example inputs.

Type:

str

Usage Example:
>>> template = FewShotPromptTemplate(...)
>>> llm = BaseLanguageModel(...)  # placeholder: BaseLanguageModel is abstract, use a concrete subclass
>>> generator = SyntheticDataGenerator(template=template, llm=llm)
>>> results = generator.generate(subject="climate change", runs=5)
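
To make the role of the few-shot template more concrete, the sketch below assembles a prompt from a prefix, a list of examples, and a suffix using only the standard library. It is an illustration of few-shot prompting in general, not LangChain code: `build_few_shot_prompt` and the sample strings are hypothetical, and only the `example` key mirrors the default `example_input_key` documented on this page.

```python
from typing import Dict, List


def build_few_shot_prompt(
    examples: List[Dict[str, str]],
    prefix: str,
    suffix: str,
    example_input_key: str = "example",
    **variables: str,
) -> str:
    """Join the prefix, the example texts, and the formatted suffix (hypothetical helper)."""
    # Pull the text of each example out of its dict using the configured key.
    rendered = [ex[example_input_key] for ex in examples]
    # Fill the suffix placeholders (e.g. subject, extra) and join all parts.
    return "\n\n".join([prefix, *rendered, suffix.format(**variables)])


prompt = build_few_shot_prompt(
    examples=[{"example": "Sample record about rainfall."}],
    prefix="Generate one synthetic record in the style of the examples.",
    suffix="Subject: {subject}\nExtra instructions: {extra}",
    subject="climate change",
    extra="Focus on environmental impacts.",
)
print(prompt)
```

In the real class, the template would be a `FewShotPromptTemplate` and the resulting prompt would be sent to the LLM rather than printed.
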

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

param example_input_key: str = 'example'
param llm: BaseLanguageModel | None = None
param llm_chain: Chain | None = None
param results: list = []
param template: FewShotPromptTemplate [Required]
async agenerate(subject: str, runs: int, extra: str = '', *args: Any, **kwargs: Any) → List[str]

Generate synthetic data using the given subject asynchronously.

Note: Because the LLM calls run concurrently, the results may contain duplicates. Adding specific instructions through the “extra” keyword argument may reduce them.

Parameters:
  • subject (str) – The subject the synthetic data will be about.

  • runs (int) – Number of times to generate the data asynchronously.

  • extra (str) – Extra instructions for steerability in data generation.

  • args (Any)

  • kwargs (Any)

Returns:

List of generated synthetic data for the given subject.

Return type:

List[str]

Usage Example:
>>> results = await generator.agenerate(subject="climate change", runs=5,
...                                     extra="Focus on environmental impacts.")
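
The note about concurrent calls and duplicates can be pictured with `asyncio.gather`. The following is a minimal sketch under stated assumptions, not the library's implementation: `fake_llm_call` is a hypothetical stand-in for one asynchronous LLM request, and it is written to return repeated values on purpose so that the de-duplication step has something to do.

```python
import asyncio
from typing import List


async def fake_llm_call(subject: str, run_id: int) -> str:
    """Hypothetical stand-in for one asynchronous LLM request."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    # Only two distinct answers exist, so repeated runs produce duplicates.
    return f"record about {subject} (variant {run_id % 2})"


async def run_concurrently(subject: str, runs: int) -> List[str]:
    """Start all runs at once and collect the results in submission order."""
    calls = [fake_llm_call(subject, i) for i in range(runs)]
    return list(await asyncio.gather(*calls))


results = asyncio.run(run_concurrently("climate change", runs=4))
unique_results = list(dict.fromkeys(results))  # order-preserving de-duplication
print(len(results), len(unique_results))
```

Because no run can see what the others produced, duplicates can only be reduced up front (for example through the `extra` instructions) or filtered out afterwards as shown here.
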
generate(subject: str, runs: int, *args: Any, **kwargs: Any) → List[str]

Generate synthetic data using the given subject string.

Parameters:
  • subject (str) – The subject the synthetic data will be about.

  • runs (int) – Number of times to generate the data.

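  • extra (str) – Extra instructions for steerability in data generation. It is not a named parameter in the signature of this method, so pass it as a keyword argument; it is then collected into kwargs.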

  • args (Any)

  • kwargs (Any)

Returns:

List of generated synthetic data.

Return type:

List[str]

Usage Example:
>>> results = generator.generate(subject="climate change", runs=5,
...                              extra="Focus on environmental impacts.")
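
Since `extra` does not appear in the `generate` signature, it presumably reaches the prompt through `**kwargs`. The sketch below shows that forwarding pattern with plain string formatting. It is an assumption-laden illustration, not library code: `generate_sketch` and the template string are hypothetical, and the formatted prompt stands in for the LLM output.

```python
from typing import Any, List


def generate_sketch(template: str, subject: str, runs: int, **kwargs: Any) -> List[str]:
    """Format the template once per run, forwarding extra keyword arguments."""
    results: List[str] = []
    for _ in range(runs):
        # A real generator would send this prompt to an LLM; the sketch keeps
        # the formatted prompt itself as the "result".
        results.append(template.format(subject=subject, **kwargs))
    return results


results = generate_sketch(
    "Write about {subject}. {extra}",
    subject="climate change",
    runs=2,
    extra="Focus on environmental impacts.",
)
print(results[0])
```

With this pattern, any placeholder in the template other than `subject` has to be supplied as a keyword argument, otherwise `str.format` raises a `KeyError`.
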