SyntheticDataGenerator
- class langchain_experimental.tabular_synthetic_data.base.SyntheticDataGenerator
Bases: BaseModel
Generate synthetic data using the given LLM, guided by a few-shot prompt template.
- template
Template for few-shot prompting.
- Type:
FewShotPromptTemplate
- llm
Large language model to use for generation.
- Type:
Optional[BaseLanguageModel]
- example_input_key
Key to use for storing example inputs.
- Type:
str
- Usage Example:
>>> template = FewShotPromptTemplate(...)
>>> llm = BaseLanguageModel(...)  # BaseLanguageModel is abstract; pass a concrete model
>>> generator = SyntheticDataGenerator(template=template, llm=llm)
>>> results = generator.generate(subject="climate change", runs=5)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only so that a model field may also be named self.
- param example_input_key: str = 'example'
- param llm: BaseLanguageModel | None = None
- param results: list = []
- param template: FewShotPromptTemplate [Required]
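The template and example_input_key parameters work together. This sketch assumes each few-shot example is a dict, as in LangChain's FewShotPromptTemplate, and that example_input_key (default 'example') names the field holding the example text. It imitates how a few-shot prompt is assembled from such dicts: a prefix, the formatted examples, and a suffix, joined by a separator. It is not LangChain's implementation. build_few_shot_prompt and the sample records are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, List

# Hypothetical helper: imitates few-shot prompt assembly with plain
# str.format; it is NOT LangChain's FewShotPromptTemplate.
EXAMPLE_INPUT_KEY = "example"  # mirrors the documented default of example_input_key


def build_few_shot_prompt(
    examples: List[Dict[str, str]],
    example_format: str,
    prefix: str,
    suffix: str,
    separator: str = "\n\n",
    **variables: str,
) -> str:
    """Join the prefix, each formatted example, and the suffix into one prompt."""
    rendered = [example_format.format(**example) for example in examples]
    # str.format ignores unused keyword arguments, so every section can
    # receive the full set of variables.
    parts = [prefix.format(**variables), *rendered, suffix.format(**variables)]
    return separator.join(parts)


# Hypothetical sample records, each stored under EXAMPLE_INPUT_KEY.
examples = [
    {EXAMPLE_INPUT_KEY: "Region: Arctic; Impact: sea-ice loss"},
    {EXAMPLE_INPUT_KEY: "Region: Sahel; Impact: longer droughts"},
]
prompt = build_few_shot_prompt(
    examples,
    example_format="Example: {" + EXAMPLE_INPUT_KEY + "}",
    prefix="Generate synthetic records about {subject}.",
    suffix="Write one new record about {subject}. {extra}",
    subject="climate change",
    extra="Focus on environmental impacts.",
)
print(prompt)
```

Keying every example by a single configurable field lets newly generated text be stored in the same shape as the hand-written examples.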
- async agenerate(subject: str, runs: int, extra: str = '', *args: Any, **kwargs: Any) → List[str]
Generate synthetic data using the given subject asynchronously.
Note: Because the LLM calls run concurrently, no call can see the output of the others, so duplicates are more likely. Adding specific instructions through the 'extra' keyword argument can reduce them.
- Parameters:
subject (str) – The subject the synthetic data will be about.
runs (int) – Number of times to generate the data asynchronously.
extra (str) – Extra instructions for steerability in data generation.
args (Any)
kwargs (Any)
- Returns:
List of generated synthetic data for the given subject.
- Return type:
List[str]
- Usage Example:
>>> results = await generator.agenerate(subject="climate change", runs=5, extra="Focus on env impacts.")
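The note about duplicates follows from how concurrent dispatch works. This is a minimal standard-library sketch. It assumes the runs are launched together with asyncio.gather, which is an assumption: the documentation says only that the calls run concurrently. The sketch sends the same prompt several times at once. No call sees another's output, so a deterministic stand-in model returns identical records. That is why the note recommends specific instructions in extra. agenerate_sketch and fake_llm are hypothetical names, not part of langchain_experimental.

```python
import asyncio
from typing import List


async def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an async LLM call; returns a canned record."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"record for: {prompt}"


async def agenerate_sketch(subject: str, runs: int, extra: str = "") -> List[str]:
    """Launch `runs` independent calls at once and collect their outputs.

    Every call is built from the same prompt and none can see the others'
    output, so nothing here steers the calls apart from one another.
    """
    prompt = f"Write one synthetic record about {subject}. {extra}".strip()
    return list(await asyncio.gather(*(fake_llm(prompt) for _ in range(runs))))


results = asyncio.run(
    agenerate_sketch("climate change", runs=3, extra="Focus on environmental impacts.")
)
print(len(results))  # 3
print(len(set(results)))  # 1: the deterministic stand-in repeats itself
```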
- generate(subject: str, runs: int, *args: Any, **kwargs: Any) → List[str]
Generate synthetic data using the given subject string.
- Parameters:
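subject (str) – The subject the synthetic data will be about.
runs (int) – Number of times to generate the data.
extra (str) – Extra instructions for steerability in data generation. This is not a named parameter in the signature; pass it as a keyword argument, which **kwargs collects.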
args (Any)
kwargs (Any)
- Returns:
List of generated synthetic data.
- Return type:
List[str]
- Usage Example:
>>> results = generator.generate(subject="climate change", runs=5, extra="Focus on environmental impacts.")
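Because generate's signature has no named extra parameter, any steering instructions must arrive through **kwargs. The sketch below shows that pass-through pattern with a sequential loop. Each of the runs calls a hypothetical stand-in model with subject plus whatever keyword arguments were supplied, and the outputs are collected in order. generate_sketch and make_fake_llm are illustrative only. The real method delegates to its configured LLM and may differ in its details.

```python
from typing import Any, Callable, List


def make_fake_llm() -> Callable[..., str]:
    """Hypothetical stand-in for an LLM call that numbers each response."""
    calls = 0

    def fake_llm(subject: str, extra: str = "") -> str:
        nonlocal calls
        calls += 1
        return f"#{calls} {subject} {extra}".strip()

    return fake_llm


def generate_sketch(
    call_llm: Callable[..., str], subject: str, runs: int, **kwargs: Any
) -> List[str]:
    """Call the model `runs` times, one after another, forwarding any extra
    keyword arguments (for example `extra`) unchanged."""
    results: List[str] = []
    for _ in range(runs):
        results.append(call_llm(subject=subject, **kwargs))
    return results


results = generate_sketch(
    make_fake_llm(), "climate change", runs=2, extra="Focus on environmental impacts."
)
for line in results:
    print(line)
# #1 climate change Focus on environmental impacts.
# #2 climate change Focus on environmental impacts.
```

Running the calls one after another, rather than concurrently, means each call can in principle depend on earlier results. The concurrent agenerate path cannot offer this.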