TensorflowDatasets#
- class langchain_community.utilities.tensorflow_datasets.TensorflowDatasets#
Bases: BaseModel
Access to the TensorFlow Datasets.
The current implementation works only with datasets that fit in memory.
TensorFlow Datasets is a collection of ready-to-use datasets for TensorFlow and other Python ML frameworks, such as JAX. All datasets are exposed as tf.data.Datasets. To get started, see the guide: https://www.tensorflow.org/datasets/overview and the list of datasets: https://www.tensorflow.org/datasets/catalog/overview#all_datasets
- You have to provide the sample_to_document_function: a function that converts a sample from the dataset-specific format to a Document.
- dataset_name#
the name of the dataset to load
- split_name#
the name of the split to load. Defaults to "train".
- load_max_docs#
a limit to the number of loaded documents. Defaults to 100.
- sample_to_document_function#
a function that converts a dataset sample to a Document
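The way load_max_docs and sample_to_document_function work together can be illustrated with a minimal, library-free sketch. It is not the class's actual implementation: SimpleDocument is a hypothetical stand-in for Document, and load_documents and the sample dictionaries are invented for illustration only.

```python
from dataclasses import dataclass, field
from itertools import islice
from typing import Any, Callable, Dict, Iterable, List


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for Document (illustration only)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def load_documents(
    samples: Iterable[dict],
    sample_to_document_function: Callable[[dict], SimpleDocument],
    load_max_docs: int = 100,
) -> List[SimpleDocument]:
    """Convert at most load_max_docs samples into documents."""
    # islice stops after load_max_docs items, so the cap applies
    # before any conversion work is done on the remaining samples.
    limited = islice(samples, load_max_docs)
    return [sample_to_document_function(sample) for sample in limited]


def sample_to_document(sample: dict) -> SimpleDocument:
    """Map one dataset-specific sample onto the document shape."""
    return SimpleDocument(
        page_content=sample["context"],
        metadata={"id": sample["id"]},
    )


# Made-up samples standing in for a real dataset split.
samples = [{"id": str(i), "context": f"text {i}"} for i in range(5)]
docs = load_documents(samples, sample_to_document, load_max_docs=3)
```

With five samples and load_max_docs=3, only the first three samples are converted.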
Example
from langchain_core.documents import Document
from langchain_community.utilities import TensorflowDatasets

# decode_to_str and MAX_DOCS are not defined here; the caller must supply them.

def mlqaen_example_to_document(example: dict) -> Document:
    # Map one mlqa/en sample onto a Document.
    return Document(
        page_content=decode_to_str(example["context"]),
        metadata={
            "id": decode_to_str(example["id"]),
            "title": decode_to_str(example["title"]),
            "question": decode_to_str(example["question"]),
            "answer": decode_to_str(example["answers"]["text"][0]),
        },
    )

tsds_client = TensorflowDatasets(
    dataset_name="mlqa/en",
    split_name="train",
    load_max_docs=MAX_DOCS,
    sample_to_document_function=mlqaen_example_to_document,
)
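The example calls decode_to_str without defining it. One possible shape for such a helper is sketched below, assuming that string features arrive either as tensor-like objects exposing a numpy() method that returns bytes, or as plain bytes or str values. The body is an assumption made for illustration, not code taken from the library.

```python
from typing import Any


def decode_to_str(item: Any) -> str:
    """Turn a string feature into a Python str (illustrative helper)."""
    # Tensor-like objects expose .numpy(); anything else is used as-is.
    value = item.numpy() if hasattr(item, "numpy") else item
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)
```

Because the helper falls back to plain bytes and str inputs, a conversion function built on it can be exercised without loading a dataset.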
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError if the input data cannot be parsed to form a valid model.
- param dataset_name: str = ''#
- param load_max_docs: int = 100#
- param split_name: str = 'train'#