TensorflowDatasetLoader

class langchain_community.document_loaders.tensorflow_datasets.TensorflowDatasetLoader(dataset_name: str, split_name: str, load_max_docs: int | None = 100, sample_to_document_function: Callable[[Dict], Document] | None = None)

Load from TensorFlow Dataset.

dataset_name: the name of the dataset to load.

split_name: the name of the split to load.

load_max_docs: a limit on the number of loaded documents. Defaults to 100.

sample_to_document_function: a function that converts a dataset sample into a Document.

Example

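import tensorflow as tf
from langchain_community.document_loaders import TensorflowDatasetLoader
from langchain_core.documents import Document


def decode_to_str(item: tf.Tensor) -> str:
    # Helper assumed by this example (it is not defined on this page):
    # converts a scalar string tensor into a Python str.
    return item.numpy().decode("utf-8")


def mlqaen_example_to_document(example: dict) -> Document:
    # Use the context as the document body and keep the other fields as metadata.
    return Document(
        page_content=decode_to_str(example["context"]),
        metadata={
            "id": decode_to_str(example["id"]),
            "title": decode_to_str(example["title"]),
            "question": decode_to_str(example["question"]),
            "answer": decode_to_str(example["answers"]["text"][0]),
        },
    )


tsds_client = TensorflowDatasetLoader(
    dataset_name="mlqa/en",
    split_name="test",
    load_max_docs=100,
    sample_to_document_function=mlqaen_example_to_document,
)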

Initialize the TensorflowDatasetLoader.

Parameters:

dataset_name (str) – the name of the dataset to load.
split_name (str) – the name of the split to load.
load_max_docs (int | None) – a limit to the number of loaded documents. Defaults to 100.
sample_to_document_function (Callable[[Dict], Document] | None) – a function that converts a dataset sample into a Document.

Methods

`__init__`(dataset_name, split_name[, ...])	Initialize the TensorflowDatasetLoader.
`alazy_load`()	A lazy loader for Documents.
`aload`()	Load data into Document objects.
`lazy_load`()	A lazy loader for Documents.
`load`()	Load data into Document objects.
`load_and_split`([text_splitter])	Load Documents and split into chunks.

__init__(dataset_name: str, split_name: str, load_max_docs: int | None = 100, sample_to_document_function: Callable[[Dict], Document] | None = None)

Initialize the TensorflowDatasetLoader.

Parameters:

dataset_name (str) – the name of the dataset to load.
split_name (str) – the name of the split to load.
load_max_docs (int | None) – a limit to the number of loaded documents. Defaults to 100.
sample_to_document_function (Callable[[Dict], Document] | None) – a function that converts a dataset sample into a Document.
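
The converter passed as sample_to_document_function is an ordinary callable from a sample dict to a document. The following is a minimal sketch of that shape only. To stay runnable without LangChain or TensorFlow it uses a hypothetical stand-in dataclass, `Doc`, in place of Document, a hypothetical helper `to_text`, and a made-up sample whose values are assumed to be UTF-8 bytes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Doc:
    """Hypothetical stand-in for Document: body text plus a metadata dict."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_text(value: Any) -> str:
    """Decode bytes as UTF-8; fall back to str() for anything else."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def sample_to_doc(sample: Dict[str, Any]) -> Doc:
    """Map one dataset sample onto a document."""
    return Doc(
        page_content=to_text(sample["context"]),
        metadata={"id": to_text(sample["id"]), "title": to_text(sample["title"])},
    )


# Made-up sample, for illustration only.
sample = {"id": b"42", "title": b"Example", "context": b"Some passage text."}
doc = sample_to_doc(sample)
```

With the real loader, the same idea applies, except that the function returns a Document and the sample values are whatever the chosen dataset provides.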

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type: AsyncIterator[Document]
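
An AsyncIterator is consumed with `async for` inside a coroutine. The sketch below shows that consumption pattern only. `fake_alazy_load` is a hypothetical stand-in that yields plain strings so the example runs without a dataset; it is not the loader's implementation.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_alazy_load() -> AsyncIterator[str]:
    """Hypothetical stand-in for alazy_load(); yields strings, not Documents."""
    for i in range(3):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield f"doc {i}"


async def collect() -> List[str]:
    # Items arrive one at a time; other tasks can run between them.
    return [doc async for doc in fake_alazy_load()]


result = asyncio.run(collect())
```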

async aload() → List[Document]

Load data into Document objects.

Return type: List[Document]

lazy_load() → Iterator[Document]

A lazy loader for Documents.

Return type: Iterator[Document]
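
The difference between a lazy iterator and an eager list, combined with a cap such as load_max_docs, can be sketched with a generator and `itertools.islice`. This is an illustration under stated assumptions, not the library's code: `lazy_docs`, `eager_docs`, and `source` are hypothetical names, the items are plain strings, and treating a cap of None as "no limit" is an assumption of this sketch.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional


def lazy_docs(samples: Iterable[Dict], max_docs: Optional[int]) -> Iterator[str]:
    """Yield one converted sample at a time, stopping after max_docs items."""
    for sample in islice(samples, max_docs):
        yield sample["text"]


def eager_docs(samples: Iterable[Dict], max_docs: Optional[int]) -> List[str]:
    """Materialize the lazy iterator into a list."""
    return list(lazy_docs(samples, max_docs))


pulled: List[int] = []  # records which samples were actually read


def source() -> Iterator[Dict]:
    for i in range(1000):
        pulled.append(i)
        yield {"text": f"doc {i}"}


docs_iter = lazy_docs(source(), max_docs=3)
first = next(docs_iter)           # reads a single sample from the source
pulled_after_first = len(pulled)
rest = list(docs_iter)            # reads the remaining samples up to the cap
```

The lazy form reads from the source only as items are requested, which keeps memory use flat for large splits; the eager form is convenient when the capped result fits in memory.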

load() → List[Document]

Load data into Document objects.

Return type: List[Document]

load_and_split(text_splitter: TextSplitter | None = None) → List[Document]

Load Documents and split them into chunks. Chunks are returned as Documents.

Do not override this method; it should be considered deprecated.

Parameters: text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Returns: List of Documents.
Return type: List[Document]

Examples using TensorflowDatasetLoader

TensorFlow Datasets