ApifyDatasetLoader#
- class langchain_community.document_loaders.apify_dataset.ApifyDatasetLoader[source]#
Bases: BaseLoader, BaseModel
Load datasets from the Apify web scraping, crawling, and data extraction platform.
For details, see https://docs.apify.com/platform/integrations/langchain
Example
from langchain_community.document_loaders import ApifyDatasetLoader
from langchain_core.documents import Document

loader = ApifyDatasetLoader(
    dataset_id="YOUR-DATASET-ID",
    dataset_mapping_function=lambda dataset_item: Document(
        page_content=dataset_item["text"],
        metadata={"source": dataset_item["url"]},
    ),
)
documents = loader.load()
Initialize the loader with an Apify dataset ID and a mapping function.
- Parameters:
dataset_id (str) – The ID of the dataset on the Apify platform.
dataset_mapping_function (Callable) – A function that takes a single dictionary (an Apify dataset item) and converts it to an instance of the Document class.
- param apify_client: Any = None#
An instance of the ApifyClient class from the apify-client Python package.
- param dataset_id: str [Required]#
The ID of the dataset on the Apify platform.
- param dataset_mapping_function: Callable[[Dict], Document] [Required]#
A custom function that takes a single dictionary (an Apify dataset item) and converts it to an instance of the Document class.
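As an illustration of the mapping function described above, the sketch below uses a named function instead of a lambda so that items with missing fields are handled explicitly. The field names `text` and `url` follow the example at the top of this page. The function name `map_dataset_item` is hypothetical, and the small stand-in `Document` class (used only when `langchain_core` is not installed) is an assumption made so the sketch can run on its own.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

try:
    from langchain_core.documents import Document
except ImportError:
    # Stand-in used only so this sketch runs without LangChain installed.
    @dataclass
    class Document:
        page_content: str
        metadata: Dict[str, Any] = field(default_factory=dict)


def map_dataset_item(dataset_item: Dict[str, Any]) -> Document:
    """Convert one Apify dataset item (a dict) into a Document.

    Missing or empty "text" / "url" fields fall back to an empty string
    instead of raising KeyError.
    """
    return Document(
        page_content=dataset_item.get("text") or "",
        metadata={"source": dataset_item.get("url") or ""},
    )


# Quick check with a hand-written item (not real Apify data).
doc = map_dataset_item({"text": "Hello", "url": "https://example.com"})
```

A function like this could then be passed as `dataset_mapping_function=map_dataset_item` when constructing the loader.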
- async alazy_load() → AsyncIterator[Document]#
A lazy loader for Documents.
- Return type:
AsyncIterator[Document]
- load_and_split(text_splitter: TextSplitter | None = None) → List[Document]#
Load Documents and split into chunks. Chunks are returned as Documents.
Do not override this method; it should be considered deprecated.
- Parameters:
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns:
List of Documents.
- Return type:
List[Document]