ConcurrentLoader#

class langchain_community.document_loaders.concurrent.ConcurrentLoader(blob_loader: BlobLoader, blob_parser: BaseBlobParser, num_workers: int = 4)[source]#

Load and parse Documents concurrently.

A generic document loader.

Parameters:
  • blob_loader (BlobLoader) – A blob loader which knows how to yield blobs

  • blob_parser (BaseBlobParser) – A blob parser which knows how to parse blobs into documents

  • num_workers (int) – Max number of concurrent workers to use.
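
A minimal usage sketch, assuming a local docs/ directory of plain-text files; FileSystemBlobLoader and TextParser are one possible blob loader / parser pairing, not the only one:

    from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
    from langchain_community.document_loaders.concurrent import ConcurrentLoader
    from langchain_community.document_loaders.parsers.txt import TextParser

    # The blob loader yields blobs; the blob parser turns each blob into Documents.
    loader = ConcurrentLoader(
        blob_loader=FileSystemBlobLoader("docs/", glob="**/*.txt"),
        blob_parser=TextParser(),
        num_workers=4,
    )
    docs = loader.load()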

Methods

__init__(blob_loader, blob_parser[, num_workers])

A generic document loader.

alazy_load()

A lazy loader for Documents.

aload()

Load data into Document objects.

from_filesystem(path, *[, glob, exclude, ...])

Create a concurrent generic document loader using a filesystem blob loader.

get_parser(**kwargs)

Override this method to associate a default parser with the class.

lazy_load()

Load documents lazily with concurrent parsing.

load()

Load data into Document objects.

load_and_split([text_splitter])

Load all documents and split them into chunks.

__init__(blob_loader: BlobLoader, blob_parser: BaseBlobParser, num_workers: int = 4) β†’ None[source]#

A generic document loader.

Parameters:
  • blob_loader (BlobLoader) – A blob loader which knows how to yield blobs

  • blob_parser (BaseBlobParser) – A blob parser which knows how to parse blobs into documents

  • num_workers (int) – Max number of concurrent workers to use.

Return type:

None

async alazy_load() β†’ AsyncIterator[Document]#

A lazy loader for Documents.

Return type:

AsyncIterator[Document]

async aload() β†’ List[Document]#

Load data into Document objects.

Return type:

List[Document]
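
A minimal async sketch covering both aload() and alazy_load(), assuming `loader` was constructed as in the example above:

    import asyncio

    async def main() -> None:
        # aload() collects every Document into a list in one call.
        docs = await loader.aload()

        # alazy_load() yields Documents one at a time as they are produced.
        async for doc in loader.alazy_load():
            print(doc.metadata.get("source"))

    asyncio.run(main())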

classmethod from_filesystem(path: str | Path, *, glob: str = '**/[!.]*', exclude: Sequence[str] = (), suffixes: Sequence[str] | None = None, show_progress: bool = False, parser: Literal['default'] | BaseBlobParser = 'default', num_workers: int = 4, parser_kwargs: dict | None = None) β†’ ConcurrentLoader[source]#

Create a concurrent generic document loader using a filesystem blob loader.

Parameters:
  • path (str | Path) – The path to the directory to load documents from.

  • glob (str) – The glob pattern to use to find documents.

  • suffixes (Sequence[str] | None) – The suffixes to use to filter documents. If None, all files matching the glob will be loaded.

  • exclude (Sequence[str]) – A list of patterns to exclude from the loader.

  • show_progress (bool) – Whether to show a progress bar or not (requires tqdm). Proxies to the file system loader.

  • parser (Literal['default'] | BaseBlobParser) – A blob parser which knows how to parse blobs into documents

  • num_workers (int) – Max number of concurrent workers to use.

  • parser_kwargs (dict | None) – Keyword arguments to pass to the parser.

Return type:

ConcurrentLoader
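
A minimal sketch of the convenience constructor, assuming a local docs/ directory; the glob, suffixes, and worker count are illustrative values:

    from langchain_community.document_loaders.concurrent import ConcurrentLoader

    loader = ConcurrentLoader.from_filesystem(
        "docs/",
        glob="**/*",
        suffixes=[".md", ".txt"],  # only load Markdown and text files
        show_progress=True,        # requires tqdm
        num_workers=8,
    )
    docs = loader.load()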

static get_parser(**kwargs: Any) β†’ BaseBlobParser#

Override this method to associate a default parser with the class.

Parameters:

kwargs (Any) –

Return type:

BaseBlobParser
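
A minimal sketch of associating a default parser with a subclass so that from_filesystem(..., parser='default') can pick it up; TextConcurrentLoader and the choice of TextParser are hypothetical, for illustration only:

    from typing import Any

    from langchain_community.document_loaders.concurrent import ConcurrentLoader
    from langchain_community.document_loaders.parsers.txt import TextParser
    from langchain_core.document_loaders import BaseBlobParser

    class TextConcurrentLoader(ConcurrentLoader):
        # Hypothetical subclass whose default parser is TextParser.
        @staticmethod
        def get_parser(**kwargs: Any) -> BaseBlobParser:
            return TextParser()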

lazy_load() β†’ Iterator[Document][source]#

Load documents lazily with concurrent parsing.

Return type:

Iterator[Document]
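
A minimal sketch of lazy iteration, assuming `loader` was constructed as above; blobs are parsed on the worker pool, but Documents are yielded as they become available instead of being accumulated in memory:

    for doc in loader.lazy_load():
        index(doc)  # `index` is a hypothetical per-document callback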

load() β†’ List[Document]#

Load data into Document objects.

Return type:

List[Document]

load_and_split(text_splitter: TextSplitter | None = None) β†’ List[Document]#

Load all documents and split them into chunks.

Parameters:

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Return type:

List[Document]
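
A minimal sketch of loading and chunking; the splitter and its chunk sizes are illustrative, and when text_splitter is omitted a RecursiveCharacterTextSplitter with default settings is used:

    from langchain_text_splitters import RecursiveCharacterTextSplitter

    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = loader.load_and_split(text_splitter=splitter)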
