LinkExtractorTransformer

Beta

This feature is in beta. It is actively being worked on, so the API may change.

DocumentTransformer for applying one or more LinkExtractors.

Example

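# Import path assumed: langchain_community's graph_vectorstores extractors package.
from langchain_community.graph_vectorstores.extractors import HtmlLinkExtractor, LinkExtractorTransformer

# `docs` is a sequence of Documents loaded elsewhere.
extract_links = LinkExtractorTransformer([
    HtmlLinkExtractor().as_document_extractor(),
])
extract_links.transform_documents(docs)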

Create a DocumentTransformer which adds extracted links to each document.
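To make the behaviour concrete, here is a minimal, self-contained sketch of the pattern. It is not the library's implementation. `Doc`, `find_hrefs`, `find_hashtags`, `SketchLinkTransformer`, and the `"links"` metadata key are all hypothetical stand-ins for illustration; the real class works with Document objects and LinkExtractor instances.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable, Sequence


@dataclass
class Doc:
    """Hypothetical stand-in for a Document: text plus a metadata dict."""

    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical stand-in for a link extractor: a callable from Doc to links.
Extractor = Callable[[Doc], Iterable[str]]


def find_hrefs(doc: Doc) -> list:
    """Return href targets of anchor tags (naive regex, illustration only)."""
    return re.findall(r'href="([^"]+)"', doc.page_content)


def find_hashtags(doc: Doc) -> list:
    """Return hashtag words such as 'graphs' from '#graphs'."""
    return re.findall(r"#(\w+)", doc.page_content)


class SketchLinkTransformer:
    """Applies every extractor to every document and records the links."""

    def __init__(self, link_extractors: Sequence[Extractor]) -> None:
        self.link_extractors = list(link_extractors)

    def transform_documents(
        self, documents: Sequence[Doc], **kwargs: Any
    ) -> Sequence[Doc]:
        for doc in documents:
            # Assumption: links are stored under a "links" metadata key.
            links = doc.metadata.setdefault("links", [])
            for extract in self.link_extractors:
                for link in extract(doc):
                    if link not in links:
                        links.append(link)
        return documents


docs = [Doc('<a href="https://example.com/a">A</a> about #graphs')]
out = SketchLinkTransformer([find_hrefs, find_hashtags]).transform_documents(docs)
print(out[0].metadata["links"])  # ['https://example.com/a', 'graphs']
```

The sketch updates each document's metadata in place and returns the same sequence. Whether the real transformer mutates or copies its inputs is not stated here, so treat that detail as an assumption.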

Methods

__init__(link_extractors)

Create a DocumentTransformer which adds extracted links to each document.

atransform_documents(documents, **kwargs)

Asynchronously transform a list of documents.

transform_documents(documents, **kwargs)

Transform a list of documents.

__init__(link_extractors)

Create a DocumentTransformer which adds extracted links to each document.

Parameters:

link_extractors (Sequence[LinkExtractor[Document]]) – The link extractors to apply to each document.

atransform_documents(documents, **kwargs)

Asynchronously transform a list of documents.

Parameters:
  • documents (Sequence[Document]) – A sequence of Documents to be transformed.

  • kwargs (Any)

Returns:

A sequence of transformed Documents.

Return type:

Sequence[Document]
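As a rough illustration of how an asynchronous variant is awaited, the sketch below pairs a synchronous method with an async wrapper that offloads the work to a worker thread. That delegation is an assumed pattern, not a statement of how LinkExtractorTransformer implements atransform_documents. `SketchTransformer` and the `"seen"` metadata flag are hypothetical, and plain dicts stand in for Documents.

```python
import asyncio
from typing import Any, Sequence


class SketchTransformer:
    """Hypothetical transformer with a sync method and an async wrapper."""

    def transform_documents(
        self, documents: Sequence[dict], **kwargs: Any
    ) -> Sequence[dict]:
        # Mark each document so the effect of the transform is visible.
        for doc in documents:
            doc.setdefault("metadata", {})["seen"] = True
        return documents

    async def atransform_documents(
        self, documents: Sequence[dict], **kwargs: Any
    ) -> Sequence[dict]:
        # Assumed pattern: run the blocking call in a worker thread so the
        # event loop stays responsive while documents are processed.
        return await asyncio.to_thread(
            self.transform_documents, documents, **kwargs
        )


async def main() -> Sequence[dict]:
    docs = [{"page_content": "first"}, {"page_content": "second"}]
    return await SketchTransformer().atransform_documents(docs)


result = asyncio.run(main())
print(len(result))  # 2
```

In real use the call would be `await extract_links.atransform_documents(docs)` from inside a coroutine, with the same arguments as the synchronous method.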

transform_documents(documents, **kwargs)

Transform a list of documents.

Parameters:
  • documents (Sequence[Document]) – A sequence of Documents to be transformed.

  • kwargs (Any)

Returns:

A sequence of transformed Documents.

Return type:

Sequence[Document]