HierarchyLinkExtractor

Beta

This feature is in beta. It is actively being worked on, so the API may change.

Extract links from a document hierarchy.

Example

# Given three paths (in this case, within the "Root" document):
h1 = ["Root", "H1"]
h1a = ["Root", "H1", "a"]
h1b = ["Root", "H1", "b"]

# parent_links: links `h1a` and `h1b` to `h1`.
# child_links: links `h1` to `h1a` and `h1b`.
# sibling_links: links `h1a` and `h1b` to each other (both directions).
Example use with documents:
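
A minimal, library-free sketch of the document side of this usage. It assumes each document stores its path as a list under a `"path"` metadata key; that key, the `Doc` stand-in class, and the `path_from_metadata` helper are illustrative names, not part of the library.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Stand-in for Document (hypothetical; not the library class)."""

    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def path_from_metadata(doc: Doc) -> List[str]:
    # Assumes the path is stored as a list under "path" (our own convention).
    # Documents without a path yield an empty list.
    return list(doc.metadata.get("path", []))


docs = [
    Doc("Heading 1", {"path": ["Root", "H1"]}),
    Doc("Section a", {"path": ["Root", "H1", "a"]}),
    Doc("Section b", {"path": ["Root", "H1", "b"]}),
]

# One path per document, in the shape the extractor takes as input.
paths = [path_from_metadata(d) for d in docs]
```

A function with this shape (document in, list of path components out) is what `as_document_extractor` takes as its `hierarchy` argument.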
Parameters:
  • kind (str) – Kind of links to produce with this extractor.

  • parent_links (bool) – Link from a section to its parent.

  • child_links (bool) – Link from a section to its children.

  • sibling_links (bool) – Link from a section to other sections with the same parent.
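
To make the three flags concrete, the sketch below computes the directed edges each one would contribute for the paths in the example above. It is an illustration of the semantics only, not the library's implementation: `hierarchy_edges` is a hypothetical helper, edges are plain `(source, target)` tuples rather than `Link` objects, and the flag defaults are arbitrary choices for the sketch.

```python
from itertools import permutations
from typing import List, Set, Tuple

Path = Tuple[str, ...]
Edge = Tuple[Path, Path]  # (source path, target path)


def hierarchy_edges(
    paths: List[List[str]],
    *,
    parent_links: bool = False,
    child_links: bool = False,
    sibling_links: bool = False,
) -> Set[Edge]:
    """Hypothetical helper: directed edges that each flag would contribute."""
    known = {tuple(p) for p in paths}
    edges: Set[Edge] = set()
    for path in known:
        parent = path[:-1]
        if not path or parent not in known:
            continue  # only link to sections that are actually present
        if parent_links:
            edges.add((path, parent))
        if child_links:
            edges.add((parent, path))
    if sibling_links:
        for a, b in permutations(known, 2):
            # Same length and same prefix means the same parent.
            if len(a) == len(b) and a[:-1] == b[:-1]:
                edges.add((a, b))
    return edges


paths = [["Root", "H1"], ["Root", "H1", "a"], ["Root", "H1", "b"]]
h1, h1a, h1b = (tuple(p) for p in paths)

parent_edges = hierarchy_edges(paths, parent_links=True)
child_edges = hierarchy_edges(paths, child_links=True)
sibling_edges = hierarchy_edges(paths, sibling_links=True)
```

Here `parent_edges` holds the two edges from `h1a` and `h1b` to `h1`, `child_edges` holds the reverse pair, and `sibling_edges` holds `h1a` to `h1b` and back.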

Methods

__init__(*[, kind, parent_links, ...])

Extract links from a document hierarchy.

as_document_extractor(hierarchy)

Create a LinkExtractor that extracts links from Document inputs.

extract_many(inputs)

Add edges from each input to the corresponding documents.

extract_one(input)

Add edges from the input to the corresponding documents.

as_document_extractor(hierarchy)

Create a LinkExtractor that extracts links from Document inputs.

Parameters:

hierarchy (Callable[[Document], List[str]]) – Function that returns the path for the given document.

Returns:

A LinkExtractor[Document] suitable for application to Documents directly or with LinkExtractorTransformer.

Return type:

LinkExtractor[Document]
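
The idea of adapting a path-based extractor to another input type can be sketched as plain function composition. Everything below is an assumption made for illustration: `adapt` and `parent_tag` are hypothetical names, documents are modelled as dicts with a "/"-joined `"source"` string, and links are plain strings. How the library builds its adapter internally may differ.

```python
from typing import Callable, Dict, List, Set, TypeVar

T = TypeVar("T")


def adapt(
    extract_from_path: Callable[[List[str]], Set[str]],
    hierarchy: Callable[[T], List[str]],
) -> Callable[[T], Set[str]]:
    """Hypothetical adapter: make a path-based extractor accept inputs of type T."""

    def extract(item: T) -> Set[str]:
        # Convert the item to its path first, then delegate.
        return extract_from_path(hierarchy(item))

    return extract


def parent_tag(path: List[str]) -> Set[str]:
    # Toy path-based extractor: a single tag naming the parent path, if any.
    return {"/".join(path[:-1])} if len(path) > 1 else set()


def split_source(doc: Dict[str, str]) -> List[str]:
    # Toy hierarchy function: the path is encoded as a "/"-joined string.
    return doc["source"].split("/")


extract = adapt(parent_tag, split_source)
links_for_a = extract({"source": "Root/H1/a"})
links_for_root = extract({"source": "Root"})
```

In this sketch `links_for_a` contains the single tag `"Root/H1"`, and `links_for_root` is empty because a top-level path has no parent.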

extract_many(inputs)

Add edges from each input to the corresponding documents.

Parameters:

inputs (Iterable[InputT]) – The input content to extract edges from.

Returns:

Iterable over the set of links extracted from the input.

Return type:

Iterable[Set[Link]]

extract_one(input)

Add edges from the input to the corresponding documents.

Parameters:

input (List[str]) – The input content to extract edges from.

Returns:

Set of links extracted from the input.

Return type:

Set[Link]
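
The relationship between the two return types (`Set[Link]` for one input, `Iterable[Set[Link]]` for many) can be sketched with a toy class. `ToyPathExtractor` is a hypothetical stand-in whose links are plain strings; mapping `extract_one` over the inputs is an assumption about one reasonable way to implement `extract_many`, not a statement about the library's code.

```python
from typing import Iterable, Iterator, List, Set


class ToyPathExtractor:
    """Hypothetical stand-in; link values are plain strings, not Link objects."""

    def extract_one(self, input: List[str]) -> Set[str]:
        # One set of links for one path (here: just the parent path, if any).
        return {"/".join(input[:-1])} if len(input) > 1 else set()

    def extract_many(self, inputs: Iterable[List[str]]) -> Iterator[Set[str]]:
        # One result set per input, produced lazily and in input order.
        return map(self.extract_one, inputs)


extractor = ToyPathExtractor()
results = list(
    extractor.extract_many([["Root"], ["Root", "H1"], ["Root", "H1", "a"]])
)
```

Each element of `results` corresponds to the input at the same position, so the results can be zipped back onto the inputs.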