MarkdownifyTransformer

class langchain_community.document_transformers.markdownify.MarkdownifyTransformer(strip: str | List[str] | None = None, convert: str | List[str] | None = None, autolinks: bool = True, heading_style: str = 'ATX', **kwargs: Any)

Converts HTML documents to Markdown using the markdownify library, with configurable handling of links, images, other tags, and heading styles.

Parameters:
  • strip (str | List[str] | None) – A list of tags to strip. This option can't be used with the convert option.

  • convert (str | List[str] | None) – A list of tags to convert. This option can't be used with the strip option.

  • autolinks (bool) – A boolean indicating whether the "automatic link" style should be used when an a tag's contents match its href. Defaults to True.

  • heading_style (str) – Defines how headings should be converted. Accepted values are ATX, ATX_CLOSED, SETEXT, and UNDERLINED (which is an alias for SETEXT). Defaults to ATX.

  • kwargs (Any) – Additional options to pass to markdownify.
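
To make the heading_style values concrete, the sketch below renders a heading in each of the standard Markdown forms those names refer to. The render_heading helper is hypothetical and is not markdownify's implementation; the underline length and the ATX fallback for levels above 2 are assumptions, and markdownify's exact whitespace may differ.

```python
def render_heading(text: str, level: int, heading_style: str = "ATX") -> str:
    """Render one heading in the given style (illustrative helper only)."""
    style = heading_style.upper()
    if style == "UNDERLINED":  # documented alias for SETEXT
        style = "SETEXT"
    hashes = "#" * level
    if style == "ATX":
        return f"{hashes} {text}"
    if style == "ATX_CLOSED":
        return f"{hashes} {text} {hashes}"
    if style == "SETEXT":
        # Setext syntax only exists for levels 1 and 2.
        # Assumption: deeper levels fall back to the ATX form.
        if level == 1:
            return f"{text}\n{'=' * len(text)}"
        if level == 2:
            return f"{text}\n{'-' * len(text)}"
        return f"{hashes} {text}"
    raise ValueError(f"unknown heading_style: {heading_style!r}")


# Print a level-1 heading in every accepted style.
for style in ("ATX", "ATX_CLOSED", "SETEXT", "UNDERLINED"):
    print(f"{style}:")
    print(render_heading("Title", 1, style))
```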

Example
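
A minimal usage sketch, assuming the langchain-community and markdownify packages are installed. The html_to_markdown wrapper and the choice of strip=["a"] are illustrative and not part of the library; the imports sit inside the function so the optional dependencies are only needed when it is called.

```python
from typing import List


def html_to_markdown(html_pages: List[str]) -> List[str]:
    """Convert raw HTML strings to Markdown strings (hypothetical wrapper)."""
    # Imported lazily: both packages are optional dependencies.
    from langchain_community.document_transformers import MarkdownifyTransformer
    from langchain_core.documents import Document

    # Wrap each HTML string in a Document, the unit the transformer works on.
    docs = [Document(page_content=html) for html in html_pages]

    # strip=["a"] keeps link text but drops the link markup;
    # it cannot be combined with the convert option.
    transformer = MarkdownifyTransformer(strip=["a"], heading_style="ATX")

    converted = transformer.transform_documents(docs)
    return [doc.page_content for doc in converted]
```

Calling html_to_markdown(['<h1>Title</h1><p>Some text.</p>']) should return a list containing one Markdown string.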

More configuration options can be found at the markdownify GitHub page: matthewwithanm/python-markdownify

Methods

__init__([strip, convert, autolinks, ...])

atransform_documents(documents, **kwargs)

Asynchronously transform a list of documents.

transform_documents(documents, **kwargs)

Transform a list of documents.

__init__(strip: str | List[str] | None = None, convert: str | List[str] | None = None, autolinks: bool = True, heading_style: str = 'ATX', **kwargs: Any) → None
Parameters:
  • strip (str | List[str] | None)

  • convert (str | List[str] | None)

  • autolinks (bool)

  • heading_style (str)

  • kwargs (Any)

Return type:

None

async atransform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

Asynchronously transform a list of documents.

Parameters:
  • documents (Sequence[Document]) โ€“ A sequence of Documents to be transformed.

  • kwargs (Any)

Returns:

A sequence of transformed Documents.

Return type:

Sequence[Document]
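
The asynchronous variant can be awaited from a coroutine. The sketch below assumes the langchain-community and markdownify packages are installed; html_to_markdown_async and run_conversion are hypothetical helper names used only for illustration.

```python
import asyncio
from typing import List


async def html_to_markdown_async(html_pages: List[str]) -> List[str]:
    """Convert HTML strings to Markdown without blocking the caller's coroutine."""
    # Imported lazily: both packages are optional dependencies.
    from langchain_community.document_transformers import MarkdownifyTransformer
    from langchain_core.documents import Document

    docs = [Document(page_content=html) for html in html_pages]
    transformer = MarkdownifyTransformer()

    # Same inputs and outputs as transform_documents, but awaitable.
    converted = await transformer.atransform_documents(docs)
    return [doc.page_content for doc in converted]


def run_conversion(html_pages: List[str]) -> List[str]:
    """Synchronous entry point that drives the coroutine with asyncio.run."""
    return asyncio.run(html_to_markdown_async(html_pages))
```

Inside an already running event loop, await html_to_markdown_async(...) directly instead of calling run_conversion.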

transform_documents(documents: Sequence[Document], **kwargs: Any) → Sequence[Document]

Transform a list of documents.

Parameters:
  • documents (Sequence[Document]) โ€“ A sequence of Documents to be transformed.

  • kwargs (Any)

Returns:

A sequence of transformed Documents.

Return type:

Sequence[Document]

Examples using MarkdownifyTransformer