MarkdownifyTransformer
- class langchain_community.document_transformers.markdownify.MarkdownifyTransformer(strip: str | List[str] | None = None, convert: str | List[str] | None = None, autolinks: bool = True, heading_style: str = 'ATX', **kwargs: Any)
Converts HTML documents to Markdown format with customizable options for handling links, images, other tags and heading styles using the markdownify library.
- Parameters:
strip (str | List[str] | None) – A tag or list of tags to strip. This option can't be used with the convert option.
convert (str | List[str] | None) – A tag or list of tags to convert. This option can't be used with the strip option.
autolinks (bool) – Whether the "automatic link" style should be used when an <a> tag's contents match its href. Defaults to True.
heading_style (str) – Defines how headings should be converted. Accepted values are ATX, ATX_CLOSED, SETEXT, and UNDERLINED (which is an alias for SETEXT). Defaults to ATX.
kwargs (Any) – Additional options to pass to markdownify.
Example
More configuration options can be found at the markdownify GitHub page: matthewwithanm/python-markdownify
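A minimal usage sketch, assuming `langchain-community`, `langchain-core`, and `markdownify` are installed. The helper name `html_to_markdown` is hypothetical and not part of the library; the imports are deferred into the function body so the module still loads when the optional dependencies are missing.

```python
from typing import Any, List


def html_to_markdown(html_pages: List[str], **options: Any) -> List[str]:
    """Convert raw HTML strings to Markdown strings (hypothetical helper)."""
    # Imported lazily because markdownify is an optional dependency.
    from langchain_community.document_transformers import MarkdownifyTransformer
    from langchain_core.documents import Document

    # Wrap each HTML string in a Document, the unit the transformer works on.
    docs = [Document(page_content=html) for html in html_pages]

    # Options such as strip, convert, autolinks, and heading_style are
    # forwarded unchanged to the transformer's constructor.
    transformer = MarkdownifyTransformer(**options)

    # transform_documents returns new Documents holding the Markdown text.
    return [doc.page_content for doc in transformer.transform_documents(docs)]
```

For instance, `html_to_markdown(["<h1>Title</h1><p>Body</p>"], strip="a")` would drop link markup while converting the remaining tags. Since strip and convert are mutually exclusive, pass at most one of them.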
Methods
__init__([strip, convert, autolinks, ...])
atransform_documents(documents, **kwargs): Asynchronously transform a list of documents.
transform_documents(documents, **kwargs): Transform a list of documents.
- __init__(strip: str | List[str] | None = None, convert: str | List[str] | None = None, autolinks: bool = True, heading_style: str = 'ATX', **kwargs: Any) -> None
- Parameters:
strip (str | List[str] | None)
convert (str | List[str] | None)
autolinks (bool)
heading_style (str)
kwargs (Any)
- Return type:
None
- async atransform_documents(documents: Sequence[Document], **kwargs: Any) -> Sequence[Document]
Asynchronously transform a list of documents.
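The asynchronous variant might be called as sketched below. This assumes the installed version provides a working `atransform_documents` (for example, one inherited from the base transformer class); the helper name `html_to_markdown_async` is hypothetical.

```python
import asyncio
from typing import List


async def html_to_markdown_async(html_pages: List[str]) -> List[str]:
    """Convert HTML strings to Markdown without blocking the event loop
    (hypothetical helper; assumes atransform_documents is implemented)."""
    # Imported lazily because markdownify is an optional dependency.
    from langchain_community.document_transformers import MarkdownifyTransformer
    from langchain_core.documents import Document

    docs = [Document(page_content=html) for html in html_pages]
    transformed = await MarkdownifyTransformer().atransform_documents(docs)
    return [doc.page_content for doc in transformed]
```

From synchronous code, the coroutine can be driven with `asyncio.run(html_to_markdown_async(["<p>Hello</p>"]))`.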
Examples using MarkdownifyTransformer