MarkdownHeaderTextSplitter

class langchain_text_splitters.markdown.MarkdownHeaderTextSplitter(headers_to_split_on: list[tuple[str, str]], return_each_line: bool = False, strip_headers: bool = True, custom_header_patterns: dict[str, int] | None = None)

Split Markdown files based on specified headers.

Create a new MarkdownHeaderTextSplitter.

Parameters:

headers_to_split_on (list[tuple[str, str]]) – Headers to track, given as (header marker, metadata key) pairs such as ("#", "Header 1")
return_each_line (bool) – Return each line with its associated headers
strip_headers (bool) – Strip the split headers from the content of each chunk
custom_header_patterns (dict[str, int] | None) – Optional dict mapping header patterns to their levels. For example, {"**": 1, "*": 2} treats **Header** as a level 1 header and *Header* as a level 2 header.

Methods

`__init__`(headers_to_split_on[, ...])	Create a new MarkdownHeaderTextSplitter.
`aggregate_lines_to_chunks`(lines)	Combine lines with common metadata into chunks.
`split_text`(text)	Split markdown file.

__init__(headers_to_split_on: list[tuple[str, str]], return_each_line: bool = False, strip_headers: bool = True, custom_header_patterns: dict[str, int] | None = None) → None

Create a new MarkdownHeaderTextSplitter.

Parameters:

headers_to_split_on (list[tuple[str, str]]) – Headers to track, given as (header marker, metadata key) pairs such as ("#", "Header 1")
return_each_line (bool) – Return each line with its associated headers
strip_headers (bool) – Strip the split headers from the content of each chunk
custom_header_patterns (dict[str, int] | None) – Optional dict mapping header patterns to their levels. For example, {"**": 1, "*": 2} treats **Header** as a level 1 header and *Header* as a level 2 header.

Return type:

None

aggregate_lines_to_chunks(lines: list[LineType]) → list[Document]

Combine lines with common metadata into chunks.

Parameters: lines (list[LineType]) – Lines of text with their associated header metadata
Return type: list[Document]

split_text(text: str) → list[Document]

Split a Markdown file.

Parameters: text (str) – Contents of the Markdown file
Return type: list[Document]

Examples using MarkdownHeaderTextSplitter

How to split Markdown by Headers