MarkdownHeaderTextSplitter
class langchain_text_splitters.markdown.MarkdownHeaderTextSplitter(
    headers_to_split_on: list[tuple[str, str]],
    return_each_line: bool = False,
    strip_headers: bool = True,
    custom_header_patterns: dict[str, int] | None = None,
)
Split Markdown files based on specified headers.
Create a new MarkdownHeaderTextSplitter.
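Parameters:
headers_to_split_on (list[tuple[str, str]]) – Headers to split on and track, given as (header prefix, metadata key) pairs, e.g. [("#", "Header 1"), ("##", "Header 2")].
return_each_line (bool) – If True, return each line with its associated headers instead of combining lines that share the same headers into chunks.
strip_headers (bool) – If True, remove the header lines used for splitting from the chunk content.
custom_header_patterns (dict[str, int] | None) – Optional dict mapping header patterns to their levels. For example, {"**": 1, "*": 2} treats **Header** as a level 1 header and *Header* as a level 2 header.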
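The interplay of headers_to_split_on and the resulting metadata can be illustrated with a small, self-contained sketch. split_on_headers is a hypothetical helper, not part of langchain_text_splitters. It assumes header lines look like "# Title" (prefix, space, text), uses the prefix length as the header level, always behaves as if strip_headers=True, and ignores code fences and custom header patterns. Each output pair carries the stack of currently active headers as metadata, keyed by the names given in headers_to_split_on.

```python
# Hypothetical, simplified sketch of header-based Markdown splitting.
# This is NOT the langchain_text_splitters implementation: it always strips
# header lines, treats the prefix length as the header level, and ignores
# code fences and custom header patterns.


def split_on_headers(
    text: str,
    headers_to_split_on: list[tuple[str, str]],
) -> list[tuple[dict[str, str], str]]:
    """Return (metadata, content) pairs, one per header-delimited section."""
    # Check longer prefixes first (e.g. "##" before "#").
    headers = sorted(headers_to_split_on, key=lambda h: len(h[0]), reverse=True)
    active: dict[str, str] = {}  # metadata key -> current header text
    levels: dict[str, int] = {}  # metadata key -> header level
    chunks: list[tuple[dict[str, str], str]] = []
    current: list[str] = []

    def flush() -> None:
        # Emit the accumulated body (if any) with a snapshot of the header stack.
        body = "\n".join(current).strip()
        if body:
            chunks.append((dict(active), body))
        current.clear()

    for line in text.splitlines():
        stripped = line.strip()
        for prefix, name in headers:
            if stripped.startswith(prefix + " "):
                flush()  # a tracked header closes the previous section
                level = len(prefix)
                # Forget headers at the same or a deeper level than this one.
                for key in [k for k, lvl in levels.items() if lvl >= level]:
                    del active[key]
                    del levels[key]
                active[name] = stripped[len(prefix):].strip()
                levels[name] = level
                break
        else:
            current.append(line)  # ordinary content line (untracked headers land here)
    flush()
    return chunks


md = "# Intro\nHello.\n## Details\nMore text.\n# Next\nBye."
for metadata, content in split_on_headers(md, [("#", "Header 1"), ("##", "Header 2")]):
    print(metadata, repr(content))
# {'Header 1': 'Intro'} 'Hello.'
# {'Header 1': 'Intro', 'Header 2': 'Details'} 'More text.'
# {'Header 1': 'Next'} 'Bye.'
```

Note how a new level 1 header ("# Next") clears the level 2 entry, so the last chunk carries only "Header 1" in its metadata.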
Methods
__init__(headers_to_split_on[, ...]) – Create a new MarkdownHeaderTextSplitter.
aggregate_lines_to_chunks(lines) – Combine lines with common metadata into chunks.
split_text(text) – Split Markdown text.
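The aggregate_lines_to_chunks method combines lines that share metadata. The sketch below shows that idea under stated assumptions: aggregate_lines is a hypothetical name, and the record shape (a dict with "content" and "metadata" keys) and the newline separator are assumptions, not the library's exact types or output. Per the return_each_line description, setting it to True returns lines with their headers instead of combined chunks, which roughly corresponds to skipping this merge step.

```python
# Hypothetical sketch of "combine lines with common metadata into chunks".
# Not the library's code: the record shape and the "\n" separator are assumptions.
from typing import TypedDict


class Line(TypedDict):
    content: str
    metadata: dict[str, str]


def aggregate_lines(lines: list[Line]) -> list[Line]:
    chunks: list[Line] = []
    for line in lines:
        if chunks and chunks[-1]["metadata"] == line["metadata"]:
            # Same header context as the previous chunk: append to it.
            chunks[-1]["content"] += "\n" + line["content"]
        else:
            # New header context: start a new chunk (copies avoid mutating the input).
            chunks.append({"content": line["content"], "metadata": dict(line["metadata"])})
    return chunks


lines: list[Line] = [
    {"content": "Hello.", "metadata": {"Header 1": "Intro"}},
    {"content": "World.", "metadata": {"Header 1": "Intro"}},
    {"content": "Bye.", "metadata": {"Header 1": "Next"}},
]
print(aggregate_lines(lines))
```

In this sketch only adjacent lines are merged, so the same headers appearing again later in the document start a new chunk.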
__init__(
    headers_to_split_on: list[tuple[str, str]],
    return_each_line: bool = False,
    strip_headers: bool = True,
    custom_header_patterns: dict[str, int] | None = None,
)
Create a new MarkdownHeaderTextSplitter.
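Parameters:
headers_to_split_on (list[tuple[str, str]]) – Headers to split on and track, given as (header prefix, metadata key) pairs, e.g. [("#", "Header 1"), ("##", "Header 2")].
return_each_line (bool) – If True, return each line with its associated headers instead of combining lines that share the same headers into chunks.
strip_headers (bool) – If True, remove the header lines used for splitting from the chunk content.
custom_header_patterns (dict[str, int] | None) – Optional dict mapping header patterns to their levels. For example, {"**": 1, "*": 2} treats **Header** as a level 1 header and *Header* as a level 2 header.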
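To make the custom_header_patterns mapping concrete, here is a hedged sketch of how a line might be recognized as a custom header. match_custom_header is a hypothetical helper, and its matching rules are assumptions rather than the library's exact behavior: the pattern must wrap the entire stripped line, longer patterns are tried first so that **Intro** is read as level 1 rather than as a *-wrapped header, and the wrapped text may not itself begin or end with the pattern's first character.

```python
# Hypothetical sketch of matching custom header patterns such as {"**": 1, "*": 2}.
# The matching rules here are assumptions, not the library's implementation.
from __future__ import annotations

import re


def match_custom_header(line: str, patterns: dict[str, int]) -> tuple[int, str] | None:
    """Return (level, header text) if the line is wrapped in a custom pattern."""
    stripped = line.strip()
    # Try longer patterns first so "**Intro**" is matched by "**", not "*".
    for pattern in sorted(patterns, key=len, reverse=True):
        marker = re.escape(pattern)
        edge = re.escape(pattern[0])
        # The pattern must wrap the whole line, and the text between the markers
        # must be non-empty and not start or end with the marker character.
        m = re.fullmatch(rf"{marker}(?!{edge})(.+?)(?<!{edge}){marker}", stripped)
        if m:
            return patterns[pattern], m.group(1).strip()
    return None


patterns = {"**": 1, "*": 2}
print(match_custom_header("**Intro**", patterns))   # (1, 'Intro')
print(match_custom_header("*Details*", patterns))   # (2, 'Details')
print(match_custom_header("plain text", patterns))  # None
```

A splitter could then treat a matched line like a "#"-style header of the returned level; how the library merges these with headers_to_split_on is not shown here.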
Return type:
None
Examples using MarkdownHeaderTextSplitter