MarkdownHeaderTextSplitter

class langchain_text_splitters.markdown.MarkdownHeaderTextSplitter(headers_to_split_on: List[Tuple[str, str]], return_each_line: bool = False, strip_headers: bool = True)

Splits Markdown text into chunks based on the specified headers.

Create a new MarkdownHeaderTextSplitter.

Parameters:
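  • headers_to_split_on (List[Tuple[str, str]]) – Headers to track, given as (header prefix, metadata key) pairs such as ("#", "Header 1")

  • return_each_line (bool) – If True, return each line with its associated headers instead of aggregated chunks

  • strip_headers (bool) – If True, strip the headers that were split on from the content of each chunk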

Methods

__init__(headers_to_split_on[, ...])

Create a new MarkdownHeaderTextSplitter.

aggregate_lines_to_chunks(lines)

Combine lines with common metadata into chunks.

split_text(text)

Split markdown file.

__init__(headers_to_split_on: List[Tuple[str, str]], return_each_line: bool = False, strip_headers: bool = True)

Create a new MarkdownHeaderTextSplitter.

Parameters:
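  • headers_to_split_on (List[Tuple[str, str]]) – Headers to track, given as (header prefix, metadata key) pairs such as ("#", "Header 1")

  • return_each_line (bool) – If True, return each line with its associated headers instead of aggregated chunks

  • strip_headers (bool) – If True, strip the headers that were split on from the content of each chunk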

aggregate_lines_to_chunks(lines: List[LineType]) → List[Document]

Combine lines with common metadata into chunks.

Parameters:

lines (List[LineType]) – Lines of text, each paired with its associated header metadata

Return type:

List[Document]
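The idea of combining lines with common metadata can be sketched in plain Python. This is a simplified illustration, not the library's source: the Line type and merge_lines function are hypothetical names, and the newline used to join content is an assumption that may differ from the separator the library uses.

```python
from typing import Dict, List, TypedDict


class Line(TypedDict):
    """Hypothetical stand-in for a line of text plus its header metadata."""
    content: str
    metadata: Dict[str, str]


def merge_lines(lines: List[Line]) -> List[Line]:
    """Merge consecutive lines whose metadata is identical into one chunk."""
    chunks: List[Line] = []
    for line in lines:
        if chunks and chunks[-1]["metadata"] == line["metadata"]:
            # Same headers as the previous chunk: extend that chunk.
            chunks[-1]["content"] += "\n" + line["content"]
        else:
            # Different headers: start a new chunk (copy to avoid aliasing).
            chunks.append(
                {"content": line["content"], "metadata": dict(line["metadata"])}
            )
    return chunks


sample = [
    {"content": "First line.", "metadata": {"Header 1": "Intro"}},
    {"content": "Second line.", "metadata": {"Header 1": "Intro"}},
    {"content": "Other section.", "metadata": {"Header 1": "Usage"}},
]
merged = merge_lines(sample)
```

Here the first two entries share the same metadata and collapse into one chunk, while the third starts a new chunk.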

split_text(text: str) → List[Document]

Split Markdown text on the configured headers.

Parameters:

text (str) – Markdown text to split

Return type:

List[Document]
