langchain-text-splitters: 0.3.4#
Text Splitters are classes for splitting text.
Class hierarchy:
BaseDocumentTransformer --> TextSplitter --> <name>TextSplitter # Example: CharacterTextSplitter
RecursiveCharacterTextSplitter --> <name>TextSplitter
Note: MarkdownHeaderTextSplitter and HTMLHeaderTextSplitter do not derive from TextSplitter.
Main helpers:
Document, Tokenizer, Language, LineType, HeaderType
base#
Classes
Enum of the programming languages.
Interface for splitting text into chunks.
Splitting text into tokens using a model tokenizer.
Tokenizer data class.
Functions
Split incoming text and return chunks using a tokenizer.
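As an illustration of token-based chunking, the following is a minimal standard-library sketch, not the library's implementation. `SimpleTokenizer` and `split_on_tokens` are hypothetical names, and whitespace splitting stands in for a real model tokenizer. It shows a sliding window of tokens with a fixed overlap between consecutive chunks.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SimpleTokenizer:
    """Hypothetical stand-in for a tokenizer data class (assumption, not the library's)."""

    tokens_per_chunk: int
    chunk_overlap: int
    encode: Callable[[str], List[str]]
    decode: Callable[[List[str]], str]


def split_on_tokens(text: str, tokenizer: SimpleTokenizer) -> List[str]:
    """Slide a fixed-size token window over the text, overlapping by chunk_overlap."""
    if tokenizer.chunk_overlap >= tokenizer.tokens_per_chunk:
        raise ValueError("chunk_overlap must be smaller than tokens_per_chunk")
    tokens = tokenizer.encode(text)
    step = tokenizer.tokens_per_chunk - tokenizer.chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(tokens):
        window = tokens[start:start + tokenizer.tokens_per_chunk]
        chunks.append(tokenizer.decode(window))
        # Stop once the window has reached the end of the token list.
        if start + tokenizer.tokens_per_chunk >= len(tokens):
            break
        start += step
    return chunks


# Whitespace "tokenizer" used purely for illustration.
whitespace = SimpleTokenizer(
    tokens_per_chunk=4,
    chunk_overlap=1,
    encode=str.split,
    decode=" ".join,
)
chunks = split_on_tokens("one two three four five six seven eight nine", whitespace)
print(chunks)
```

With a real model tokenizer, `encode` and `decode` would map between text and token ids. The window logic stays the same.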
character#
Classes
Splitting text by looking at characters.
Splitting text by recursively looking at characters.
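The recursive approach can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the library's code: `recursive_split` is a hypothetical helper, chunk overlap is omitted, and the separator list (paragraph, line, word, then raw characters) is an assumed default.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " ", ""),
) -> List[str]:
    """Split on the coarsest separator first; recurse with finer ones when a piece is too long."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # No separators left (or the empty separator): fall back to fixed-width slices.
    if not separators or separators[0] == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # The piece still fits: keep merging it into the current chunk.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry with the next, finer separator.
            chunks.extend(recursive_split(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks


sample = (
    "First paragraph here.\n\n"
    "Second paragraph is a bit longer than the first one.\n\n"
    "Third."
)
pieces = recursive_split(sample, chunk_size=30)
for p in pieces:
    print(repr(p))
```

Trying coarse separators first tends to keep paragraphs and sentences together, and the character-level fallback guarantees that no chunk exceeds `chunk_size`.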
html#
Classes
Element type as typed dict.
Splitting HTML files based on specified headers.
Splitting HTML files based on specified tag and font sizes.
json#
Classes
Splits JSON data into smaller, structured chunks while preserving hierarchy.
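One way to picture hierarchy-preserving JSON splitting is the sketch below. It is an assumption-laden illustration and not the library's algorithm: `split_json` and `set_nested` are hypothetical helpers, only nested dicts are descended into, and size is measured as the length of the serialized chunk.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def set_nested(target: Dict[str, Any], path: Tuple[str, ...], value: Any) -> None:
    """Write value at the nested key path, creating intermediate dicts as needed."""
    for key in path[:-1]:
        target = target.setdefault(key, {})
    target[path[-1]] = value


def split_json(
    data: Dict[str, Any],
    max_chunk_size: int,
    path: Tuple[str, ...] = (),
    chunks: Optional[List[Dict[str, Any]]] = None,
) -> List[Dict[str, Any]]:
    """Walk the dict depth-first, starting a new chunk when the current one would grow too large."""
    if chunks is None:
        chunks = [{}]
    for key, value in data.items():
        new_path = path + (key,)
        if isinstance(value, dict) and value:
            split_json(value, max_chunk_size, new_path, chunks)
            continue
        current_size = len(json.dumps(chunks[-1]))
        item_size = len(json.dumps({key: value}))
        # A single oversized leaf still gets a chunk of its own and may exceed the limit.
        if chunks[-1] and current_size + item_size > max_chunk_size:
            chunks.append({})
        # Each leaf keeps its full key path, so every chunk mirrors the original nesting.
        set_nested(chunks[-1], new_path, value)
    return chunks


data = {
    "service": {
        "name": "search",
        "limits": {"timeout_s": 30, "note": "x" * 40},
    },
    "version": 2,
}
chunks = split_json(data, max_chunk_size=60)
for chunk in chunks:
    print(json.dumps(chunk))
```

Because every chunk repeats the parent keys of its leaves, each one remains a valid JSON object that can be read without the others.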
konlpy#
Classes
Splitting text using the Konlpy package.
latex#
Classes
Attempts to split the text along LaTeX-formatted layout elements.
markdown#
Classes
An experimental text splitter for handling Markdown syntax.
Header type as typed dict.
Line type as typed dict.
Splitting markdown files based on specified headers.
Attempts to split the text along Markdown-formatted headings.
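Header-based Markdown splitting can be sketched as below. This is a rough standard-library illustration, not the library's implementation: `split_markdown_by_headers` is a hypothetical function, the `(marker, name)` pairs are an assumed configuration shape, and fenced code blocks are not treated specially.

```python
from typing import Dict, List, Sequence, Tuple


def split_markdown_by_headers(
    text: str,
    headers: Sequence[Tuple[str, str]] = (("#", "Header 1"), ("##", "Header 2")),
) -> List[Dict[str, object]]:
    """Group lines under their nearest headers and record the header trail as metadata."""
    level_of = {name: len(marker) for marker, name in headers}
    sections: List[Dict[str, object]] = []
    metadata: Dict[str, str] = {}
    buffer: List[str] = []

    def flush() -> None:
        content = "\n".join(buffer).strip()
        if content:
            sections.append({"content": content, "metadata": dict(metadata)})
        buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        match = next(
            ((m, n) for m, n in headers if stripped.startswith(m + " ")), None
        )
        if match is None:
            buffer.append(line)
            continue
        marker, name = match
        flush()
        # A new header closes any open headers at the same or a deeper level.
        for open_name in [n for n in metadata if level_of[n] >= len(marker)]:
            del metadata[open_name]
        metadata[name] = stripped[len(marker):].strip()
    flush()
    return sections


md = "# Intro\nHello\n## Details\nMore text\n# Next\nBye"
sections = split_markdown_by_headers(md)
for section in sections:
    print(section)
```

Carrying the header trail in metadata lets each chunk be traced back to its place in the document even after it is separated from its neighbours.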
nltk#
Classes
Splitting text using the NLTK package.
python#
Classes
Attempts to split the text along Python syntax.
sentence_transformers#
Classes
Splitting text into tokens using a sentence model tokenizer.
spacy#
Classes
Splitting text using the Spacy package.