JSFrameworkTextSplitter
- class langchain_text_splitters.jsx.JSFrameworkTextSplitter(separators: list[str] | None = None, chunk_size: int = 2000, chunk_overlap: int = 0, **kwargs: Any)
Text splitter that handles React (JSX), Vue, and Svelte code.
This splitter extends RecursiveCharacterTextSplitter to handle React (JSX), Vue, and Svelte code by:
1. Detecting and extracting custom component tags from the text
2. Using those tags as additional separators along with standard JS syntax
The splitter combines:
* Custom component tags as separators (e.g. <Component, <div)
* JavaScript syntax elements (function, const, if, etc.)
* Standard text splitting on newlines
This allows chunks to break at natural boundaries in React, Vue, and Svelte component code.
Initialize the JS Framework text splitter.
- Parameters:
separators (list[str] | None) – Optional list of custom separator strings to use
chunk_size (int) – Maximum size of chunks to return
chunk_overlap (int) – Overlap in characters between chunks
**kwargs (Any) – Additional arguments to pass to parent class
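A minimal usage sketch of the constructor and `split_text`, assuming the `langchain-text-splitters` package is installed. The sample component and the small `chunk_size` are illustrative choices, not recommendations; the import is guarded so the snippet does nothing when the package is missing.

```python
# Assumption: langchain-text-splitters is installed. The guard below turns
# the example into a no-op when it is not.
try:
    from langchain_text_splitters.jsx import JSFrameworkTextSplitter
except ImportError:
    JSFrameworkTextSplitter = None

# Illustrative React component; any JSX, Vue, or Svelte source would do.
JSX_SOURCE = """
import React from "react";

function Greeting({ name }) {
  return (
    <Card>
      <Title>Hello</Title>
      <p>Welcome, {name}</p>
    </Card>
  );
}

export default Greeting;
"""

chunks = []
if JSFrameworkTextSplitter is not None:
    # A small chunk_size forces several chunks for this short sample.
    splitter = JSFrameworkTextSplitter(chunk_size=80, chunk_overlap=0)
    chunks = splitter.split_text(JSX_SOURCE)
    for index, chunk in enumerate(chunks):
        print(index, len(chunk), repr(chunk))
```

Printing the length next to each chunk makes it easy to see how the chosen `chunk_size` interacts with the tag and keyword boundaries in a given file.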
Methods
__init__([separators, chunk_size, chunk_overlap]): Initialize the JS Framework text splitter.
atransform_documents(documents, **kwargs): Asynchronously transform a list of documents.
create_documents(texts[, metadatas]): Create documents from a list of texts.
from_huggingface_tokenizer(tokenizer, **kwargs): Text splitter that uses HuggingFace tokenizer to count length.
from_language(language, **kwargs): Return an instance of this class based on a specific language.
from_tiktoken_encoder([encoding_name, ...]): Text splitter that uses tiktoken encoder to count length.
get_separators_for_language(language): Retrieve a list of separators specific to the given language.
split_documents(documents): Split documents.
split_text(text): Split text into chunks.
transform_documents(documents, **kwargs): Transform sequence of documents by splitting them.
- __init__(separators: list[str] | None = None, chunk_size: int = 2000, chunk_overlap: int = 0, **kwargs: Any)
Initialize the JS Framework text splitter.
- Parameters:
separators (list[str] | None) – Optional list of custom separator strings to use
chunk_size (int) – Maximum size of chunks to return
chunk_overlap (int) – Overlap in characters between chunks
**kwargs (Any) – Additional arguments to pass to parent class
- Return type:
None
- async atransform_documents(documents: Sequence[Document], **kwargs: Any)
Asynchronously transform a list of documents.
- create_documents(texts: list[str], metadatas: list[dict[Any, Any]] | None = None)
Create documents from a list of texts.
- Parameters:
texts (list[str])
metadatas (list[dict[Any, Any]] | None)
- Return type:
list[Document]
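A short sketch of `create_documents` with per-text metadata, assuming the `langchain-text-splitters` package is installed. The file names and component sources are made up for illustration; each resulting document carries the metadata of the text it was split from.

```python
# Assumption: langchain-text-splitters is installed. The guard below turns
# the example into a no-op when it is not.
try:
    from langchain_text_splitters.jsx import JSFrameworkTextSplitter
except ImportError:
    JSFrameworkTextSplitter = None

# Hypothetical sources: one React component and one Vue component.
texts = [
    "function Button() {\n  return <button>OK</button>;\n}\n",
    "<template>\n  <div class=\"card\"><slot /></div>\n</template>\n",
]
# One metadata dict per input text, in the same order.
metadatas = [{"source": "Button.jsx"}, {"source": "Card.vue"}]

docs = []
if JSFrameworkTextSplitter is not None:
    splitter = JSFrameworkTextSplitter(chunk_size=200, chunk_overlap=0)
    docs = splitter.create_documents(texts, metadatas=metadatas)
    for doc in docs:
        print(doc.metadata["source"], repr(doc.page_content[:40]))
```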
- classmethod from_huggingface_tokenizer(tokenizer: Any, **kwargs: Any)
Text splitter that uses HuggingFace tokenizer to count length.
- Parameters:
tokenizer (Any)
kwargs (Any)
- Return type:
TextSplitter
- classmethod from_language(language: Language, **kwargs: Any)
Return an instance of this class based on a specific language.
This method initializes the text splitter with language-specific separators.
- Parameters:
language (Language) – The language to configure the text splitter for.
**kwargs (Any) – Additional keyword arguments to customize the splitter.
- Returns:
An instance of the text splitter configured for the specified language.
- Return type:
RecursiveCharacterTextSplitter
- classmethod from_tiktoken_encoder(encoding_name: str = 'gpt2', model_name: str | None = None, allowed_special: Literal['all'] | AbstractSet[str] = {}, disallowed_special: Literal['all'] | Collection[str] = 'all', **kwargs: Any)
Text splitter that uses tiktoken encoder to count length.
- Parameters:
encoding_name (str)
model_name (Optional[str])
allowed_special (Union[Literal['all'], AbstractSet[str]])
disallowed_special (Union[Literal['all'], Collection[str]])
kwargs (Any)
- Return type:
Self
- static get_separators_for_language(language: Language)
Retrieve a list of separators specific to the given language.
- Parameters:
language (Language) – The language for which to get the separators.
- Returns:
A list of separators appropriate for the specified language.
- Return type:
List[str]
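Because this is a static method, it can be called without creating a splitter. The sketch below assumes the `langchain-text-splitters` package is installed and uses `Language.JS` purely as an example value; the exact separators returned depend on the installed version.

```python
# Assumption: langchain-text-splitters is installed. The guard below turns
# the example into a no-op when it is not.
try:
    from langchain_text_splitters import Language
    from langchain_text_splitters.jsx import JSFrameworkTextSplitter
except ImportError:
    Language = None
    JSFrameworkTextSplitter = None

separators = []
if JSFrameworkTextSplitter is not None:
    # No instance is needed: the method is static.
    separators = JSFrameworkTextSplitter.get_separators_for_language(Language.JS)
    for separator in separators:
        print(repr(separator))
```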
- split_text(text: str) -> list[str]
Split text into chunks.
This method splits the text into chunks by:
1. Extracting unique opening component tags using a regex
2. Creating a separator list from the extracted tags and the JS separators
3. Splitting the text on those separators by calling the parent class method
- Parameters:
text (str) – String containing code to split
- Returns:
List of text chunks split on component and JS boundaries
- Return type:
list[str]
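The first two steps above can be sketched with the standard library alone. The regex, the `JS_SEPARATORS` subset, the ordering of the combined list, and the helper names `extract_component_separators` and `build_separators` are illustrative assumptions, not the library's actual internals. The third step (recursive splitting) is left to the parent class and is not reproduced here.

```python
import re

# Illustrative subset of JavaScript separators (an assumption for this sketch).
JS_SEPARATORS = ["\nfunction ", "\nconst ", "\nlet ", "\nif "]


def extract_component_separators(text):
    """Step 1: collect unique opening tags, in order of first appearance."""
    # Matches "<Tag ...>" but not closing tags such as "</Tag>".
    tags = re.findall(r"<\s*([A-Za-z0-9]+)[^>]*>", text)
    unique_tags = list(dict.fromkeys(tags))  # de-duplicate, keep order
    return [f"<{tag}" for tag in unique_tags]


def build_separators(text, custom=None):
    """Step 2: combine custom, JS, tag, and newline separators."""
    return (
        list(custom or [])
        + JS_SEPARATORS
        + extract_component_separators(text)
        + ["\n\n", "\n"]
    )


SOURCE = """const title = "Hi";

function App() {
  return (
    <Layout>
      <Header title={title} />
      <div>Body</div>
    </Layout>
  );
}
"""

print(extract_component_separators(SOURCE))
# → ['<Layout', '<Header', '<div']
print(build_separators(SOURCE, custom=["\nexport "]))
```

Note that a plain regex like this one cannot tell a tag from a comparison such as `a < b > c`, so real code may yield a few spurious separators.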