LangChain Python API Reference: langchain-text-splitters 0.3.8

base

Classes

base.Language(value)

Enum of the supported programming languages.

base.TextSplitter(chunk_size, chunk_overlap, ...)

Interface for splitting text into chunks.

base.TokenTextSplitter([encoding_name, ...])

Split text into tokens using a model tokenizer.

base.Tokenizer(chunk_overlap, ...)

Tokenizer data class.
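
To make the idea of a splitting interface concrete, here is a minimal sketch of an abstract class parameterized by `chunk_size` and `chunk_overlap`, with one toy subclass. The names `SketchTextSplitter` and `FixedWidthSplitter` are hypothetical and are not part of langchain-text-splitters. The real `base.TextSplitter` takes further parameters (elided as `...` in the signature above), and its internals may differ.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchTextSplitter(ABC):
    """Hypothetical stand-in for a text-splitting interface (not the library class)."""

    def __init__(self, chunk_size: int, chunk_overlap: int) -> None:
        # The overlap must leave room for forward progress on every step.
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("require 0 <= chunk_overlap < chunk_size")
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    @abstractmethod
    def split_text(self, text: str) -> List[str]:
        """Return the chunks of `text`; subclasses decide how to cut."""


class FixedWidthSplitter(SketchTextSplitter):
    """Toy subclass: cut by character count, repeating the overlap."""

    def split_text(self, text: str) -> List[str]:
        step = self.chunk_size - self.chunk_overlap
        chunks: List[str] = []
        start = 0
        while start < len(text):
            chunks.append(text[start:start + self.chunk_size])
            # Stop once a chunk reaches the end, to avoid a redundant tail chunk.
            if start + self.chunk_size >= len(text):
                break
            start += step
        return chunks


chunks = FixedWidthSplitter(chunk_size=4, chunk_overlap=1).split_text("abcdefghij")
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk begins with the last character of the previous one, which is the effect of an overlap of 1.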

Functions

base.split_text_on_tokens(*, text, tokenizer)

Split incoming text and return chunks using a tokenizer.
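
The following is a stdlib-only sketch of how token-based chunking with overlap could work. It is not the library's code: `SketchTokenizer` and `sketch_split_on_tokens` are hypothetical names. Only `chunk_overlap` appears in the `base.Tokenizer` signature above, so the fields `tokens_per_chunk`, `encode`, and `decode` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SketchTokenizer:
    """Hypothetical tokenizer record; fields other than chunk_overlap are assumed."""

    chunk_overlap: int
    tokens_per_chunk: int
    encode: Callable[[str], List[int]]
    decode: Callable[[List[int]], str]


def sketch_split_on_tokens(*, text: str, tokenizer: SketchTokenizer) -> List[str]:
    """Slide a window of tokens_per_chunk tokens over the encoded text."""
    step = tokenizer.tokens_per_chunk - tokenizer.chunk_overlap
    if step <= 0:
        raise ValueError("tokens_per_chunk must exceed chunk_overlap")
    ids = tokenizer.encode(text)
    chunks: List[str] = []
    start = 0
    while start < len(ids):
        end = min(start + tokenizer.tokens_per_chunk, len(ids))
        chunks.append(tokenizer.decode(ids[start:end]))
        if end == len(ids):  # the window reached the last token
            break
        start += step
    return chunks


# Toy tokenizer for the demo: one token per character (its code point).
toy = SketchTokenizer(
    chunk_overlap=1,
    tokens_per_chunk=4,
    encode=lambda s: [ord(c) for c in s],
    decode=lambda ids: "".join(chr(i) for i in ids),
)
token_chunks = sketch_split_on_tokens(text="hello world", tokenizer=toy)
print(token_chunks)  # ['hell', 'lo w', 'worl', 'ld']
```

With a real model tokenizer the windows would cover subword tokens rather than characters, so chunk boundaries would fall at token edges.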

© Copyright 2025, LangChain Inc.