RecursiveJsonSplitter

class langchain_text_splitters.json.RecursiveJsonSplitter(max_chunk_size: int = 2000, min_chunk_size: int | None = None)

Splits JSON data into smaller, structured chunks while preserving hierarchy.

This class provides methods to split JSON data into smaller dictionaries or JSON-formatted strings based on configurable maximum and minimum chunk sizes. It supports nested JSON structures, optionally converts lists into dictionaries for better chunking, and allows the creation of document objects for further use.

max_chunk_size

The maximum size for each chunk. Defaults to 2000.

Type:

int

min_chunk_size

The minimum size for each chunk, derived from max_chunk_size if not explicitly provided.

Type:

int

Methods

__init__([max_chunk_size, min_chunk_size])

Initialize the chunk size configuration for text processing.

create_documents(texts[, convert_lists, ...])

Create documents from a list of JSON objects (dicts).

split_json(json_data[, convert_lists])

Splits JSON into a list of JSON chunks.

split_text(json_data[, convert_lists, ...])

Splits JSON into a list of JSON formatted strings.

__init__(max_chunk_size: int = 2000, min_chunk_size: int | None = None)

Initialize the chunk size configuration.

The constructor stores the maximum and minimum chunk sizes. If min_chunk_size is not provided, it is derived from max_chunk_size as described below.

Parameters:
  • max_chunk_size (int) – The maximum size for a chunk. Defaults to 2000.

  • min_chunk_size (Optional[int]) – The minimum size for a chunk. If None, defaults to the maximum chunk size minus 200, with a lower bound of 50.
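The default rule for min_chunk_size can be written out as a one-line computation. The sketch below restates the documented rule only; the helper name default_min_chunk_size is hypothetical and is not part of the library.

```python
def default_min_chunk_size(max_chunk_size: int = 2000) -> int:
    """Hypothetical helper: max_chunk_size minus 200, but never below 50."""
    return max(max_chunk_size - 200, 50)


for size in (2000, 300, 100):
    print(size, "->", default_min_chunk_size(size))
# 2000 -> 1800
# 300 -> 100
# 100 -> 50
```

The lower bound only matters for small values: any max_chunk_size below 250 yields a minimum of 50.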

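create_documents(texts: List[Dict], convert_lists: bool = False, ensure_ascii: bool = True, metadatas: List[dict] | None = None) -> List[Document]

Create documents from a list of JSON objects (dicts).

Parameters:
  • texts (List[Dict]) – The JSON objects to split, one dict per input.

  • convert_lists (bool) – Whether to convert lists into dictionaries before splitting. Defaults to False.

  • ensure_ascii (bool) – Whether to escape non-ASCII characters in the generated JSON strings. Defaults to True.

  • metadatas (List[dict] | None) – Optional metadata, one dict per entry in texts. Defaults to None.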

Return type:

List[Document]

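split_json(json_data: Dict[str, Any], convert_lists: bool = False) -> List[Dict]

Splits JSON into a list of JSON chunks.

Parameters:
  • json_data (Dict[str, Any]) – The JSON object to split.

  • convert_lists (bool) – Whether to convert lists into dictionaries before splitting. Defaults to False.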

Return type:

List[Dict]
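To illustrate what converting lists into dictionaries can look like, here is a standard-library sketch that replaces each list with a dictionary keyed by the item's index as a string. The helper name lists_to_dicts is hypothetical. It approximates the idea behind convert_lists and is not the library's own code.

```python
from typing import Any


def lists_to_dicts(data: Any) -> Any:
    """Hypothetical helper: recursively turn lists into index-keyed dicts."""
    if isinstance(data, dict):
        return {key: lists_to_dicts(value) for key, value in data.items()}
    if isinstance(data, list):
        return {str(i): lists_to_dicts(item) for i, item in enumerate(data)}
    return data  # scalars are returned unchanged


converted = lists_to_dicts({"tags": ["a", "b"], "meta": {"ids": [1, 2]}})
print(converted)
# {'tags': {'0': 'a', '1': 'b'}, 'meta': {'ids': {'0': 1, '1': 2}}}
```

Once a list is a dictionary, its items can be distributed across chunks key by key rather than being kept together as one indivisible value.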

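split_text(json_data: Dict[str, Any], convert_lists: bool = False, ensure_ascii: bool = True) -> List[str]

Splits JSON into a list of JSON formatted strings.

Parameters:
  • json_data (Dict[str, Any]) – The JSON object to split.

  • convert_lists (bool) – Whether to convert lists into dictionaries before splitting. Defaults to False.

  • ensure_ascii (bool) – Whether to escape non-ASCII characters in the generated JSON strings. Defaults to True.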

Return type:

List[str]
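The effect of ensure_ascii can be seen with the standard library alone. This sketch assumes the flag is forwarded to json.dumps when each chunk is serialized, as the parameter name suggests. The sample chunk is invented.

```python
import json

chunk = {"city": "Zürich"}  # invented chunk containing a non-ASCII character

escaped = json.dumps(chunk, ensure_ascii=True)
raw = json.dumps(chunk, ensure_ascii=False)

print(escaped)  # {"city": "Z\u00fcrich"}
print(raw)      # {"city": "Zürich"}
```

Escaped output is longer than the raw form, so if human-readable text matters downstream, ensure_ascii=False may be the better choice.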
