BoxLoader

class langchain_box.document_loaders.box.BoxLoader

Bases: BaseLoader, BaseModel

BoxLoader.

This class will help you load files from your Box instance. You must have a Box account. If you need one, you can sign up for a free developer account. You will also need a Box application created in the developer portal, where you can select your authorization type.

If you wish to use either of the Box AI options, you must be on an Enterprise Plus plan or above. The free developer account does not have access to Box AI.

In addition, using the Box AI API requires a few prerequisite steps:

  • Your administrator must enable the Box AI API.

  • You must enable the Manage AI scope in your app in the developer console.

  • Your administrator must install and enable your application.

Setup:

Install langchain-box and set the BOX_DEVELOPER_TOKEN environment variable.

pip install -U langchain-box
export BOX_DEVELOPER_TOKEN="your-api-key"
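A minimal sketch of reading the token at startup, assuming you want to fail early with a clear message when BOX_DEVELOPER_TOKEN is unset. The helper name get_box_developer_token is hypothetical and is not part of langchain-box.

```python
import os
from typing import Mapping, Optional


def get_box_developer_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the developer token from the environment, or raise if it is missing.

    Hypothetical helper for illustration; not part of langchain-box.
    """
    env = os.environ if env is None else env
    token = env.get("BOX_DEVELOPER_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "BOX_DEVELOPER_TOKEN is not set; export it before creating a BoxLoader."
        )
    return token
```

The returned string could then be passed explicitly as the box_developer_token argument.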

This loader returns Document objects built from text representations of files in Box. It skips any file that has no text representation available. You can provide either a List[str] containing Box file IDs or a str containing a Box folder ID. If you provide a folder ID, you can also enable recursive mode to get the full tree under that folder.

Note

A Box instance can contain petabytes of files, and a single folder can contain millions of files. Be intentional about which folders you index. We recommend never loading all files from folder 0 recursively; folder ID 0 is your root folder.
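One way to act on this note is a small guard that runs before the loader is created. This is only a sketch: check_folder_scope and ROOT_FOLDER_ID are hypothetical names, and langchain-box is not assumed to perform this check itself.

```python
from typing import Optional

ROOT_FOLDER_ID = "0"  # Folder ID 0 is the root folder, per the note above


def check_folder_scope(box_folder_id: Optional[str], recursive: bool = False) -> None:
    """Raise ValueError if the request would walk the whole Box instance.

    Hypothetical pre-check for illustration; not part of langchain-box.
    """
    if box_folder_id is None:
        return
    if box_folder_id.strip() == ROOT_FOLDER_ID and recursive:
        raise ValueError(
            "Refusing to load folder 0 recursively; choose a narrower folder."
        )
```

Calling check_folder_scope("260932470532", recursive=True) returns silently, while check_folder_scope("0", recursive=True) raises.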

Instantiate:

Initialization variables

Variable             Description                                                Type                             Default
-------------------  ---------------------------------------------------------  -------------------------------  -------
box_developer_token  Token to use for auth.                                     str                              None
box_auth             Configured BoxAuth object.                                 langchain_box.utilities.BoxAuth  None
box_file_ids         List of Box file IDs to load.                              List[str]                        None
box_folder_id        Box folder ID to load files from.                          str                              None
recursive            Whether to traverse subfolders when loading by folder ID.  bool                             False
character_limit      Max characters returned per document; -1 means no limit.   int                              -1
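The variables above can be assembled into keyword arguments before constructing the loader. The sketch below treats box_file_ids and box_folder_id as mutually exclusive, which is an assumption drawn from the "either ... or" wording; whether the loader enforces this itself is not stated here. The helper build_loader_kwargs is hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_loader_kwargs(
    box_file_ids: Optional[List[str]] = None,
    box_folder_id: Optional[str] = None,
    recursive: bool = False,
    character_limit: int = -1,
) -> Dict[str, Any]:
    """Build BoxLoader keyword arguments from exactly one document source.

    Hypothetical helper; the exclusivity rule is an assumption of this sketch.
    """
    if bool(box_file_ids) == bool(box_folder_id):
        raise ValueError("Provide exactly one of box_file_ids or box_folder_id.")

    kwargs: Dict[str, Any] = {"character_limit": character_limit}
    if box_file_ids:
        kwargs["box_file_ids"] = list(box_file_ids)
    else:
        # recursive only applies when loading by folder ID
        kwargs["box_folder_id"] = box_folder_id
        kwargs["recursive"] = recursive
    return kwargs
```

The result can be unpacked into the constructor, for example BoxLoader(**build_loader_kwargs(box_folder_id="260932470532")).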

Get files: this method requires the box_file_ids parameter, a List[str] containing the IDs of the files you wish to index.

from langchain_box.document_loaders import BoxLoader

box_file_ids = ["1514555423624", "1514553902288"]

loader = BoxLoader(
    box_file_ids=box_file_ids,
    character_limit=10000  # Optional. Defaults to no limit
)

Get files in a folder: this method requires the box_folder_id parameter, a str containing the ID of the folder you wish to index.

from langchain_box.document_loaders import BoxLoader

box_folder_id = "260932470532"

loader = BoxLoader(
    box_folder_id=box_folder_id,
    recursive=False  # Optional. return entire tree, defaults to False
)

Load:

docs = loader.load()
docs[0]

Document(metadata={'source': 'https://dl.boxcloud.com/api/2.0/
internal_files/1514555423624/versions/1663171610024/representations
/extracted_text/content/', 'title': 'Invoice-A5555_txt'},
page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine
Items:\n    - Gravitational Wave Detector Kit: $800\n    - Exoplanet
Terrarium: $120\nTotal: $920')

Lazy load:

docs = []
docs_lazy = loader.lazy_load()

for doc in docs_lazy:
    docs.append(doc)
print(docs[0].page_content[:100])
print(docs[0].metadata)
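Because lazy_load() returns an iterator, documents can be processed in fixed-size batches instead of being collected into one list. The sketch below shows one way to do that with the standard library; batched is a hypothetical helper, and a generator of strings stands in for loader.lazy_load() so the example runs without a Box account.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to batch_size items, consuming the input lazily."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for loader.lazy_load(); any iterator of documents behaves the same way.
fake_docs = (f"doc-{i}" for i in range(5))
batches = list(batched(fake_docs, 2))
```

With a real loader, the loop would read: for batch in batched(loader.lazy_load(), 50), followed by whatever indexing step you use.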

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

param box_auth: BoxAuth | None = None

Configured BoxAuth object.

param box_developer_token: str | None [Optional]

String containing the Box developer token generated in the developer console.

param box_file_ids: List[str] | None = None

List[str] containing Box file IDs.

param box_folder_id: str | None = None

String containing the Box folder ID to load files from.

param character_limit: int | None = -1

Maximum number of characters to return per document. The default, -1, applies no limit.

param recursive: bool | None = False

When loading by folder ID, whether to traverse subfolders and return their documents as well. Defaults to False.
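To make the character_limit semantics concrete, here is an illustrative sketch that assumes the cap keeps the leading characters of each document and that a negative value disables it. The loader's exact truncation behavior is not specified in this reference, and apply_character_limit is a hypothetical function, not the library's implementation.

```python
def apply_character_limit(text: str, character_limit: int = -1) -> str:
    """Keep at most character_limit leading characters of text.

    A negative limit (the default, -1) leaves the text unchanged.
    Illustrative only; not the langchain-box implementation.
    """
    if character_limit < 0:
        return text
    return text[:character_limit]
```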

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type:

AsyncIterator[Document]
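An AsyncIterator is consumed with async for inside a coroutine. The sketch below collects the first few items and stops early; fake_alazy_load is a stand-in for loader.alazy_load() that yields strings instead of Documents, and collect is a hypothetical helper.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_alazy_load() -> AsyncIterator[str]:
    """Stand-in for loader.alazy_load(); yields strings instead of Documents."""
    for i in range(3):
        await asyncio.sleep(0)  # give control back to the event loop
        yield f"doc-{i}"


async def collect(stream: AsyncIterator[str], limit: int) -> List[str]:
    """Collect at most limit items from an async iterator."""
    items: List[str] = []
    if limit <= 0:
        return items
    async for item in stream:
        items.append(item)
        if len(items) >= limit:
            break
    return items


first_two = asyncio.run(collect(fake_alazy_load(), limit=2))
```

With a real loader, the call would be asyncio.run(collect(loader.alazy_load(), limit=2)), with the type hints adjusted to Document.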

async aload() → list[Document]

Load data into Document objects.

Return type:

list[Document]

lazy_load() → Iterator[Document]

Load documents lazily. Accepts no arguments and returns an Iterator[Document].

Return type:

Iterator[Document]

load() → list[Document]

Load data into Document objects.

Return type:

list[Document]

load_and_split(text_splitter: TextSplitter | None = None) → list[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated.

Parameters:

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns:

List of Documents.

Return type:

list[Document]
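Since load_and_split is marked as deprecated, one option is to call load() and split the text in a separate step. The sketch below uses a plain fixed-size sliding window as a stand-in; it is not RecursiveCharacterTextSplitter, and split_text is a hypothetical helper shown only to illustrate what chunking with overlap produces.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> List[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters. Stand-in for a real
    text splitter; not part of langchain-box.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]
```

Applied to loaded documents, this would look like chunks = [c for doc in loader.load() for c in split_text(doc.page_content, 500, 50)].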