BoxLoader
- class langchain_box.document_loaders.box.BoxLoader
Bases: BaseLoader, BaseModel
BoxLoader.
This class will help you load files from your Box instance. You must have a Box account. If you need one, you can sign up for a free developer account. You will also need a Box application created in the developer portal, where you can select your authorization type.
If you wish to use either of the Box AI options, you must be on an Enterprise Plus plan or above. The free developer account does not have access to Box AI.
In addition, using the Box AI API requires a few prerequisite steps:
- Your administrator must enable the Box AI API.
- You must enable the Manage AI scope in your app in the developer console.
- Your administrator must install and enable your application.
- Setup:
Install langchain-box and set the environment variable BOX_DEVELOPER_TOKEN.
```shell
pip install -U langchain-box
export BOX_DEVELOPER_TOKEN="your-api-key"
```
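If you prefer to pass the token explicitly through the box_developer_token parameter instead of relying on the environment, a small guard can fail early when the variable is missing. This is a minimal sketch; the helper name require_box_token is hypothetical and is not part of langchain-box.

```python
import os


def require_box_token(var_name: str = "BOX_DEVELOPER_TOKEN") -> str:
    """Hypothetical helper (not part of langchain-box).

    Returns the Box developer token from the environment, or raises
    immediately so a missing token is caught before any Box call is made.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"{var_name} is not set; export it before creating a BoxLoader")
    return token
```

Usage, assuming the token has been exported as in the Setup step: BoxLoader(box_developer_token=require_box_token(), box_file_ids=["1514555423624"]).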
This loader returns Document objects built from text representations of files in Box. It skips any file that has no text representation available. You can provide either a List[str] containing Box file IDs, or a str containing a Box folder ID. If you provide a folder ID, you can also enable recursive mode to load the full tree under that folder.
Note
A Box instance can contain petabytes of files, and folders can contain millions of files. Be intentional when choosing which folders to index. We recommend never loading all files from folder 0 recursively; folder ID 0 is your root folder.
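To see why recursive loading from a high-level folder grows so quickly, the sketch below walks a small in-memory folder tree the way recursive mode is described: files directly in the folder, plus, only when recursive is True, files in every subfolder. The Folder structure and the collect_file_ids function are hypothetical stand-ins for the Box API, not the loader's actual implementation.

```python
from typing import Dict, List, TypedDict


class Folder(TypedDict):
    files: List[str]       # IDs of files directly in this folder
    subfolders: List[str]  # IDs of child folders


def collect_file_ids(tree: Dict[str, Folder], folder_id: str, recursive: bool = False) -> List[str]:
    """Return file IDs under folder_id, descending into subfolders only if recursive."""
    folder = tree[folder_id]
    file_ids = list(folder["files"])
    if recursive:
        for child_id in folder["subfolders"]:
            file_ids.extend(collect_file_ids(tree, child_id, recursive=True))
    return file_ids


# Hypothetical tree; "0" plays the role of the Box root folder.
tree: Dict[str, Folder] = {
    "0": {"files": ["f1"], "subfolders": ["A", "B"]},
    "A": {"files": ["f2", "f3"], "subfolders": ["C"]},
    "B": {"files": ["f4"], "subfolders": []},
    "C": {"files": ["f5"], "subfolders": []},
}

print(collect_file_ids(tree, "A"))                  # ['f2', 'f3']
print(collect_file_ids(tree, "A", recursive=True))  # ['f2', 'f3', 'f5']
print(collect_file_ids(tree, "0", recursive=True))  # every file in the tree
```

Starting from "0" with recursive=True touches every file in the instance, which is exactly the case the note warns against.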
- Instantiate:
| Variable | Description | Type | Default |
|---|---|---|---|
| box_developer_token | Token to use for auth. | str | None |
| box_auth | BoxAuth object used to configure authentication, e.g. for CCG. | langchain_box.utilities.BoxAuth | None |
| box_file_ids | List of Box file IDs to load. | List[str] | None |
| box_folder_id | Box folder ID to load files from. | str | None |
| recursive | When loading by folder ID, whether to also traverse subfolders. | bool | False |
| character_limit | Maximum number of characters returned per document; -1 means no limit. | int | -1 |
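The character_limit default of -1 means no limit. Assuming the limit is applied by keeping the first N characters of each document's text (an assumption; the exact truncation rule is not spelled out here), its effect can be sketched with a hypothetical helper:

```python
from typing import Optional


def apply_character_limit(text: str, character_limit: Optional[int] = -1) -> str:
    """Hypothetical helper: keep at most character_limit characters of text.

    Assumes a negative limit (the BoxLoader default of -1) or None means no limit,
    and that truncation keeps the beginning of the text.
    """
    if character_limit is None or character_limit < 0:
        return text
    return text[:character_limit]


invoice = "Vendor: AstroTech Solutions\nInvoice Number: A5555"
print(apply_character_limit(invoice))      # prints the full text unchanged
print(apply_character_limit(invoice, 27))  # Vendor: AstroTech Solutions
```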
Get files: this method requires you to pass the box_file_ids parameter. This is a List[str] containing the file IDs you wish to index.
```python
from langchain_box.document_loaders import BoxLoader

box_file_ids = ["1514555423624", "1514553902288"]

loader = BoxLoader(
    box_file_ids=box_file_ids,
    character_limit=10000,  # Optional. Defaults to no limit
)
```
Get files in a folder: this method requires you to pass the box_folder_id parameter. This is a str containing the folder ID you wish to index.
```python
from langchain_box.document_loaders import BoxLoader

box_folder_id = "260932470532"

loader = BoxLoader(
    box_folder_id=box_folder_id,
    recursive=False,  # Optional. If True, returns the entire tree; defaults to False
)
```
- Load:
```python
docs = loader.load()
docs[0]
```
```
Document(metadata={'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}, page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n - Gravitational Wave Detector Kit: $800\n - Exoplanet Terrarium: $120\nTotal: $920')
```
- Lazy load:
```python
docs = []
docs_lazy = loader.lazy_load()

for doc in docs_lazy:
    docs.append(doc)

print(docs[0].page_content[:100])
print(docs[0].metadata)
```
```
Document(metadata={'source': 'https://dl.boxcloud.com/api/2.0/internal_files/1514555423624/versions/1663171610024/representations/extracted_text/content/', 'title': 'Invoice-A5555_txt'}, page_content='Vendor: AstroTech Solutions\nInvoice Number: A5555\n\nLine Items:\n - Gravitational Wave Detector Kit: $800\n - Exoplanet Terrarium: $120\nTotal: $920')
```
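In langchain_core, BaseLoader.load() is a convenience that collects lazy_load() into a list, which is why the Load and Lazy load examples return the same documents. The sketch below mimics that pattern without calling Box: FakeDocument and FakeLoader are hypothetical stand-ins, and the async variant here simply wraps lazy_load() rather than reproducing how langchain_core schedules it.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, Dict, Iterator, List


@dataclass
class FakeDocument:
    """Stand-in for a Document: page_content plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class FakeLoader:
    """Hypothetical loader mimicking the lazy, eager, and async loading pattern."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def lazy_load(self) -> Iterator[FakeDocument]:
        # Yield documents one at a time instead of building a full list.
        for i, text in enumerate(self.texts):
            yield FakeDocument(page_content=text, metadata={"title": f"doc-{i}"})

    def load(self) -> List[FakeDocument]:
        # Eager loading: materialize the whole iterator into a list.
        return list(self.lazy_load())

    async def alazy_load(self) -> AsyncIterator[FakeDocument]:
        # Async variant: the same documents, consumed with "async for".
        for doc in self.lazy_load():
            yield doc


loader = FakeLoader(["first file text", "second file text"])
first = next(loader.lazy_load())  # only the first document is produced
print(first.metadata)             # {'title': 'doc-0'}


async def collect() -> List[FakeDocument]:
    return [doc async for doc in loader.alazy_load()]


print(len(asyncio.run(collect())))  # 2
```

Lazy loading is the safer choice for large folders, since each document can be processed and discarded before the next one is fetched.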
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- param box_developer_token: str | None [Optional]
String containing the Box Developer Token generated in the developer console.
- param box_file_ids: List[str] | None = None
List[str] containing Box file IDs.
- param box_folder_id: str | None = None
String containing the Box folder ID to load files from.
- param character_limit: int | None = -1
character_limit is an int that caps the number of characters to return per document. The default, -1, applies no limit.
- param recursive: bool | None = False
If getting files by folder ID, recursive is a bool that determines whether to traverse subfolders to return child documents. Default is False.
- async alazy_load() → AsyncIterator[Document]
A lazy loader for Documents.
- Return type:
AsyncIterator[Document]
- lazy_load() → Iterator[Document]
Load documents. Accepts no arguments. Returns Iterator[Document].
- Return type:
Iterator[Document]
- load_and_split(text_splitter: TextSplitter | None = None) → list[Document]
Load Documents and split into chunks. Chunks are returned as Documents.
Do not override this method. It should be considered to be deprecated!
- Parameters:
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns:
List of Documents.
- Return type:
list[Document]
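Because load_and_split should be considered deprecated, the usual pattern is to load first and split separately; with the real libraries that means passing loader.load() to a text splitter's split_documents method. The sketch below shows the same two-step idea with a deliberately simplified fixed-size character splitter. Doc and split_fixed are hypothetical, and this splitter does not break on separators the way RecursiveCharacterTextSplitter does.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Minimal stand-in for a Document: text plus metadata."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def split_fixed(docs: List[Doc], chunk_size: int = 40, overlap: int = 10) -> List[Doc]:
    """Split each document into overlapping fixed-size character chunks."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks: List[Doc] = []
    for doc in docs:
        text = doc.page_content
        for start in range(0, len(text), step):
            # Each chunk copies the parent's metadata (e.g. source, title).
            chunks.append(Doc(text[start:start + chunk_size], dict(doc.metadata)))
            if start + chunk_size >= len(text):
                break  # this chunk already reaches the end of the text
    return chunks


text = "".join(str(i % 10) for i in range(100))
chunks = split_fixed([Doc(text, {"title": "Invoice-A5555_txt"})], chunk_size=40, overlap=10)
print([len(c.page_content) for c in chunks])  # [40, 40, 40]
```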