SharePointLoader

class langchain_community.document_loaders.sharepoint.SharePointLoader

Bases: O365BaseLoader, BaseLoader

Load documents from a SharePoint document library.

param auth_with_token: bool = False: Whether to authenticate with a token. Defaults to False.

param chunk_size: int | str = 5242880: Number of bytes to retrieve from the server in each API call (the default, 5242880 bytes, is 5 MiB). Either an int or "auto".

param document_library_id: str [Required]: The ID of the SharePoint document library to load data from.

param folder_id: str | None = None: The ID of the folder to load data from.

param folder_path: str | None = None: The path to the folder to load data from.

param handlers: Dict[str, Any] | None = {}

Provide custom handlers for MimeTypeBasedParser.

Pass a dictionary mapping either file extensions (like “doc”, “pdf”, etc.) or MIME types (like “application/pdf”, “text/plain”, etc.) to parsers. Note that you must use either file extensions or MIME types exclusively and cannot mix them.

Do not include the leading dot for file extensions.

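Example using file extensions:

```python
from langchain_community.document_loaders.parsers.msword import MsWordParser
from langchain_community.document_loaders.parsers.pdf import PDFMinerParser
from langchain_community.document_loaders.parsers.txt import TextParser

handlers = {
    "doc": MsWordParser(),
    "pdf": PDFMinerParser(),
    "txt": TextParser(),
}
```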

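Example using MIME types:

```python
from langchain_community.document_loaders.parsers.msword import MsWordParser
from langchain_community.document_loaders.parsers.pdf import PDFMinerParser
from langchain_community.document_loaders.parsers.txt import TextParser

handlers = {
    "application/msword": MsWordParser(),
    "application/pdf": PDFMinerParser(),
    "text/plain": TextParser(),
}
```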
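The two rules above (keys are either all file extensions or all MIME types, and extensions carry no leading dot) can be checked before the mapping is passed to the loader. classify_handler_keys is a hypothetical helper, not part of LangChain, and it assumes that any key containing "/" is a MIME type while every other key is a file extension.

```python
from typing import Any, Dict


def classify_handler_keys(handlers: Dict[str, Any]) -> str:
    """Return "extension", "mime", or "empty" for a valid handlers mapping.

    Hypothetical helper: a key containing "/" (e.g. "application/pdf") is
    treated as a MIME type, any other key as a file extension.
    """
    kinds = set()
    for key in handlers:
        if "/" in key:
            kinds.add("mime")
        else:
            # File extensions must be given without the leading dot.
            if key.startswith("."):
                raise ValueError(f"Drop the leading dot from extension {key!r}")
            kinds.add("extension")
    if len(kinds) > 1:
        raise ValueError("Use either file extensions or MIME types, not both")
    return kinds.pop() if kinds else "empty"
```

Running the check on both example mappings above would return "extension" and "mime" respectively, while a mapping such as {"pdf": ..., "application/pdf": ...} is rejected.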

param load_auth: bool | None = False: Whether to load authorization identities.

param load_extended_metadata: bool | None = False: Whether to load extended metadata (size, owner, and full_path).

param modified_since: datetime | None = None: Only fetch documents modified since the given datetime. The datetime object must be timezone-aware.
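Because modified_since must be timezone-aware, a naive value such as datetime(2024, 1, 1) does not meet the requirement. The sketch below builds an aware value with the standard library only; modified_within is a hypothetical helper name, and its result would be passed as modified_since=modified_within(7).

```python
from datetime import datetime, timedelta, timezone


def is_timezone_aware(dt: datetime) -> bool:
    """Return True if `dt` is aware in the sense used by the datetime module."""
    # An aware datetime has tzinfo set and a utcoffset() that is not None.
    return dt.tzinfo is not None and dt.utcoffset() is not None


def modified_within(days: int) -> datetime:
    """Return an aware UTC datetime `days` days before now (hypothetical helper)."""
    # datetime.now(timezone.utc) is already aware; subtracting a timedelta keeps tzinfo.
    return datetime.now(timezone.utc) - timedelta(days=days)
```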

param object_ids: List[str] | None = None: The IDs of the objects to load data from.

param recursive: bool = False: Whether to recursively load subfolders.

param settings: _O365Settings [Optional]: Settings for the Office365 API client.

param token_path: Path = Path.home() / ".credentials" / "o365_token.txt": The path to the token used to make API calls.
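A loader is typically configured from a handful of the parameters above. The sketch below collects them in a hypothetical helper, sharepoint_loader_kwargs, and defers the network work to load_from_sharepoint, which imports SharePointLoader lazily. It assumes langchain_community and the o365 package are installed and that client credentials reach the settings object, assumed here to come from the O365_CLIENT_ID and O365_CLIENT_SECRET environment variables; the library ID and folder path are placeholders.

```python
from typing import Any, Dict, List, Optional


def sharepoint_loader_kwargs(
    document_library_id: str,
    folder_path: Optional[str] = None,
    recursive: bool = False,
    load_extended_metadata: bool = False,
) -> Dict[str, Any]:
    """Collect SharePointLoader keyword arguments (hypothetical helper)."""
    kwargs: Dict[str, Any] = {
        "document_library_id": document_library_id,  # required parameter
        "recursive": recursive,  # also descend into subfolders
        "load_extended_metadata": load_extended_metadata,  # size, owner, full_path
    }
    if folder_path is not None:
        # Only include folder_path when one is given; its default is None.
        kwargs["folder_path"] = folder_path
    return kwargs


def load_from_sharepoint(**kwargs: Any) -> List[Any]:
    """Build the loader and load every matching document.

    Needs langchain_community, the o365 package, valid O365 credentials,
    and network access; nothing here runs until the function is called.
    """
    # Lazy import so this module can be imported without langchain_community.
    from langchain_community.document_loaders.sharepoint import SharePointLoader

    loader = SharePointLoader(**kwargs)
    return loader.load()
```

For example, load_from_sharepoint(**sharepoint_loader_kwargs("<library-id>", folder_path="Reports", recursive=True)) would load that folder and its subfolders. Keeping the arguments in a plain dict makes them easy to log or check in tests without touching the network.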

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Yields: the documents.
Return type: AsyncIterator[Document]
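Because alazy_load() returns an async iterator, documents can be consumed one at a time inside async code. The sketch below is a hypothetical helper, not part of LangChain, that stops after a fixed number of items; it only relies on the stream supporting async for, so it works with alazy_load() or any other async iterator.

```python
import asyncio
from typing import Any, AsyncIterator, List


async def take_async(stream: AsyncIterator[Any], limit: int) -> List[Any]:
    """Collect at most `limit` items from an async iterator such as alazy_load()."""
    items: List[Any] = []
    if limit <= 0:
        return items
    async for item in stream:
        items.append(item)
        if len(items) >= limit:
            break  # stop pulling further documents once the limit is reached
    return items


def take(stream: AsyncIterator[Any], limit: int) -> List[Any]:
    """Synchronous entry point for scripts that have no running event loop."""
    return asyncio.run(take_async(stream, limit))
```

With a configured loader, take(loader.alazy_load(), 10) would return at most ten documents. Inside code that already runs an event loop, await take_async(...) directly, since asyncio.run() cannot be called from a running loop.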

async aload() → list[Document]

Load data into Document objects.

Returns: the documents.
Return type: list[Document]

authorized_identities(file_id: str) → List

Retrieve the access identities (user and group email addresses) for a given file.

Returns:

A list of group names (email addresses) that have access to the file.

Return type:

List

Parameters:

file_id (str) – The ID of the file.
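Since authorized_identities() returns email addresses, a simple access check reduces to a membership test. The helper below is hypothetical and compares addresses case-insensitively; it only looks for the address itself in the list and does not expand group membership, so a user who has access only through a group will not match.

```python
from typing import Iterable


def has_access(identities: Iterable[str], email: str) -> bool:
    """Return True if `email` appears in a list like authorized_identities() returns.

    Hypothetical helper: surrounding whitespace is ignored and the
    comparison is case-insensitive.
    """
    target = email.strip().casefold()
    return any(identity.strip().casefold() == target for identity in identities)
```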

get_extended_metadata(file_id: str) → Dict

Retrieve extended metadata for a file in SharePoint. The following fields are currently supported:

- size: size of the source file.
- owner: display name of the owner of the source file.
- full_path: human-readable path of the source file.

Returns:

A dictionary containing the extended metadata of the file, including size, owner, and full path.

Return type:

dict

Parameters:

file_id (str) – The ID of the file.
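The dictionary returned by get_extended_metadata() can be turned into a compact summary line, for example for logging. This is a hypothetical helper that assumes the keys are named size, owner, and full_path as listed above and that size is a byte count; the unit is an assumption, not something this page states.

```python
from typing import Any, Dict


def summarize_metadata(meta: Dict[str, Any]) -> str:
    """Render size/owner/full_path as one line (hypothetical helper).

    Assumes `size` is a byte count; missing fields are shown as "?".
    """
    size = meta.get("size")
    size_text = f"{size / 1024:.1f} KiB" if isinstance(size, (int, float)) else "?"
    owner = meta.get("owner", "?")
    path = meta.get("full_path", "?")
    return f"{path} ({size_text}, owner: {owner})"
```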

lazy_load() → Iterator[Document]

Load documents lazily. Use this when working at a large scale.

Yields: Document – A document object representing the parsed blob.
Return type: Iterator[Document]
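Since lazy_load() yields documents one by one, a large library can be processed in fixed-size batches without holding every document in memory. The batched helper below is a generic sketch built on itertools.islice; on Python 3.12 and later, itertools.batched does the same job but yields tuples instead of lists.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items without materializing the whole stream."""
    if size < 1:
        # Raised on first iteration, since this is a generator function.
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    # islice pulls at most `size` items; an empty list means the source is exhausted.
    while batch := list(islice(iterator, size)):
        yield batch
```

For example, for batch in batched(loader.lazy_load(), 50): ... would hand each group of up to 50 documents to the next processing step.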

load() → list[Document]

Load data into Document objects.

Returns: the documents.
Return type: list[Document]

load_and_split(text_splitter: TextSplitter | None = None) → list[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated.

Parameters: text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Raises: ImportError – If langchain-text-splitters is not installed and no text_splitter is provided.
Returns: List of Documents.
Return type: list[Document]
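Because load_and_split() should be considered deprecated, the same result can be produced explicitly by loading first and splitting afterwards. The sketch below does exactly that for a given splitter; default_splitter builds a RecursiveCharacterTextSplitter from langchain-text-splitters, and its chunk sizes are arbitrary illustrative values, not the library defaults.

```python
from typing import Any, List


def load_then_split(loader: Any, splitter: Any) -> List[Any]:
    """Load every document, then split the results into chunk Documents.

    `loader` needs a load() method (e.g. SharePointLoader) and `splitter`
    a split_documents() method (e.g. RecursiveCharacterTextSplitter).
    """
    documents = loader.load()
    return splitter.split_documents(documents)


def default_splitter(chunk_size: int = 1000, chunk_overlap: int = 100) -> Any:
    """Build a RecursiveCharacterTextSplitter; requires langchain-text-splitters."""
    # Lazy import, mirroring how load_and_split() only needs the package
    # when no text_splitter is passed in.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    return RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
```

With a configured loader, load_then_split(loader, default_splitter()) returns the chunked Documents without calling the deprecated method.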
