S3DirectoryLoader

class langchain_community.document_loaders.s3_directory.S3DirectoryLoader(bucket: str, prefix: str = '', *, region_name: str | None = None, api_version: str | None = None, use_ssl: bool | None = True, verify: str | bool | None = None, endpoint_url: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, aws_session_token: str | None = None, boto_config: botocore.client.Config | None = None)

Load documents from an Amazon S3 directory.

Initialize with a bucket name and an optional key prefix.

Parameters:
  • bucket (str) – The name of the S3 bucket.

  • prefix (str) – The prefix of the S3 key. Defaults to an empty string.

  • region_name (Optional[str]) – The name of the region associated with the client. A client is associated with a single region.

  • api_version (Optional[str]) – The API version to use. By default, botocore will use the latest API version when creating a client. You only need to specify this parameter if you want to use a previous API version of the client.

  • use_ssl (Optional[bool]) – Whether to use SSL. By default, SSL is used. Note that not all services support non-SSL connections.

  • verify (Union[str, bool, None]) –

    Whether to verify SSL certificates. By default SSL certificates are verified. You can provide the following values:

    • False - do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem - A filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.

  • endpoint_url (Optional[str]) – The complete URL to use for the constructed client. Normally, botocore will automatically construct the appropriate URL to use when communicating with a service. You can specify a complete URL (including the “http/https” scheme) to override this behavior. If this value is provided, then use_ssl is ignored.

  • aws_access_key_id (Optional[str]) – The access key to use when creating the client. This is entirely optional, and if not provided, the credentials configured for the session will automatically be used. You only need to provide this argument if you want to override the credentials used for this specific client.

  • aws_secret_access_key (Optional[str]) – The secret key to use when creating the client. Same semantics as aws_access_key_id above.

  • aws_session_token (Optional[str]) – The session token to use when creating the client. Same semantics as aws_access_key_id above.

  • boto_config (Optional[botocore.client.Config]) – Advanced boto3 client configuration options. If a value is specified in the client config, its value will take precedence over environment variables and configuration values, but not over a value passed explicitly to the method. If a default config object is set on the session, the config object used when creating the client will be the result of calling merge() on the default config with the config provided to this call.
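As a minimal sketch of how the parameters above fit together, the block below builds a loader for one bucket and prefix and tallies the loaded documents by origin. The bucket name, the prefix, and the helper names `build_loader` and `count_by_source` are hypothetical, and the `source` metadata key is an assumption about what the loaded documents carry, which is why the code falls back to a placeholder when it is missing.

```python
from collections import Counter
from typing import Any, Iterable


def build_loader(bucket: str, prefix: str = "", **client_kwargs: Any):
    """Create an S3DirectoryLoader (needs langchain-community and boto3)."""
    # Imported lazily so the helper below stays usable without the package.
    from langchain_community.document_loaders import S3DirectoryLoader

    # Extra keyword arguments (region_name, endpoint_url, ...) pass through.
    return S3DirectoryLoader(bucket, prefix=prefix, **client_kwargs)


def count_by_source(docs: Iterable[Any]) -> Counter:
    """Count documents per `source` metadata value (an assumed key)."""
    return Counter(doc.metadata.get("source", "<unknown>") for doc in docs)


def main() -> None:
    # Hypothetical bucket and prefix; with no explicit keys, the credentials
    # configured for the default session are used.
    loader = build_loader("my-example-bucket", prefix="reports/2024/")
    docs = loader.load()
    print(count_by_source(docs))
```

Calling `main()` requires network access and valid AWS credentials, so it is deliberately not invoked here.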

Methods

__init__(bucket[, prefix, region_name, ...])

Initialize with a bucket name and an optional key prefix.

alazy_load()

A lazy loader for Documents.

aload()

Load data into Document objects.

lazy_load()

A lazy loader for Documents.

load()

Load documents.

load_and_split([text_splitter])

Load Documents and split into chunks.

__init__(bucket: str, prefix: str = '', *, region_name: str | None = None, api_version: str | None = None, use_ssl: bool | None = True, verify: str | bool | None = None, endpoint_url: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, aws_session_token: str | None = None, boto_config: botocore.client.Config | None = None)

Initialize with a bucket name and an optional key prefix.

Parameters:
  • bucket (str) – The name of the S3 bucket.

  • prefix (str) – The prefix of the S3 key. Defaults to an empty string.

  • region_name (Optional[str]) – The name of the region associated with the client. A client is associated with a single region.

  • api_version (Optional[str]) – The API version to use. By default, botocore will use the latest API version when creating a client. You only need to specify this parameter if you want to use a previous API version of the client.

  • use_ssl (Optional[bool]) – Whether to use SSL. By default, SSL is used. Note that not all services support non-SSL connections.

  • verify (Union[str, bool, None]) –

    Whether to verify SSL certificates. By default SSL certificates are verified. You can provide the following values:

    • False - do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem - A filename of the CA cert bundle to use. You can specify this argument if you want to use a different CA cert bundle than the one used by botocore.

  • endpoint_url (Optional[str]) – The complete URL to use for the constructed client. Normally, botocore will automatically construct the appropriate URL to use when communicating with a service. You can specify a complete URL (including the “http/https” scheme) to override this behavior. If this value is provided, then use_ssl is ignored.

  • aws_access_key_id (Optional[str]) – The access key to use when creating the client. This is entirely optional, and if not provided, the credentials configured for the session will automatically be used. You only need to provide this argument if you want to override the credentials used for this specific client.

  • aws_secret_access_key (Optional[str]) – The secret key to use when creating the client. Same semantics as aws_access_key_id above.

  • aws_session_token (Optional[str]) – The session token to use when creating the client. Same semantics as aws_access_key_id above.

  • boto_config (Optional[botocore.client.Config]) – Advanced boto3 client configuration options. If a value is specified in the client config, its value will take precedence over environment variables and configuration values, but not over a value passed explicitly to the method. If a default config object is set on the session, the config object used when creating the client will be the result of calling merge() on the default config with the config provided to this call.
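The interaction between endpoint_url, verify, and boto_config described above can be sketched as follows. The helper names `client_options` and `build_custom_loader` are hypothetical, and the retry and timeout values are illustrative only. The sketch treats "skip verification" and "custom CA bundle" as mutually exclusive because both map onto the single verify argument.

```python
from typing import Any, Dict, Optional


def client_options(
    endpoint_url: Optional[str] = None,
    ca_bundle: Optional[str] = None,
    skip_tls_verification: bool = False,
) -> Dict[str, Any]:
    """Translate a few settings into S3DirectoryLoader keyword arguments."""
    options: Dict[str, Any] = {}
    if endpoint_url is not None:
        # A complete URL, scheme included; use_ssl is ignored when this is set.
        options["endpoint_url"] = endpoint_url
    if skip_tls_verification:
        # verify=False keeps SSL on but skips certificate validation.
        options["verify"] = False
    elif ca_bundle is not None:
        # A path to a CA bundle replaces the one botocore uses by default.
        options["verify"] = ca_bundle
    return options


def build_custom_loader(bucket: str, prefix: str = "", **options: Any):
    """Create a loader with an explicit botocore Config (illustrative values)."""
    # Lazy imports: both packages must be installed to call this function.
    from botocore.config import Config
    from langchain_community.document_loaders import S3DirectoryLoader

    config = Config(
        retries={"max_attempts": 5, "mode": "standard"},
        connect_timeout=5,
        read_timeout=30,
    )
    return S3DirectoryLoader(bucket, prefix=prefix, boto_config=config, **options)
```

For an S3-compatible service at a hypothetical local address, a call could look like `build_custom_loader("my-example-bucket", **client_options(endpoint_url="http://localhost:9000"))`.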

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type:

AsyncIterator[Document]

async aload() → List[Document]

Load data into Document objects.

Return type:

List[Document]
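One possible use of aload() is loading several prefixes of the same bucket from a single coroutine. The sketch below is not part of the library: `load_prefixes`, `default_factory`, and the injectable `make_loader` parameter are hypothetical names, and how much the calls actually overlap depends on how the inherited async methods schedule the blocking S3 work.

```python
import asyncio
from typing import Any, Callable, Dict, List, Sequence


def default_factory(bucket: str, prefix: str):
    """Build a real loader (needs langchain-community and boto3)."""
    from langchain_community.document_loaders import S3DirectoryLoader

    return S3DirectoryLoader(bucket, prefix=prefix)


async def load_prefixes(
    bucket: str,
    prefixes: Sequence[str],
    make_loader: Callable[[str, str], Any] = default_factory,
) -> Dict[str, List[Any]]:
    """Await aload() for one loader per prefix and map prefix -> documents."""
    loaders = [make_loader(bucket, prefix) for prefix in prefixes]
    # gather() preserves argument order, so results line up with prefixes.
    results = await asyncio.gather(*(loader.aload() for loader in loaders))
    return dict(zip(prefixes, results))
```

From synchronous code this could be driven with `asyncio.run(load_prefixes("my-example-bucket", ["a/", "b/"]))`, where the bucket and prefixes are placeholders.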

lazy_load() → Iterator[Document]

A lazy loader for Documents.

Return type:

Iterator[Document]
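Because lazy_load() returns an iterator, it can be consumed incrementally, for example to look at only the first few documents. The helper `first_documents` below is a hypothetical name and only illustrates the iteration pattern. Whether S3 objects are actually fetched on demand depends on the implementation this class inherits, so it should not be read as a guarantee of lower memory use.

```python
from itertools import islice
from typing import Any, List


def first_documents(loader: Any, limit: int) -> List[Any]:
    """Return at most `limit` documents from loader.lazy_load()."""
    # islice stops pulling from the iterator once `limit` items were taken.
    return list(islice(loader.lazy_load(), limit))
```

Any object with a `lazy_load()` method works here, including an S3DirectoryLoader instance.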

load() → List[Document]

Load documents.

Return type:

List[Document]

load_and_split(text_splitter: TextSplitter | None = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated.

Parameters:

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns:

List of Documents.

Return type:

List[Document]
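Since load_and_split() is described above as effectively deprecated, one alternative is to call load() and apply a text splitter explicitly. The sketch below assumes the `langchain_text_splitters` package is installed. `default_splitter` and `load_chunks` are hypothetical helper names, and the chunk sizes are illustrative.

```python
from typing import Any, List


def default_splitter(chunk_size: int = 1000, chunk_overlap: int = 100):
    """Build a RecursiveCharacterTextSplitter with illustrative sizes."""
    # Lazy import so load_chunks() can be used with any compatible splitter.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    return RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )


def load_chunks(loader: Any, splitter: Any) -> List[Any]:
    """Load all documents, then split them into chunk Documents."""
    docs = loader.load()
    # The documented shortcut would be loader.load_and_split(text_splitter=splitter).
    return splitter.split_documents(docs)
```

Keeping the two steps separate makes it easy to swap in a different splitter or to inspect the unsplit documents first.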

Examples using S3DirectoryLoader