S3FileLoader

class langchain_community.document_loaders.s3_file.S3FileLoader(bucket: str, key: str, *, region_name: str | None = None, api_version: str | None = None, use_ssl: bool | None = True, verify: str | bool | None = None, endpoint_url: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, aws_session_token: str | None = None, boto_config: botocore.client.Config | None = None, mode: str = 'single', post_processors: List[Callable] | None = None, **unstructured_kwargs: Any)

Load a file from Amazon S3.

Initialize with bucket and key name.

Parameters:
  • bucket (str) – The name of the S3 bucket.

  • key (str) – The key of the S3 object.

  • region_name (Optional[str]) – The name of the region associated with the client. A client is associated with a single region.

  • api_version (Optional[str]) – The API version to use. By default, botocore will use the latest API version when creating a client. You only need to specify this parameter if you want to use a previous API version of the client.

  • use_ssl (Optional[bool]) – Whether or not to use SSL. By default, SSL is used. Note that not all services support non-ssl connections.

  • verify (Union[str, bool, None]) –

    Whether or not to verify SSL certificates. By default SSL certificates are verified. You can provide the following values:

    • False - do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem - The filename of the CA cert bundle to use. Specify this argument if you want to use a different CA cert bundle than the one used by botocore.

  • endpoint_url (Optional[str]) – The complete URL to use for the constructed client. Normally, botocore will automatically construct the appropriate URL to use when communicating with a service. You can specify a complete URL (including the "http/https" scheme) to override this behavior. If this value is provided, then use_ssl is ignored.

  • aws_access_key_id (Optional[str]) – The access key to use when creating the client. This is entirely optional, and if not provided, the credentials configured for the session will automatically be used. You only need to provide this argument if you want to override the credentials used for this specific client.

  • aws_secret_access_key (Optional[str]) – The secret key to use when creating the client. Same semantics as aws_access_key_id above.

  • aws_session_token (Optional[str]) – The session token to use when creating the client. Same semantics as aws_access_key_id above.

  • boto_config (botocore.client.Config) – Advanced boto3 client configuration options. If a value is specified in the client config, its value will take precedence over environment variables and configuration values, but not over a value passed explicitly to the method. If a default config object is set on the session, the config object used when creating the client will be the result of calling merge() on the default config with the config provided to this call.

  • mode (str) – Mode in which to read the file. Valid options are: single, paged and elements.

  • post_processors (Optional[List[Callable]]) – Post processing functions to be applied to extracted elements.

  • **unstructured_kwargs (Any) –

    Arbitrary additional keyword arguments to pass when calling partition.
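
As a minimal sketch of how the parameters above fit together, the snippet below collects constructor arguments and builds a loader. The helper names `s3_loader_kwargs` and `load_s3_documents` are hypothetical, not part of the library. The import is deferred so the helpers can be defined without the optional dependencies (`langchain_community`, `boto3`, `unstructured`) installed.

```python
from typing import Any, Dict, Optional

# Modes listed in the parameter description above.
VALID_MODES = ("single", "paged", "elements")


def s3_loader_kwargs(
    bucket: str,
    key: str,
    *,
    region_name: Optional[str] = None,
    endpoint_url: Optional[str] = None,
    mode: str = "single",
) -> Dict[str, Any]:
    """Collect S3FileLoader keyword arguments, omitting options left unset."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    kwargs: Dict[str, Any] = {"bucket": bucket, "key": key, "mode": mode}
    if region_name is not None:
        kwargs["region_name"] = region_name
    if endpoint_url is not None:
        kwargs["endpoint_url"] = endpoint_url
    return kwargs


def load_s3_documents(**kwargs: Any) -> list:
    """Build the loader and return its documents (needs S3 access to run)."""
    # Deferred import: requires langchain_community, boto3 and unstructured.
    from langchain_community.document_loaders import S3FileLoader

    loader = S3FileLoader(**kwargs)
    return loader.load()
```

A call such as `load_s3_documents(**s3_loader_kwargs("example-bucket", "reports/q1.pdf", region_name="us-east-1"))` would then fetch and parse the object. The bucket and key shown here are placeholders.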

Methods

__init__(bucket, key, *[, region_name, ...])

Initialize with bucket and key name.

alazy_load()

A lazy loader for Documents.

aload()

Load data into Document objects.

lazy_load()

Load file.

load()

Load data into Document objects.

load_and_split([text_splitter])

Load Documents and split into chunks.

__init__(bucket: str, key: str, *, region_name: str | None = None, api_version: str | None = None, use_ssl: bool | None = True, verify: str | bool | None = None, endpoint_url: str | None = None, aws_access_key_id: str | None = None, aws_secret_access_key: str | None = None, aws_session_token: str | None = None, boto_config: botocore.client.Config | None = None, mode: str = 'single', post_processors: List[Callable] | None = None, **unstructured_kwargs: Any)

Initialize with bucket and key name.

Parameters:
  • bucket (str) – The name of the S3 bucket.

  • key (str) – The key of the S3 object.

  • region_name (Optional[str]) – The name of the region associated with the client. A client is associated with a single region.

  • api_version (Optional[str]) – The API version to use. By default, botocore will use the latest API version when creating a client. You only need to specify this parameter if you want to use a previous API version of the client.

  • use_ssl (Optional[bool]) – Whether or not to use SSL. By default, SSL is used. Note that not all services support non-ssl connections.

  • verify (Union[str, bool, None]) –

    Whether or not to verify SSL certificates. By default SSL certificates are verified. You can provide the following values:

    • False - do not validate SSL certificates. SSL will still be used (unless use_ssl is False), but SSL certificates will not be verified.

    • path/to/cert/bundle.pem - The filename of the CA cert bundle to use. Specify this argument if you want to use a different CA cert bundle than the one used by botocore.

  • endpoint_url (Optional[str]) – The complete URL to use for the constructed client. Normally, botocore will automatically construct the appropriate URL to use when communicating with a service. You can specify a complete URL (including the "http/https" scheme) to override this behavior. If this value is provided, then use_ssl is ignored.

  • aws_access_key_id (Optional[str]) – The access key to use when creating the client. This is entirely optional, and if not provided, the credentials configured for the session will automatically be used. You only need to provide this argument if you want to override the credentials used for this specific client.

  • aws_secret_access_key (Optional[str]) – The secret key to use when creating the client. Same semantics as aws_access_key_id above.

  • aws_session_token (Optional[str]) – The session token to use when creating the client. Same semantics as aws_access_key_id above.

  • boto_config (botocore.client.Config) – Advanced boto3 client configuration options. If a value is specified in the client config, its value will take precedence over environment variables and configuration values, but not over a value passed explicitly to the method. If a default config object is set on the session, the config object used when creating the client will be the result of calling merge() on the default config with the config provided to this call.

  • mode (str) – Mode in which to read the file. Valid options are: single, paged and elements.

  • post_processors (Optional[List[Callable]]) – Post processing functions to be applied to extracted elements.

  • **unstructured_kwargs (Any) –

    Arbitrary additional keyword arguments to pass when calling partition.
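
To illustrate `post_processors`, here is a sketch of a custom text-cleaning function passed to the constructor. It assumes each post processor receives an element's text as a string and returns the cleaned string, which is how the cleaning functions in `unstructured` behave. The names `collapse_whitespace` and `build_cleaning_loader` are hypothetical.

```python
import re
from typing import Any


def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with one space and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def build_cleaning_loader(bucket: str, key: str) -> Any:
    """Create an S3FileLoader that applies the cleaner to extracted elements."""
    # Deferred import: requires langchain_community, boto3 and unstructured.
    from langchain_community.document_loaders import S3FileLoader

    return S3FileLoader(
        bucket,
        key,
        mode="elements",
        post_processors=[collapse_whitespace],
    )
```

Keeping the cleaner as a plain string-to-string function makes it easy to check in isolation before wiring it into a loader.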

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type:

AsyncIterator[Document]

async aload() → list[Document]

Load data into Document objects.

Return type:

list[Document]
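
The two async methods above differ in how results are delivered: `alazy_load()` yields documents one at a time, while `aload()` returns them all in a list. The sketch below consumes an async iterator with an early stop. `collect_documents` and `FakeLoader` are hypothetical names, and the fake loader only stands in for a real `S3FileLoader` so the helper can run without S3 access.

```python
import asyncio
from typing import Any, AsyncIterator, List


async def collect_documents(loader: Any, limit: int) -> List[Any]:
    """Consume at most `limit` documents from loader.alazy_load()."""
    collected: List[Any] = []
    if limit <= 0:
        return collected
    async for document in loader.alazy_load():
        collected.append(document)
        if len(collected) >= limit:
            break
    return collected


class FakeLoader:
    """Stand-in exposing alazy_load(); used only to exercise the helper."""

    async def alazy_load(self) -> AsyncIterator[str]:
        for name in ("doc-1", "doc-2", "doc-3"):
            yield name


def run_demo() -> List[Any]:
    """Drive the coroutine from synchronous code."""
    return asyncio.run(collect_documents(FakeLoader(), 2))
```

With a real loader, the same helper would be awaited from existing async code instead of going through `asyncio.run`.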

lazy_load() → Iterator[Document]

Load file.

Return type:

Iterator[Document]

load() → list[Document]

Load data into Document objects.

Return type:

list[Document]
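
Because `lazy_load()` returns an iterator and `load()` returns a list, a caller that needs only the first few documents can stop early. A minimal sketch follows; `first_documents` and `FakeLoader` are hypothetical, and the fake loader assumes `load()` simply materializes `lazy_load()`.

```python
from itertools import islice
from typing import Any, Iterator, List


def first_documents(loader: Any, limit: int) -> List[Any]:
    """Return at most `limit` documents without consuming the rest."""
    return list(islice(loader.lazy_load(), limit))


class FakeLoader:
    """Stand-in with the same two method names as the real loader."""

    def lazy_load(self) -> Iterator[str]:
        yield from ("doc-1", "doc-2", "doc-3")

    def load(self) -> List[str]:
        # Assumption: load() is equivalent to exhausting lazy_load().
        return list(self.lazy_load())
```

Early stopping is likely to matter most in `paged` or `elements` mode, where one file can produce many documents.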

load_and_split(text_splitter: TextSplitter | None = None) → list[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated.

Parameters:

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns:

List of Documents.

Return type:

list[Document]
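
Since this method is flagged as deprecated, one option is to load first and call the splitter directly. The sketch below assumes the splitter exposes `split_documents`, as LangChain `TextSplitter` instances do. `load_then_split` and both stub classes are hypothetical and exist only so the helper can run without S3 access.

```python
from typing import Any, List


def load_then_split(loader: Any, text_splitter: Any) -> List[Any]:
    """Load all documents, then split them into chunks explicitly."""
    documents = loader.load()
    return text_splitter.split_documents(documents)


class FakeLoader:
    """Stand-in loader returning plain strings instead of Documents."""

    def load(self) -> List[str]:
        return ["alpha beta", "gamma"]


class FakeSplitter:
    """Stand-in splitter that breaks each item on whitespace."""

    def split_documents(self, documents: List[str]) -> List[str]:
        return [piece for document in documents for piece in document.split()]
```

With real objects, `text_splitter` could be a `RecursiveCharacterTextSplitter`, the default named in the parameter description above.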
