GlueCatalogLoader

class langchain_community.document_loaders.glue_catalog.GlueCatalogLoader(database: str, *, session: Session | None = None, profile_name: str | None = None, table_filter: List[str] | None = None)

Load table schemas from AWS Glue.

This loader fetches the schema of each table within a specified AWS Glue database. The schema details include column names and their data types, similar to pandas dtype representation.

AWS credentials are automatically loaded using boto3, following the standard AWS method: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html

If a specific AWS profile is required, it can be specified and will be used to establish the session.

Initialize Glue database loader.

Parameters:
  • database (str) – The name of the Glue database from which to load table schemas.

  • session (Optional[Session]) – Optional. A boto3 Session object. If not provided, a new session will be created.

  • profile_name (Optional[str]) – Optional. The name of the AWS profile to use for credentials.

  • table_filter (Optional[List[str]]) – Optional. List of table names to fetch schemas for, fetching all if None.
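A minimal usage sketch of the constructor and `load()`, assuming `langchain-community` and `boto3` are installed and that AWS credentials can be resolved. The wrapper function `load_glue_schemas` is a hypothetical helper introduced here for illustration; it is not part of the library.

```python
from typing import List, Optional


def load_glue_schemas(
    database: str,
    profile_name: Optional[str] = None,
    table_filter: Optional[List[str]] = None,
):
    """Return one Document per table schema in the given Glue database."""
    # Imported lazily so this module can be imported without the dependency.
    from langchain_community.document_loaders.glue_catalog import GlueCatalogLoader

    loader = GlueCatalogLoader(
        database,
        profile_name=profile_name,  # None: rely on boto3's default credential lookup
        table_filter=table_filter,  # None: fetch the schema of every table
    )
    return loader.load()
```

With placeholder names, a call such as `load_glue_schemas("my_database", profile_name="analytics", table_filter=["orders", "customers"])` would return the schemas of only those two tables, provided the database and profile exist in your account.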

Methods

  • __init__(database, *[, session, ...]) – Initialize Glue database loader.

  • alazy_load() – A lazy loader for Documents.

  • aload() – Load data into Document objects.

  • lazy_load() – Lazily load table schemas as Document objects.

  • load() – Load data into Document objects.

  • load_and_split([text_splitter]) – Load Documents and split into chunks.

__init__(database: str, *, session: Session | None = None, profile_name: str | None = None, table_filter: List[str] | None = None)

Initialize Glue database loader.

Parameters:
  • database (str) – The name of the Glue database from which to load table schemas.

  • session (Optional[Session]) – Optional. A boto3 Session object. If not provided, a new session will be created.

  • profile_name (Optional[str]) – Optional. The name of the AWS profile to use for credentials.

  • table_filter (Optional[List[str]]) – Optional. List of table names to fetch schemas for, fetching all if None.
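When the session needs explicit configuration, a pre-built boto3 Session can be passed through the `session` parameter instead of `profile_name`. The sketch below assumes `boto3` and `langchain-community` are installed; the helper name `loader_with_session` and the choice to configure only the region are illustrative assumptions.

```python
from typing import List, Optional


def loader_with_session(
    database: str,
    region_name: str,
    table_filter: Optional[List[str]] = None,
):
    """Build a GlueCatalogLoader around an explicitly configured boto3 session."""
    # Lazy imports: both packages are third-party and must be installed.
    import boto3
    from langchain_community.document_loaders.glue_catalog import GlueCatalogLoader

    # region_name is a standard boto3.Session argument; the value is the caller's choice.
    session = boto3.Session(region_name=region_name)
    return GlueCatalogLoader(database, session=session, table_filter=table_filter)
```

Creating the session yourself is useful when the same session is shared with other AWS clients in the application.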

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type:

AsyncIterator[Document]

async aload() → List[Document]

Load data into Document objects.

Return type:

List[Document]
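The two async methods follow the usual pattern: an async iterator that yields items one at a time, and a coroutine that collects them into a list. The sketch below illustrates that pattern with hypothetical stand-in functions (`alazy_schemas`, `aload_schemas`) and placeholder strings instead of Document objects; it does not call AWS and is not the library's implementation.

```python
import asyncio
from typing import AsyncIterator, List


async def alazy_schemas(tables: List[str]) -> AsyncIterator[str]:
    """Stand-in for alazy_load(): yield one item per table."""
    for table in tables:
        await asyncio.sleep(0)  # hand control back to the event loop between items
        yield f"schema of {table}"


async def aload_schemas(tables: List[str]) -> List[str]:
    """Stand-in for aload(): collect the async iterator into a list."""
    return [doc async for doc in alazy_schemas(tables)]


result = asyncio.run(aload_schemas(["orders", "customers"]))
```

With a real loader, the equivalent calls are `async for doc in loader.alazy_load()` and `docs = await loader.aload()`.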

lazy_load() → Iterator[Document]

Lazily load table schemas as Document objects.

Yields:

Document objects, each representing the schema of a table.

Return type:

Iterator[Document]
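The difference between lazy and eager loading can be illustrated without AWS access. The sketch below uses a hypothetical stand-in class, `FakeSchemaLoader`, with hard-coded schemas and plain strings in place of Document objects. It only mirrors the `lazy_load()` / `load()` contract described on this page, and the `column: type` text layout is an assumption.

```python
from typing import Dict, Iterator, List


class FakeSchemaLoader:
    """Hypothetical stand-in that mimics the lazy_load()/load() contract."""

    def __init__(self, schemas: Dict[str, Dict[str, str]]):
        self.schemas = schemas
        self.fetched: List[str] = []  # records which tables were "fetched"

    def lazy_load(self) -> Iterator[str]:
        # Each schema is produced only when the consumer asks for the next item.
        for table, columns in self.schemas.items():
            self.fetched.append(table)
            yield "\n".join(f"{name}: {dtype}" for name, dtype in columns.items())

    def load(self) -> List[str]:
        # Eager variant: exhaust the lazy iterator into a list.
        return list(self.lazy_load())


schemas = {
    "orders": {"order_id": "bigint", "total": "double"},
    "customers": {"customer_id": "bigint", "name": "string"},
}

lazy_loader = FakeSchemaLoader(schemas)
first = next(lazy_loader.lazy_load())  # only the first table is touched

eager_loader = FakeSchemaLoader(schemas)
everything = eager_loader.load()  # every table is touched
```

For databases with many tables, iterating over `lazy_load()` lets processing start before every schema has been fetched, whereas `load()` waits for the complete list.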

load() → List[Document]

Load data into Document objects.

Return type:

List[Document]

load_and_split(text_splitter: TextSplitter | None = None) → List[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated.

Parameters:

text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.

Returns:

List of Documents.

Return type:

List[Document]

Examples using GlueCatalogLoader