AzureAIDocumentIntelligenceLoader
- class langchain_community.document_loaders.doc_intelligence.AzureAIDocumentIntelligenceLoader(api_endpoint: str, api_key: str, file_path: str | None = None, url_path: str | None = None, bytes_source: bytes | None = None, api_version: str | None = None, api_model: str = 'prebuilt-layout', mode: str = 'markdown', *, analysis_features: List[str] | None = None)
Load a PDF with Azure Document Intelligence.
Initialize the object for file processing with Azure Document Intelligence (formerly Form Recognizer).
This constructor initializes an AzureAIDocumentIntelligenceParser object, which parses files using the Azure Document Intelligence API. The load method generates Documents whose content representation is determined by the mode parameter.
Parameters:
- api_endpoint: str
The API endpoint to use for DocumentIntelligenceClient construction.
- api_key: str
The API key to use for DocumentIntelligenceClient construction.
- file_path: Optional[str]
The path to the file to load. One of file_path, url_path, or bytes_source must be specified.
- url_path: Optional[str]
The URL of the file to load. One of file_path, url_path, or bytes_source must be specified.
- bytes_source: Optional[bytes]
The raw bytes of the file to load. One of file_path, url_path, or bytes_source must be specified.
- api_version: Optional[str]
The API version for DocumentIntelligenceClient. Set to None to use the default value from the azure-ai-documentintelligence package.
- api_model: str
Unique document model name. Default value is “prebuilt-layout”. Note that overriding this default value may result in unsupported behavior.
- mode: str
The type of content representation of the generated Documents. Use either “single”, “page”, or “markdown”. Default value is “markdown”.
- analysis_features: Optional[List[str]]
List of optional analysis features. Each feature should be passed as a str that conforms to the DocumentAnalysisFeature enum in the azure-ai-documentintelligence package. Default value is None.
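The parameter descriptions above imply two checks that can be made before any request is sent: a source must be given, and mode must be one of the three listed values. The following is a minimal sketch of such a pre-check, not part of the library. The names build_loader_kwargs and make_loader are hypothetical, and requiring exactly one source is an assumption that is stricter than the documented wording.

```python
from typing import Any, Dict, Optional

# Modes listed in the parameter documentation above.
VALID_MODES = ("single", "page", "markdown")


def build_loader_kwargs(
    api_endpoint: str,
    api_key: str,
    *,
    file_path: Optional[str] = None,
    url_path: Optional[str] = None,
    bytes_source: Optional[bytes] = None,
    mode: str = "markdown",
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments before building the loader."""
    sources = {
        "file_path": file_path,
        "url_path": url_path,
        "bytes_source": bytes_source,
    }
    given = {name: value for name, value in sources.items() if value is not None}
    # Assumption: exactly one source is required here, which is stricter
    # than the documented "must be specified" wording.
    if len(given) != 1:
        raise ValueError(
            f"expected exactly one of {sorted(sources)}, got {sorted(given)}"
        )
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {"api_endpoint": api_endpoint, "api_key": api_key, "mode": mode, **given}


def make_loader(**kwargs: Any):
    """Construct the real loader; needs langchain-community and the Azure SDK."""
    # Imported lazily so the validation helper works without the dependency.
    from langchain_community.document_loaders import (
        AzureAIDocumentIntelligenceLoader,
    )

    return AzureAIDocumentIntelligenceLoader(**kwargs)
```

With valid arguments, the returned dictionary can be passed on as make_loader(**kwargs); invalid combinations fail early with a ValueError instead of at request time.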
Examples:
>>> obj = AzureAIDocumentIntelligenceLoader(
...     file_path="path/to/file",
...     api_endpoint="https://endpoint.azure.com",
...     api_key="APIKEY",
...     api_version="2023-10-31-preview",
...     api_model="prebuilt-layout",
...     mode="markdown"
... )
Methods
- __init__(api_endpoint, api_key[, file_path, ...]): Initialize the object for file processing with Azure Document Intelligence (formerly Form Recognizer).
- alazy_load(): A lazy loader for Documents.
- aload(): Load data into Document objects.
- lazy_load(): Lazy load the document as pages.
- load(): Load data into Document objects.
- load_and_split([text_splitter]): Load Documents and split into chunks.
- __init__(api_endpoint: str, api_key: str, file_path: str | None = None, url_path: str | None = None, bytes_source: bytes | None = None, api_version: str | None = None, api_model: str = 'prebuilt-layout', mode: str = 'markdown', *, analysis_features: List[str] | None = None) -> None
Initialize the object for file processing with Azure Document Intelligence (formerly Form Recognizer).
- Parameters:
api_endpoint (str)
api_key (str)
file_path (str | None)
url_path (str | None)
bytes_source (bytes | None)
api_version (str | None)
api_model (str)
mode (str)
analysis_features (List[str] | None)
- Return type:
None
- async alazy_load() -> AsyncIterator[Document]
A lazy loader for Documents.
- Return type:
AsyncIterator[Document]
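Since alazy_load() returns an AsyncIterator, it is consumed with `async for` inside a coroutine. The sketch below illustrates that pattern only. FakeAsyncLoader is a hypothetical stand-in that yields strings rather than Documents, so the example runs without Azure credentials.

```python
import asyncio
from typing import AsyncIterator, List


class FakeAsyncLoader:
    """Hypothetical stand-in exposing the same alazy_load() shape."""

    def __init__(self, pages: List[str]) -> None:
        self.pages = pages

    async def alazy_load(self) -> AsyncIterator[str]:
        for page in self.pages:
            await asyncio.sleep(0)  # hand control back to the event loop
            yield page


async def collect(loader) -> list:
    """Gather everything alazy_load() yields using `async for`."""
    return [doc async for doc in loader.alazy_load()]
```

A real loader instance could be passed to collect() in the same way, for example via asyncio.run(collect(loader)).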
- lazy_load() -> Iterator[Document]
Lazy load the document as pages.
- Return type:
Iterator[Document]
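Because lazy_load() returns an iterator, a caller can stop early instead of materializing every Document. The sketch below shows that consumption pattern with itertools.islice. FakeLoader is a hypothetical stand-in that yields strings, and whether the underlying Azure analysis is itself incremental is not stated here.

```python
from itertools import islice
from typing import Iterator, List


class FakeLoader:
    """Hypothetical stand-in for the real loader; yields strings, not Documents."""

    def __init__(self, pages: List[str]) -> None:
        self.pages = pages
        self.produced = 0  # counts how many items were actually generated

    def lazy_load(self) -> Iterator[str]:
        for page in self.pages:
            self.produced += 1
            yield page


def take(loader, n: int) -> list:
    """Pull at most n items from loader.lazy_load() without draining it."""
    return list(islice(loader.lazy_load(), n))
```

The produced counter on the stand-in makes it visible that items beyond the requested n are never generated.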
- load_and_split(text_splitter: TextSplitter | None = None) -> list[Document]
Load Documents and split into chunks. Chunks are returned as Documents.
Do not override this method; it should be considered deprecated.
- Parameters:
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns:
List of Documents.
- Return type:
list[Document]
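Given that this method is to be treated as deprecated, one alternative is to call load() and pass the result to a splitter's split_documents() method explicitly. The sketch below shows that two-step shape. StubLoader and StubSplitter are hypothetical stand-ins working on strings, and the assumption that this mirrors what load_and_split does internally is not confirmed by the text above.

```python
from typing import List


def load_then_split(loader, splitter) -> list:
    """Load everything, then hand the documents to the splitter."""
    return splitter.split_documents(loader.load())


class StubLoader:
    """Hypothetical loader stand-in returning plain strings."""

    def __init__(self, texts: List[str]) -> None:
        self.texts = texts

    def load(self) -> List[str]:
        return list(self.texts)


class StubSplitter:
    """Hypothetical splitter stand-in: fixed-size character chunks."""

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size

    def split_documents(self, documents: List[str]) -> List[str]:
        chunks: List[str] = []
        for text in documents:
            # Slice each text into consecutive pieces of chunk_size characters.
            for start in range(0, len(text), self.chunk_size):
                chunks.append(text[start:start + self.chunk_size])
        return chunks
```

With the real classes, loader would be an AzureAIDocumentIntelligenceLoader and splitter any TextSplitter instance.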