AzureOpenAIWhisperParser#
- class langchain_community.document_loaders.parsers.audio.AzureOpenAIWhisperParser(*, api_key: str | None = None, azure_endpoint: str | None = None, api_version: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, language: str | None = None, prompt: str | None = None, response_format: Literal['json', 'text', 'srt', 'verbose_json', 'vtt'] | None = None, temperature: float | None = None, deployment_name: str, max_retries: int = 3)[source]#
Transcribe and parse audio files using Azure OpenAI Whisper.
This parser integrates with the Azure OpenAI Whisper model to transcribe audio files. It differs from the standard OpenAI Whisper parser in that it requires an Azure endpoint and Azure credentials. The parser is limited to files under 25 MB.
Note: This parser uses the Azure OpenAI API, providing integration with the Azure ecosystem and making it suitable for workflows involving other Azure services.
For files larger than 25 MB, consider using Azure AI Speech batch transcription: https://learn.microsoft.com/azure/ai-services/speech-service/batch-transcription-create?pivots=rest-api#use-a-whisper-model
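Because the 25 MB limit applies per request, a lightweight pre-check can route oversized files to batch transcription before a parse is attempted. The helper below is an illustrative sketch, not part of the library: the 25 MB figure comes from the limit stated above, and the function name is hypothetical.

```python
import os

# Whisper's per-request upload limit (25 MB), per the note above.
MAX_WHISPER_BYTES = 25 * 1024 * 1024


def fits_whisper_limit(path: str) -> bool:
    """Return True if the audio file is small enough for the Whisper endpoint."""
    return os.path.getsize(path) <= MAX_WHISPER_BYTES
```

Files failing this check would be candidates for the Azure AI Speech batch transcription API linked above.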
- Setup:
Follow the instructions here to deploy Azure Whisper: https://learn.microsoft.com/azure/ai-services/openai/whisper-quickstart?tabs=command-line%2Cpython-new&pivots=programming-language-python
Install langchain and langchain-community, and set the following environment variables:

pip install -U langchain langchain-community
export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
export OPENAI_API_VERSION="your-api-version"
- Example Usage:
from langchain_core.documents.base import Blob
from langchain_community.document_loaders.parsers.audio import (
    AzureOpenAIWhisperParser,
)

whisper_parser = AzureOpenAIWhisperParser(
    deployment_name="your-whisper-deployment",
    api_version="2024-06-01",
    api_key="your-api-key",
    # other params...
)

audio_blob = Blob(path="your-audio-file-path")
response = whisper_parser.lazy_parse(audio_blob)

for document in response:
    print(document.page_content)
- Integration with Other Loaders:
The AzureOpenAIWhisperParser can be used with video/audio loaders and GenericLoader to automate retrieval and parsing.
- YoutubeAudioLoader Example:
from langchain_community.document_loaders.blob_loaders import (
    YoutubeAudioLoader,
)
from langchain_community.document_loaders.generic import GenericLoader

# Must be a list
youtube_url = ["https://your-youtube-url"]
save_dir = "directory-to-download-videos"

loader = GenericLoader(
    YoutubeAudioLoader(youtube_url, save_dir),
    AzureOpenAIWhisperParser(deployment_name="your-deployment-name"),
)
docs = loader.load()
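The azure_ad_token_provider parameter accepts any zero-argument callable that returns a bearer-token string; in production this would typically come from azure-identity (e.g. get_bearer_token_provider with DefaultAzureCredential). The stub below is only a sketch of the expected callable shape; the factory name and token value are hypothetical placeholders, not real credentials.

```python
from typing import Callable


def make_static_token_provider(token: str) -> Callable[[], str]:
    """Illustrative stand-in for a real Azure AD token provider."""

    def provider() -> str:
        # A real provider would fetch/refresh a token from Azure AD here.
        return token

    return provider


# Would be passed as:
#   AzureOpenAIWhisperParser(..., azure_ad_token_provider=provider)
provider = make_static_token_provider("placeholder-token")
```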
Initialize the AzureOpenAIWhisperParser.
- Parameters:
api_key (Optional[str]) – Azure OpenAI API key. If not provided, defaults to the AZURE_OPENAI_API_KEY environment variable.
azure_endpoint (Optional[str]) – Azure OpenAI service endpoint. Defaults to AZURE_OPENAI_ENDPOINT environment variable if not set.
api_version (Optional[str]) – API version to use, defaults to the OPENAI_API_VERSION environment variable.
azure_ad_token_provider (Union[Callable[[], str], None]) – A callable that returns an Azure Active Directory token, used for token-based authentication instead of an API key (if applicable).
language (Optional[str]) – Language in which the request should be processed.
prompt (Optional[str]) – Custom instructions or prompt for the Whisper model.
response_format (Union[str, None]) – The desired output format. Options: “json”, “text”, “srt”, “verbose_json”, “vtt”.
temperature (Optional[float]) – Controls the randomness of the model’s output.
deployment_name (str) – The deployment name of the Whisper model.
max_retries (int) – Maximum number of retries for failed API requests.
- Raises:
ImportError – If the required package openai is not installed.
Methods

__init__(*[, api_key, azure_endpoint, ...]) – Initialize the AzureOpenAIWhisperParser.
lazy_parse(blob) – Lazily parse the provided audio blob for transcription.
parse(blob) – Eagerly parse the blob into a document or documents.
- __init__(*, api_key: str | None = None, azure_endpoint: str | None = None, api_version: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, language: str | None = None, prompt: str | None = None, response_format: Literal['json', 'text', 'srt', 'verbose_json', 'vtt'] | None = None, temperature: float | None = None, deployment_name: str, max_retries: int = 3)[source]#
Initialize the AzureOpenAIWhisperParser.
- Parameters:
api_key (Optional[str]) – Azure OpenAI API key. If not provided, defaults to the AZURE_OPENAI_API_KEY environment variable.
azure_endpoint (Optional[str]) – Azure OpenAI service endpoint. Defaults to AZURE_OPENAI_ENDPOINT environment variable if not set.
api_version (Optional[str]) – API version to use, defaults to the OPENAI_API_VERSION environment variable.
azure_ad_token_provider (Union[Callable[[], str], None]) – A callable that returns an Azure Active Directory token, used for token-based authentication instead of an API key (if applicable).
language (Optional[str]) – Language in which the request should be processed.
prompt (Optional[str]) – Custom instructions or prompt for the Whisper model.
response_format (Union[str, None]) – The desired output format. Options: “json”, “text”, “srt”, “verbose_json”, “vtt”.
temperature (Optional[float]) – Controls the randomness of the model’s output.
deployment_name (str) – The deployment name of the Whisper model.
max_retries (int) – Maximum number of retries for failed API requests.
- Raises:
ImportError – If the required package openai is not installed.
- lazy_parse(blob: Blob) → Iterator[Document][source]#
Lazily parse the provided audio blob for transcription.
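In LangChain's blob-parser interface, eager parsing is defined in terms of lazy parsing: parse simply materializes the iterator that lazy_parse yields. The toy class below sketches that relationship without the real Azure client; the class name and string "documents" are illustrative stand-ins.

```python
from typing import Iterator, List


class ToyParser:
    """Toy stand-in showing how parse() builds on lazy_parse()."""

    def lazy_parse(self, chunks: List[str]) -> Iterator[str]:
        # Yield one "document" per chunk, on demand.
        for chunk in chunks:
            yield chunk.upper()

    def parse(self, chunks: List[str]) -> List[str]:
        # Eager parsing just materializes the lazy iterator.
        return list(self.lazy_parse(chunks))
```

For large batches of audio files, preferring lazy_parse keeps memory bounded, since documents are produced one at a time.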