AzureOpenAIWhisperParser#

class langchain_community.document_loaders.parsers.audio.AzureOpenAIWhisperParser(*, api_key: str | None = None, azure_endpoint: str | None = None, api_version: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, language: str | None = None, prompt: str | None = None, response_format: Literal['json', 'text', 'srt', 'verbose_json', 'vtt'] | None = None, temperature: float | None = None, deployment_name: str, max_retries: int = 3)[source]#

Transcribe and parse audio files using Azure OpenAI Whisper.

This parser integrates with the Azure OpenAI Whisper model to transcribe audio files. It differs from the standard OpenAI Whisper parser in that it requires an Azure endpoint and Azure credentials. The parser is limited to files under 25 MB.

Note: This parser uses the Azure OpenAI API, providing integration with the Azure ecosystem and making it suitable for workflows involving other Azure services.

For files larger than 25 MB, consider using Azure AI Speech batch transcription: https://learn.microsoft.com/azure/ai-services/speech-service/batch-transcription-create?pivots=rest-api#use-a-whisper-model
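Because the 25 MB limit applies per file, checking the size before invoking the parser avoids a failed upload. A minimal sketch using only the standard library (the helper name and constant are illustrative, not part of the parser's API):

```python
import os

# Azure OpenAI Whisper rejects uploads of 25 MB or more.
MAX_WHISPER_BYTES = 25 * 1024 * 1024


def fits_whisper_limit(path: str) -> bool:
    """Return True if the file at `path` is small enough to transcribe."""
    return os.path.getsize(path) < MAX_WHISPER_BYTES
```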

Setup:
  1. Follow the instructions here to deploy Azure Whisper: https://learn.microsoft.com/azure/ai-services/openai/whisper-quickstart?tabs=command-line%2Cpython-new&pivots=programming-language-python

  2. Install langchain and set the following environment variables:

pip install -U langchain langchain-community openai

export AZURE_OPENAI_API_KEY="your-api-key"
export AZURE_OPENAI_ENDPOINT="https://your-endpoint.openai.azure.com/"
export OPENAI_API_VERSION="your-api-version"
Example Usage:
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers.audio import (
    AzureOpenAIWhisperParser,
)

whisper_parser = AzureOpenAIWhisperParser(
    deployment_name="your-whisper-deployment",
    api_version="2024-06-01",
    api_key="your-api-key",
    # other params...
)

audio_blob = Blob(path="your-audio-file-path")
response = whisper_parser.lazy_parse(audio_blob)

for document in response:
    print(document.page_content)
Integration with Other Loaders:

The AzureOpenAIWhisperParser can be used with video/audio loaders and GenericLoader to automate retrieval and parsing.

YoutubeAudioLoader Example:
from langchain_community.document_loaders.blob_loaders import (
    YoutubeAudioLoader,
)
from langchain_community.document_loaders.generic import GenericLoader

# Must be a list
youtube_url = ["https://your-youtube-url"]
save_dir = "directory-to-download-videos"

loader = GenericLoader(
    YoutubeAudioLoader(youtube_url, save_dir),
    AzureOpenAIWhisperParser(deployment_name="your-deployment-name")
)

docs = loader.load()

Initialize the AzureOpenAIWhisperParser.

Parameters:
  • api_key (Optional[str]) – Azure OpenAI API key. If not provided, defaults to the AZURE_OPENAI_API_KEY environment variable.

  • azure_endpoint (Optional[str]) – Azure OpenAI service endpoint. Defaults to AZURE_OPENAI_ENDPOINT environment variable if not set.

  • api_version (Optional[str]) – API version to use, defaults to the OPENAI_API_VERSION environment variable.

  • azure_ad_token_provider (Optional[Callable[[], str]]) – Callable that returns an Azure Active Directory access token, used for token-based authentication in place of an API key.

  • language (Optional[str]) – Language in which the request should be processed.

  • prompt (Optional[str]) – Custom instructions or prompt for the Whisper model.

  • response_format (Union[str, None]) – The desired output format. Options: “json”, “text”, “srt”, “verbose_json”, “vtt”.

  • temperature (Optional[float]) – Controls the randomness of the model’s output.

  • deployment_name (str) – The deployment name of the Whisper model.

  • max_retries (int) – Maximum number of retries for failed API requests.

Raises:

ImportError – If the required package openai is not installed.
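The ImportError comes from a standard guarded-import pattern at construction time. A generic sketch of that pattern (the helper name is hypothetical):

```python
import importlib


def require_package(name: str, install_hint: str):
    """Import `name`, or raise ImportError with an install hint."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(
            f"Package '{name}' is required. Install it with: {install_hint}"
        ) from exc
```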

Methods

__init__(*[, api_key, azure_endpoint, ...])

Initialize the AzureOpenAIWhisperParser.

lazy_parse(blob)

Lazily parse the provided audio blob for transcription.

parse(blob)

Eagerly parse the blob into a document or documents.

__init__(*, api_key: str | None = None, azure_endpoint: str | None = None, api_version: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, language: str | None = None, prompt: str | None = None, response_format: Literal['json', 'text', 'srt', 'verbose_json', 'vtt'] | None = None, temperature: float | None = None, deployment_name: str, max_retries: int = 3)[source]#

Initialize the AzureOpenAIWhisperParser.

Parameters:
  • api_key (Optional[str]) – Azure OpenAI API key. If not provided, defaults to the AZURE_OPENAI_API_KEY environment variable.

  • azure_endpoint (Optional[str]) – Azure OpenAI service endpoint. Defaults to AZURE_OPENAI_ENDPOINT environment variable if not set.

  • api_version (Optional[str]) – API version to use, defaults to the OPENAI_API_VERSION environment variable.

  • azure_ad_token_provider (Optional[Callable[[], str]]) – Callable that returns an Azure Active Directory access token, used for token-based authentication in place of an API key.

  • language (Optional[str]) – Language in which the request should be processed.

  • prompt (Optional[str]) – Custom instructions or prompt for the Whisper model.

  • response_format (Union[str, None]) – The desired output format. Options: “json”, “text”, “srt”, “verbose_json”, “vtt”.

  • temperature (Optional[float]) – Controls the randomness of the model’s output.

  • deployment_name (str) – The deployment name of the Whisper model.

  • max_retries (int) – Maximum number of retries for failed API requests.

Raises:

ImportError – If the required package openai is not installed.

lazy_parse(blob: Blob) Iterator[Document][source]#

Lazily parse the provided audio blob for transcription.

Parameters:

blob (Blob) – The audio file in Blob format to be transcribed.

Yields:

Document – Parsed transcription from the audio file.

Raises:

Exception – If an error occurs during transcription.

Return type:

Iterator[Document]

parse(blob: Blob) list[Document]#

Eagerly parse the blob into a document or documents.

This is a convenience method for interactive development environments.

Production applications should favor the lazy_parse method instead.

Subclasses should generally not override this parse method.

Parameters:

blob (Blob) – Blob instance

Returns:

List of documents

Return type:

list[Document]
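The lazy-versus-eager distinction above can be illustrated with a plain generator (standard library only; this is not the parser's implementation):

```python
def lazy_items(n: int):
    """Yield one item at a time; memory stays constant, like lazy_parse yielding Documents."""
    for i in range(n):
        yield i


def eager_items(n: int) -> list[int]:
    """Build the full list up front, like parse; convenient interactively, costly at scale."""
    return list(lazy_items(n))
```

Iterating the generator produces items on demand, which is why production code that streams many transcriptions should prefer lazy_parse.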