AssemblyAIAudioTranscriptLoader

class langchain_community.document_loaders.assemblyai.AssemblyAIAudioTranscriptLoader(file_path: str | Path, *, transcript_format: TranscriptFormat = TranscriptFormat.TEXT, config: assemblyai.TranscriptionConfig | None = None, api_key: str | None = None)

Load AssemblyAI audio transcripts.

The loader uses the AssemblyAI API to transcribe an audio file and loads the transcribed text into one or more Documents, depending on the specified transcript format.

To use it, install the assemblyai Python package and set the ASSEMBLYAI_API_KEY environment variable to your API key. Alternatively, pass the API key as the api_key argument.

Audio files can be specified via a URL or a local file path.
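A minimal usage sketch, assuming the langchain-community and assemblyai packages are installed. The helper names resolve_api_key and transcribe are hypothetical and only illustrate the two ways of supplying the key described above. The loader import is deferred so that the key helper can be used on its own.

```python
import os
from pathlib import Path
from typing import Optional, Union


def resolve_api_key(explicit: Optional[str] = None) -> Optional[str]:
    """Return the explicit key if given, else the ASSEMBLYAI_API_KEY variable."""
    return explicit or os.environ.get("ASSEMBLYAI_API_KEY")


def transcribe(file_path: Union[str, Path], api_key: Optional[str] = None):
    """Transcribe one audio file (URL or local path) and return its Documents."""
    # Imported lazily; requires `pip install langchain-community assemblyai`.
    from langchain_community.document_loaders import AssemblyAIAudioTranscriptLoader

    loader = AssemblyAIAudioTranscriptLoader(
        file_path=file_path,
        api_key=resolve_api_key(api_key),
    )
    # load() blocks until the transcription has finished.
    return loader.load()
```

With these helpers, transcribe("https://example.com/audio.mp3") (a placeholder URL) or transcribe(Path("./meeting.mp3")) (a placeholder path) would return a list of Documents, provided a valid key is available.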

Initializes the AssemblyAI AudioTranscriptLoader.

Parameters:

file_path (Union[str, Path]) – A URL or a local file path.
transcript_format (TranscriptFormat) – Transcript format to use. See class TranscriptFormat for more info.
config (Optional[assemblyai.TranscriptionConfig]) – Transcription options and features. If None is given, the Transcriber’s default configuration will be used.
api_key (Optional[str]) – AssemblyAI API key.

Methods

`__init__`(file_path, *[, transcript_format, ...])	Initializes the AssemblyAI AudioTranscriptLoader.
`alazy_load`()	A lazy loader for Documents.
`aload`()	Load data into Document objects.
`lazy_load`()	Transcribes the audio file and loads the transcript into documents.
`load`()	Load data into Document objects.
`load_and_split`([text_splitter])	Load Documents and split into chunks.

__init__(file_path: str | Path, *, transcript_format: TranscriptFormat = TranscriptFormat.TEXT, config: assemblyai.TranscriptionConfig | None = None, api_key: str | None = None)

Initializes the AssemblyAI AudioTranscriptLoader.

Parameters:

file_path (Union[str, Path]) – A URL or a local file path.
transcript_format (TranscriptFormat) – Transcript format to use. See class TranscriptFormat for more info.
config (Optional[assemblyai.TranscriptionConfig]) – Transcription options and features. If None is given, the Transcriber’s default configuration will be used.
api_key (Optional[str]) – AssemblyAI API key.
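The parameters above can be combined as in the following sketch. The function build_loader and its per_sentence flag are hypothetical; the choice of speaker_labels=True is only an example of a transcription option. It assumes that TranscriptFormat.SENTENCES produces one Document per sentence while TranscriptFormat.TEXT produces a single Document, as described for the TranscriptFormat class.

```python
from pathlib import Path
from typing import Union


def build_loader(file_path: Union[str, Path], per_sentence: bool = False):
    """Build a loader with an explicit transcript format and configuration."""
    # Imported lazily; requires `pip install langchain-community assemblyai`.
    import assemblyai as aai
    from langchain_community.document_loaders import AssemblyAIAudioTranscriptLoader
    from langchain_community.document_loaders.assemblyai import TranscriptFormat

    # Assumption: SENTENCES splits the transcript into one Document per sentence.
    fmt = TranscriptFormat.SENTENCES if per_sentence else TranscriptFormat.TEXT

    # Example option only; omit `config` to use the Transcriber's defaults.
    config = aai.TranscriptionConfig(speaker_labels=True)

    return AssemblyAIAudioTranscriptLoader(
        file_path,
        transcript_format=fmt,
        config=config,
    )
```

Calling build_loader("./interview.mp3", per_sentence=True).load() (with a placeholder path) would then return the transcript as a list of Documents.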

async alazy_load() → AsyncIterator[Document]

A lazy loader for Documents.

Return type: AsyncIterator[Document]
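A sketch of consuming the async iterator from a coroutine. The helper collect_documents and its limit parameter are hypothetical; the loader argument is assumed to be any object exposing alazy_load(), such as an AssemblyAIAudioTranscriptLoader instance.

```python
import asyncio
from typing import Any, List


async def collect_documents(loader: Any, limit: int = 10) -> List[Any]:
    """Collect at most `limit` Documents from loader.alazy_load()."""
    docs: List[Any] = []
    if limit <= 0:
        return docs
    async for doc in loader.alazy_load():
        docs.append(doc)
        if len(docs) >= limit:
            break  # stop early once enough Documents were collected
    return docs
```

From synchronous code this could be driven with asyncio.run(collect_documents(loader)).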

async aload() → list[Document]

Load data into Document objects.

Return type: list[Document]

lazy_load() → Iterator[Document]

Transcribes the audio file and loads the transcript into documents.

It uses the AssemblyAI API to transcribe the audio file and blocks until the transcription is finished.

Return type: Iterator[Document]
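A sketch of reading only the first few Documents from the iterator. The helper preview is hypothetical and accepts any object with a lazy_load() method. Because lazy_load() blocks until the transcription is finished, the first Document only becomes available once the whole file has been transcribed; slicing limits how many Documents are materialized, not how much audio is processed.

```python
from itertools import islice
from typing import Any, List


def preview(loader: Any, n: int = 3) -> List[Any]:
    """Return at most the first `n` Documents yielded by loader.lazy_load()."""
    return list(islice(loader.lazy_load(), n))
```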

load() → list[Document]

Load data into Document objects.

Return type: list[Document]

load_and_split(text_splitter: TextSplitter | None = None) → list[Document]

Load Documents and split into chunks. Chunks are returned as Documents.

Do not override this method. It should be considered deprecated.

Parameters: text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
Returns: List of Documents.
Return type: list[Document]
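Since this method should be considered deprecated, the same result can be sketched by loading and splitting explicitly. The helper load_chunks is hypothetical; it assumes the splitter exposes split_documents(), as TextSplitter instances do, and that the langchain-text-splitters package is installed when no splitter is passed.

```python
from typing import Any, List, Optional


def load_chunks(loader: Any, text_splitter: Optional[Any] = None) -> List[Any]:
    """Load Documents, then split them with the given or a default splitter."""
    if text_splitter is None:
        # Imported lazily; requires `pip install langchain-text-splitters`.
        from langchain_text_splitters import RecursiveCharacterTextSplitter

        text_splitter = RecursiveCharacterTextSplitter()
    # Load all Documents first, then hand them to the splitter as a batch.
    return text_splitter.split_documents(loader.load())
```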
