FasterWhisperParser#
- class langchain_community.document_loaders.parsers.audio.FasterWhisperParser(*, device: str | None = 'cuda', model_size: str | None = None)[source]#
Transcribe and parse audio files with faster-whisper.
faster-whisper is a reimplementation of OpenAI’s Whisper model using CTranslate2, which is up to 4 times faster than openai/whisper for the same accuracy while using less memory. The efficiency can be further improved with 8-bit quantization on both CPU and GPU.
It can automatically detect and transcribe the following 14 languages: en, zh, fr, de, ja, ko, ru, es, th, it, pt, vi, ar, tr.
The GitHub repository for faster-whisper is SYSTRAN/faster-whisper.
- Example: Load a YouTube video and transcribe the video speech into a document.
```python
from langchain.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers.audio import FasterWhisperParser
from langchain.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

url = "https://www.youtube.com/watch?v=your_video"
save_dir = "your_dir/"
loader = GenericLoader(
    YoutubeAudioLoader([url], save_dir),
    FasterWhisperParser(),
)
docs = loader.load()
```
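The same parser also works with local audio files. A minimal sketch, assuming `faster-whisper` and `langchain-community` are installed, using `FileSystemBlobLoader` to feed local files into the parser; the `audio/` directory and the `*.mp3` glob are placeholders:

```python
from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.blob_loaders import FileSystemBlobLoader
from langchain_community.document_loaders.parsers.audio import FasterWhisperParser

# Transcribe every .mp3 file under a local directory ("audio/" is a placeholder).
# device="cpu" and model_size="base" keep resource use modest for a first run.
loader = GenericLoader(
    FileSystemBlobLoader("audio/", glob="*.mp3"),
    FasterWhisperParser(device="cpu", model_size="base"),
)
docs = loader.load()
for doc in docs:
    print(doc.page_content)
```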
Initialize the parser.
- Parameters:
device (str | None) – Either "cuda" or "cpu", depending on the available hardware. Defaults to "cuda".
model_size (str | None) – One of "base", "small", "medium", or "large-v3", chosen according to the available GPU memory.
Methods

| Method | Description |
| --- | --- |
| `__init__(*[, device, model_size])` | Initialize the parser. |
| `lazy_parse(blob)` | Lazily parse the blob. |
| `parse(blob)` | Eagerly parse the blob into a document or documents. |
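For long recordings, `lazy_parse` yields `Document` objects one at a time instead of materializing the full list that `parse` returns. A minimal sketch, assuming the required packages are installed; `speech.mp3` is a placeholder path:

```python
from langchain_community.document_loaders.blob_loaders import Blob
from langchain_community.document_loaders.parsers.audio import FasterWhisperParser

parser = FasterWhisperParser(device="cpu")
blob = Blob.from_path("speech.mp3")  # placeholder audio file

# lazy_parse returns a generator, so each transcribed document can be
# processed as it is produced rather than after the whole file is done.
for doc in parser.lazy_parse(blob):
    print(doc.page_content)
```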
- __init__(*, device: str | None = 'cuda', model_size: str | None = None)[source]#
Initialize the parser.
- Parameters:
device (str | None) – Either "cuda" or "cpu", depending on the available hardware. Defaults to "cuda".
model_size (str | None) – One of "base", "small", "medium", or "large-v3", chosen according to the available GPU memory.