Image captions
This notebook shows how to use the ImageCaptionLoader to generate a queryable index of image captions.
By default, the loader uses the pre-trained Salesforce BLIP image captioning model.
%pip install -qU transformers langchain_openai langchain_chroma
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()
Prepare a list of image URLs from Wikimedia
from langchain_community.document_loaders import ImageCaptionLoader
list_image_urls = [
"https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Ara_ararauna_Luc_Viatour.jpg/1554px-Ara_ararauna_Luc_Viatour.jpg",
"https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/1928_Model_A_Ford.jpg/640px-1928_Model_A_Ford.jpg",
]
Create the loader
loader = ImageCaptionLoader(images=list_image_urls)
list_docs = loader.load()
list_docs
[Document(metadata={'image_path': 'https://upload.wikimedia.org/wikipedia/commons/thumb/e/ec/Ara_ararauna_Luc_Viatour.jpg/1554px-Ara_ararauna_Luc_Viatour.jpg'}, page_content='an image of a bird flying in the air [SEP]'),
Document(metadata={'image_path': 'https://upload.wikimedia.org/wikipedia/commons/thumb/0/0c/1928_Model_A_Ford.jpg/640px-1928_Model_A_Ford.jpg'}, page_content='an image of a vintage car parked on the street [SEP]')]
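Both captions in the output above end with a literal `[SEP]`, which looks like the captioning model's separator token surviving in the decoded text. The following is a minimal sketch of one way to remove it before indexing. `strip_separator` is a hypothetical helper written for this example, not part of LangChain, and the caption strings are copied from the output above.

```python
def strip_separator(caption: str, token: str = "[SEP]") -> str:
    """Return the caption without a trailing separator token (hypothetical helper)."""
    text = caption.strip()
    if text.endswith(token):
        # Drop the token and any whitespace that preceded it.
        text = text[: -len(token)].rstrip()
    return text


# Caption strings copied from the loader output shown above.
captions = [
    "an image of a bird flying in the air [SEP]",
    "an image of a vintage car parked on the street [SEP]",
]
cleaned = [strip_separator(c) for c in captions]
print(cleaned)
```

The same function could be applied to each document's `page_content` before the documents are embedded, so the token does not end up in the indexed text.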
import requests
from PIL import Image
Image.open(requests.get(list_image_urls[0], stream=True).raw).convert("RGB")
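The packages installed at the top suggest that the queryable index is built with Chroma and OpenAI embeddings. The sketch below shows one way this might look, assuming `langchain_chroma` and `langchain_openai` are installed and `OPENAI_API_KEY` is set as above. `build_caption_index` is a hypothetical wrapper introduced for illustration, and the value `k=1` is an arbitrary choice, not something the loader requires.

```python
def build_caption_index(docs, k: int = 1):
    """Embed caption documents into a Chroma store and return a retriever.

    Hypothetical helper: imports are deferred so that defining the function
    does not require the optional packages or an API key.
    """
    from langchain_chroma import Chroma
    from langchain_openai import OpenAIEmbeddings

    # Embedding the captions calls the OpenAI API and needs OPENAI_API_KEY.
    vectorstore = Chroma.from_documents(documents=docs, embedding=OpenAIEmbeddings())
    # Return at most k matching caption documents per query.
    return vectorstore.as_retriever(search_kwargs={"k": k})


# Example usage (requires network access and an OpenAI API key):
# retriever = build_caption_index(list_docs)
# retriever.invoke("Which image shows a car?")
```

Each retrieved document keeps its `image_path` metadata, so a matching caption can be traced back to the image it describes.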