Google Cloud Storage File

Google Cloud Storage is a managed service for storing unstructured data.

This page covers how to load document objects from a Google Cloud Storage (GCS) file object (blob).

# !pip install google-cloud-storage
from langchain.document_loaders import GCSFileLoader
loader = GCSFileLoader(project_name="aist", bucket="testing-hwc", blob="fake.docx")
loader.load()
[Document(page_content='Lorem ipsum dolor sit amet.', lookup_str='', metadata={'source': '/var/folders/y6/8_bzdg295ld6s1_97_12m4lr0000gn/T/tmp3srlf8n8/fake.docx'}, lookup_index=0)]
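
The temporary path in the source metadata above suggests that the blob is first downloaded to a local temp file and then parsed from disk. The following is a minimal sketch of that pattern, not the library's actual implementation. The helper name load_blob_with and its client parameter are hypothetical; the google-cloud-storage calls it relies on (storage.Client, Client.bucket, Bucket.blob, Blob.download_to_filename) are part of that package's public API.

```python
import os
import tempfile
from typing import Any, Callable, Optional


def load_blob_with(
    loader_func: Callable[[str], Any],
    project_name: str,
    bucket: str,
    blob: str,
    client: Optional[Any] = None,
) -> Any:
    """Hypothetical helper: download a GCS blob to a temp file and hand the path to loader_func."""
    if client is None:
        # Imported lazily so the sketch can also be exercised with a stand-in client.
        from google.cloud import storage

        client = storage.Client(project=project_name)

    blob_obj = client.bucket(bucket).blob(blob)
    with tempfile.TemporaryDirectory() as temp_dir:
        file_path = os.path.join(temp_dir, blob)
        # Blob names may contain "/" separators, so create any parent directories.
        os.makedirs(os.path.dirname(file_path), exist_ok=True)
        blob_obj.download_to_filename(file_path)
        # The temp directory is deleted on exit, so loader_func must finish reading here.
        return loader_func(file_path)
```

Because the temporary directory is removed when the function returns, the callback should read the file fully instead of returning a lazy handle to it.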

To use an alternative loader, pass a custom function as loader_func. The function receives the path of the downloaded file and returns a loader for it, for example:

from langchain.document_loaders import PyPDFLoader

def load_pdf(file_path):
    return PyPDFLoader(file_path)

loader = GCSFileLoader(
    project_name="aist", bucket="testing-hwc", blob="fake.pdf", loader_func=load_pdf