Google Drive#
Google Drive is a file storage and synchronization service developed by Google.
This notebook covers how to load documents from Google Drive
. Currently, only Google Docs
are supported.
Prerequisites#
Create a Google Cloud project or use an existing project
Enable the Google Drive API
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
🧑 Instructions for ingesting your Google Docs data#
By default, the GoogleDriveLoader
expects the credentials.json
file to be ~/.credentials/credentials.json
, but this is configurable using the credentials_path
keyword argument. Same thing with token.json
- token_path
. Note that token.json
will be created automatically the first time you use the loader.
GoogleDriveLoader
can load from a list of Google Docs document ids or a folder id. You can obtain your folder and document id from the URL:
Folder: https://drive.google.com/drive/u/0/folders/1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5 -> folder id is
"1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5"
Document: https://docs.google.com/document/d/1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw/edit -> document id is
"1bfaMQ18_i56204VaQDVeAFpqEijJTgvurupdEDiaUQw"
!pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
from langchain.document_loaders import GoogleDriveLoader
loader = GoogleDriveLoader(
folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
# Optional: configure whether to recursively fetch files from subfolders. Defaults to False.
recursive=False
)
docs = loader.load()
When you pass a folder_id
by default all files of type document, sheet and pdf are loaded. You can modify this behaviour by passing a file_types
argument
loader = GoogleDriveLoader(
folder_id="1yucgL9WGgWZdM1TOuKkeghlPizuzMYb5",
file_types=["document", "sheet"]
recursive=False
)