
Pebblo Safe DocumentLoader

Pebblo enables developers to safely load data and promote their Gen AI app to deployment without worrying about the organization's compliance and security requirements. The project identifies semantic topics and entities found in the loaded data and summarizes them in the UI or in a PDF report.

Pebblo has two components.

  1. Pebblo Safe DocumentLoader for Langchain
  2. Pebblo Daemon

This document describes how to augment your existing Langchain DocumentLoader with Pebblo Safe DocumentLoader to get deep data visibility into the types of Topics and Entities ingested into the Gen-AI Langchain application. For details on Pebblo Daemon, see the Pebblo Daemon document.

Pebblo SafeLoader enables safe data ingestion for Langchain DocumentLoader. This is done by wrapping the document loader call with Pebblo Safe DocumentLoader.
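The wrapping idea can be sketched in plain Python. This is only an illustration of the general wrapper pattern, not Pebblo's actual implementation: `Doc`, `InMemoryLoader`, and `WrappingLoader` are hypothetical names invented here, and the "inspection" step is a placeholder that records content lengths rather than detecting topics or entities.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class Doc:
    """Hypothetical stand-in for a Langchain Document."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class Loader(Protocol):
    """Anything exposing load() can be wrapped."""
    def load(self) -> List[Doc]: ...


class InMemoryLoader:
    """Hypothetical loader returning fixed rows (stands in for a CSV loader)."""

    def __init__(self, rows: List[str]) -> None:
        self.rows = rows

    def load(self) -> List[Doc]:
        return [
            Doc(page_content=row, metadata={"row": str(i)})
            for i, row in enumerate(self.rows)
        ]


class WrappingLoader:
    """Illustrative wrapper: same load() interface, plus an inspection hook."""

    def __init__(self, inner: Loader, name: str, owner: str = "",
                 description: str = "") -> None:
        self.inner = inner
        self.name = name
        self.owner = owner
        self.description = description
        self.inspected: List[str] = []  # placeholder for a real report

    def load(self) -> List[Doc]:
        docs = self.inner.load()
        for doc in docs:
            # Placeholder inspection: only the content length is recorded.
            self.inspected.append(f"{self.name}: {len(doc.page_content)} chars")
        return docs  # documents pass through unchanged


wrapped = WrappingLoader(
    InMemoryLoader(["alice,ssn-redacted", "bob,n/a"]),
    name="acme-corp-rag-1",
)
documents = wrapped.load()
```

Because the wrapper exposes the same `load()` method as the loader it wraps, the calling code does not need to change beyond the constructor call.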

How to Pebblo enable Document Loading?

Assume a Langchain RAG application snippet using CSVLoader to read a CSV document for inference.

Here is the document-loading snippet using CSVLoader.

from langchain.document_loaders.csv_loader import CSVLoader

loader = CSVLoader("data/corp_sens_data.csv")
documents = loader.load()

The Pebblo SafeLoader can be enabled by changing a few lines of code in the above snippet.

from langchain.document_loaders.csv_loader import CSVLoader
from langchain_community.document_loaders import PebbloSafeLoader

loader = PebbloSafeLoader(
    CSVLoader("data/corp_sens_data.csv"),  # Existing loader being wrapped
    name="acme-corp-rag-1",  # App name (Mandatory)
    owner="Joe Smith",  # Owner (Optional)
    description="Support productivity RAG application",  # Description (Optional)
)
documents = loader.load()