Docling

Docling parses PDF, DOCX, PPTX, HTML, and other formats into a rich, unified representation that includes document layout, tables, and more, making them ready for generative AI workflows such as RAG.

This integration provides Docling's capabilities via the DoclingLoader document loader.

Installation and Setup

Install langchain-docling with your package manager, for example pip:

pip install langchain-docling

Document Loader

The DoclingLoader class in langchain-docling integrates Docling into LangChain, enabling you to:

  • use various document types in your LLM applications with ease and speed, and
  • leverage Docling's rich representation for advanced, document-native grounding.

Basic usage looks as follows:

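from langchain_docling import DoclingLoader

# One or more sources (URLs or local paths); here, the Docling Technical Report
FILE_PATH = ["https://arxiv.org/pdf/2408.09869"]

loader = DoclingLoader(file_path=FILE_PATH)

# Convert the sources and return them as LangChain Document objects
docs = loader.load()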
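
Once loader.load() returns, a quick way to sanity-check the result is to print a short preview of each document. The following is a minimal sketch, not part of langchain-docling: preview_documents is a hypothetical helper, and it only assumes each item exposes the standard LangChain page_content and metadata attributes. The "source" metadata key is an assumption, so the helper falls back to a placeholder when it is absent. Stand-in objects with placeholder text are used here so the snippet runs without network access or a Docling install.

```python
from types import SimpleNamespace


def preview_documents(docs, max_chars=80):
    """Return one summary line per document: index, source, and a text preview.

    Hypothetical helper for illustration; works on any object that has
    `page_content` (str) and `metadata` (dict) attributes.
    """
    lines = []
    for i, doc in enumerate(docs):
        # "source" is an assumed metadata key; fall back if it is missing.
        source = doc.metadata.get("source", "<unknown>")
        # Collapse newlines and repeated whitespace into single spaces.
        text = " ".join(doc.page_content.split())
        if len(text) > max_chars:
            text = text[:max_chars] + "..."
        lines.append(f"[{i}] {source}: {text}")
    return lines


# Stand-ins for the objects returned by loader.load(); the text is placeholder
# content, not actual Docling output.
sample_docs = [
    SimpleNamespace(
        page_content="Placeholder chunk text\nspanning two lines",
        metadata={"source": "https://arxiv.org/pdf/2408.09869"},
    ),
    SimpleNamespace(page_content="x" * 200, metadata={}),
]

for line in preview_documents(sample_docs):
    print(line)
```

In a real run, pass the docs list from the basic usage snippet instead of sample_docs.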

For end-to-end usage check out this example.


