PullMdLoader
Loader for converting URLs into Markdown using the pull.md service.
This package implements a document loader for web content. Unlike traditional web scrapers, PullMdLoader can handle web pages built with dynamic JavaScript frameworks like React, Angular, or Vue.js, converting them into Markdown without local rendering.
Overview
Integration details
Class | Package | Local | Serializable | JS Support |
---|---|---|---|---|
PullMdLoader | langchain-pull-md | โ | โ | โ |
Setup
Installation
pip install langchain-pull-md
Initialization
from langchain_pull_md.markdown_loader import PullMdLoader
# Instantiate the loader with a URL
loader = PullMdLoader(url="https://example.com")
Load
documents = loader.load()
documents[0].metadata
{'source': 'https://example.com',
'page_content': '# Example Domain\nThis domain is used for illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.'}
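Once a page has been loaded, a common next step is to write the Markdown to disk. The following is a minimal sketch, not part of langchain-pull-md: `markdown_filename` and `save_documents` are hypothetical helpers, and the code assumes each document exposes the Markdown text as `page_content` and the URL as `metadata["source"]`, as LangChain `Document` objects normally do.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def markdown_filename(source_url: str) -> str:
    """Derive a filesystem-friendly .md filename from a URL (hypothetical helper)."""
    parsed = urlparse(source_url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower()
    return f"{slug or 'index'}.md"


def save_documents(documents, out_dir):
    """Write each document's Markdown to out_dir and return the written paths.

    Assumes doc.page_content holds the Markdown and doc.metadata["source"]
    holds the original URL.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for doc in documents:
        path = out / markdown_filename(doc.metadata["source"])
        path.write_text(doc.page_content, encoding="utf-8")
        written.append(path)
    return written
```

With the loader above, this could be called as `save_documents(loader.load(), "markdown_out")`, where the directory name is only an example.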
Lazy Load
No lazy loading is implemented. PullMdLoader performs a real-time conversion of the provided URL into Markdown format whenever the load method is called.
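When several URLs need converting, a plain generator can spread the work out so that only one page is converted per step. This is a sketch under stated assumptions, not a feature of the package: `iter_documents` is a hypothetical helper, and `make_loader` is any callable that returns an object with a `load()` method.

```python
from typing import Any, Callable, Iterable, Iterator


def iter_documents(
    urls: Iterable[str],
    make_loader: Callable[[str], Any],
) -> Iterator[Any]:
    """Yield documents one URL at a time (hypothetical helper).

    Each loader is created and its load() called only when the consumer
    asks for the next document, so later URLs are not fetched up front.
    """
    for url in urls:
        loader = make_loader(url)
        yield from loader.load()


# Possible usage with the loader from this page (requires langchain-pull-md):
#   from langchain_pull_md.markdown_loader import PullMdLoader
#   for doc in iter_documents(urls, lambda u: PullMdLoader(url=u)):
#       print(doc.metadata["source"])
```

The loader factory is passed in rather than hard-coded so the helper can be exercised with a stub loader and without network access.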
API reference:
Related
- Document loader conceptual guide
- Document loader how-to guides