FireCrawlLoader#
- class langchain_community.document_loaders.firecrawl.FireCrawlLoader(url: str, *, api_key: str | None = None, api_url: str | None = None, mode: Literal['crawl', 'scrape', 'map'] = 'crawl', params: dict | None = None)[source]#
FireCrawlLoader document loader integration
- Setup:
Install ``firecrawl-py`` and ``langchain_community``, and set the environment variable ``FIRECRAWL_API_KEY``:

```shell
pip install -U firecrawl-py langchain_community
export FIRECRAWL_API_KEY="your-api-key"
```
- Instantiate:

```python
from langchain_community.document_loaders import FireCrawlLoader

loader = FireCrawlLoader(
    url="https://firecrawl.dev",
    mode="crawl",
    # other params = ...
)
```
- Lazy load:

```python
docs = []
docs_lazy = loader.lazy_load()

# async variant:
# docs_lazy = await loader.alazy_load()

for doc in docs_lazy:
    docs.append(doc)
print(docs[0].page_content[:100])
print(docs[0].metadata)
```

```
Introducing [Smart Crawl!](https://www.firecrawl.dev/smart-crawl) Join the waitlist to turn any web
{'ogUrl': 'https://www.firecrawl.dev/', 'title': 'Home - Firecrawl', 'robots': 'follow, index', 'ogImage': 'https://www.firecrawl.dev/og.png?123', 'ogTitle': 'Firecrawl', 'sitemap': {'lastmod': '2024-08-12T00:28:16.681Z', 'changefreq': 'weekly'}, 'keywords': 'Firecrawl,Markdown,Data,Mendable,Langchain', 'sourceURL': 'https://www.firecrawl.dev/', 'ogSiteName': 'Firecrawl', 'description': 'Firecrawl crawls and converts any website into clean markdown.', 'ogDescription': 'Turn any website into LLM-ready data.', 'pageStatusCode': 200, 'ogLocaleAlternate': []}
```
- Async load:

```python
docs = await loader.aload()
print(docs[0].page_content[:100])
print(docs[0].metadata)
```

```
Introducing [Smart Crawl!](https://www.firecrawl.dev/smart-crawl) Join the waitlist to turn any web
{'ogUrl': 'https://www.firecrawl.dev/', 'title': 'Home - Firecrawl', 'robots': 'follow, index', 'ogImage': 'https://www.firecrawl.dev/og.png?123', 'ogTitle': 'Firecrawl', 'sitemap': {'lastmod': '2024-08-12T00:28:16.681Z', 'changefreq': 'weekly'}, 'keywords': 'Firecrawl,Markdown,Data,Mendable,Langchain', 'sourceURL': 'https://www.firecrawl.dev/', 'ogSiteName': 'Firecrawl', 'description': 'Firecrawl crawls and converts any website into clean markdown.', 'ogDescription': 'Turn any website into LLM-ready data.', 'pageStatusCode': 200, 'ogLocaleAlternate': []}
```
Initialize with API key and URL.
- Parameters:
url (str) – The URL to be crawled.
api_key (str | None) – The Firecrawl API key. If not specified, it is read from the env var FIRECRAWL_API_KEY. Get an API key at firecrawl.dev.
api_url (str | None) – The Firecrawl API URL. If not specified, it is read from the env var FIRECRAWL_API_URL, or defaults to https://api.firecrawl.dev.
mode (Literal['crawl', 'scrape', 'map']) – The mode to run the loader in. Default is "crawl". Options are "scrape" (a single URL), "crawl" (the URL and all accessible sub-pages), and "map" (returns a list of semantically related links).
params (dict | None) – The parameters to pass to the Firecrawl API. Examples include crawlerOptions. For more details, visit: mendableai/firecrawl-py
Methods

| Method | Description |
| --- | --- |
| `__init__(url, *[, api_key, api_url, mode, ...])` | Initialize with API key and URL. |
| `alazy_load()` | A lazy loader for Documents. |
| `aload()` | Load data into Document objects. |
| `lazy_load()` | A lazy loader for Documents. |
| `legacy_crawler_options_adapter(params)` | |
| `legacy_scrape_options_adapter(params)` | |
| `load()` | Load data into Document objects. |
| `load_and_split([text_splitter])` | Load Documents and split into chunks. |
- __init__(url: str, *, api_key: str | None = None, api_url: str | None = None, mode: Literal['crawl', 'scrape', 'map'] = 'crawl', params: dict | None = None)[source]#
Initialize with API key and URL.
- Parameters:
url (str) – The URL to be crawled.
api_key (str | None) – The Firecrawl API key. If not specified, it is read from the env var FIRECRAWL_API_KEY. Get an API key at firecrawl.dev.
api_url (str | None) – The Firecrawl API URL. If not specified, it is read from the env var FIRECRAWL_API_URL, or defaults to https://api.firecrawl.dev.
mode (Literal['crawl', 'scrape', 'map']) – The mode to run the loader in. Default is "crawl". Options are "scrape" (a single URL), "crawl" (the URL and all accessible sub-pages), and "map" (returns a list of semantically related links).
params (dict | None) – The parameters to pass to the Firecrawl API. Examples include crawlerOptions. For more details, visit: mendableai/firecrawl-py
- async alazy_load() AsyncIterator[Document] #
A lazy loader for Documents.
- Return type:
AsyncIterator[Document]
- lazy_load() Iterator[Document] [source]#
A lazy loader for Documents.
- Return type:
Iterator[Document]
- legacy_crawler_options_adapter(params: dict) dict [source]#
- Parameters:
params (dict)
- Return type:
dict
- legacy_scrape_options_adapter(params: dict) dict [source]#
- Parameters:
params (dict)
- Return type:
dict
- load_and_split(text_splitter: TextSplitter | None = None) list[Document] #
Load Documents and split into chunks. Chunks are returned as Documents.
Do not override this method. It should be considered to be deprecated!
- Parameters:
text_splitter (Optional[TextSplitter]) – TextSplitter instance to use for splitting documents. Defaults to RecursiveCharacterTextSplitter.
- Returns:
List of Documents.
- Return type:
list[Document]
Examples using FireCrawlLoader