Crawler
- class langchain.chains.natbot.crawler.Crawler
A crawler for web pages.
- Security Note: This is an implementation of a crawler that uses a browser via Playwright.
This crawler can be used to load arbitrary webpages, INCLUDING content from the local file system.
Control who can submit crawling requests and what network access the crawler has.
Scope permissions to the minimum necessary for the application.
See https://python.langchain.com/docs/security for more information.
Initialize the crawler.
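Because the crawler can load arbitrary URLs, including local files, one way to act on the security note is to validate every URL before it reaches go_to_page. The following is a minimal sketch, not part of LangChain: is_allowed_url, ALLOWED_SCHEMES, and ALLOWED_HOSTS are hypothetical names, and the hosts are placeholders. It complements, rather than replaces, network-level restrictions on the process running the browser.

```python
from urllib.parse import urlparse

# Hypothetical policy: only plain web traffic to a fixed set of hosts.
ALLOWED_SCHEMES = {"http", "https"}
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_allowed_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    # Rejects file://, data:, and schemeless strings such as "example.com".
    if parsed.scheme.lower() not in ALLOWED_SCHEMES:
        return False
    # hostname is lowercased and excludes any port or user:password@ prefix.
    return (parsed.hostname or "") in ALLOWED_HOSTS
```

A caller would check is_allowed_url(url) before calling crawler.go_to_page(url). Schemeless inputs are rejected, which is conservative given that go_to_page would otherwise prefix them with "http://". The check only covers the initial URL; a page on an allowed host can still redirect or link elsewhere.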
Methods
- __init__() – Initialize the crawler.
- click(id_) – Click on an element with the given id.
- crawl() – Crawl the current page.
- enter() – Press the Enter key.
- go_to_page(url) – Navigate to the given URL.
- scroll(direction) – Scroll the page in the given direction.
- type(id_, text) – Type text into an element with the given id.
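The methods above can be chained to drive a page. The sketch below is illustrative only: submit_text and CrawlerLike are hypothetical names, the argument types for type() are assumed to match those of click(), and field_id must be an element id taken from an earlier crawl() of the same page. Typing against a small Protocol lets the helper be exercised with a stand-in object instead of a real browser.

```python
from typing import Protocol, Union


class CrawlerLike(Protocol):
    """The subset of Crawler's methods that submit_text relies on."""

    def go_to_page(self, url: str) -> None: ...

    def type(self, id_: Union[str, int], text: str) -> None: ...

    def enter(self) -> None: ...

    def crawl(self) -> list[str]: ...


def submit_text(
    crawler: CrawlerLike, url: str, field_id: Union[str, int], text: str
) -> list[str]:
    """Open url, type text into one field, press Enter, return the new viewport."""
    crawler.go_to_page(url)       # navigate to the target page
    crawler.type(field_id, text)  # field_id should come from an earlier crawl()
    crawler.enter()               # press Enter, e.g. to submit a search form
    return crawler.crawl()        # elements visible after submission
```

With the real class, crawler = Crawler() (imported from langchain.chains.natbot.crawler) starts a Playwright browser, and submit_text(crawler, "https://example.com", 3, "query") would then run the steps, where 3 is a placeholder id. Depending on the page, you may need to wait for navigation to finish before crawling; this sketch does not.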
- click(id_: str | int) → None
Click on an element with the given id.
- Parameters:
id_ (str | int) – The id of the element to click on.
- Return type:
None
- crawl() → list[str]
Crawl the current page.
- Returns:
A list of the elements in the viewport.
- Return type:
list[str]
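Since crawl() returns plain strings, ordinary list operations are enough to narrow the viewport down before acting on it. A minimal sketch with a hypothetical helper, find_elements; it assumes nothing about the format of each string beyond it being text.

```python
def find_elements(elements: list[str], keyword: str) -> list[str]:
    """Return the crawled elements whose text contains keyword, ignoring case."""
    needle = keyword.casefold()
    return [element for element in elements if needle in element.casefold()]
```

For example, find_elements(crawler.crawl(), "search") keeps only the elements that mention "search", preserving their original order.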
- go_to_page(url: str) → None
Navigate to the given URL.
- Parameters:
url (str) – The URL to navigate to. If it does not contain a scheme, it will be prefixed with “http://”.
- Return type:
None
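The scheme rule for go_to_page can be sketched as follows. This is not the library's code: with_default_scheme is a hypothetical name, and it assumes that "contains a scheme" means the string contains "://"; LangChain's actual check may differ in edge cases.

```python
def with_default_scheme(url: str) -> str:
    """Prefix url with "http://" when it does not already contain a scheme."""
    # Assumption: a URL "contains a scheme" when it includes "://".
    return url if "://" in url else "http://" + url
```

Note that an explicit file:// URL passes through unchanged, which is why the security note recommends controlling who can submit crawling requests.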