Crawler

class langchain.chains.natbot.crawler.Crawler

A crawler for web pages.

Security Note: This is an implementation of a crawler that uses a browser via Playwright.

This crawler can be used to load arbitrary webpages INCLUDING content from the local file system.

Control who can submit crawling requests and what network access the crawler has.

Scope permissions to the minimum necessary for the application.
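One way to act on this advice is to validate every URL before it reaches `go_to_page`. The sketch below is an assumption, not part of LangChain: `ALLOWED_HOSTS` and `is_url_allowed` are hypothetical names, and the check only permits explicit `http`/`https` URLs whose host is on an allowlist, which rejects `file://` paths and local addresses.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts your application may visit.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def is_url_allowed(url: str) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        # Rejects file://, scheme-less input, and any other scheme.
        return False
    # parsed.hostname is lowercased and is None when no host is present.
    return parsed.hostname in ALLOWED_HOSTS
```

A check like this only covers the URL that is submitted. Pages can still redirect or load resources from other hosts, so network-level restrictions on the process running the browser remain necessary.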

Initialize the crawler.

Methods

`__init__`()	Initialize the crawler.
`click`(id_)	Click on an element with the given id.
`crawl`()	Crawl the current page.
`enter`()	Press the Enter key.
`go_to_page`(url)	Navigate to the given URL.
`scroll`(direction)	Scroll the page in the given direction.
`type`(id_, text)	Type text into an element with the given id.
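These methods are typically combined into a navigate, read, type, submit, read sequence. The sketch below is illustrative only: `CrawlerLike`, `search_and_crawl`, and `pick_id` are hypothetical names, and the helper accepts any object exposing the listed methods so that it can be exercised without a browser. It crawls before typing on the assumption that element ids refer to the most recent crawl; this page does not state how ids are assigned.

```python
from typing import Callable, List, Protocol, Union

ElementId = Union[str, int]


class CrawlerLike(Protocol):
    """Subset of the Crawler interface used by this sketch."""

    def go_to_page(self, url: str) -> None: ...
    def type(self, id_: ElementId, text: str) -> None: ...
    def enter(self) -> None: ...
    def crawl(self) -> List[str]: ...


def search_and_crawl(
    crawler: CrawlerLike,
    url: str,
    query: str,
    pick_id: Callable[[List[str]], ElementId],
) -> List[str]:
    """Open a page, type a query into a chosen element, submit, and re-crawl."""
    crawler.go_to_page(url)
    elements = crawler.crawl()   # read the page before choosing a target
    box_id = pick_id(elements)   # the caller decides which element to use
    crawler.type(box_id, query)
    crawler.enter()
    return crawler.crawl()       # read the page again after submitting
```

With a real setup you would pass a `Crawler()` instance imported from `langchain.chains.natbot.crawler`, which assumes Playwright and its browser binaries are installed.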

__init__() → None

Initialize the crawler.

click(id_: str | int) → None

Click on an element with the given id.

Parameters: id_ (str | int) – The id of the element to click.
Return type: None

crawl() → list[str]

Crawl the current page.

enter() → None

Press the Enter key.

go_to_page(url: str) → None

Navigate to the given URL.

Parameters: url (str) – The URL to navigate to. If it does not contain a scheme, it is prefixed with “http://”.
Return type: None
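The scheme rule described above can be illustrated with a standalone function. This is a sketch of the documented behavior, not the library's code: `normalize_url` is a hypothetical name, and treating the presence of `://` as "contains a scheme" is an assumption.

```python
def normalize_url(url: str) -> str:
    """Prefix scheme-less input with http://, leaving other URLs untouched."""
    # Assumption: a URL "contains a scheme" when it includes "://".
    return url if "://" in url else "http://" + url
```

Because the default prefix is plain `http`, pass an explicit `https://` URL when the connection must use TLS.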

scroll(direction: str) → None

Scroll the page in the given direction.

Parameters: direction (str) – The direction to scroll in, either “up” or “down”.
Return type: None
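Because `direction` is a plain string, a thin wrapper can reject unexpected values before they reach the crawler. The sketch below is an assumption layered on top of the documented interface: `checked_scroll` and `VALID_DIRECTIONS` are hypothetical names, and this page does not say how `scroll` itself treats other values.

```python
from typing import Any

# The two values named in the parameter description.
VALID_DIRECTIONS = ("up", "down")


def checked_scroll(crawler: Any, direction: str) -> None:
    """Call crawler.scroll only when direction is a documented value."""
    if direction not in VALID_DIRECTIONS:
        raise ValueError(
            f"direction must be one of {VALID_DIRECTIONS}, got {direction!r}"
        )
    crawler.scroll(direction)
```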

type(id_: str | int, text: str) → None

Type text into an element with the given id.

Parameters:

- id_ (str | int) – The id of the element to type into.
- text (str) – The text to type.

Return type: None