Crawler

class langchain.chains.natbot.crawler.Crawler

A crawler for web pages.

Security Note: This is an implementation of a crawler that uses a browser via Playwright.

This crawler can be used to load arbitrary webpages INCLUDING content from the local file system.

Control who can submit crawling requests and what network access the crawler has.

Make sure to scope permissions to the minimal permissions necessary for the application.

See https://python.langchain.com/docs/security for more information.
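One way to act on the advice above is to validate every URL before it reaches the crawler. The sketch below is an illustration only and is not part of LangChain: `ALLOWED_HOSTS` and `is_allowed_url` are hypothetical names, and the policy shown (http/https only, exact host allowlist) is just one possible choice.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts your application needs.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def is_allowed_url(url: str) -> bool:
    """Return True only for http(s) URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    # Rejects file:// and any other non-web scheme, as well as schemeless input.
    if parsed.scheme not in ("http", "https"):
        return False
    # urlparse lower-cases the hostname and strips the port and credentials.
    return parsed.hostname in ALLOWED_HOSTS
```

A check like this only covers the URL that is submitted. It does not restrict redirects or resources the page itself loads, so network-level controls are still needed.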

Initialize the crawler.

Methods

__init__()

Initialize the crawler.

click(id_)

Click on an element with the given id.

crawl()

Crawl the current page.

enter()

Press the Enter key.

go_to_page(url)

Navigate to the given URL.

scroll(direction)

Scroll the page in the given direction.

type(id_, text)

Type text into an element with the given id.

__init__() → None

Initialize the crawler.

Return type:

None

click(id_: str | int) → None

Click on an element with the given id.

Parameters:

id_ (str | int) – The id of the element to click on.

Return type:

None

crawl() → list[str]

Crawl the current page.

Returns:

A list of the elements in the viewport.

Return type:

list[str]

enter() → None

Press the Enter key.

Return type:

None

go_to_page(url: str) → None

Navigate to the given URL.

Parameters:

url (str) – The URL to navigate to. If it does not contain a scheme, it will be prefixed with “http://”.

Return type:

None
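The prefixing rule can be illustrated with a small helper. This is a sketch that assumes "contains a scheme" means the string contains `://`; `with_default_scheme` is a hypothetical name, and the library's actual check may differ.

```python
def with_default_scheme(url: str) -> str:
    """Prefix "http://" when the URL has no scheme separator."""
    # Assumption: the presence of "://" is treated as "has a scheme".
    return url if "://" in url else "http://" + url
```

Under this assumption the default is plain `http`, not `https`, and a URL that already carries a scheme, including `file://`, passes through unchanged. That is consistent with the security note at the top of this page.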

scroll(direction: str) → None

Scroll the page in the given direction.

Parameters:

direction (str) – The direction to scroll in, either “up” or “down”.

Return type:

None

type(id_: str | int, text: str) → None

Type text into an element with the given id.

Parameters:
  • id_ (str | int) – The id of the element to type into.

  • text (str) – The text to type into the element.

Return type:

None
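Putting the methods together, a browsing session might look like the sketch below. It is written against a `typing.Protocol` that mirrors the signatures documented on this page, so it runs without a browser. `CrawlerLike`, `RecordingCrawler`, `search`, and `box_id` are hypothetical names introduced for illustration; a real `Crawler` instance (which requires Playwright) could be passed in place of the stand-in.

```python
from typing import Protocol, Union


class CrawlerLike(Protocol):
    """Structural type mirroring the methods documented above."""

    def go_to_page(self, url: str) -> None: ...
    def crawl(self) -> list[str]: ...
    def type(self, id_: Union[str, int], text: str) -> None: ...
    def enter(self) -> None: ...
    def scroll(self, direction: str) -> None: ...


class RecordingCrawler:
    """Stand-in that records calls instead of driving a browser."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def go_to_page(self, url: str) -> None:
        self.calls.append(("go_to_page", url))

    def crawl(self) -> list[str]:
        self.calls.append(("crawl",))
        return []

    def type(self, id_: Union[str, int], text: str) -> None:
        self.calls.append(("type", id_, text))

    def enter(self) -> None:
        self.calls.append(("enter",))

    def scroll(self, direction: str) -> None:
        self.calls.append(("scroll", direction))


def search(
    crawler: CrawlerLike, url: str, box_id: Union[str, int], query: str
) -> list[str]:
    """Open a page, submit a query in one input, and return the visible elements."""
    crawler.go_to_page(url)
    crawler.crawl()  # assumption: element ids are only meaningful after a crawl
    crawler.type(box_id, query)
    crawler.enter()
    crawler.scroll("down")  # the documented directions are "up" and "down"
    return crawler.crawl()


fake = RecordingCrawler()
elements = search(fake, "example.com", 3, "langchain")
```

In practice `box_id` would be chosen by reading the output of the first `crawl()` call; it is passed in here only to keep the sketch short.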