
Beautiful Soup

Beautiful Soup is a Python package for parsing HTML and XML documents, including documents with malformed markup such as unclosed tags. Its name comes from "tag soup", a term for this kind of markup. It builds a parse tree from each parsed page, and that tree can be used to extract data from HTML,[3] which is useful for web scraping.
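The sketch below shows the "tag soup" tolerance in practice. It assumes only that `beautifulsoup4` is installed and uses Python's built-in `"html.parser"` backend. The HTML snippet and the `extract_menu` helper are hypothetical, made up for illustration. The `<p>` and `<b>` tags are never closed, yet the parser still builds a usable tree.

```python
from bs4 import BeautifulSoup

# Hypothetical malformed markup ("tag soup"): the <p> and <b> tags are never closed.
BROKEN_HTML = (
    '<div id="menu"><p>Soup of the day: <b>tomato</div>'
    '<a href="/order">Order now</a>'
)


def extract_menu(html: str) -> dict:
    """Parse possibly broken HTML and pull a few fields out of the parse tree."""
    # "html.parser" is Python's built-in parser, so no extra install is needed.
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a")
    return {
        # Text of the (unclosed) <b> element.
        "special": soup.find("b").get_text(strip=True),
        # All text inside the (unclosed) <p>, joined with single spaces.
        "paragraph": soup.find("p").get_text(" ", strip=True),
        # Attribute lookup on a tag works like a dict lookup.
        "order_url": link["href"] if link else None,
        # Serializing the tree shows the closing tags the parser filled in.
        "repaired": str(soup),
    }


if __name__ == "__main__":
    for key, value in extract_menu(BROKEN_HTML).items():
        print(f"{key}: {value}")
```

With other backends such as `lxml` or `html5lib`, repairs to more heavily broken markup can differ, so the resulting tree is not guaranteed to be identical across parsers.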

Installation and Setup

pip install beautifulsoup4

Document Transformer

See a usage example.

from langchain_community.document_transformers import BeautifulSoupTransformer
