aid: beautiful-soup
name: Beautiful Soup
description: >-
  Beautiful Soup is a Python library for pulling data out of HTML and XML files,
  widely used for web scraping and screen scraping tasks. It provides a parse tree
  API with simple methods for navigating, searching, and modifying parsed HTML/XML
  documents. Beautiful Soup automatically handles encoding, supports multiple parsers
  (html.parser, lxml, html5lib), and integrates with CSS selectors via the Soup Sieve
  library. Current stable version is 4.14.3.
type: Index
image: https://kinlane-productions.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
  - Data Extraction
  - HTML Parsing
  - Python
  - Scraping
  - Web Scraping
  - XML Parsing
url: >-
  https://raw.githubusercontent.com/api-evangelist/beautiful-soup/refs/heads/main/apis.yml
created: '2026-03-29'
modified: '2026-04-19'
specificationVersion: '0.19'
apis:
  - aid: beautiful-soup:beautiful-soup
    name: Beautiful Soup
    description: >-
      Beautiful Soup 4 is a Python library providing a parse tree API for HTML and
      XML documents. It exposes Tag, NavigableString, BeautifulSoup, and Comment
      objects with search methods (find, find_all, and CSS selectors via select),
      tree traversal (parents, children, siblings), and modification methods
      (append, extract, replace_with).
      Supports html.parser, lxml, and html5lib parsers with automatic encoding detection.
    humanURL: https://www.crummy.com/software/BeautifulSoup/
    tags:
      - Data Extraction
      - HTML Parsing
      - Python
      - Scraping
      - Web Scraping
      - XML Parsing
    properties:
      - type: Documentation
        url: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
      - type: GettingStarted
        url: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start
      - type: SDK
        url: https://pypi.org/project/beautifulsoup4/
        title: Python Package (PyPI)

common:
  - type: Website
    url: https://www.crummy.com/software/BeautifulSoup/
  - type: Documentation
    url: https://www.crummy.com/software/BeautifulSoup/bs4/doc/
  - type: SDK
    url: https://pypi.org/project/beautifulsoup4/
    title: PyPI Package
  - type: GitHubOrganization
    url: https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4
  - type: ChangeLog
    url: https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/CHANGELOG
  - type: Features
    data:
      - name: Multi-Parser Support
        description: Supports Python's built-in html.parser, the fast lxml parser (which also provides XML parsing), and html5lib (which parses HTML the way a web browser does).
      - name: CSS Selector Support
        description: Supports most CSS selectors from Levels 1 through 4 via the Soup Sieve library, exposed through select() and select_one().
      - name: Tree Navigation API
        description: Rich API for navigating the parse tree downward (contents, children, descendants), upward (parent, parents), and sideways (next_sibling, previous_sibling), plus searching with find() and find_all().
      - name: Automatic Encoding Detection
        description: Detects the document's encoding and converts it to Unicode using the bundled Unicode, Dammit (UnicodeDammit) class.
      - name: Tree Modification
        description: Full tree modification support including append, insert, extract, decompose, replace_with, wrap, and unwrap operations.
      - name: Output Formatting
        description: Output via prettify(), str(), and get_text(), with built-in formatters (minimal, html, html5) and custom Formatter subclasses controlling how entities and markup are serialized.
  - type: UseCases
    data:
      - name: Web Scraping
        description: Extract data from websites by parsing HTML pages with Beautiful Soup and navigating the parse tree to find target elements.
      - name: Data Mining
        description: Mine structured data from HTML tables, lists, and other markup patterns across large numbers of web pages.
      - name: Content Extraction
        description: Extract article text, product information, or other content from web pages for NLP pipelines and data analysis.
      - name: Screen Scraping Legacy Systems
        description: Automate data extraction from legacy HTML web interfaces that lack modern APIs.
      - name: HTML Sanitization
        description: Parse and clean HTML documents by removing unwanted tags, scripts, and formatting.
      - name: XML Processing
        description: Parse and query XML documents using the lxml-backed XML parser together with Beautiful Soup's tree navigation and search capabilities.
  - type: Integrations
    data:
      - name: Requests
        description: Python HTTP library used in combination with Beautiful Soup to fetch and parse web pages.
      - name: Scrapy
        description: Python web crawling framework whose callbacks can pass response bodies to Beautiful Soup as an alternative to Scrapy's built-in selectors.
      - name: lxml
        description: Fast XML and HTML parsing library used as an alternative parser backend for Beautiful Soup, and the one required for XML parsing.
      - name: html5lib
        description: Pure-Python HTML5 parser used with Beautiful Soup for browser-compatible HTML parsing.
      - name: Pandas
        description: DataFrame library commonly used with Beautiful Soup to convert scraped HTML tables into structured data.
      - name: Selenium
        description: Browser automation tool used with Beautiful Soup to scrape JavaScript-rendered pages.
maintainers:
  - FN: Kin Lane
    email: kin@apievangelist.com