aid: sketches
url: https://raw.githubusercontent.com/api-evangelist/sketches/refs/heads/main/apis.yml
name: Sketches
description: >-
  Sketches are probabilistic data structures used in computing and data engineering
  to approximate answers to queries over large data streams with controlled error bounds
  and far less memory than exact methods require. Common sketches include Count-Min Sketch
  (frequency estimation), HyperLogLog (cardinality estimation), Bloom Filter (membership
  testing), and t-digest (quantile estimation). APIs in this domain include sketch
  libraries such as Apache DataSketches, Redis probabilistic data structures, and cloud
  analytics services that implement sketch algorithms for real-time analytics, approximate
  query processing, and streaming analytics.
tags:
  - Data Structures
  - Probabilistic Algorithms
  - Streaming Analytics
  - Approximate Query Processing
  - Big Data
  - Real-Time Analytics
created: '2025'
modified: '2026-05-02'

apis:
  - aid: sketches:apache-datasketches-api
    name: Apache DataSketches API
    description: >-
      Apache DataSketches is an open-source library providing production-quality
      implementations of sketch algorithms including Theta Sketches (set operations),
      Quantiles Sketches (percentile estimation), HLL (HyperLogLog for cardinality),
      CPC (Compressed Probabilistic Counting), Frequent Items, and Tuple sketches. It
      is widely used in data warehouses and OLAP systems including Apache Druid,
      Apache Spark, and Amazon Redshift. The library provides Java, C++, and Python APIs.
    humanURL: https://datasketches.apache.org
    baseURL: https://datasketches.apache.org
    tags:
      - Open Source
      - Apache
      - Data Structures
      - Probabilistic Algorithms
      - Analytics
    properties:
      - url: https://datasketches.apache.org
        type: Documentation
      - url: https://github.com/apache/datasketches-java
        type: GitHubRepository

  - aid: sketches:redis-probabilistic-api
    name: Redis Probabilistic Data Structures API
    description: >-
      Redis provides HyperLogLog in its core command set and, through Redis Stack
      (RedisBloom module), server-side implementations of Bloom Filter, Cuckoo Filter,
      Count-Min Sketch, Top-K, and t-digest. These are accessible through the Redis
      command interface from standard Redis client libraries, enabling high-throughput
      approximate data processing inside the Redis server itself.
    humanURL: https://redis.io/docs/data-types/probabilistic/
    baseURL: https://redis.io
    tags:
      - Redis
      - Probabilistic Data Structures
      - Real-Time
      - In-Memory
    properties:
      - url: https://redis.io/docs/data-types/probabilistic/
        type: Documentation
      - url: https://redis.io
        type: Website

  - aid: sketches:amazon-redshift-sketches-api
    name: Amazon Redshift Approximate Query API
    description: >-
      Amazon Redshift supports approximate query processing using HyperLogLog sketch
      functions (HLL_CREATE_SKETCH, HLL_COMBINE, HLL_CARDINALITY) for fast cardinality
      estimation on large datasets. These native SQL functions let analytics teams
      run approximate COUNT DISTINCT queries over billions of rows much faster than
      exact counts, with controlled error bounds, and store sketches for later merging.
    humanURL: https://docs.aws.amazon.com/redshift/latest/dg/r_HLL_function.html
    baseURL: https://aws.amazon.com
    tags:
      - AWS
      - Redshift
      - Analytics
      - HyperLogLog
      - SQL
    properties:
      - url: https://docs.aws.amazon.com/redshift/latest/dg/r_HLL_function.html
        type: Documentation

common:
  - type: Website
    url: https://datasketches.apache.org
  - type: JSON-LD
    url: json-ld/sketches-context.jsonld
  - type: Vocabulary
    url: vocabulary/sketches-vocabulary.yml