name: Scalability Vocabulary
description: >-
  Normative vocabulary for the scalability topic domain, covering key terms,
  concepts, patterns, and classifications used in application scaling,
  infrastructure elasticity, load balancing, and performance engineering.
created: '2026-05-02'
modified: '2026-05-02'
tags:
  - Auto Scaling
  - Cloud Computing
  - Distributed Systems
  - Elasticity
  - Infrastructure
  - Scalability
terms:
  - term: Auto Scaling
    definition: >-
      The process of automatically adjusting the number of compute resources
      allocated to an application based on current demand, predefined rules,
      or observed metrics.
    synonyms:
      - Autoscaling
      - Elastic Scaling
      - Dynamic Scaling
    related:
      - Horizontal Scaling
      - Vertical Scaling
      - Scale-to-Zero
  - term: Horizontal Scaling
    definition: >-
      Adding more instances (pods, VMs, nodes) to handle increased load. Also
      called scale-out. The dominant pattern for cloud-native applications
      because it allows stateless services to run across many machines.
    synonyms:
      - Scale-Out
      - Horizontal Pod Autoscaling (HPA)
    related:
      - Replication
      - Load Balancing
      - Stateless Architecture
  - term: Vertical Scaling
    definition: >-
      Increasing the resources (CPU, RAM, disk) allocated to a single
      instance rather than adding more instances. Also called scale-up.
      Limited by the maximum size of available hardware.
    synonyms:
      - Scale-Up
      - Vertical Pod Autoscaling (VPA)
    related:
      - Resource Limits
      - Resource Requests
  - term: Scale-to-Zero
    definition: >-
      The ability to reduce a workload's replica count to zero when there is
      no demand, eliminating idle resource consumption. When demand returns,
      the system scales back up from zero. Supported by KEDA, Knative, and
      serverless platforms.
    related:
      - Event-Driven Autoscaling
      - KEDA
      - Cold Start
  - term: Cold Start
    definition: >-
      The latency penalty incurred when a scaled-to-zero workload must spin
      up new instances to handle the first request after a period of
      inactivity.
    related:
      - Scale-to-Zero
      - Warm Pool
  - term: Event-Driven Autoscaling
    definition: >-
      Scaling decisions based on external event sources such as message queue
      depth, stream lag, or custom metrics, rather than purely on CPU or
      memory usage. KEDA is the primary Kubernetes implementation.
    related:
      - KEDA
      - Message Queue
      - Scaling Trigger
  - term: KEDA
    definition: >-
      Kubernetes Event-Driven Autoscaling. A CNCF graduated project that
      extends the Kubernetes HPA to scale workloads based on external event
      sources. Supports 70+ built-in scalers, including Kafka, SQS,
      Prometheus, and Redis.
    acronym: KEDA
    fullName: Kubernetes Event-Driven Autoscaling
    url: https://keda.sh/
    related:
      - ScaledObject
      - ScaledJob
      - TriggerAuthentication
  - term: ScaledObject
    definition: >-
      A KEDA Custom Resource Definition (CRD) that defines autoscaling
      behavior for a Kubernetes workload (Deployment, StatefulSet, etc.)
      based on one or more scaling triggers.
    related:
      - KEDA
      - Scaling Trigger
  - term: ScaledJob
    definition: >-
      A KEDA CRD for scaling Kubernetes Jobs based on event sources. Unlike
      ScaledObject, which scales long-running workloads, ScaledJob creates
      new Job instances per batch of events.
    related:
      - KEDA
      - Batch Processing
  - term: Load Balancer
    definition: >-
      A device or software component that distributes incoming network
      requests across a pool of backend servers to prevent overload on any
      single instance and to improve availability and throughput.
    related:
      - Round Robin
      - Least Connections
      - Health Check
      - Session Affinity
  - term: Round Robin
    definition: >-
      A load balancing algorithm that cycles through backend servers in
      order, sending each new request to the next server in the rotation.
      Simple, and works well when backends have similar capacity.
    related:
      - Load Balancer
      - Weighted Round Robin
  - term: Least Connections
    definition: >-
      A load balancing algorithm that routes new requests to the backend with
      the fewest active connections. Better suited to workloads with variable
      request durations.
    related:
      - Load Balancer
  - term: Session Affinity
    definition: >-
      A load balancing feature that routes all requests from a given client
      to the same backend server for the duration of a session. Also called
      sticky sessions. Useful for stateful applications.
    synonyms:
      - Sticky Sessions
    related:
      - Load Balancer
  - term: Health Check
    definition: >-
      Periodic probes sent to backend servers to verify they are alive and
      capable of handling requests. Failed health checks remove the backend
      from the load balancer pool until it recovers.
    related:
      - Liveness Probe
      - Readiness Probe
  - term: Liveness Probe
    definition: >-
      A Kubernetes mechanism that periodically checks if a container is still
      running. If the probe fails, Kubernetes restarts the container.
    related:
      - Health Check
      - Readiness Probe
  - term: Readiness Probe
    definition: >-
      A Kubernetes mechanism that checks if a container is ready to accept
      traffic. If the probe fails, the container is removed from the
      Service's endpoint list.
    related:
      - Health Check
      - Liveness Probe
  - term: Cooldown Period
    definition: >-
      A period after a scaling event during which further scale-down actions
      are suppressed. Prevents thrashing (rapid scale-up/scale-down cycles).
    synonyms:
      - Stabilization Window
    related:
      - Auto Scaling
  - term: Replication Factor
    definition: >-
      The number of copies (replicas) of a service, database partition, or
      data object maintained to ensure availability and fault tolerance.
    related:
      - Horizontal Scaling
      - Fault Tolerance
  - term: Throughput
    definition: >-
      The number of requests or operations a system can process per unit of
      time, typically measured in requests per second (RPS) or transactions
      per second (TPS).
    related:
      - Latency
      - Capacity
      - Performance
  - term: Latency
    definition: >-
      The time elapsed between sending a request and receiving the first byte
      of the response. Typically measured in milliseconds (ms) or percentile
      buckets (p50, p95, p99).
    related:
      - Throughput
      - SLA
  - term: SLA
    definition: >-
      Service Level Agreement. A commitment between provider and user on the
      availability, throughput, and latency the service will maintain, along
      with remedies if the commitments are not met.
    acronym: SLA
    fullName: Service Level Agreement
    related:
      - SLO
      - SLI
  - term: SLO
    definition: >-
      Service Level Objective. A specific, measurable characteristic of an
      SLA, such as p99 latency under 200 ms or 99.9% uptime.
    acronym: SLO
    fullName: Service Level Objective
    related:
      - SLA
      - SLI
  - term: SLI
    definition: >-
      Service Level Indicator. The actual measured metric used to determine
      whether an SLO is being met (e.g., request success rate, latency
      percentile).
    acronym: SLI
    fullName: Service Level Indicator
    related:
      - SLO
      - SLA
  - term: Horizontal Pod Autoscaler (HPA)
    definition: >-
      A Kubernetes resource that automatically scales the number of pods in a
      Deployment or ReplicaSet based on CPU utilization or custom metrics
      exposed via the Kubernetes Metrics API.
    acronym: HPA
    fullName: Horizontal Pod Autoscaler
    related:
      - KEDA
      - Vertical Pod Autoscaler
  - term: Vertical Pod Autoscaler (VPA)
    definition: >-
      A Kubernetes resource that automatically adjusts CPU and memory
      resource requests and limits for containers based on historical usage
      patterns.
    acronym: VPA
    fullName: Vertical Pod Autoscaler
    related:
      - Horizontal Pod Autoscaler
  - term: Prometheus
    definition: >-
      An open-source monitoring and alerting toolkit and CNCF graduated
      project that collects time-series metrics. Widely used as a scaling
      trigger source for KEDA and for custom-metrics HPA configurations.
    related:
      - Grafana
      - Metrics
      - Alerting
  - term: Grafana
    definition: >-
      An open-source analytics and observability platform used to visualize
      metrics, logs, and traces from multiple data sources, including
      Prometheus. Widely used in scalability engineering for dashboards and
      alerts.
    related:
      - Prometheus
      - Observability
  - term: Sharding
    definition: >-
      A database scalability technique that partitions data across multiple
      database instances (shards) based on a shard key, allowing horizontal
      scaling of databases that would otherwise be limited to a single node.
    related:
      - Database Scaling
      - Horizontal Scaling
      - Partition Key
  - term: Caching
    definition: >-
      Storing copies of frequently accessed data in faster storage (in-memory
      systems like Redis or Memcached) to reduce the load on primary data
      stores and improve response times at scale.
    related:
      - Redis
      - CDN
      - Cache Eviction
  - term: CDN
    definition: >-
      Content Delivery Network. A geographically distributed network of proxy
      servers that caches content closer to users to reduce latency and the
      load on origin servers, enabling global scalability.
    acronym: CDN
    fullName: Content Delivery Network
    related:
      - Caching
      - Edge Computing
  - term: Circuit Breaker
    definition: >-
      A resilience pattern that prevents cascading failures by detecting when
      a downstream service is failing and short-circuiting requests to it for
      a cooldown period before retrying.
    related:
      - Retry
      - Bulkhead
      - Resilience
  - term: Bulkhead
    definition: >-
      A resilience pattern that isolates components of a system into pools so
      that if one pool is exhausted or fails, it does not affect other pools.
      Analogous to watertight compartments in a ship.
    related:
      - Circuit Breaker
      - Resilience
categories:
  - name: Scaling Mechanisms
    terms:
      - Auto Scaling
      - Horizontal Scaling
      - Vertical Scaling
      - Scale-to-Zero
      - Event-Driven Autoscaling
  - name: Kubernetes Scaling
    terms:
      - KEDA
      - ScaledObject
      - ScaledJob
      - Horizontal Pod Autoscaler (HPA)
      - Vertical Pod Autoscaler (VPA)
  - name: Traffic Distribution
    terms:
      - Load Balancer
      - Round Robin
      - Least Connections
      - Session Affinity
  - name: Observability
    terms:
      - Prometheus
      - Grafana
      - Health Check
      - Liveness Probe
      - Readiness Probe
  - name: Performance Metrics
    terms:
      - Throughput
      - Latency
      - SLA
      - SLO
      - SLI
  - name: Data Scalability
    terms:
      - Sharding
      - Caching
      - CDN
      - Replication Factor
  - name: Resilience Patterns
    terms:
      - Circuit Breaker
      - Bulkhead
      - Cooldown Period