specificationVersion: "0.1"
id: seldon-rate-limits
name: Seldon Rate Limits
description: >
  Seldon Core and the Seldon Enterprise Platform are self-hosted, Kubernetes-native platforms,
  so Seldon itself enforces no API rate limits. Throughput and concurrency are governed by the
  infrastructure resources allocated to the deployment (CPU, memory, GPU) and by the
  configuration of the Kubernetes ingress controller (e.g., Ambassador, Istio, or Traefik).
  Organizations configure their own rate limiting at the ingress layer. Throughput of the
  Open Inference Protocol endpoints (REST and gRPC) is bounded by pod replica scaling and by
  the resource quotas set by the cluster administrator.
provider: seldon
url: https://docs.seldon.ai/seldon-core-2

rateLimits:

  - scope: Inference Endpoints (REST/gRPC)
    description: >
      Seldon inference endpoints do not have hard-coded platform rate limits. Throughput is
      bounded by Kubernetes pod resources and Horizontal Pod Autoscaler (HPA) configuration.
      Administrators should configure ingress-level rate limiting (e.g., an Ambassador rate
      limit service or an Istio EnvoyFilter) to protect against overload. Adaptive batching
      in MLServer can improve throughput under high concurrency.
    enforced: false
    configurable: true
    configurationLayer: Kubernetes Ingress / Ambassador / Istio
    references:
      - https://docs.seldon.ai/seldon-core-2/user-guide/inference/inference
      - https://docs.seldon.ai/seldon-core-2
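    # Illustrative sketch only (not from the Seldon docs): ingress-level rate limiting with a
    # Traefik Middleware, one of the ingress options named in the description. Names and
    # numbers are hypothetical; the apiVersion assumes a recent Traefik release (older
    # releases use traefik.containo.us/v1alpha1), and the Middleware must still be attached
    # to the route that fronts the inference endpoint.
    #
    #   apiVersion: traefik.io/v1alpha1
    #   kind: Middleware
    #   metadata:
    #     name: inference-ratelimit   # hypothetical
    #     namespace: seldon-mesh      # hypothetical
    #   spec:
    #     rateLimit:
    #       average: 100   # sustained requests per second allowed per source
    #       burst: 50      # extra requests tolerated in a short spike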

  - scope: Enterprise Platform Management API
    description: >
      Seldon does not publish specific rate limit thresholds for the Seldon Enterprise
      Platform REST API ($ML_PLATFORM_HOST/seldon-deploy/swagger/). Access is authenticated
      via OIDC/OAuth2 and session tokens. Organizations should apply ingress-level or API
      gateway rate policies appropriate to their operational scale.
    enforced: false
    configurable: true
    configurationLayer: Kubernetes Ingress / API Gateway
    references:
      - https://deploy.seldon.io/en/v2.3/contents/product-tour/api/index.html

  - scope: Kubernetes Resource Quotas
    description: >
      Seldon workloads run inside Kubernetes namespaces, where administrators can apply
      ResourceQuota objects to cap aggregate CPU, memory, and GPU consumption. Scaling
      behavior is managed through Kubernetes HPA policies and is fully controlled by the
      platform administrator. Seldon does not impose additional limits beyond what
      Kubernetes enforces.
    enforced: true
    configurable: true
    configurationLayer: Kubernetes Namespace ResourceQuota
    references:
      - https://github.com/SeldonIO/seldon-core
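    # Illustrative sketch only (not from the Seldon docs): a namespace-scoped ResourceQuota
    # such as an administrator might apply to cap Seldon workloads. The name, namespace, and
    # all quantities are hypothetical placeholders; the GPU key assumes the NVIDIA device
    # plugin exposes the extended resource nvidia.com/gpu.
    #
    #   apiVersion: v1
    #   kind: ResourceQuota
    #   metadata:
    #     name: seldon-inference-quota   # hypothetical
    #     namespace: seldon-mesh         # hypothetical
    #   spec:
    #     hard:
    #       requests.cpu: "16"
    #       requests.memory: 64Gi
    #       limits.cpu: "32"
    #       limits.memory: 128Gi
    #       requests.nvidia.com/gpu: "2"
    #       pods: "40"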

notes: >
  Because Seldon is a self-hosted, infrastructure-agnostic platform, rate limits are an
  operational responsibility of the deploying organization rather than a constraint imposed
  by Seldon, as they would be with a SaaS vendor. Teams should size their Kubernetes
  clusters and configure autoscaling policies to meet their inference throughput SLOs.
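
# Illustrative sizing arithmetic (all figures hypothetical, not measured Seldon numbers):
#   target throughput SLO            = 500 requests/s
#   sustained throughput per replica = 60 requests/s (taken from your own load test)
#   minimum replicas                 = ceil(500 / 60) = 9
# Setting the autoscaler's maximum above this minimum leaves headroom for traffic spikes.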