specificationVersion: "1.0"
id: seldon-finops
name: Seldon FinOps Framework
description: >
  Seldon is a self-hosted MLOps platform running on Kubernetes, so the primary
  FinOps considerations relate to the underlying cloud infrastructure costs
  (compute, storage, networking) rather than SaaS subscription fees.
  Organizations pay Seldon directly only for commercial Enterprise Platform
  licenses. All other costs flow through the cloud provider (AWS, GCP, Azure,
  or on-premises). FOCUS-aligned cost tracking applies to the Kubernetes
  workloads that run Seldon components and the ML models they serve.
provider: seldon
url: https://www.seldon.io
focus:
  schemaVersion: "1.0"
  currency: USD
costCategories:
  - id: seldon-enterprise-license
    name: Seldon Enterprise Platform License
    description: >
      Annual commercial license fee paid to Seldon Technologies for the
      Enterprise Platform (formerly Seldon Deploy), including support SLA.
      Pricing is custom and negotiated based on number of models and
      organizational scale. Covers Seldon Core+, management API, RBAC, audit
      logging, and enterprise support.
    chargeType: Purchase
    billingPeriod: Annual
    chargeFrequency: Recurring
    pricingModel: Custom / Negotiated
    estimatedRange:
      low: 50000
      high: 500000
      currency: USD
      basis: Annual enterprise license; varies by model count and support tier
    optimizationOpportunities:
      - Negotiate multi-year contracts for better rates
      - Right-size model count entitlements to actual production usage
      - Consolidate model deployments to reduce per-model licensing exposure
  - id: kubernetes-compute
    name: Kubernetes Compute (CPU/GPU nodes)
    description: >
      The dominant cost driver for Seldon deployments. Seldon model servers,
      MLServer instances, and inference pods consume CPU and GPU node
      resources. GPU nodes (e.g., NVIDIA A100, V100, T4) for deep learning
      inference are significantly more expensive than CPU nodes. Costs scale
      with the number of replicas, instance types, and utilization.
    chargeType: Usage
    billingPeriod: Hourly
    chargeFrequency: Recurring
    pricingModel: Pay-as-you-go (cloud provider)
    optimizationOpportunities:
      - Use Kubernetes HPA to scale inference pods down during low-traffic periods
      - Leverage spot/preemptible instances for non-critical inference workloads
      - Use MLServer adaptive batching to maximize GPU utilization per node
      - Right-size pod resource requests/limits to avoid over-provisioning
      - Use multi-model serving to pack multiple models onto shared inference pods
  - id: kubernetes-storage
    name: Kubernetes Persistent Storage (Model Artifacts)
    description: >
      Model artifacts, training data, and drift detection reference datasets
      must be stored in persistent volumes or object storage (S3, GCS, Azure
      Blob) accessible to Seldon pods. Costs depend on model sizes and the
      number of model versions retained.
    chargeType: Usage
    billingPeriod: Monthly
    chargeFrequency: Recurring
    pricingModel: Pay-as-you-go (cloud provider)
    optimizationOpportunities:
      - Implement model artifact lifecycle policies to delete old versions
      - Use storage tiers (e.g., S3 Infrequent Access) for archived model versions
      - Compress model artifacts where framework permits
  - id: kubernetes-networking
    name: Kubernetes Networking (Ingress/Egress)
    description: >
      Inference requests entering Kubernetes clusters via ingress controllers
      (Ambassador, Istio, Traefik) and any egress for model artifact downloads
      or external monitoring integrations incur cloud networking costs.
      Multi-region or cross-zone traffic can add significant cost at scale.
    chargeType: Usage
    billingPeriod: Monthly
    chargeFrequency: Recurring
    pricingModel: Pay-as-you-go (cloud provider)
    optimizationOpportunities:
      - Co-locate inference clients in the same availability zone as Seldon clusters
      - Use private endpoints to avoid public egress charges for internal consumers
      - Cache model artifacts within cluster storage to minimize cross-region pulls
  - id: monitoring-observability
    name: Monitoring and Observability Infrastructure
    description: >
      Seldon integrates with Prometheus for metrics scraping and Grafana for
      dashboards. The Alibi-Detect drift detection component generates
      monitoring data that may be stored in Elasticsearch or other backends.
      These observability stacks have their own infrastructure cost footprint.
    chargeType: Usage
    billingPeriod: Monthly
    chargeFrequency: Recurring
    pricingModel: Pay-as-you-go (cloud provider or SaaS observability tool)
    optimizationOpportunities:
      - Set appropriate Prometheus retention periods to control storage costs
      - Sample drift detection events rather than checking every request at scale
      - Use cloud-native managed monitoring (Amazon Managed Prometheus, Google Cloud Monitoring) to reduce operational overhead
optimizationSummary: >
  For Seldon deployments, the largest FinOps lever is Kubernetes compute cost,
  especially GPU utilization for deep learning inference. Teams should
  instrument cost allocation using Kubernetes namespace labels to map
  inference costs to business units or ML products. MLServer's multi-model
  serving and adaptive batching are the primary efficiency mechanisms to
  maximize hardware utilization before scaling out. Spot/preemptible instances
  with appropriate fallback policies are recommended for batch inference
  workloads.
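# Illustrative sketch only, not part of this specification: the namespace-label
# cost allocation mentioned in optimizationSummary could look roughly like the
# Kubernetes manifest below. The namespace name and the label keys and values
# (cost-center, ml-product, environment) are hypothetical; substitute whatever
# keys your cost-allocation tooling expects.
#
#   apiVersion: v1
#   kind: Namespace
#   metadata:
#     name: seldon-fraud-models        # hypothetical namespace
#     labels:
#       cost-center: risk-analytics    # hypothetical business unit
#       ml-product: fraud-scoring      # hypothetical ML product
#       environment: production        # hypothetical environment tag
#
# Grouping inference workloads by namespace and labeling each namespace this
# way is one option for attributing compute, storage, and networking spend to
# a business unit or ML product in cost reports.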