specification: FinOps Framework
specificationVersion: '1.0'
provider: smolagents
providerId: smolagents
name: smolagents FinOps
description: >
  smolagents is a free open-source library; FinOps considerations apply to the
  Hugging Face Inference Provider costs incurred when running agents against
  cloud-hosted LLMs. Hugging Face uses a transparent pass-through billing model
  with no markup over provider rates. Costs are metered per inference request
  based on compute time and hardware cost. All tiers receive monthly included
  credits; pay-as-you-go credits can be purchased after credits are exhausted.
created: '2026-06-12'
modified: '2026-06-12'
billingModel:
  type: metered
  description: >
    Pay-per-inference-request based on compute time multiplied by hardware
    price. Hugging Face passes through provider costs at no markup. Monthly
    free credits ($0.10 free / $2.00 PRO+Team+Enterprise per seat) offset
    initial usage. Beyond credits, users purchase additional credit bundles.
  granularity: per-request
  currency: USD
  invoicingCadence: monthly
  freeCredits:
    freeUser: 0.10
    proUser: 2.00
    teamPerSeat: 2.00
    enterprisePerSeat: 2.00
  computeExample:
    description: 10-second GPU request at $0.00012/second = $0.0012 per request
    hardwareCostPerSecond: 0.00012
    exampleDurationSeconds: 10
    exampleCost: 0.0012
focusColumns:
  - ChargeCategory: Usage
  - ChargeClass: Regular
  - ServiceName: Hugging Face Inference Providers
  - ServiceCategory: AI and Machine Learning
  - ChargeDescription: LLM inference compute time
  - Region: Provider-dependent (Cerebras, Together AI, SambaNova, etc.)
  - PricingUnit: compute-second
  - ListUnitPrice: variable-by-provider-and-hardware
  - BilledCurrency: USD
  - ResourceType: GPU/CPU inference hardware
  - SkuId: model-id + provider combination
meters:
  - name: Inference Compute
    description: >
      Metered cost per inference request computed as hardware-cost-per-second
      multiplied by request duration in seconds. Reported per model and
      provider in the Hugging Face Inference Providers usage dashboard.
    unit: compute-second
    billingDimension: duration × hardware-price
    monitoringUrl: https://huggingface.co/settings/inference-providers/overview
  - name: Hub Storage
    description: >
      Optional storage costs for models, datasets, and Spaces hosted on the
      Hugging Face Hub. Public repos: $8-12/TB/month; private repos:
      $12-18/TB/month with volume discounts.
    unit: TB/month
    tiers:
      - range: any
        publicPrice: 8.00
        privatePrice: 12.00
        currency: USD
        note: Volume discounts up to 33% at 500TB+
principles:
  - name: Visibility
    description: >
      Track inference spend per model and provider via the Hugging Face
      Inference Providers usage dashboard. Organizations on Team and Enterprise
      plans can view spend broken down by user and set spending limits from the
      organization billing page. Monthly credit usage is visible on the
      personal billing page at huggingface.co/settings/billing.
    actions:
      - Monitor usage at https://huggingface.co/settings/inference-providers/overview
      - Set organization spending limits in Team/Enterprise billing settings
      - Use X-HF-Bill-To header to centralize org billing across members
      - Review per-model, per-provider cost breakdown monthly
  - name: Optimization
    description: >
      Reduce inference costs by using smaller quantized models, running local
      inference with Transformers or Ollama (zero cloud cost), batching agent
      tasks to minimize LLM round-trips, using cached results where possible,
      and choosing cost-effective providers for non-latency-sensitive
      workloads.
    actions:
      - Prefer local TransformersModel or MLXModel for dev/test to avoid cloud charges
      - Use smaller quantized models (e.g., 4-bit) for tasks where quality trade-off is acceptable
      - Limit agent max_steps to control maximum number of LLM calls per run
      - Choose providers by cost using HF Inference Providers pricing comparison
      - Bring custom provider API keys to bypass HF routing where better provider pricing applies
      - Set HF_TOKEN to use authenticated (higher free limit) tier and defer credit usage
  - name: Accountability
    description: >
      Assign costs to teams by using organization billing (X-HF-Bill-To
      header). Enterprise administrators can disable specific Inference
      Providers and set per-organization spending caps to enforce budget
      controls.
    actions:
      - Use bill_to parameter in InferenceClient to route charges to specific orgs
      - Enable administrator spending limits in Enterprise organization settings
      - Disable high-cost Inference Providers not needed by the team
      - Use SCIM provisioning (Enterprise) to manage user access and prevent unauthorized spend
references:
  - url: https://huggingface.co/docs/api-inference/en/pricing
    description: Hugging Face Inference Providers pricing documentation
  - url: https://huggingface.co/pricing
    description: Hugging Face account plan pricing
  - url: https://huggingface.co/settings/billing
    description: Personal billing and usage dashboard
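
# ---------------------------------------------------------------------------
# Worked example (illustrative sketch only; not part of the specification).
# It applies the billing rule stated above, cost = hardware price per second
# multiplied by request duration, to the figures already given in
# computeExample and freeCredits. The key names below (workedExample,
# assumedMaxSteps, and so on) are hypothetical and are kept commented out so
# that they do not alter the schema. The step count of 20 and the premise of
# roughly one 10-second LLM request per agent step are assumptions chosen for
# illustration, not measured values.
#
# workedExample:
#   costPerRequest: 0.0012          # 10 s x $0.00012/s
#   requestsCoveredByFreeUser: 83   # floor(0.10 / 0.0012)
#   requestsCoveredByProSeat: 1666  # floor(2.00 / 0.0012)
#   assumedMaxSteps: 20             # hypothetical agent max_steps setting
#   approxWorstCasePerRun: 0.024    # 20 steps x $0.0012, one request per step
#
# Under these assumptions, capping max_steps bounds a single agent run at
# about $0.024, so the $2.00 monthly credit would cover on the order of 80
# worst-case runs. Real costs vary by provider, hardware, and request
# duration.
# ---------------------------------------------------------------------------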