aid: bentoml
name: BentoML Vocabulary
description: Domain vocabulary for BentoML and BentoCloud API concepts
url: https://raw.githubusercontent.com/api-evangelist/bentoml/refs/heads/main/vocabulary/bentoml-vocabulary.yml
terms:
  - term: Bento
    definition: A packaged, versioned artifact containing an ML model, its dependencies, and the service code that serves it, analogous to a deployable container image for ML inference.
  - term: BentoCloud
    definition: BentoML's managed cloud platform for deploying, scaling, and operating Bento inference services, with GPU support and scale-to-zero.
  - term: Deployment
    definition: A running instance of a Bento on BentoCloud infrastructure, with autoscaling, resource allocation, and lifecycle management.
  - term: Service
    definition: A BentoML Python class decorated with @bentoml.service that exposes model inference logic as HTTP endpoints.
  - term: Runner
    definition: A resource-isolated unit within a BentoML service that encapsulates model loading and inference execution; part of the legacy (pre-1.2) BentoML service API.
  - term: Model Store
    definition: A local or cloud-managed registry for the versioned ML model artifacts used by BentoML services.
  - term: Yatai
    definition: The open-source BentoML component that serves as the BentoCloud control-plane API server, providing the REST API for managing Bentos, models, and deployments.
  - term: Cluster
    definition: A Kubernetes compute cluster registered with BentoCloud for running inference deployments.
  - term: APIToken
    definition: An authentication credential for accessing BentoCloud APIs, with configurable scopes for organization and cluster access.
  - term: BentoRepository
    definition: A versioned collection of Bento artifacts within BentoCloud, analogous to a container image repository.
  - term: ModelRepository
    definition: A versioned collection of ML model artifacts registered in BentoCloud.
  - term: DeploymentRevision
    definition: An immutable snapshot of a deployment configuration, enabling rollback and auditing of changes over time.
  - term: Endpoint
    definition: An HTTP route exposed by a deployed BentoML service, corresponding to a service method decorated with @bentoml.api.
  - term: InstanceType
    definition: A hardware profile (CPU, memory, GPU) available for running BentoCloud deployments.
  - term: ScaleToZero
    definition: A BentoCloud feature that scales idle deployments down to zero replicas to reduce cost and scales them back up when new requests arrive.
  - term: Autoscaling
    definition: Automatic horizontal scaling of deployment replicas based on request traffic and configured thresholds.
  - term: Organization
    definition: A BentoCloud tenant account that owns clusters, deployments, tokens, and billing.
  - term: Secret
    definition: An encrypted key-value credential stored in BentoCloud and injected into deployments at runtime.
  - term: LimitGroup
    definition: A policy object defining resource quota limits applied to an organization or cluster.
  - term: GPUConfig
    definition: A configuration specifying GPU type, memory, and count for a deployment or instance type.
  - term: AdaptiveBatching
    definition: A BentoML feature that automatically groups concurrent inference requests into batches to improve throughput.
  - term: HostCluster
    definition: The BentoCloud-managed Kubernetes cluster that hosts tenant deployments.