Cluster Overview
Cluster Name: aks-181225-test-uks
Cluster Health Score
Score: 81 / 100
This score is calculated from key checks across nodes, workloads, security, and configuration best practices. A higher score means fewer issues and better adherence to Kubernetes standards.
API Server Health
latency (p99): 5.00 ms
Liveness: livez check passed
[+]ping ok [+]log ok [+]etcd ok [+]poststarthook/start-apiserver-admission-initializer ok [+]poststarthook/generic-apiserver-start-informers ok [+]poststarthook/priority-and-fairness-config-consumer ok [+]poststarthook/priority-and-fairness-filter ok [+]poststarthook/storage-object-count-tracker-hook ok [+]poststarthook/start-apiextensions-informers ok [+]poststarthook/start-apiextensions-controllers ok [+]poststarthook/crd-informer-synced ok [+]poststarthook/start-system-namespaces-controller ok [+]poststarthook/start-cluster-authentication-info-controller ok [+]poststarthook/start-kube-apiserver-identity-lease-controller ok [+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok [+]poststarthook/start-legacy-token-tracking-controller ok [+]poststarthook/start-service-ip-repair-controllers ok [+]poststarthook/rbac/bootstrap-roles ok [+]poststarthook/scheduling/bootstrap-system-priority-classes ok [+]poststarthook/priority-and-fairness-config-producer ok [+]poststarthook/bootstrap-controller ok [+]poststarthook/start-kubernetes-service-cidr-controller ok [+]poststarthook/aggregator-reload-proxy-client-cert ok [+]poststarthook/start-kube-aggregator-informers ok [+]poststarthook/apiservice-status-local-available-controller ok [+]poststarthook/apiservice-status-remote-available-controller ok [+]poststarthook/apiservice-registration-controller ok [+]poststarthook/apiservice-discovery-controller ok [+]poststarthook/kube-apiserver-autoregistration ok [+]autoregister-completion ok [+]poststarthook/apiservice-openapi-controller ok [+]poststarthook/apiservice-openapiv3-controller ok livez check passed
Readiness: readyz check passed
[+]ping ok [+]log ok [+]etcd ok [+]etcd-readiness ok [+]informer-sync ok [+]poststarthook/start-apiserver-admission-initializer ok [+]poststarthook/generic-apiserver-start-informers ok [+]poststarthook/priority-and-fairness-config-consumer ok [+]poststarthook/priority-and-fairness-filter ok [+]poststarthook/storage-object-count-tracker-hook ok [+]poststarthook/start-apiextensions-informers ok [+]poststarthook/start-apiextensions-controllers ok [+]poststarthook/crd-informer-synced ok [+]poststarthook/start-system-namespaces-controller ok [+]poststarthook/start-cluster-authentication-info-controller ok [+]poststarthook/start-kube-apiserver-identity-lease-controller ok [+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok [+]poststarthook/start-legacy-token-tracking-controller ok [+]poststarthook/start-service-ip-repair-controllers ok [+]poststarthook/rbac/bootstrap-roles ok [+]poststarthook/scheduling/bootstrap-system-priority-classes ok [+]poststarthook/priority-and-fairness-config-producer ok [+]poststarthook/bootstrap-controller ok [+]poststarthook/start-kubernetes-service-cidr-controller ok [+]poststarthook/aggregator-reload-proxy-client-cert ok [+]poststarthook/start-kube-aggregator-informers ok [+]poststarthook/apiservice-status-local-available-controller ok [+]poststarthook/apiservice-status-remote-available-controller ok [+]poststarthook/apiservice-registration-controller ok [+]poststarthook/apiservice-discovery-controller ok [+]poststarthook/kube-apiserver-autoregistration ok [+]autoregister-completion ok [+]poststarthook/apiservice-openapi-controller ok [+]poststarthook/apiservice-openapiv3-controller ok [+]shutdown ok readyz check passed
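The verbose liveness and readiness output above is easier to scan once it is split into individual checks. The following is a minimal sketch, assuming each entry has the form `[+]name ok` and that a failing entry would be marked `[-]`; the function name `parse_health_checks` is hypothetical and is not part of the report tooling.

```python
import re


def parse_health_checks(raw: str) -> dict[str, bool]:
    """Map each check name to True when marked [+], False when marked [-]."""
    results = {}
    # Each entry looks like "[+]name ok"; a failing entry is assumed to start with "[-]".
    for sign, name in re.findall(r"\[([+-])\](\S+)", raw):
        results[name] = sign == "+"
    return results


# Shortened sample taken from the readyz output above.
sample = "[+]ping ok [+]log ok [+]etcd ok [+]etcd-readiness ok readyz check passed"
checks = parse_health_checks(sample)
failed = [name for name, ok in checks.items() if not ok]
print(f"{len(checks)} checks, {len(failed)} failed")
```

Listing only the failed names keeps attention on the entries that need action instead of the full wall of `ok` lines.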
Passed / Failed Checks
This shows the number of health checks that passed out of the total checks performed across the cluster. A higher pass rate indicates better overall cluster health.
Top 5 Improvements
These are the five checks whose remediation will yield the most immediate benefit to your overall Cluster Health Score. Each card shows the cluster score points you’ll recover by fixing it.
RBAC Overexposure
Pods Running as Root
Missing Readiness and Liveness Probes
Deployment Missing Replicas
Pods Missing Secure Defaults
Issue Summary
This section shows how many checks have failed at each severity level over the last run. Click on a card below to expand and review those checks.
7 checks failed
18 checks failed
3 checks failed
Rightsizing at a Glance
Excluded Namespaces These namespaces are excluded from analysis and reporting.
aks-istio-system • calico-system • coredns • gatekeeper-system • kube-flannel • kube-node-lease • kube-public • kube-system • local-path-storage • tigera-operator
Cluster Summary
Cluster Name: aks-181225-test-uks
Kubernetes Version: v1.33.6
Cluster Metrics Summary Summary of metrics including node and pod counts, warnings, and issues.
| 🚀 Nodes: 3 | 🟩 Healthy: 3 | 🟥 Issues: 0 |
| 📦 Pods: 67 | 🟩 Running: 63 | 🟥 Failed: 0 |
| 🔄 Restarts: 22 | 🟨 Warnings: 5 | 🟥 Critical: 0 |
| ⏳ Pending Pods: 4 | 🟡 Waiting: 4 | |
| ⚠️ Stuck Pods: 4 | ❌ Stuck: 4 | |
| 📉 Job Failures: 0 | 🔴 Failed: 0 | |
Pod Distribution Average, min, and max pods per node and total node count.
| Avg: 21.0 | Max: 42 | Min: 10 | Total Nodes: 3 |
Cluster Health Metrics (Last 24h) 24-hour Prometheus averages and charts for cluster CPU and memory usage.
Avg CPU: 5.23%
Avg Memory: 19.58%
Cluster CPU Usage (%)
Historical CPU metrics from Prometheus, averaged over the last 24 hours.
Cluster Memory Usage (%)
Historical memory metrics from Prometheus, averaged over the last 24 hours.
Cluster Events
Errors: 4
Warnings: 4
Node Conditions & Resources
NODE001 - Node Readiness and Conditions Detects nodes that are not in Ready state or reporting other warning conditions.
✅ All Nodes are healthy.
Show Findings
| Node | Status | Issues |
|---|---|---|
| aks-systempool-39088964-vmss00000k | ✅ Healthy | None |
| aks-systempool-39088964-vmss00000l | ✅ Healthy | None |
| aks-systempool-39088964-vmss00000m | ✅ Healthy | None |
NODE002 - Node Resource Pressure (Last 24h) Detects nodes under high CPU, memory, or disk pressure.
Data source: Prometheus (24h average)
✅ All Nodes are healthy.
Show Findings
| Node | CPU Status | CPU % | CPU Used | CPU Total | Mem Status | Mem % | Mem Used | Mem Total | Disk % | Disk Status |
|---|---|---|---|---|---|---|---|---|---|---|
| aks-systempool-39088964-vmss00000k | ✅ Normal | 7.80% | 301 mC | 3860 mC | ✅ Normal | 32.9% | 4888 Mi | 14846 Mi | 21.55% | ✅ Normal |
| aks-systempool-39088964-vmss00000l | ✅ Normal | 3.64% | 140 mC | 3860 mC | ✅ Normal | 12.7% | 1887 Mi | 14850 Mi | 20.53% | ✅ Normal |
| aks-systempool-39088964-vmss00000m | ✅ Normal | 4.26% | 164 mC | 3860 mC | ✅ Normal | 13.1% | 1943 Mi | 14846 Mi | 20.49% | ✅ Normal |
NODE003 - Max Pods per Node Alerts when any node is running too many pods according to configured thresholds.
⚠️ Total Nodes with Issues: 1
Show Recommendations
- Run `kubectl get pods -o wide --all-namespaces` and group by `.spec.nodeName` to see pod distribution.
- Use `kubectl describe node <node-name>` to inspect allocatable pods and taints.
- Consider tuning the kubelet's `--max-pods` flag if you need higher density.
- Scale out your node pool or add additional nodes to balance the load.
- Docs: Kubernetes Nodes
Show Findings
| Node | PodCount | Capacity | Percentage | Threshold | Status |
|---|---|---|---|---|---|
| aks-systempool-39088964-vmss00000k | 42 | 50 | 84.00% | 80% | Warning |
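The Percentage and Status columns in the finding above follow from simple arithmetic on the pod count and capacity. Here is a small sketch of that calculation, assuming the warning fires when usage is at or above the threshold; the exact comparison the tool uses is not stated in the report, and the `"OK"` label is a placeholder.

```python
def pod_density(pod_count: int, capacity: int, threshold_pct: float = 80.0):
    """Return (percentage, status) for a node's pod usage against a threshold."""
    pct = 100.0 * pod_count / capacity
    # Assumption: "at or above" the threshold counts as a warning.
    status = "Warning" if pct >= threshold_pct else "OK"
    return pct, status


# Values from the NODE003 finding: 42 pods on a node with capacity 50.
pct, status = pod_density(42, 50)
print(f"{pct:.2f}% -> {status}")  # 84.00% -> Warning
```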
PROM005 - Overcommitted CPU (Prometheus) Checks if CPU requests on nodes exceed allocatable capacity over the last 24 hours.
✅ All Nodes are healthy.
PROM006 - Node Sizing Insights (Prometheus) Uses Prometheus p95 CPU and memory usage over a fixed 7-day window to highlight underutilized or saturated nodes and suggest sizing actions.
✅ All Nodes are healthy.
📅 Insufficient Prometheus history for sizing. Required: 7 days, available: 5.75 days.
Show Recommendations
Node Sizing Guidance
- Focus on sustained p95 trends, not short spikes.
- Sizing window is fixed to 7 days for stable, lower-cost query execution.
- Nodes flagged as underutilized are candidates for smaller SKUs or scale-in.
- Nodes flagged as saturated likely need larger SKUs, scale-out, or workload rebalancing.
- Validate with workload requests/limits and HPA/VPA behavior before applying changes.
Show Findings
| Status | Required Days | Available Days | Message |
|---|---|---|---|
| Insufficient Prometheus history | 7 | 5.8 | Node sizing recommendations are withheld until at least 7 days of Prometheus history is available. |
Node: aks-systempool-39088964-vmss00000k (CPU: 7.80%, Mem: 32.93%, Disk: 21.55%)
OS: Microsoft Azure Linux 3.0
Kernel: 6.6.126.1-1.azl3
Kubelet: v1.33.6
Runtime: containerd://2.0.0
CPU: 7.80%
Memory: 32.93%
Disk: 21.55%
CPU Usage (%)
Memory Usage (%)
Disk Usage (%)
Node: aks-systempool-39088964-vmss00000l (CPU: 3.64%, Mem: 12.71%, Disk: 20.53%)
OS: Microsoft Azure Linux 3.0
Kernel: 6.6.126.1-1.azl3
Kubelet: v1.33.6
Runtime: containerd://2.0.0
CPU: 3.64%
Memory: 12.71%
Disk: 20.53%
CPU Usage (%)
Memory Usage (%)
Disk Usage (%)
Node: aks-systempool-39088964-vmss00000m (CPU: 4.26%, Mem: 13.09%, Disk: 20.49%)
OS: Microsoft Azure Linux 3.0
Kernel: 6.6.126.1-1.azl3
Kubelet: v1.33.6
Runtime: containerd://2.0.0
CPU: 4.26%
Memory: 13.09%
Disk: 20.49%
CPU Usage (%)
Memory Usage (%)
Disk Usage (%)
Namespaces
NS001 - Empty Namespaces Finds namespaces with no running pods.
⚠️ Total Namespaces with Issues: 1
Show Recommendations
- Check if any other resources (PVCs, Secrets) exist before deleting.
- Use `kubectl get all -n <namespace>` to inspect.
- Clean up empty namespaces to reduce clutter.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| default | namespace/default | ⚠️ Partial | No pods, but other resources exist |
NS002 - Missing or Weak ResourceQuotas Detects namespaces with missing or incomplete ResourceQuota definitions.
⚠️ Total ResourceQuotas with Issues: 2
Show Recommendations
- Define limits using `ResourceQuota` for pods, memory, and CPU.
- Helps avoid over-provisioning and noisy neighbor issues.
- Review quotas using `kubectl describe quota -n <namespace>`.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | namespace/azure-store | ❌ No ResourceQuota | |
| default | namespace/default | ❌ No ResourceQuota | |
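A ResourceQuota for one of the flagged namespaces could be generated as JSON, which `kubectl apply -f -` accepts in the same way as YAML. This is a sketch only: the quota name and every number below are placeholders, not values recommended by the report.

```python
import json


def resource_quota(namespace: str, pods: int, cpu: str, memory: str) -> dict:
    """Build a ResourceQuota manifest capping pod count, CPU, and memory."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # The object name is hypothetical; choose one that fits your conventions.
        "metadata": {"name": "baseline-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "pods": str(pods),
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.memory": memory,
            }
        },
    }


# Placeholder sizes for illustration; derive real values from observed usage.
manifest = resource_quota("azure-store", pods=20, cpu="4", memory="8Gi")
print(json.dumps(manifest, indent=2))
```

Once a quota constrains requests or limits for a resource, pods that omit that value are rejected unless a LimitRange supplies defaults, so a quota is usually introduced together with a LimitRange.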
NS003 - Missing LimitRanges Detects namespaces without a defined LimitRange.
⚠️ Total LimitRanges with Issues: 2
Show Recommendations
- LimitRanges define default and max values for CPU/memory.
- Prevents pods from using unlimited resources.
- Create a LimitRange from a manifest with `kubectl apply -f`, and inspect existing ones with `kubectl describe limitrange -n <namespace>`.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | namespace/azure-store | ❌ No LimitRange | |
| default | namespace/default | ❌ No LimitRange | |
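A LimitRange that supplies container defaults and a ceiling might look like the sketch below, emitted as JSON for `kubectl apply -f -`. The object name and all quantities are illustrative assumptions, not values taken from this cluster.

```python
import json


def limit_range(namespace: str) -> dict:
    """Build a LimitRange with default requests, default limits, and a maximum."""
    return {
        "apiVersion": "v1",
        "kind": "LimitRange",
        # Hypothetical name and sizes; tune them per namespace.
        "metadata": {"name": "container-defaults", "namespace": namespace},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},  # applied when requests are omitted
                    "default": {"cpu": "500m", "memory": "256Mi"},  # applied when limits are omitted
                    "max": {"cpu": "1", "memory": "1Gi"},  # upper bound per container
                }
            ]
        },
    }


manifest = limit_range("azure-store")
print(json.dumps(manifest, indent=2))
```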
NS004 - Pods in Default Namespace Flags pods running in the default namespace.
✅ All Pods are healthy.
Show Recommendations
- Use `kubectl get pods -n default` to list them.
- Re-deploy your workloads into a custom namespace: `kubectl create namespace my-app`, then `kubectl -n my-app apply -f your-manifests.yaml`.
- Docs: Reference
Workloads
WRK001 - DaemonSets Not Fully Running Detects DaemonSets that have fewer ready pods than desired.
✅ All Workloads are healthy.
Show Recommendations
- Run `kubectl describe ds <name> -n <namespace>` to check for scheduling issues.
- Check node taints and conditions.
- Ensure resource requests are not too high for nodes.
- Docs: Reference
WRK002 - Deployment Missing Replicas Detects Deployments where available replicas are less than desired.
⚠️ Total Workloads with Issues: 4
Show Recommendations
- Run `kubectl describe deployment <name> -n <namespace>` to view status.
- Check for failed pods using `kubectl get pods -n <namespace>`.
- Review rollout and events for delays or crashes.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | deployment/order-service | 0/1 | Deployment has fewer available replicas than desired. |
| azure-store | deployment/product-service | 0/1 | Deployment has fewer available replicas than desired. |
| azure-store | deployment/rabbitmq | 0/1 | Deployment has fewer available replicas than desired. |
| azure-store | deployment/store-front | 0/1 | Deployment has fewer available replicas than desired. |
WRK003 - StatefulSet Incomplete Rollout Detects StatefulSets with fewer ready replicas than desired.
✅ All Workloads are healthy.
Show Recommendations
- Run `kubectl describe sts <name> -n <namespace>` to view rollout and events.
- Check pod logs and PersistentVolumeClaim bindings.
- Confirm storage class availability and node scheduling constraints.
- Docs: Reference
WRK004 - HPA Misconfiguration or Inactivity Checks for HPAs with missing targets, metrics, or inactive scaling.
✅ All Workloads are healthy.
Show Recommendations
- Check if the target workload exists using `kubectl get deploy,sts -n <namespace>`.
- Use `kubectl describe hpa <name> -n <namespace>` to inspect HPA status and events.
- Ensure metrics-server is running and the target exposes the required metrics.
- Docs: Reference
WRK005 - Missing Resource Requests Checks that every container has explicit CPU and memory requests.
✅ All Workloads are healthy.
Show Recommendations
- Add `resources.requests.cpu` and `resources.requests.memory` to every container.
- Review both workload and `initContainers` with `kubectl get deploy,statefulset,daemonset -A -o yaml`.
- Apply any missing fields, then rerun KubeBuddy to confirm.
- Docs: Reference
WRK006 - PDB Coverage and Effectiveness Detects missing or weak PodDisruptionBudgets.
✅ All Workloads are healthy.
Show Recommendations
- Set `minAvailable` to a safe minimum (not 0).
- Avoid setting `maxUnavailable` to `1` or `100%`.
- Make sure PDBs match actual workloads via label selectors.
- Docs: Reference
WRK007 - Missing Readiness and Liveness Probes Detects containers without readiness or liveness probes.
⚠️ Total Workloads with Issues: 4
Show Recommendations
- Readiness probes indicate when a container is ready to receive traffic.
- Liveness probes detect if a container is stuck or dead.
- Use `httpGet`, `tcpSocket`, or `exec` probes for most apps.
- Docs: Health probes in Kubernetes
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | deployment/order-service | order-service | readiness, liveness missing |
| azure-store | deployment/product-service | product-service | readiness, liveness missing |
| azure-store | deployment/rabbitmq | rabbitmq | readiness, liveness missing |
| azure-store | deployment/store-front | store-front | readiness, liveness missing |
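For the Deployments listed above, probes could be added with a strategic merge patch that targets the container by name. The sketch below builds such a patch for `order-service`; the `/health` path, port `3000`, and the timing values are assumptions for illustration and must be replaced with whatever the application actually exposes.

```python
import json


def http_probe(path: str, port: int, initial_delay: int = 5, period: int = 10) -> dict:
    """Build an httpGet probe definition."""
    return {
        "httpGet": {"path": path, "port": port},
        "initialDelaySeconds": initial_delay,
        "periodSeconds": period,
    }


def probe_patch(container: str, path: str, port: int) -> dict:
    """Build a Deployment patch adding readiness and liveness probes to one container."""
    return {"spec": {"template": {"spec": {"containers": [{
        "name": container,  # containers are matched by name in a strategic merge patch
        "readinessProbe": http_probe(path, port),
        # Liveness starts later so a slow startup is not treated as a hang.
        "livenessProbe": http_probe(path, port, initial_delay=15),
    }]}}}}


# Hypothetical endpoint; confirm the real health path and port first.
patch = probe_patch("order-service", "/health", 3000)
print(json.dumps(patch, indent=2))
```

Saved to a file such as `probes.json` (a hypothetical name), the output could be applied with `kubectl patch deployment order-service -n azure-store --patch-file probes.json`.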
WRK008 - Deployment Selector Without Matching Pods Detects Deployments whose selectors do not match any existing pods.
✅ All Workloads are healthy.
Show Recommendations
- Check that the Deployment's `spec.selector.matchLabels` matches the pod template's labels.
- Fix any label mismatches to allow pods to be created.
- Docs: Reference
WRK009 - Deployment, Pod, and Service Label Consistency Validates that deployments, pods, and services use aligned labels and selectors.
✅ All Workloads are healthy.
Show Recommendations
- Deployment `spec.selector.matchLabels` must match the Pod template `metadata.labels`.
- Services should have a `spec.selector` that targets the same labels used by the Deployment and Pods.
- Use `kubectl get deployment,svc,pod -o yaml` to compare values and fix mismatches.
- Docs: Reference
WRK010 - HPA Metrics Without Matching Resource Requests Detects HPAs that scale on CPU or memory metrics when target containers lack matching requests.
✅ All Workloads are healthy.
Show Recommendations
- Add `resources.requests.cpu` and/or `resources.requests.memory` for HPA target containers.
- Use consistent requests across replicas to avoid unstable scaling behavior.
- After updates, validate HPA behavior with `kubectl describe hpa`.
- Docs: Reference
WRK011 - VPA Update Mode and Declarative Resource Conflict Risk Flags VPAs in Auto/Recreate mode that may conflict with declarative resource ownership or HPAs.
✅ All Workloads are healthy.
Show Recommendations
- If GitOps/Helm controls requests, consider VPA `updateMode: Off` or `Initial`.
- Avoid overlapping HPA (CPU/memory) and VPA ownership without clear boundaries.
- Document which controller owns requests per workload.
- Docs: Reference
WRK012 - PodDisruptionBudget Adequacy for Replicated Workloads Validates that replicated workloads have matching PDBs with sensible settings.
✅ All Workloads are healthy.
Show Recommendations
- Ensure replicated workloads (2+ replicas) have a matching PDB.
- Avoid `minAvailable` equal to replica count for normal maintenance windows.
- Use pragmatic budgets (for example `maxUnavailable: 1` for many workloads).
- Docs: Reference
WRK013 - CrashLoopBackOff and OOMKilled Guardrail Flags pods with CrashLoopBackOff, OOMKilled state, or high restart counts.
✅ All Workloads are healthy.
Show Recommendations
- Investigate container logs and termination reasons for recurring restarts.
- Increase memory requests/limits when OOMKilled events are observed.
- Apply sizing changes gradually and validate SLO/error rates.
- Docs: Reference
WRK014 - Missing Memory Limits Checks that every container has an explicit memory limit.
✅ All Workloads are healthy.
Show Recommendations
- Add `resources.limits.memory` to every application and init container.
- Set the limit high enough for normal peaks, then tune requests separately.
- Review workloads with `kubectl get deploy,statefulset,daemonset -A -o yaml` to confirm the source manifests carry the limit.
- Docs: Reference
WRK015 - Replicated Workloads Missing Spread Constraints Detects replicated workloads that define neither anti-affinity nor topology spread constraints.
✅ All Workloads are healthy.
Show Recommendations
- Add `topologySpreadConstraints` or `affinity.podAntiAffinity` to each workload with multiple replicas.
- Prefer distribution across nodes and zones using stable labels such as `topology.kubernetes.io/zone` and `kubernetes.io/hostname`.
- Update the source Deployment, StatefulSet, or Helm values so the spreading rule is maintained on future releases.
- Docs: Reference
Pods
POD001 - Pods with High Restarts Detects pods that have restarted more than the configured thresholds.
✅ All Pods are healthy.
Show Recommendations
- Use `kubectl logs <pod> -n <namespace>` to view logs and identify crash causes.
- Run `kubectl describe pod <pod> -n <namespace>` to check events and probe failures.
- Verify readiness and liveness probes are configured properly.
- Check for missing config, secrets, or volume mounts.
- Adjust resource requests/limits to avoid OOM kills.
- Docs: Reference
POD002 - Long Running Pods Flags pods that have been running longer than configured thresholds.
✅ All Pods are healthy.
Show Recommendations
- Pods with extended uptime may indicate skipped rolling updates.
- Use `kubectl rollout status` to inspect deployment progress.
- Restart pods when config changes are missed or memory use drifts.
- Check if the workload is intended to be static or ephemeral.
- Docs: Reference
POD003 - Failed Pods Detects pods in a failed phase, typically due to startup errors, crashes, or misconfiguration.
✅ All Pods are healthy.
Show Recommendations
- Check the pod events with `kubectl describe pod <pod> -n <ns>`
- Review logs using `kubectl logs <pod> -n <ns>`
- Validate container specs, resource limits, and init containers
- Check node availability or taints
- Docs: Reference
POD004 - Pending Pods Detects pods stuck in a 'Pending' state due to scheduling or resource issues.
⚠️ Total Pods with Issues: 4
Show Recommendations
- Run `kubectl describe pod <pod> -n <namespace>` to check scheduling events
- Check if nodes meet the pod's resource requests and tolerations
- Look for unresolved PVCs, Secrets, or ConfigMaps
- Check cluster-wide CPU and memory availability
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | pod/order-service-65cc8855c-ghk9m | Pending | Some pods are stuck in Pending. These workloads are not running and are waiting on cluster conditions. |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | Pending | Some pods are stuck in Pending. These workloads are not running and are waiting on cluster conditions. |
| azure-store | pod/rabbitmq-5dcdf9484-kvgw7 | Pending | Some pods are stuck in Pending. These workloads are not running and are waiting on cluster conditions. |
| azure-store | pod/store-front-698cc8c565-f5hp5 | Pending | Some pods are stuck in Pending. These workloads are not running and are waiting on cluster conditions. |
POD005 - CrashLoopBackOff Pods Identifies pods stuck in a CrashLoopBackOff state due to repeated container crashes.
✅ All Pods are healthy.
Show Recommendations
- Run `kubectl logs <pod-name> -n <namespace>` to see error output
- Describe the pod for events and messages: `kubectl describe pod <pod> -n <ns>`
- Check init containers, config errors, and resource limits
- Docs: Reference
POD006 - Leftover Debug Pods Detects pods created by kubectl debug that have not been cleaned up.
✅ All Pods are healthy.
Show Recommendations
- Run `kubectl delete pod <pod> -n <namespace>` to remove them
- Ensure automation or users clean up after using `kubectl debug`
- Docs: Reference
POD007 - Container images do not use latest tag Flags containers using latest or no explicit tag.
⚠️ Total Pods with Issues: 4
Show Recommendations
🛠️ Use Specific Image Tags
- Don't use the `:latest` tag or leave the image tag blank.
- Why: It can pull different images on each deploy, leading to drift.
- Fix: Tag images explicitly (e.g., `:v1.2.3`) and update the pod spec.
- Docs: Kubernetes Image Tagging
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | pod/order-service-65cc8855c-ghk9m | ghcr.io/azure-samples/aks-store-demo/order-service:latest | Container order-service: Image uses latest tag |
| azure-store | pod/order-service-65cc8855c-ghk9m | busybox | Container wait-for-rabbitmq: Image omits explicit tag |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | ghcr.io/azure-samples/aks-store-demo/product-service:latest | Container product-service: Image uses latest tag |
| azure-store | pod/store-front-698cc8c565-f5hp5 | ghcr.io/azure-samples/aks-store-demo/store-front:latest | Container store-front: Image uses latest tag |
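The distinction between "uses latest tag" and "omits explicit tag" in the findings above can be reproduced with a small classifier. This is a sketch under simple assumptions about image reference syntax and is not necessarily how the report tool implements the check; `classify_image` and its return labels are hypothetical.

```python
def classify_image(image: str) -> str:
    """Return 'digest', 'latest', 'untagged', or 'tagged' for an image reference."""
    if "@" in image:
        return "digest"  # pinned by digest, e.g. repo@sha256:...
    # Only the last path segment can carry a tag; a registry host may contain ":port".
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return "untagged"
    tag = last_segment.rsplit(":", 1)[1]
    return "latest" if tag == "latest" else "tagged"


# Two images from the POD007 findings plus two illustrative references.
for ref in [
    "ghcr.io/azure-samples/aks-store-demo/order-service:latest",
    "busybox",
    "localhost:5000/app",
    "nginx:1.27.0",
]:
    print(ref, "->", classify_image(ref))
```

Splitting on the last `/` before looking for `:` avoids mistaking a registry port for a tag.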
POD008 - Automounting API Credentials Enabled in Pods Flags pods that do not explicitly disable service account token automounting.
⚠️ Total Pods with Issues: 4
Show Recommendations
🛠️ Disable Automounting API Credentials
- Add `automountServiceAccountToken: false` to the Pod's `spec`.
- Edit with `kubectl edit pod <pod> -n <namespace>`.
- Verify if the application needs API access (e.g., for controllers).
- Use RBAC to limit ServiceAccount permissions if access is required.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | pod/order-service-65cc8855c-ghk9m | <nil> | Pod automounts API credentials |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | <nil> | Pod automounts API credentials |
| azure-store | pod/rabbitmq-5dcdf9484-kvgw7 | <nil> | Pod automounts API credentials |
| azure-store | pod/store-front-698cc8c565-f5hp5 | <nil> | Pod automounts API credentials |
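Because most Pod spec fields cannot be changed on a running pod, one way to apply this setting to the pods above is to patch the owning Deployment's pod template so that replacement pods pick it up. The sketch below only builds the patch body; the helper name is hypothetical.

```python
import json


def automount_patch(enabled: bool = False) -> dict:
    """Build a Deployment patch that sets automountServiceAccountToken on the pod template."""
    return {"spec": {"template": {"spec": {"automountServiceAccountToken": enabled}}}}


patch_json = json.dumps(automount_patch())
print(patch_json)
```

The printed JSON could be passed to `kubectl patch deployment <name> -n azure-store --type merge -p '<json>'` after confirming that the workload does not need to call the Kubernetes API.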
PROM001 - High CPU Pods (Prometheus) Checks for pods with sustained high CPU usage over the last 24 hours using Prometheus metrics.
✅ All Pods are healthy.
Show Recommendations
🛠️ Investigate High CPU Pods
- Use `kubectl top pod` to see real-time CPU usage.
- Review app code or HPA settings for misbehaving containers.
- Consider raising CPU requests/limits or scaling out.
- Docs: Reference
PROM002 - High Memory Usage Pods (Prometheus) Detects pods with high memory usage over the last 24 hours based on Prometheus metrics.
✅ All Pods are healthy.
Show Recommendations
🛠️ Investigate High Memory Pods
- Use `kubectl top pod` to review memory usage.
- Adjust `resources.limits.memory` appropriately.
- Docs: Reference
PROM003 - High Network Receive Rate (Prometheus) Detects pods receiving large amounts of network traffic over the last 24 hours.
✅ All Pods are healthy.
Show Recommendations
🛠️ Investigate Network Receive Rate
- Use `kubectl top pod` or Prometheus UI.
- Inspect service ingress patterns.
- Docs: Reference
PROM007 - Pod Sizing Insights (Prometheus) Generates per-container CPU and memory sizing recommendations from fixed 7-day p95 Prometheus usage.
✅ All Pods are healthy.
📅 Insufficient Prometheus history for sizing. Required: 7 days, available: 5.75 days.
Show Recommendations
Pod Sizing Guidance
- Set CPU and memory `requests` from p95 usage with safety headroom.
- Default CPU `limits` recommendation is `none` to avoid unnecessary CPU throttling and latency spikes.
- Keep memory `limits` set above memory request to control OOM blast radius.
- Validate against SLOs and roll out gradually.
- Docs: Reference
Show Findings
| Status | Required Days | Available Days | Message |
|---|---|---|---|
| Insufficient Prometheus history | 7 | 5.8 | Pod sizing recommendations are withheld until at least 7 days of Prometheus history is available. |
Jobs
JOB001 - Stuck Kubernetes Jobs Finds Jobs that have started but not completed within the threshold.
✅ All Jobs are healthy.
Show Recommendations
- Check pod status for the job using `kubectl describe job <job-name>`.
- Verify resources and restart policies.
- Check logs with `kubectl logs job/<job-name>`.
- Docs: Reference
JOB002 - Failed Kubernetes Jobs Detects jobs with failures and no successful completions.
✅ All Jobs are healthy.
Show Recommendations
- Inspect job with `kubectl describe job <job-name>`.
- Check logs for errors using `kubectl logs job/<job-name>`.
- Review pod events and resource limits.
- Docs: Reference
Networking
NET001 - Services Without Endpoints Identifies services that have no backing endpoints.
⚠️ Total Networking with Issues: 4
Show Recommendations
🔍 Services Without Endpoints
- Verify that your service has a valid selector.
- Check if pods exist and are ready in the same namespace.
- Use `kubectl describe svc <name>` and `kubectl get endpointslices -n <namespace> -l kubernetes.io/service-name=<name>`.
- Restart affected pods or fix labels as needed.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | service/order-service | No endpoints or endpoint slices | |
| azure-store | service/product-service | No endpoints or endpoint slices | |
| azure-store | service/rabbitmq | No endpoints or endpoint slices | |
| azure-store | service/store-front | No endpoints or endpoint slices | |
NET002 - Publicly Accessible Services Detects services of type LoadBalancer or NodePort that may be publicly exposed.
⚠️ Total Networking with Issues: 1
Show Recommendations
🌐 Secure Exposed Services
- Use internal IP ranges or private LoadBalancers where possible.
- Restrict NodePort usage or protect with firewall rules.
- Disable external exposure for internal-only services.
- Consider network policies or service mesh for access control.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | service/store-front | LoadBalancer | Exposed via external IP: 131.145.120.106 |
NET003 - Ingress Health Validation Validates ingress classes, TLS secrets, and backend service references.
✅ All Networking are healthy.
Show Recommendations
🌐 Ingress Health Remediation
- Add `spec.ingressClassName` or annotations if missing.
- Validate all backend services and ports exist.
- Fix missing TLS secrets or use valid ones.
- Avoid duplicate host/path combinations.
- Use only valid pathTypes: Exact, Prefix, or ImplementationSpecific.
- Docs: Reference
NET004 - Namespace Missing Network Policy Flags namespaces that do not define any NetworkPolicy.
⚠️ Total Networking with Issues: 1
Show Recommendations
- Apply a default `deny-all` NetworkPolicy for ingress and egress.
- Use additional policies to allow traffic between required pods/services.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | namespace/azure-store | No NetworkPolicy in active namespace | |
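A default deny policy for the flagged namespace selects every pod and lists both policy types without any allow rules. The sketch below emits that manifest as JSON for `kubectl apply -f -`; the policy name is a placeholder.

```python
import json


def default_deny(namespace: str) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},  # hypothetical name
        "spec": {
            "podSelector": {},  # an empty selector matches every pod in the namespace
            "policyTypes": ["Ingress", "Egress"],  # no rules listed, so nothing is allowed
        },
    }


manifest = default_deny("azure-store")
print(json.dumps(manifest, indent=2))
```

Denying egress also blocks DNS lookups, so a policy like this normally needs companion allow rules for DNS and for the service-to-service traffic the application depends on. It only takes effect when the cluster's network plugin enforces NetworkPolicy.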
NET005 - Ingress Host/Path Conflicts Detects duplicate host/path combinations across ingresses in the same namespace.
✅ All Networking are healthy.
Show Recommendations
🚫 Resolve Ingress Conflicts
- Ensure that each unique host and path combination is defined in only one Ingress resource.
- Use specific hostnames instead of broad wildcards where possible to prevent unintended conflicts.
- Review your Ingress definitions for overlapping rules and consolidate or adjust as necessary.
- Test routing after making changes to confirm correct behavior.
- Docs: Reference
NET006 - Ingress Using Wildcard Hosts Detects ingress rules that use wildcard hosts.
✅ All Networking are healthy.
Show Recommendations
⭐ Review Wildcard Ingresses
- Evaluate if a wildcard host is truly necessary for the application's routing requirements.
- Where possible, replace wildcards with specific hostnames to limit unintentional exposure.
- Ensure that security policies and firewalls are in place to control access to wildcard-enabled Ingresses.
- Docs: Reference
NET007 - Service TargetPort Mismatch Detects services whose targetPort does not exist on backing pods.
✅ All Networking are healthy.
Show Recommendations
🎯 Fix Service TargetPort Mismatches
- Verify the `targetPort` in your Service definition. It should either be a numerical port or a named port.
- Check the `containerPort` entries in the Pods selected by the Service.
- Ensure the Service's `targetPort` matches a Pod `containerPort` by number, or matches the port's `name` when named ports are used.
- A common fix is to ensure consistent naming conventions or directly use port numbers.
- Docs: Reference
NET008 - ExternalName Service to Internal IP Identifies ExternalName services that point to internal IP addresses.
✅ All Networking are healthy.
Show Recommendations
🔄 Review ExternalName to Internal IP
- `ExternalName` services are primarily for CNAME-like redirection to external DNS names.
- If routing to an internal IP address, consider if a standard `Service` with manually created `EndpointSlice` or a `Service` with `type: ClusterIP` backed by pods is more appropriate.
- Ensure this configuration is intentional and does not bypass intended network segmentation or security policies.
- Docs: Reference
NET009 - Overly Permissive Network Policy Identifies NetworkPolicies with empty rules or broad all-IP blocks.
✅ All Networking are healthy.
Show Recommendations
🔐 Restrict Overly Permissive Network Policies
- Ensure `policyTypes` are paired with explicit `ingress` and `egress` rules that define allowed traffic.
- Avoid empty rule entries (for example `ingress: [{}]` or `egress: [{}]`) when `policyTypes` is defined, because an empty rule allows all traffic for that type.
- Limit the use of `ipBlock: 0.0.0.0/0`. Instead, define specific CIDR ranges for necessary external communication.
- Adopt a "deny-by-default" approach and explicitly allow only required communication.
- Docs: Reference
NET010 - Network Policy Overly Permissive IPBlock Flags NetworkPolicies that allow 0.0.0.0/0 through ipBlock rules.
✅ All Networking are healthy.
Show Recommendations
🚫 Restrict '0.0.0.0/0' in Network Policies
- Avoid using `ipBlock: 0.0.0.0/0` in NetworkPolicies unless absolutely required for specific, well-understood use cases (e.g., public internet access).
- Identify the precise CIDR ranges or specific IP addresses that need to be allowed.
- For egress, if public internet access is needed, consider egress gateways or more restrictive network policies to control outbound traffic.
- This is a critical security vulnerability if unintended.
- Docs: Reference
NET011 - Network Policy Missing PolicyTypes Detects NetworkPolicies that do not explicitly define policyTypes.
✅ All Networking are healthy.
Show Recommendations
📝 Define Network PolicyTypes
- Always explicitly define `policyTypes` in your NetworkPolicy, such as `policyTypes: [Ingress]` or `policyTypes: [Ingress, Egress]`.
- This clearly indicates whether the policy applies to inbound, outbound, or both types of traffic.
- It prevents reliance on default behaviors, which can vary or change between Kubernetes versions or CNI implementations.
- Docs: Reference
NET012 - Pod HostNetwork Usage Identifies pods configured with hostNetwork true.
✅ All Networking are healthy.
Show Recommendations
⚠️ Avoid HostNetwork Usage
- Using `hostNetwork: true` is a security risk as it grants the pod direct access to the node's network stack.
- This bypasses many Kubernetes network security features and network policies.
- Only use `hostNetwork` for specific, highly privileged use cases (e.g., CNI plugins, network observability tools) and limit access via RBAC and Pod Security Standards.
- For typical applications, rely on ClusterIP, NodePort, or LoadBalancer services for exposure.
- Docs: Reference
NET013 - Ingress Present Without Gateway API Adoption Detects clusters still using Ingress without any Gateway API resources.
✅ All Networking are healthy.
Show Recommendations
🚦 Begin Gateway API Migration
- Create or select a `GatewayClass` supported by your controller.
- Define one or more `Gateway` resources for north-south traffic entry.
- Migrate Ingress rules incrementally to `HTTPRoute` and validate behavior.
- Run both models in parallel during transition where supported.
- Docs: Reference
NET014 - HTTPRoute Missing or Unaccepted Parent Detects HTTPRoutes with missing parentRefs or no accepted parent Gateway.
✅ All Networking are healthy.
Show Recommendations
🧭 Fix HTTPRoute Parent Binding
- Set `spec.parentRefs` to an existing Gateway.
- Check route status conditions and Gateway listener compatibility.
- Verify namespace permissions and allowedRoutes policy.
- Docs: Reference
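A minimal sketch of a Gateway with one HTTPRoute bound to it through `spec.parentRefs` might look like the following. The `gatewayClassName`, resource names, hostname, and backend Service name and port are all assumptions; the class in particular must match one installed by your Gateway controller.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: store-gateway                       # hypothetical name
  namespace: azure-store
spec:
  gatewayClassName: example-gateway-class   # assumed; use a class your controller provides
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: Same                        # only routes in this namespace may attach
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: store-front                         # hypothetical name
  namespace: azure-store
spec:
  parentRefs:
    - name: store-gateway                   # must reference an existing Gateway
      sectionName: http                     # binds to the listener named "http"
  hostnames:
    - store.example.com                     # placeholder hostname
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: store-front                 # assumed Service name
          port: 80                          # assumed Service port
```

After applying, the route's `status.parents` conditions indicate whether the Gateway accepted it.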
NET015 - Gateways Without Attached HTTPRoutes Detects Gateway resources that have no attached HTTPRoutes.
✅ All Networking are healthy.
Show Recommendations
🧹 Clean Up or Attach Routes
- Attach one or more `HTTPRoute` resources to each active Gateway.
- Delete unused Gateways to avoid confusion and stale entry points.
- Confirm listener and route host/path alignment.
- Docs: Reference
NET016 - Gateway API Readiness Conditions Detects Gateway resources that are not accepted or programmed.
✅ All Networking are healthy.
Show Recommendations
🚦 Validate Gateway Readiness
- Check GatewayClass and Gateway `status.conditions` for `Accepted` and `Programmed`.
- Verify the Gateway controller deployment is healthy and watching the relevant classes.
- Fix listener, address, or controller configuration issues before cutover.
- Docs: Reference
NET017 - Gateway TLS Secret and Cross-Namespace ReferenceGrant Validation Validates Gateway certificateRefs against existing Secrets and ReferenceGrants.
✅ All Networking are healthy.
Show Recommendations
🔐 Fix Gateway TLS References
- Verify each `certificateRef` points to an existing Secret.
- For cross-namespace refs, create a `ReferenceGrant` in the Secret namespace.
- Re-check Gateway listener status after grant/secret updates.
- Docs: Reference
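For the cross-namespace case, a `ReferenceGrant` along these lines lets Gateways in one namespace reference a TLS Secret in another. It is a sketch under assumptions: the `certs` namespace, the Secret name `store-tls`, and the grant name are hypothetical.

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-tls        # hypothetical name
  namespace: certs               # assumed namespace that holds the Secret
spec:
  from:
    # Who is allowed to reference: Gateways in the azure-store namespace.
    - group: gateway.networking.k8s.io
      kind: Gateway
      namespace: azure-store
  to:
    # What may be referenced: one specific Secret (core API group is "").
    - group: ""
      kind: Secret
      name: store-tls            # hypothetical Secret name
```

The grant lives in the namespace of the referenced Secret, not in the Gateway's namespace. Omitting `name` under `to` would allow every Secret in that namespace, which is broader than usually needed.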
NET018 - Duplicate Service Selectors Detects multiple Services in the same namespace with identical selectors.
✅ All Networking are healthy.
Show Recommendations
🎯 Use Unique Service Selectors
- Review Services in the same namespace that select the exact same pod label set.
- Split selectors so each Service represents a distinct routing contract, or consolidate duplicate Services where appropriate.
- Update the source manifest or Helm chart so the selector change persists across releases.
- Docs: Reference
Storage
PV001 - Orphaned Persistent Volumes Detects PersistentVolumes that are not bound to any PVC.
✅ All Storage are healthy.
Show Recommendations
🗑️ Clean Up Orphaned PVs
- Audit: Verify the PV is truly unneeded using `kubectl describe pv <name>`.
- Delete: Remove unneeded PVs with `kubectl delete pv <name>`.
- Caution: Ensure no future PVC will bind to it before deletion.
- Docs: Reference
PVC001 - Unused Persistent Volume Claims Detects PVCs not attached to any pod.
✅ All Storage are healthy.
Show Recommendations
💾 Clean Up Unused PVCs
- Audit: Confirm the PVC is not needed using `kubectl describe pvc <name> -n <namespace>`.
- Delete: Remove PVCs that are no longer required with `kubectl delete pvc <name> -n <namespace>`.
- Prevent: Automate cleanup for stale environments or ephemeral workloads.
- Docs: Reference
PVC002 - PVCs Using Default StorageClass Detects PVCs that do not explicitly specify storageClassName.
✅ All Storage are healthy.
Show Recommendations
✍️ Specify StorageClass for PVCs
- Edit: Add `storageClassName: <your-storage-class-name>` to the PVC spec.
- Consistency: Ensure consistent storage provisioning across environments.
- Awareness: Understand which StorageClass is truly being used.
- Docs: Reference
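A PVC that names its StorageClass explicitly could look like this sketch. The claim name and size are hypothetical, and `managed-csi` is assumed to exist on the cluster (it is one of the classes AKS normally ships); check with `kubectl get storageclass` before using it.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rabbitmq-data            # hypothetical claim name
  namespace: azure-store
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-csi  # explicit class instead of relying on the cluster default
  resources:
    requests:
      storage: 5Gi               # illustrative size
```

The class of a bound PVC generally cannot be changed in place, so it is best declared in the manifest or Helm chart at creation time.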
PVC003 - ReadWriteMany PVCs on Incompatible Storage Detects ReadWriteMany PVCs backed by likely block-storage provisioners.
✅ All Storage are healthy.
Show Recommendations
⚠️ Review ReadWriteMany PVCs
- Verify: Confirm if the storage backend truly supports concurrent writes.
- Adjust: If not, change PVC access mode to `ReadWriteOnce`.
- Migrate: For shared data, use appropriate shared file storage solutions.
- Docs: Reference
PVC004 - Unbound Persistent Volume Claims Detects PersistentVolumeClaims that remain Pending.
✅ All Storage are healthy.
Show Recommendations
🚫 Troubleshoot Unbound PVCs
- Describe PVC: Use `kubectl describe pvc <name> -n <namespace>` to see events and reasons for Pending.
- Check StorageClass: Ensure the specified StorageClass exists and is correctly configured.
- Review Provisioner: Verify the storage provisioner is running and healthy.
- Docs: Reference
SC001 - Deprecated StorageClass Provisioners Detects StorageClasses still using in-tree provisioners.
✅ All Storage are healthy.
Show Recommendations
🔄 Migrate Deprecated StorageClasses
- Identify: Pinpoint PVCs using the deprecated StorageClass.
- Create: Define a new StorageClass with the appropriate CSI driver.
- Migrate: Follow the migration path for your specific storage provider to move data.
- Docs: Reference
SC002 - AKS Azure In-Tree Storage Provisioners Detects Azure in-tree storage provisioners that are not AKS Automatic compatible.
⚠️ Total Storage with Issues: 1
Show Recommendations
🔄 Migrate Azure StorageClasses to CSI
- Create replacement StorageClasses that use `disk.csi.azure.com` or `file.csi.azure.com`.
- Move PVCs and workloads off the in-tree StorageClass before migrating to AKS Automatic.
- Validate reclaim policies, SKU, and mount options during the migration.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
SC003 - High Cluster Storage Usage (Prometheus) Monitors overall used storage across the cluster.
✅ All Storage are healthy.
Show Recommendations
📊 Manage Storage Consumption
- Identify: Use monitoring tools to find namespaces/pods consuming the most storage.
- Clean Up: Delete old data, snapshots, or unused PVCs/PVs.
- Scale: Plan for increasing storage capacity or optimizing storage allocation.
- Docs: Reference
SC004 - StorageClass Prevents Volume Expansion Identifies StorageClasses that do not permit volume expansion, which can limit dynamic scaling of stateful applications.
⚠️ Total Storage with Issues: 1
Show Recommendations
📈 Enable Volume Expansion
- Assess: Determine if your applications need dynamic volume resizing.
- Configure: Add or set `allowVolumeExpansion: true` in the StorageClass definition.
- Backend Check: Ensure your storage backend supports online volume expansion.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| (cluster) | storageclass/default | true | StorageClass does not allow volume expansion. |
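
Both storage findings (SC002 and SC004) point toward a replacement StorageClass that uses the Azure Disk CSI driver and permits expansion. The manifest below is a sketch only: the class name is hypothetical and the `skuName`, reclaim policy, and binding mode are assumptions to be matched against the class being replaced.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi-expandable       # hypothetical name
provisioner: disk.csi.azure.com      # CSI driver instead of the in-tree provisioner
parameters:
  skuName: StandardSSD_LRS           # assumed disk SKU
reclaimPolicy: Delete                # assumed; use Retain if volumes must outlive their PVCs
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true           # lets bound PVCs be resized by raising their storage request
```

The provisioner and parameters of an existing StorageClass cannot be edited, which is why a new class is created and workloads are moved to it rather than changing the old one.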
Configuration Hygiene
CFG001 - Orphaned ConfigMaps Detects ConfigMaps that are not referenced by workloads or related resources.
✅ All Configuration Hygiene are healthy.
Show Recommendations
🛠️ Clean Up Orphaned ConfigMaps
- Verify: Check usage (`kubectl describe cm <name> -n <namespace>`).
- Delete: `kubectl delete cm <name> -n <namespace>` if unused.
- Automation: Schedule periodic scans.
- Docs: Reference
CFG002 - Duplicate ConfigMap Names Detects ConfigMaps with identical names across multiple namespaces.
⚠️ Total Configuration Hygiene with Issues: 2
Show Recommendations
🛠️ Fix Duplicate ConfigMap Names
- Standardize: Use unique names or a naming convention that includes the environment or team name.
- Audit: Periodically review ConfigMaps across namespaces for duplication.
- Automation: Use policies or linting tools to catch duplicates pre-deploy.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| - | configmap/kube-root-ca.crt | - | Found in namespaces: azure-store, default |
| - | configmap/kube-root-ca.crt | - | Found in namespaces: azure-store, default |
CFG003 - Large ConfigMaps Finds ConfigMaps larger than 1 MiB.
✅ All Configuration Hygiene are healthy.
Show Recommendations
🛠️ Reduce ConfigMap Size
- Refactor: Move large files or data to PersistentVolumes.
- Split: Break up oversized ConfigMaps into smaller ones by function.
- Review: Check for secrets or binary blobs mistakenly stored in ConfigMaps.
- Docs: Reference
PROM004 - API Server High Latency (Prometheus) Detects high latency in Kubernetes API server requests over the last 24 hours.
✅ All Configuration are healthy.
Show Recommendations
🛠️ Investigate API Server Latency
- Check `kube-apiserver` logs.
- Review `etcd` performance.
- Docs: Reference
Security
RBAC001 - RBAC Misconfigurations Detects invalid roleRefs, missing roles, orphaned service accounts, and incorrect subject namespaces.
✅ All Security are healthy.
Show Recommendations
🔐 RBAC Misconfiguration Fixes
- Don't leave roleRef blank in bindings.
- Use valid Roles/ClusterRoles that exist in the correct namespace.
- Verify ServiceAccounts exist in the namespace specified.
- Remove or correct subjects pointing to non-existent namespaces.
- Docs: Reference
RBAC002 - RBAC Overexposure Identifies dangerous RBAC grants such as cluster-admin and wildcard permissions.
⚠️ Total Security with Issues: 12
Show Recommendations
🔐 RBAC Hardening Tips
- Avoid using `cluster-admin` directly in bindings.
- Don’t assign Roles or ClusterRoles with wildcard verbs/resources/apiGroups.
- Restrict access to sensitive resources like `secrets` or `pods/exec`.
- Minimize privileges for default ServiceAccounts.
- Document use of any built-in roles used in production.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| 🌍 Cluster-Wide | clusterrolebinding/aks-cluster-admin-binding | User/clusterAdmin | cluster-admin binding (built-in) |
| 🌍 Cluster-Wide | clusterrolebinding/aks-cluster-admin-binding | User/clusterUser | cluster-admin binding (built-in) |
| 🌍 Cluster-Wide | clusterrolebinding/aks-cluster-admin-binding-aad | Group/c30f2960-28f8-49cc-9308-c1e741824c4f | cluster-admin binding (built-in) |
| 🌍 Cluster-Wide | clusterrolebinding/aks-secretprovidersyncing-rolebinding | ServiceAccount/aks-secrets-store-csi-driver | Access to sensitive resources |
| 🌍 Cluster-Wide | clusterrolebinding/aks-service-rolebinding | User/aks-support | Access to sensitive resources |
| 🌍 Cluster-Wide | clusterrolebinding/ama-metrics-clusterrolebinding | ServiceAccount/ama-metrics-serviceaccount | Access to sensitive resources |
| 🌍 Cluster-Wide | clusterrolebinding/cluster-admin | Group/system:masters | cluster-admin binding (built-in) |
| 🌍 Cluster-Wide | clusterrolebinding/system:controller:clusterrole-aggregation-controller | ServiceAccount/clusterrole-aggregation-controller | Access to sensitive resources (built-in) |
| 🌍 Cluster-Wide | clusterrolebinding/system:controller:legacy-service-account-token-cleaner | ServiceAccount/legacy-service-account-token-cleaner | Access to sensitive resources (built-in) |
| 🌍 Cluster-Wide | clusterrolebinding/system:kube-controller-manager | User/system:kube-controller-manager | Access to sensitive resources (built-in) |
| 🌍 Cluster-Wide | clusterrolebinding/system:kube-scheduler | User/system:kube-scheduler | Access to sensitive resources (built-in) |
| 🌍 Cluster-Wide | clusterrolebinding/system:persistent-volume-binding | ServiceAccount/persistent-volume-binder | Access to sensitive resources (built-in) |
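
Where a team only needs read access to workloads, a namespaced Role with explicit verbs is a narrower alternative to a `cluster-admin` binding. The sketch below assumes a hypothetical group object ID and role names; the entries marked (built-in) in the table above are platform-managed and are not the target of this example.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader                   # hypothetical name
  namespace: azure-store
rules:
  # Explicit resources and verbs, no wildcards, no access to secrets or pods/exec.
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: pod-reader-binding           # hypothetical name
  namespace: azure-store
subjects:
  - kind: Group
    name: "00000000-0000-0000-0000-000000000000"   # placeholder group object ID
    apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: pod-reader
```

`kubectl auth can-i --list --as-group=<group> --as=<user> -n azure-store` is one way to check the effective permissions afterwards.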
RBAC003 - Orphaned ServiceAccounts Finds ServiceAccounts not used by pods or RBAC bindings.
⚠️ Total Security with Issues: 1
Show Recommendations
🧾 Remove Orphaned ServiceAccounts
- Audit ServiceAccounts not referenced in RoleBindings, ClusterRoleBindings, or used by Pods.
- Delete those not actively used to reduce attack surface.
- Consider automating SA cleanup with CI/CD or policy enforcement.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| default | serviceaccount/default | default | ServiceAccount not used by pods or RBAC bindings |
RBAC004 - Orphaned and Ineffective Roles Flags roles and clusterroles that are unused or define no rules.
⚠️ Total Security with Issues: 3
Show Recommendations
🗂️ Clean up Unused or Ineffective RBAC
- Remove RoleBindings or ClusterRoleBindings without subjects.
- Prune Roles and ClusterRoles not referenced by any bindings.
- Remove roles with no defined rules unless planned for future use.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| cluster-wide | clusterrolebinding/system:node | system:node | ClusterRoleBinding has no subjects |
| cluster-wide | clusterrole/aks-secretproviderclasses-admin-role | aks-secretproviderclasses-admin-role | Unused ClusterRole |
| cluster-wide | clusterrole/aks-secretproviderclasses-viewer-role | aks-secretproviderclasses-viewer-role | Unused ClusterRole |
SEC001 - Orphaned Secrets Detects Secrets not used by workloads or related resources.
✅ All Security are healthy.
Show Recommendations
🔐 Orphaned Secrets Cleanup
- Remove Secrets not referenced in Pods, Deployments, StatefulSets, or Ingresses.
- Audit Secret content before deletion to avoid removing active credentials.
- Validate Custom Resources don’t indirectly depend on these Secrets.
- Regularly prune Secrets as part of security hygiene.
- Docs: Reference
SEC002 - Pods using hostPID or hostNetwork Flags pods that share the host PID or network namespace, which can compromise isolation and node security.
✅ All Security are healthy.
Show Recommendations
Avoid Host-Level Sharing
- Set `hostPID: false` and `hostNetwork: false` unless needed for special workloads.
- Review security implications of namespace sharing with the host.
- Restrict use of these settings to trusted namespaces and workloads.
- Consider using Pod Security Admission or OPA/Gatekeeper policies to prevent usage cluster-wide (PodSecurityPolicy was removed in Kubernetes 1.25).
- Docs: Reference
SEC003 - Pods Running as Root Detects pods running with UID 0 or no explicit runAsUser setting, which defaults to root in many images.
⚠️ Total Security with Issues: 12
Show Recommendations
RunAsUser Hardening
- Set `runAsUser` to a non-zero UID at pod or container level.
- Avoid relying on container defaults and define securityContext explicitly.
- Validate any custom base images that may default to root.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | pod/order-service-65cc8855c-ghk9m | Not Set (Defaults to root) | Container order-service runs as root or has no runAsUser set |
| azure-store | pod/order-service-65cc8855c-ghk9m | Not Set (Defaults to root) | Container wait-for-rabbitmq runs as root or has no runAsUser set |
| azure-store | pod/order-service-65cc8855c-ghk9m | Not Set (Defaults to root) | Container runs as root or has no runAsUser set |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | Not Set (Defaults to root) | Container product-service runs as root or has no runAsUser set |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | Not Set (Defaults to root) | Container runs as root or has no runAsUser set |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | Not Set (Defaults to root) | Container runs as root or has no runAsUser set |
| azure-store | pod/rabbitmq-5dcdf9484-kvgw7 | Not Set (Defaults to root) | Container rabbitmq runs as root or has no runAsUser set |
| azure-store | pod/rabbitmq-5dcdf9484-kvgw7 | Not Set (Defaults to root) | Container runs as root or has no runAsUser set |
| azure-store | pod/rabbitmq-5dcdf9484-kvgw7 | Not Set (Defaults to root) | Container runs as root or has no runAsUser set |
| azure-store | pod/store-front-698cc8c565-f5hp5 | Not Set (Defaults to root) | Container store-front runs as root or has no runAsUser set |
| azure-store | pod/store-front-698cc8c565-f5hp5 | Not Set (Defaults to root) | Container runs as root or has no runAsUser set |
| azure-store | pod/store-front-698cc8c565-f5hp5 | Not Set (Defaults to root) | Container runs as root or has no runAsUser set |
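
A pod-level `securityContext` applies to every container in the pod, including init containers such as `wait-for-rabbitmq`, so it is a compact way to address the rows above. The following is a sketch: the pod name, image, and UID/GID `10001` are placeholders, and it assumes the image can actually run as a non-root user.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-example              # hypothetical name
  namespace: azure-store
spec:
  securityContext:
    runAsNonRoot: true               # kubelet refuses to start containers that would run as UID 0
    runAsUser: 10001                 # assumed non-zero UID
    runAsGroup: 10001                # assumed GID
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
```

For the Deployments behind these pods, the same block goes under `spec.template.spec.securityContext`. Images that write to root-owned paths or bind ports below 1024 may need changes before they start cleanly as a non-root user.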
SEC004 - Privileged Containers Detects containers running with privileged mode enabled.
✅ All Security are healthy.
Show Recommendations
Disable Privileged Containers
- Remove `securityContext.privileged: true` from container specs.
- Refactor workloads to avoid needing host-level access.
- Enforce restrictions using Pod Security Admission or OPA/Gatekeeper (PodSecurityPolicy was removed in Kubernetes 1.25).
- Limit use to dedicated namespaces with strict controls.
- Docs: Reference
SEC005 - Pods Using hostIPC Detects pods that enable hostIPC.
✅ All Security are healthy.
Show Recommendations
🔒 Disable hostIPC for Pods
- Remove `hostIPC: true` from pod specs.
- Review workloads that require inter-process communication with the host.
- Use shared memory only through secure, scoped means.
- Docs: Reference
SEC006 - Pods Missing Secure Defaults Checks if pods are missing recommended securityContext fields such as runAsNonRoot, readOnlyRootFilesystem, or allowPrivilegeEscalation.
⚠️ Total Security with Issues: 4
Show Recommendations
- Set `securityContext.runAsNonRoot: true`
- Set `securityContext.readOnlyRootFilesystem: true`
- Set `securityContext.allowPrivilegeEscalation: false`
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | pod/order-service-65cc8855c-ghk9m | Missing securityContext | Container order-service has no securityContext defined |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | Missing securityContext | Container product-service has no securityContext defined |
| azure-store | pod/rabbitmq-5dcdf9484-kvgw7 | Missing securityContext | Container rabbitmq has no securityContext defined |
| azure-store | pod/store-front-698cc8c565-f5hp5 | Missing securityContext | Container store-front has no securityContext defined |
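
The three recommended fields are set per container. The sketch below shows them together with a capability drop and a writable scratch volume, since a read-only root filesystem often requires one. The pod name, image, and `/tmp` mount path are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-example             # hypothetical name
  namespace: azure-store
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
      securityContext:
        runAsNonRoot: true
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
      volumeMounts:
        - name: tmp
          mountPath: /tmp            # assumed path the application writes to
  volumes:
    - name: tmp
      emptyDir: {}                   # writable scratch space while the root filesystem stays read-only
```

With `runAsNonRoot: true` and no `runAsUser`, the container only starts if the image declares a numeric non-root user, so it is worth testing each image before rolling this out.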
SEC007 - Missing Pod Security Admission Labels Flags namespaces missing pod security admission enforce labels.
⚠️ Total Security with Issues: 2
Show Recommendations
- Set `pod-security.kubernetes.io/enforce=restricted` on sensitive namespaces.
- Optionally use `enforce-version` and `audit` labels.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | namespace/azure-store | | No pod security labels |
| default | namespace/default | | No pod security labels |
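
Pod Security Admission is configured through namespace labels. The sketch below takes a staged approach that is an assumption on my part, not something the report prescribes: it enforces `baseline` while auditing and warning against `restricted`, because the `azure-store` pods listed elsewhere in this report would likely be rejected under `restricted` enforcement as they stand.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: azure-store
  labels:
    pod-security.kubernetes.io/enforce: baseline         # blocks clearly unsafe pods now
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/audit: restricted         # records violations of the stricter profile
    pod-security.kubernetes.io/warn: restricted          # shows warnings to users on apply
```

Once the workloads run without warnings, `enforce` can be raised to `restricted` as the recommendation suggests.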
SEC008 - Secrets in Environment Variables Detects secrets exposed through environment variables.
✅ All Security are healthy.
Show Recommendations
- Use secret volumes instead of env vars to reduce accidental exposure.
- Avoid using `valueFrom.secretKeyRef` in `env`.
- Limit permissions to read secrets.
- Docs: Reference
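Mounting a Secret as a read-only volume, rather than injecting it through `env`, can be sketched as follows. The Secret name, mount path, pod name, and image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-volume-example        # hypothetical name
  namespace: azure-store
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
      volumeMounts:
        - name: credentials
          mountPath: /etc/credentials          # each Secret key appears as a file here
          readOnly: true
  volumes:
    - name: credentials
      secret:
        secretName: rabbitmq-credentials       # hypothetical Secret
        defaultMode: 0400                      # files readable by the owner only
```

The application then reads the values from files under the mount path, which keeps them out of the process environment.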
SEC009 - Missing Capabilities Drop Checks containers that do not drop all Linux capabilities via securityContext.capabilities.drop = ['ALL'].
⚠️ Total Security with Issues: 4
Show Recommendations
- Set `securityContext.capabilities.drop: ['ALL']` in container specs.
- Allow only required capabilities via the `add` list, if any.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | pod/order-service-65cc8855c-ghk9m | | Container order-service does not drop ALL capabilities |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | | Container product-service does not drop ALL capabilities |
| azure-store | pod/rabbitmq-5dcdf9484-kvgw7 | | Container rabbitmq does not drop ALL capabilities |
| azure-store | pod/store-front-698cc8c565-f5hp5 | | Container store-front does not drop ALL capabilities |
SEC010 - HostPath Volume Usage Flags pods that use hostPath volumes, which mount parts of the host filesystem and bypass isolation.
✅ All Security are healthy.
Show Recommendations
- Remove hostPath volumes unless needed for host-level access.
- Consider alternatives like persistent volume claims or configMaps.
- Docs: Reference
SEC011 - Containers Running as UID 0 Flags containers explicitly configured to run as UID 0.
⚠️ Total Security with Issues: 4
Show Recommendations
- Set runAsUser to a non-root user ID.
- Use runAsNonRoot: true for validation.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | pod/order-service-65cc8855c-ghk9m | 0 | Container order-service runs as UID 0 |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | 0 | Container product-service runs as UID 0 |
| azure-store | pod/rabbitmq-5dcdf9484-kvgw7 | 0 | Container rabbitmq runs as UID 0 |
| azure-store | pod/store-front-698cc8c565-f5hp5 | 0 | Container store-front runs as UID 0 |
SEC012 - Added Linux Capabilities Flags containers that add extra Linux capabilities using securityContext.capabilities.add.
✅ All Security are healthy.
Show Recommendations
- Review and remove unnecessary capabilities.
- Default to dropping all, then selectively add only what is needed.
- Docs: Reference
SEC013 - EmptyDir Volume Usage EmptyDir volumes are ephemeral: data survives container restarts but is deleted when the pod is removed from its node. Use only if data persistence is not needed.
✅ All Security are healthy.
Show Recommendations
- Audit use of EmptyDir volumes in production workloads.
- Replace with PVCs or other managed storage if persistence is needed.
- Docs: Reference
SEC014 - Untrusted Image Registries Flags images that do not come from trusted registries.
⚠️ Total Security with Issues: 3
Show Recommendations
- Use approved internal or vendor-verified registries.
- Restrict image pull policies using Gatekeeper or admission plugins.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | pod/order-service-65cc8855c-ghk9m | ghcr.io/azure-samples/aks-store-demo/order-service:latest | Image from untrusted registry in container order-service |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | ghcr.io/azure-samples/aks-store-demo/product-service:latest | Image from untrusted registry in container product-service |
| azure-store | pod/store-front-698cc8c565-f5hp5 | ghcr.io/azure-samples/aks-store-demo/store-front:latest | Image from untrusted registry in container store-front |
SEC015 - Pods Using Default ServiceAccount Flags pods using the default service account, which may have broad permissions.
⚠️ Total Security with Issues: 4
Show Recommendations
- Create and bind a custom ServiceAccount per application.
- Avoid using the `default` ServiceAccount unless absolutely necessary.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | pod/order-service-65cc8855c-ghk9m | default | Pod uses default ServiceAccount |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | default | Pod uses default ServiceAccount |
| azure-store | pod/rabbitmq-5dcdf9484-kvgw7 | default | Pod uses default ServiceAccount |
| azure-store | pod/store-front-698cc8c565-f5hp5 | default | Pod uses default ServiceAccount |
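
A dedicated ServiceAccount per application, referenced from the pod spec, might look like this sketch. The account and pod names and the image are assumptions, and disabling token automounting presumes the workload does not call the Kubernetes API.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service                # hypothetical per-application account
  namespace: azure-store
automountServiceAccountToken: false  # no API token is mounted unless a pod opts in
---
apiVersion: v1
kind: Pod
metadata:
  name: order-service-example        # hypothetical name
  namespace: azure-store
spec:
  serviceAccountName: order-service  # replaces the implicit "default" account
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
```

In a Deployment, `serviceAccountName` belongs under `spec.template.spec`. Any permissions the application needs are then granted to this account alone through a Role and RoleBinding.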
SEC016 - Unconfined Seccomp Profiles Detects pods or containers explicitly using the Unconfined seccomp profile.
✅ All Security are healthy.
Show Recommendations
- Use `RuntimeDefault` or a vetted `Localhost` seccomp profile.
- Remove any pod- or container-level `Unconfined` seccomp setting.
- Make the seccomp profile explicit in the workload spec so the policy is reviewable.
- Docs: Reference
SEC017 - Non-Default ProcMount Flags containers that set procMount to a non-default value.
✅ All Security are healthy.
Show Recommendations
- Set `securityContext.procMount: Default` or omit the field.
- Review debugging and observability agents that rely on custom proc mounts.
- Docs: Reference
SEC018 - Automounting API Credentials Enabled in ServiceAccounts Flags ServiceAccounts where automounting of API credentials is enabled, affecting associated Pods.
✅ All Security are healthy.
Show Recommendations
Disable Automounting in ServiceAccounts
- Add `automountServiceAccountToken: false` to the ServiceAccount spec.
- Edit with `kubectl edit serviceaccount <sa-name> -n <namespace>`.
- Ensure Pods needing API access override this in their spec with `automountServiceAccountToken: true`.
- Use RBAC to limit ServiceAccount permissions if access is required.
- Docs: Reference
SEC019 - Unsupported AppArmor Values Detects AppArmor annotations or profile types that are not permitted by baseline Pod Security Standards.
✅ All Security are healthy.
Show Recommendations
- Allowed values are `runtime/default` or `localhost/*` in annotations, and `RuntimeDefault` or `Localhost` for structured profiles.
- Remove legacy or custom profile names that AKS Automatic baseline policy would reject.
- Docs: Reference
SEC020 - Seccomp Profile Not Configured Detects pods and containers that do not explicitly configure a seccomp profile.
⚠️ Total Security with Issues: 5
Show Recommendations
- Set `securityContext.seccompProfile.type: RuntimeDefault` for the pod or each container.
- If you need a custom profile, use `Localhost` and ensure the profile exists on the node.
Localhostand ensure the profile exists on the node. - Doing this avoids AKS Automatic seccomp warnings and makes the security posture explicit.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | pod/order-service-65cc8855c-ghk9m | | Container order-service has no explicit seccomp profile |
| azure-store | pod/order-service-65cc8855c-ghk9m | | Container wait-for-rabbitmq has no explicit seccomp profile |
| azure-store | pod/product-service-77ff9f6fd6-rzcxj | | Container product-service has no explicit seccomp profile |
| azure-store | pod/rabbitmq-5dcdf9484-kvgw7 | | Container rabbitmq has no explicit seccomp profile |
| azure-store | pod/store-front-698cc8c565-f5hp5 | | Container store-front has no explicit seccomp profile |
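
Setting the seccomp profile once at pod level covers all containers in the pod, including init containers, unless a container overrides it. A sketch with a hypothetical pod name and placeholder image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-example              # hypothetical name
  namespace: azure-store
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault           # use the container runtime's default syscall filter
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
```

For the Deployments behind the pods listed above, the same block sits under `spec.template.spec.securityContext`.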
SEC021 - Host Ports in Pod Specs Detects containers that bind host ports directly on the node.
✅ All Security are healthy.
Show Recommendations
- Remove `hostPort` from container port definitions.
- Use a Service or Ingress for north-south access.
- Reserve host networking only for platform workloads that truly require it.
- Docs: Reference
SEC022 - Non-Existent Secret References Flags pods referencing Secrets that do not exist. This may cause runtime failures.
✅ All Security are healthy.
Show Recommendations
- Check envFrom, secretKeyRef, and volume.secret.secretName references.
- Create missing Secrets or remove invalid references.
- Docs: Reference
SEC023 - Disallowed Sysctls Detects sysctls outside the Kubernetes baseline Pod Security Standards allowlist.
✅ All Security are healthy.
Show Recommendations
- Keep only baseline-allowed sysctls, such as the safe sysctls `net.ipv4.ip_local_port_range` or `kernel.shm_rmid_forced`.
- Move node-level kernel tuning into node or image configuration where possible.
- Docs: Reference
Kubernetes Warning Events
EVENT001 - Grouped Warning Events Groups recent Warning events by Reason and Message.
⚠️ Total Events with Issues: 1
Show Recommendations
- Group similar warnings to spot patterns.
- Use `kubectl describe` and logs to investigate.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| (cluster) | event-group/FailedScheduling | 4 | 0/3 nodes are available: 3 node(s) had untolerated taint {CriticalAddonsOnly: true}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling. |
EVENT002 - Full Warning Event Log Lists all recent Warning events in the cluster.
⚠️ Total Events with Issues: 4
Show Recommendations
- Use `kubectl describe` to get full context.
- Check logs for root cause.
- Docs: Reference
Show Findings
| Namespace | Resource | Value | Message |
|---|---|---|---|
| azure-store | events/order-service-65cc8855c-ghk9m.18a67e5d2136c2d5 | Warning | Warning events found in recent Kubernetes logs |
| azure-store | events/product-service-77ff9f6fd6-rzcxj.18a67e5d208fd3fe | Warning | Warning events found in recent Kubernetes logs |
| azure-store | events/rabbitmq-5dcdf9484-kvgw7.18a67e5d26f82c69 | Warning | Warning events found in recent Kubernetes logs |
| azure-store | events/store-front-698cc8c565-f5hp5.18a67e5d294b6844 | Warning | Warning events found in recent Kubernetes logs |
AKS Best Practices Results
✅ Passed: 30
❌ Failed: 13
📊 Total Checks: 43
🎯 Score: 69.77%
⭐ Rating: D
Show Best Practices (7/15 failed)
| ID | Check | Severity | Category | Status | Observed Value | Fail Message | Recommendation | URL |
|---|---|---|---|---|---|---|---|---|
| AKSBP001 | Allowed Container Images Policy Enforcement | High | Best Practices | ❌ FAIL | false | Container image restriction policies are not enforced, allowing deployment of images from any registry including public registries, untrusted sources, or images with known vulnerabilities. This significantly increases supply chain attack risks and compliance violations. | Deploy the Azure Policy initiative 'Kubernetes cluster pod security restricted standards' and configure specific allowed container registries. Use az policy assignment create to assign the policy and set enforcement to 'deny' mode for production environments. | Learn More |
| AKSBP002 | No Privileged Containers Policy Enforcement | High | Best Practices | ❌ FAIL | false | Privileged container policies are not enforced, allowing workloads to run with full root privileges, access host devices, mount host file systems, and potentially escape container boundaries. This creates severe security risks and violates least-privilege principles. | Enable the 'Do not allow privileged containers' Azure Policy definition in enforce mode. Use Pod Security Standards with 'restricted' profile to block privileged containers and ensure security baseline compliance. | Learn More |
| AKSBP003 | Multiple Node Pools | Medium | Best Practices | ❌ FAIL | false | Single node pool configuration limits workload isolation, scaling flexibility, and security boundaries. All workloads share the same VM size, OS configuration, and scaling parameters, making it impossible to optimize for different application requirements or implement proper security zones. | Create separate node pools for different workload types using az aks nodepool add --resource-group <rg> --cluster-name <cluster> --name <pool-name>.Use system pools for system pods, user pools for applications, and specialized pools (GPU, memory-optimized) for specific workloads. | Learn More |
| AKSBP008 | Auto Upgrade Channel Configured | Medium | Best Practices | ❌ FAIL | false | Automatic cluster upgrades are disabled, leaving the cluster vulnerable to security patches, bug fixes, and Kubernetes version support expiration. Manual upgrade management increases operational overhead and delays critical security updates. | Configure auto upgrade using az aks update --resource-group <rg> --name <cluster> --auto-upgrade-channel patch for security patches or 'stable' for minor version updates.Use maintenance windows to control upgrade timing and minimize disruption. | Learn More |
| AKSBP009 | Node OS Upgrade Channel Configured | Medium | Best Practices | ❌ FAIL | false | Node OS automatic updates are disabled, leaving nodes running outdated OS versions with potential security vulnerabilities, missing security patches, and outdated system libraries. This increases the attack surface and compliance risks. | Enable node OS upgrade using az aks update --resource-group <rg> --name <cluster> --node-os-upgrade-channel NodeImage for automatic OS updates. Use 'SecurityPatch' for security-only updates or configure maintenance windows for controlled updates. | Learn More |
| AKSBP014 | Use v5 or Newer SKU VMs for Node Pools | Medium | Best Practices | ❌ FAIL | 3 | Node pools are using older VM generations (v4 or earlier) that have reduced performance, lack modern security features, don't support ephemeral OS disks by default, and may experience more frequent maintenance events affecting availability and reliability. | Upgrade to v5 or newer VM SKUs using az aks nodepool add --vm-size Standard_D2s_v5 for new node pools. v5 SKUs provide better performance, support ephemeral OS disks by default, and have improved reliability during maintenance events and upgrades. | Learn More |
| AKSBP015 | Deployment Safeguards Enabled | Medium | Best Practices | ❌ FAIL | false | Deployment Safeguards are disabled, allowing non-compliant workloads to be deployed without validation of Kubernetes best practices. This leads to deployments without resource requests/limits, missing health probes, no anti-affinity rules, and other configuration issues that impact reliability and cost. | Enable Deployment Safeguards using az aks update --resource-group <rg> --name <cluster> --safeguards-level Warning for alerting or 'Enforcement' to block non-compliant deployments. This enforces best practices including resource requests, readiness/liveness probes, pod anti-affinity, and Pod Security Standards. | Learn More |
| AKSBP004 | Azure Linux as Host OS | High | Best Practices | ✅ PASS | | No issues detected. | Migrate to Azure Linux by creating new node pools with az aks nodepool add --os-sku AzureLinux, then migrate workloads and delete old pools. Note: In-place OS SKU changes are not supported, requiring node pool replacement. | Learn More |
| AKSBP005 | Ephemeral OS Disks Enabled | Medium | Best Practices | ✅ PASS | | No issues detected. | Enable ephemeral OS disks using az aks nodepool add --os-disk-type Ephemeral for new pools or plan node pool replacement. This provides faster disk I/O, lower latency, and reduced costs by using local VM storage instead of managed disks. | Learn More |
| AKSBP006 | Non-Ephemeral Disks with Adequate Size | Medium | Best Practices | ✅ PASS | | No issues detected. | Increase OS disk size using az aks nodepool update --resource-group <rg> --cluster-name <cluster> --name <nodepool> --os-disk-size-gb 128 or higher. Larger disks provide better IOPS performance and accommodate container image layers and temporary storage needs. | Learn More |
| AKSBP007 | System Node Pool Taint | High | Best Practices | ✅ PASS | | No issues detected. | Apply system node pool taint using az aks nodepool update --resource-group <rg> --cluster-name <cluster> --name <system-pool> --node-taints CriticalAddonsOnly=true:NoSchedule. This ensures only critical system pods run on system nodes, improving reliability and resource isolation. | Learn More |
| AKSBP010 | Customized MC_ Resource Group Name | Medium | Best Practices | ✅ PASS | | No issues detected. | Use a custom node resource group name during cluster creation with az aks create --node-resource-group <custom-name>. This cannot be changed after cluster creation, so plan accordingly for better resource organization and management. | Learn More |
| AKSBP011 | System Node Pool Has Minimum Two Nodes | High | Best Practices | ✅ PASS | | No issues detected. | Scale system node pool to at least 2 nodes using az aks nodepool scale --resource-group <rg> --cluster-name <cluster> --name <system-pool> --node-count 2. Configure cluster autoscaler with --min-count 2 to ensure resiliency against node failures and maintenance events. | Learn More |
| AKSBP012 | Node Pool Version Matches Control Plane | Medium | Best Practices | ✅ PASS | | No issues detected. | Upgrade node pools to match control plane version using az aks nodepool upgrade --resource-group <rg> --cluster-name <cluster> --name <nodepool> --kubernetes-version <version>. Plan coordinated upgrades to maintain version consistency and avoid compatibility issues. | Learn More |
| AKSBP013 | No B-Series VMs in Node Pools | High | Best Practices | ✅ PASS | | No issues detected. | Replace B-series VMs with consistent performance SKUs like Standard_D2s_v5 or Standard_E2s_v5. Create new node pool with az aks nodepool add --vm-size Standard_D2s_v5, migrate workloads using kubectl drain, then delete old pool with az aks nodepool delete. | Learn More |
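Three of the failed checks above (AKSBP008, AKSBP009, AKSBP015) are remediated with `az aks update`, so they can be applied in one pass. The following is a minimal sketch, not a definitive procedure: the `RG` and `CLUSTER` values are assumed placeholders, `run` is an illustrative helper that only prints each command unless `DRY_RUN=0` is set, and the flags are the ones quoted in the recommendations (availability of `--safeguards-level` may depend on your Azure CLI version or extensions).

```shell
#!/usr/bin/env bash
set -eu

# Assumed placeholders: substitute your own resource group and cluster name.
RG="${RG:-my-resource-group}"
CLUSTER="${CLUSTER:-my-aks-cluster}"

# Illustrative helper: print the command when DRY_RUN=1 (the default),
# execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# AKSBP008: automatic Kubernetes patch upgrades.
run az aks update --resource-group "$RG" --name "$CLUSTER" --auto-upgrade-channel patch

# AKSBP009: automatic node image upgrades.
run az aks update --resource-group "$RG" --name "$CLUSTER" --node-os-upgrade-channel NodeImage

# AKSBP015: Deployment Safeguards in warning-only mode.
run az aks update --resource-group "$RG" --name "$CLUSTER" --safeguards-level Warning
```

Review the printed commands first, then rerun with `DRY_RUN=0` once the values are correct. Starting Deployment Safeguards at Warning rather than Enforcement surfaces non-compliant workloads without blocking them.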
Disaster Recovery (0/2 failed)
| ID | Check | Severity | Category | Status | Observed Value | Fail Message | Recommendation | URL |
|---|---|---|---|---|---|---|---|---|
| AKSDR001 | Agent Pools with Availability Zones | High | Disaster Recovery | ✅ PASS | | No issues detected. | Deploy node pools across availability zones using az aks nodepool add --availability-zones 1 2 3 --resource-group <rg> --cluster-name <cluster> --name <pool>. Ensure at least 3 zones are used for production workloads to achieve 99.95% SLA and protect against datacenter failures. | Learn More |
| AKSDR002 | Control Plane SLA | Medium | Disaster Recovery | ✅ PASS | | No issues detected. | Upgrade to Standard tier using az aks update --resource-group <rg> --name <cluster> --tier Standard to get 99.95% uptime SLA, financially-backed availability guarantees, and improved support. This is essential for production workloads requiring high availability. | Learn More |
Identity & Access (0/7 failed)
| ID | Check | Severity | Category | Status | Observed Value | Fail Message | Recommendation | URL |
|---|---|---|---|---|---|---|---|---|
| AKSIAM001 | RBAC Enabled | High | Identity & Access | ✅ PASS | | No issues detected. | Enable RBAC during cluster creation using --enable-rbac or for existing clusters via Azure Portal. Create RoleBindings and ClusterRoleBindings to assign appropriate permissions to users and service accounts based on the principle of least privilege. | Learn More |
| AKSIAM002 | Managed Identity | High | Identity & Access | ✅ PASS | | No issues detected. | Create a user-assigned managed identity using az identity create --resource-group <rg> --name <identity-name> and associate it during cluster creation with --assign-identity <identity-resource-id>. This eliminates the need to manage service principal credentials and provides better security. | Learn More |
| AKSIAM003 | Workload Identity Enabled | Medium | Identity & Access | ✅ PASS | | No issues detected. | Enable Workload Identity using az aks update --resource-group <rg> --name <cluster> --enable-workload-identity (requires OIDC issuer). Create Kubernetes service accounts and federate them with Azure managed identities for secure, token-based authentication to Azure services. | Learn More |
| AKSIAM004 | Managed Identity Used | High | Identity & Access | ✅ PASS | | No issues detected. | Migrate from Service Principal to User-Assigned Managed Identity using az aks update --resource-group <rg> --name <cluster> --assign-identity <identity-resource-id>. This provides automatic credential rotation and eliminates the need to manage client secrets. | Learn More |
| AKSIAM005 | AAD RBAC Authorization Integrated | High | Identity & Access | ✅ PASS | | No issues detected. | Enable Azure RBAC for Kubernetes authorization using az aks update --resource-group <rg> --name <cluster> --enable-azure-rbac. Assign built-in roles like 'Azure Kubernetes Service RBAC Reader/Writer/Admin' to users and groups for centralized access management through Azure AD. | Learn More |
| AKSIAM006 | AAD Managed Authentication Enabled | High | Identity & Access | ✅ PASS | | No issues detected. | Enable Azure AD integration during cluster creation with --enable-aad --aad-admin-group-object-ids <group-id> or update existing cluster using az aks update --resource-group <rg> --name <cluster> --enable-aad. Configure admin groups and integrate with conditional access policies. | Learn More |
| AKSIAM007 | Local Accounts Disabled | High | Identity & Access | ✅ PASS | | No issues detected. | Disable local accounts using az aks update --resource-group <rg> --name <cluster> --disable-local-accounts. This enforces authentication exclusively through Azure AD, eliminating certificate-based admin access and improving audit capabilities. | Learn More |
Monitoring & Logging (0/2 failed)
| ID | Check | Severity | Category | Status | Observed Value | Fail Message | Recommendation | URL |
|---|---|---|---|---|---|---|---|---|
| AKSMON001 | Azure Monitor | High | Monitoring & Logging | ✅ PASS | | No issues detected. | Enable Azure Monitor Container Insights using az aks enable-addons --resource-group <rg> --name <cluster> --addons monitoring --workspace-resource-id <workspace-id> or through Azure Portal > Monitoring > Insights. Configure log retention (90+ days) and set up alerts for container failures and resource usage. | Learn More |
| AKSMON002 | Managed Prometheus Enabled | High | Monitoring & Logging | ✅ PASS | | No issues detected. | Enable managed Prometheus using az aks update --resource-group <rg> --name <cluster> --enable-azure-monitor-metrics or via Azure Portal > Monitoring > Insights. Consider integrating with Azure Managed Grafana for advanced dashboards and setting up alerting rules for critical metrics. | Learn More |
Networking (2/4 failed)
| ID | Check | Severity | Category | Status | Observed Value | Fail Message | Recommendation | URL |
|---|---|---|---|---|---|---|---|---|
| AKSNET001 | Authorized IP Ranges Configured (Public Clusters) | High | Networking | ❌ FAIL | false | API server accepts connections from any internet IP address, creating a large attack surface for brute force attacks, credential stuffing, and vulnerability exploitation. This violates network security best practices and most compliance frameworks. | Configure authorized IP ranges using az aks update --resource-group <rg> --name <cluster> --api-server-authorized-ip-ranges <ip-ranges>. Include management networks, CI/CD systems, and jump boxes using CIDR notation. Alternatively, migrate to a private cluster for enhanced security. | Learn More |
| AKSNET003 | Web App Routing Enabled | Low | Networking | ❌ FAIL | false | Web App Routing add-on is disabled, requiring manual ingress controller management, DNS configuration, and SSL certificate handling. This increases operational overhead and may lead to inconsistent external access patterns and security configurations. | Enable Web App Routing using az aks enable-addons --resource-group <rg> --name <cluster> --addons web_application_routing. Configure DNS zones and SSL certificates for automatic ingress management. Consider using Application Gateway Ingress Controller (AGIC) for enterprise scenarios. | Learn More |
| AKSNET002 | Network Policy Check | Medium | Networking | ✅ PASS | | No issues detected. | Enable network policy during cluster creation with --network-policy azure (Azure CNI) or --network-policy calico (kubenet). Create NetworkPolicy resources to define ingress/egress rules for pods, implementing micro-segmentation and zero-trust networking principles. | Learn More |
| AKSNET004 | Azure CNI with Cilium Dataplane Recommended | Medium | Networking | ✅ PASS | | No issues detected. | For new clusters, use --network-plugin azure --network-dataplane cilium --network-plugin-mode overlay for optimal performance. Azure CNI powered by Cilium provides eBPF-based packet processing, better scalability, and advanced L3-L7 network policies. Existing clusters should migrate by creating a new cluster with Cilium enabled. | Learn More |
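For AKSNET001, the authorized ranges are passed as a single comma-separated list of CIDR blocks. A minimal sketch, assuming placeholder resource names and using documentation-only address ranges (203.0.113.0/24 and 198.51.100.10/32) in place of your real management and CI/CD networks; `run` is an illustrative helper that prints the command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -eu

# Assumed placeholders: substitute your own resource group and cluster name.
RG="${RG:-my-resource-group}"
CLUSTER="${CLUSTER:-my-aks-cluster}"

# Assumed example ranges (reserved documentation addresses, not real networks).
# Replace with the CIDR blocks of your management network, CI/CD egress, and jump boxes.
RANGES="${RANGES:-203.0.113.0/24,198.51.100.10/32}"

# Illustrative helper: print the command when DRY_RUN=1 (the default),
# execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# AKSNET001: restrict API server access to the listed ranges.
run az aks update --resource-group "$RG" --name "$CLUSTER" \
  --api-server-authorized-ip-ranges "$RANGES"
```

Make sure the range you are currently connecting from is in the list before applying, otherwise later `kubectl` and `az aks` calls against the API server from that address will be rejected.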
Resource Management (1/5 failed)
| ID | Check | Severity | Category | Status | Observed Value | Fail Message | Recommendation | URL |
|---|---|---|---|---|---|---|---|---|
| AKSRES002 | AKS Built-in Cost Tooling Enabled | Medium | Resource Management | ❌ FAIL | false | Cost analysis and OpenCost integration is disabled, providing no visibility into per-namespace, per-workload, or per-application spending. This makes it impossible to implement cost allocation, identify expensive workloads, optimize resource usage, or implement chargeback policies for different teams. | Enable cost analysis using az aks update --resource-group <rg> --name <cluster> --enable-cost-analysis to track namespace and workload-level costs. Use the cost insights to identify expensive workloads, optimize resource requests, and implement chargeback/showback policies. | Learn More |
| AKSRES001 | Cluster Autoscaler | Medium | Resource Management | ✅ PASS | | No issues detected. | Enable Cluster Autoscaler using az aks update --resource-group <rg> --name <cluster> --enable-cluster-autoscaler --min-count <min> --max-count <max> on node pools. Configure appropriate min/max node counts, scale-down parameters, and node pool priorities for optimal cost and performance balance. | Learn More |
| AKSRES003 | Vertical Pod Autoscaler (VPA) is enabled | Medium | Resource Management | ✅ PASS | | No issues detected. | Enable VPA using az aks update --resource-group <rg> --name <cluster> --enable-vpa. Deploy VPA objects with 'updateMode: Auto' or 'Off' for recommendations only. Monitor VPA recommendations and adjust application resource requests/limits accordingly for better resource efficiency. | Learn More |
| AKSRES004 | KEDA (Event-Driven Autoscaling) Enabled | Low | Resource Management | ✅ PASS | | No issues detected. | Enable KEDA using az aks update --resource-group <rg> --name <cluster> --enable-keda. Deploy ScaledObject resources to define event sources (Azure Queue, Service Bus, Kafka, HTTP, etc.) and scaling behavior. KEDA complements HPA by enabling scale-to-zero and event-driven scaling patterns. | Learn More |
| AKSRES005 | Node Auto-provisioning or Cluster Autoscaler Configured | High | Resource Management | ✅ PASS | | No issues detected. | Enable Node Auto-provisioning using az aks update --resource-group <rg> --name <cluster> --node-provisioning-mode Auto for Karpenter-based dynamic provisioning. Alternatively, enable Cluster Autoscaler with az aks update --enable-cluster-autoscaler. NAP is recommended for modern workloads with diverse resource requirements. | Learn More |
Security (3/8 failed)
| ID | Check | Severity | Category | Status | Observed Value | Fail Message | Recommendation | URL |
|---|---|---|---|---|---|---|---|---|
| AKSSEC001 | Private Cluster | High | Security | ❌ FAIL | false | API server is publicly accessible from the internet, exposing your cluster to potential attacks, unauthorized access attempts, and compliance violations. This creates a significant security risk as attackers can attempt to exploit Kubernetes API vulnerabilities. | Configure as a private cluster using az aks create --enable-private-cluster or az aks update --enable-private-cluster for existing clusters. This routes API server traffic through private endpoints within your VNet. Configure private DNS zones and ensure network connectivity from management machines. | Learn More |
| AKSSEC006 | Image Cleaner Enabled | Medium | Security | ❌ FAIL | false | Image Cleaner is disabled, allowing stale and potentially vulnerable container images to accumulate on node disks. This increases storage costs, extends attack surface with outdated images containing known CVEs, and can impact node performance due to disk space consumption. | Enable Image Cleaner using az aks update --resource-group <rg> --name <cluster> --enable-image-cleaner. Configure cleaning interval and retention policies to automatically remove unused container images and reduce attack surface. | Learn More |
| AKSSEC008 | Pod Security Admission Enabled | High | Security | ❌ FAIL | false | Pod Security Admission is not configured on this cluster, meaning there are no built-in Kubernetes security controls to prevent insecure pod configurations. Without PSA, pods can run with dangerous settings like privileged mode, host network access, or unsafe capabilities, increasing container escape risks. | Configure Pod Security Admission by setting pod security standards on namespaces. Use kubectl label namespace <namespace> pod-security.kubernetes.io/enforce=restricted pod-security.kubernetes.io/audit=restricted pod-security.kubernetes.io/warn=restricted for production namespaces. Consider 'baseline' for less restrictive environments. This is separate from Azure Policy and provides Kubernetes-native security controls. | Learn More |
| AKSSEC002 | Azure Policy Add-on | Medium | Security | ✅ PASS | | No issues detected. | Enable Azure Policy add-on using az aks enable-addons --resource-group <rg> --name <cluster> --addons azure-policy. Deploy built-in policy initiatives like 'Kubernetes cluster pod security restricted standards' and create custom policies for your organization's requirements. | Learn More |
| AKSSEC003 | Defender for Containers | High | Security | ✅ PASS | | No issues detected. | Enable Defender for Containers using az aks update --resource-group <rg> --name <cluster> --enable-defender or through Security Center in Azure Portal. Configure vulnerability scanning, runtime threat detection, and compliance monitoring for comprehensive container security. | Learn More |
| AKSSEC004 | OIDC Issuer Enabled | Medium | Security | ✅ PASS | | No issues detected. | Enable OIDC issuer using az aks update --resource-group <rg> --name <cluster> --enable-oidc-issuer. This enables workload identity federation, allowing pods to authenticate to Azure services using service account tokens instead of secrets. | Learn More |
| AKSSEC005 | Azure Key Vault Integration | High | Security | ✅ PASS | | No issues detected. | Enable Key Vault CSI driver using az aks enable-addons --resource-group <rg> --name <cluster> --addons azure-keyvault-secrets-provider. Create SecretProviderClass resources to mount secrets, certificates, and keys from Azure Key Vault as volumes in pods. | Learn More |
| AKSSEC007 | Kubernetes Dashboard Disabled | High | Security | ✅ PASS | | No issues detected. | Disable the Kubernetes dashboard using az aks disable-addons --addons kube-dashboard --resource-group <rg> --name <cluster>. Use Azure Portal, kubectl, or other secure management tools instead. If dashboard access is required, implement proper authentication and network restrictions. | Learn More |
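For AKSSEC008, the namespace labels from the recommendation can be previewed before they take effect. A minimal sketch, assuming a space-separated `NAMESPACES` list (the default value here is a placeholder, since the report does not name the target namespaces): `preview_level` uses a server-side dry run so the API server can warn about existing pods that would violate the level, and `apply_level` sets the enforce, audit, and warn labels. `run` is an illustrative helper that prints each command unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
set -eu

# Assumed placeholders: namespaces to label and the Pod Security Standard level.
NAMESPACES="${NAMESPACES:-default}"
LEVEL="${LEVEL:-restricted}"

# Illustrative helper: print the command when DRY_RUN=1 (the default),
# execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Server-side dry run: nothing is persisted; violations by existing pods
# are reported as warnings in the kubectl output.
preview_level() {
  run kubectl label --dry-run=server --overwrite namespace "$1" \
    "pod-security.kubernetes.io/enforce=$2"
}

# Persist the enforce, audit, and warn labels at the same level.
apply_level() {
  run kubectl label --overwrite namespace "$1" \
    "pod-security.kubernetes.io/enforce=$2" \
    "pod-security.kubernetes.io/audit=$2" \
    "pod-security.kubernetes.io/warn=$2"
}

for ns in $NAMESPACES; do
  preview_level "$ns" "$LEVEL"
  # Labels are only applied when APPLY=1 is set explicitly.
  if [ "${APPLY:-0}" = "1" ]; then
    apply_level "$ns" "$LEVEL"
  fi
done
```

Running with `DRY_RUN=0` alone performs only the preview; add `APPLY=1` once the warnings are resolved. Set `LEVEL=baseline` for namespaces that cannot yet meet the restricted profile.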
AKS Automatic Migration Readiness
Status: Not Ready
🚫 Blockers: 1
⚠️ Warnings: 3
✅ Aligned Checks: 8
This view is derived from existing Kubernetes and AKS shared checks and focuses on readiness for a new AKS Automatic cluster.
Fix Before Migration
| ID | Check | Affected | Recommendation | Examples |
|---|---|---|---|---|
| POD007 | Container images do not use latest tag | 4 | Specify an explicit image tag (e.g., ':v1.2.3') on every container and initContainer to ensure consistent deployments. | pod/order-service-65cc8855c-ghk9m, pod/product-service-77ff9f6fd6-rzcxj, pod/store-front-698cc8c565-f5hp5 |
Warnings
| ID | Check | Affected | Recommendation | Examples |
|---|---|---|---|---|
| SEC003 | Pods Running as Root | 12 | Avoid running pods as root by explicitly setting runAsUser to a non-zero UID in pod or container securityContext. | pod/order-service-65cc8855c-ghk9m, pod/product-service-77ff9f6fd6-rzcxj, pod/rabbitmq-5dcdf9484-kvgw7, pod/store-front-698cc8c565-f5hp5 |
| SEC020 | Seccomp Profile Not Configured | 5 | Set seccompProfile.type to RuntimeDefault or Localhost at the pod or container level. | pod/order-service-65cc8855c-ghk9m, pod/product-service-77ff9f6fd6-rzcxj, pod/rabbitmq-5dcdf9484-kvgw7, pod/store-front-698cc8c565-f5hp5 |
| WRK007 | Missing Readiness and Liveness Probes | 4 | Add readiness and liveness probes to all containers to improve availability and fault detection. | deployment/order-service, deployment/product-service, deployment/rabbitmq, deployment/store-front |
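The SEC003 and SEC020 warnings can both be addressed with a pod-level `securityContext`. A minimal sketch, assuming the four deployments listed in the tables live in one namespace (`NAMESPACE` is a placeholder) and that their images can run as an unprivileged user; the UID 10001 is an arbitrary assumed value, and images such as rabbitmq may need a specific UID or writable paths. `run` is an illustrative helper that prints each command unless `DRY_RUN=0` is set. Probes (WRK007) and image tags (POD007) are application-specific and are not covered here.

```shell
#!/usr/bin/env bash
set -eu

# Assumed placeholders: namespace and non-root UID are not given in the report.
NAMESPACE="${NAMESPACE:-default}"
RUN_AS_USER="${RUN_AS_USER:-10001}"

# Deployment names taken from the Examples column above.
DEPLOYMENTS="order-service product-service rabbitmq store-front"

# Illustrative helper: print the command when DRY_RUN=1 (the default),
# execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Strategic merge patch adding a pod-level securityContext:
# non-root user (SEC003) and the runtime default seccomp profile (SEC020).
PATCH_FILE="$(mktemp)"
cat > "$PATCH_FILE" <<EOF
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: $RUN_AS_USER
        seccompProfile:
          type: RuntimeDefault
EOF

for d in $DEPLOYMENTS; do
  run kubectl patch deployment "$d" --namespace "$NAMESPACE" --patch-file "$PATCH_FILE"
done
```

Patching the pod template triggers a rolling restart of each deployment, so it is worth trying one workload first and confirming it starts cleanly as a non-root user before patching the rest.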