Cluster Overview

Cluster Name: aks-181225-test-uks

Cluster Health Score

Score: 81 / 100

This score is calculated from key checks across nodes, workloads, security, and configuration best practices. A higher score means fewer issues and better adherence to Kubernetes standards.

API Server Health

Latency (p99): 5.00 ms

Liveness: livez check passed

[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/start-system-namespaces-controller ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-legacy-token-tracking-controller ok
[+]poststarthook/start-service-ip-repair-controllers ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/start-kubernetes-service-cidr-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-status-local-available-controller ok
[+]poststarthook/apiservice-status-remote-available-controller ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-discovery-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
livez check passed

Readiness: readyz check passed

[+]ping ok
[+]log ok
[+]etcd ok
[+]etcd-readiness ok
[+]informer-sync ok
[+]poststarthook/start-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/storage-object-count-tracker-hook ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/start-system-namespaces-controller ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-controller ok
[+]poststarthook/start-kube-apiserver-identity-lease-garbage-collector ok
[+]poststarthook/start-legacy-token-tracking-controller ok
[+]poststarthook/start-service-ip-repair-controllers ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/start-kubernetes-service-cidr-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-status-local-available-controller ok
[+]poststarthook/apiservice-status-remote-available-controller ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-discovery-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]poststarthook/apiservice-openapiv3-controller ok
[+]shutdown ok
readyz check passed

Passed / Failed Checks

0/0 Passed

This shows the number of health checks that passed out of the total checks performed across the cluster. A higher pass rate indicates better overall cluster health.

Top 5 Improvements

These are the five checks whose remediation will yield the most immediate benefit to your overall Cluster Health Score. Each card shows the cluster score points you’ll recover by fixing it.

RBAC002 - RBAC Overexposure: +4.62 pts
SEC003 - Pods Running as Root: +4.62 pts
WRK007 - Missing Readiness and Liveness Probes: +3.2 pts
WRK002 - Deployment Missing Replicas: +2.4 pts
SEC006 - Pods Missing Secure Defaults: +2.4 pts

Issue Summary

This section shows how many checks failed at each severity level in the last run, with the failing checks listed under each level.

Critical

7 checks failed

NET001 - Services Without Endpoints (Networking)
NET002 - Publicly Accessible Services (Networking)
POD007 - Container images do not use latest tag (Pods)
RBAC002 - RBAC Overexposure (Security)
SEC003 - Pods Running as Root (Security)
SEC011 - Containers Running as UID 0 (Security)
SEC014 - Untrusted Image Registries (Security)

Warning

18 checks failed

CFG002 - Duplicate ConfigMap Names (Configuration Hygiene)
EVENT001 - Grouped Warning Events (Kubernetes Events)
EVENT002 - Full Warning Event Log (Kubernetes Events)
NET004 - Namespace Missing Network Policy (Networking)
NODE003 - Max Pods per Node (Nodes)
NS002 - Missing or Weak ResourceQuotas (Namespaces)
NS003 - Missing LimitRanges (Namespaces)
POD004 - Pending Pods (Pods)
POD008 - Automounting API Credentials Enabled in Pods (Pods)
RBAC003 - Orphaned ServiceAccounts (Security)
SC002 - AKS Azure In-Tree Storage Provisioners (Storage)
SC004 - StorageClass Prevents Volume Expansion (Storage)
SEC006 - Pods Missing Secure Defaults (Security)
SEC009 - Missing Capabilities Drop (Security)
SEC015 - Pods Using Default ServiceAccount (Security)
SEC020 - Seccomp Profile Not Configured (Security)
WRK002 - Deployment Missing Replicas (Workloads)
WRK007 - Missing Readiness and Liveness Probes (Workloads)

Info

3 checks failed

NS001 - Empty Namespaces (Namespaces)
RBAC004 - Orphaned and Ineffective Roles (Security)
SEC007 - Missing Pod Security Admission Labels (Security)

Rightsizing at a Glance

Node Insights

🖥️ Underutilized Nodes: 0

🔥 Saturated Nodes: 0

✅ Right-sized Nodes: 3

Pod Actions

⚙️ CPU Request Changes: 1

🧠 Memory Request Changes: 1

🛡️ Memory Limit Changes: 1

🚫 CPU Limit Removals: 0

Impact Summary

🚀 High Impact: 0

📈 Medium Impact: 0

🧩 Low Impact: 1

🔗 Quick Links: PROM006 • PROM007

Excluded Namespaces

These namespaces are excluded from analysis and reporting.

aks-istio-system • calico-system • coredns • gatekeeper-system • kube-flannel • kube-node-lease • kube-public • kube-system • local-path-storage • tigera-operator

Cluster Summary

Cluster Name: aks-181225-test-uks

Kubernetes Version: v1.33.6

Cluster is running an outdated version: v1.33.6 (Latest: v1.35.3)

Cluster Metrics Summary

Summary of metrics including node and pod counts, warnings, and issues.

🚀 Nodes: 3	🟩 Healthy: 3	🟥 Issues: 0
📦 Pods: 67	🟩 Running: 63	🟥 Failed: 0
🔄 Restarts: 22	🟨 Warnings: 5	🟥 Critical: 0
⏳ Pending Pods: 4	🟡 Waiting: 4
⚠️ Stuck Pods: 4	❌ Stuck: 4
📉 Job Failures: 0	🔴 Failed: 0

Pod Distribution

Average, min, and max pods per node and total node count.

Avg: 21.0

Max: 42

Min: 10

Total Nodes: 3

Cluster Health Metrics (Last 24h)

24-hour Prometheus averages and charts for cluster CPU and memory usage.

Avg CPU: 5.23%

Avg Memory: 19.58%

Cluster CPU Usage (%)

Historical CPU metrics from Prometheus, averaged over the last 24 hours.

Cluster Memory Usage (%)

Historical memory metrics from Prometheus, averaged over the last 24 hours.

Cluster Events

Errors: 4

Warnings: 4

Node Conditions & Resources

NODE001 - Node Readiness and Conditions

Detects nodes that are not in Ready state or reporting other warning conditions.

✅ All Nodes are healthy.

Show Findings

Node	Status	Issues
aks-systempool-39088964-vmss00000k	✅ Healthy	None
aks-systempool-39088964-vmss00000l	✅ Healthy	None
aks-systempool-39088964-vmss00000m	✅ Healthy	None

NODE002 - Node Resource Pressure (Last 24h)

Detects nodes under high CPU, memory, or disk pressure.

Data source: Prometheus (24h average)

✅ All Nodes are healthy.

Show Findings

Node	CPU Status	CPU %	CPU Used	CPU Total	Mem Status	Mem %	Mem Used	Mem Total	Disk %	Disk Status
aks-systempool-39088964-vmss00000k	✅ Normal	7.80%	301 mC	3860 mC	✅ Normal	32.9%	4888 Mi	14846 Mi	21.55%	✅ Normal
aks-systempool-39088964-vmss00000l	✅ Normal	3.64%	140 mC	3860 mC	✅ Normal	12.7%	1887 Mi	14850 Mi	20.53%	✅ Normal
aks-systempool-39088964-vmss00000m	✅ Normal	4.26%	164 mC	3860 mC	✅ Normal	13.1%	1943 Mi	14846 Mi	20.49%	✅ Normal

NODE003 - Max Pods per Node

Alerts when any node is running too many pods according to configured thresholds.

⚠️ Total Nodes with Issues: 1

Show Recommendations

Run kubectl get pods -o wide --all-namespaces and group by .spec.nodeName to see pod distribution.
Use kubectl describe node <node-name> to inspect allocatable pods and taints.
Consider tuning the kubelet’s --max-pods flag if you need higher density.
Scale out your node pool or add additional nodes to balance the load.

Docs: Kubernetes Nodes

Show Findings

Node	PodCount	Capacity	Percentage	Threshold	Status
aks-systempool-39088964-vmss00000k	42	50	84.00%	80%	Warning

PROM005 - Overcommitted CPU (Prometheus)

Checks if CPU requests on nodes exceed allocatable capacity over the last 24 hours.

✅ All Nodes are healthy.

PROM006 - Node Sizing Insights (Prometheus)

Uses Prometheus p95 CPU and memory usage over a fixed 7-day window to highlight underutilized or saturated nodes and suggest sizing actions.

✅ All Nodes are healthy.

📅 Insufficient Prometheus history for sizing. Required: 7 days, available: 5.75 days.

Show Recommendations

Node Sizing Guidance

Focus on sustained p95 trends, not short spikes.
Sizing window is fixed to 7 days for stable, lower-cost query execution.
Nodes flagged as underutilized are candidates for smaller SKUs or scale-in.
Nodes flagged as saturated likely need larger SKUs, scale-out, or workload rebalancing.
Validate with workload requests/limits and HPA/VPA behavior before applying changes.

Docs: Kubernetes Node Autoscaling

Show Findings

Status	Required Days	Available Days	Message
Insufficient Prometheus history	7	5.8	Node sizing recommendations are withheld until at least 7 days of Prometheus history is available.

Node: aks-systempool-39088964-vmss00000k

OS: Microsoft Azure Linux 3.0
Kernel: 6.6.126.1-1.azl3
Kubelet: v1.33.6
Runtime: containerd://2.0.0

CPU: 7.80%

Memory: 32.93%

Disk: 21.55%

Node: aks-systempool-39088964-vmss00000l

OS: Microsoft Azure Linux 3.0
Kernel: 6.6.126.1-1.azl3
Kubelet: v1.33.6
Runtime: containerd://2.0.0

CPU: 3.64%

Memory: 12.71%

Disk: 20.53%

Node: aks-systempool-39088964-vmss00000m

OS: Microsoft Azure Linux 3.0
Kernel: 6.6.126.1-1.azl3
Kubelet: v1.33.6
Runtime: containerd://2.0.0

CPU: 4.26%

Memory: 13.09%

Disk: 20.49%

Namespaces

NS001 - Empty Namespaces

Finds namespaces with no running pods.

⚠️ Total Namespaces with Issues: 1

Show Recommendations

Check if any other resources (PVCs, Secrets) exist before deleting.
Use kubectl get all -n <namespace> to inspect.
Clean up empty namespaces to reduce clutter.

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
default	namespace/default	⚠️ Partial	No pods, but other resources exist

NS002 - Missing or Weak ResourceQuotas

Detects namespaces with missing or incomplete ResourceQuota definitions.

⚠️ Total ResourceQuotas with Issues: 2

Show Recommendations

Define limits using ResourceQuota for pods, memory, and CPU.
Helps avoid over-provisioning and noisy neighbor issues.
Review quotas using kubectl describe quota -n <namespace>.

Docs: Reference
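
As a rough illustration of the recommendation above, a ResourceQuota for the flagged azure-store namespace might look like the following. The object name and every numeric value are placeholders chosen for the sketch, not figures derived from this report; size them from observed usage.

```yaml
# Minimal sketch: caps pod count and aggregate requests/limits in one namespace.
# The name and all quantities below are assumed placeholders.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: azure-store-quota
  namespace: azure-store
spec:
  hard:
    pods: "20"
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.memory: 8Gi
```

Once a quota constrains requests.cpu or requests.memory, pods that omit those requests are rejected at admission unless a LimitRange supplies defaults, so the two objects are usually introduced together.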

Show Findings

Namespace	Resource	Value	Message
azure-store	namespace/azure-store		❌ No ResourceQuota
default	namespace/default		❌ No ResourceQuota

NS003 - Missing LimitRanges

Detects namespaces without a defined LimitRange.

⚠️ Total LimitRanges with Issues: 2

Show Recommendations

LimitRanges define default and max values for CPU/memory.
Prevents pods from using unlimited resources.
Use kubectl create limitrange ... or kubectl describe limitrange -n <namespace>.

Docs: Reference
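
A LimitRange along these lines would give containers in azure-store default requests and a default memory limit when their manifests omit them. This is a sketch only: the name and all quantities are assumptions, and no default CPU limit is set, in line with the pod sizing guidance elsewhere in this report.

```yaml
# Minimal sketch: per-container defaults and a memory ceiling for one namespace.
# The name and all quantities below are assumed placeholders.
apiVersion: v1
kind: LimitRange
metadata:
  name: azure-store-defaults
  namespace: azure-store
spec:
  limits:
    - type: Container
      defaultRequest:        # applied when a container sets no request
        cpu: 50m
        memory: 64Mi
      default:               # applied when a container sets no limit
        memory: 256Mi
      max:                   # largest limit a container may declare
        memory: 1Gi
```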

Show Findings

Namespace	Resource	Value	Message
azure-store	namespace/azure-store		❌ No LimitRange
default	namespace/default		❌ No LimitRange

NS004 - Pods in Default Namespace

Flags pods running in the default namespace.

✅ All Pods are healthy.

Show Recommendations

Use kubectl get pods -n default to list them.
Re-deploy your workloads into a custom namespace:

kubectl create namespace my-app
kubectl -n my-app apply -f your-manifests.yaml

Docs: Reference

Workloads

WRK001 - DaemonSets Not Fully Running

Detects DaemonSets that have fewer ready pods than desired.

✅ All Workloads are healthy.

Show Recommendations

Run kubectl describe ds <name> -n <namespace> to check for scheduling issues.
Check node taints and conditions.
Ensure resource requests are not too high for nodes.

Docs: Reference

WRK002 - Deployment Missing Replicas

Detects Deployments where available replicas are less than desired.

⚠️ Total Workloads with Issues: 4

Show Recommendations

Run kubectl describe deployment <name> -n <namespace> to view status.
Check for failed pods using kubectl get pods -n <namespace>.
Review rollout and events for delays or crashes.

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
azure-store	deployment/order-service	0/1	Deployment has fewer available replicas than desired.
azure-store	deployment/product-service	0/1	Deployment has fewer available replicas than desired.
azure-store	deployment/rabbitmq	0/1	Deployment has fewer available replicas than desired.
azure-store	deployment/store-front	0/1	Deployment has fewer available replicas than desired.

WRK003 - StatefulSet Incomplete Rollout

Detects StatefulSets with fewer ready replicas than desired.

✅ All Workloads are healthy.

Show Recommendations

Run kubectl describe sts <name> -n <namespace> to view rollout and events.
Check pod logs and PersistentVolumeClaim bindings.
Confirm storage class availability and node scheduling constraints.

Docs: Reference

WRK004 - HPA Misconfiguration or Inactivity

Checks for HPAs with missing targets, metrics, or inactive scaling.

✅ All Workloads are healthy.

Show Recommendations

Check if the target workload exists using kubectl get deploy,sts -n <namespace>.
Use kubectl describe hpa <name> -n <namespace> to inspect HPA status and events.
Ensure metrics-server is running and the target exposes the required metrics.

Docs: Reference

WRK005 - Missing Resource Requests

Checks that every container has explicit CPU and memory requests.

✅ All Workloads are healthy.

Show Recommendations

Add resources.requests.cpu and resources.requests.memory to every container.
Review both workload and initContainers with kubectl get deploy,statefulset,daemonset -A -o yaml.
Apply any missing fields, then rerun KubeBuddy to confirm.

Docs: Reference

WRK006 - PDB Coverage and Effectiveness

Detects missing or weak PodDisruptionBudgets.

✅ All Workloads are healthy.

Show Recommendations

Set minAvailable to a safe minimum (not 0).
Avoid setting maxUnavailable to 100%, which allows every pod to be evicted at once.
Make sure PDBs match actual workloads via label selectors.

Docs: Reference

WRK007 - Missing Readiness and Liveness Probes

Detects containers without readiness or liveness probes.

⚠️ Total Workloads with Issues: 4

Show Recommendations

Readiness probes indicate when a container is ready to receive traffic.
Liveness probes detect if a container is stuck or dead.
Use httpGet, tcpSocket, or exec probes for most apps.
Docs: Health probes in Kubernetes

Docs: Reference
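
One possible way to add both probes to a flagged Deployment is a strategic merge patch like the sketch below, shown for store-front. The /health path, port 8080, and all timing values are assumptions made for illustration; confirm what the application actually serves before applying anything.

```yaml
# probes-patch.yaml: strategic merge patch; containers are matched by name.
# Path, port, and timings are assumed values, not taken from this report.
spec:
  template:
    spec:
      containers:
        - name: store-front
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
            failureThreshold: 3
```

It could be applied with kubectl patch deployment store-front -n azure-store --patch-file probes-patch.yaml, though the lasting fix belongs in the source manifest or Helm values.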

Show Findings

Namespace	Resource	Value	Message
azure-store	deployment/order-service	order-service	readiness, liveness missing
azure-store	deployment/product-service	product-service	readiness, liveness missing
azure-store	deployment/rabbitmq	rabbitmq	readiness, liveness missing
azure-store	deployment/store-front	store-front	readiness, liveness missing

WRK008 - Deployment Selector Without Matching Pods

Detects Deployments whose selectors do not match any existing pods.

✅ All Workloads are healthy.

Show Recommendations

Check that Deployment's spec.selector.matchLabels matches the pod template's labels.
Fix any label mismatches to allow pods to be created.

Docs: Reference

WRK009 - Deployment, Pod, and Service Label Consistency

Validates that deployments, pods, and services use aligned labels and selectors.

✅ All Workloads are healthy.

Show Recommendations

Deployment spec.selector.matchLabels must match the Pod template metadata.labels.
Services should have spec.selector that targets the same labels used by the Deployment and Pods.
Use kubectl get deployment,svc,pod -o yaml to compare values and fix mismatches.

Docs: Reference

WRK010 - HPA Metrics Without Matching Resource Requests

Detects HPAs that scale on CPU or memory metrics when target containers lack matching requests.

✅ All Workloads are healthy.

Show Recommendations

Add resources.requests.cpu and/or resources.requests.memory for HPA target containers.
Use consistent requests across replicas to avoid unstable scaling behavior.
After updates, validate HPA behavior with kubectl describe hpa.

Docs: Reference

WRK011 - VPA Update Mode and Declarative Resource Conflict Risk

Flags VPAs in Auto/Recreate mode that may conflict with declarative resource ownership or HPAs.

✅ All Workloads are healthy.

Show Recommendations

If GitOps/Helm controls requests, consider VPA updateMode: Off or Initial.
Avoid overlapping HPA (CPU/memory) and VPA ownership without clear boundaries.
Document which controller owns requests per workload.

Docs: Reference
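
For reference, a recommendation-only VPA could be sketched as below. It assumes the Vertical Pod Autoscaler CRDs are installed in the cluster, which this report does not confirm, and the object name is hypothetical.

```yaml
# Minimal sketch: VPA that only publishes recommendations and never evicts pods.
# Assumes the VPA CRDs (autoscaling.k8s.io/v1) are installed; the name is hypothetical.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: order-service-vpa
  namespace: azure-store
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  updatePolicy:
    updateMode: "Off"
```

With updateMode set to "Off", requests stay owned by the manifests in Git or Helm, and the VPA status can be read as a sizing hint.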

WRK012 - PodDisruptionBudget Adequacy for Replicated Workloads

Validates that replicated workloads have matching PDBs with sensible settings.

✅ All Workloads are healthy.

Show Recommendations

Ensure replicated workloads (2+ replicas) have a matching PDB.
Avoid minAvailable equal to replica count for normal maintenance windows.
Use pragmatic budgets (for example maxUnavailable: 1 for many workloads).

Docs: Reference
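
A budget of the kind described above might be written as follows. The app: store-front label is an assumption about how the workload is labelled, and the budget is only meaningful once the Deployment runs two or more replicas (this report shows store-front with one desired replica).

```yaml
# Minimal sketch: allow at most one pod to be voluntarily evicted at a time.
# The name and the selector label are assumptions; match them to the real pod labels.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: store-front-pdb
  namespace: azure-store
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: store-front
```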

WRK013 - CrashLoopBackOff and OOMKilled Guardrail

Flags pods with CrashLoopBackOff, OOMKilled state, or high restart counts.

✅ All Workloads are healthy.

Show Recommendations

Investigate container logs and termination reasons for recurring restarts.
Increase memory requests/limits when OOMKilled events are observed.
Apply sizing changes gradually and validate SLO/error rates.

Docs: Reference

WRK014 - Missing Memory Limits

Checks that every container has an explicit memory limit.

✅ All Workloads are healthy.

Show Recommendations

Add resources.limits.memory to every application and init container.
Set the limit high enough for normal peaks, then tune requests separately.
Review workloads with kubectl get deploy,statefulset,daemonset -A -o yaml to confirm the source manifests carry the limit.

Docs: Reference

WRK015 - Replicated Workloads Missing Spread Constraints

Detects replicated workloads that define neither anti-affinity nor topology spread constraints.

✅ All Workloads are healthy.

Show Recommendations

Add topologySpreadConstraints or affinity.podAntiAffinity to each workload with multiple replicas.
Prefer distribution across nodes and zones using stable labels such as topology.kubernetes.io/zone and kubernetes.io/hostname.
Update the source Deployment, StatefulSet, or Helm values so the spreading rule is maintained on future releases.

Docs: Reference
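
A spread rule of the kind recommended above could be sketched as a patch to a Deployment's pod template. The app: store-front label is assumed, and ScheduleAnyway is chosen here so that the rule stays a preference rather than a hard scheduling requirement.

```yaml
# spread-patch.yaml: strategic merge patch adding a per-node spread preference.
# The label selector is an assumption; it must match the pod template labels.
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: store-front
```

A second entry keyed on topology.kubernetes.io/zone would extend the same idea across zones where the node pool spans more than one.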

Pods

POD001 - Pods with High Restarts

Detects pods that have restarted more than the configured thresholds.

✅ All Pods are healthy.

Show Recommendations

Use kubectl logs <pod> -n <namespace> to view logs and identify crash causes.
Run kubectl describe pod <pod> -n <namespace> to check events and probe failures.
Verify readiness and liveness probes are configured properly.
Check for missing config, secrets, or volume mounts.
Adjust resource requests/limits to avoid OOM kills.

Docs: Reference

POD002 - Long Running Pods

Flags pods that have been running longer than configured thresholds.

✅ All Pods are healthy.

Show Recommendations

Pods with extended uptime may indicate skipped rolling updates.
Use kubectl rollout status to inspect deployment progress.
Restart pods when config changes are missed or memory use drifts.
Check if the workload is intended to be static or ephemeral.

Docs: Reference

POD003 - Failed Pods

Detects pods in a failed phase, typically due to startup errors, crashes, or misconfiguration.

✅ All Pods are healthy.

Show Recommendations

Check the pod events with kubectl describe pod <pod> -n <ns>
Review logs using kubectl logs <pod> -n <ns>
Validate container specs, resource limits, and init containers
Check node availability or taints

Docs: Reference

POD004 - Pending Pods

Detects pods stuck in a 'Pending' state due to scheduling or resource issues.

⚠️ Total Pods with Issues: 4

Show Recommendations

Run kubectl describe pod <pod> -n <namespace> to check scheduling events
Check if nodes meet the pod's resource requests and tolerations
Look for unresolved PVCs, Secrets, or ConfigMaps
Check cluster-wide CPU and memory availability

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
azure-store	pod/order-service-65cc8855c-ghk9m	Pending	Some pods are stuck in Pending. These workloads are not running and are waiting on cluster conditions.
azure-store	pod/product-service-77ff9f6fd6-rzcxj	Pending	Some pods are stuck in Pending. These workloads are not running and are waiting on cluster conditions.
azure-store	pod/rabbitmq-5dcdf9484-kvgw7	Pending	Some pods are stuck in Pending. These workloads are not running and are waiting on cluster conditions.
azure-store	pod/store-front-698cc8c565-f5hp5	Pending	Some pods are stuck in Pending. These workloads are not running and are waiting on cluster conditions.

POD005 - CrashLoopBackOff Pods

Identifies pods stuck in a CrashLoopBackOff state due to repeated container crashes.

✅ All Pods are healthy.

Show Recommendations

Run kubectl logs <pod-name> -n <namespace> to see error output
Describe the pod for events and messages: kubectl describe pod <pod> -n <ns>
Check init containers, config errors, and resource limits

Docs: Reference

POD006 - Leftover Debug Pods

Detects pods created by kubectl debug that have not been cleaned up.

✅ All Pods are healthy.

Show Recommendations

Run kubectl delete pod <pod> -n <namespace> to remove them
Ensure automation or users clean up after using kubectl debug

Docs: Reference

POD007 - Container images do not use latest tag

Flags containers using latest or no explicit tag.

⚠️ Total Pods with Issues: 4

Show Recommendations

🛠️ Use Specific Image Tags

Don't use the :latest tag or leave the image tag blank.
Why: It can pull different images on each deploy, leading to drift.
Fix: Tag images explicitly (e.g., :v1.2.3) and update the pod spec.
Docs: Kubernetes Image Tagging

Docs: Reference
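
To make the fix concrete, the findings for order-service could be addressed with a patch shaped like the sketch below. The tag values are deliberately left as a placeholder and an example: <version> must be replaced with a release that actually exists in the registry, and busybox:1.36 is just one possible pin.

```yaml
# image-pin-patch.yaml: strategic merge patch; containers are matched by name.
# <version> is a placeholder, not a real tag; replace it before applying.
spec:
  template:
    spec:
      initContainers:
        - name: wait-for-rabbitmq
          image: busybox:1.36
      containers:
        - name: order-service
          image: ghcr.io/azure-samples/aks-store-demo/order-service:<version>
```

Pinning by digest (image@sha256:...) is stricter still, at the cost of less readable manifests.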

Show Findings

Namespace	Resource	Value	Message
azure-store	pod/order-service-65cc8855c-ghk9m	ghcr.io/azure-samples/aks-store-demo/order-service:latest	Container order-service: Image uses latest tag
azure-store	pod/order-service-65cc8855c-ghk9m	busybox	Container wait-for-rabbitmq: Image omits explicit tag
azure-store	pod/product-service-77ff9f6fd6-rzcxj	ghcr.io/azure-samples/aks-store-demo/product-service:latest	Container product-service: Image uses latest tag
azure-store	pod/store-front-698cc8c565-f5hp5	ghcr.io/azure-samples/aks-store-demo/store-front:latest	Container store-front: Image uses latest tag

POD008 - Automounting API Credentials Enabled in Pods

Flags pods that do not explicitly disable service account token automounting.

⚠️ Total Pods with Issues: 4

Show Recommendations

🛠️ Disable Automounting API Credentials

Add automountServiceAccountToken: false to the Pod's spec.
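Set it on the owning workload's pod template, for example with kubectl edit deployment <name> -n <namespace>; the field cannot be changed on a running Pod.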
Verify if the application needs API access (e.g., for controllers).
Use RBAC to limit ServiceAccount permissions if access is required.

Docs: Reference
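
A minimal patch for this setting might look like the following sketch. It assumes the workload does not call the Kubernetes API; if it does, keep the token and narrow the ServiceAccount's RBAC instead.

```yaml
# automount-patch.yaml: strategic merge patch for a Deployment's pod template.
# Assumes the application needs no Kubernetes API access.
spec:
  template:
    spec:
      automountServiceAccountToken: false
```

For example, kubectl patch deployment rabbitmq -n azure-store --patch-file automount-patch.yaml would roll out new pods without the mounted token.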

Show Findings

Namespace	Resource	Value	Message
azure-store	pod/order-service-65cc8855c-ghk9m	<nil>	Pod automounts API credentials
azure-store	pod/product-service-77ff9f6fd6-rzcxj	<nil>	Pod automounts API credentials
azure-store	pod/rabbitmq-5dcdf9484-kvgw7	<nil>	Pod automounts API credentials
azure-store	pod/store-front-698cc8c565-f5hp5	<nil>	Pod automounts API credentials

PROM001 - High CPU Pods (Prometheus)

Checks for pods with sustained high CPU usage over the last 24 hours using Prometheus metrics.

✅ All Pods are healthy.

Show Recommendations

🛠️ Investigate High CPU Pods

Use kubectl top pod to see real-time CPU usage.
Review app code or HPA settings for misbehaving containers.
Consider raising CPU requests/limits or scaling out.

Docs: Reference

PROM002 - High Memory Usage Pods (Prometheus)

Detects pods with high memory usage over the last 24 hours based on Prometheus metrics.

✅ All Pods are healthy.

Show Recommendations

🛠️ Investigate High Memory Pods

Use kubectl top pod to review memory usage.
Adjust resources.limits.memory appropriately.

Docs: Reference

PROM003 - High Network Receive Rate (Prometheus)

Detects pods receiving large amounts of network traffic over the last 24 hours.

✅ All Pods are healthy.

Show Recommendations

🛠️ Investigate Network Receive Rate

Use the Prometheus UI to inspect per-pod receive rates; kubectl top pod reports only CPU and memory.
Inspect service ingress patterns.

Docs: Reference

PROM007 - Pod Sizing Insights (Prometheus)

Generates per-container CPU and memory sizing recommendations from fixed 7-day p95 Prometheus usage.

✅ All Pods are healthy.

📅 Insufficient Prometheus history for sizing. Required: 7 days, available: 5.75 days.

Show Recommendations

Pod Sizing Guidance

Set CPU and memory requests from p95 usage with safety headroom.
The default recommendation is to set no CPU limit, which avoids unnecessary CPU throttling and latency spikes.
Keep memory limits set above memory request to control OOM blast radius.
Validate against SLOs and roll out gradually.

Docs: Reference
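
The shape this guidance implies for a container's resources block is sketched below as a Deployment patch. The quantities are placeholders only: the report withholds sizing recommendations until 7 days of history exist, so none of these numbers come from measured p95 usage.

```yaml
# sizing-patch.yaml: strategic merge patch; containers are matched by name.
# All quantities are assumed placeholders, not recommendations from this report.
spec:
  template:
    spec:
      containers:
        - name: product-service
          resources:
            requests:
              cpu: 50m        # would come from p95 CPU plus headroom
              memory: 64Mi    # would come from p95 memory plus headroom
            limits:
              memory: 128Mi   # set above the memory request; no CPU limit
```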

Show Findings

Status	Required Days	Available Days	Message
Insufficient Prometheus history	7	5.8	Pod sizing recommendations are withheld until at least 7 days of Prometheus history is available.

Jobs

JOB001 - Stuck Kubernetes Jobs

Finds Jobs that have started but not completed within the threshold.

✅ All Jobs are healthy.

Show Recommendations

Check pod status for the job using kubectl describe job <job-name> -n <namespace>.
Verify resources and restart policies.
Check logs with kubectl logs job/<job-name> -n <namespace>.

Docs: Reference

JOB002 - Failed Kubernetes Jobs

Detects jobs with failures and no successful completions.

✅ All Jobs are healthy.

Show Recommendations

Inspect job with kubectl describe job <job-name> -n <namespace>.
Check logs for errors using kubectl logs job/<job-name> -n <namespace>.
Review pod events and resource limits.

Docs: Reference

Networking

NET001 - Services Without Endpoints

Identifies services that have no backing endpoints.

⚠️ Total Networking with Issues: 4

Show Recommendations

🔍 Services Without Endpoints

Verify that your service has a valid selector.
Check if pods exist and are ready in the same namespace.
Use kubectl describe svc <name> and kubectl get endpointslices -n <namespace> -l kubernetes.io/service-name=<name>.
Restart affected pods or fix labels as needed.

Docs: Reference

Show Findings

Namespace	Resource	Message
azure-store	service/order-service	No endpoints or endpoint slices
azure-store	service/product-service	No endpoints or endpoint slices
azure-store	service/rabbitmq	No endpoints or endpoint slices
azure-store	service/store-front	No endpoints or endpoint slices

NET002 - Publicly Accessible Services

Detects services of type LoadBalancer or NodePort that may be publicly exposed.

⚠️ Total Networking with Issues: 1

Show Recommendations

🌐 Secure Exposed Services

Use internal IP ranges or private LoadBalancers where possible.
Restrict NodePort usage or protect with firewall rules.
Disable external exposure for internal-only services.
Consider network policies or service mesh for access control.

Docs: Reference
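
If a LoadBalancer Service is not meant to be reachable from the internet, one option on AKS is the internal load balancer annotation, sketched below for store-front. The selector label and both port numbers are assumptions, and whether store-front should be public is a decision this report cannot make.

```yaml
# Minimal sketch: LoadBalancer Service bound to a private (VNet-internal) IP on AKS.
# The selector label and port numbers are assumed values.
apiVersion: v1
kind: Service
metadata:
  name: store-front
  namespace: azure-store
  annotations:
    service.beta.kubernetes.io/azure-load-balancer-internal: "true"
spec:
  type: LoadBalancer
  selector:
    app: store-front
  ports:
    - port: 80
      targetPort: 8080
```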

Show Findings

Namespace	Resource	Value	Message
azure-store	service/store-front	LoadBalancer	Exposed via external IP: 131.145.120.106

NET003 - Ingress Health Validation

Validates ingress classes, TLS secrets, and backend service references.

✅ All Networking are healthy.

Show Recommendations

🌐 Ingress Health Remediation

Add spec.ingressClassName or annotations if missing.
Validate all backend services and ports exist.
Fix missing TLS secrets or use valid ones.
Avoid duplicate host/path combinations.
Use only valid pathTypes: Exact, Prefix, or ImplementationSpecific.

Docs: Reference

NET004 - Namespace Missing Network Policy

Flags namespaces that do not define any NetworkPolicy.

⚠️ Total Networking with Issues: 1

Show Recommendations

Apply a default deny-all NetworkPolicy for ingress and egress.
Use additional policies to allow traffic between required pods/services.

Docs: Reference
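
A default deny-all policy for the flagged namespace is typically written along these lines. The policy name is arbitrary, and enforcement depends on the cluster running a network policy engine.

```yaml
# Minimal sketch: selects every pod in the namespace and allows no traffic.
# The name is arbitrary; enforcement requires a NetworkPolicy-capable CNI.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: azure-store
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```

Because this also blocks DNS lookups from the pods, it is normally paired with an egress rule that allows traffic to the cluster DNS service, plus explicit allow policies for each required flow.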

Show Findings

Namespace	Resource	Value	Message
azure-store	namespace/azure-store		No NetworkPolicy in active namespace

NET005 - Ingress Host/Path Conflicts

Detects duplicate host/path combinations across ingresses in the same namespace.

✅ All Networking are healthy.

Show Recommendations

🚫 Resolve Ingress Conflicts

Ensure that each unique host and path combination is defined in only one Ingress resource.
Use specific hostnames instead of broad wildcards where possible to prevent unintended conflicts.
Review your Ingress definitions for overlapping rules and consolidate or adjust as necessary.
Test routing after making changes to confirm correct behavior.

Docs: Reference

NET006 - Ingress Using Wildcard Hosts

Detects ingress rules that use wildcard hosts.

✅ All Networking are healthy.

Show Recommendations

⭐ Review Wildcard Ingresses

Evaluate if a wildcard host is truly necessary for the application's routing requirements.
Where possible, replace wildcards with specific hostnames to limit unintentional exposure.
Ensure that security policies and firewalls are in place to control access to wildcard-enabled Ingresses.

Docs: Reference

NET007 - Service TargetPort Mismatch

Detects services whose targetPort does not exist on backing pods.

✅ All Networking are healthy.

Show Recommendations

🎯 Fix Service TargetPort Mismatches

Verify the targetPort in your Service definition. It should either be a numerical port or a named port.
Check the containerPorts in the Pods selected by the Service.
Ensure the Service's `targetPort` matches either a `containerPort` number or a port `name` declared on the selected Pods.
A common fix is to ensure consistent naming conventions or directly use port numbers.

Docs: Reference

NET008 - ExternalName Service to Internal IP

Identifies ExternalName services that point to internal IP addresses.

✅ All Networking are healthy.

Show Recommendations

🔄 Review ExternalName to Internal IP

ExternalName services are primarily for CNAME-like redirection to external DNS names.
If routing to an internal IP address, consider if a standard `Service` with manually created `EndpointSlice` or a `Service` with `type: ClusterIP` backed by pods is more appropriate.
Ensure this configuration is intentional and does not bypass intended network segmentation or security policies.

Docs: Reference

NET009 - Overly Permissive Network Policy

Identifies NetworkPolicies with empty rules or broad all-IP blocks.

✅ All Networking are healthy.

Show Recommendations

🔐 Restrict Overly Permissive Network Policies

Ensure `policyTypes` are paired with explicit `ingress` and `egress` rules that define allowed traffic.
Avoid empty rule entries such as `ingress: [{}]` or `egress: [{}]`, which allow all traffic for that type; an omitted or empty rule list under a declared policy type denies all traffic for that type instead.
Limit the use of `ipBlock: 0.0.0.0/0`. Instead, define specific CIDR ranges for necessary external communication.
Adopt a "deny-by-default" approach and explicitly allow only required communication.
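
A sketch of the deny-by-default pattern for the `azure-store` namespace seen in this report. The pod labels (`app: order-service`, `app: store-front`) and port 3000 are assumptions for illustration; check the real labels and ports before applying anything similar.

```yaml
# Step 1: deny all ingress and egress for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: azure-store
spec:
  podSelector: {}          # selects all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
---
# Step 2: explicitly allow one required flow (labels and port are hypothetical).
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-store-front-to-order-service
  namespace: azure-store
spec:
  podSelector:
    matchLabels:
      app: order-service
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: store-front
      ports:
        - protocol: TCP
          port: 3000
```

Because the first policy also denies egress, DNS lookups and the client side of each allowed flow need their own egress rules before workloads will function.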

Docs: Reference

NET010 - Network Policy Overly Permissive IPBlock: Flags NetworkPolicies that allow 0.0.0.0/0 through ipBlock rules.

✅ All Networking are healthy.

Show Recommendations

🚫 Restrict '0.0.0.0/0' in Network Policies

Avoid `ipBlock` rules with `cidr: 0.0.0.0/0` in NetworkPolicies unless absolutely required for specific, well-understood use cases (e.g., public internet access).
Identify the precise CIDR ranges or specific IP addresses that need to be allowed.
For egress, if public internet access is needed, consider egress gateways or more restrictive network policies to control outbound traffic.
If unintended, such a rule opens the selected pods to every IPv4 address in that direction and largely defeats the purpose of the policy.
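
A minimal sketch of an egress rule scoped to a specific range instead of `0.0.0.0/0`. The CIDR `203.0.113.0/24` is a documentation-only range used as a placeholder, and the policy name and pod label are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-partner-api   # hypothetical name
spec:
  podSelector:
    matchLabels:
      app: order-service              # hypothetical label
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 203.0.113.0/24      # placeholder range; replace with the real destination
      ports:
        - protocol: TCP
          port: 443
```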

Docs: Reference

NET011 - Network Policy Missing PolicyTypes: Detects NetworkPolicies that do not explicitly define policyTypes.

✅ All Networking are healthy.

Show Recommendations

📝 Define Network PolicyTypes

Always explicitly define `policyTypes` in your NetworkPolicy, such as `policyTypes: [Ingress]` or `policyTypes: [Ingress, Egress]`.
This clearly indicates whether the policy applies to inbound, outbound, or both types of traffic.
It also removes reliance on the default behavior: when `policyTypes` is omitted, `Ingress` always applies and `Egress` applies only if the policy contains egress rules.

Docs: Reference

NET012 - Pod HostNetwork Usage: Identifies pods configured with hostNetwork true.

✅ All Networking are healthy.

Show Recommendations

⚠️ Avoid HostNetwork Usage

Using `hostNetwork: true` is a security risk as it grants the pod direct access to the node's network stack.
This bypasses many Kubernetes network security features and network policies.
Only use `hostNetwork` for specific, highly privileged use cases (e.g., CNI plugins, network observability tools) and limit access via RBAC and Pod Security Standards.
For typical applications, rely on ClusterIP, NodePort, or LoadBalancer services for exposure.

Docs: Reference

NET013 - Ingress Present Without Gateway API Adoption: Detects clusters still using Ingress without any Gateway API resources.

✅ All Networking are healthy.

Show Recommendations

🚦 Begin Gateway API Migration

Create or select a GatewayClass supported by your controller.
Define one or more Gateway resources for north-south traffic entry.
Migrate Ingress rules incrementally to HTTPRoute and validate behavior.
Run both models in parallel during transition where supported.
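
A sketch of the smallest Gateway and HTTPRoute pair that could replace a single-host Ingress. It assumes the Gateway API CRDs and a controller are installed; the class name `example-class`, the hostname, and the backend Service `store-front` on port 80 are hypothetical.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: store-gateway
  namespace: azure-store
spec:
  gatewayClassName: example-class   # hypothetical; use a GatewayClass your controller provides
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: store-front
  namespace: azure-store
spec:
  parentRefs:
    - name: store-gateway           # binds the route to the Gateway above
  hostnames:
    - store.example.com             # placeholder hostname
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /
      backendRefs:
        - name: store-front         # assumed Service name and port
          port: 80
```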

Docs: Reference

NET014 - HTTPRoute Missing or Unaccepted Parent: Detects HTTPRoutes with missing parentRefs or no accepted parent Gateway.

✅ All Networking are healthy.

Show Recommendations

🧭 Fix HTTPRoute Parent Binding

Set spec.parentRefs to an existing Gateway.
Check route status conditions and Gateway listener compatibility.
Verify namespace permissions and allowedRoutes policy.
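
When a route lives in a different namespace from its Gateway, both sides must agree. The sketch below assumes a hypothetical shared Gateway in an `infra` namespace that admits routes only from namespaces carrying the label `shared-gateway-access: "true"`; all names are illustrative.

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: shared-gateway
  namespace: infra                          # hypothetical namespace
spec:
  gatewayClassName: example-class           # hypothetical class
  listeners:
    - name: http
      protocol: HTTP
      port: 80
      allowedRoutes:
        namespaces:
          from: Selector                    # the default, Same, would reject cross-namespace routes
          selector:
            matchLabels:
              shared-gateway-access: "true" # label the route's namespace must carry
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: store-front
  namespace: azure-store
spec:
  parentRefs:
    - name: shared-gateway
      namespace: infra                      # required when the Gateway is in another namespace
  rules:
    - backendRefs:
        - name: store-front                 # assumed Service name and port
          port: 80
```

If the route's namespace lacks the label, the route's `status.parents` conditions should report that it was not accepted.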

Docs: Reference

NET015 - Gateways Without Attached HTTPRoutes: Detects Gateway resources that have no attached HTTPRoutes.

✅ All Networking are healthy.

Show Recommendations

🧹 Clean Up or Attach Routes

Attach one or more HTTPRoute resources to each active Gateway.
Delete unused Gateways to avoid confusion and stale entry points.
Confirm listener and route host/path alignment.

Docs: Reference

NET016 - Gateway API Readiness Conditions: Detects Gateway resources that are not accepted or programmed.

✅ All Networking are healthy.

Show Recommendations

🚦 Validate Gateway Readiness

Check GatewayClass and Gateway status.conditions for Accepted and Programmed.
Verify the Gateway controller deployment is healthy and watching the relevant classes.
Fix listener, address, or controller configuration issues before cutover.

Docs: Reference

NET017 - Gateway TLS Secret and Cross-Namespace ReferenceGrant Validation: Validates Gateway certificateRefs against existing Secrets and ReferenceGrants.

✅ All Networking are healthy.

Show Recommendations

🔐 Fix Gateway TLS References

Verify each certificateRef points to an existing Secret.
For cross-namespace refs, create a ReferenceGrant in the Secret namespace.
Re-check Gateway listener status after grant/secret updates.
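
A minimal sketch of a ReferenceGrant that lets a Gateway in one namespace use a TLS Secret in another. The namespaces `infra` and `certs` and the Secret name `store-tls` are hypothetical.

```yaml
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-gateway-to-tls-secret
  namespace: certs              # must be the namespace that holds the Secret
spec:
  from:
    - group: gateway.networking.k8s.io
      kind: Gateway
      namespace: infra          # namespace of the referencing Gateway
  to:
    - group: ""                 # core API group
      kind: Secret
      name: store-tls           # optional; omit to allow every Secret in this namespace
```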

Docs: Reference

NET018 - Duplicate Service Selectors: Detects multiple Services in the same namespace with identical selectors.

✅ All Networking are healthy.

Show Recommendations

🎯 Use Unique Service Selectors

Review Services in the same namespace that select the exact same pod label set.
Split selectors so each Service represents a distinct routing contract, or consolidate duplicate Services where appropriate.
Update the source manifest or Helm chart so the selector change persists across releases.

Docs: Reference

Storage

PV001 - Orphaned Persistent Volumes: Detects PersistentVolumes that are not bound to any PVC.

✅ All Storage are healthy.

Show Recommendations

🗑️ Clean Up Orphaned PVs

Audit: Verify the PV is truly unneeded using kubectl describe pv <name>.
Delete: Remove unneeded PVs with kubectl delete pv <name>.
Caution: Ensure no future PVC will bind to it before deletion.

Docs: Reference

PVC001 - Unused Persistent Volume Claims: Detects PVCs not attached to any pod.

✅ All Storage are healthy.

Show Recommendations

💾 Clean Up Unused PVCs

Audit: Confirm the PVC is not needed using kubectl describe pvc <name> -n <namespace>.
Delete: Remove PVCs no longer required with kubectl delete pvc <name> -n <namespace>.
Prevent: Automate cleanup for stale environments or ephemeral workloads.

Docs: Reference

PVC002 - PVCs Using Default StorageClass: Detects PVCs that do not explicitly specify storageClassName.

✅ All Storage are healthy.

Show Recommendations

✍️ Specify StorageClass for PVCs

Edit: Add storageClassName: <your-storage-class-name> to the PVC spec.
Consistency: Ensure consistent storage provisioning across environments.
Awareness: Understand which StorageClass is truly being used.
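
A sketch of a claim that names its StorageClass explicitly. The class `managed-csi` is assumed to exist in the cluster (verify with `kubectl get storageclass`); the claim name and size are placeholders.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                      # hypothetical claim name
  namespace: azure-store
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-csi   # assumed class; do not rely on the cluster default
  resources:
    requests:
      storage: 8Gi                # placeholder size
```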

Docs: Reference

PVC003 - ReadWriteMany PVCs on Incompatible Storage: Detects ReadWriteMany PVCs backed by likely block-storage provisioners.

✅ All Storage are healthy.

Show Recommendations

⚠️ Review ReadWriteMany PVCs

Verify: Confirm if the storage backend truly supports concurrent writes.
Adjust: If not, change the access mode to ReadWriteOnce. Access modes are immutable on an existing PVC, so this means recreating the claim.
Migrate: For shared data, use appropriate shared file storage solutions.

Docs: Reference

PVC004 - Unbound Persistent Volume Claims: Detects PersistentVolumeClaims that remain Pending.

✅ All Storage are healthy.

Show Recommendations

🚫 Troubleshoot Unbound PVCs

Describe PVC: Use kubectl describe pvc <name> -n <namespace> to see events and reasons for Pending.
Check StorageClass: Ensure the specified StorageClass exists and is correctly configured.
Review Provisioner: Verify the storage provisioner is running and healthy.

Docs: Reference

SC001 - Deprecated StorageClass Provisioners: Detects StorageClasses still using in-tree provisioners.

✅ All Storage are healthy.

Show Recommendations

🔄 Migrate Deprecated StorageClasses

Identify: Pinpoint PVCs using the deprecated StorageClass.
Create: Define a new StorageClass with the appropriate CSI driver.
Migrate: Follow the migration path for your specific storage provider to move data.

Docs: Reference

SC002 - AKS Azure In-Tree Storage Provisioners: Detects Azure in-tree storage provisioners that are not AKS Automatic compatible.

⚠️ Total Storage with Issues: 1

Show Recommendations

🔄 Migrate Azure StorageClasses to CSI

Create replacement StorageClasses that use disk.csi.azure.com or file.csi.azure.com.
Move PVCs and workloads off the in-tree StorageClass before migrating to AKS Automatic.
Validate reclaim policies, SKU, and mount options during the migration.
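
A sketch of a replacement StorageClass backed by the Azure Disk CSI driver. The class name, SKU, and reclaim policy are illustrative choices and should mirror whatever the in-tree class being replaced actually uses.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-csi-standardssd     # hypothetical name
provisioner: disk.csi.azure.com     # CSI driver instead of an in-tree provisioner
parameters:
  skuName: StandardSSD_LRS          # illustrative SKU; match the existing class
reclaimPolicy: Delete               # match the reclaim policy of the class being replaced
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```

Existing volumes keep the provisioner they were created with, so workloads still need a data migration onto claims that use the new class.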

Docs: Reference

Show Findings

Namespace	Resource	Value	Message

SC003 - High Cluster Storage Usage (Prometheus): Monitors overall used storage across the cluster.

✅ All Storage are healthy.

Show Recommendations

📊 Manage Storage Consumption

Identify: Use monitoring tools to find namespaces/pods consuming the most storage.
Clean Up: Delete old data, snapshots, or unused PVCs/PVs.
Scale: Plan for increasing storage capacity or optimizing storage allocation.

Docs: Reference

SC004 - StorageClass Prevents Volume Expansion: Identifies StorageClasses that do not permit volume expansion, which can limit dynamic scaling of stateful applications.

⚠️ Total Storage with Issues: 1

Show Recommendations

📈 Enable Volume Expansion

Assess: Determine if your applications need dynamic volume resizing.
Configure: Add or set allowVolumeExpansion: true in the StorageClass definition.
Backend Check: Ensure your storage backend supports online volume expansion.

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
(cluster)	storageclass/default	true	StorageClass does not allow volume expansion.

Configuration Hygiene

CFG001 - Orphaned ConfigMaps: Detects ConfigMaps that are not referenced by workloads or related resources.

✅ All Configuration Hygiene are healthy.

Show Recommendations

🛠️ Clean Up Orphaned ConfigMaps

Verify: Check usage (kubectl describe cm <name> -n <namespace>).
Delete: kubectl delete cm <name> -n <namespace> if unused.
Automation: Schedule periodic scans.

Docs: Reference

CFG002 - Duplicate ConfigMap Names: Detects ConfigMaps with identical names across multiple namespaces.

⚠️ Total Configuration Hygiene with Issues: 2

Show Recommendations

🛠️ Fix Duplicate ConfigMap Names

Standardize: Use unique names or a naming convention that includes the environment or team name.
Audit: Periodically review ConfigMaps across namespaces for duplication.
Automation: Use policies or linting tools to catch duplicates pre-deploy.

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
-	configmap/kube-root-ca.crt	-	Found in namespaces: azure-store, default
-	configmap/kube-root-ca.crt	-	Found in namespaces: azure-store, default

CFG003 - Large ConfigMaps: Finds ConfigMaps larger than 1 MiB.

✅ All Configuration Hygiene are healthy.

Show Recommendations

🛠️ Reduce ConfigMap Size

Refactor: Move large files or data to PersistentVolumes.
Split: Break up oversized ConfigMaps into smaller ones by function.
Review: Check for secrets or binary blobs mistakenly stored in ConfigMaps.

Docs: Reference

PROM004 - API Server High Latency (Prometheus): Detects high latency in Kubernetes API server requests over the last 24 hours.

✅ All Configuration are healthy.

Show Recommendations

🛠️ Investigate API Server Latency

Check kube-apiserver logs.
Review etcd performance.

Docs: Reference

Security

RBAC001 - RBAC Misconfigurations: Detects invalid roleRefs, missing roles, orphaned service accounts, and incorrect subject namespaces.

✅ All Security are healthy.

Show Recommendations

🔐 RBAC Misconfiguration Fixes

Don't leave roleRef blank in bindings.
Use valid Roles/ClusterRoles that exist in the correct namespace.
Verify ServiceAccounts exist in the namespace specified.
Remove or correct subjects pointing to non-existent namespaces.

Docs: Reference

RBAC002 - RBAC Overexposure: Identifies dangerous RBAC grants such as cluster-admin and wildcard permissions.

⚠️ Total Security with Issues: 12

Show Recommendations

🔐 RBAC Hardening Tips

Avoid using cluster-admin directly in bindings.
Don’t assign Roles or ClusterRoles with wildcard verbs/resources/apiGroups.
Restrict access to sensitive resources like secrets or pods/exec.
Minimize privileges for default ServiceAccounts.
Document any built-in roles that are relied on in production.
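
For application workloads, a namespaced Role with explicit verbs and resources is the usual alternative to cluster-admin or wildcard grants. The sketch below is illustrative: the Role name, the ServiceAccount `order-service`, and the chosen permissions are assumptions, not objects found in this cluster.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader            # hypothetical
  namespace: azure-store
rules:
  - apiGroups: [""]                 # core API group
    resources: ["configmaps"]       # explicit resources, no wildcards
    verbs: ["get", "list", "watch"] # read-only
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: order-service-configmap-reader
  namespace: azure-store
subjects:
  - kind: ServiceAccount
    name: order-service             # hypothetical dedicated ServiceAccount
    namespace: azure-store
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: configmap-reader
```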

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
🌍 Cluster-Wide	clusterrolebinding/aks-cluster-admin-binding	User/clusterAdmin	cluster-admin binding (built-in)
🌍 Cluster-Wide	clusterrolebinding/aks-cluster-admin-binding	User/clusterUser	cluster-admin binding (built-in)
🌍 Cluster-Wide	clusterrolebinding/aks-cluster-admin-binding-aad	Group/c30f2960-28f8-49cc-9308-c1e741824c4f	cluster-admin binding (built-in)
🌍 Cluster-Wide	clusterrolebinding/aks-secretprovidersyncing-rolebinding	ServiceAccount/aks-secrets-store-csi-driver	Access to sensitive resources
🌍 Cluster-Wide	clusterrolebinding/aks-service-rolebinding	User/aks-support	Access to sensitive resources
🌍 Cluster-Wide	clusterrolebinding/ama-metrics-clusterrolebinding	ServiceAccount/ama-metrics-serviceaccount	Access to sensitive resources
🌍 Cluster-Wide	clusterrolebinding/cluster-admin	Group/system:masters	cluster-admin binding (built-in)
🌍 Cluster-Wide	clusterrolebinding/system:controller:clusterrole-aggregation-controller	ServiceAccount/clusterrole-aggregation-controller	Access to sensitive resources (built-in)
🌍 Cluster-Wide	clusterrolebinding/system:controller:legacy-service-account-token-cleaner	ServiceAccount/legacy-service-account-token-cleaner	Access to sensitive resources (built-in)
🌍 Cluster-Wide	clusterrolebinding/system:kube-controller-manager	User/system:kube-controller-manager	Access to sensitive resources (built-in)
🌍 Cluster-Wide	clusterrolebinding/system:kube-scheduler	User/system:kube-scheduler	Access to sensitive resources (built-in)
🌍 Cluster-Wide	clusterrolebinding/system:persistent-volume-binding	ServiceAccount/persistent-volume-binder	Access to sensitive resources (built-in)

RBAC003 - Orphaned ServiceAccounts: Finds ServiceAccounts not used by pods or RBAC bindings.

⚠️ Total Security with Issues: 1

Show Recommendations

🧾 Remove Orphaned ServiceAccounts

Audit ServiceAccounts not referenced in RoleBindings, ClusterRoleBindings, or used by Pods.
Delete those not actively used to reduce attack surface.
Consider automating SA cleanup with CI/CD or policy enforcement.

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
default	serviceaccount/default	default	ServiceAccount not used by pods or RBAC bindings

RBAC004 - Orphaned and Ineffective Roles: Flags roles and clusterroles that are unused or define no rules.

⚠️ Total Security with Issues: 3

Show Recommendations

🗂️ Clean up Unused or Ineffective RBAC

Remove RoleBindings or ClusterRoleBindings without subjects.
Prune Roles and ClusterRoles not referenced by any bindings.
Remove roles with no defined rules unless planned for future use.

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
cluster-wide	clusterrolebinding/system:node	system:node	ClusterRoleBinding has no subjects
cluster-wide	clusterrole/aks-secretproviderclasses-admin-role	aks-secretproviderclasses-admin-role	Unused ClusterRole
cluster-wide	clusterrole/aks-secretproviderclasses-viewer-role	aks-secretproviderclasses-viewer-role	Unused ClusterRole

SEC001 - Orphaned Secrets: Detects Secrets not used by workloads or related resources.

✅ All Security are healthy.

Show Recommendations

🔐 Orphaned Secrets Cleanup

Remove Secrets not referenced in Pods, Deployments, StatefulSets, or Ingresses.
Audit Secret content before deletion to avoid removing active credentials.
Validate Custom Resources don’t indirectly depend on these Secrets.
Regularly prune Secrets as part of security hygiene.

Docs: Reference

SEC002 - Pods using hostPID or hostNetwork: Flags pods that share the host PID or network namespace, which can compromise isolation and node security.

✅ All Security are healthy.

Show Recommendations

Avoid Host-Level Sharing

Set hostPID: false and hostNetwork: false unless needed for special workloads.
Review security implications of namespace sharing with the host.
Restrict use of these settings to trusted namespaces and workloads.
Consider using Pod Security Admission or OPA/Gatekeeper policies to prevent usage cluster-wide (PodSecurityPolicy was removed in Kubernetes 1.25).

Docs: Reference

SEC003 - Pods Running as Root: Detects pods running with UID 0 or no explicit runAsUser setting, which defaults to root in many images.

⚠️ Total Security with Issues: 12

Show Recommendations

RunAsUser Hardening

Set runAsUser to a non-zero UID at pod or container level.
Avoid relying on container defaults and define securityContext explicitly.
Validate any custom base images that may default to root.
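
A sketch of a pod-level setting that applies to every container in the pod, including init containers. The UID/GID `10001`, the pod name, and the image are placeholders; the image must actually be able to run as that user.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: order-service-example       # hypothetical
spec:
  securityContext:
    runAsNonRoot: true              # kubelet refuses to start containers that would run as UID 0
    runAsUser: 10001                # placeholder non-zero UID
    runAsGroup: 10001               # placeholder GID
  containers:
    - name: order-service
      image: registry.example.com/order-service:1.0.0   # placeholder image
```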

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
azure-store	pod/order-service-65cc8855c-ghk9m	Not Set (Defaults to root)	Container order-service runs as root or has no runAsUser set
azure-store	pod/order-service-65cc8855c-ghk9m	Not Set (Defaults to root)	Container wait-for-rabbitmq runs as root or has no runAsUser set
azure-store	pod/order-service-65cc8855c-ghk9m	Not Set (Defaults to root)	Container runs as root or has no runAsUser set
azure-store	pod/product-service-77ff9f6fd6-rzcxj	Not Set (Defaults to root)	Container product-service runs as root or has no runAsUser set
azure-store	pod/product-service-77ff9f6fd6-rzcxj	Not Set (Defaults to root)	Container runs as root or has no runAsUser set
azure-store	pod/product-service-77ff9f6fd6-rzcxj	Not Set (Defaults to root)	Container runs as root or has no runAsUser set
azure-store	pod/rabbitmq-5dcdf9484-kvgw7	Not Set (Defaults to root)	Container rabbitmq runs as root or has no runAsUser set
azure-store	pod/rabbitmq-5dcdf9484-kvgw7	Not Set (Defaults to root)	Container runs as root or has no runAsUser set
azure-store	pod/rabbitmq-5dcdf9484-kvgw7	Not Set (Defaults to root)	Container runs as root or has no runAsUser set
azure-store	pod/store-front-698cc8c565-f5hp5	Not Set (Defaults to root)	Container store-front runs as root or has no runAsUser set
azure-store	pod/store-front-698cc8c565-f5hp5	Not Set (Defaults to root)	Container runs as root or has no runAsUser set
azure-store	pod/store-front-698cc8c565-f5hp5	Not Set (Defaults to root)	Container runs as root or has no runAsUser set

SEC004 - Privileged Containers: Detects containers running with privileged mode enabled.

✅ All Security are healthy.

Show Recommendations

Disable Privileged Containers

Remove securityContext.privileged: true from container specs.
Refactor workloads to avoid needing host-level access.
Enforce restrictions using Pod Security Admission or OPA/Gatekeeper (PodSecurityPolicy was removed in Kubernetes 1.25).
Limit use to dedicated namespaces with strict controls.

Docs: Reference

SEC005 - Pods Using hostIPC: Detects pods that enable hostIPC.

✅ All Security are healthy.

Show Recommendations

🔒 Disable hostIPC for Pods

Remove hostIPC: true from pod specs.
Review workloads that require inter-process communication with the host.
Use shared memory only through secure, scoped means.

Docs: Reference

SEC006 - Pods Missing Secure Defaults: Checks if pods are missing recommended securityContext fields such as runAsNonRoot, readOnlyRootFilesystem, or allowPrivilegeEscalation.

⚠️ Total Security with Issues: 4

Show Recommendations

Set securityContext.runAsNonRoot: true
Set securityContext.readOnlyRootFilesystem: true
Set securityContext.allowPrivilegeEscalation: false
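
A sketch of the three fields on a single container. The pod name and image are placeholders, and the `/tmp` mount is an assumption: applications that write to disk need a writable volume once the root filesystem is read-only.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: store-front-example         # hypothetical
spec:
  containers:
    - name: store-front
      image: registry.example.com/store-front:1.0.0   # placeholder image
      securityContext:
        runAsNonRoot: true
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
      volumeMounts:
        - name: tmp
          mountPath: /tmp           # writable scratch space (assumed path)
  volumes:
    - name: tmp
      emptyDir: {}
```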

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
azure-store	pod/order-service-65cc8855c-ghk9m	Missing securityContext	Container order-service has no securityContext defined
azure-store	pod/product-service-77ff9f6fd6-rzcxj	Missing securityContext	Container product-service has no securityContext defined
azure-store	pod/rabbitmq-5dcdf9484-kvgw7	Missing securityContext	Container rabbitmq has no securityContext defined
azure-store	pod/store-front-698cc8c565-f5hp5	Missing securityContext	Container store-front has no securityContext defined

SEC007 - Missing Pod Security Admission Labels: Flags namespaces missing pod security admission enforce labels.

⚠️ Total Security with Issues: 2

Show Recommendations

Set pod-security.kubernetes.io/enforce=restricted on sensitive namespaces.
Optionally use enforce-version and audit labels.
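
A sketch of the labels on the `azure-store` namespace flagged below. The `latest` version pin and the added `audit` and `warn` labels are illustrative choices.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: azure-store
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest   # or pin to a specific Kubernetes minor version
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
```

The pods currently in `azure-store` have no securityContext according to this report, so new pods would be rejected under `enforce: restricted`. Starting with only the `warn` and `audit` labels is a lower-risk first step.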

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
azure-store	namespace/azure-store		No pod security labels
default	namespace/default		No pod security labels

SEC008 - Secrets in Environment Variables: Detects secrets exposed through environment variables.

✅ All Security are healthy.

Show Recommendations

Use secret volumes instead of env vars to reduce accidental exposure.
Avoid using valueFrom.secretKeyRef in env.
Limit permissions to read secrets.
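
A sketch of mounting a Secret as read-only files instead of injecting it through `env`. The Secret name `db-credentials`, the mount path, and the image are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secret-volume-example       # hypothetical
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
      volumeMounts:
        - name: db-credentials
          mountPath: /etc/secrets/db          # each Secret key becomes a file here
          readOnly: true
  volumes:
    - name: db-credentials
      secret:
        secretName: db-credentials            # assumed to exist in the same namespace
        defaultMode: 0400                     # owner read-only
```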

Docs: Reference

SEC009 - Missing Capabilities Drop: Checks containers that do not drop all Linux capabilities via securityContext.capabilities.drop = ['ALL'].

⚠️ Total Security with Issues: 4

Show Recommendations

Set securityContext.capabilities.drop: ['ALL'] in container specs.
Allow only required capabilities via add list, if any.
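
A sketch of the drop-all-then-add-back pattern. The `NET_BIND_SERVICE` capability is shown only as an example of a selective add; the pod name and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: capabilities-example        # hypothetical
spec:
  containers:
    - name: web
      image: registry.example.com/web:1.0.0   # placeholder image
      securityContext:
        capabilities:
          drop: ["ALL"]
          add: ["NET_BIND_SERVICE"] # only if the process must bind a port below 1024; otherwise omit
```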

Docs: Reference

Show Findings

Namespace	Resource	Message
azure-store	pod/order-service-65cc8855c-ghk9m	Container order-service does not drop ALL capabilities
azure-store	pod/product-service-77ff9f6fd6-rzcxj	Container product-service does not drop ALL capabilities
azure-store	pod/rabbitmq-5dcdf9484-kvgw7	Container rabbitmq does not drop ALL capabilities
azure-store	pod/store-front-698cc8c565-f5hp5	Container store-front does not drop ALL capabilities

SEC010 - HostPath Volume Usage: Flags pods that use hostPath volumes, which mount parts of the host filesystem and bypass isolation.

✅ All Security are healthy.

Show Recommendations

Remove hostPath volumes unless needed for host-level access.
Consider alternatives like persistent volume claims or configMaps.

Docs: Reference

SEC011 - Containers Running as UID 0: Flags containers explicitly configured to run as UID 0.

⚠️ Total Security with Issues: 4

Show Recommendations

Set runAsUser to a non-root user ID.
Use runAsNonRoot: true for validation.

Docs: Reference

Show Findings

Namespace	Resource	Message
azure-store	pod/order-service-65cc8855c-ghk9m	Container order-service runs as UID 0
azure-store	pod/product-service-77ff9f6fd6-rzcxj	Container product-service runs as UID 0
azure-store	pod/rabbitmq-5dcdf9484-kvgw7	Container rabbitmq runs as UID 0
azure-store	pod/store-front-698cc8c565-f5hp5	Container store-front runs as UID 0

SEC012 - Added Linux Capabilities: Flags containers that add extra Linux capabilities using securityContext.capabilities.add.

✅ All Security are healthy.

Show Recommendations

Review and remove unnecessary capabilities.
Default to dropping all, then selectively add only what is needed.

Docs: Reference

SEC013 - EmptyDir Volume Usage: EmptyDir volumes are ephemeral. They survive container restarts but are deleted when the pod is removed from its node. Use only if data persistence is not needed.

✅ All Security are healthy.

Show Recommendations

Audit use of EmptyDir volumes in production workloads.
Replace with PVCs or other managed storage if persistence is needed.

Docs: Reference

SEC014 - Untrusted Image Registries: Flags images that do not come from trusted registries.

⚠️ Total Security with Issues: 3

Show Recommendations

Use approved internal or vendor-verified registries.
Restrict which registries images may be pulled from using Gatekeeper or other admission policies.

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
azure-store	pod/order-service-65cc8855c-ghk9m	ghcr.io/azure-samples/aks-store-demo/order-service:latest	Image from untrusted registry in container order-service
azure-store	pod/product-service-77ff9f6fd6-rzcxj	ghcr.io/azure-samples/aks-store-demo/product-service:latest	Image from untrusted registry in container product-service
azure-store	pod/store-front-698cc8c565-f5hp5	ghcr.io/azure-samples/aks-store-demo/store-front:latest	Image from untrusted registry in container store-front

SEC015 - Pods Using Default ServiceAccount: Flags pods using the default service account, which may have broad permissions.

⚠️ Total Security with Issues: 4

Show Recommendations

Create and bind a custom ServiceAccount per application.
Avoid using the default ServiceAccount unless absolutely necessary.

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
azure-store	pod/order-service-65cc8855c-ghk9m	default	Pod uses default ServiceAccount
azure-store	pod/product-service-77ff9f6fd6-rzcxj	default	Pod uses default ServiceAccount
azure-store	pod/rabbitmq-5dcdf9484-kvgw7	default	Pod uses default ServiceAccount
azure-store	pod/store-front-698cc8c565-f5hp5	default	Pod uses default ServiceAccount

SEC016 - Unconfined Seccomp Profiles: Detects pods or containers explicitly using the Unconfined seccomp profile.

✅ All Security are healthy.

Show Recommendations

Use RuntimeDefault or a vetted Localhost seccomp profile.
Remove any pod- or container-level Unconfined seccomp setting.
Make the seccomp profile explicit in the workload spec so the policy is reviewable.

Docs: Reference

SEC017 - Non-Default ProcMount: Flags containers that set procMount to a non-default value.

✅ All Security are healthy.

Show Recommendations

Set securityContext.procMount: Default or omit the field.
Review debugging and observability agents that rely on custom proc mounts.

Docs: Reference

SEC018 - Automounting API Credentials Enabled in ServiceAccounts: Flags ServiceAccounts where automounting of API credentials is enabled, affecting associated Pods.

✅ All Security are healthy.

Show Recommendations

Disable Automounting in ServiceAccounts

Add automountServiceAccountToken: false as a top-level field of the ServiceAccount (it is not nested under spec).
Edit with kubectl edit serviceaccount <sa-name> -n <namespace>.
Ensure Pods needing API access override this in their spec with automountServiceAccountToken: true.
Use RBAC to limit ServiceAccount permissions if access is required.
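
A sketch of a dedicated ServiceAccount with automounting disabled, plus one Pod that opts back in because it needs API access. The ServiceAccount name, pod name, and image are hypothetical.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service               # hypothetical dedicated ServiceAccount
  namespace: azure-store
automountServiceAccountToken: false # top-level field on the ServiceAccount
---
apiVersion: v1
kind: Pod
metadata:
  name: api-client-example          # hypothetical
  namespace: azure-store
spec:
  serviceAccountName: order-service
  automountServiceAccountToken: true   # pod-level value takes precedence over the ServiceAccount
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
```

Pods that never call the Kubernetes API can simply omit the pod-level field and inherit `false` from the ServiceAccount.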

Docs: Reference

SEC019 - Unsupported AppArmor Values: Detects AppArmor annotations or profile types that are not permitted by baseline Pod Security Standards.

✅ All Security are healthy.

Show Recommendations

Allowed values are runtime/default or localhost/* in annotations, and RuntimeDefault or Localhost for structured profiles.
Remove legacy or custom profile names that AKS Automatic baseline policy would reject.

Docs: Reference

SEC020 - Seccomp Profile Not Configured: Detects pods and containers that do not explicitly configure a seccomp profile.

⚠️ Total Security with Issues: 5

Show Recommendations

Set securityContext.seccompProfile.type: RuntimeDefault for the pod or each container.
If you need a custom profile, use Localhost and ensure the profile exists on the node.
Doing this avoids AKS Automatic seccomp warnings and makes the security posture explicit.
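
A sketch of the pod-level form, which covers every container in the pod, init containers included. The pod name and image are placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-example             # hypothetical
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault          # container runtime's default seccomp filter
  containers:
    - name: app
      image: registry.example.com/app:1.0.0   # placeholder image
```

Setting the profile once at pod level would also cover an init container such as the `wait-for-rabbitmq` container flagged in the findings.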

Docs: Reference

Show Findings

Namespace	Resource	Message
azure-store	pod/order-service-65cc8855c-ghk9m	Container order-service has no explicit seccomp profile
azure-store	pod/order-service-65cc8855c-ghk9m	Container wait-for-rabbitmq has no explicit seccomp profile
azure-store	pod/product-service-77ff9f6fd6-rzcxj	Container product-service has no explicit seccomp profile
azure-store	pod/rabbitmq-5dcdf9484-kvgw7	Container rabbitmq has no explicit seccomp profile
azure-store	pod/store-front-698cc8c565-f5hp5	Container store-front has no explicit seccomp profile

SEC021 - Host Ports in Pod Specs: Detects containers that bind host ports directly on the node.

✅ All Security are healthy.

Show Recommendations

Remove hostPort from container port definitions.
Use a Service or Ingress for north-south access.
Reserve host networking only for platform workloads that truly require it.

Docs: Reference

SEC022 - Non-Existent Secret References: Flags pods referencing Secrets that do not exist. This may cause runtime failures.

✅ All Security are healthy.

Show Recommendations

Check envFrom, secretKeyRef, and volume.secret.secretName references.
Create missing Secrets or remove invalid references.

Docs: Reference

SEC023 - Disallowed Sysctls: Detects sysctls outside the Kubernetes baseline Pod Security Standards allowlist.

✅ All Security are healthy.

Show Recommendations

Keep only sysctls on the baseline allowlist, such as net.ipv4.ip_local_port_range or kernel.shm_rmid_forced.
Move node-level kernel tuning into node or image configuration where possible.

Docs: Reference

Kubernetes Warning Events

EVENT001 - Grouped Warning Events: Groups recent Warning events by Reason and Message.

⚠️ Total Events with Issues: 1

Show Recommendations

Group similar warnings to spot patterns.
Use kubectl describe and logs to investigate.

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
(cluster)	event-group/FailedScheduling	4	0/3 nodes are available: 3 node(s) had untolerated taint {CriticalAddonsOnly: true}. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling.

EVENT002 - Full Warning Event Log: Lists all recent Warning events in the cluster.

⚠️ Total Events with Issues: 4

Show Recommendations

Use kubectl describe to get full context.
Check logs for root cause.

Docs: Reference

Show Findings

Namespace	Resource	Value	Message
azure-store	events/order-service-65cc8855c-ghk9m.18a67e5d2136c2d5	Warning	Warning events found in recent Kubernetes logs
azure-store	events/product-service-77ff9f6fd6-rzcxj.18a67e5d208fd3fe	Warning	Warning events found in recent Kubernetes logs
azure-store	events/rabbitmq-5dcdf9484-kvgw7.18a67e5d26f82c69	Warning	Warning events found in recent Kubernetes logs
azure-store	events/store-front-698cc8c565-f5hp5.18a67e5d294b6844	Warning	Warning events found in recent Kubernetes logs

AKS Best Practices Results

✅ Passed: 30

❌ Failed: 13

📊 Total Checks: 43

🎯 Score: 69.77%

⭐ Rating: D

Show Best Practices (7/15 failed)

ID	Check	Severity	Category	Status	Observed Value	Fail Message	Recommendation	URL
AKSBP001	Allowed Container Images Policy Enforcement	High	Best Practices	❌ FAIL	false	Container image restriction policies are not enforced, allowing deployment of images from any registry including public registries, untrusted sources, or images with known vulnerabilities. This significantly increases supply chain attack risks and compliance violations.	Deploy the Azure Policy initiative 'Kubernetes cluster pod security restricted standards' and configure specific allowed container registries. Use `az policy assignment create` to assign the policy and set enforcement to 'deny' mode for production environments.	Learn More
AKSBP002	No Privileged Containers Policy Enforcement	High	Best Practices	❌ FAIL	false	Privileged container policies are not enforced, allowing workloads to run with full root privileges, access host devices, mount host file systems, and potentially escape container boundaries. This creates severe security risks and violates least-privilege principles.	Enable the 'Do not allow privileged containers' Azure Policy definition in enforce mode. Use Pod Security Standards with 'restricted' profile to block privileged containers and ensure security baseline compliance.	Learn More
AKSBP003	Multiple Node Pools	Medium	Best Practices	❌ FAIL	false	Single node pool configuration limits workload isolation, scaling flexibility, and security boundaries. All workloads share the same VM size, OS configuration, and scaling parameters, making it impossible to optimize for different application requirements or implement proper security zones.	Create separate node pools for different workload types using `az aks nodepool add --resource-group <rg> --cluster-name <cluster> --name <pool-name>`. Use system pools for system pods, user pools for applications, and specialized pools (GPU, memory-optimized) for specific workloads.	Learn More
AKSBP008	Auto Upgrade Channel Configured	Medium	Best Practices	❌ FAIL	false	Automatic cluster upgrades are disabled, leaving the cluster without timely security patches and bug fixes and at risk of running a Kubernetes version past its support window. Manual upgrade management increases operational overhead and delays critical security updates.	Configure auto upgrade using `az aks update --resource-group <rg> --name <cluster> --auto-upgrade-channel patch` for security patches or 'stable' for minor version updates. Use maintenance windows to control upgrade timing and minimize disruption.	Learn More
AKSBP009	Node OS Upgrade Channel Configured	Medium	Best Practices	❌ FAIL	false	Node OS automatic updates are disabled, leaving nodes running outdated OS versions with potential security vulnerabilities, missing security patches, and outdated system libraries. This increases the attack surface and compliance risks.	Enable node OS upgrade using `az aks update --resource-group <rg> --name <cluster> --node-os-upgrade-channel NodeImage` for automatic OS updates. Use 'SecurityPatch' for security-only updates or configure maintenance windows for controlled updates.	Learn More
AKSBP014	Use v5 or Newer SKU VMs for Node Pools	Medium	Best Practices	❌ FAIL	3	Node pools are using older VM generations (v4 or earlier) that have reduced performance, lack modern security features, don't support ephemeral OS disks by default, and may experience more frequent maintenance events affecting availability and reliability.	Upgrade to v5 or newer VM SKUs using `az aks nodepool add --vm-size Standard_D2s_v5` for new node pools. v5 SKUs provide better performance, support ephemeral OS disks by default, and have improved reliability during maintenance events and upgrades.	Learn More
AKSBP015	Deployment Safeguards Enabled	Medium	Best Practices	❌ FAIL	false	Deployment Safeguards are disabled, allowing non-compliant workloads to be deployed without validation of Kubernetes best practices. This leads to deployments without resource requests/limits, missing health probes, no anti-affinity rules, and other configuration issues that impact reliability and cost.	Enable Deployment Safeguards using `az aks update --resource-group <rg> --name <cluster> --safeguards-level Warning` for alerting or 'Enforcement' to block non-compliant deployments. This enforces best practices including resource requests, readiness/liveness probes, pod anti-affinity, and Pod Security Standards.	Learn More
AKSBP004	Azure Linux as Host OS	High	Best Practices	✅ PASS		No issues detected.	Migrate to Azure Linux by creating new node pools with `az aks nodepool add --os-sku AzureLinux`, then migrate workloads and delete old pools. Note: In-place OS SKU changes are not supported, requiring node pool replacement.	Learn More
AKSBP005	Ephemeral OS Disks Enabled	Medium	Best Practices	✅ PASS		No issues detected.	Enable ephemeral OS disks using `az aks nodepool add --os-disk-type Ephemeral` for new pools or plan node pool replacement. This provides faster disk I/O, lower latency, and reduced costs by using local VM storage instead of managed disks.	Learn More
AKSBP006	Non-Ephemeral Disks with Adequate Size	Medium	Best Practices	✅ PASS		No issues detected.	Increase OS disk size using `az aks nodepool update --resource-group <rg> --cluster-name <cluster> --name <nodepool> --os-disk-size-gb 128` or higher. Larger disks provide better IOPS performance and accommodate container image layers and temporary storage needs.	Learn More
AKSBP007	System Node Pool Taint	High	Best Practices	✅ PASS		No issues detected.	Apply system node pool taint using `az aks nodepool update --resource-group <rg> --cluster-name <cluster> --name <system-pool> --node-taints CriticalAddonsOnly=true:NoSchedule`. This ensures only critical system pods run on system nodes, improving reliability and resource isolation.	Learn More
AKSBP010	Customized MC_ Resource Group Name	Medium	Best Practices	✅ PASS		No issues detected.	Use a custom node resource group name during cluster creation with `az aks create --node-resource-group <custom-name>`. This cannot be changed after cluster creation, so plan accordingly for better resource organization and management.	Learn More
AKSBP011	System Node Pool Has Minimum Two Nodes	High	Best Practices	✅ PASS		No issues detected.	Scale system node pool to at least 2 nodes using `az aks nodepool scale --resource-group <rg> --cluster-name <cluster> --name <system-pool> --node-count 2`. Configure cluster autoscaler with --min-count 2 to ensure resiliency against node failures and maintenance events.	Learn More
AKSBP012	Node Pool Version Matches Control Plane	Medium	Best Practices	✅ PASS		No issues detected.	Upgrade node pools to match control plane version using `az aks nodepool upgrade --resource-group <rg> --cluster-name <cluster> --name <nodepool> --kubernetes-version <version>`. Plan coordinated upgrades to maintain version consistency and avoid compatibility issues.	Learn More
AKSBP013	No B-Series VMs in Node Pools	High	Best Practices	✅ PASS		No issues detected.	Replace B-series VMs with consistent performance SKUs like Standard_D2s_v5 or Standard_E2s_v5. Create new node pool with `az aks nodepool add --vm-size Standard_D2s_v5`, migrate workloads using `kubectl drain`, then delete old pool with `az aks nodepool delete`.	Learn More
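
The two upgrade-channel findings above (AKSBP008 and AKSBP009) can be handled together. The following is a minimal dry-run sketch that only prints the commands quoted in the recommendations; `RG` and `CLUSTER` are placeholder variables introduced here for illustration, and the channel values (`patch`, `NodeImage`) are taken from the table, not chosen for this cluster.

```shell
#!/usr/bin/env sh
# Dry-run sketch: prints the az commands instead of executing them.
# RG and CLUSTER are hypothetical placeholders; export real values before use.
RG="${RG:-<rg>}"
CLUSTER="${CLUSTER:-<cluster>}"

upgrade_channel_commands() {
  # AKSBP008: Kubernetes version auto-upgrade channel.
  printf 'az aks update --resource-group %s --name %s --auto-upgrade-channel patch\n' "$RG" "$CLUSTER"
  # AKSBP009: node OS image upgrade channel.
  printf 'az aks update --resource-group %s --name %s --node-os-upgrade-channel NodeImage\n' "$RG" "$CLUSTER"
}

upgrade_channel_commands
```

Printing first makes it easy to review the commands, and to pair them with a maintenance window, before anything changes on the cluster.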

Disaster Recovery (0/2 failed)

ID	Check	Severity	Category	Status	Observed Value	Fail Message	Recommendation	URL
AKSDR001	Agent Pools with Availability Zones	High	Disaster Recovery	✅ PASS		No issues detected.	Deploy node pools across availability zones using `az aks nodepool add --availability-zones 1 2 3 --resource-group <rg> --cluster-name <cluster> --name <pool>`. Ensure at least 3 zones are used for production workloads to achieve 99.95% SLA and protect against datacenter failures.	Learn More
AKSDR002	Control Plane SLA	Medium	Disaster Recovery	✅ PASS		No issues detected.	Upgrade to Standard tier using `az aks update --resource-group <rg> --name <cluster> --tier Standard` to get 99.95% uptime SLA, financially-backed availability guarantees, and improved support. This is essential for production workloads requiring high availability.	Learn More

Identity & Access (0/7 failed)

ID	Check	Severity	Category	Status	Fail Message	Recommendation	URL
AKSIAM001	RBAC Enabled	High	Identity & Access	✅ PASS	No issues detected.	Enable RBAC during cluster creation using `--enable-rbac` or for existing clusters via Azure Portal. Create RoleBindings and ClusterRoleBindings to assign appropriate permissions to users and service accounts based on the principle of least privilege.	Learn More
AKSIAM002	Managed Identity	High	Identity & Access	✅ PASS	No issues detected.	Create a user-assigned managed identity using `az identity create --resource-group <rg> --name <identity-name>` and associate it during cluster creation with `--assign-identity <identity-resource-id>`. This eliminates the need to manage service principal credentials and provides better security.	Learn More
AKSIAM003	Workload Identity Enabled	Medium	Identity & Access	✅ PASS	No issues detected.	Enable Workload Identity using `az aks update --resource-group <rg> --name <cluster> --enable-workload-identity` (requires OIDC issuer). Create Kubernetes service accounts and federate them with Azure managed identities for secure, token-based authentication to Azure services.	Learn More
AKSIAM004	Managed Identity Used	High	Identity & Access	✅ PASS	No issues detected.	Migrate from Service Principal to User-Assigned Managed Identity using `az aks update --resource-group <rg> --name <cluster> --assign-identity <identity-resource-id>`. This provides automatic credential rotation and eliminates the need to manage client secrets.	Learn More
AKSIAM005	AAD RBAC Authorization Integrated	High	Identity & Access	✅ PASS	No issues detected.	Enable Azure RBAC for Kubernetes authorization using `az aks update --resource-group <rg> --name <cluster> --enable-azure-rbac`. Assign built-in roles like 'Azure Kubernetes Service RBAC Reader/Writer/Admin' to users and groups for centralized access management through Azure AD.	Learn More
AKSIAM006	AAD Managed Authentication Enabled	High	Identity & Access	✅ PASS	No issues detected.	Enable Azure AD integration during cluster creation with `--enable-aad --aad-admin-group-object-ids <group-id>` or update existing cluster using `az aks update --resource-group <rg> --name <cluster> --enable-aad`. Configure admin groups and integrate with conditional access policies.	Learn More
AKSIAM007	Local Accounts Disabled	High	Identity & Access	✅ PASS	No issues detected.	Disable local accounts using `az aks update --resource-group <rg> --name <cluster> --disable-local-accounts`. This enforces authentication exclusively through Azure AD, eliminating certificate-based admin access and improving audit capabilities.	Learn More

Monitoring & Logging (0/2 failed)

ID	Check	Severity	Category	Status	Observed Value	Fail Message	Recommendation	URL
AKSMON001	Azure Monitor	High	Monitoring & Logging	✅ PASS		No issues detected.	Enable Azure Monitor Container Insights using `az aks enable-addons --resource-group <rg> --name <cluster> --addons monitoring --workspace-resource-id <workspace-id>` or through Azure Portal > Monitoring > Insights. Configure log retention (90+ days) and set up alerts for container failures and resource usage.	Learn More
AKSMON002	Managed Prometheus Enabled	High	Monitoring & Logging	✅ PASS		No issues detected.	Enable managed Prometheus using `az aks update --resource-group <rg> --name <cluster> --enable-azure-monitor-metrics` or via Azure Portal > Monitoring > Insights. Consider integrating with Azure Managed Grafana for advanced dashboards and setting up alerting rules for critical metrics.	Learn More

Networking (2/4 failed)

ID	Check	Severity	Category	Status	Observed Value	Fail Message	Recommendation	URL
AKSNET001	Authorized IP Ranges Configured (Public Clusters)	High	Networking	❌ FAIL	false	API server accepts connections from any internet IP address, creating a large attack surface for brute force attacks, credential stuffing, and vulnerability exploitation. This violates network security best practices and most compliance frameworks.	Configure authorized IP ranges using `az aks update --resource-group <rg> --name <cluster> --api-server-authorized-ip-ranges <ip-ranges>`. Include management networks, CI/CD systems, and jump boxes using CIDR notation. Alternatively, migrate to a private cluster for enhanced security.	Learn More
AKSNET003	Web App Routing Enabled	Low	Networking	❌ FAIL	false	Web App Routing add-on is disabled, requiring manual ingress controller management, DNS configuration, and SSL certificate handling. This increases operational overhead and may lead to inconsistent external access patterns and security configurations.	Enable Web App Routing using `az aks enable-addons --resource-group <rg> --name <cluster> --addons web_application_routing`. Configure DNS zones and SSL certificates for automatic ingress management. Consider using Application Gateway Ingress Controller (AGIC) for enterprise scenarios.	Learn More
AKSNET002	Network Policy Check	Medium	Networking	✅ PASS		No issues detected.	Enable network policy during cluster creation with `--network-policy azure` (Azure CNI) or `--network-policy calico` (kubenet). Create NetworkPolicy resources to define ingress/egress rules for pods, implementing micro-segmentation and zero-trust networking principles.	Learn More
AKSNET004	Azure CNI with Cilium Dataplane Recommended	Medium	Networking	✅ PASS		No issues detected.	For new clusters, use `--network-plugin azure --network-dataplane cilium --network-plugin-mode overlay` for optimal performance. Azure CNI powered by Cilium provides eBPF-based packet processing, better scalability, and advanced L3-L7 network policies. Existing clusters should migrate by creating a new cluster with Cilium enabled.	Learn More
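
For AKSNET001, the authorized ranges are passed as a single comma-separated list. The sketch below joins CIDR arguments and prints the command from the recommendation without running it. The helper name `join_cidrs` and the example ranges (taken from the reserved documentation address blocks) are illustrative assumptions, not values from this cluster.

```shell
#!/usr/bin/env sh
# Join CIDR arguments into one comma-separated string.
join_cidrs() {
  old_ifs=$IFS
  IFS=,
  joined="$*"        # "$*" joins the arguments with the first character of IFS
  IFS=$old_ifs
  printf '%s\n' "$joined"
}

# Hypothetical ranges: an office network and a single CI runner address.
RANGES="$(join_cidrs 203.0.113.0/24 198.51.100.10/32)"

# Dry run: print the command rather than executing it.
printf 'az aks update --resource-group %s --name %s --api-server-authorized-ip-ranges %s\n' \
  "<rg>" "<cluster>" "$RANGES"
```

Make sure the address you run `kubectl` from is in the list before applying it, since clients outside the listed ranges lose access to the API server.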

Resource Management (1/5 failed)

ID	Check	Severity	Category	Status	Observed Value	Fail Message	Recommendation	URL
AKSRES002	AKS Built-in Cost Tooling Enabled	Medium	Resource Management	❌ FAIL	false	Cost analysis and OpenCost integration is disabled, providing no visibility into per-namespace, per-workload, or per-application spending. This makes it impossible to implement cost allocation, identify expensive workloads, optimize resource usage, or implement chargeback policies for different teams.	Enable cost analysis using `az aks update --resource-group <rg> --name <cluster> --enable-cost-analysis` to track namespace and workload-level costs. Use the cost insights to identify expensive workloads, optimize resource requests, and implement chargeback/showback policies.	Learn More
AKSRES001	Cluster Autoscaler	Medium	Resource Management	✅ PASS		No issues detected.	Enable Cluster Autoscaler using `az aks update --resource-group <rg> --name <cluster> --enable-cluster-autoscaler --min-count <min> --max-count <max>` on node pools. Configure appropriate min/max node counts, scale-down parameters, and node pool priorities for optimal cost and performance balance.	Learn More
AKSRES003	Vertical Pod Autoscaler (VPA) is enabled	Medium	Resource Management	✅ PASS		No issues detected.	Enable VPA using `az aks update --resource-group <rg> --name <cluster> --enable-vpa`. Deploy VPA objects with 'updateMode: Auto' or 'Off' for recommendations only. Monitor VPA recommendations and adjust application resource requests/limits accordingly for better resource efficiency.	Learn More
AKSRES004	KEDA (Event-Driven Autoscaling) Enabled	Low	Resource Management	✅ PASS		No issues detected.	Enable KEDA using `az aks update --resource-group <rg> --name <cluster> --enable-keda`. Deploy ScaledObject resources to define event sources (Azure Queue, Service Bus, Kafka, HTTP, etc.) and scaling behavior. KEDA complements HPA by enabling scale-to-zero and event-driven scaling patterns.	Learn More
AKSRES005	Node Auto-provisioning or Cluster Autoscaler Configured	High	Resource Management	✅ PASS		No issues detected.	Enable Node Auto-provisioning using `az aks update --resource-group <rg> --name <cluster> --node-provisioning-mode Auto` for Karpenter-based dynamic provisioning. Alternatively, enable Cluster Autoscaler with `az aks update --enable-cluster-autoscaler`. NAP is recommended for modern workloads with diverse resource requirements.	Learn More

Security (3/8 failed)

ID	Check	Severity	Category	Status	Observed Value	Fail Message	Recommendation	URL
AKSSEC001	Private Cluster	High	Security	❌ FAIL	false	API server is publicly accessible from the internet, exposing your cluster to potential attacks, unauthorized access attempts, and compliance violations. This creates a significant security risk as attackers can attempt to exploit Kubernetes API vulnerabilities.	Configure as a private cluster using `az aks create --enable-private-cluster` or `az aks update --enable-private-cluster` for existing clusters. This routes API server traffic through private endpoints within your VNet. Configure private DNS zones and ensure network connectivity from management machines.	Learn More
AKSSEC006	Image Cleaner Enabled	Medium	Security	❌ FAIL	false	Image Cleaner is disabled, allowing stale and potentially vulnerable container images to accumulate on node disks. This increases storage costs, extends attack surface with outdated images containing known CVEs, and can impact node performance due to disk space consumption.	Enable Image Cleaner using `az aks update --resource-group <rg> --name <cluster> --enable-image-cleaner`. Configure cleaning interval and retention policies to automatically remove unused container images and reduce attack surface.	Learn More
AKSSEC008	Pod Security Admission Enabled	High	Security	❌ FAIL	false	Pod Security Admission is not configured on this cluster, meaning there are no built-in Kubernetes security controls to prevent insecure pod configurations. Without PSA, pods can run with dangerous settings like privileged mode, host network access, or unsafe capabilities, increasing container escape risks.	Configure Pod Security Admission by setting pod security standards on namespaces. Use `kubectl label namespace <namespace> pod-security.kubernetes.io/enforce=restricted pod-security.kubernetes.io/audit=restricted pod-security.kubernetes.io/warn=restricted` for production namespaces. Consider 'baseline' for less restrictive environments. This is separate from Azure Policy and provides Kubernetes-native security controls.	Learn More
AKSSEC002	Azure Policy Add-on	Medium	Security	✅ PASS		No issues detected.	Enable Azure Policy add-on using `az aks enable-addons --resource-group <rg> --name <cluster> --addons azure-policy`. Deploy built-in policy initiatives like 'Kubernetes cluster pod security restricted standards' and create custom policies for your organization's requirements.	Learn More
AKSSEC003	Defender for Containers	High	Security	✅ PASS		No issues detected.	Enable Defender for Containers using `az aks update --resource-group <rg> --name <cluster> --enable-defender` or through Security Center in Azure Portal. Configure vulnerability scanning, runtime threat detection, and compliance monitoring for comprehensive container security.	Learn More
AKSSEC004	OIDC Issuer Enabled	Medium	Security	✅ PASS		No issues detected.	Enable OIDC issuer using `az aks update --resource-group <rg> --name <cluster> --enable-oidc-issuer`. This enables workload identity federation, allowing pods to authenticate to Azure services using service account tokens instead of secrets.	Learn More
AKSSEC005	Azure Key Vault Integration	High	Security	✅ PASS		No issues detected.	Enable Key Vault CSI driver using `az aks enable-addons --resource-group <rg> --name <cluster> --addons azure-keyvault-secrets-provider`. Create SecretProviderClass resources to mount secrets, certificates, and keys from Azure Key Vault as volumes in pods.	Learn More
AKSSEC007	Kubernetes Dashboard Disabled	High	Security	✅ PASS		No issues detected.	Disable the Kubernetes dashboard using `az aks disable-addons --addons kube-dashboard --resource-group <rg> --name <cluster>`. Use Azure Portal, kubectl, or other secure management tools instead. If dashboard access is required, implement proper authentication and network restrictions.	Learn More
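
For AKSSEC008, one cautious way to introduce Pod Security Admission is to preview the labels with a server-side dry run before enforcing them. The sketch below only prints the `kubectl label` command from the recommendation, with `--overwrite` and `--dry-run=server` added. The function name `psa_label_command` is hypothetical, and the namespace is left as a placeholder because the report does not name one.

```shell
#!/usr/bin/env sh
# Build a kubectl command that sets all three Pod Security Admission modes
# (enforce, audit, warn) to the same level on one namespace.
# $1 = namespace, $2 = level (defaults to "restricted").
psa_label_command() {
  ns="$1"
  level="${2:-restricted}"
  printf 'kubectl label namespace %s --overwrite --dry-run=server' "$ns"
  for mode in enforce audit warn; do
    printf ' pod-security.kubernetes.io/%s=%s' "$mode" "$level"
  done
  printf '\n'
}

# Print the preview command for a placeholder namespace.
psa_label_command "<namespace>" restricted
```

With `--dry-run=server` the API server evaluates the request without persisting the labels. Remove that flag once the reported warnings have been addressed, or start with `baseline` as the table suggests for less restrictive environments.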

AKS Automatic Migration Readiness

Not Ready - Fix blocker findings before migrating workloads to a new AKS Automatic cluster.

🚫 Blockers: 1

⚠️ Warnings: 3

✅ Aligned Checks: 8

This view is derived from existing Kubernetes and AKS shared checks and focuses on readiness for a new AKS Automatic cluster.

Fix Before Migration

ID	Check	Affected	Recommendation	Examples
POD007	Container images do not use latest tag	4	Specify an explicit image tag (e.g., ':v1.2.3') on every container and initContainer to ensure consistent deployments.	pod/order-service-65cc8855c-ghk9m, pod/product-service-77ff9f6fd6-rzcxj, pod/store-front-698cc8c565-f5hp5
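
The POD007 rule (flag `latest` or a missing tag) can be approximated locally before redeploying. This is a minimal sketch of that rule, not the report's own implementation. The function name `image_is_pinned` and the sample image references are hypothetical.

```shell
#!/usr/bin/env sh
# Return 0 when an image reference is pinned (digest or explicit non-latest tag),
# 1 when it uses ":latest" or has no tag at all.
image_is_pinned() {
  ref="$1"
  case "$ref" in
    *@sha256:*) return 0 ;;        # digest references are always pinned
  esac
  last="${ref##*/}"                # last path component; ignores registry host:port
  case "$last" in
    *:latest) return 1 ;;          # explicit latest tag
    *:)       return 1 ;;          # trailing colon, empty tag
    *:*)      return 0 ;;          # explicit tag such as :v1.2.3
    *)        return 1 ;;          # no tag, which resolves to latest
  esac
}

# Hypothetical image references for illustration.
for ref in "nginx" "nginx:latest" "nginx:1.27.0" "localhost:5000/app"; do
  if image_is_pinned "$ref"; then
    echo "pinned:   $ref"
  else
    echo "unpinned: $ref"
  fi
done
```

Stripping everything up to the last `/` keeps a registry port such as `localhost:5000` from being mistaken for a tag.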

Warnings

ID	Check	Affected	Recommendation	Examples
SEC003	Pods Running as Root	12	Avoid running pods as root by explicitly setting runAsUser to a non-zero UID in pod or container securityContext.	pod/order-service-65cc8855c-ghk9m, pod/product-service-77ff9f6fd6-rzcxj, pod/rabbitmq-5dcdf9484-kvgw7, pod/store-front-698cc8c565-f5hp5
SEC020	Seccomp Profile Not Configured	5	Set seccompProfile.type to RuntimeDefault or Localhost at the pod or container level.	pod/order-service-65cc8855c-ghk9m, pod/product-service-77ff9f6fd6-rzcxj, pod/rabbitmq-5dcdf9484-kvgw7, pod/store-front-698cc8c565-f5hp5
WRK007	Missing Readiness and Liveness Probes	4	Add readiness and liveness probes to all containers to improve availability and fault detection.	deployment/order-service, deployment/product-service, deployment/rabbitmq, deployment/store-front
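
The SEC003 and SEC020 warnings can often be addressed with a single pod-level `securityContext`. The sketch below prints a strategic merge patch for a Deployment along with a `kubectl patch` command that would apply it. The UID `10001`, the file name `hardening-patch.yaml`, and the function name are assumptions made for illustration, and the container image must actually be able to run as a non-root user for this to work.

```shell
#!/usr/bin/env sh
# Emit a strategic merge patch that sets a non-root user and the
# RuntimeDefault seccomp profile at the pod level of a Deployment.
write_hardening_patch() {
  cat <<'EOF'
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001        # hypothetical UID; any non-zero UID the image supports
        seccompProfile:
          type: RuntimeDefault
EOF
}

# Show the patch, then the command that would apply it once saved to a file.
write_hardening_patch
echo "kubectl patch deployment <name> --patch-file hardening-patch.yaml"
```

Readiness and liveness probes (WRK007) are left out on purpose: they are set per container and depend on each application's health endpoint and port, which the report does not list.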

Menu

Cluster Overview

Cluster Health Score

API Server Health

Passed / Failed Checks

Top 5 Improvements

Issue Summary

Rightsizing at a Glance

Node Insights

Pod Actions

Impact Summary

Excluded Namespaces: These namespaces are excluded from analysis and reporting.

Cluster Summary

Cluster Metrics Summary: Summary of metrics including node and pod counts, warnings, and issues.

Pod Distribution: Average, min, and max pods per node and total node count.

Cluster Health Metrics (Last 24h): 24-hour Prometheus averages and charts for cluster CPU and memory usage.

Cluster CPU Usage (%)

Cluster Memory Usage (%)

Cluster Events

Node Conditions & Resources

NODE001 - Node Readiness and Conditions: Detects nodes that are not in Ready state or reporting other warning conditions.

NODE002 - Node Resource Pressure (Last 24h): Detects nodes under high CPU, memory, or disk pressure. Data source: Prometheus (24h average).

NODE003 - Max Pods per Node: Alerts when any node is running too many pods according to configured thresholds.

PROM005 - Overcommitted CPU (Prometheus): Checks if CPU requests on nodes exceed allocatable capacity over the last 24 hours.

PROM006 - Node Sizing Insights (Prometheus): Uses Prometheus p95 CPU and memory usage over a fixed 7-day window to highlight underutilized or saturated nodes and suggest sizing actions.

Node Sizing Guidance

CPU Usage (%)

Memory Usage (%)

Disk Usage (%)

Namespaces

NS001 - Empty Namespaces: Finds namespaces with no running pods.

NS002 - Missing or Weak ResourceQuotas: Detects namespaces with missing or incomplete ResourceQuota definitions.

NS003 - Missing LimitRanges: Detects namespaces without a defined LimitRange.

NS004 - Pods in Default Namespace: Flags pods running in the default namespace.

Workloads

WRK001 - DaemonSets Not Fully Running: Detects DaemonSets that have fewer ready pods than desired.

WRK002 - Deployment Missing Replicas: Detects Deployments where available replicas are less than desired.

WRK003 - StatefulSet Incomplete Rollout: Detects StatefulSets with fewer ready replicas than desired.

WRK004 - HPA Misconfiguration or Inactivity: Checks for HPAs with missing targets, metrics, or inactive scaling.

WRK005 - Missing Resource Requests: Checks that every container has explicit CPU and memory requests.

WRK006 - PDB Coverage and Effectiveness: Detects missing or weak PodDisruptionBudgets.

WRK007 - Missing Readiness and Liveness Probes: Detects containers without readiness or liveness probes.

WRK008 - Deployment Selector Without Matching Pods: Detects Deployments whose selectors do not match any existing pods.

WRK009 - Deployment, Pod, and Service Label Consistency: Validates that deployments, pods, and services use aligned labels and selectors.

WRK010 - HPA Metrics Without Matching Resource Requests: Detects HPAs that scale on CPU or memory metrics when target containers lack matching requests.

WRK011 - VPA Update Mode and Declarative Resource Conflict Risk: Flags VPAs in Auto/Recreate mode that may conflict with declarative resource ownership or HPAs.

WRK012 - PodDisruptionBudget Adequacy for Replicated Workloads: Validates that replicated workloads have matching PDBs with sensible settings.

WRK013 - CrashLoopBackOff and OOMKilled Guardrail: Flags pods with CrashLoopBackOff, OOMKilled state, or high restart counts.

WRK014 - Missing Memory Limits: Checks that every container has an explicit memory limit.

WRK015 - Replicated Workloads Missing Spread Constraints: Detects replicated workloads that define neither anti-affinity nor topology spread constraints.

Pods

POD001 - Pods with High Restarts: Detects pods that have restarted more than the configured thresholds.

POD002 - Long Running Pods: Flags pods that have been running longer than configured thresholds.

POD003 - Failed Pods: Detects pods in a failed phase, typically due to startup errors, crashes, or misconfiguration.

POD004 - Pending Pods: Detects pods stuck in a 'Pending' state due to scheduling or resource issues.

POD005 - CrashLoopBackOff Pods: Identifies pods stuck in a CrashLoopBackOff state due to repeated container crashes.

POD006 - Leftover Debug Pods: Detects pods created by kubectl debug that have not been cleaned up.

POD007 - Container images do not use latest tag: Flags containers using latest or no explicit tag.

🛠️ Use Specific Image Tags

POD008 - Automounting API Credentials Enabled in Pods: Flags pods that do not explicitly disable service account token automounting.

🛠️ Disable Automounting API Credentials

PROM001 - High CPU Pods (Prometheus): Checks for pods with sustained high CPU usage over the last 24 hours using Prometheus metrics.

🛠️ Investigate High CPU Pods

PROM002 - High Memory Usage Pods (Prometheus): Detects pods with high memory usage over the last 24 hours based on Prometheus metrics.

🛠️ Investigate High Memory Pods

PROM003 - High Network Receive Rate (Prometheus): Detects pods receiving large amounts of network traffic over the last 24 hours.

🛠️ Investigate Network Receive Rate

PROM007 - Pod Sizing Insights (Prometheus): Generates per-container CPU and memory sizing recommendations from fixed 7-day p95 Prometheus usage.

Pod Sizing Guidance

Jobs

JOB001 - Stuck Kubernetes Jobs: Finds Jobs that have started but not completed within the threshold.

JOB002 - Failed Kubernetes Jobs: Detects jobs with failures and no successful completions.

Networking

NET001 - Services Without Endpoints: Identifies services that have no backing endpoints.
