# Large-Scale Cluster Parameter Description
For large-scale deployments, refer to the following parameter configuration.
## Kubean Cluster Parameters
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster1-demo-vars-conf
  namespace: kubean-system
data:
  group_vars.yml: |
    gcr_image_repo: "gcr.m.daocloud.io"
    kube_image_repo: "k8s.m.daocloud.io"
    docker_image_repo: "docker.m.daocloud.io"
    quay_image_repo: "quay.m.daocloud.io"
    github_image_repo: "ghcr.m.daocloud.io"
    files_repo: "https://files.m.daocloud.io"
    kubeadm_download_url: "{{ files_repo }}/dl.k8s.io/release/{{ kubeadm_version }}/bin/linux/{{ image_arch }}/kubeadm"
    kubectl_download_url: "{{ files_repo }}/dl.k8s.io/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubectl"
    kubelet_download_url: "{{ files_repo }}/dl.k8s.io/release/{{ kube_version }}/bin/linux/{{ image_arch }}/kubelet"
    cni_download_url: "{{ files_repo }}/github.com/containernetworking/plugins/releases/download/{{ cni_version }}/cni-plugins-linux-{{ image_arch }}-{{ cni_version }}.tgz"
    crictl_download_url: "{{ files_repo }}/github.com/kubernetes-sigs/cri-tools/releases/download/{{ crictl_version }}/crictl-{{ crictl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz"
    etcd_download_url: "{{ files_repo }}/github.com/etcd-io/etcd/releases/download/{{ etcd_version }}/etcd-{{ etcd_version }}-linux-{{ image_arch }}.tar.gz"
    calicoctl_download_url: "{{ files_repo }}/github.com/projectcalico/calico/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}"
    calicoctl_alternate_download_url: "{{ files_repo }}/github.com/projectcalico/calicoctl/releases/download/{{ calico_ctl_version }}/calicoctl-linux-{{ image_arch }}"
    calico_crds_download_url: "{{ files_repo }}/github.com/projectcalico/calico/archive/{{ calico_version }}.tar.gz"
    helm_download_url: "{{ files_repo }}/get.helm.sh/helm-{{ helm_version }}-linux-{{ image_arch }}.tar.gz"
    crun_download_url: "{{ files_repo }}/github.com/containers/crun/releases/download/{{ crun_version }}/crun-{{ crun_version }}-linux-{{ image_arch }}"
    kata_containers_download_url: "{{ files_repo }}/github.com/kata-containers/kata-containers/releases/download/{{ kata_containers_version }}/kata-static-{{ kata_containers_version }}-{{ ansible_architecture }}.tar.xz"
    runc_download_url: "{{ files_repo }}/github.com/opencontainers/runc/releases/download/{{ runc_version }}/runc.{{ image_arch }}"
    containerd_download_url: "{{ files_repo }}/github.com/containerd/containerd/releases/download/v{{ containerd_version }}/containerd-{{ containerd_version }}-linux-{{ image_arch }}.tar.gz"
    nerdctl_download_url: "{{ files_repo }}/github.com/containerd/nerdctl/releases/download/v{{ nerdctl_version }}/nerdctl-{{ nerdctl_version }}-{{ ansible_system | lower }}-{{ image_arch }}.tar.gz"
    cri_dockerd_download_url: "{{ files_repo }}/github.com/Mirantis/cri-dockerd/releases/download/v{{ cri_dockerd_version }}/cri-dockerd-{{ cri_dockerd_version }}.{{ image_arch }}.tgz"
    yq_download_url: "{{ files_repo }}/github.com/mikefarah/yq/releases/download/{{ yq_version }}/yq_linux_{{ image_arch }}"
    download_run_once: true
    download_localhost: true
    download_container: false
    ## etcd parameters
    etcd_deployment_type: kubeadm
    etcd_events_cluster_setup: true
    etcd_heartbeat_interval: 250
    etcd_election_timeout: 5000
    ## kube-controller-manager parameters
    kube_controller_node_monitor_grace_period: 20s
    kube_controller_node_monitor_period: 2s
    kube_kubeadm_controller_extra_args:
      kube-api-qps: 20
      kube-api-burst: 30
      concurrent-deployment-syncs: 5
      pvclaimbinder-sync-period: 15s
    ## kube-scheduler parameters
    kube_scheduler_config_extra_opts:
      percentageOfNodesToScore: 0
    ## kube-apiserver parameters
    kube_apiserver_pod_eviction_not_ready_timeout_seconds: 30
    kube_apiserver_pod_eviction_unreachable_timeout_seconds: 30
    kube_apiserver_request_timeout: 1m0s
    kube_kubeadm_apiserver_extra_args:
      max-requests-inflight: 400
    ## kubelet parameters
    kubelet_status_update_frequency: 4s
    kubelet_max_pods: 110
    kubelet_pod_pids_limit: -1
    kubelet_cpu_manager_policy: static
    kubelet_cpu_manager_policy_options:
      full-pcpus-only: "true"
    kubelet_topology_manager_policy: single-numa-node
    kubelet_topology_manager_scope: container
    kubelet_config_extra_args:
      kubeAPIQPS: 50
      kubeAPIBurst: 100
      serializeImagePulls: false
      maxParallelImagePulls: 5
      volumeStatsAggPeriod: 1m
    kube_reserved: true
    kube_master_cpu_reserved: 1
    kube_master_memory_reserved: 2G
    system_reserved: true
    system_master_cpu_reserved: 1
    system_master_memory_reserved: 2G
    ## kube-proxy parameters
    kube_proxy_mode: ipvs
    ## Cluster network
    kube_network_plugin: calico
    calico_cni_name: calico
    kube_pods_subnet: 10.233.64.0/18
    kube_network_node_prefix: 24
    kube_network_node_prefix_ipv6: 120
    kube_service_addresses: 10.233.0.0/18
    ## App network
    dns_replicas: 3
    dns_cpu_limit: 300m
    dns_cpu_requests: 100m
    dns_memory_limit: 300Mi
    dns_memory_requests: 70Mi
    enable_nodelocaldns: true
    kube_vip_enabled: true
    kube_vip_controlplane_enabled: true
    kube_vip_arp_enabled: true
    kube_proxy_strict_arp: true
    kube_vip_address: 10.42.42.42
    metrics_server_enabled: true
    retry_stagger: 60
    cluster_id: 10.42.42.2
```
## Large-Scale Deployment Parameters
| Category | Parameter | Value | Description |
| --- | ---- | -- | --- |
| Resource Distribution | *_image_repo | URL | Point each image repository (e.g. `gcr_image_repo`) at an intranet address or mirror site |
| | *_download_url | URL | Point each download URL (e.g. `kubeadm_download_url`) at an intranet address or mirror site |
| | download_run_once | true/false | Set `download_run_once: true` to download each file and image only once, then distribute it to every target node |
| | download_localhost | true/false | Set `download_localhost: true` to perform that one-time download on the Ansible control node, which then distributes to each target node |
| | download_container | true/false | Set `download_container: false` to avoid pre-pulling and synchronizing large numbers of images across nodes |
| Core Cluster Components - etcd | etcd_events_cluster_setup | true/false | Set to true to store events in a separate dedicated etcd instance |
| | etcd_heartbeat_interval | Default 250, in milliseconds | Frequency at which the leader notifies the followers |
| | etcd_election_timeout | Default 5000, in milliseconds | Time a follower node waits before attempting to become the leader if it hasn't heard a heartbeat |
| Core Cluster Components - kube-controller-manager | kube_controller_node_monitor_grace_period | Default 40s | Time allowed for a node to be unresponsive before being marked as unhealthy; must be a multiple of `kubelet_status_update_frequency` |
| | kube_controller_node_monitor_period | Default 5s | Interval for synchronizing NodeStatus |
| | kube_kubeadm_controller_extra_args | Sub-elements | kube-api-qps: default 20, QPS used for communication with kube-apiserver<br/>kube-api-burst: default 30, burst allowed when communicating with kube-apiserver<br/>concurrent-deployment-syncs: default 5, number of Deployment objects allowed to sync concurrently (other core resources have similar parameters)<br/>pvclaimbinder-sync-period: default 15s, interval for synchronizing PVs and PVCs |
| Core Cluster Components - kube-scheduler | kube_scheduler_config_extra_opts | Sub-elements | percentageOfNodesToScore: If the cluster size is 500 nodes and this value is 30, the scheduler stops looking for more feasible nodes after finding 150. When set to 0, a default percentage (5%-50% based on cluster size) of nodes will be scored. Use a low setting only if you prefer to select any schedulable node to run the Pod. |
| Core Cluster Components - kube-apiserver | kube_apiserver_pod_eviction_not_ready_timeout_seconds | Default 300 | Toleration seconds for the `notReady:NoExecute` taint; by default this toleration (with this timeout) is added to every pod that does not already set one |
| | kube_apiserver_pod_eviction_unreachable_timeout_seconds | Default 300 | Toleration seconds for the `unreachable:NoExecute` taint; by default this toleration (with this timeout) is added to every pod that does not already set one |
| | kube_apiserver_request_timeout | Default 1m0s | Timeout for apiserver requests; limits expensive requests such as listing a resource across all namespaces |
| | kube_kubeadm_apiserver_extra_args | Sub-elements | max-requests-inflight: Default 400, limits the maximum number of ongoing non-mutating requests |
| Core Cluster Components - kubelet | kubelet_status_update_frequency | Default 10s | Frequency at which the kubelet posts node status to the apiserver; it is recommended to increase this interval in large clusters |
| | kubelet_max_pods | Default 110 | Maximum number of pods that can be created on each node; increase it if nodes should run more pods |
| | kubelet_pod_pids_limit | - | Prevents or allows pods to use a large number of PIDs, range: [-1, 2^63-1] |
| | kubelet_cpu_manager_policy | - | Sets the CPU manager policy |
| | kubelet_cpu_manager_policy_options | - | Sets options for the CPU manager policy |
| | kubelet_topology_manager_policy | - | Sets the topology manager policy |
| | kubelet_topology_manager_scope | - | Sets the scope of the topology manager policy |
| | kube_reserved | true/false | Setting `kube_reserved: true` reserves resources for Kubernetes system daemons (kubelet, container runtime, etc.) |
| | kube_master_cpu_reserved | - | CPU reserved for Kubernetes system daemons on control-plane nodes |
| | kube_master_memory_reserved | - | Memory reserved for Kubernetes system daemons on control-plane nodes |
| | system_reserved | true/false | Setting `system_reserved: true` reserves resources for OS system daemons (sshd, udev, etc.) |
| | system_master_cpu_reserved | - | CPU reserved for OS system daemons on control-plane nodes |
| | system_master_memory_reserved | - | Memory reserved for OS system daemons on control-plane nodes |
| | kubelet_config_extra_args | Sub-elements | kubeAPIQPS: default 50, QPS used for communication with kube-apiserver<br/>kubeAPIBurst: default 100, burst allowed when communicating with kube-apiserver<br/>serializeImagePulls: default true, pulls only one image at a time<br/>maxParallelImagePulls: default nil, maximum number of parallel image pulls; effective only when serializeImagePulls is false<br/>volumeStatsAggPeriod: default 1m, recommended to increase with many volumes and high disk pressure |
| Kubeproxy | kube_proxy_mode | - | In scenarios with frequent Service changes, `ipvs` performs better than `iptables`. Setting the kube-proxy mode to ipvs requires a Linux kernel version of 5.9 or higher, and note that kube-proxy IPVS mode has some known issues of its own |
| Cluster Network Parameters | kube_pods_subnet | 10.233.64.0/18 | Increases the network allocation for pods |
| | kube_network_node_prefix | 24 | Increases the subnet range that each node can allocate to pods |
| | kube_network_node_prefix_ipv6 | 120 | Increases the subnet range that each node can allocate to pods |
| | kube_service_addresses | 10.233.0.0/18 | Increases the network allocation for K8s service ClusterIP |
| Application Stability | dns_replicas | - | Specifies the number of DNS service replicas |
| | dns_cpu_limit | - | Maximum CPU resources that each DNS service pod can use |
| | dns_cpu_requests | - | Minimum CPU resources that each DNS service pod can use |
| | dns_memory_limit | - | Maximum memory resources that each DNS service pod can use |
| | dns_memory_requests | - | Minimum memory resources that each DNS service pod can use |
| | enable_nodelocaldns | - | Setting `enable_nodelocaldns: true` lets pods query a DNS caching agent (backed by CoreDNS) running on the same node, avoiding iptables DNAT rules and connection tracking |
| | kube_vip_enabled | - | Setting `kube_vip_enabled: true` provides a virtual IP and load balancer for the cluster, used for the control plane (to build a highly available cluster) and for Kubernetes Services of type LoadBalancer |
| | metrics_server_enabled | - | Setting `metrics_server_enabled: true` is a prerequisite for HPA, which relies on the metrics API |
| Others | retry_stagger | - | Random stagger (in seconds) applied before retrying failed tasks such as downloads and registry pushes, to spread out the load |
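To make the reservation split concrete, the `kube_reserved`/`system_reserved` values in the sample ConfigMap roughly translate into the standard `kubeReserved`/`systemReserved` fields of the node's KubeletConfiguration. This is a sketch; the exact rendering depends on the Kubespray/Kubean version:

```yaml
# Illustrative KubeletConfiguration fragment (values from the sample above)
kubeReserved:        # held back for Kubernetes daemons (kubelet, container runtime)
  cpu: "1"
  memory: "2G"
systemReserved:      # held back for OS daemons (sshd, udev, ...)
  cpu: "1"
  memory: "2G"
# Node Allocatable = Capacity - kubeReserved - systemReserved - hard eviction thresholds
```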
## Recommendations for Different Scenarios
### Fast Update and Fast Reaction
**Parameter Settings:**
- `kubelet_status_update_frequency` set to 4s (default 10s)
- `kube_controller_node_monitor_period` set to 2s (default 5s)
- `kube_controller_node_monitor_grace_period` set to 20s (default 40s)
- `kube_apiserver_pod_eviction_unreachable_timeout_seconds` set to 30 (default 300s)
In this scenario, Pods will be evicted within 50 seconds: the node is considered down
after 20 seconds, and the 30-second timeout set by `kube_apiserver_pod_eviction_not_ready_timeout_seconds` or
`kube_apiserver_pod_eviction_unreachable_timeout_seconds` expires after a further 30 seconds.
However, this setup puts load on etcd, as each node attempts to update its status every 4 seconds.
**If the environment has 1000 nodes, there will be 15000 node updates per minute, potentially requiring larger etcd instances or even dedicated etcd nodes.**
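Expressed as a fragment of the `group_vars.yml` shown earlier, this scenario is simply:

```yaml
## Fast update and fast reaction (values from the list above)
kubelet_status_update_frequency: 4s
kube_controller_node_monitor_period: 2s
kube_controller_node_monitor_grace_period: 20s
kube_apiserver_pod_eviction_not_ready_timeout_seconds: 30
kube_apiserver_pod_eviction_unreachable_timeout_seconds: 30
```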
### Medium Update and Average Reaction
**Parameter Settings:**
- `kubelet_status_update_frequency` set to 20s
- `kube_controller_node_monitor_grace_period` set to 2m
- `kube_apiserver_pod_eviction_not_ready_timeout_seconds` and `kube_apiserver_pod_eviction_unreachable_timeout_seconds` set to 60.
In this scenario, the kubelet attempts to update the node status every 20 seconds,
so the controller manager allows 6 * 5 = 30 update attempts (6 update periods within the 2-minute grace period, times the default of 5 status-update retries) before considering the node unhealthy.
After a further 1 minute it evicts all Pods, so the total time before eviction is about 3 minutes.
**This scenario is suitable for medium environments since 1000 nodes require 3000 etcd updates per minute.**
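The corresponding `group_vars.yml` fragment for this scenario:

```yaml
## Medium update and average reaction (values from the list above)
kubelet_status_update_frequency: 20s
kube_controller_node_monitor_grace_period: 2m
kube_apiserver_pod_eviction_not_ready_timeout_seconds: 60
kube_apiserver_pod_eviction_unreachable_timeout_seconds: 60
```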
## Other Considerations
When deploying Calico or Canal, you can add `calico_rr` (route reflector) nodes to the Kubean host manifest,
which allows for quicker recovery from host/network interruptions.
You also need to configure `cluster_id` for the route reflectors (formatted as an IPv4 address), as shown in the vars ConfigMap above.
**Host Manifest Example:**
```yaml
apiVersion: kubean.io/v1alpha1
kind: Cluster
metadata:
  name: cluster1-demo
spec:
  hostsConfRef:
    namespace: kubean-system
    name: cluster1-demo-hosts-conf
  varsConfRef:
    namespace: kubean-system
    name: cluster1-demo-vars-conf
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster1-demo-hosts-conf
  namespace: kubean-system
data:
  hosts.yml: |
    all:
      hosts:
        node1:
          ansible_connection: ssh
          ansible_host: 10.42.42.2
          ansible_user: root
          ansible_ssh_pass: dangerous
        node2:
          ansible_connection: ssh
          ansible_host: 10.42.42.3
          ansible_user: root
          ansible_ssh_pass: dangerous
        node3:
          ansible_connection: ssh
          ansible_host: 10.42.42.4
          ansible_user: root
          ansible_ssh_pass: dangerous
      children:
        kube_control_plane:
          hosts:
            node1:
            node2:
            node3:
        kube_node:
          hosts:
            node1:
            node2:
            node3:
        etcd:
          hosts:
            node1:
            node2:
            node3:
        k8s_cluster:
          children:
            kube_control_plane:
            kube_node:
        calico_rr:
          hosts:
            node1:
            node2:
            node3:
```
- Attributes of the Ansible configuration file can be set in Kubean's ClusterOperation
  file to configure concurrency and connection timeout:
    - Concurrency: `forks: 50`
    - Connection timeout: `timeout: 600`