## NFV Service Density Benchmarking

## Introduction

This note describes the benchmarking of VNF- and CNF-based software network services running on a single compute node, referred to as NFV service density benchmarking. FD.io VPP is used as the open-source Network Function (NF). NF(s) run either within VM(s), referred to as VNF(s), or within Docker container(s), referred to as CNF(s). Ethernet frames are demultiplexed from and multiplexed to the two physical 10GbE interfaces through a Linux User-Mode Software Switch, again based on FD.io VPP.

The same version of the FD.io VPP application running in VNFs and CNFs is configured as an IPv4 routing Network Function, routed-forwarding between two (virtual) software interfaces: virtio in VNFs and memif in CNFs. The same version of the FD.io VPP application, running in Linux user mode as a (virtual) software switch, is configured as an Ethernet L2 bridge with in-line dataplane MAC learning, L2 switched-forwarding between multiple (virtual) software interfaces: vhostuser for inter-connected VNFs and memif for inter-connected CNFs.

## Environments

Benchmarked physical test environments:

1. FD.io CSIT 2n-skx testbed t22 (Xeon Platinum 8180)
2. [Equinix Metal](https://metal.equinix.com/) 2n-skx testbed (Xeon Gold 6150)

* ADD: outputs of lspci for the above testbeds.
* ADD: number of usable cores for the above (system, switch, NFs).

## NFV Service Topologies

Benchmarked NFV service topologies:

1. VNF Service Chain (VSC) topology with Snake Forwarding
2. CNF Service Chain (CSC) topology with Snake Forwarding
3. CNF Service Pipeline (CSP) topology with Pipeline Forwarding

## Cores Allocation

### Linux User-Mode Switch

A single instance of the Linux User-Mode Software (SW) Switch runs on the compute node. Every performance-optimized SW switch application has two sets of software threads: i) main threads, handling switch application management and control planes, and ii) dataplane threads, handling dataplane packet processing and forwarding. This applies to FD.io VPP used in this benchmarking.

Allocation of processor physical cores to the software switch is as follows:

1. Two mapping ratios are defined and used in software switch benchmarking:
   * `pcdr4sw` value determines Physical Core to Dataplane Ratio for SWitch.
   * `pcmr4sw` value determines Physical Core to Main Ratio for SWitch.
2. Target values to be benchmarked:
   * pcdr4sw = [(1:1),(2:1),(4:1)].
   * pcmr4sw = [(1:1),(1:2)].
3. Number of physical cores required for the benchmarked software switch is calculated as follows:
   * #pc = pcdr4sw * #dsw + pcmr4sw * #msw
   * where
     * #pc - total number of physical cores required and used.
     * #dsw - total number of switch dataplane thread sets (1 set per SW switch).
     * #msw - total number of switch main thread sets (1 set per SW switch).

### CNFs and VNFs

Multiple instances of NFs (CNFs or VNFs) run on the compute node. Every performance-optimized NF has two sets of software threads: i) main threads, handling NF application management and control planes, and ii) dataplane threads, handling NF dataplane packet processing and forwarding. This applies to FD.io VPP used in this benchmarking.

Allocation of processor physical cores per NF instance is as follows:

1. Two mapping ratios are defined and used in NF service matrix benchmarking:
   a. `pcdr4nf` value determines Physical Core to Dataplane Ratio for NF.
   b. `pcmr4nf` value determines Physical Core to Main Ratio for NF.
2. Target values to be benchmarked:
   a. pcdr4nf = [(1:1),(1:2),(1:4)].
   b. pcmr4nf = [(1:2),(1:4),(1:8)].
3. Number of physical cores required for the benchmarked NFs' service matrix is calculated as follows (see the sketch after this list):
   * #pc = pcdr4nf * #dnf + pcmr4nf * #mnf
   * where
     * #pc - total number of physical cores required and used.
     * #dnf - total number of NF dataplane thread sets (1 set per NF instance).
     * #mnf - total number of NF main thread sets (1 set per NF instance).
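
To make the two core-count formulas above concrete, below is a minimal Python sketch. It is not part of the benchmarking tooling; the helper names (`cores`, `service_density_cores`) and the rounding of fractional core totals up to whole cores are illustrative assumptions. Ratios are read as physical cores per thread set, e.g. pcdr4sw = (2:1) means 2 cores per switch dataplane thread set, and pcmr4nf = (1:2) means 1 core shared by 2 NF main thread sets.

```python
import math

def cores(ratio, thread_sets):
    """Physical cores for `thread_sets` thread sets at a (cores:sets) mapping ratio."""
    cores_part, sets_part = ratio
    return cores_part / sets_part * thread_sets

def service_density_cores(n_services, nfs_per_service,
                          pcdr4sw=(1, 1), pcmr4sw=(1, 1),
                          pcdr4nf=(1, 1), pcmr4nf=(1, 2)):
    """Return (switch cores, NF cores) for one service density point."""
    n_nfs = n_services * nfs_per_service
    # Single switch instance: one dataplane and one main thread set (#dsw = #msw = 1).
    sw = cores(pcdr4sw, 1) + cores(pcmr4sw, 1)
    # One dataplane and one main thread set per NF instance (#dnf = #mnf = n_nfs).
    nf = cores(pcdr4nf, n_nfs) + cores(pcmr4nf, n_nfs)
    return math.ceil(sw), math.ceil(nf)

# Example: 2 service instances x 2 NFs each = 4 NFs in total.
print(service_density_cores(2, 2))  # (2, 6): 2 cores for the switch, 6 for the NFs
```

With the default ratios shown, each NF works out to 1.5 physical cores, which lines up with the NF values in the Core Usage View matrix below.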

## Service Density Matrix – Network Function View

```
Row:    1..10  number of network service instances
Column: 1..10  number of network functions per service instance
Value:  1..100 total number of network functions within node
```

```
SVC   001  002  004  006  008  010
001     1    2    4    6    8   10
002     2    4    8   12   16   20
004     4    8   16   24   32   40
006     6   12   24   36   48   60
008     8   16   32   48   64   80
010    10   20   40   60   80  100
```

## Service Density Matrix – Core Usage View

```
Row:    1..10 number of network service instances
Column: 1..10 number of network functions per service instance
Value:  1..NN number of physical processor cores used

Cores Numa0: pcdr4sw = (1:1), pcmr4sw = (1:1)
             pcdr4nf = (1:1), pcmr4nf = (1:2)
Cores Numa1: not used
```

```
SVC   001  002  004  006  008  010
001     2    3    6    9   12   15
002     3    6   12   18   24   30
004     6   12   24   36   48   60
006     9   18   36   54   72   90
008    12   24   48   72   96  120
010    15   30   60   90  120  150
```

## Methodology - MRR Throughput

MRR tests measure the packet forwarding rate under the maximum load offered by the traffic generator over a set trial duration, regardless of packet loss. Maximum load for the specified Ethernet frame size is set to the bi-directional link rate.
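
For orientation when reading the results below, the following sketch derives the maximum offered load at the 10GbE bi-directional link rate for 64 B frames. The 20 B per-frame wire overhead (7 B preamble, 1 B start-of-frame delimiter, 12 B inter-frame gap) is a standard Ethernet assumption, not something specific to this benchmarking setup.

```python
# Maximum offered load at line rate for a given L2 frame size on 10GbE.
LINK_BPS = 10e9      # one 10GbE link, per direction
OVERHEAD_B = 20      # preamble (7) + SFD (1) + inter-frame gap (12), in bytes

def line_rate_pps(frame_size_b, link_bps=LINK_BPS):
    """Frames per second at full link rate for the given Ethernet frame size."""
    return link_bps / ((frame_size_b + OVERHEAD_B) * 8)

pps = line_rate_pps(64)
print(f"64B: {pps / 1e6:.2f} Mpps per direction, {2 * pps / 1e6:.2f} Mpps bi-directional")
# 64B: 14.88 Mpps per direction, 29.76 Mpps bi-directional
```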

## Service Density Matrix – MRR Throughput Results (L2 size=64B)

* Maximum Receive Rate (MRR) throughput results are measured in [Mpps].
* [Mpps] - mega (millions of) packets per second.
* Encapsulation: IPv4 over untagged Ethernet.
* IPv4 size: 46 Bytes.
* Ethernet frame size: 64 Bytes.

### FD.io CSIT 2n-skx, pcdr4sw = (1:1)

```
Testbed: t22

Row:    1..10 number of network service instances
Column: 1..10 number of network functions (VNF or CNF) per service instance
Value:  x.y   MRR throughput in [Mpps]
        x.y*  `*` indicates many retries due to failing nfvbench warm-up phase
              used to verify service forwarding path
        ???   to be measured
        ---   configuration impossible for specific skx processor model,
              out of physical cores

Ring sizes: VNF vring_size = 256 (old qemu), CNF memif_ring_size = 1024

Cores Numa0: pcdr4sw = (1:1), pcmr4sw = (1:1)
             pcdr4nf = (1:1), pcmr4nf = (1:2)
Cores Numa1: not used
```

```
64B                                          IMIX
VSC   001   002   004   006   008   010     VSC   001   002   004   006   008   010
001   6.1   3.5   2.3   1.5   1.1   ???     001   4.5   2.4   1.3   0.9   0.6   ???
002   3.9   1.5   0.3   0.1   0.1   ---     002   3.0   0.8   0.2   0.1   0.1   ---
004   2.4   0.7   0.1   ---   ---   ---     004   1.9   0.5   0.1   ---   ---   ---
006   1.7   0.5   ---   ---   ---   ---     006   1.4   0.4   ---   ---   ---   ---
008   1.4   ???*  ---   ---   ---   ---     008   1.1   ???*  ---   ---   ---   ---
010   ???   ---   ---   ---   ---   ---     010   ???   ---   ---   ---   ---   ---
```

```
64B                                          IMIX
CSC   001   002   004   006   008   010     CSC   001   002   004   006   008   010
001   6.4   3.8   2.2   1.6   1.2   0.9     001   4.5   2.5   1.3   0.8   0.6   0.5
002   5.8   3.4   1.8   1.2   0.9   ---     002   4.0   2.1   1.0   0.7   0.5   ---
004   5.6   3.2   1.6   ---   ---   ---     004   3.8   1.8   0.9   ---   ---   ---
006   5.4   3.1   ---   ---   ---   ---     006   3.6   1.7   ---   ---   ---   ---
008   5.4   3.4   ---   ---   ---   ---     008   3.4   1.9   ---   ---   ---   ---
010   5.3   ---   ---   ---   ---   ---     010   3.4   ---   ---   ---   ---   ---
```

```
64B                                          IMIX
CSP   001   002   004   006   008   010     CSP   001   002   004   006   008   010
001   6.3   6.3   6.3   6.4   6.5   6.4     001   4.5   3.6   3.3   4.3   4.0   3.8
002   5.8   5.6   5.6   5.6   5.5   ---     002   4.0   3.8   3.6   3.3   3.2   ---
004   5.6   5.5   5.3   ---   ---   ---     004   3.7   3.5   3.3   ---   ---   ---
006   5.4   5.3   ---   ---   ---   ---     006   3.6   3.3   ---   ---   ---   ---
008   5.4   5.2   ---   ---   ---   ---     008   3.5   3.2   ---   ---   ---   ---
010   5.3   ---   ---   ---   ---   ---     010   3.4   ---   ---   ---   ---   ---
```

### Equinix Metal 2n-skx, pcdr4sw = (1:1)

```
Testbed: tg-quad01, sut-quad02-sut

Row:    1..10 number of network service instances
Column: 1..10 number of network functions (VNF or CNF) per service instance
Value:  x.y   MRR throughput in [Mpps]
        x.y*  `*` indicates many retries due to failing nfvbench warm-up phase
              used to verify service forwarding path
        ???   to be measured
        ---   configuration impossible for specific skx processor model,
              out of physical cores

Ring sizes: VNF vring_size = 256 (old qemu), CNF memif_ring_size = 1024

Cores Numa0: pcdr4sw = (1:1), pcmr4sw = (1:1)
             pcdr4nf = (1:1), pcmr4nf = (1:2)
Cores Numa1: not used
```

```
64B                                          IMIX
VSC   001   002   004   006   008   010     VSC   001   002   004   006   008   010
001   5.4   3.1   1.5   1.2   0.9   ---     001   3.4   1.5   0.9   0.6   0.4   ---
002   3.4   1.3   0.3   ---   ---   ---     002   2.4   0.8   0.2   ---   ---   ---
004   2.1   0.5   ---   ---   ---   ---     004   1.6   0.3   ---   ---   ---   ---
006   1.5   ---   ---   ---   ---   ---     006   1.2   ---   ---   ---   ---   ---
008   1.1   ---   ---   ---   ---   ---     008   0.9   ---   ---   ---   ---   ---
010   ---   ---   ---   ---   ---   ---     010   ---   ---   ---   ---   ---   ---
```

```
64B                                          IMIX
CSC   001   002   004   006   008   010     CSC   001   002   004   006   008   010
001   5.6   3.3   1.9   1.3   1.0   ---     001   3.7   1.8   0.9   0.6   0.5   ---
002   5.1   2.9   1.5   ---   ---   ---     002   3.1   1.6   0.8   ---   ---   ---
004   4.9   2.7   ---   ---   ---   ---     004   3.0   1.4   ---   ---   ---   ---
006   4.8   ---   ---   ---   ---   ---     006   2.9   ---   ---   ---   ---   ---
008   4.7   ---   ---   ---   ---   ---     008   2.8   ---   ---   ---   ---   ---
010   ---   ---   ---   ---   ---   ---     010   ---   ---   ---   ---   ---   ---
```

```
64B                                          IMIX
CSP   001   002   004   006   008   010     CSP   001   002   004   006   008   010
001   5.6   5.7   5.6   5.7   5.7   ---     001   3.8   3.6   3.3   3.1   3.0   ---
002   5.1   4.8   4.9   ---   ---   ---     002   3.1   3.0   2.8   ---   ---   ---
004   4.9   4.8   ---   ---   ---   ---     004   3.0   2.8   ---   ---   ---   ---
006   4.8   ---   ---   ---   ---   ---     006   2.9   ---   ---   ---   ---   ---
008   4.7   ---   ---   ---   ---   ---     008   2.8   ---   ---   ---   ---   ---
010   ---   ---   ---   ---   ---   ---     010   ---   ---   ---   ---   ---   ---
```

### FD.io CSIT 2n-skx, pcdr4sw = (2:1)

```
Testbed: t22

Row:    1..10 number of network service instances
Column: 1..10 number of network functions (VNF or CNF) per service instance
Value:  x.y   MRR throughput in [Mpps]
        x.y*  `*` indicates many retries due to failing nfvbench warm-up phase
              used to verify service forwarding path
        ???   to be measured
        ---   configuration impossible for specific skx processor model,
              out of physical cores

Ring sizes: VNF vring_size = 256 (old qemu), CNF memif_ring_size = 1024

Cores Numa0: pcdr4sw = (2:1), pcmr4sw = (1:1)
             pcdr4nf = (1:1), pcmr4nf = (1:2)
Cores Numa1: not used
```

```
64B                                          IMIX
VSC   001   002   004   006   008   010     VSC   001   002   004   006   008   010
001   6.9*  2.6   3.3   2.4   1.8   ???     001   4.0*  1.5   1.6   1.2   0.9   ???
002   6.1   2.5   0.5   0.2   0.1   ---     002   3.8   1.5   0.3   0.1   0.1   ---
004   4.3   1.0   0.2   ---   ---   ---     004   3.3   0.7   0.2   ---   ---   ---
006   3.0   ???*  ---   ---   ---   ---     006   2.4   ???*  ---   ---   ---   ---
008   2.3   ???*  ---   ---   ---   ---     008   1.9   ???*  ---   ---   ---   ---
010   ???   ---   ---   ---   ---   ---     010   ???   ---   ---   ---   ---   ---
```

```
64B                                          IMIX
CSC   001   002   004   006   008   010     CSC   001   002   004   006   008   010
001   7.0*  6.0   3.7   2.6   2.1   1.7     001   5.1*  4.0   1.8   1.3   1.0   0.8
002   11.8  6.7   4.0   2.8   2.2   ---     002   7.4   3.5   2.0   1.3   1.0   ---
004   10.7  6.8   3.9   ---   ---   ---     004   6.8   3.7   1.9   ---   ---   ---
006   10.4  6.6   ---   ---   ---   ---     006   6.5   3.6   ---   ---   ---   ---
008   10.3  6.4   ---   ---   ---   ---     008   6.5   3.5   ---   ---   ---   ---
010   10.0  ---   ---   ---   ---   ---     010   6.3   ---   ---   ---   ---   ---
```

```
64B                                          IMIX
CSP   001   002   004   006   008   010     CSP   001   002   004   006   008   010
001   7.0*  6.9*  6.9*  6.9*  6.9*  6.9*    001   5.1*  5.0*  4.6*  4.2*  4.0*  3.7*
002   11.8  11.7  11.7  11.7  11.7  ---     002   7.4   7.2   6.8   6.4   6.1   ---
004   10.7  10.7  10.5  ---   ---   ---     004   6.8   6.4   5.9   ---   ---   ---
006   10.4  10.3  ---   ---   ---   ---     006   6.5   6.1   ---   ---   ---   ---
008   10.3  10.1  ---   ---   ---   ---     008   6.5   5.9   ---   ---   ---   ---
010   10.0  ---   ---   ---   ---   ---     010   6.3   ---   ---   ---   ---   ---
```

### Equinix Metal 2n-skx, pcdr4sw = (2:1)

```
Testbed: tg-quad01, sut-quad02-sut

Row:    1..10 number of network service instances
Column: 1..10 number of network functions (VNF or CNF) per service instance
Value:  x.y   MRR throughput in [Mpps]
        x.y*  `*` indicates many retries due to failing nfvbench warm-up phase
              used to verify service forwarding path
        ???   to be measured
        ---   configuration impossible for specific skx processor model,
              out of physical cores

Ring sizes: VNF vring_size = 256 (old qemu), CNF memif_ring_size = 1024

Cores Numa0: pcdr4sw = (2:1), pcmr4sw = (1:1)
             pcdr4nf = (1:1), pcmr4nf = (1:2)
Cores Numa1: not used
```

```
64B                                          IMIX
VSC   001   002   004   006   008   010     VSC   001   002   004   006   008   010
001   6.3*  5.0   3.0   2.1   ---   ---     001   3.8*  2.4   1.4   1.0   ---   ---
002   5.5   2.1   ---   ---   ---   ---     002   3.3   1.3   ---   ---   ---   ---
004   4.0   ---   ---   ---   ---   ---     004   1.7   ---   ---   ---   ---   ---
006   2.8   ---   ---   ---   ---   ---     006   1.2   ---   ---   ---   ---   ---
008   ---   ---   ---   ---   ---   ---     008   ---   ---   ---   ---   ---   ---
010   ---   ---   ---   ---   ---   ---     010   ---   ---   ---   ---   ---   ---
```

```
64B                                          IMIX
CSC   001   002   004   006   008   010     CSC   001   002   004   006   008   010
001   6.0*  5.3   3.2   2.3   ---   ---     001   5.1*  3.0   1.5   1.1   ---   ---
002   10.4  6.0   ---   ---   ---   ---     002   6.0   2.9   ---   ---   ---   ---
004   9.5   ---   ---   ---   ---   ---     004   5.7   ---   ---   ---   ---   ---
006   9.2   ---   ---   ---   ---   ---     006   5.5   ---   ---   ---   ---   ---
008   ---   ---   ---   ---   ---   ---     008   ---   ---   ---   ---   ---   ---
010   ---   ---   ---   ---   ---   ---     010   ---   ---   ---   ---   ---   ---
```

```
64B                                          IMIX
CSP   001   002   004   006   008   010     CSP   001   002   004   006   008   010
001   6.2*  6.1*  6.1*  6.1*  ---   ---     001   5.1*  4.9*  4.2*  3.7*  ---   ---
002   10.4  10.3  ---   ---   ---   ---     002   6.0   5.7   ---   ---   ---   ---
004   9.5   ---   ---   ---   ---   ---     004   5.7   ---   ---   ---   ---   ---
006   9.2   ---   ---   ---   ---   ---     006   5.5   ---   ---   ---   ---   ---
008   ---   ---   ---   ---   ---   ---     008   ---   ---   ---   ---   ---   ---
010   ---   ---   ---   ---   ---   ---     010   ---   ---   ---   ---   ---   ---
```

## Reading nfvbench logs

Throughput results generated by nfvbench are stored in the following directories:

1. pcdr4sw = (1:1)
   * ```cnfs/comparison/baseline_nf_performance-csit/results/2t1c_novlan```
2. pcdr4sw = (2:1)
   * ```cnfs/comparison/baseline_nf_performance-csit/results/4t2c_novlan```

Pretty one-liner printouts per test can be obtained using the ```jq``` JSON parser and the following commands, run within the above results directories:

```
jq -r '.benchmarks.network.service_chain.EXT.result.result."64".run_config."direction-total".rx | "64B \(.rate_pps)pps (\(.rate_bps)bps) " + input_filename' *pps*.json
```

```
jq -r '.benchmarks.network.service_chain.EXT.result.result.IMIX.run_config."direction-total".rx | "IMIX \(.rate_pps)pps (\(.rate_bps)bps) " + input_filename' *pps*.json
```
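
If a script is preferred over jq, the sketch below pulls out the same fields. It is illustrative only and assumes the JSON layout implied by the jq paths above.

```python
import glob
import json

# Walks the same JSON path as the jq one-liners above:
# .benchmarks.network.service_chain.EXT.result.result.<size>.run_config."direction-total".rx
for path in sorted(glob.glob("*pps*.json")):
    with open(path) as handle:
        data = json.load(handle)
    result = data["benchmarks"]["network"]["service_chain"]["EXT"]["result"]["result"]
    for size, label in (("64", "64B"), ("IMIX", "IMIX")):
        rx = result[size]["run_config"]["direction-total"]["rx"]
        print(f"{label} {rx['rate_pps']}pps ({rx['rate_bps']}bps) {path}")
```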