device: NVIDIA GB200
shape: experts=8, tokens/expert=4096, M=32768, K=8192, N=4096
dtype: torch.bfloat16
torch: 2.12.0a0+5aff3928d8.nv26.05
transformer_engine: 2.15.0+42b84005
nvidia-cudnn-frontend: 1.23.0
nvidia-cutlass-dsl: 4.4.1

summary:
activation  cudnn_ms  te_ms  te/cudnn
swiglu      1.550     1.934  1.25x
dswiglu     0.870     1.734  1.99x
srelu       0.739     1.041  1.41x
dsrelu      0.781     1.341  1.72x
geglu       1.544     1.890  1.22x
dgeglu      0.889     1.820  2.05x
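The te/cudnn column is the Transformer Engine time divided by the cuDNN time, so a value above 1.00x means the cuDNN path finished faster. M is the product of experts and tokens/expert (8 x 4096 = 32768). The short Python sketch below recomputes both from the logged numbers as a consistency check. The names TIMINGS, REPORTED and ratio are illustrative helpers, not part of either library.

```python
# Hypothetical consistency check for the benchmark summary above.
# Timings are copied verbatim from the log: (cudnn_ms, te_ms).
TIMINGS = {
    "swiglu": (1.550, 1.934),
    "dswiglu": (0.870, 1.734),
    "srelu": (0.739, 1.041),
    "dsrelu": (0.781, 1.341),
    "geglu": (1.544, 1.890),
    "dgeglu": (0.889, 1.820),
}

# te/cudnn ratios as printed in the log.
REPORTED = {
    "swiglu": 1.25,
    "dswiglu": 1.99,
    "srelu": 1.41,
    "dsrelu": 1.72,
    "geglu": 1.22,
    "dgeglu": 2.05,
}

EXPERTS = 8
TOKENS_PER_EXPERT = 4096
# M is the total token count across all experts.
M = EXPERTS * TOKENS_PER_EXPERT


def ratio(cudnn_ms: float, te_ms: float) -> float:
    """TE time over cuDNN time; values above 1.0 mean cuDNN is faster."""
    return te_ms / cudnn_ms


if __name__ == "__main__":
    print(f"M = {M}")
    for name, (cudnn_ms, te_ms) in TIMINGS.items():
        r = ratio(cudnn_ms, te_ms)
        print(f"{name:8s} {r:.2f}x (reported {REPORTED[name]:.2f}x)")
```

Every recomputed ratio, rounded to two decimals, matches the printed column, which shows that the column is a plain quotient of the two median-style timings and not a throughput figure.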