device: NVIDIA GB300
shape: experts=8, tokens/expert=4096, M=32768, K=8192, N=4096
dtype: torch.bfloat16
torch: 2.12.0a0+5aff3928d8.nv26.05
transformer_engine: 2.15.0+42b84005
nvidia-cudnn-frontend: 1.23.0
nvidia-cutlass-dsl: 4.4.1

summary:
activation  cudnn_ms  te_ms  te/cudnn
swiglu         1.319  1.695     1.28x
dswiglu        0.740  1.549     2.09x
srelu          0.675  0.873     1.29x
dsrelu         0.686  1.163     1.70x
geglu          1.317  1.631     1.24x
dgeglu         0.751  1.666     2.22x
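
The derived values in this output can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the benchmark itself. It assumes that M is experts times tokens/expert and that the te/cudnn column is te_ms divided by cudnn_ms. The names `rows`, `ratios`, and `tokens_per_expert` are hypothetical, introduced only for this check. Because the printed timings are rounded to three decimals, a recomputed ratio may differ from the reported one in the last digit.

```python
# Timings copied from the summary table above, in milliseconds:
# activation -> (cudnn_ms, te_ms, reported te/cudnn ratio)
rows = {
    "swiglu":  (1.319, 1.695, 1.28),
    "dswiglu": (0.740, 1.549, 2.09),
    "srelu":   (0.675, 0.873, 1.29),
    "dsrelu":  (0.686, 1.163, 1.70),
    "geglu":   (1.317, 1.631, 1.24),
    "dgeglu":  (0.751, 1.666, 2.22),
}

# Assumption: M is the total token count across all experts.
experts, tokens_per_expert = 8, 4096
M = experts * tokens_per_expert

# Assumption: the te/cudnn column is te_ms divided by cudnn_ms.
ratios = {name: te / cudnn for name, (cudnn, te, _) in rows.items()}

print(f"M = {M}")
for name, (cudnn, te, reported) in rows.items():
    print(f"{name:<8} recomputed {ratios[name]:.3f}x  reported {reported:.2f}x")
```

Under these assumptions the recomputed ratios agree with the reported column to within rounding, and the largest gaps are on the backward activations (dswiglu, dsrelu, dgeglu).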