# Axiom Benchmark Results *Generated: 2026-02-21 16:02* *Platform: Darwin arm64* ## Contents - [Summary](#summary) - [Matrix Multiplication](#matrix-multiplication) - [Element-wise Operations](#element-wise-operations) - [Unary Operations](#unary-operations) - [Linear Algebra](#linear-algebra) - [FFT Operations](#fft-operations) - [Fusion Patterns](#fusion-patterns) - [Test Environment](#test-environment) --- ## Summary Comprehensive performance comparison across tensor operations. ![Comprehensive Summary](../benchmarks/results/plots/comprehensive_summary.png) --- ## Matrix Multiplication Performance comparison for square matrix multiplication (GFLOPS, higher is better). | Size | Axiom | Eigen3 | PyTorch | NumPy | Armadillo | |---:|---:|---:|---:|---:|---:| | 32×32 | 55.2 | 88.4 | 62.2 | 56.4 | 98.3 | | 64×64 | 371 | 440 | 336 | 332 | 42.1 | | 128×128 | 923 | 954 | 930 | 951 | 44.7 | | 256×256 | 1,421 | 1,251 | 1,412 | 1,488 | 162 | | 512×512 | 2,389 | 2,345 | 1,310 | 2,358 | 434 | | 1024×1024 | 2,820 | 2,445 | 2,423 | 2,299 | 524 | | 2048×2048 | 3,218 | 2,982 | 2,801 | 2,795 | 608 | | 4096×4096 | 3,087 | 2,961 | 2,961 | 2,959 | 754 | ### Performance Comparison ![Matmul Comparison](../benchmarks/results/plots/matmul_comparison.png) ### Scaling Analysis ![Matmul Scaling](../benchmarks/results/plots/matmul_scaling.png) --- ## Element-wise Operations Binary element-wise operations (add, sub, mul, div) measured in GB/s throughput. *Results at 4096×4096 (GB/s)* | Operation | Axiom | Eigen3 | PyTorch | NumPy | |:---|---:|---:|---:|---:| | add | 92.4 | 121 | 94.2 | 40.0 | | sub | 112 | 119 | 90.4 | 40.5 | | mul | 117 | 119 | 96.1 | 42.9 | | div | 99.1 | 120 | 95.4 | 41.2 | ### Performance by Operation ![Elementwise Comparison](../benchmarks/results/plots/elementwise_comparison.png) ### Bar Chart Comparison ![Elementwise Bar](../benchmarks/results/plots/elementwise_bar_4096.png) --- ## Unary Operations Unary operations (exp, log, sqrt, sin, cos, tanh, abs, neg, relu, sigmoid) measured in GB/s. *Results at 4096×4096 (GB/s)* | Operation | Axiom | Eigen3 | PyTorch | NumPy | |:---|---:|---:|---:|---:| | exp | 23.0 | 16.4 | 50.7 | 5.69 | | log | 17.0 | 13.1 | 33.7 | 4.93 | | sqrt | 39.0 | 66.3 | 73.2 | 29.7 | | sin | 26.0 | 11.0 | 39.1 | 6.73 | | cos | 25.2 | 11.0 | 33.7 | 6.46 | | tanh | 14.3 | 21.5 | 21.0 | 9.43 | | abs | 104 | 118 | 75.1 | 27.3 | | neg | 109 | 66.1 | 75.4 | 32.7 | | relu | 121 | 122 | 74.2 | 18.5 | | sigmoid | 14.3 | 15.9 | 47.5 | 3.71 | ### Performance by Operation ![Unary Comparison](../benchmarks/results/plots/unary_comparison.png) ### Bar Chart Comparison ![Unary Bar](../benchmarks/results/plots/unary_bar_4096.png) --- ## Linear Algebra Linear algebra operations (SVD, QR, solve, Cholesky, eigendecomposition, inverse, determinant). Measured in milliseconds (lower is better). *Results at 512×512 (time_ms)* | Operation | Axiom | Eigen3 | PyTorch | NumPy | |:---|---:|---:|---:|---:| | svd | 18.2 | 2,200 | 16.2 | 25.9 | | qr | 4.72 | 1.50 | 4.06 | 7.89 | | solve | 0.96 | 2.18 | 0.47 | 1.20 | | cholesky | 0.69 | 0.24 | 0.22 | 1.43 | | eig | 151 | 22.5 | 9.50 | 15.4 | | inv | 1.55 | 2.34 | 1.06 | 3.83 | | det | 1.01 | 1.46 | 0.57 | 1.69 | ### Performance by Operation ![Linalg Comparison](../benchmarks/results/plots/linalg_comparison.png) ### Bar Chart Comparison ![Linalg Bar](../benchmarks/results/plots/linalg_bar_512.png) --- ## FFT Operations Fast Fourier Transform operations (fft, ifft, rfft, fft2, ifft2, rfft2). Measured in milliseconds (lower is better). *Results at 2048×2048 (time_ms)* | Operation | Axiom | PyTorch | NumPy | |:---|---:|---:|---:| | fft | 0.00 | 0.01 | 0.01 | | ifft | 0.00 | 0.01 | 0.01 | | rfft | 0.00 | 0.01 | 0.01 | | fft2 | 14.3 | 27.2 | 60.6 | | ifft2 | 14.3 | 27.6 | 29.6 | | rfft2 | 10.0 | 7.82 | 22.8 | ### Performance by Operation ![FFT Comparison](../benchmarks/results/plots/fft_comparison.png) ### Bar Chart Comparison ![FFT Bar](../benchmarks/results/plots/fft_bar_2048.png) --- ## Fusion Patterns Lazy evaluation with operation fusion vs eager mode execution. *Run `make benchmark-fusion` to generate fusion data.* --- ## Test Environment ``` OS: Darwin 25.3.0 Architecture: arm64 Python: 3.12.7 Timestamp: 2026-02-21T15:56:58.080886 ``` ## Notes - All benchmarks run on CPU - Axiom uses Accelerate framework (BLAS) on macOS - Higher GFLOPS/GB/s = better for throughput metrics - Lower ms = better for time metrics - Results may vary based on system load and thermal conditions