leloykun's Ponder
https://leloykun.github.io/ponder/
leloykun's Ponder blog. RSS 2.0 feed (spec: http://www.rssboard.org/rss-specification), generated by python-feedgen, language: en. Last build: Sun, 05 Apr 2026 03:18:00 +0000.

- Frequency Domain Muon for Convolutional Neural Networks: Simplified (06 Mar 2026)
  https://leloykun.github.io/ponder/freqmuon/
  The 'correct' way to apply Muon to CNNs.

- LUCID-MoE: Mixture of Experts with Preconditioned Routing (17 Feb 2026)
  https://leloykun.github.io/ponder/lucid-moe/
  Sharper Mixture-of-Experts routing with LUCID preconditioning.

- Error-Compensating Optimizers: ECO-AdamW, ECO-Muon, and Beyond (10 Feb 2026)
  https://leloykun.github.io/ponder/eco/
  Quantized training without full-precision master weights, extended to handle weight decay and matrix LMOs.

- Bidirectional-PRISM: Kronecker-Factored Optimization via Anisotropic Spectral Shaping (04 Feb 2026)
  https://leloykun.github.io/ponder/shampoo-prism/
  A novel optimizer that combines Shampoo-style preconditioning with PRISM's anisotropic spectral shaping to adaptively suppress noisy gradient directions while maximally descending under the spectral-norm trust-region constraint.

- Steepest Descent on Affine-Conic Representable Manifolds with Boundary via Dual Ascent (09 Jan 2026)
  https://leloykun.github.io/ponder/steepest-descent-affine-conic/
  Novel optimizers for maximally descending on the loss landscape while satisfying strict weight constraints.

- Steepest Descent on the Birkhoff Polytope Equipped with the Spectral Norm (04 Jan 2026)
  https://leloykun.github.io/ponder/steepest-descent-doubly-stochastic/
  We derive an optimizer that performs steepest descent on the Birkhoff polytope equipped with the spectral norm via dual ascent. We show that it yields larger effective weight updates than naive LMO-based optimizers.

- Sensitivity and Sharpness of Gated Linear Attention Mechanisms (02 Jan 2026)
  https://leloykun.github.io/ponder/lipschitz-gated-linear-attn/
  We derive sensitivity and sharpness bounds for Gated DeltaNet and Mamba 2, showing that they can be made 1-Lipschitz with appropriate parameter constraints.

- Convergence Bounds for Steepest Descent Under Arbitrary Norms (11 Dec 2025)
  https://leloykun.github.io/ponder/steepest-descent-convergence/
  First-order optimization under arbitrary norms with Nesterov momentum (and decoupled weight decay) yields a universal convergence bound. Our results generalize to norms not induced by inner products and also consider batch size.

- Critical Batch Size for Steepest Descent Under Arbitrary Norms (22 Nov 2025)
  https://leloykun.github.io/ponder/steepest-descent-crit-bz/
  First-order optimization under arbitrary norms with Nesterov momentum (and decoupled weight decay) yields universal critical-batch-size scaling laws. Under an additional local-LMO assumption, the same analysis also heuristically supports square-root learning-rate scaling with batch size.

- Rethinking Maximal Update Parametrization: Steepest Descent on Finsler-Structured (Matrix) Geometries via Dual Ascent (29 Oct 2025)
  https://leloykun.github.io/ponder/steepest-descent-finsler-dual-ascent/
  To guarantee fast and robust model training, we can recast the optimization problem as steepest descent on Finsler-structured geometries. Here we show how to compute the optimal updates via dual ascent.

- Rethinking Maximal Update Parametrization: Steepest Descent on the Spectral Ball (15 Oct 2025)
  https://leloykun.github.io/ponder/rethinking-mup-spectral-ball/
  Novel optimizers for maximally updating both the weights and activations of neural networks while keeping weight norms under control. To get there, we needed to invent an efficient, GPU/TPU-friendly method for eigenvalue clipping and to solve the steepest-descent problem on the positive semidefinite cone, the convex spectrahedron, and finally the spectral ball.

- Factorization-free Eigenvalue Clipping and Steepest Descent on the Positive Semidefinite Cone, Convex Spectrahedron, and Spectral Ball (11 Oct 2025)
  https://leloykun.github.io/ponder/eigenvalue-clipping/
  Novel optimizers for maximally updating both the weights and activations of neural networks while keeping weight norms under control. To get there, we needed to invent a cheap, GPU/TPU-friendly method for eigenvalue clipping and steepest descent on the positive semidefinite cone, the convex spectrahedron, and finally the spectral ball.

- Steepest Descent on Finsler-Structured (Matrix) Manifolds (20 Aug 2025)
  https://leloykun.github.io/ponder/steepest-descent-finsler/
  Fast and robust model training.

- Heuristic Solutions for Steepest Descent on the Stiefel Manifold (18 Jul 2025)
  https://leloykun.github.io/ponder/steepest-descent-stiefel/
  What would Muon look like if we constrained the weights to be semi-orthogonal?

- Sensitivity and Sharpness of n-Simplicial Attention (06 Jul 2025)
  https://leloykun.github.io/ponder/lipschitz-n-simplical-transformer/
  Towards a maximal update parameterization of n-simplicial attention.

- Adam with Aggressive Gradient Clipping ≈ Smoothed SignSGD/NormSGD (03 Jul 2025)
  https://leloykun.github.io/ponder/adam-aggressive-clipping/
  Why does Adam with aggressive gradient value/norm clipping have sparse updates and do well with higher learning rates? Here we show that it asymptotically matches Smoothed SignSGD in the value-clipping limit and tracks a rescaled Smoothed NormSGD direction in the norm-clipping limit.

- Fast, Numerically Stable, and Auto-Differentiable Spectral Clipping via Newton-Schulz Iteration (23 Jun 2025)
  https://leloykun.github.io/ponder/spectral-clipping/
  A small step towards hardware-architecture-optimizer codesign in deep learning.

- Muon and a Selective Survey on Steepest Descent in Riemannian and Non-Riemannian Manifolds (03 Apr 2025)
  https://leloykun.github.io/ponder/steepest-descent-non-riemannian/
  Muon from first principles, what makes it different from other optimizers, and why it works so well.

- Napkin Math on Non-Euclidean Trust Region Optimization (24 Mar 2025)
  https://leloykun.github.io/ponder/napkin-math-trust-region-opt/
  A possible reason why Muon converges faster and does better at higher learning rates than Adam.

- Block Matrix Formulation of Linear Attention Mechanisms (16 Mar 2025)
  https://leloykun.github.io/ponder/blockmat-linear-attn/
  The block matrix formulation of linear attention mechanisms, multi-step online gradient descent at inference time, and chunk-wise parallelism.

- Steepest Descent Under Schatten-p Norms (27 Feb 2025)
  https://leloykun.github.io/ponder/steepest-descent-schatten-p/
  Why Muon still works despite not perfectly semi-orthogonalizing the gradients.

- Squeezing 1-2% Efficiency Gains Out of Muon by Optimizing the Newton-Schulz Coefficients (21 Feb 2025)
  https://leloykun.github.io/ponder/muon-opt-coeffs/
  Simply switching to Muon can already get you 2x efficiency gains. But you can squeeze out an extra 1-2% by optimizing the Newton-Schulz coefficients.

- CASPR Without Accumulation is Muon (13 Feb 2025)
  https://leloykun.github.io/ponder/caspr-wo-accum-is-muon/
  The CASPR optimizer, a variant of Shampoo, reduces to Muon when we remove the accumulation on the preconditioners.

- GRPO's Main Flaw (11 Feb 2025)
  https://leloykun.github.io/ponder/grpo-flaw/
  GRPO may not be the best choice for training reasoning models. Here's why.

- (Linear) Attention as Test-Time Regression (27 Jan 2025)
  https://leloykun.github.io/ponder/test-time-regression/
  A unifying framework for linear attention mechanisms as test-time regression, and how to parallelize training and inference.

- Deep Learning Optimizers as Steepest Descent in Normed Spaces (20 Oct 2024)
  https://leloykun.github.io/ponder/steepest-descent-opt/
  Instead of asking, 'Which optimizer should I use?' ask, 'In which space do my features live?'

- ChatGPT May Have Developed Seasonal Depression (16 Dec 2023)
  https://leloykun.github.io/ponder/chatgpt-seasonal-depression/
  Could ChatGPT's shorter responses be an indication of something more bizarre going on?

- The Human Mind May Be Universal (10 Dec 2023)
  https://leloykun.github.io/ponder/human-mind-universality/
  Years of experience in building artificial minds have led me to believe that these AIs may end up seeming more 'human' than we currently imagine them to be.

- Four Rules for Rulers (19 Jun 2022)
  https://leloykun.github.io/ponder/4rules4rulers/
  On how to gain and maintain power.

- Vaccine Search as a Computational Problem (06 Feb 2021)
  https://leloykun.github.io/ponder/vaccine-search-as-comp-prob/
  A thought dump on mRNA vaccines and the future of computational biology.

- The Accuracy-Fairness Dilemma in Machine Learning (24 Oct 2020)
  https://leloykun.github.io/ponder/ml-accuracy-fairness-dilemma/
  Machine learning models merely amplify our biases, not eliminate them.

- How to Master Machine Learning: 3 Tips to Get Started (12 Sep 2020)
  https://leloykun.github.io/ponder/how-to-master-machine-learning/
  Whether you're only here for the hype or genuinely interested in the field, you're in for a wild ride.