#deep-learning

Articles tagged with #deep-learning

RL in the Pre-train Space: Why Training on P(y) Beats Training on P(y|x)
A new paper shows that reinforcement learning directly on the marginal distribution unlocks reasoning capabilities that standard RLVR can never reach.
Apr 16, 2026 · 7 min read
Tucker Attention: GQA, MLA, and MHA Were the Same Thing All Along
All major attention variants are special cases of one tensor decomposition, achieving 10x parameter reduction with zero performance loss.
Apr 2, 2026 · 7 min read
The Compression Wars: Why Making AI Smaller Is Now Harder Than Making It Bigger
Google's TurboQuant, Apple's Gemini distillation, and a new knowledge transfer method converge on the same message: the race to make AI bigger is over.
Mar 28, 2026 · 8 min read