RL in the Pre-train Space: Why Training on P(y) Beats Training on P(y|x)
A new paper shows that reinforcement learning directly on the marginal distribution unlocks reasoning capabilities that standard RLVR can never reach.
Apr 16, 2026 · 7 min read
Articles tagged with #deep-learning
All major attention variants are special cases of one tensor decomposition, which yields a 10x parameter reduction with zero performance loss.
Google's TurboQuant, Apple's Gemini distillation, and a new knowledge transfer method converge on the same message: the race to make AI bigger is over.