RL in the Pre-train Space: Why Training on P(y) Beats Training on P(y|x)
A new paper shows that reinforcement learning directly on the marginal distribution unlocks reasoning capabilities that standard RLVR can never reach.
Articles tagged with #artificial-intelligence
A Meta paper reveals that the overfitting wall everyone accepted was actually evaluation noise, and that the real ceiling is much further out.
Here's the deal we thought we had with chain-of-thought prompting: let the model show its work, and we can watch the reasoning unfold. If something goes wrong, we'd see it in the chain. CoT was our audit trail, our interpretability shortcut, our free...
All major attention variants are special cases of a single tensor decomposition, achieving a 10x parameter reduction with zero performance loss.
Cross-model disagreement is a training-free, label-free signal that catches confident errors your model's own uncertainty metrics will miss every time.
Google's TurboQuant, Apple's Gemini distillation, and a new knowledge transfer method converge on the same message: the race to make AI bigger is over.