Twenty years ago John Maddox published a celebrated News and Views article in Nature 1 in which he provocatively wrote: “One of the continuing scandals in the physical sciences is that it remains ...
RENT is an unsupervised method for training reasoning LLMs by minimizing entropy. We demonstrate on a variety of datasets and models that RENT improves model performance without using any ground truth ...