DeepSeek-R1: pure reinforcement learning (RL), no supervised fine-tuning (SFT), no chain-of-thought…
#1minPapers (Jan 21)

Takeaways from JPM Healthcare Conf 2025 #JPM2025
This week was my eighth JPM Healthcare Conference. I’ve been to five pre-pandemic, courtesy of being on the buy-side and having been a JPM… (Jan 17)

#1minPapers MSFT’s rStar-Math small language model self-improves and generates its own training data
This is the second time in recent months that a small model performed as well as (or better than) the billion-parameter large models… (Jan 12)

#1minPapers Francois Chollet: use LLMs for tree-search instead of next-token prediction
Not a paper, but 90 min of Chollet is always worth watching! The ARC Challenge is fascinating because it’s a rapid adaptation, evolution… (Jan 10)

#1minPapers “Fourier Analysis Networks” — Yihong Dong et al.
Multi-layer Perceptrons (MLPs) are the backbones of LLMs, but they aren’t efficient at modeling periodicity (e.g. rhythmic bass in music)… (Jan 8)

#1minPapers Ability to leverage the tools increases with model params — “Toolformer: Language Models…
Yesterday’s #1minPapers noted that model role-play/deception is a problem if models have access to tools. So of course today we’ll dig into… (Jan 6)

#1minPapers “Role-Play with Large Language Models” — Shanahan et al.
The scare a few weeks ago that o1 was able to duplicate its weights via deception got me very interested in how LLMs can role-play. Dug up… (Jan 5)

#1minPapers “Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement…
Got something spicy today: a team at Fudan University in China attempted to reproduce o1. This paper was published two weeks ago, and is full of… (Jan 3)

#1minPapers “Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking” —…
I’m fascinated by reasoning: it allows a model to decompose a challenging computation into smaller steps. This Quiet-STaR model… (Jan 2)

#1minPapers “Critique-out-loud Reward Models” — by Zachary Ankner, Mansheej Paul, Brandon Cui…
I’m down the rabbit hole of optimized reward models. This paper is still in preprint. (Dec 31, 2024)