#1minPapers Francois Chollet: use LLMs for tree search instead of next-token prediction
Not a paper, but 90 minutes of Chollet is always worth watching! The ARC Challenge is fascinating because it drives rapid adaptation and evolution: the emergence of new model species.
- Test-time compute increased ARC accuracy from roughly 10% to 50–60%, but test-time compute is only possible when the task comes as quantifiable input-output pairs, as ARC tasks do.
- Finetuning can be done autonomously at test time by training on a task's demonstration pairs.
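A minimal sketch of what such autonomous test-time finetuning could look like, assuming a toy per-cell classifier (`TinyGridModel`) in PyTorch; the model and hyperparameters are illustrative, not from the talk:

```python
# Hedged sketch of autonomous test-time finetuning: briefly train on the
# task's own demonstration pairs before predicting the test grid.
# TinyGridModel and all hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_COLORS = 10  # ARC grids use 10 colors

class TinyGridModel(nn.Module):
    """Toy per-cell color classifier standing in for a real ARC model."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(NUM_COLORS, hidden, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, NUM_COLORS, 3, padding=1),
        )

    def forward(self, grid):  # grid: (B, H, W) integer color indices
        x = F.one_hot(grid, NUM_COLORS).permute(0, 3, 1, 2).float()
        return self.net(x)    # (B, NUM_COLORS, H, W) logits

def test_time_finetune(model, demo_pairs, steps=50, lr=1e-3):
    """Adapt the model to one task using only its demonstration pairs."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        for inp, out in demo_pairs:          # each is an (H, W) int tensor
            loss = loss_fn(model(inp.unsqueeze(0)), out.unsqueeze(0))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model

# Usage: finetune on the demos, then predict the held-out test input.
demos = [(torch.randint(0, NUM_COLORS, (5, 5)), torch.randint(0, NUM_COLORS, (5, 5)))]
model = test_time_finetune(TinyGridModel(), demos)
prediction = model(demos[0][0].unsqueeze(0)).argmax(dim=1)
```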
- In the 2020 ARC Kaggle competition, the highest score was 20%, achieved via brute force. But combining all of the submissions reached 49% (humans would get to ~99% accuracy), because half of the private test set was brute-force-able, which means the benchmark was flawed (insufficient task diversity and complexity). You need to co-evolve the problem with the solution.
- If an input is continuous, discrete symbolic programs may not be a good structure for these kinds of pattern-recognition problems; vector-based programs (neural networks) may be better suited to them.
- Induction is formally verifiable. Transduction is guessing what the answer might be, without a way to verify whether it's the right guess: all the wrong answers are wrong for different reasons, but the right answer is right for the same reason, so transduction requires more sampling. Better to start with induction and fall back to transduction if induction doesn't work.
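A hedged sketch of that induction-first, transduction-fallback strategy; `propose_programs` and `guess_outputs` are hypothetical stand-ins for an LLM or neural sampler, not anything from the talk:

```python
# Induction first: accept a program only if it reproduces every demo pair.
# Transduction fallback: sample direct guesses and keep the most common one.
from collections import Counter
from typing import Callable, List, Tuple

Grid = Tuple[Tuple[int, ...], ...]
Program = Callable[[Grid], Grid]

def verified(program: Program, demos: List[Tuple[Grid, Grid]]) -> bool:
    """Induction is verifiable: the program must reproduce every demo pair."""
    try:
        return all(program(inp) == out for inp, out in demos)
    except Exception:
        return False

def solve(demos, test_input, propose_programs, guess_outputs):
    # 1) Induction: look for a program that explains all demonstrations.
    for program in propose_programs(demos):
        if verified(program, demos):
            return program(test_input)
    # 2) Transduction fallback: wrong guesses disagree with each other,
    #    while the right one keeps repeating, so take the mode.
    guesses = [tuple(map(tuple, g)) for g in guess_outputs(demos, test_input)]
    return Counter(guesses).most_common(1)[0][0]
```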
- If you look at the problem from different angles, you are more likely to recover the true shape of the problem. This is especially true for neural networks, which tend to latch onto noise and irregularities. Different angles also act as a regularization mechanism, where the noise from different angles cancels out.
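One concrete way to read "different angles" for ARC-style grids is augmentation plus voting; this is a sketch under that assumption, with `solver` as an assumed black box:

```python
# Run an assumed black-box `solver` on rotated/flipped views of the task,
# map each prediction back to the original frame, and vote.
from collections import Counter
import numpy as np

def dihedral_views():
    """Yield (transform, inverse) pairs: 4 rotations x optional flip."""
    for k in range(4):
        for flip in (False, True):
            fwd = lambda g, k=k, flip=flip: np.rot90(np.fliplr(g) if flip else g, k)
            inv = lambda g, k=k, flip=flip: np.fliplr(np.rot90(g, -k)) if flip else np.rot90(g, -k)
            yield fwd, inv

def predict_with_voting(solver, demos, test_input):
    votes = Counter()
    for fwd, inv in dihedral_views():
        aug_demos = [(fwd(np.asarray(i)), fwd(np.asarray(o))) for i, o in demos]
        pred = solver(aug_demos, fwd(np.asarray(test_input)))
        key = tuple(map(tuple, inv(np.asarray(pred)).tolist()))
        votes[key] += 1          # noise picked up in different views cancels here
    return np.array(votes.most_common(1)[0][0])
```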
- Using a VAE yields a much more structured, smoother latent space, which is key to making test-time gradient descent work.
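A sketch of test-time gradient descent in such a latent space, assuming a `decoder` pretrained elsewhere (e.g. as the decoder of a VAE over tasks, not shown) that maps a latent code plus an input grid to output logits; names and shapes are illustrative:

```python
# Freeze the decoder and optimize only the latent code to fit the
# demonstration pairs; a smooth latent space is what makes this descent
# land somewhere useful.
import torch

def fit_latent(decoder, demos, latent_dim=32, steps=200, lr=0.05):
    z = torch.zeros(latent_dim, requires_grad=True)   # start at the prior mean
    opt = torch.optim.Adam([z], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        loss = sum(loss_fn(decoder(z, inp), out) for inp, out in demos)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()   # afterwards, predict with decoder(z, test_input)
```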
- Chollet would solve ARC via deep-learning-guided program synthesis: not using LLMs for next-token generation, but as a guide over a graph of operators. Program synthesis is a tree-search process; use LLMs to guide that tree search.
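A minimal sketch of that idea: best-first tree search over a tiny, illustrative DSL of grid operators, with `score_candidates` standing in for the LLM that proposes and scores the next operator to try. None of the names come from the talk:

```python
# Program synthesis as LLM-guided tree search over a small operator DSL.
import heapq
import numpy as np

DSL = {
    "rot90":     lambda g: np.rot90(g),
    "flip_h":    lambda g: np.fliplr(g),
    "flip_v":    lambda g: np.flipud(g),
    "transpose": lambda g: g.T,
}

def run(program, grid):
    for op in program:
        grid = DSL[op](grid)
    return grid

def synthesize(demos, score_candidates, max_depth=4, beam=16):
    """Best-first search over operator sequences, guided by a scorer."""
    frontier = [(0.0, [])]                       # (cost so far, partial program)
    while frontier:
        cost, prog = heapq.heappop(frontier)
        if all(np.array_equal(run(prog, np.asarray(i)), np.asarray(o)) for i, o in demos):
            return prog                          # verified against every demo pair
        if len(prog) >= max_depth:
            continue
        for op, op_score in score_candidates(demos, prog):
            heapq.heappush(frontier, (cost + 1.0 - op_score, prog + [op]))
        frontier = heapq.nsmallest(beam, frontier)   # keep only promising branches
        heapq.heapify(frontier)
    return None

# A uniform scorer (no LLM guidance) degrades this to blind enumeration:
uniform = lambda demos, prog: [(op, 0.5) for op in DSL]
```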
- Humans solve ARC by first describing the objects, their contents, properties, and causal relationships, then using that description to constrain the search space, potentially even eliminating the need for search.
- Turing-complete language (Python) vs. a DSL? The language must be able to learn, such that upon seeing a similar problem it can save compute. It also needs to support writing higher-level functions.
- The fundamental cognitive unit in our brain is fuzzy pattern recognition. System 2 planning is applying that intuition in a structured form, which is deep-learning program synthesis: iteratively guessing, with guardrails, to construct a symbolic artifact. Without guardrails it's dreaming: continuously intuiting without consistency with the past. Consistency requires back-and-forth loops that bring the past into the present.
- Some recombination patterns of the building blocks occur more often in certain contexts. Extract these in reusable form (a higher-level abstraction fitted to the problem) and add them back to the building blocks, so that next time you solve the problem in fewer steps.
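A toy sketch of that extraction step, reusing the illustrative DSL from the tree-search sketch above: mine frequent operator bigrams from previously solved programs and register each as a single higher-level op. The mining rule and naming scheme are assumptions:

```python
# Turn recurring operator pairs into new single-step building blocks,
# so later searches reach solutions in fewer steps.
from collections import Counter

def extract_abstractions(solved_programs, dsl, min_count=3):
    bigrams = Counter(
        tuple(prog[i:i + 2])
        for prog in solved_programs
        for i in range(len(prog) - 1)
    )
    for (a, b), count in bigrams.items():
        if count >= min_count:
            name = f"{a}__then__{b}"
            dsl[name] = lambda g, a=a, b=b: dsl[b](dsl[a](g))   # compose into one op
    return dsl
```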
- Speculating on how o1 works: a search process in the space of possible chains of thought. By backtracking and evaluating which branches work better, it ends up with a natural-language program describing what the model should be doing, adapting to novelty. It's clearly doing search in chain-of-thought space at test time: the telltale sign is that compute and latency increase.
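Since the bullet above is already speculation, this is doubly a sketch: a plain beam search over chains of thought, where pruning weak branches plays the role of backtracking. The LLM-backed `sample_next_steps` and `score_chain` are assumptions, not o1 internals:

```python
# Beam search in chain-of-thought space: extend partial chains, score
# them, drop weak branches, and return the surviving chain, which reads
# like a natural-language program for the task.
def search_chain_of_thought(question, sample_next_steps, score_chain,
                            max_steps=8, beam=4):
    chains = [[]]                                   # partial chains of thought
    for _ in range(max_steps):
        candidates = [chain + [step]
                      for chain in chains
                      for step in sample_next_steps(question, chain)]
        # Prune: abandon low-scoring branches, keep extending the strongest.
        chains = sorted(candidates, key=lambda c: score_chain(question, c),
                        reverse=True)[:beam]
    return chains[0]
```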
- Full interview here: https://www.youtube.com/watch?v=w9WE1aOPjHc