Yulu Gan

Hi, I am Yulu Gan, a second-year CS PhD student at MIT, working on AI and science, advised by Tomaso Poggio and Phillip Isola. I am interested in how nature's evolutionary processes can inspire better AI systems. [...]

Updates

Invited talk at David Bau's group.

Four Recent Papers (* indicates equal contribution)

All publications →
Preprint 2026

Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

Yulu Gan, Phillip Isola

Core finding: The neighborhood around pretrained weights already contains task-specific experts. In small models they are sparse and hard to find; in large models they are dense and easy to discover. This motivates a simple post-training algorithm we call RandOpt: sample N weight vectors near the pretrained weights, keep the top K by task performance, and take a majority vote over their predictions at inference time.
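The sample-then-vote recipe above can be sketched in a few lines. This is a toy illustration on a synthetic linear task, not the paper's implementation; the hyperparameters (N, K, sigma) and the linear scoring model are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary task: classify points by the sign of a linear score.
X = rng.normal(size=(200, 8))
y = (X @ rng.normal(size=8) > 0).astype(int)

w_pre = rng.normal(size=8)    # stand-in for pretrained weights
N, K, sigma = 64, 5, 0.5      # hypothetical hyperparameters

def accuracy(w):
    return ((X @ w > 0).astype(int) == y).mean()

# Sample N weight vectors in the neighborhood of the pretrained weights...
candidates = [w_pre + sigma * rng.normal(size=8) for _ in range(N)]
# ...keep the top K by task performance...
top_k = sorted(candidates, key=accuracy, reverse=True)[:K]

# ...and majority-vote their predictions at inference time.
votes = np.stack([(X @ w > 0).astype(int) for w in top_k])
pred = (votes.mean(axis=0) > 0.5).astype(int)

ensemble_acc = (pred == y).mean()
```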

Preprint 2025

Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning

Xin Qiu*, Yulu Gan*, Conor F. Hayes*, Qiyao Liang, Yinggan Xu, Roberto Dailey, Elliot Meyerson, Babak Hodjat, Risto Miikkulainen

Core finding: We propose a variant of Evolution Strategies (ES) to fine-tune LLMs. With a population of just 30, ES outperforms PPO and GRPO across the tested models, while requiring no gradients, no critic, and no reward model. ES is also more robust across base LLMs and less prone to reward hacking.
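The basic ES loop is gradient-free: perturb the parameters, score each perturbation, and move in the reward-weighted direction. A minimal sketch on a toy quadratic reward, standing in for an LLM reward signal; the rank normalization, population size of 30, and all hyperparameters here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for an LLM reward: higher as parameters approach a target.
target = rng.normal(size=16)
def reward(theta):
    return -np.sum((theta - target) ** 2)

theta = np.zeros(16)             # stand-in for model weights
pop, sigma, lr = 30, 0.1, 0.05   # hypothetical hyperparameters

for step in range(300):
    # Sample a population of Gaussian perturbations.
    eps = rng.normal(size=(pop, 16))
    rewards = np.array([reward(theta + sigma * e) for e in eps])
    # Rank-normalize rewards: robust to outliers in the reward signal.
    ranks = rewards.argsort().argsort()
    weights = ranks / (pop - 1) - 0.5
    # Gradient-free update: reward-weighted sum of perturbations.
    theta = theta + (lr / (pop * sigma)) * (weights @ eps)

final = reward(theta)
```

Only the scalar rewards are used, so the same loop applies whenever forward evaluation is possible but backpropagation is impractical.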

Preprint 2024

Self-Assembly of a Biologically Plausible Learning Circuit

Qianli Liao*, Liu Ziyin*, Yulu Gan*, Brian Cheung, Mark Harnett, Tomaso Poggio

Core finding: We propose a biologically plausible circuit for updating network weights that works as well as backpropagation on some image classification tasks, avoiding its biological implausibility. A key prediction is a surprising self-assembly property of the basic circuit, emerging from initial random connectivity and heterosynaptic plasticity rules, with verifiable predictions about cortical anatomy and physiology.

NeurIPS 2024

On the Power of Decision Trees in Auto-Regressive Language Modeling

Yulu Gan, Tomer Galanti, Tomaso Poggio, Eran Malach

Core finding: Auto-regressive decision trees (ARDTs) can express strong sequential reasoning, including simulations of automata, Turing machines, and sparse circuits via chain-of-thought computation. Empirically, a 0.3M-parameter tree ensemble outperforms a 1M Transformer on TinyStories, and tree ensembles on GPT-2 embeddings rival InstructGPT and PaLM-540B on Big-Bench-Hard.

To be updated.

Reading & Talks

See all →

Interesting Classes I Have Taken

6.5930/1 Hardware Architecture for Deep Learning
6.8300 Advances in Computer Vision
6.7960 Deep Learning
6.8610 Quantitative Methods for NLP
6.S184 Generative AI with Stochastic Differential Equations
6.262 Discrete Stochastic Processes