Hi, I am Yulu Gan, a second-year CS PhD student at MIT working on AI and science, advised by Tomaso Poggio and Phillip Isola. I am interested in how nature's evolutionary processes can inspire better AI systems. [...]
Updates
Four Recent Papers (* indicates equal contribution)
All publications →
Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights
Core finding: The neighborhood around pretrained weights already contains task-specific experts. In small models they are sparse and hard to find; in large models they are dense and easy to discover. This motivates a simple post-training algorithm we call RandOpt: sample N weight vectors near the pretrained weights, keep the top K on the task, and majority-vote their predictions at inference time.
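A minimal NumPy sketch of that recipe (hypothetical names throughout: `evaluate` and `predict` stand in for whatever task scorer and forward pass the model uses, and the paper's actual sampling scheme may differ):

```python
import numpy as np

def rand_opt(pretrained_w, evaluate, predict, n_samples=1000, k=10,
             sigma=0.01, rng=None):
    """Sample weights near the pretrained ones, keep the top-K task
    scorers, and majority-vote their predictions at inference."""
    rng = np.random.default_rng() if rng is None else rng
    # Sample N candidates in a small Gaussian ball around the pretrained weights.
    candidates = [pretrained_w + sigma * rng.standard_normal(pretrained_w.shape)
                  for _ in range(n_samples)]
    # Score every candidate on the task; keep the K best as "experts".
    scores = np.array([evaluate(w) for w in candidates])
    experts = [candidates[i] for i in np.argsort(scores)[-k:]]

    def ensemble(x):
        # Majority vote over the experts' (discrete) predictions.
        votes = [predict(w, x) for w in experts]
        labels, counts = np.unique(votes, return_counts=True)
        return labels[np.argmax(counts)]

    return ensemble
```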
Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning
Core finding: We propose a variant of Evolution Strategies (ES) to fine-tune LLMs. With a population of just 30, ES outperforms PPO and GRPO across the models we tested, while requiring no gradients, no critic, and no reward model. ES is also more robust across base LLMs and less prone to reward hacking.
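For intuition, here is a simplified sketch of one ES update in the classic OpenAI-ES style, not necessarily the paper's exact variant; `reward_fn` is a hypothetical black-box scorer (e.g. task accuracy of the perturbed model) and `w` is the flattened parameter vector:

```python
import numpy as np

def es_step(w, reward_fn, pop_size=30, sigma=1e-3, lr=1e-2, rng=None):
    """One ES update: perturb the weights, score each perturbed model
    with a black-box reward, and move toward the better-scoring
    perturbations. No gradients, no critic, no reward model."""
    rng = np.random.default_rng() if rng is None else rng
    # Antithetic sampling: pair each noise vector with its negation.
    half = rng.standard_normal((pop_size // 2, w.size))
    noise = np.concatenate([half, -half])
    rewards = np.array([reward_fn(w + sigma * eps) for eps in noise])
    # Rank-normalize rewards so the update is robust to reward scale.
    ranks = rewards.argsort().argsort()
    adv = (ranks - ranks.mean()) / (ranks.std() + 1e-8)
    return w + lr / (len(noise) * sigma) * (noise.T @ adv)
```

Since the update needs only reward evaluations, every population member can be scored with pure forward passes, which is what makes the method gradient-free.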
Self-Assembly of a Biologically Plausible Learning Circuit
Core finding: We propose a biologically plausible circuit for updating network weights that works as well as backpropagation on some image classification tasks while avoiding its biological implausibility. A key and surprising prediction is the self-assembly of the basic circuit from initial random connectivity and heterosynaptic plasticity rules, yielding testable predictions about cortical anatomy and physiology.
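The paper's circuit is not reproduced here; as a point of comparison, the sketch below shows feedback alignment, a well-known related scheme (explicitly not the paper's circuit) in which errors reach the hidden layer through fixed random feedback weights rather than the transpose of the forward weights, sidestepping backprop's weight-transport problem:

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 784, 256, 10
W1 = 0.01 * rng.standard_normal((n_hid, n_in))
W2 = 0.01 * rng.standard_normal((n_out, n_hid))
# Fixed random feedback weights: they replace W2.T in the backward
# pass and are never trained.
B = 0.01 * rng.standard_normal((n_hid, n_out))

def fa_step(x, y, lr=0.1):
    """One feedback-alignment update on a single example (x, one-hot y)."""
    global W1, W2
    h = np.maximum(0.0, W1 @ x)                      # ReLU hidden activity
    logits = W2 @ h
    p = np.exp(logits - logits.max()); p /= p.sum()  # softmax output
    e = p - y                                        # output error
    dh = (B @ e) * (h > 0)                           # error routed via fixed B
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(dh, x)
```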
On the Power of Decision Trees in Auto-Regressive Language Modeling
Core finding: Auto-regressive decision trees (ARDTs) can express strong sequential reasoning, including simulations of automata, Turing machines, and sparse circuits via chain-of-thought computation. Empirically, a 0.3M-parameter tree ensemble outperforms a 1M-parameter Transformer on TinyStories, and tree ensembles on GPT-2 embeddings rival InstructGPT and PaLM-540B on Big-Bench-Hard.
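A toy illustration of the ARDT setup, assuming integer token ids and scikit-learn; this fits a single shallow tree over raw context windows, not the paper's tree ensembles over GPT-2 embeddings:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_ardt(token_ids, context_len=8, max_depth=12):
    """Fit a decision tree to predict the next token id from the
    previous context_len token ids."""
    X = np.array([token_ids[i:i + context_len]
                  for i in range(len(token_ids) - context_len)])
    y = np.array(token_ids[context_len:])
    return DecisionTreeClassifier(max_depth=max_depth).fit(X, y)

def generate(tree, prompt_ids, n_new, context_len=8):
    """Auto-regressive decoding: feed the tree's own predictions back in."""
    seq = list(prompt_ids)  # prompt needs at least context_len tokens
    for _ in range(n_new):
        ctx = np.array(seq[-context_len:]).reshape(1, -1)
        seq.append(int(tree.predict(ctx)[0]))
    return seq
```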