Servicio Social - Universidad Autónoma Metropolitana
- Torch-RL with OpenAI Gym: Build foundational RL models using PyTorch and Gymnasium (formerly OpenAI Gym)
- Atari Game Agent: Develop an RL system for classic Atari games
- Transformer-based RL: Explore RL projects integrating transformer architectures
We're particularly interested in:
- GRPO (Group Relative Policy Optimization): The efficient RL algorithm popularized by DeepSeek. It eliminates the need for a separate critic model by estimating the baseline from a group of responses sampled for the same prompt, which reportedly reduces memory and compute overhead by ~50% compared to traditional PPO
- LoRA Fine-tuning: Using Low-Rank Adaptation to efficiently fine-tune base models with reinforcement learning
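
To make the "no critic" idea behind GRPO concrete, here is a minimal sketch of the group-relative advantage computation: each sampled response is scored against the mean and standard deviation of the rewards in its own group, so no learned value model is needed. The function name `group_relative_advantages`, the `eps` term, and the choice of population standard deviation are assumptions made for illustration, not taken from any particular implementation.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration: advantage_i = (r_i - mean) / (std + eps).
    Using the population standard deviation and adding eps are assumptions here;
    eps only guards against division by zero when all rewards are equal.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")
    mean = fmean(rewards)   # group mean acts as the baseline (replaces the critic)
    std = pstdev(rewards)   # group spread rescales the advantages
    return [(r - mean) / (std + eps) for r in rewards]


# Example: four sampled answers to one prompt, scored 1.0 if correct, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that beat the group average get a positive advantage and the rest get a negative one. If every response in a group receives the same reward, all advantages are zero and that group contributes no learning signal.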
- DeepSeek-R1 - GRPO implementation
- DeepSeekMath - Original GRPO paper
- RL Course by David Silver - DeepMind course