tongjingqi (SII)


AI-Can-Learn-Scientific-Taste (Public)

We propose Reinforcement Learning from Community Feedback (RLCF), a training paradigm that uses large-scale community signals as supervision, and formulate scientific taste learning as a preference…

Stars: 351 · Forks: 10
Thinking-with-Video (Public)

We introduce 'Thinking with Video', a new paradigm leveraging video generation for multimodal reasoning. Our VideoThinkBench shows that Sora-2 surpasses GPT-5 by 10% on eyeballing puzzles and reache…

Language: Python · Stars: 285 · Forks: 5
Game-RL (Public)

Game-RL: Synthesizing Multimodal Verifiable Game Data to Boost VLMs' General Reasoning

Language: Python · Stars: 143 · Forks: 2
MathTrap (Public)

In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset MATHTRAP‡ by introducing carefully designed log…

Language: Python · Stars: 60
Awesome-Agent-RL (Public)

A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research and practical guides on defining and collecting rewards to build more inte…

Stars: 59
hello-world (Public)

first practice