Starred repositories
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
Docker image registry for SWE-bench, created by Epoch AI.
Adaptive, block-based status line for Claude Code with bin-packing layout
OpenClaw-RL: Train any agent simply by talking
🧩 Plugin that forces the Overleaf editor and preview to display top-bottom instead of side-by-side.
Lightweight coding agent that runs in your terminal
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…
Official implementation of GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization
TMLR | This survey presents a comprehensive and structured synthesis of memory in LLMs and MLLMs, organizing the literature into a cohesive taxonomy comprising implicit, explicit, and agentic memor…
SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
🏷 A browser extension for marking and taking notes on web pages
A lightweight terminal-based Excel viewer with Vim-like navigation for viewing, editing, and exporting Excel data to JSON format.
GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.
An interactive attention visualization and intervention tool for the LLM decode stage.
[ICLR 2026] End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning
A high-fidelity, general-purpose platform for embodied agent training and testing.
A toolkit for embedding text datasets with sparse autoencoders
Official repository for DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research
Open-source release accompanying Gao et al. 2025
Official Repository of Native Parallel Reasoner
Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University
Processed / Cleaned Data for Paper Copilot
Tongyi Deep Research, the Leading Open-source Deep Research Agent
Evaluation of LLMs on latest math competitions
YaRN: Efficient Context Window Extension of Large Language Models