- Fudan University, Shanghai AI Laboratory
- Shanghai (UTC +08:00)
- https://aleafy.github.io/
Stars
An in-the-wild benchmark for AI agents in the OpenClaw Environment.
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
ARIS ⚔️ (Auto-Research-In-Sleep) — Lightweight Markdown-only skills for autonomous ML research: cross-model review loops, idea discovery, and experiment automation. No framework, no lock-in — works…
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
A unified framework for easy reinforcement learning in Flow-Matching models
This repository provides FlashPortrait custom nodes for ComfyUI.
A Unified Visual Generator with Interleaved OmniModal Context
[ICLR 26 Oral] Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Official code for StoryMem: Multi-shot Long Video Storytelling with Memory
The official implementation of InfiniteVGGT
Mixture-of-Groups Attention for End-to-End Long Video Generation
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Official Implementation of "MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives"
[NeurIPS 24] The implementation and dataset of LiveScene: Language Embedding Interactive Radiance Fields for Physical Scene Rendering and Control
[SIGGRAPH Asia 25] SS4D: Native 4D Generative Model via Structured Spacetime Latents
HY-World 1.5: A Systematic Framework for Interactive World Modeling with Real-Time Latency and Geometric Consistency
[CVPR 2026] We present FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length videos while achieving up to 6× acceleration in inference…
[CVPR 2026] V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties
[CVPR 2026] Official Code for "ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning"
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
[ICLR 2026] An official implementation of "STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence"
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
We present StableAvatar, the first end-to-end video diffusion transformer, which synthesizes infinite-length high-quality audio-driven avatar videos without any post-processing, conditioned on a re…
Official implementation of "SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience"
[CVPR 2025 Oral & Best Paper Finalist] Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
AgentCPM-GUI: An on-device GUI agent for operating Android apps, enhancing reasoning ability with reinforcement fine-tuning for efficient task execution.
The official implementation for "Recollection from Pensieve: Novel View Synthesis via Learning from Uncalibrated Videos".
Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capab…