yhcao6
  • CUHK, MMLab
  • Hong Kong

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 2,319 105 Updated Oct 29, 2025

[NeurIPS 2025] Official implementation of HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance

Python 86 1 Updated Sep 18, 2025

[ICCV 2025] MM-IFEngine: Towards Multimodal Instruction Following

Python 119 Updated Feb 13, 2026

[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++

Python 219 5 Updated Feb 2, 2026

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

Python 13,260 1,414 Updated Mar 22, 2026

PlayStation 4 emulator for Windows, Linux and macOS written in C++

C++ 30,485 2,060 Updated Mar 24, 2026

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,911 919 Updated Mar 4, 2026

Data annotation toolbox that supports image, audio, and video data.

Python 1,520 165 Updated Mar 20, 2026

The Open-Source Data Annotation Platform

TypeScript 1,195 121 Updated Feb 19, 2025

Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

Python 57,061 4,716 Updated Mar 24, 2026

Detectron2 Toolbox and Benchmark for V3Det

Python 18 2 Updated Jun 2, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Python 2,923 177 Updated May 26, 2025

OpenMMLab Detection Toolbox and Benchmark for V3Det

Python 15 2 Updated Apr 3, 2024
Python 61 4 Updated Jan 12, 2022

OpenMMLab FewShot Learning Toolbox and Benchmark

Python 752 124 Updated Sep 5, 2023

OpenMMLab Video Perception Toolbox. It supports Video Object Detection (VID), Multiple Object Tracking (MOT), Single Object Tracking (SOT), Video Instance Segmentation (VIS) with a unified framework.

Python 3,865 619 Updated Sep 19, 2023

Exercise for cpp

C++ 1 Updated Jul 14, 2020

OpenMMLab Detection Toolbox and Benchmark

Python 32,537 9,848 Updated Aug 21, 2024