- Shanghai, China
Stars
InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, and image editing into a single framework.
Accelerating Masked Image Generation by Learning Latent Controlled Dynamics
TeleMem is a high-performance drop-in replacement for Mem0, featuring semantic deduplication, long-term dialogue memory, and multimodal video reasoning.
Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows
UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
PICABench: How Far Are We from Physically Realistic Image Editing?
Reference PyTorch implementation and models for DINOv3
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
The official repo of TeleEgo - A Benchmark for Egocentric AI Assistants.
Recommend new arXiv papers of interest to you daily, based on your Zotero library.
[CVPR 2026] Towards Real-Time Diffusion-Based Streaming Video Super-Resolution — An efficient one-step diffusion framework for streaming VSR with locality-constrained sparse attention and a tiny co…
SDAR (Synergy of Diffusion and AutoRegression), a large diffusion language model (1.7B, 4B, 8B, 30B)
UniGenBench++: A Unified Semantic Evaluation Benchmark for Text-to-Image Generation
ALLWEONE® open-source AI presentation generator, a Gamma alternative. Create professional slides with customizable themes and AI-generated content in minutes.
Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model
Dingo: A Comprehensive AI Data, Model and Application Quality Evaluation Tool
[CVPR 2026] ArtiMuse: Fine-Grained Image Aesthetics Assessment with Joint Scoring and Expert-Level Understanding (Intern · ArtiMuse multimodal aesthetics understanding large model)
A list of awesome all-in-one image restoration methods. Updating...!
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Arbitrary-steps Image Super-resolution via Diffusion Inversion (CVPR 2025)
[ICCV 2025] MagicMirror: ID-Preserved Video Generation in Video Diffusion Transformers
Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning
[CVPR'25 Highlight] Official implementation for paper - LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[ICCV 2025] FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration
HunyuanVideo: A Systematic Framework For Large Video Generation Model