- lizhongyu at nankai.edu.cn
- https://lzyhha.github.io/
- https://scholar.google.com/citations?user=g6WHXrgAAAAJ
Stars
ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Understanding.
A comprehensive list of papers about Large-Language-Diffusion-Models.
Paper list, tutorial, and nano code snippets for Diffusion Large Language Models.
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.
Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Official Release of ICCV 2025 paper -- DiscretizedSDF
Official code for the paper: Depth Anything At Any Condition
The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.
Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)
A SOTA open-source image editing model that aims to provide performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.
lzyhha / diffusers
Forked from huggingface/diffusers. 🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow"
Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation.
[ICCV 2025] VisualCloze: A universal image generation framework that can support a wide range of in-domain tasks and generalize to unseen ones. (🔥 🔥 🔥 Merged into the official pipelines of diffusers.)
Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling
[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers
Democratizing AI scientists with ToolUniverse
TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools