lzyhha
🍋 studying


ASID-Caption: Attribute-Structured and Quality-Verified Audiovisual Instruction Dataset and Training Pipeline for Fine-Grained Video Understanding.

Python 60 2 Updated Mar 3, 2026

A comprehensive list of papers about Large-Language-Diffusion-Models.

63 8 Updated Mar 2, 2026

Paper list, tutorials, and nano code snippets for Diffusion Large Language Models.

Jupyter Notebook 159 9 Updated Jan 19, 2026

dLLM: Simple Diffusion Language Modeling

Python 2,291 223 Updated Feb 27, 2026
Python 10,799 723 Updated Feb 9, 2026

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch.

Python 33,225 6,886 Updated Mar 31, 2026

Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model

Python 958 60 Updated Mar 20, 2026
Python 55 Updated Sep 21, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,952 2,061 Updated Mar 27, 2026

Official Release of ICCV 2025 paper -- DiscretizedSDF

Python 103 12 Updated Aug 25, 2025
Python 723 20 Updated Feb 5, 2026

Official code for the paper: Depth Anything At Any Condition

Python 328 20 Updated Aug 21, 2025

The official code for the paper: LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Python 119 2 Updated Jul 1, 2025

Open-source unified multimodal model

Python 5,781 512 Updated Oct 27, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,566 65 Updated Jun 14, 2025

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,904 89 Updated Jan 8, 2026

Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)

Python 15 Updated May 2, 2025

A SOTA open-source image editing model that aims to provide performance comparable to closed-source models such as GPT-4o and Gemini 2 Flash.

Python 2,173 95 Updated Dec 29, 2025
Python 115 3 Updated Apr 25, 2025

🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.

Python 1 Updated May 12, 2025

Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow"

Python 32 Updated Apr 20, 2025

Resurrect Mask AutoRegressive Modeling for Efficient and Scalable Image Generation.

Python 15 Updated Jul 21, 2025

[ICCV 2025] VisualCloze: A universal image generation framework that can support a wide range of in-domain tasks and generalize to unseen ones. (🔥 🔥 🔥 Merged into the official pipelines of diffusers.)

Python 279 14 Updated Jan 7, 2026

Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"

Python 301 14 Updated Apr 23, 2025

Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling

Python 1,080 52 Updated Nov 3, 2025

[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers

Python 166 11 Updated Sep 6, 2025

Democratizing AI scientists with ToolUniverse

Python 1,193 187 Updated Mar 30, 2026

TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools

Python 610 95 Updated Jul 30, 2025

HumanOmni

Python 221 12 Updated Mar 10, 2025
Python 415 29 Updated Mar 10, 2025