Starred repositories
SegviGen: Repurposing 3D Generative Model for Part Segmentation
[CVPR 2026] LoST: Level of Semantics Tokenization for 3D Shapes
Official code of "Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding"
[CVPR 2026] - IsoCLIP: Decomposing CLIP Projectors for Efficient Intra-modal Alignment
From scratch implementation of a vision language model in pure PyTorch
Compares the results of several LoRA fine-tuning variants on Qwen3-VL-4B-Instruct, Qwen's latest multimodal image-text model, and deploys it with langchain + RAG + Multi-Agent
[SIGIR'24] The official implementation code of MOELoRA.
whuhxb / MoCLE
Forked from gyhdog99/MoCLE
MoCLE (First MLLM with MoE for instruction customization and generalization!) (https://arxiv.org/abs/2312.12379)
MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detecti…
A family of open-sourced Mixture-of-Experts (MoE) Large Language Models
Data and software for building the ACL Anthology.
Repo of "LaST-VLA: Thinking in Latent Spatio-Temporal Space for Vision-Language-Action in Autonomous Driving"
[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
New generation of CLIP with strong fine-grained discrimination capability, ICML 2025
[CVPR26] GeoMotion: Rethinking Motion Segmentation via Latent 4D Geometry
Spa3R: Predictive Spatial Field Modeling for 3D Visual Reasoning
[CVPR2026] UFO: Unifying Feed-Forward and Optimization-based Methods for Large Driving Scene Modeling
[CVPR 2026] VGGDrive: Empowering Vision-Language Models with Cross-View Geometric Grounding for Autonomous Driving
[CVPR 2026] SenCache: Accelerating Diffusion Model Inference via Sensitivity-Aware Caching