Humane - Atlanta, Georgia - @ericlewisplease
Stars
A powerful toolkit for creating concise and expressive Swift macros
Advanced AI-Powered Reverse Engineering Tool with Agent Skills Integration
Developer documentation for writing new firmware for the SP-1 stem player by Teenage Engineering.
[ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
You like pytorch? You like micrograd? You love tinygrad! ❤️
A framework for unified personalized model, achieving mutual enhancement between personalized understanding and generation. Demonstrating the potential of cross-task information transfer in persona…
Research implementation to investigate methods of integrating the speech modality into pre-trained language models
Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
A flexible and efficient codebase for training visually-conditioned language models (VLMs)
A paper list of recent works on token compression for ViT and VLM
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
Reading list for research topics in multimodal machine learning
Pytorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
BLIP-2 implementation for training vision-language models. Q-Former + frozen encoders + any LLM. Colab-ready notebooks with MoE variant.
An API-compatible, drop-in replacement for Apple's Foundation Models framework with support for custom language model providers.
From Chain-of-Thought prompting to OpenAI o1 and DeepSeek-R1 🍓
A Framework of Small-scale Large Multimodal Models
Studying the effect of different connectors (linear, MLP, and cross-attention) to analyze what paradigms LLMs use, or make a best guess
A curated list of vision-and-language pre-training (VLP). :-)
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
MMSearch-R1 is an end-to-end RL framework that enables LMMs to perform on-demand, multi-turn search with real-world multimodal search tools.
Turn Apple's CVPR-25 FastVLM encoder into a reproducible baseline for mobile apps. First complete implementation achieving <250ms multimodal inference on iPhone.
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
Fully Open Framework for Democratized Multimodal Reinforcement Learning.
Famous Vision Language Models and Their Architectures
Run frontier LLMs and VLMs with day-0 model support across GPU, NPU, and CPU, with comprehensive runtime coverage for PC (Python/C++), mobile (Android & iOS), and Linux/IoT (Arm64 & x86 Docker). Su…