Stars
Accelerating MoE with IO and Tile-aware Optimizations
Implementation of DeepCrossAttention, proposed by Heddes et al. at Google Research, in PyTorch
Unsloth Studio is a web UI for training and running open models like Qwen, DeepSeek, gpt-oss and Gemma locally.
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Spiking Brain-inspired Large Models, integrating hybrid efficient attention, MoE modules, and spike encoding into their architecture
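As a rough illustration of what spike encoding means here (a generic toy rate-coding sketch, not this project's actual scheme), continuous activations can be turned into binary spike trains whose firing rates track the activation magnitudes:

```python
import torch

def rate_code(x: torch.Tensor, timesteps: int = 4) -> torch.Tensor:
    """Toy rate coding: at each timestep, emit a binary spike with probability
    proportional to the clamped, normalized activation value."""
    p = x.clamp(min=0)
    p = p / (p.max() + 1e-8)                 # normalize to [0, 1]
    return (torch.rand(timesteps, *x.shape) < p).float()

acts = torch.randn(2, 8)                     # continuous activations
spikes = rate_code(acts, timesteps=100)      # (100, 2, 8) binary spike trains
print(spikes.mean(dim=0))                    # empirical firing rates track the normalized activations
```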
Pretraining and inference code for a large-scale depth-recurrent language model
Deep and online learning with spiking neural networks in Python
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Efficient Triton Kernels for LLM Training
Hierarchical Reasoning Model Official Release
A PyTorch-native platform for training generative AI models
Minimalistic large language model 3D-parallelism training
🔥 A minimal training framework for scaling FLA models
Build compute kernels and load them from the Hub.
Everything about the SmolLM and SmolVLM family of models
Continuous Thought Machines, because thought takes time and reasoning is a process.
CUDA Python: Performance meets Productivity
Fused Qwen3 MoE layer for faster training, compatible with Transformers, LoRA, bitsandbytes (bnb) 4-bit quantization, and Unsloth. Training LoRA on top of GGUF models is also possible.
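For context, a top-k mixture-of-experts forward pass in plain PyTorch looks roughly like the sketch below; a fused layer computes the same routing and expert projections with grouped GEMMs inside one kernel instead of a Python loop. All names and sizes here are made up, and this is not the project's code:

```python
import torch
import torch.nn.functional as F

def moe_forward(x, gate_w, w_up, w_down, top_k=2):
    """Naive top-k MoE: route each token to its top_k experts, run the chosen
    experts, and combine the outputs with the routing weights."""
    scores = x @ gate_w                                     # (tokens, n_experts)
    weights, idx = scores.softmax(dim=-1).topk(top_k, dim=-1)
    out = torch.zeros_like(x)
    for e in range(gate_w.shape[1]):                        # the loop a fused kernel removes
        token_ids, slot = (idx == e).nonzero(as_tuple=True)
        if token_ids.numel() == 0:
            continue
        h = F.silu(x[token_ids] @ w_up[e]) @ w_down[e]      # toy 2-layer MLP expert
        out.index_add_(0, token_ids, h * weights[token_ids, slot, None])
    return out

x = torch.randn(16, 32)                                     # 16 tokens, hidden size 32
out = moe_forward(x, torch.randn(32, 4), torch.randn(4, 32, 64), torch.randn(4, 64, 32))
```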
[NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models
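The "2:4" semi-structured pattern means keeping at most 2 non-zero weights in every contiguous group of 4, which sparse tensor cores can accelerate. MaskLLM learns which mask to select per group; the sketch below only uses weight magnitude as a stand-in criterion to show the pattern itself:

```python
import torch

def two_to_four_mask(w: torch.Tensor) -> torch.Tensor:
    """Zero all but the 2 largest-magnitude weights in every group of 4
    along the last dimension (the 2:4 semi-structured sparsity pattern)."""
    rows, cols = w.shape
    groups = w.reshape(rows, cols // 4, 4)
    keep = groups.abs().topk(2, dim=-1).indices             # 2 survivors per group
    mask = torch.zeros_like(groups).scatter_(-1, keep, 1.0)
    return (groups * mask).reshape(rows, cols)

w = torch.randn(8, 16)
w_sparse = two_to_four_mask(w)
assert ((w_sparse.reshape(8, -1, 4) != 0).sum(dim=-1) <= 2).all()
```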
RWKV (pronounced RwaKuv) is an RNN with great LLM performance that can also be trained directly like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". It combines the best of RNNs and transformers.
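The "RNN that trains like a transformer" idea rests on linear recurrences being computable both step by step (constant-size state at inference) and in parallel over the whole sequence (for training). The toy sketch below shows this for the simplest decaying-state recurrence; it is not the RWKV-7 formulation:

```python
import torch

def recurrent(x, a):                     # RNN mode: O(1) state per step
    s, out = torch.zeros(x.shape[1]), []
    for t in range(x.shape[0]):
        s = a * s + x[t]                 # s_t = a * s_{t-1} + x_t
        out.append(s)
    return torch.stack(out)

def parallel(x, a):                      # training mode: all timesteps at once
    T = torch.arange(x.shape[0], dtype=x.dtype)
    decay = (a ** (T[:, None] - T[None, :]).clamp(min=0)).tril()
    return decay @ x                     # s_t = sum_{i<=t} a^(t-i) * x_i

x = torch.randn(6, 3)
assert torch.allclose(recurrent(x, 0.9), parallel(x, 0.9), atol=1e-5)
```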
Understand and test language model architectures on synthetic tasks.
Training Sparse Autoencoders on Language Models
Parallel Scaling Law for Language Model — Beyond Parameter and Inference Time Scaling
Sparsify transformers with SAEs and transcoders
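For the sparse-autoencoder entries above, the core object is a small overcomplete autoencoder trained to reconstruct a layer's activations under a sparsity penalty, so that only a few features fire per input. A generic sketch of the idea, not any particular library's API:

```python
import torch
import torch.nn as nn

class TinySAE(nn.Module):
    """Overcomplete encoder + ReLU + linear decoder; the L1 penalty on the
    codes pushes most features to zero for any given activation."""
    def __init__(self, d_model=512, d_sae=4096):
        super().__init__()
        self.enc = nn.Linear(d_model, d_sae)
        self.dec = nn.Linear(d_sae, d_model)

    def forward(self, acts):
        codes = torch.relu(self.enc(acts))   # sparse feature activations
        return self.dec(codes), codes

sae = TinySAE()
acts = torch.randn(64, 512)                  # activations captured from one layer
recon, codes = sae(acts)
loss = (recon - acts).pow(2).mean() + 1e-3 * codes.abs().mean()
loss.backward()
```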