Starred repositories
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
A 32× longer context window than vanilla Transformers, and up to 4× longer than memory-efficient Transformers.
Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/)
[NeurIPS 2025 Oral] Infinity⭐️: Unified Spacetime AutoRegressive Modeling for Visual Generation
A collection of industry-classic and cutting-edge papers in the fields of recommendation, advertising, and search.
Tile primitives for speedy kernels
An official implementation of DanceGRPO: Unleashing GRPO on Visual Generation
Survey: A collection of AWESOME papers and resources on the large language model (LLM) related recommender system topics.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
PyTorch native quantization and sparsity for training and inference
Efficient Triton Kernels for LLM Training
Official code of "StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs".
<Foundations of Computer Vision> Book
Get started with building Fullstack Agents using Gemini 2.5 and LangGraph
Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache).
Finetune VITS and MMS using HuggingFace's tools
Collection of leaked system prompts
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
What are the principles we can use to build LLM-powered software that is actually good enough to put in the hands of production customers?
Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Chinese translation of *Designing Data-Intensive Applications* (DDIA), 1st and 2nd editions
[CVPR2025 Highlight] Video Generation Foundation Models: https://saiyan-world.github.io/goku/