Stars
This project shares technical principles and hands-on experience with large language models (LLM engineering and LLM application deployment).
🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
Fast and memory-efficient exact attention
Code for the NeurIPS'24 paper "QuaRot": end-to-end 4-bit inference for large language models.
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
AI Crash Course to help busy builders catch up to the public frontier of AI research in 2 weeks
Development repository for the Triton language and compiler
QLoRA: Efficient Finetuning of Quantized LLMs
[ICCV'25] SSVQ: Unleashing the potential of vector quantization with sign-splitting
CUDA Matrix Multiplication Optimization
Fast CUDA matrix multiplication from scratch
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot".
A curated list of neural network pruning resources.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Universal LLM Deployment Engine with ML Compilation
Model Compression Toolbox for Large Language Models and Diffusion Models
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
Hackable and optimized Transformers building blocks, supporting a composable construction.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Making large AI models cheaper, faster and more accessible
High-speed downloads from a mirror site using HuggingFace's official download tool.
An Easy-to-Use and High-Performance AI Deployment Framework