Lists (21)
Claude
COMFY UI
CUDA GPU
DiT Acceleration
Collection of acceleration methods specifically for DiT
GLSL
INTER
Kinect
Linux
ML Library
PYTHON TOOLS
RAG
Resources
Splat
STABLE DIFFUSION
List of SD workflows and useful components
STREAM_DIFFUSION
Sync
TensorRT
Touchdesigner
TRITON
UPSCALING
WINDOWS
Stars
This is a list of useful libraries and resources for CUDA development.
Fast SAM 3D Body: Accelerating SAM 3D Body for Real-Time Full-Body Human Mesh Recovery
Faster Green Screen Keys — async multi-GPU inference engine for professional VFX pipelines
pprofile + matplotlib = Python program profiled as an awesome heatmap!
A powerful set of Python debugging tools, based on PySnooper
Comprehensive GPU specifications database with 2,824 GPUs across NVIDIA, AMD, and Intel
Adobe's reference implementation of the OpenPBR BSDF
Tangle is a web app that allows users to build and run machine learning pipelines without having to set up a development environment.
This is a series of GPU optimization topics that explains in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including: elementwise, reduce, s…
[CVPR 2023] DepGraph: Towards Any Structural Pruning; LLMs, Vision Foundation Models, etc.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine.
Example models using DeepSpeed
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Simple real time visualisation of the execution of a Python program.
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research; we are continuously improving the project. Welcome to PR the works (p…
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
[ICLR2025] Accelerating Diffusion Transformers with Token-wise Feature Caching
AI agents running research on single-GPU nanochat training automatically
Courses on building, compressing, evaluating, and deploying efficient AI models.
Unbearably fast near-real-time pure-Python runtime-static type-checker.
Pruna is a model optimization framework built for developers, enabling you to deliver faster, more efficient models with minimal overhead.
Official Repository of the paper "Trajectory Consistency Distillation"