huwade's starred repositories

Code for the NeurIPS 2024 paper QuaRot, on end-to-end 4-bit inference of large language models.

Python 496 67 Updated Nov 26, 2024

[ECCV 2022] Code for the paper "DaViT: Dual Attention Vision Transformer"

Python 374 34 Updated Feb 13, 2024

yolort is a runtime stack for yolov5 on specialized accelerators such as tensorrt, libtorch, onnxruntime, tvm and ncnn.

Python 730 154 Updated Mar 26, 2026

Introduction to Parallel Programming class code

Cuda 1,347 1,144 Updated Jun 27, 2022

Leibniz formula for π

C 13 107 Updated Oct 2, 2019

coding CUDA everyday!

Cuda 74 2 Updated Feb 5, 2026

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,713 382 Updated Mar 16, 2026

Learnings and programs related to CUDA

Cuda 435 20 Updated Jun 29, 2025

Model Compression Toolbox for Large Language Models and Diffusion Models

Python 766 89 Updated Aug 14, 2025

LLM training in simple, raw C/CUDA

Cuda 29,266 3,452 Updated Jun 26, 2025

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Python 24,987 3,478 Updated Feb 11, 2026

The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.

Python 107,051 12,335 Updated Mar 26, 2026

CUDA by Example, written by two senior members of the CUDA software platform team, shows programmers how to employ this new technology. The authors introduce each area of CUDA development through w…

C 474 149 Updated Jun 30, 2023

[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone

Python 2,084 131 Updated Mar 11, 2026

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 1,101 110 Updated Dec 30, 2024

Interview questions about Python

Shell 17,265 5,532 Updated Mar 5, 2025

Kolosal AI is an OpenSource and Lightweight alternative to LM Studio to run LLMs 100% offline on your device.

C++ 442 30 Updated May 22, 2025

AutoMQ is a diskless Kafka® on S3. 10x Cost-Effective. No Cross-AZ Traffic Cost. Autoscale in seconds. Single-digit ms latency. Multi-AZ Availability.

Java 9,628 674 Updated Mar 24, 2026

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python 3,746 233 Updated Mar 7, 2026

Efficient vision foundation models for high-resolution generation and perception.

Python 3,272 235 Updated Sep 5, 2025

[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

Python 671 106 Updated Mar 29, 2024

An OCR API service implementation using PaddleOCR and FastAPI

Python 1 Updated Mar 3, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 10,024 1,006 Updated Mar 23, 2026

Some common CUDA kernel implementations (Not the fastest).

Cuda 29 3 Updated Dec 5, 2025

Mamba SSM architecture

Python 17,736 1,660 Updated Mar 26, 2026

TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation.

Rust 11,154 797 Updated Mar 26, 2026

Safety helmet wearing detection dataset, with pretrained model

Python 1,672 419 Updated Dec 17, 2019

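Based on YOLOv5: Head, Person, and Helmet Detection on Construction Sites. An object-detection system for recognizing safety helmets and restricted danger zones on construction sites, 🚀😆 with a highly detailed tutorial on training YOLOv5 on your own dataset 🚀😆 Visual interface added in 2021.3 ❗❗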

Python 2,583 486 Updated Apr 11, 2024

Read-only mirror of https://gitlab.gnome.org/GNOME/glib

C 1,722 573 Updated Mar 26, 2026

NVIDIA DeepStream SDK 8.0 / 7.1 / 7.0 / 6.4 / 6.3 / 6.2 / 6.1.1 / 6.1 / 6.0.1 / 6.0 / 5.1 implementation for YOLO models

Python 1,979 455 Updated Jan 25, 2026