A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…

Python 3,249 679 Updated Mar 30, 2026

intel / intel-graphics-compiler

C++ 694 181 Updated Mar 30, 2026

NVIDIA / accelerated-computing-hub

NVIDIA curated collection of educational resources related to general purpose GPU programming.

Jupyter Notebook 1,419 250 Updated Mar 30, 2026

Nadrin / PBR

An implementation of physically based shading & image based lighting in D3D11, D3D12, Vulkan, and OpenGL 4.

C++ 1,496 117 Updated Oct 30, 2021

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,507 1,757 Updated Mar 30, 2026

leimao / CUDA-GEMM-Optimization

CUDA Matrix Multiplication Optimization

Cuda 263 25 Updated Jul 19, 2024

oOXpycTOo / nume.rs

A repo for numerical methods written in Rust with wrappers for Python

Rust 1 Updated Aug 4, 2024

mikex86 / LibreCuda

C 1,083 43 Updated May 18, 2025

phoeniX-Digital-Design / phoeniX

A Reconfigurable RISC-V Core for Approximate Computing

Verilog 130 82 Updated May 30, 2025

NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 12,840 2,332 Updated Mar 25, 2026

ptaxom / pnn

pnn is a Darknet-compatible neural network inference engine implemented in Rust.

Rust 17 1 Updated Jan 8, 2022

foambubble / foam

A personal knowledge management and sharing system for VSCode

TypeScript 16,979 755 Updated Mar 30, 2026

NVIDIA / cccl

CUDA Core Compute Libraries

C++ 2,246 371 Updated Mar 30, 2026

adam-maj / tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 12,066 1,104 Updated Aug 18, 2024

madmann91 / bvh

A modern C++ BVH construction and traversal library

C++ 1,139 115 Updated Jun 16, 2025

markmap / markmap

Build mindmaps with plain text

TypeScript 12,608 954 Updated Jun 12, 2025

TartanLlama / minidbg

A mini x86 linux debugger for teaching purposes

C++ 650 107 Updated Aug 2, 2024

llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.

LLVM 37,655 16,717 Updated Mar 30, 2026

Daivuk / PureDOOM

Pure DOOM - Single Header Doom Source Port

C++ 391 41 Updated Feb 7, 2026

ptaxom

Stars

RadeonFlow / RadeonFlow_Kernels

openai / gpt-oss

NVIDIA-Omniverse / PhysX

stjet / ming-wm

EnzymeAD / Enzyme

bertmaher / simplegemm

games-on-whales / gst-wayland-display

SzymonOzog / GPU_Programming

electronicarts / CnC_Generals_Zero_Hour

w-okada / voice-changer

MircoWerner / VkLBVH

NVIDIA / TransformerEngine