Stars
Efficient implementation of DeepSeek Ops (Blockwise FP8 GEMM, MoE, and MLA) for AMD Instinct MI300X
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Keyboard-operated window manager for Linux. Mirror of the Gitea instance listed below.
High-performance automatic differentiation of LLVM and MLIR.
A micro Wayland compositor that can be used as a GStreamer plugin
Command and Conquer: Generals - Zero Hour
Realtime Voice Changer
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
NVIDIA curated collection of educational resources related to general purpose GPU programming.
An implementation of physically based shading & image-based lighting in D3D11, D3D12, Vulkan, and OpenGL 4.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
CUDA Matrix Multiplication Optimization
A repo for numerical methods written in Rust with wrappers for Python
A Reconfigurable RISC-V Core for Approximate Computing
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
pnn is a Darknet-compatible neural network inference engine implemented in Rust.
A personal knowledge management and sharing system for VSCode
A minimal GPU design in Verilog to learn how GPUs work from the ground up
A modern C++ BVH construction and traversal library
A mini x86 Linux debugger for teaching purposes
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.