DESU-CLUB

Low Keng Hoong, Warren DESU-CLUB

Y3 CS @ NUS

17 followers · 53 following

NUS
Singapore

Lists (1)

Study Materials

1 repository

Stars

bogdannadev / mfma-cdna-amd

Forked from danila-permogorskii/mfma

AMD specific CDNA architecture Matrix Fused Multiply-Add (MFMA) basics

C++ 3 1 Updated Mar 8, 2026

TheJDen / janestreet-gpu-mode-2025

Forked from janestreet-gpu-mode/hackathon

An educational walkthrough for the 9/6 Hackathon

Jupyter Notebook 15 Updated Mar 5, 2026

AuleTechnologies / Aule-Attention

High-performance FlashAttention-2 for AMD, Intel, and Apple GPUs. Drop-in replacement for PyTorch SDPA. Triton backend for ROCm (MI300X, RDNA3), Vulkan backend for consumer GPUs. No CUDA required.

Zig 152 6 Updated Jan 27, 2026

triton-lang / Triton-to-tile-IR

incubator repo for CUDA-TileIR backend

MLIR 123 8 Updated Mar 18, 2026

pytorch / helion

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 810 134 Updated Mar 26, 2026

valhrd / CS3230

Code snippets relevant to CS3230

Python 1 Updated Mar 6, 2026

TabbyML / pochi

TypeScript 94 46 Updated Mar 26, 2026

anthropics / original_performance_takehome

Anthropic's original performance take-home, now open for you to try!

Python 3,724 849 Updated Jan 22, 2026

justrach / nanobrew

The fastest macOS package manager. Written in Zig. 3ms warm installs.

Zig 590 4 Updated Mar 26, 2026

Dao-AILab / sonic-moe

Accelerating MoE with IO and Tile-aware Optimizations

Python 614 67 Updated Mar 24, 2026

m87-labs / kestrel

Python 42 2 Updated Mar 25, 2026

NVlabs / Fast-dLLM

Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"

Python 898 112 Updated Jan 28, 2026

NVlabs / GatedDeltaNet

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 522 28 Updated Mar 13, 2026

Developer-Y / cs-video-courses

List of Computer Science courses with video lectures.

77,493 10,504 Updated Mar 23, 2026

karpathy / nanochat

The best ChatGPT that $100 can buy.

Python 50,380 6,616 Updated Mar 26, 2026

kanishkg / endless-terminals

Python 92 11 Updated Feb 12, 2026

ksm26 / video-analysis-agent

This project implements an AI agent that verifies if automated Hercules test runs were executed as intended by comparing planning logs, video evidence, and final outputs. It uses open-source LLMs a…

Python 3 1 Updated Jun 30, 2025

justrach / turboAPI

FastAPI-compatible Python framework with Zig HTTP core; 7x faster, free-threading native

Python 786 22 Updated Mar 26, 2026

allenai / open-instruct

AllenAI's post-training codebase

Python 3,657 516 Updated Mar 26, 2026

ServiceNow / PipelineRL

A scalable asynchronous reinforcement learning implementation with in-flight weight updates.

Python 384 40 Updated Mar 25, 2026

Tkaixiang / marktext

Forked from jacobwhall/marktext

Another attempt at modernising Marktext - but built "from the ground up" using electron-vite. Also translates the app into 9 different languages.

JavaScript 251 17 Updated Mar 2, 2026

Python 83 5 Updated Sep 11, 2025