DESU-CLUB
  • NUS
  • Singapore

Highlights

  • Pro


AMD-specific CDNA architecture Matrix Fused Multiply-Add (MFMA) basics

C++ 3 1 Updated Mar 8, 2026

An educational walkthrough for the 9/6 Hackathon

Jupyter Notebook 15 Updated Mar 5, 2026

High-performance FlashAttention-2 for AMD, Intel, and Apple GPUs. Drop-in replacement for PyTorch SDPA. Triton backend for ROCm (MI300X, RDNA3), Vulkan backend for consumer GPUs. No CUDA required.

Zig 152 6 Updated Jan 27, 2026

incubator repo for CUDA-TileIR backend

MLIR 123 8 Updated Mar 18, 2026

A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.

Python 810 134 Updated Mar 26, 2026

Code snippets relevant to CS3230

Python 1 Updated Mar 6, 2026
TypeScript 94 46 Updated Mar 26, 2026

Anthropic's original performance take-home, now open for you to try!

Python 3,724 849 Updated Jan 22, 2026

The fastest macOS package manager. Written in Zig. 3ms warm installs.

Zig 590 4 Updated Mar 26, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 614 67 Updated Mar 24, 2026
Python 42 2 Updated Mar 25, 2026

Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"

Python 898 112 Updated Jan 28, 2026

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 522 28 Updated Mar 13, 2026

List of Computer Science courses with video lectures.

77,493 10,504 Updated Mar 23, 2026

The best ChatGPT that $100 can buy.

Python 50,380 6,616 Updated Mar 26, 2026
Python 92 11 Updated Feb 12, 2026

This project implements an AI agent that verifies if automated Hercules test runs were executed as intended by comparing planning logs, video evidence, and final outputs. It uses open-source LLMs a…

Python 3 1 Updated Jun 30, 2025

FastAPI-compatible Python framework with Zig HTTP core; 7x faster, free-threading native

Python 786 22 Updated Mar 26, 2026

AllenAI's post-training codebase

Python 3,657 516 Updated Mar 26, 2026

A scalable asynchronous reinforcement learning implementation with in-flight weight updates.

Python 384 40 Updated Mar 25, 2026

Another attempt at modernising Marktext - but built "from the ground up" using electron-vite. Also translates the app into 9 different languages.

JavaScript 251 17 Updated Mar 2, 2026

Build Real-Time Knowledge Graphs for AI Agents

Python 24,231 2,397 Updated Mar 21, 2026

Learn CUDA with PyTorch

Cuda 254 35 Updated Mar 14, 2026

Verifiers for LLM Reinforcement Learning

Python 83 5 Updated Sep 11, 2025

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)

Python 9,247 904 Updated Mar 25, 2026

Reinforcement Learning for LLMs

Python 3 Updated Aug 10, 2025

[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation

Python 2,863 480 Updated Dec 18, 2025

MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts

Jupyter Notebook 355 50 Updated Sep 29, 2025

Explore training for quantized models

Python 26 2 Updated Jul 12, 2025

llama.cpp fork with additional SOTA quants and improved performance

C++ 1,881 240 Updated Mar 26, 2026