Stars
bogdannadev / mfma-cdna-amd
Forked from danila-permogorskii/mfma
AMD specific CDNA architecture Matrix Fused Multiply-Add (MFMA) basics
An educational walkthrough for the 9/6 Hackathon
High-performance FlashAttention-2 for AMD, Intel, and Apple GPUs. Drop-in replacement for PyTorch SDPA. Triton backend for ROCm (MI300X, RDNA3), Vulkan backend for consumer GPUs. No CUDA required.
incubator repo for CUDA-TileIR backend
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
Anthropic's original performance take-home, now open for you to try!
The fastest macOS package manager. Written in Zig. 3ms warm installs.
Accelerating MoE with IO and Tile-aware Optimizations
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
List of Computer Science courses with video lectures.
This project implements an AI agent that verifies if automated Hercules test runs were executed as intended by comparing planning logs, video evidence, and final outputs. It uses open-source LLMs a…
FastAPI-compatible Python framework with Zig HTTP core; 7x faster, free-threading native
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
Tkaixiang / marktext
Forked from jacobwhall/marktext
Another attempt at modernising Marktext - but built "from the ground up" using electron-vite. Also translates the app into 9 different languages.
Build Real-Time Knowledge Graphs for AI Agents
Verifiers for LLM Reinforcement Learning
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
[NeurIPS 2025] Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
Explore training for quantized models
llama.cpp fork with additional SOTA quants and improved performance