- Beijing University of Posts and Telecommunications
- Beijing, China
- https://shigangli.github.io/
- @shigang_li
Stars
FlashSparse significantly reduces computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swap-and-Transpose mapping strategy. FlashSparse is accepted by…
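For reference, a minimal SciPy/NumPy sketch of the two kernels involved (SpMM: sparse × dense; SDDMM: a dense × dense product sampled at the nonzeros of a sparse matrix). This only defines the operations being accelerated, not the Swap-and-Transpose mapping itself:

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
S = sp.random(64, 64, density=0.1, format="csr", random_state=0)  # unstructured sparse
X = rng.standard_normal((64, 32))
Y = rng.standard_normal((64, 32))

spmm = S @ X                 # SpMM: sparse @ dense -> dense, shape (64, 32)
sddmm = S.multiply(X @ Y.T)  # SDDMM: dense @ dense, kept only at S's nonzeros
```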
DLRover: An Automatic Distributed Deep Learning System
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.
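A minimal usage sketch of BackPACK's documented API, using the per-sample-gradient extension BatchGrad as one example of a quantity other than the gradient:

```python
import torch
from backpack import backpack, extend
from backpack.extensions import BatchGrad

model = extend(torch.nn.Linear(10, 1))   # wrap modules so BackPACK can hook them
lossfunc = extend(torch.nn.MSELoss())

X, y = torch.randn(8, 10), torch.randn(8, 1)
loss = lossfunc(model(X), y)
with backpack(BatchGrad()):              # request per-sample gradients
    loss.backward()

for p in model.parameters():
    print(p.grad_batch.shape)            # one gradient per sample: (8, ...)
```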
C++/MPI proxies for distributed training of deep neural networks.
Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k communication volume, which is asymptotically optimal) with th…
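To illustrate the underlying idea, here is a naive top-k sparsification with an allgather exchange in mpi4py. This is a hypothetical sketch of the general technique, not Ok-Topk's actual allreduce, whose whole point is to avoid this version's communication cost:

```python
import numpy as np
from mpi4py import MPI

def naive_topk_allreduce(grad: np.ndarray, k: int, comm=MPI.COMM_WORLD) -> np.ndarray:
    # Keep only the k largest-magnitude entries of the local gradient.
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    # Exchange everyone's sparse (index, value) pairs; O(k * P) volume per rank,
    # which is exactly what Ok-Topk's allreduce improves on.
    all_idx = np.concatenate(comm.allgather(idx))
    all_vals = np.concatenate(comm.allgather(grad[idx]))
    out = np.zeros_like(grad)
    np.add.at(out, all_idx, all_vals)   # sum duplicate indices across ranks
    return out / comm.Get_size()        # averaged sparse-gradient update
```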
Training and serving large-scale neural networks with auto parallelization.
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
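As background, a generic sketch of the uniform symmetric quantization that integer-only inference builds on (not I-BERT's actual integer-only polynomial kernels for GELU, Softmax, and LayerNorm):

```python
import numpy as np

def quantize_symmetric(x: np.ndarray, n_bits: int = 8):
    """Approximate x as scale * q, with q an integer tensor."""
    qmax = 2 ** (n_bits - 1) - 1                 # 127 for int8
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_symmetric(x)
print(np.abs(x - scale * q).max())               # quantization error
```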
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
A cross-platform Sparse Matrix Vector Multiplication (SpMV) framework for many-core architectures (GPUs and Xeon Phi).
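The textbook CSR SpMV such kernels optimize, written out as a plain-Python reference (the framework itself targets GPU and Xeon Phi implementations):

```python
import numpy as np

def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A @ x, with A stored in CSR as (row_ptr, col_idx, vals)."""
    y = np.zeros(len(row_ptr) - 1, dtype=vals.dtype)
    for i in range(len(y)):
        lo, hi = row_ptr[i], row_ptr[i + 1]      # nonzeros of row i
        y[i] = vals[lo:hi] @ x[col_idx[lo:hi]]
    return y
```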
Cache-oblivious MPI all-to-all communications based on Morton order
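A sketch of the 2-D Morton (Z-order) encoding the traversal is based on; scheduling the MPI exchanges in this order is what yields the cache-oblivious behavior, and is not shown here:

```python
def morton_encode(x: int, y: int) -> int:
    """Interleave the bits of (x, y) into a single Morton code."""
    code = 0
    for i in range(32):
        code |= ((x >> i) & 1) << (2 * i)        # x bits -> even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits -> odd positions
    return code

assert morton_encode(3, 5) == 0b100111           # x=0b011, y=0b101
```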
WAGMA-SGD is a decentralized asynchronous SGD algorithm based on wait-avoiding group model averaging. Synchronization is relaxed by making the collectives externally triggerable; namely, a collective can…
Research and development for optimizing transformers
Deep Learning for Post-Processing Ensemble Weather Forecasts
Eager-SGD is a decentralized asynchronous SGD algorithm. It utilizes novel partial collective operations to accumulate the gradients across all the processes.
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
A Deep Learning Meta-Framework and HPC Benchmarking Library
CSR-based SpGEMM on NVIDIA and AMD GPUs.
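The operation itself, as a SciPy CPU reference (the repository provides the corresponding GPU kernels):

```python
import scipy.sparse as sp

A = sp.random(1024, 1024, density=0.01, format="csr", random_state=0)
B = sp.random(1024, 1024, density=0.01, format="csr", random_state=1)

C = A @ B      # SpGEMM: sparse @ sparse -> sparse, CSR in and out
print(C.nnz)   # output sparsity pattern is data-dependent, unlike SpMM
```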