Shigangli


FlashSparse significantly reduces the computation redundancy for unstructured sparsity (for SpMM and SDDMM) on Tensor Cores through a Swap-and-Transpose mapping strategy. FlashSparse is accepted by…

Cuda 38 7 Updated Oct 5, 2025

DLRover: An Automatic Distributed Deep Learning System

Python 1,641 213 Updated Mar 27, 2026

Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) for deep learning on Tensor Cores.

C++ 92 16 Updated Nov 23, 2022

BackPACK - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.

Python 610 57 Updated Nov 28, 2025

C++/MPI proxies for distributed training of deep neural networks.

C++ 15 3 Updated Jun 18, 2022

Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k communication volume, which is asymptotically optimal) with th…

Python 27 10 Updated Dec 10, 2022

Training and serving large-scale neural networks with auto parallelization.

Python 3,187 362 Updated Dec 9, 2023

[ICML'21 Oral] I-BERT: Integer-only BERT Quantization

Python 267 42 Updated Jan 29, 2023

A formatter for Python files

Python 13,987 902 Updated Mar 6, 2026

Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.

Python 70 9 Updated Mar 20, 2025

A cross-platform Sparse Matrix-Vector Multiplication (SpMV) framework for many-core architectures (GPUs and Xeon Phi).

C++ 10 2 Updated Jul 2, 2021

Cache-oblivious MPI all-to-all communications based on Morton order

C 3 1 Updated Jun 7, 2022

A Data-Centric Compiler for Machine Learning

Python 85 14 Updated Dec 14, 2025

WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. The synchronization is relaxed by making the collectives externally-triggerable, namely, a collective can…

Python 6 Updated Jun 30, 2021

Jupyter Notebook 7 1 Updated Jul 29, 2025

Research and development for optimizing transformers

Python 131 16 Updated Feb 16, 2021

Deep Learning for Post-Processing Ensemble Weather Forecasts

Jupyter Notebook 92 17 Updated Mar 24, 2023

Eager-SGD is a decentralized asynchronous SGD. It uses novel partial collective operations to accumulate the gradients across all the processes.

Python 8 Updated Nov 18, 2021

DaCe - Data Centric Parallel Programming

Python 581 155 Updated Mar 30, 2026

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Python 17,121 3,733 Updated Jun 2, 2023

A Deep Learning Meta-Framework and HPC Benchmarking Library

Python 81 26 Updated May 23, 2022

NetworkX clone in JavaScript

JavaScript 13 1 Updated Oct 17, 2016

CSR-based SpGEMM on NVIDIA and AMD GPUs

C++ 48 8 Updated Apr 9, 2016