- IBM
- Beijing
Stars
A language for fast, portable data-parallel computation
Eigen is a C++ template library for linear algebra: matrices, vectors, numerical solvers, and related algorithms.
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
Build and run containers leveraging NVIDIA GPUs
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
Ongoing research training transformer models at scale
SGLang is a high-performance serving framework for large language models and multimodal models.
A high-throughput and memory-efficient inference and serving engine for LLMs
Open standard for machine learning interoperability
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Build resilient language agents as graphs.
Build and run agents you can see, understand and trust.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
A GPU cluster manager that configures and orchestrates inference engines like vLLM and SGLang for high-performance AI model deployment.
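AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks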
A QoS-based scheduling system that brings optimal layout and status to workloads such as microservices, web services, big data jobs, and AI jobs.
A toolkit for knowledge distillation of large language models
An Open Source Toolkit For LLM Distillation
A service for managing and provisioning Bare Metal servers. Mirror of code maintained at opendev.org.
MinIO is a high-performance, S3-compatible object store, open-sourced under the GNU AGPLv3 license.
Serve, optimize and scale PyTorch models in production
Algorithms for explaining machine learning models
Development repository for the Triton language and compiler