Qwen3-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Jupyter Notebook 18,818 1,706 Updated Jan 30, 2026

An Easy-to-Use and High-Performance AI Deployment Framework

C++ 1,772 212 Updated Mar 28, 2026

MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.

C++ 14,699 2,265 Updated Mar 30, 2026

On-device AI across mobile, embedded and edge for PyTorch

Python 4,443 905 Updated Mar 30, 2026

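AIInfra (AI infrastructure) refers to the AI system stack, from low-level hardware such as chips up to the upper software stack, that supports training and inference of large AI models.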

Jupyter Notebook 6,564 864 Updated Dec 22, 2025

YOLO-World-ONNX is a Python package for running inference on YOLO-WORLD Open-vocabulary-object detection model using ONNX models. It provides an easy-to-use interface for performing inference on im…

Python 16 2 Updated Feb 6, 2026

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 6,281 590 Updated Feb 26, 2025

An example, built on SSD, that generates the txt files used by mAP-plotting code. (The goal is to generate the txt files.)

Python 128 40 Updated Jan 31, 2021

Utilities intended for use with Llama models.

Python 7,540 1,352 Updated Feb 11, 2026

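AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks.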

Jupyter Notebook 16,539 2,345 Updated Sep 3, 2025

SGLang is a high-performance serving framework for large language models and multimodal models.

Python 25,230 5,061 Updated Mar 30, 2026

Large Language Model Text Generation Inference

Python 10,818 1,262 Updated Mar 21, 2026

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 74,734 14,966 Updated Mar 30, 2026

Inference code for Llama models

Python 59,277 9,827 Updated Jan 26, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,738 675 Updated Mar 30, 2026

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 6,811 752 Updated Mar 30, 2026

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 7,207 396 Updated Jul 11, 2024

Paraformer (Chinese ASR) online ONNX Runtime for Python

Python 54 6 Updated Mar 27, 2024

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 15,462 1,623 Updated Mar 17, 2026

Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.

Python 74 5 Updated Mar 10, 2026

Doing simple retrieval from LLM models at various context lengths to measure accuracy

Jupyter Notebook 2,229 237 Updated Aug 17, 2024

Coral issue tracker (and legacy Edge TPU API source)

C++ 475 128 Updated Oct 27, 2021

LLMem: GPU Memory Estimation for Fine-Tuning Pre-Trained LLMs

Python 29 3 Updated May 31, 2025

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 13,376 895 Updated Dec 17, 2024

A Next-Generation Training Engine Built for Ultra-Large MoE Models

Python 5,109 414 Updated Mar 30, 2026

Go ahead and axolotl questions

Python 11,544 1,284 Updated Mar 30, 2026

Instruction Tuning with GPT-4

HTML 4,335 311 Updated Jun 11, 2023

QLoRA: Efficient Finetuning of Quantized LLMs

Jupyter Notebook 10,862 871 Updated Jun 10, 2024