Stars
Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
An Easy-to-Use and High-Performance AI Deployment Framework
MNN: A blazing-fast, lightweight inference engine battle-tested by Alibaba, powering high-performance on-device LLMs and Edge AI.
On-device AI across mobile, embedded and edge for PyTorch
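AIInfra (AI infrastructure) refers to the AI system stack, from low-level hardware such as chips up to the software stack that supports training and inference of large AI models.
YOLO-World-ONNX is a Python package for running inference on the YOLO-WORLD open-vocabulary object detection model using ONNX models. It provides an easy-to-use interface for performing inference on im…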
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
An example, built on SSD, of generating the .txt files used by mAP plotting code (the goal is to produce these .txt files).
Utilities intended for use with Llama models.
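AISystem mainly refers to AI systems, covering the full low-level AI technology stack, including AI chips, AI compilers, and AI inference and training frameworks.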
SGLang is a high-performance serving framework for large language models and multimodal models.
Large Language Model Text Generation Inference
A high-throughput and memory-efficient inference and serving engine for LLMs
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
Paraformer (Chinese ASR) online ONNX Runtime inference for Python
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inference Server.
Doing simple retrieval from LLMs at various context lengths to measure accuracy
Coral issue tracker (and legacy Edge TPU API source)
LLMem: GPU Memory Estimation for Fine-Tuning Pre-Trained LLMs
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
A Next-Generation Training Engine Built for Ultra-Large MoE Models
Instruction Tuning with GPT-4
QLoRA: Efficient Finetuning of Quantized LLMs