LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,136 222 Updated May 19, 2025

supertone-inc / super-monotonic-align

Python 167 10 Updated Sep 19, 2024

WangHelin1997 / SSR-Speech

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python 147 17 Updated Jan 1, 2025

VITA-MLLM / VITA

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,500 183 Updated Mar 28, 2025

keonlee9420 / evaluate-zero-shot-tts

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 93 11 Updated Mar 12, 2025

zhenye234 / xcodec

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 296 23 Updated Oct 12, 2025

AudioLLMs / Awesome-Audio-LLM

Audio Large Language Models

Python 898 46 Updated Jul 5, 2025

QwenLM / Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 2,062 164 Updated Apr 21, 2025

showlab / Show-o

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,904 89 Updated Jan 8, 2026

AI-S2-Lab / GPT-Talker

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

44 2 Updated Oct 28, 2024

Aaron (Yinghao) Li yl4579

Stars

SesameAILabs / csm

facebookresearch / audiobox-aesthetics

zhenye234 / LLaSA_training

deepseek-ai / DeepSeek-R1

facebookresearch / large_concept_model

naver-ai / usdm

Hannibal046 / Awesome-LLM

alessandroragano / scoreq

fishaudio / fish-speech

mini-sora / minisora

SWivid / F5-TTS

bytedance / SALMONN

FireRedTeam / FireRedTTS

tencent-ailab / MuCodec

karpathy / LLM101n

haidog-yaqub / EzAudio

SonyCSLParis / music2latent

Aria-K-Alethia / BigCodec

kyutai-labs / moshi

yangdongchao / SimpleSpeech

ictnlp / LLaMA-Omni