Stars
Voxtral Codec: Combining Semantic VQ and Acoustic FSQ for Ultra-Low Bitrate Speech Generation (Voxtral TTS Backbone)
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
Pure C inference of the Mistral Voxtral Realtime 4B speech-to-text model
Unofficial implementation of the mimo-tokenizer training pipeline from "MiMo-Audio: Audio Language Models are Few-Shot Learners"
DFlash: Block Diffusion for Flash Speculative Decoding
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Write scalable load tests in plain Python 🚗💨
A powerful 3B-parameter, LLM-based reinforcement-learning audio-editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…
Training, inference, and testing of the SAC speech codec model.
VibeVoice: Expressive, long-form conversational speech synthesis. (Community fork)
LongCat Audio Tokenizer and Detokenizer
MOSS-Speech is a true speech-to-speech large language model without text guidance.
MiMo-Audio: Audio Language Models are Few-Shot Learners
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"
[ICLR2026] AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (refreshed every 12 hours)
VoiceStar: Robust, Duration-controllable TTS that can Extrapolate