Voxtral Codec: Combining Semantic VQ and Acoustic FSQ for Ultra-Low Bitrate Speech Generation (Voxtral TTS Backbone)

Python 11 1 Updated Mar 27, 2026

Open Source Speech Language Model

Jupyter Notebook 933 98 Updated Mar 24, 2026

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

TypeScript 341,306 67,365 Updated Mar 30, 2026
Python 2 Updated Oct 12, 2025

A Visual Studio Code extension for ty.

TypeScript 348 14 Updated Mar 30, 2026

🤖 WebMCP

Bikeshed 2,191 133 Updated Mar 27, 2026

Pure C inference of Mistral Voxtral Realtime 4B speech to text model

C 1,573 105 Updated Feb 15, 2026

Unofficial implementation of the mimo-tokenizer training pipeline from "MiMo-Audio: Audio Language Models are Few-Shot Learners"

Python 3 Updated Nov 9, 2025

DFlash: Block Diffusion for Flash Speculative Decoding

Python 672 46 Updated Mar 17, 2026

MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech

Python 258 15 Updated Mar 26, 2026
Python 13 1 Updated Mar 18, 2026

Write scalable load tests in plain Python 🚗💨

Python 27,654 3,195 Updated Mar 29, 2026

A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech

Python 887 62 Updated Mar 16, 2026

Very fast, accurate speaker diarization

Python 245 24 Updated Mar 25, 2026

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.5, DeepSeek-R1, GLM-5, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Llava, Phi4, ...)…

Python 13,422 1,309 Updated Mar 30, 2026

Training, inference, and testing of the SAC speech codec model.

Python 100 6 Updated Nov 1, 2025

VibeVoice: Expressive, longform conversational speech synthesis. (Community fork)

Python 1,017 381 Updated Jan 23, 2026

LongCat Audio Tokenizer and Detokenizer

Python 287 21 Updated Mar 22, 2026
Python 83 8 Updated Feb 24, 2026

MOSS-Speech is a true speech-to-speech large language model without text guidance.

Python 128 7 Updated Feb 13, 2026

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 1,004 102 Updated Mar 3, 2026

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 3,570 240 Updated Jan 8, 2026

Official Repository of Paper: "Emilia-NV: A Non-Verbal Speech Dataset with Word-Level Annotation for Human-Like Speech Modeling"

87 2 Updated Sep 18, 2025

[ICLR2026] AliTok: Towards Sequence Modeling Alignment between Tokenizer and Autoregressive Model

Python 53 2 Updated Oct 12, 2025
Python 537 57 Updated Mar 18, 2026

Automatically update text-to-speech (TTS) papers daily using GitHub Actions (updates every 12 hours)

Python 633 41 Updated Mar 30, 2026

VoiceStar: Robust, Duration-controllable TTS that can Extrapolate

Python 311 28 Updated May 31, 2025
Python 83 5 Updated Jun 25, 2025