View yl4579's full-sized avatar
  • Columbia University
  • New York, US


A Conversational Speech Generation Model

Python 14,566 1,471 Updated May 27, 2025

Unified automatic quality assessment for speech, music, and sound.

Python 698 50 Updated Jun 5, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 660 52 Updated Jan 21, 2026

Large Concept Models: Language modeling in a sentence representation space

Python 2,344 207 Updated Jan 29, 2025

Official PyTorch implementation of "Paralinguistics-Aware Speech-Empowered LLMs for Natural Conversation" (NeurIPS 2024)

Python 94 4 Updated Dec 3, 2024

Awesome-LLM: a curated list of Large Language Models

26,566 2,408 Updated Jul 31, 2025

SCOREQ: Speech COntrastive REgression for Quality Assessment (NeurIPS 2024)

Python 108 8 Updated Aug 1, 2025

SOTA Open Source TTS

Python 28,944 2,431 Updated Mar 30, 2026

MiniSora: A community aims to explore the implementation path and future development direction of Sora.

Python 1,286 149 Updated Feb 18, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 14,274 2,109 Updated Mar 24, 2026

SALMONN family: A suite of advanced multi-modal LLMs

1,400 112 Updated Feb 3, 2026

An Open-Sourced LLM-empowered Foundation TTS System

Python 907 83 Updated Sep 28, 2025
Python 157 8 Updated Nov 22, 2024

LLM101n: Let's build a Storyteller

36,633 2,002 Updated Aug 1, 2024

High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

Python 330 25 Updated Dec 17, 2025

Encode and decode audio samples to/from compressed latent representations!

Python 251 25 Updated Sep 19, 2025

Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"

Python 213 17 Updated Sep 19, 2024

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python 9,941 922 Updated Mar 4, 2026

The open source code for SimpleSpeech series

Python 145 11 Updated Oct 8, 2024

LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.

Python 3,136 222 Updated May 19, 2025

SSR-Speech: Towards Stable, Safe and Robust Zero-shot Speech Editing and Synthesis

Python 147 17 Updated Jan 1, 2025

✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,500 183 Updated Mar 28, 2025

Evaluation Protocol for Large-Scale Zero-Shot TTS Literature

Python 93 11 Updated Mar 12, 2025

AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Python 296 23 Updated Oct 12, 2025

Audio Large Language Models

Python 898 46 Updated Jul 5, 2025

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Python 2,062 164 Updated Apr 21, 2025

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,904 89 Updated Jan 8, 2026

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

44 2 Updated Oct 28, 2024