Starred repositories
Code for openai.fm, a demo for the OpenAI Speech API
SOTA industrial-grade voice activity detection and audio event detection, supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD and WebRTC-VAD
Generates an image from a DOM node using HTML5 canvas
Offline voice input app for macOS on Apple Silicon — powered by MLX-Audio (Whisper/Qwen3-ASR)
Capture system loopback audio on macOS 12.3+, Windows and Linux
Build ultra fast, tiny, and cross-platform desktop apps with TypeScript.
The swiss army knife of lossless video/audio editing
C inference for Qwen3-ASR 0.6B and 1.7B transcription models
A fast and soft pattern search for trillion-scale corpora.
Offline streaming speech-to-text in the browser
A Streaming-Native Serving Engine for TTS/STS Models
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
Real-time, lightweight software for generating non-linguistic behaviors (turn-taking, backchannel, and head-nodding) in conversational AIs
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.
VoiceBench: Benchmarking LLM-Based Voice Assistants
Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞
🎙️ AI Dictation App - Open Source and Local-first ⚡ Type 3x faster, no keyboard needed. 🆓 Powered by open source models, works offline, fast and accurate.
Chrome extension that analyzes tweets on X timeline based on the X algorithm weights
Massive open Japanese speech corpus
A free, open source, and extensible speech-to-text application that works completely offline.
Browser automation CLI for AI agents
Curated list of design and UI resources from stock photos, web templates, CSS frameworks, UI libraries, tools and much more
Training code for FAcodec presented in NaturalSpeech3
Unsupervised Speech Decomposition Via Triple Information Bottleneck