This project is an AI-powered voice recorder that transcribes speech in real time using client-side and cloud-side AI.
- Real-time voice activity detection (VAD) using ONNX Runtime Web
- Speech transcription using Whisper, either locally via ONNX Runtime Web or through the Lepton serverless API
- Responsive UI with recording and processing indicators
- Node.js
- npm
- Clone the repository:
  git clone https://github.com/vthinkxie/ai-recorder.git
  cd ai-recorder
- Install dependencies:
  npm install
- Set up the workspace token:
  Go to the Lepton dashboard to get your workspace token. Create a .env file in the root directory of the project and add the following:
  LEPTON_TOKEN=your_workspace_token
  Note: pricing for the Whisper service provided by Lepton AI can be found on the Lepton AI site.
- Start the development server:
  npm start
  This starts the development server. The application is available at http://localhost:3000, and the local Whisper version at http://localhost:3000/local.
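If the server fails to reach Lepton, a missing or misspelled token is a likely cause. The sketch below shows one way to read LEPTON_TOKEN from the contents of a .env file and fail early with a readable message. It is an illustration only: `parseEnv` and `requireLeptonToken` are hypothetical helpers, not functions from this repository, and the project may load its environment differently.

```typescript
// Minimal .env parser: a sketch, not the project's actual loader.
// Handles KEY=value lines, blank lines, "#" comments, and one pair of quotes.
function parseEnv(content: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comments.
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no "=" sign, or an empty key
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const quoted =
      value.length >= 2 &&
      ((value.startsWith('"') && value.endsWith('"')) ||
        (value.startsWith("'") && value.endsWith("'")));
    if (quoted) value = value.slice(1, -1);
    vars[key] = value;
  }
  return vars;
}

// Hypothetical helper: throw a clear error if the token is missing or empty.
function requireLeptonToken(vars: Record<string, string>): string {
  const token = vars["LEPTON_TOKEN"];
  if (!token) {
    throw new Error(
      "LEPTON_TOKEN is not set; add it to the .env file in the project root"
    );
  }
  return token;
}
```

For example, `requireLeptonToken(parseEnv("LEPTON_TOKEN=your_workspace_token"))` returns the token string, while an empty file makes it throw.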
The application uses voice activity detection (VAD) via ONNX Runtime Web to determine when the user is speaking. While speech is detected, the UI shows the red "Recording" text and icon.
For more detail, see https://github.com/snakers4/silero-vad and https://github.com/DictationDaddy/VAD_WEB_DEMO.
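To illustrate how a VAD result can drive the "Recording" indicator, here is a minimal sketch. It assumes the VAD model yields one speech probability per audio frame, and it uses two thresholds plus a short hangover so the indicator does not flicker during brief pauses. The function name, option names, and threshold values are hypothetical and are not taken from this repository or from silero-vad.

```typescript
type VadState = "idle" | "recording";

interface VadOptions {
  startThreshold: number; // probability at or above which speech starts
  endThreshold: number;   // probability below which a frame counts as silence
  hangoverFrames: number; // consecutive silent frames required before stopping
}

// Returns the indicator state after each frame of speech probabilities.
function trackSpeech(probs: number[], opts: VadOptions): VadState[] {
  const states: VadState[] = [];
  let state: VadState = "idle";
  let silentRun = 0;
  for (const p of probs) {
    if (state === "idle") {
      // Start recording as soon as one frame is confidently speech.
      if (p >= opts.startThreshold) {
        state = "recording";
        silentRun = 0;
      }
    } else if (p < opts.endThreshold) {
      // Count consecutive silent frames; stop only after a full hangover.
      silentRun += 1;
      if (silentRun >= opts.hangoverFrames) {
        state = "idle";
        silentRun = 0;
      }
    } else {
      // Speech resumed, so reset the silence counter.
      silentRun = 0;
    }
    states.push(state);
  }
  return states;
}
```

With `{ startThreshold: 0.5, endThreshold: 0.35, hangoverFrames: 3 }`, a two-frame pause in the middle of speech keeps the state at "recording", and only three silent frames in a row return it to "idle".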
The application integrates with openai/whisper-tiny.en for speech transcription via ONNX Runtime Web. As the user speaks, the transcribed text appears in the designated area.
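Whisper models expect 16 kHz mono audio, while browsers usually capture at 44.1 or 48 kHz, so captured samples need to be resampled at some point before transcription. The sketch below shows one simple way to do that with linear interpolation. It is an assumption about a typical pipeline, not a description of how this project does it: `resampleLinear` is a hypothetical helper, and a production resampler would normally low-pass filter first to avoid aliasing.

```typescript
const WHISPER_SAMPLE_RATE = 16000; // sample rate Whisper models expect, in Hz

// Resample mono PCM samples with linear interpolation (no anti-alias filter).
function resampleLinear(
  input: Float32Array,
  inputRate: number,
  outputRate: number = WHISPER_SAMPLE_RATE
): Float32Array {
  if (inputRate <= 0 || outputRate <= 0) {
    throw new RangeError("sample rates must be positive");
  }
  // Nothing to do: return a copy so callers can mutate the result safely.
  if (input.length === 0 || inputRate === outputRate) return input.slice();

  const outLength = Math.max(
    1,
    Math.round((input.length * outputRate) / inputRate)
  );
  const output = new Float32Array(outLength);
  const step = inputRate / outputRate; // input samples per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    // Clamp both neighbours so the last output samples stay in range.
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

One second of 48 kHz audio (48000 samples) comes out as 16000 samples, which can then be passed to whichever transcription backend is in use.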
This project is licensed under the MIT License.