AI Recorder

This project is an AI-powered voice recorder that transcribes speech in real time using client-side and cloud-side AI.

Features

  • Real-time voice activity detection (VAD) using ONNX Runtime Web
  • Speech transcription using Whisper, via either ONNX Runtime Web or the Lepton serverless API
  • Responsive UI with recording and processing indicators

Prerequisites

  • Node.js
  • npm

Installation

  1. Clone the repository:

    git clone https://github.com/vthinkxie/ai-recorder.git
    cd ai-recorder
  2. Install dependencies:

    npm install
  3. Set up the workspace token:

    Go to the Lepton dashboard to get your workspace token. Create a .env file in the root directory of the project and add the following:

    LEPTON_TOKEN=your_workspace_token

    Note: pricing for the Whisper API provided by Lepton AI can be found here

  4. Start the development server:

    npm start

    This starts the development server. The application is available at http://localhost:3000, and the local Whisper version at http://localhost:3000/local
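
A missing or placeholder token is an easy mistake in step 3. The sketch below is one way to check the .env contents before starting the server. It is not code from this repository: `parseEnv` and `requireLeptonToken` are hypothetical helper names, and the project may load the file through its own tooling.

```typescript
// Hypothetical helper (not part of this repository): parse the text of a
// .env file into key/value pairs. Handles comments, blank lines, and one
// pair of surrounding quotes; it does not handle multi-line values.
function parseEnv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip lines that are not KEY=value
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    const first = value.charAt(0);
    const last = value.charAt(value.length - 1);
    if (value.length >= 2 && (first === '"' || first === "'") && last === first) {
      value = value.slice(1, -1); // strip matching surrounding quotes
    }
    env[key] = value;
  }
  return env;
}

// Hypothetical helper: return the token, or throw if it is absent or still
// set to the placeholder shown in the installation steps.
function requireLeptonToken(env: Record<string, string | undefined>): string {
  const token = env["LEPTON_TOKEN"];
  if (!token || token === "your_workspace_token") {
    throw new Error("LEPTON_TOKEN is missing or still set to the placeholder value");
  }
  return token;
}
```

In a Node.js script, the file text could be read with `fs.readFileSync(".env", "utf8")` and passed to `parseEnv`.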

References

Voice Activity Detection

The application uses voice activity detection (VAD) via ONNX Runtime Web to determine when the user is speaking. While speech is detected, the UI shows the red "Recording" text and icon.

For more detail, see https://github.com/snakers4/silero-vad and https://github.com/DictationDaddy/VAD_WEB_DEMO
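
To illustrate the general idea of a VAD gate, here is a deliberately simplified energy-threshold detector. It is not the Silero model linked above and not this application's implementation. The names `rms` and `createEnergyVad`, the threshold, and the hangover length are all assumptions chosen for the example.

```typescript
// Root-mean-square energy of one audio frame (samples expected in [-1, 1]).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  const sumOfSquares = frame.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumOfSquares / frame.length);
}

// Hypothetical energy-based detector. Returns a function that takes one
// frame at a time and reports whether speech is currently considered active.
// `hangoverFrames` keeps the state "speaking" through short pauses so that
// brief gaps between words do not end the recording.
function createEnergyVad(
  threshold: number = 0.02,   // assumed RMS level that counts as speech
  hangoverFrames: number = 3  // assumed number of quiet frames tolerated
): (frame: Float32Array) => boolean {
  let speaking = false;
  let silentFrames = 0;
  return (frame: Float32Array): boolean => {
    if (rms(frame) >= threshold) {
      speaking = true;
      silentFrames = 0;
    } else if (speaking) {
      silentFrames += 1;
      if (silentFrames > hangoverFrames) speaking = false;
    }
    return speaking;
  };
}
```

A neural VAD such as Silero replaces the `rms(frame) >= threshold` test with a model's speech probability. The hangover logic around it can stay much the same.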

Whisper tiny

The application uses openai/whisper-tiny.en for speech transcription via ONNX Runtime Web. When the user speaks, the transcribed text appears in the designated area.
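
Whisper models operate on 16 kHz mono audio, while browser microphones commonly capture at 44.1 or 48 kHz, so captured audio generally needs resampling before local inference. The sketch below shows one simple way to do that with linear interpolation. `resampleLinear` is a hypothetical helper, not code from this repository, and it omits the low-pass filter a production resampler would apply to avoid aliasing.

```typescript
// Sample rate Whisper models expect for their input audio.
const WHISPER_SAMPLE_RATE = 16000;

// Hypothetical helper: resample mono PCM samples by linear interpolation.
// Simplification: no anti-aliasing filter is applied when downsampling.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = WHISPER_SAMPLE_RATE
): Float32Array {
  if (fromRate <= 0 || toRate <= 0) {
    throw new RangeError("sample rates must be positive");
  }
  if (fromRate === toRate || input.length === 0) return input.slice();

  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample

  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    // Clamp both neighbours to the last valid index at the end of the buffer.
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

For example, `resampleLinear(samples, 48000)` turns one second of 48 kHz audio (48000 samples) into 16000 samples.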

License

This project is licensed under the MIT License.
