Automatic speech recognition in Python
"Here's to Harry ... the best, bar none."
Audio file transcription using NVIDIA's Parakeet family of multilingual
models, with fallback to Whisper.cpp for languages outside Parakeet's scope.
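To make the fallback concrete, here is a minimal sketch of language-based routing. It assumes the backend is chosen from a known language code. The names `choose_backend` and `PARAKEET_LANGUAGES` are hypothetical, and the language set is a placeholder rather than Parakeet's actual coverage. caul's real selection logic may work differently.

```python
# Hypothetical sketch of language-based backend selection; not caul's actual code.
# PARAKEET_LANGUAGES is a placeholder set, NOT Parakeet's real language list.
PARAKEET_LANGUAGES = frozenset({"en", "de", "fr"})


def choose_backend(language: str) -> str:
    """Return the backend that would handle audio in `language` (an ISO 639-1 code)."""
    if language.lower() in PARAKEET_LANGUAGES:
        return "parakeet"
    # Anything outside Parakeet's scope falls back to Whisper.cpp.
    return "whisper.cpp"


for lang in ("en", "ja"):
    print(lang, "->", choose_backend(lang))  # prints "en -> parakeet", then "ja -> whisper.cpp"
```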
Built with uv for package and project management. Installation is as simple as:
uv python install 3.13
uv sync --dev
A handler object can be instantiated and run on one or more audio file paths, or
directly on NumPy arrays or PyTorch tensors, and returns a list of ASRHandlerResult
objects for the given input(s). Each result's transcription field holds a list of
tuples of the form (start_time, end_time, text_segment), and its score field is a
measure of confidence in the transcription, ranging from -250 to 0:
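To illustrate the (start_time, end_time, text_segment) tuple shape, the sketch below renders one result's transcription as timestamped lines. The helpers `format_timestamp` and `format_segments` are hypothetical and not part of caul. They assume only that each segment is such a tuple, with times in seconds (an assumption based on the example values).

```python
# Shape of one result's `transcription` field: (start_time, end_time, text_segment).
# Times are assumed to be in seconds.
Segment = tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as MM:SS.mmm."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:06.3f}"


def format_segments(transcription: list[Segment]) -> list[str]:
    """Render each (start, end, text) tuple as '[start - end] text'."""
    return [
        f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}"
        for start, end, text in transcription
    ]


for line in format_segments([(0.0, 1.5, "We're spending too much time here.")]):
    print(line)  # prints "[00:00.000 - 00:01.500] We're spending too much time here."
```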
>>> from caul.handler import ASRHandler
>>> handler = ASRHandler(models="parakeet")
>>> handler.startup()
>>> results = handler.transcribe("<...path to some audio file...>")
>>> print(results)
[ASRInferenceHandlerResult(transcription=[(0.0, 1.5, "We're spending too much time here.")], score=-250.0),
ASRInferenceHandlerResult(transcription=[(1.5, 2.9, "Stay a little longer.")], score=-250.0),
ASRInferenceHandlerResult(transcription=[(2.9, 4.0, "He'd kill us if he got the chance.")], score=-250.0)]
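The results can be post-processed with plain Python. The following is a minimal sketch that assumes each result exposes the transcription and score fields shown in the output above. `Result` is a hypothetical stand-in for the real result class, and `full_text` and `covered_duration` are illustrative helpers, not caul API.

```python
from dataclasses import dataclass


# Hypothetical stand-in mirroring the fields in the example output;
# the real result class lives in caul and may differ.
@dataclass
class Result:
    transcription: list[tuple[float, float, str]]
    score: float


def full_text(results: list[Result]) -> str:
    """Join every text segment, ordered by start time, into one transcript string."""
    segments = sorted(
        (seg for r in results for seg in r.transcription), key=lambda seg: seg[0]
    )
    return " ".join(text for _, _, text in segments)


def covered_duration(results: list[Result]) -> float:
    """Sum the lengths (end - start) of all segments, in seconds."""
    return sum(end - start for r in results for start, end, _ in r.transcription)


results = [
    Result([(0.0, 1.5, "We're spending too much time here.")], -250.0),
    Result([(1.5, 2.9, "Stay a little longer.")], -250.0),
    Result([(2.9, 4.0, "He'd kill us if he got the chance.")], -250.0),
]
print(full_text(results))
print(round(covered_duration(results), 3))
```

Sorting by start time keeps the transcript in order even if results arrive out of sequence.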