ICIJ/caul

caul

Automatic speech recognition in Python

"Here's to Harry ... the best, bar none."

Audio file transcription using NVIDIA's Parakeet family of multilingual models, with fallback to Whisper.cpp for languages outside Parakeet's scope. Built with uv for package and project management. Installation is as simple as:

uv python install 3.13
uv sync --dev

A handler object can be instantiated and run on one or more audio file paths, or directly on NumPy/Torch tensors, returning a list of ASRHandlerResult objects for each input. transcriptions contains a list of tuples of the form (start_time, end_time, text_segment), and scores is a measure of confidence in a transcription, ranging from 0 to -250:

>>> from caul.handler import ASRHandler
>>> handler = ASRHandler(models="parakeet")
>>> handler.startup()
>>> results = handler.transcribe("<...path to some audio file...>")
>>> print(results)
[ASRInferenceHandlerResult(transcription=[(0.0, 1.5, "We're spending too much time here.")], score=-250.0),
ASRInferenceHandlerResult(transcription=[(1.5, 2.9, "Stay a little longer.")], score=-250.0),
ASRInferenceHandlerResult(transcription=[(2.9, 4.0, "He'd kill us if he got the chance.")], score=-250.0)]
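
As a rough sketch of how results shaped like the ones above might be consumed, the snippet below flattens the (start_time, end_time, text_segment) tuples into a timestamped plain-text transcript. The Result dataclass is a hypothetical stand-in, not caul's own class: it only mirrors the transcription and score fields seen in the printed output, and format_timestamp and to_transcript are illustrative helper names.

```python
from dataclasses import dataclass


# Hypothetical stand-in for caul's result type (NOT the library's class);
# it only mirrors the `transcription` and `score` fields printed above.
@dataclass
class Result:
    transcription: list[tuple[float, float, str]]
    score: float


def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as MM:SS.s (tenths of a second)."""
    total_tenths = round(seconds * 10)
    minutes, tenths = divmod(total_tenths, 600)
    return f"{minutes:02d}:{tenths / 10:04.1f}"


def to_transcript(results: list[Result]) -> str:
    """Join every timed segment into one line per segment."""
    lines = []
    for result in results:
        for start, end, text in result.transcription:
            lines.append(
                f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}"
            )
    return "\n".join(lines)


# Sample data copied from the example output above.
results = [
    Result([(0.0, 1.5, "We're spending too much time here.")], -250.0),
    Result([(1.5, 2.9, "Stay a little longer.")], -250.0),
    Result([(2.9, 4.0, "He'd kill us if he got the chance.")], -250.0),
]

print(to_transcript(results))
```

With real output from handler.transcribe, the same loop should apply as long as each result exposes a transcription attribute holding tuples of this form.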
