caul

Automatic speech recognition in Python

"Here's to Harry ... the best, bar none."

Audio file transcription using NVIDIA's Parakeet family of multilingual models, with fallback to Whisper.cpp for languages outside Parakeet's scope. Built with uv for package and project management. Installation is as simple as:

uv python install 3.13
uv sync --dev
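The Parakeet-first, Whisper.cpp-fallback design described above amounts to a per-language routing decision. The sketch below illustrates that idea only. PARAKEET_LANGUAGES and choose_backend are hypothetical names, the language subset is an illustrative assumption (the real coverage depends on the Parakeet checkpoint caul loads), and caul's actual selection logic may differ.

```python
# Hypothetical, illustrative subset of languages assumed to be covered by Parakeet.
# The real set depends on the Parakeet checkpoint in use.
PARAKEET_LANGUAGES = frozenset({"en", "de", "fr", "es", "it"})


def choose_backend(language: str) -> str:
    """Hypothetical router: "parakeet" if the language is assumed covered, else "whisper.cpp"."""
    # Normalise tags such as " en-GB " to the base code "en".
    code = language.strip().lower().split("-")[0]
    return "parakeet" if code in PARAKEET_LANGUAGES else "whisper.cpp"


print(choose_backend("en-GB"))  # parakeet
print(choose_backend("ja"))     # whisper.cpp
```

Routing on the base language code keeps regional variants on the same backend; a finer-grained table would be needed if a model handled only some variants.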

A handler object can be instantiated and run on one or more audio file paths, or directly on NumPy/Torch tensors, returning a list of results for each input (printed as ASRInferenceHandlerResult below). Each result's transcription field holds a list of tuples of the form (start_time, end_time, text_segment), and its score field is a confidence measure between -250 and 0:
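To make the result shape concrete, here is a minimal sketch that mirrors only the two fields described above. Result is a hypothetical stand-in, not caul's class, and full_text, speech_span and score_in_documented_range are hypothetical helpers. The range check simply encodes the documented bounds; it makes no assumption about whether higher or lower scores mean more confidence.

```python
from dataclasses import dataclass, field

# (start_time, end_time, text_segment), times in seconds
Segment = tuple[float, float, str]


@dataclass
class Result:
    """Hypothetical stand-in mirroring the transcription/score fields of a caul result."""
    transcription: list[Segment] = field(default_factory=list)
    score: float = 0.0


def full_text(result: Result) -> str:
    """Join the text of every segment with single spaces."""
    return " ".join(text.strip() for _, _, text in result.transcription)


def speech_span(result: Result) -> float:
    """Seconds from the first segment's start to the last segment's end."""
    if not result.transcription:
        return 0.0
    return result.transcription[-1][1] - result.transcription[0][0]


def score_in_documented_range(result: Result) -> bool:
    """True if the score lies within the documented bounds of -250 and 0."""
    return -250.0 <= result.score <= 0.0


r = Result([(0.0, 1.5, "We're spending too much time here."),
            (1.5, 2.9, "Stay a little longer.")], score=-250.0)
print(full_text(r))    # We're spending too much time here. Stay a little longer.
print(speech_span(r))  # 2.9
```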

>>> from caul.handler import ASRHandler
>>> handler = ASRHandler(models="parakeet")
>>> handler.startup()
>>> results = handler.transcribe("<...path to some audio file...>")
>>> print(results)
[ASRInferenceHandlerResult(transcription=[(0.0, 1.5, "We're spending too much time here.")], score=-250.0),
ASRInferenceHandlerResult(transcription=[(1.5, 2.9, "Stay a little longer.")], score=-250.0),
ASRInferenceHandlerResult(transcription=[(2.9, 4.0, "He'd kill us if he got the chance.")], score=-250.0)]
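Because each segment carries start and end times in seconds, the output above maps naturally onto subtitle formats. The following is a sketch, not a caul feature: srt_timestamp and to_srt are hypothetical helpers that take plain (start_time, end_time, text_segment) tuples, so segments from several results would first be flattened, for example with [seg for r in results for seg in r.transcription].

```python
from collections.abc import Iterable


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)  # round to the nearest millisecond
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[tuple[float, float, str]]) -> str:
    """Render (start_time, end_time, text_segment) tuples as numbered SRT cues."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


print(to_srt([(0.0, 1.5, "We're spending too much time here."),
              (1.5, 2.9, "Stay a little longer.")]))
```

Rounding to whole milliseconds absorbs floating-point noise such as 2.9 * 1000 not being exactly 2900.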
