- Seattle, WA, USA
- http://soldaini.net/
- https://orcid.org/0000-0001-6998-9863
- @soldni
- @soldaini.net
Stars
An autonomous novel-writing pipeline, by Hermes Agent
poormanray is a collection of simple tools to manage cloud instances (EC2) and distribute jobs on them
Tools to build fast quality classifiers for Olmo data filtering
Tooling for exact and MinHash deduplication of large-scale text datasets
Our library for RL environments + evals
📚 Freely available programming books
[ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
FULL Augment Code, Claude Code, Cluely, CodeBuddy, Comet, Cursor, Devin AI, Junie, Kiro, Leap.new, Lovable, Manus, NotionAI, Orchids.app, Perplexity, Poke, Qoder, Replit, Same.dev, Trae, Traycer AI…
Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.
Open-source infrastructure for Computer-Use Agents. Sandboxes, SDKs, and benchmarks to train and evaluate AI agents that can control full desktops (macOS, Linux, Windows).
PyTorch building blocks for the OLMo ecosystem
OLMost every training recipe you need to perform data interventions with the OLMo family of models.
Curated list of datasets and tools for post-training.
Versatile typeface for code, from code.
👻 Ghostty is a fast, feature-rich, and cross-platform terminal emulator that uses platform-native UI and GPU acceleration.
😸 Soothing pastel theme for the high-spirited!
A curated list of resources and examples of ASCII Art
A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.
Toolkit for linearizing PDFs for LLM datasets/training
LLM.swift is a simple and readable library that allows you to interact with large language models locally with ease for macOS, iOS, watchOS, tvOS, and visionOS.
Large Language Model (LLM) module for the Spezi Ecosystem