A Doctoral Thesis by Jesse Wood, Victoria University of Wellington
A configuration-driven framework for the analysis of spectral data using Deep Learning, Classic Machine Learning, and Evolutionary Algorithms.
Important
Code vs. Data: The source code in this repository is open-source under the MIT license. However, the accompanying REIMS dataset is private research data. You must use the fishy download-data command with an authorized token to fetch the data locally.
- Consolidated CLI: Run training, benchmarks, and the dashboard via a single fishy command.
- Configuration-Driven Architecture: Add new datasets, models, or tasks by simply editing YAML files in fishy/configs/.
- Unified Training Engine: Centralized Trainer class handles loops, metrics, and early stopping consistently across all experiments.
- Advanced XAI Pipeline: Automated biomarker discovery with direct mapping to chemical databases (LipidMaps).
- Pro-Tier Architectures: High-capacity Sparsely-Gated Mixture of Experts (gmoe) for complex spectral profiles.
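The sparse gating idea behind a Sparsely-Gated Mixture of Experts can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the gmoe implementation shipped with fishy: the experts are plain linear maps, k = 2, the toy sizes are arbitrary, and the noisy gating and load-balancing loss of the original sparsely-gated formulation are omitted. A gating network scores every expert for each spectrum, only the k highest-scoring experts are kept, and their outputs are mixed with softmax weights renormalized over those k experts.

```python
import numpy as np


def top_k_gating(logits: np.ndarray, k: int) -> np.ndarray:
    """Sparse gate weights: softmax over the top-k logits per row, zeros elsewhere."""
    # Indices of the k largest logits in each row (order within the top k is irrelevant).
    top_idx = np.argpartition(logits, -k, axis=1)[:, -k:]
    top_logits = np.take_along_axis(logits, top_idx, axis=1)
    # Numerically stable softmax restricted to the selected experts.
    top_logits = top_logits - top_logits.max(axis=1, keepdims=True)
    weights = np.exp(top_logits)
    weights /= weights.sum(axis=1, keepdims=True)
    gates = np.zeros_like(logits)
    np.put_along_axis(gates, top_idx, weights, axis=1)
    return gates


def moe_forward(x, w_gate, expert_weights, k=2):
    """Mix hypothetical linear experts using sparse top-k gates."""
    gates = top_k_gating(x @ w_gate, k)  # shape (batch, n_experts)
    # For simplicity every expert is evaluated; a real layer dispatches each
    # sample only to its selected experts. Shape (batch, n_experts, out_dim).
    expert_out = np.stack([x @ w for w in expert_weights], axis=1)
    # Weighted sum over experts; unselected experts have gate weight 0.
    return np.einsum("be,beo->bo", gates, expert_out), gates


rng = np.random.default_rng(0)
n_features, n_experts, n_classes = 16, 4, 3
x = rng.normal(size=(5, n_features))  # 5 toy "spectra"
w_gate = rng.normal(size=(n_features, n_experts))
experts = [rng.normal(size=(n_features, n_classes)) for _ in range(n_experts)]
out, gates = moe_forward(x, w_gate, experts, k=2)
print(out.shape)                # (5, 3)
print((gates > 0).sum(axis=1))  # [2 2 2 2 2]
```

Each row of gates has exactly k non-zero weights summing to 1, so only k experts contribute to each prediction; that sparsity is what allows capacity to grow with the number of experts without a proportional increase in per-sample compute.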
- Clone the repository:
  git clone https://github.com/woodRock/fishy-business.git
  cd fishy-business
- Install dependencies and CLI:
  pip install -e .
The framework provides a unified CLI via the fishy command.
New users should start with the Interactive Wizard, which guides you through model selection, dataset choice, and analysis setup:
fishy wizard
For visual data exploration, real-time training monitoring, and advanced biomarker analysis, use the built-in dashboard:
fishy dashboard
All model types (Deep, Classic, Evolutionary, Contrastive) are trained using the same unified command:
# Train a Gated MoE model with XAI biomarker discovery
fishy train -m gmoe -d species --xai
# Train a Deep Learning model with performance benchmarking
fishy train -m transformer -d species --benchmark --figures
# Train a Classic ML model
fishy train -m rf -d oil
# Train an Evolutionary Algorithm (Feature Weighting)
fishy train -m ga -d part
For large-scale or reproducible experiments, use YAML configuration files:
fishy train -c fishy/configs/experiments/quick_benchmark.yaml
Expert flags are hidden by default to keep the interface clean. View them using:
# See context-aware help for transfer learning
fishy train --transfer --help
# See ALL expert overrides (Hyperparameters, XAI, etc.)
fishy train --all --help
Run the full doctoral benchmarking suite with statistical analysis (paired t-tests):
fishy run_all --num-runs 30
The library is designed to be extended without modifying core logic:
- New Dataset: Add an entry to fishy/configs/datasets.yaml with filtering rules and label encoding type.
- New Model: Add the class path and default hyperparameters to fishy/configs/models.yaml.
- New Pre-training Task: Define the method and hyperparameters in fishy/configs/pre_training.yaml.
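A common way to support the "class path plus default hyperparameters" pattern is to resolve the dotted path with importlib at run time and merge caller overrides over the configured defaults. The sketch below is hypothetical: the entry keys class_path and params are assumptions rather than the actual schema of fishy/configs/models.yaml, the dictionary stands in for the result of parsing the YAML file, and standard-library classes stand in for real models so the example runs without fishy installed.

```python
import importlib
from typing import Any

# Hypothetical parsed contents of a models.yaml file (e.g. from yaml.safe_load).
# The "class_path" and "params" keys are illustrative assumptions, not fishy's schema.
MODELS_CONFIG: dict[str, dict[str, Any]] = {
    "toy_counter": {
        "class_path": "collections.Counter",  # stand-in for a real model class path
        "params": {},
    },
    "toy_deque": {
        "class_path": "collections.deque",
        "params": {"maxlen": 3},  # stand-in for default hyperparameters
    },
}


def resolve_class(class_path: str) -> type:
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_model(name: str, config: dict[str, dict[str, Any]], **overrides: Any) -> Any:
    """Instantiate a registered model; caller overrides win over config defaults."""
    entry = config[name]
    cls = resolve_class(entry["class_path"])
    params = {**entry.get("params", {}), **overrides}
    return cls(**params)


model = build_model("toy_deque", MODELS_CONFIG)
print(model.maxlen)  # 3
model = build_model("toy_deque", MODELS_CONFIG, maxlen=10)
print(model.maxlen)  # 10
```

Because the model is looked up by name and constructed from data, registering a new model under this pattern touches only the configuration, never the training loop.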
For advanced usage in Python scripts, you can explore our tutorials in two ways:
We provide Jupyter notebooks in the notebooks/ directory matching the thesis chapters:
- 01_Datasets and Preprocessing
- 02_Species and Part Identification
- 03_Oil and Cross-species Adulteration
- 04_Contrastive Learning for Batch Detection
These are also rendered beautifully in our online documentation.
You can run the entire framework, including the dashboard, in a containerized environment:
- Build the image:
  docker build -t fishy-business .
- Run the CLI:
  docker run fishy-business fishy train -m transformer -d species
We maintain high code quality through automated testing:
# Run unit tests with coverage
pytest tests/
# Run documentation tests
pytest --doctest-modules fishy/
Comprehensive documentation is available at Read the Docs.
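The documentation tests in the command above rely on pytest's --doctest-modules option, which imports the modules under the given path and runs the >>> examples in their docstrings, comparing printed results with the expected text. The helper below is purely hypothetical (normalise_spectrum is not part of fishy); it only illustrates the shape a docstring test takes when placed in a module under fishy/.

```python
def normalise_spectrum(intensities: list[float]) -> list[float]:
    """Scale intensities so the most intense peak is 1.0 (base-peak normalisation).

    >>> normalise_spectrum([2.0, 4.0, 1.0])
    [0.5, 1.0, 0.25]
    >>> normalise_spectrum([0.0, 0.0])
    [0.0, 0.0]
    """
    peak = max(intensities)
    # Avoid division by zero for an all-zero spectrum.
    if peak == 0:
        return [0.0 for _ in intensities]
    return [value / peak for value in intensities]


if __name__ == "__main__":
    import doctest

    # Runs the docstring examples above when the file is executed directly.
    doctest.testmod(verbose=False)
```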
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this framework in your research, please cite the following paper:
@article{wood2025hook,
title={Hook, line, and spectra: machine learning for fish species identification and body part classification using rapid evaporative ionization mass spectrometry},
author={Wood, Jesse and Nguyen, Bach and Xue, Bing and Zhang, Mengjie and Killeen, Daniel},
journal={Intelligent Marine Technology and Systems},
volume={3},
number={1},
pages={16},
year={2025},
publisher={Springer}
}