Skip to content

PINTO0309/HSC

Repository files navigation

HSC

DOI GitHub License Ask DeepWiki

Happy smile classifier. The estimation is done on the entire head, 48x48 pixels, rather than the face.

output_.mp4
Variant Size F1 CPU
inference
latency
ONNX
P 115 KB 0.4841 0.23 ms Download
N 176 KB 0.5849 0.41 ms Download
T 280 KB 0.6701 0.52 ms Download
S 495 KB 0.7394 0.64 ms Download
C 876 KB 0.7344 0.69 ms Download
M 1.7 MB 0.8144 0.85 ms Download
L 6.4 MB 0.8293 1.03 ms Download

Data sample

1 2 3 4
image image image image

Setup

git clone https://github.com/PINTO0309/HSC.git && cd HSC
curl -LsSf https://astral.sh/uv/install.sh | sh
uv sync
source .venv/bin/activate

Inference

uv run python demo_hsc.py \
-hm hsc_l_48x48.onnx \
-v 0 \
-ep cuda \
-dlr -dnm -dgm -dhm -dhd

uv run python demo_hsc.py \
-hm hsc_l_48x48.onnx \
-v 0 \
-ep tensorrt \
-dlr -dnm -dgm -dhm -dhd

Dataset Preparation

uv run python 01_build_smile_parquet.py
class_composition_pie

Training Pipeline

  • Use the images located under dataset/output/002_xxxx_front_yyyyyy together with their annotations in dataset/output/002_xxxx_front.csv.

  • Every augmented image that originates from the same still_image stays in the same split to prevent leakage.

  • The training loop relies on BCEWithLogitsLoss plus class-balanced pos_weight to stabilise optimisation under class imbalance; inference produces sigmoid probabilities. Use --train_resampling weighted to switch on the previous WeightedRandomSampler behaviour, or --train_resampling balanced to physically duplicate minority classes before shuffling.

  • Training history, validation metrics, optional test predictions, checkpoints, configuration JSON, and ONNX exports are produced automatically.

  • Per-epoch checkpoints named like hsc_epoch_0001.pt are retained (latest 10), as well as the best checkpoints named hsc_best_epoch0004_f1_0.9321.pt (also latest 10).

  • The backbone can be switched with --arch_variant. Supported combinations with --head_variant are:

    --arch_variant Default (--head_variant auto) Explicitly selectable heads Remarks
    baseline avg avg, avgmax_mlp When using transformer/mlp_mixer, you need to adjust the height and width of the feature map so that they are divisible by --token_mixer_grid (if left as is, an exception will occur during ONNX conversion or inference).
    inverted_se avgmax_mlp avg, avgmax_mlp When using transformer/mlp_mixer, it is necessary to adjust --token_mixer_grid as above.
    convnext transformer avg, avgmax_mlp, transformer, mlp_mixer For both heads, the grid must be divisible by the feature map (default 3x2 fits with 30x48 input).
  • The classification head is selected with --head_variant (avg, avgmax_mlp, transformer, mlp_mixer, or auto which derives a sensible default from the backbone).

  • Pass --rgb_to_yuv_to_y to convert RGB crops to YUV, keep only the Y (luma) channel inside the network, and train a single-channel stem without modifying the dataloader.

  • Alternatively, use --rgb_to_lab or --rgb_to_luv to convert inputs to CIE Lab/Luv (3-channel) before the stem; these options are mutually exclusive with each other and with --rgb_to_yuv_to_y.

  • Mixed precision can be enabled with --use_amp when CUDA is available.

  • Resume training with --resume path/to/hsc_epoch_XXXX.pt; all optimiser/scheduler/AMP states and history are restored.

  • Loss/accuracy/F1 metrics are logged to TensorBoard under output_dir, and tqdm progress bars expose per-epoch progress for train/val/test loops.

Baseline depthwise-separable CNN:

SIZE=48x48
uv run python -m hsc train \
--data_root data/dataset.parquet \
--output_dir runs/hsc_${SIZE} \
--epochs 100 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant baseline \
--seed 42 \
--device auto \
--use_amp

Inverted residual + SE variant (recommended for higher capacity):

SIZE=48x48
VAR=s
uv run python -m hsc train \
--data_root data/dataset.parquet \
--output_dir runs/hsc_is_${VAR}_${SIZE} \
--epochs 100 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant inverted_se \
--head_variant avgmax_mlp \
--seed 42 \
--device auto \
--use_amp

ConvNeXt-style backbone with transformer head over pooled tokens:

SIZE=48x48
uv run python -m hsc train \
--data_root data/dataset.parquet \
--output_dir runs/hsc_convnext_${SIZE} \
--epochs 100 \
--batch_size 256 \
--train_resampling balanced \
--image_size ${SIZE} \
--base_channels 32 \
--num_blocks 4 \
--arch_variant convnext \
--head_variant transformer \
--token_mixer_grid 3x3 \
--seed 42 \
--device auto \
--use_amp
  • Outputs include the latest 10 hsc_epoch_*.pt, the latest 10 hsc_best_epochXXXX_f1_YYYY.pt (highest validation F1, or training F1 when no validation split), history.json, summary.json, optional test_predictions.csv, and train.log.
  • After every epoch a confusion matrix and ROC curve are saved under runs/hsc/diagnostics/<split>/confusion_<split>_epochXXXX.png and roc_<split>_epochXXXX.png.
  • --image_size accepts either a single integer for square crops (e.g. --image_size 48) or HEIGHTxWIDTH to resize non-square frames (e.g. --image_size 64x48).
  • Add --resume <checkpoint> to continue from an earlier epoch. Remember that --epochs indicates the desired total epoch count (e.g. resuming --epochs 40 after training to epoch 30 will run 10 additional epochs).
  • Launch TensorBoard with:
    tensorboard --logdir runs/hsc

ONNX Export

uv run python -m hsc exportonnx \
--checkpoint runs/hsc_is_s_48x48/hsc_best_epoch0049_f1_0.9939.pt \
--output hsc_s_48x48.onnx \
--opset 17
  • The saved graph exposes images as input and prob_smiling as output (batch dimension is dynamic); probabilities can be consumed directly.
  • After exporting, the tool runs onnxsim for simplification and rewrites any remaining BatchNormalization nodes into affine Mul/Add primitives. If simplification fails, a warning is emitted and the unsimplified model is preserved.

Arch

hsc_p_48x48

Ultra-lightweight classification model series

  1. VSDLM: Visual-only speech detection driven by lip movements - MIT License
  2. OCEC: Open closed eyes classification. Ultra-fast wink and blink estimation model - MIT License
  3. PGC: Ultrafast pointing gesture classification - MIT License
  4. SC: Ultrafast sitting classification - MIT License
  5. PUC: Phone Usage Classifier is a three-class image classification pipeline for understanding how people interact with smartphones - MIT License
  6. HSC: Happy smile classifier - MIT License
  7. WHC: Waving Hand Classification - MIT License
  8. UHD: Ultra-lightweight human detection - MIT License

Citation

If you find this project useful, please consider citing:

@software{hyodo2025hsc,
  author    = {Katsuya Hyodo},
  title     = {PINTO0309/HSC},
  month     = {11},
  year      = {2025},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17670546},
  url       = {https://github.com/PINTO0309/hsc},
  abstract  = {Happy smile classifier.},
}

Acknowledgments

  • https://github.com/microsoft/FERPlus: MIT License
    @inproceedings{BarsoumICMI2016,
        title={Training Deep Networks for Facial Expression Recognition with Crowd-Sourced Label Distribution},
        author={Barsoum, Emad and Zhang, Cha and Canton Ferrer, Cristian and Zhang, Zhengyou},
        booktitle={ACM International Conference on Multimodal Interaction (ICMI)},
        year={2016}
    }
  • https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34: Apache 2.0 License
    @software{DEIMv2-Wholebody34,
      author={Katsuya Hyodo},
      title={Lightweight human detection models generated on high-quality human data sets. It can detect objects with high accuracy and speed in a total of 28 classes: body, adult, child, male, female, body_with_wheelchair, body_with_crutches, head, front, right-front, right-side, right-back, back, left-back, left-side, left-front, face, eye, nose, mouth, ear, collarbone, shoulder, solar_plexus, elbow, wrist, hand, hand_left, hand_right, abdomen, hip_joint, knee, ankle, foot.},
      url={https://github.com/PINTO0309/PINTO_model_zoo/tree/main/472_DEIMv2-Wholebody34},
      year={2025},
      month={10},
      doi={10.5281/zenodo.17625710}
    }

About

Happy smile classifier. The estimation is done on the entire head, 48x48 pixels, rather than the face.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages