ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands

Unified discrete-continuous actions for free-form drag computer use.

📑 Paper | 🌐 Project Page | 💬 X (Twitter) | 📦 Dataset
Siyuan Hu†, Kevin Qinghong Lin†, Mike Zheng Shou*
Show Lab @ National University of Singapore
† Equal contribution * Corresponding author
- [2026.1.22] We release the ScreenDrag dataset.
- [2026.1.10] We release the code and project page.
Demo video: showui-pi-demo.mp4
ShowUI-π is a 450M flow-based vision-language-action (VLA) model for GUI control. Given a screen observation and a natural language instruction, it generates continuous cursor trajectories — producing smooth clicks and drags directly in pixel space without tokenized coordinates.
The key insight is a unified action representation: both clicks and drags are modeled as cursor waypoint sequences paired with mouse button states (pressed/released). This allows the model to handle discrete click actions and continuous drag operations within a single framework, using flow matching to generate temporally coherent trajectories.
This design enables tasks that require fine-grained spatial control, such as freehand drawing, object rotation, drag-to-sort, slider adjustment, and captcha solving — capabilities that are difficult or impossible for conventional click-only GUI agents.
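The unified representation described above can be sketched as a single data structure shared by clicks and drags: a waypoint sequence paired with a per-waypoint button state. This is an illustrative sketch only — the field names and the click heuristic are assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CursorAction:
    """Unified action: cursor waypoints plus per-waypoint button state.

    Hypothetical sketch of the representation described above; field
    names are illustrative, not ShowUI-pi's actual interface.
    """
    waypoints: List[Tuple[float, float]]  # (x, y) in normalized pixel space
    pressed: List[bool]                   # mouse button state at each waypoint

    def is_click(self) -> bool:
        # A click degenerates to a press/release at (nearly) one location.
        return len(self.waypoints) <= 2


# A discrete click and a continuous drag share the same structure:
click = CursorAction(waypoints=[(0.42, 0.31), (0.42, 0.31)],
                     pressed=[True, False])
drag = CursorAction(waypoints=[(0.10, 0.50), (0.30, 0.52), (0.60, 0.55)],
                    pressed=[True, True, False])
```

Because both action types flow through the same structure, a single policy head can emit them without a discrete/continuous branch.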
- Continuous GUI Control — Flow matching generates smooth, temporally coherent cursor trajectories in continuous pixel space, going beyond discrete click-only actions.
- Unified Action Representation — Clicks and drags are both represented as cursor waypoint sequences with mouse states, eliminating the need for separate action heads.
- Parameter Efficient — At 450M parameters, ShowUI-π outperforms 7B+ models on drag-based GUI tasks while remaining lightweight and efficient.
- ScreenDrag Dataset — A new benchmark of 505 real-world drag tasks with over 20K trajectories across 5 application domains.
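To make the "flow matching generates trajectories" feature concrete, here is a generic flow-matching sampler sketch: start from Gaussian noise over waypoints and Euler-integrate a learned velocity field toward a clean trajectory. The `velocity_fn` argument stands in for the model's conditional velocity network (conditioned on the screenshot and instruction); the toy field below is purely illustrative, not the paper's model.

```python
import numpy as np


def sample_trajectory(velocity_fn, num_waypoints=16, steps=10, seed=0):
    """Generate a cursor trajectory by integrating a velocity field.

    Generic flow-matching sampling: x_{t+dt} = x_t + v(x_t, t) * dt,
    starting from Gaussian noise at t=0 and ending near the data
    distribution at t=1. `velocity_fn` is a stand-in for the learned
    conditional velocity network.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_waypoints, 2))  # noisy (x, y) waypoints
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        x = x + velocity_fn(x, t) * dt
    return x


# Toy velocity field that flows every waypoint toward a straight drag.
target = np.linspace([0.2, 0.5], [0.8, 0.5], 16)
traj = sample_trajectory(lambda x, t: target - x)
```

Integrating the whole waypoint sequence jointly is what yields temporally coherent trajectories, as opposed to predicting each waypoint independently.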
ScreenDrag is a dataset of real-world screen drag tasks collected to train and evaluate continuous GUI control. It contains 505 tasks with over 20K cursor trajectories, each annotated with full waypoint sequences and mouse button states.
The dataset covers 5 application domains:
- PowerPoint — slide editing, object manipulation, shape drawing
- OS / File Manager — drag-to-select, file sorting, window resizing
- Adobe Premiere Pro — timeline editing, clip rearrangement
- Captcha — slider and puzzle-piece drag verification
- Handwriting — freehand character drawing and annotation
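Each ScreenDrag trajectory is annotated with a full waypoint sequence and mouse button states. A hypothetical record might look like the following — the field names and layout are illustrative assumptions, not the released schema:

```python
import json

# Hypothetical ScreenDrag-style annotation record; the actual released
# schema may differ -- field names here are illustrative only.
record = {
    "task_id": "ppt_0001",
    "domain": "PowerPoint",
    "instruction": "Drag the title box to the top-left corner.",
    "trajectory": [
        {"x": 0.52, "y": 0.40, "pressed": True},   # press: start of drag
        {"x": 0.35, "y": 0.25, "pressed": True},   # intermediate waypoint
        {"x": 0.18, "y": 0.12, "pressed": False},  # release: end of drag
    ],
}
print(json.dumps(record, indent=2))
```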
```shell
git clone https://github.com/showlab/ShowUI-Pi.git
cd ShowUI-Pi
pip install -e .
```

Detailed installation, training, and inference instructions are coming soon.
- LeRobot (Hugging Face) — ShowUI-π builds on the LeRobot codebase for flow-based policy learning.
- ShowUI — the predecessor project for vision-language GUI understanding.
If you find our work helpful, please consider citing our paper:
```bibtex
@misc{showuipi,
      title={ShowUI-$\pi$: Flow-based Generative Models as GUI Dexterous Hands},
      author={Siyuan Hu and Kevin Qinghong Lin and Mike Zheng Shou},
      year={2025},
      eprint={2512.24965},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      doi={10.48550/arXiv.2512.24965},
      url={https://arxiv.org/abs/2512.24965},
}
```

This project is licensed under the Apache License, Version 2.0.
See LICENSE for details.