JD2112/gsalit

gsalit - A Streamlit App for Global Screening Analysis

gsalit is a user-friendly Streamlit application that lets researchers run the Illumina GSA (Global Screening Array) genotyping pipeline without manually handling command-line tools. A second version, built with Python and Conda, is available via Bioconda.

The app automates:

  • IDAT file management
  • Reference genome selection (hg19 or hg38)
  • Manifest and cluster file usage
  • Running the IAAP-CLI pipeline
  • Output collection and download

All required tools, references, and genome indices are pre-packaged in the Docker image, so users do not need to install anything else.

Web access

The fully functional app is available at https://gsalit.serve.scilifelab.se/

Installation

Using Docker

🔹 System Requirements

  • Docker installed on your machine
  • Minimum 8 GB RAM (larger datasets may require more)
  • Recommended: multi-core CPU for faster processing

A Docker image is also available on Docker Hub as jd21/genlit. You can run the application with the following command:

docker run -p 8501:8501 jd21/genlit:0.2.0

Then, open your web browser and navigate to http://localhost:8501 to access the Streamlit GUI.
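If you start the container from a script, such as a launcher or a CI job, it can help to wait until the GUI responds on the published port before opening the browser. The sketch below assumes only the 8501 port mapping shown above. The function name wait_for_app is hypothetical and is not part of gsalit.

```python
import time
import urllib.request


def wait_for_app(url: str = "http://localhost:8501", timeout: float = 60.0,
                 interval: float = 1.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout` seconds have passed.

    Hypothetical helper; it makes no assumption about Streamlit internals and
    simply requests the page the browser would open.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, refused connections and socket timeouts are all OSError:
            # the container is most likely still starting, so try again.
            pass
        time.sleep(interval)
    return False
```

For example, start the container in the background with docker run -d -p 8501:8501 jd21/genlit:0.2.0, then call wait_for_app() before opening http://localhost:8501.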

Using Conda

You can install gsalit using Conda. Make sure to include the jd2112 channel when creating the environment:

conda create -n gsalit -c jd2112 -c conda-forge -c bioconda gsalit
conda activate gsalit
gsa-gui

This will create a new Conda environment named gsalit, activate it, and launch the Streamlit GUI.

Note: On first use, the Conda package downloads the reference genomes (hg19 and hg38) and builds their indexes (.bwt files). This may take 1-2 hours.
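Because this one-time setup is slow, you may want to check whether the indexes already exist before launching. The sketch below assumes the indexes were built with bwa index, which writes .amb, .ann, .bwt, .pac, and .sa files next to the FASTA. The reference path you pass in is a placeholder, because this README does not document where the Conda package stores hg19.fa and hg38.fa.

```python
from __future__ import annotations

from pathlib import Path

# Files written by `bwa index <ref.fa>`: BWA appends these suffixes to the FASTA name.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(fasta: str | Path) -> list[str]:
    """Return the names of BWA index files that do not yet exist beside `fasta`."""
    fasta = Path(fasta)
    return [
        fasta.name + suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not fasta.with_name(fasta.name + suffix).exists()
    ]
```

An empty list means the one-time indexing step has already completed for that reference.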

Usage

Step 1: Upload IDAT Files

  • Use the Upload IDAT files button in the sidebar.
  • You can upload multiple .idat files at once.
  • The app will store them temporarily for processing.
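Illumina two-colour arrays such as the GSA produce one green and one red IDAT per sample, conventionally named <barcode>_<position>_Grn.idat and _Red.idat. A sample uploaded with only one channel is therefore incomplete. The following is a hypothetical pre-upload check, not part of gsalit. It groups file names by sample and reports incomplete pairs. The naming pattern is an assumption based on that convention.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Assumed naming convention: <sample>_Grn.idat / <sample>_Red.idat (case-insensitive).
IDAT_RE = re.compile(r"^(?P<sample>.+)_(?P<channel>Grn|Red)\.idat$", re.IGNORECASE)


def pair_idats(filenames: list[str]) -> tuple[dict[str, dict[str, str]], list[str]]:
    """Group IDAT names by sample; return (complete Grn/Red pairs, problem files)."""
    samples: dict[str, dict[str, str]] = defaultdict(dict)
    problems: list[str] = []
    for name in filenames:
        m = IDAT_RE.match(name)
        if m is None:
            problems.append(name)  # not an IDAT, or not following the convention
            continue
        samples[m["sample"]][m["channel"].capitalize()] = name
    pairs = {s: ch for s, ch in samples.items() if set(ch) == {"Grn", "Red"}}
    for sample, channels in samples.items():
        if sample not in pairs:
            problems.extend(channels.values())  # one colour channel is missing
    return pairs, problems
```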

Step 2: Select Genome Build

  • Choose either hg19 or hg38 from the dropdown.
  • The app will automatically select the manifest, cluster, and reference genome for the chosen build.
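You can think of the build-dependent selection as a lookup table that maps each build name to its three resources. In this sketch, the directory layout and file names are hypothetical placeholders, because this README does not give gsalit's actual manifest, cluster, and FASTA paths.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class BuildResources:
    manifest: Path   # Illumina manifest (.bpm)
    cluster: Path    # cluster file (.egt)
    reference: Path  # reference genome FASTA


# Hypothetical layout: these paths are placeholders, not gsalit's real locations.
RESOURCE_ROOT = Path("/opt/gsalit/resources")
BUILDS = {
    build: BuildResources(
        manifest=RESOURCE_ROOT / build / "manifest.bpm",
        cluster=RESOURCE_ROOT / build / "cluster.egt",
        reference=RESOURCE_ROOT / build / f"{build}.fa",
    )
    for build in ("hg19", "hg38")
}


def resolve_build(build: str) -> BuildResources:
    """Return the resources for `build`, rejecting anything but hg19/hg38."""
    try:
        return BUILDS[build]
    except KeyError:
        raise ValueError(f"unsupported genome build {build!r}; choose one of {sorted(BUILDS)}") from None
```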

Step 3: Run Pipeline

  • Click the Run Pipeline button.
  • The app will execute the GSA pipeline in a temporary workspace.
  • Real-time logs will appear in the main panel.

Step 4: Monitor Progress

  • Logs are streamed in real-time.
  • Errors or warnings will appear immediately.
  • Upon successful completion, a success message will appear.
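Steps 3 and 4 come down to running the pipeline as a subprocess in a throwaway directory and forwarding its output line by line. The minimal sketch below shows that pattern. run_streaming is a hypothetical helper, and the real IAAP-CLI arguments are deliberately left out rather than guessed.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Callable


def run_streaming(cmd: list[str], workspace: str | Path,
                  on_line: Callable[[str], None]) -> int:
    """Run `cmd` inside `workspace`, passing each output line to `on_line`.

    stderr is merged into stdout so warnings and errors show up in the same
    stream as soon as the child process flushes them. Returns the exit code;
    0 conventionally means success.
    """
    with subprocess.Popen(
        cmd,
        cwd=workspace,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        bufsize=1,  # line-buffered reads on our side
    ) as proc:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set.
    return proc.returncode
```

For example, wrap the call in tempfile.TemporaryDirectory() to get a temporary workspace. In a Streamlit app, on_line could append each line to a log element. A non-zero return code would then be reported as an error instead of the success message.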

Results

  • All outputs are saved in /app/results/run_<timestamp>/.
  • Certain intermediate files, such as .bpm, .csv, and .egt files and the IDAT directories, are excluded from the results folder.
  • A results.zip file is automatically created for download.
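You can sketch the collection step with the standard library alone. This version assumes that excluded files can be recognised by suffix (.bpm, .csv, .egt, and .idat, so that IDAT folders are not recreated). It also uses a hypothetical timestamp format, so gsalit's exact naming and filtering may differ.

```python
from __future__ import annotations

import shutil
from datetime import datetime
from pathlib import Path

# Assumed exclusion rule: filter by suffix. Skipping .idat files means the IDAT
# directories are never recreated, because folders are only made for copied files.
EXCLUDED_SUFFIXES = {".bpm", ".csv", ".egt", ".idat"}


def collect_results(workspace: Path, results_root: Path) -> Path:
    """Copy kept outputs to results_root/run_<timestamp>/ and add results.zip."""
    run_dir = results_root / f"run_{datetime.now():%Y%m%d_%H%M%S}"  # hypothetical format
    run_dir.mkdir(parents=True)
    for src in workspace.rglob("*"):
        if src.is_dir() or src.suffix.lower() in EXCLUDED_SUFFIXES:
            continue
        dest = run_dir / src.relative_to(workspace)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
    # Build the archive outside run_dir so it cannot include itself, then move it in.
    archive = shutil.make_archive(str(results_root / f"{run_dir.name}_results"), "zip", root_dir=run_dir)
    shutil.move(archive, run_dir / "results.zip")
    return run_dir
```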

Note (optional preview): If the VCF was generated successfully, the app displays the last 10 lines of the VCF header.
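The preview only needs the header section of the VCF. The minimal sketch below assumes a plain .vcf or a gzip/bgzip-compressed .vcf.gz. It keeps the last n lines that start with # and stops at the first data record. It illustrates the idea and is not gsalit's own code. On the command line, bcftools view -h file.vcf.gz | tail -n 10 gives a similar view.

```python
from __future__ import annotations

import gzip
from collections import deque
from pathlib import Path


def vcf_header_tail(path: str | Path, n: int = 10) -> list[str]:
    """Return the last `n` header lines (those starting with '#') of a VCF."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open  # bgzip output is gzip-readable
    tail: deque[str] = deque(maxlen=n)  # keeps only the most recent n lines
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.startswith("#"):
                break  # the header ends at the first data record
            tail.append(line.rstrip("\n"))
    return list(tail)
```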

Downloads / Tools & Reference Resources

| Resource | Description | Link / Install |
|---|---|---|
| htslib | High-throughput sequencing library | htslib 1.22.1 |
| bcftools | Variant calling and manipulation tools | wget http://github.com/samtools/bcftools/releases/download/1.20/bcftools-1.20.tar.bz2 |
| samtools | Utilities for manipulating SAM/BAM files | sudo apt install samtools |
| gtc2vcf + affy2vcf | Convert IDAT/GTC to VCF | wget -P plugins http://raw.githubusercontent.com/freeseek/gtc2vcf/master/{idat2gtc.c,gtc2vcf.{c,h},affy2vcf.c,BAFregress.c} |
| IAAP-CLI | Illumina Array Analysis Platform Genotyping CLI | IAAP-CLI Manual |
| APT | Affymetrix Analysis Power Tools | APT Manual |
| bwa | Burrows-Wheeler Aligner for sequence alignment | bwa-0.7.17 |
| plink2 | Whole-genome association analysis toolset | plink2 Linux x86_64 |
| Illumina GSA Manifest Files | BPM / CSV files for array annotation | GSA Manifest & Cluster Downloads |
| Illumina GSA Cluster Files | EGT files for genotype clustering | GSA Cluster Files |
| UCSC Reference Genome hg19 | Human genome build hg19 (GRCh37) | hg19.fa |
| UCSC Reference Genome hg38 | Human genome build hg38 (GRCh38) | hg38.fa |
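For a local setup without Docker, a quick check that the command-line tools from the table are on PATH can save you a failed run. The tool list below is an assumption. bgzip and tabix are the htslib utilities, and IAAP-CLI and APT are omitted because their executable names depend on how they were installed.

```python
from __future__ import annotations

import shutil
from collections.abc import Iterable

# Assumed executable names for tools in the table above (htslib ships bgzip/tabix).
REQUIRED_TOOLS = ("bcftools", "samtools", "bwa", "plink2", "bgzip", "tabix")


def missing_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> list[str]:
    """Return the names in `tools` that shutil.which cannot find on PATH."""
    return [t for t in tools if shutil.which(t) is None]
```

For example, print(missing_tools()) lists anything that still needs to be installed. An empty list means every assumed tool was found.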

Troubleshooting

If you encounter issues during installation or usage, make sure that you included all three channels (jd2112, conda-forge, and bioconda) and that the gsalit environment is activated. You can also try mamba, a faster reimplementation of the conda package manager that often resolves dependencies more reliably.

Credits

Developed by Jyotirmoy Das. Deployment of the application on the SciLifeLab Serve platform is managed in collaboration with Hamza Imran.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

  • Contributions are welcome! Please fork the repository and submit a pull request with your changes.
  • Feel free to open issues for any bugs or feature requests.
  • Please make sure to follow the existing code style and include tests for any new features.

Citation

Das, J. (2025). gsalit (1.0.1). Zenodo. https://doi.org/10.5281/zenodo.17671007

Acknowledgements

Special thanks to the SciLifeLab Data Center (specifically Hamza) for help with configuring the app on the SciLifeLab Serve platform.

Thanks to the open-source community for the tools and libraries that made this project possible, including Streamlit, Bioconda, Docker, gtc2vcf, and agaat.

About

A Streamlit application for global screening analysis, built with Python.
