pyPAGE is a Python implementation of the conditional-information PAGE framework for gene-set enrichment analysis.
It is designed to infer differential activity of pathways and regulons while accounting for annotation and membership biases using information-theoretic methods.
Standard gene-set enrichment methods test whether pathway members are non-randomly distributed across a ranked gene list. pyPAGE frames this as an information-theoretic question: how much does knowing a gene's pathway membership tell you about its expression bin?
- Discretize continuous expression scores (e.g. log2 fold-change) into equal-frequency bins
- Compute mutual information (MI) between expression bins and pathway membership, or conditional MI (CMI), which conditions on the number of pathways each gene belongs to; this corrects for the bias whereby heavily annotated genes drive spurious enrichment
- Assess significance with a permutation test that supports early stopping
- Filter redundancy by removing pathways whose signal is explained by an already-accepted pathway (measured by CMI between memberships)
- Run a hypergeometric enrichment test per bin to produce the iPAGE-style heatmap showing which expression bins drive each pathway's signal
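
The first three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not pyPAGE's own implementation: the function names, the plug-in MI estimator, the toy data, and the two-level `annotation_load` stratification are all hypothetical, and early stopping is left out for brevity.

```python
import math
import random
from collections import Counter


def equal_frequency_bins(scores, n_bins):
    """Rank the scores and split the ranks into n_bins equally sized bins."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, gene_idx in enumerate(order):
        bins[gene_idx] = rank * n_bins // len(scores)
    return bins


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in bits for two discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def conditional_mutual_information(x, y, z):
    """I(X; Y | Z) as the weighted average of I(X; Y) within each stratum of Z."""
    strata = {}
    for a, b, c in zip(x, y, z):
        xs, ys = strata.setdefault(c, ([], []))
        xs.append(a)
        ys.append(b)
    n = len(x)
    return sum(len(xs) / n * mutual_information(xs, ys) for xs, ys in strata.values())


def permutation_pvalue(bins, membership, n_shuffle=200, seed=0):
    """Empirical p-value: how often shuffled membership reaches the observed MI."""
    rng = random.Random(seed)
    observed = mutual_information(bins, membership)
    shuffled = list(membership)
    hits = 0
    for _ in range(n_shuffle):
        rng.shuffle(shuffled)
        if mutual_information(bins, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_shuffle + 1)


# Toy data: 300 genes with increasing scores; the pathway holds the top 30 genes.
scores = [0.1 * i for i in range(300)]
membership = [1 if i >= 270 else 0 for i in range(300)]
# Hypothetical annotation-count strata: 0 = few annotations, 1 = many.
annotation_load = [i % 2 for i in range(300)]

bins = equal_frequency_bins(scores, n_bins=5)
mi, p_value = permutation_pvalue(bins, membership)
cmi = conditional_mutual_information(bins, membership, annotation_load)
print(f"MI = {mi:.3f} bits, CMI = {cmi:.3f} bits, p = {p_value:.4f}")
```

Because the toy strata are independent of both the scores and the membership, MI and CMI coincide here; they diverge when pathway membership correlates with how heavily a gene is annotated, which is the case CMI is meant to correct.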
For single-cell data, the question becomes: are pathway scores spatially coherent across the cell manifold? A pathway whose activity varies smoothly across cell states is more likely to reflect real biology than one whose scores are scattered at random.
- Per-cell scoring — for each cell, compute MI or CMI between gene expression bins and pathway membership across all genes. This produces an (n_cells x n_pathways) score matrix
- KNN graph — build a cell-cell k-nearest-neighbor graph from expression (or use a precomputed one from scanpy)
- Geary's C — measure spatial autocorrelation of each pathway's scores on the KNN graph. Report C' = 1 - C, where higher values mean the pathway varies coherently across the manifold rather than randomly
- Permutation test — generate size-matched random gene sets, compute their C', and derive empirical p-values with BH FDR correction
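
The graph and autocorrelation steps can be illustrated with a small, dependency-free sketch. Everything below is an assumption made for illustration (function names, unit edge weights, the toy one-dimensional trajectory), not pyPAGE's code. In particular, the null distribution is built by shuffling scores across cells as a simple stand-in; the procedure described above draws size-matched random gene sets instead.

```python
import math
import random


def knn_graph(coords, k):
    """For each cell, list the indices of its k nearest cells (Euclidean distance)."""
    neighbors = []
    for i, ci in enumerate(coords):
        dists = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def geary_c_prime(values, neighbors):
    """Return C' = 1 - C, with Geary's C computed on unit-weight KNN edges."""
    n = len(values)
    mean = sum(values) / n
    total_var = sum((v - mean) ** 2 for v in values)
    n_edges = sum(len(nb) for nb in neighbors)
    edge_sq_diff = sum(
        (values[i] - values[j]) ** 2 for i, nb in enumerate(neighbors) for j in nb
    )
    c = (n - 1) * edge_sq_diff / (2 * n_edges * total_var)
    return 1 - c


def empirical_pvalue(observed, null_values):
    """Fraction of null statistics at least as large as the observed one."""
    hits = sum(1 for v in null_values if v >= observed)
    return (hits + 1) / (len(null_values) + 1)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy manifold: 100 cells along a line; one smooth pathway score, one scrambled.
rng = random.Random(0)
coords = [(float(i), 0.0) for i in range(100)]
neighbors = knn_graph(coords, k=4)
smooth = [math.sin(2 * math.pi * i / 100) for i in range(100)]
noise = smooth[:]
rng.shuffle(noise)

c_smooth = geary_c_prime(smooth, neighbors)
c_noise = geary_c_prime(noise, neighbors)

# Stand-in null: C' of score vectors shuffled across cells (an assumption).
null = []
for _ in range(99):
    perm = smooth[:]
    rng.shuffle(perm)
    null.append(geary_c_prime(perm, neighbors))

pvals = [empirical_pvalue(c_smooth, null), empirical_pvalue(c_noise, null)]
print(c_smooth, c_noise, benjamini_hochberg(pvals))
```

The smooth score yields C' close to 1 and the scrambled one a value near 0, which is the contrast the C' ranking relies on.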
Install from PyPI:

    pip install bio-pypage

Or install from source:

    git clone https://github.com/goodarzilab/pyPAGE
    cd pyPAGE
    pip install -e .

Quick start from Python:

    import pandas as pd
    from pypage import PAGE, ExpressionProfile, GeneSets

    # 1) Load expression profile (gene, score)
    expr = pd.read_csv(
        "example_data/AP2S1.tab.gz",
        sep="\t",
        header=None,
        names=["gene", "score"],
    )
    exp = ExpressionProfile(expr["gene"], expr["score"], is_bin=True)

    # 2) Load annotation (gene, pathway)
    ann = pd.read_csv(
        "example_data/GO_BP_2021_index.txt.gz",
        sep="\t",
        header=None,
        names=["gene", "pathway"],
    )
    gs = GeneSets(ann["gene"], ann["pathway"])

    # 3) Run pyPAGE
    p = PAGE(exp, gs, n_shuffle=100, k=7, filter_redundant=True)
    results, heatmap = p.run()
    print(results.head())
    heatmap.show()

results contains:
- pathway
- CMI: conditional mutual information score
- z-score: z-score of observed CMI vs. permutation null distribution
- p-value: empirical p-value from permutation test
- Regulation pattern (1 for up, -1 for down)
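
A short, hedged example of working with a table shaped like this. The DataFrame below is hypothetical: the values and pathway names are made up, and it assumes the column names are exactly those listed above and that pathway is an ordinary column rather than the index. The 0.05 threshold is likewise only an illustration.

```python
import pandas as pd

# Hypothetical results table with the columns listed above; values are made up.
results = pd.DataFrame(
    {
        "pathway": ["PATHWAY_A", "PATHWAY_B", "PATHWAY_C"],
        "CMI": [0.012, 0.004, 0.009],
        "z-score": [8.1, 1.2, 6.4],
        "p-value": [0.001, 0.210, 0.004],
        "Regulation pattern": [1, -1, -1],
    }
)

# Keep pathways that pass an illustrative significance threshold.
significant = results[results["p-value"] < 0.05]

# Split by direction: 1 marks up-regulated, -1 marks down-regulated pathways.
up = significant[significant["Regulation pattern"] == 1]
down = significant[significant["Regulation pattern"] == -1]

# Rank the significant pathways by information content.
top = significant.sort_values("CMI", ascending=False)
print(top[["pathway", "CMI", "p-value"]])
```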
Use these canonical examples with the bundled example_data/ outputs.

Bulk continuous:

    pypage -e example_data/test_DESeq_logFC.txt \
        --gmt example_data/c2.all.v2026.1.Hs.symbols.gmt \
        --type continuous --n-bins 9 \
        --cols GENE,log2FoldChange \
        --seed 42 \
        --outdir example_data/test_DESeq_logFC_cont_PAGE

Bulk discrete:

    pypage -e example_data/test_DESeq_logFC.txt \
        --gmt example_data/c2.all.v2026.1.Hs.symbols.gmt \
        --type discrete \
        --cols GENE,log2FoldChange_bin9 \
        --seed 42 \
        --outdir example_data/test_DESeq_logFC_disc_PAGE

Single-cell:

    pypage-sc --adata example_data/CRC.h5ad \
        --gene-column gene \
        --gmt example_data/c2.all.v2026.1.Hs.symbols.gmt \
        --groupby PhenoGraph_clusters --n-jobs 0 --fast-mode

Bulk continuous (example_data/test_DESeq_logFC_cont_PAGE/):
Bulk discrete (example_data/test_DESeq_logFC_disc_PAGE/):
Single-cell (example_data/CRC_scPAGE/):
- Interactive SC report
- Consistency ranking PDF
- Example UMAP pathway PDF
- Example group-enrichment PDF
- Group-enrichment stats TSV
Bulk continuous heatmap (PDF | HTML):
Bulk discrete heatmap (PDF | HTML):
Single-cell ranking (PDF | Interactive ranking | SC report):
Single-cell UMAP pathway example (PDF):
Single-cell group-enrichment example (PDF | Stats TSV):
The detailed user and API documentation now lives in MANUAL.md.
Updated notebooks:
- Comprehensive Tutorial
- Bulk PAGE Tutorial
- Single-Cell PAGE Tutorial (CRC)
- Single-Cell PAGE Tutorial (Synthetic)
Bakulin A, Teyssier NB, Kampmann M, Khoroshkin M, Goodarzi H (2024) pyPAGE: A framework for Addressing biases in gene-set enrichment analysis—A case study on Alzheimer's disease. PLoS Computational Biology 20(9): e1012346. https://doi.org/10.1371/journal.pcbi.1012346
License: MIT
pyPAGE was developed in the Goodarzi Lab at UCSF by Artemy Bakulin, Noam B. Teyssier, and Hani Goodarzi.