Posthoc per gene#

bullkpy.tl.posthoc_per_gene(adata, *, gene, groupby, layer='log1p_cpm', method='mwu', adjust='bh', min_n=2)

Pairwise post-hoc comparisons for ONE gene across all categories.

Use AFTER selecting a gene of interest.

Returns: DataFrame with columns ['gene', 'group1', 'group2', 'effect', 'pval', 'qval']

Run pairwise posthoc tests for selected genes across all categories of a grouping variable.

This function performs gene-wise pairwise comparisons between all levels of a categorical variable in adata.obs, returning a separate result table per gene.

It is typically used after a global categorical association test (e.g. Kruskal–Wallis) to identify which specific group pairs differ.

What it does#

For each gene in genes, the function:
1. Extracts expression values (from layer or adata.X)
2. Groups samples by groupby
3. Performs all pairwise group comparisons
4. Returns results as a dictionary:

{
    "GENE1": DataFrame,
    "GENE2": DataFrame,
    ...
}

Each DataFrame contains pairwise statistics between categories.
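The steps above can be sketched for a single gene with pandas and SciPy. This is a minimal illustration, not the bullkpy implementation: the helper names `bh_adjust` and `pairwise_mwu_sketch` are hypothetical, the `effect` column is assumed here to be a difference of medians, and `min_n` is assumed to mean the minimum number of samples a group needs in order to be compared.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q


def pairwise_mwu_sketch(values, groups, gene="GENE1", min_n=2):
    """All pairwise Mann-Whitney U tests for one gene (illustrative only)."""
    df = pd.DataFrame({"value": values, "group": groups})
    by_group = {g: sub["value"].to_numpy() for g, sub in df.groupby("group")}
    rows = []
    for g1, g2 in combinations(sorted(by_group), 2):
        x, y = by_group[g1], by_group[g2]
        if len(x) < min_n or len(y) < min_n:
            continue  # assumed meaning of min_n: skip under-sampled groups
        _, pval = mannwhitneyu(x, y, alternative="two-sided")
        rows.append({
            "gene": gene,
            "group1": g1,
            "group2": g2,
            "effect": float(np.median(x) - np.median(y)),  # assumed effect measure
            "pval": float(pval),
        })
    cols = ["gene", "group1", "group2", "effect", "pval"]
    out = pd.DataFrame(rows, columns=cols)
    if len(out):
        out["qval"] = bh_adjust(out["pval"].to_numpy())
    else:
        out["qval"] = np.array([], dtype=float)
    return out


# Made-up expression values: B is shifted upwards, D has a single sample.
values = (
    [1.0 + 0.1 * i for i in range(10)]     # group A
    + [5.0 + 0.1 * i for i in range(10)]   # group B
    + [1.05 + 0.1 * i for i in range(10)]  # group C
    + [3.0]                                # group D (below min_n)
)
groups = ["A"] * 10 + ["B"] * 10 + ["C"] * 10 + ["D"]
res = pairwise_mwu_sketch(values, groups)
print(res)
```

In this sketch, pairs involving group D are dropped because D has fewer than `min_n` samples, so only the A/B, A/C, and B/C comparisons are reported.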

Statistical methods#

Supported pairwise tests:

Method	Test	Notes
"mwu" (default)	Mann–Whitney U	Non-parametric, robust
"ttest"	Welch’s t-test	Assumes approximate normality

The actual pairwise testing is delegated to:

bk.tl.pairwise_posthoc(df, method=...)
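To illustrate why the choice of test matters, the sketch below applies both tests to the same two groups using SciPy directly. It does not call bullkpy, and the values are made up. One group contains a single extreme value, which inflates the variance seen by Welch's t-test while leaving the rank-based Mann–Whitney U test unaffected.

```python
import numpy as np
from scipy.stats import mannwhitneyu, ttest_ind

# Made-up expression values; every value in group_b exceeds every value in group_a.
group_a = np.array([2.1, 2.4, 2.2, 2.8, 2.5, 2.3])
group_b = np.array([3.0, 3.4, 2.9, 3.8, 3.1, 9.5])  # 9.5 is an outlier

# Mann-Whitney U: rank-based, insensitive to the size of the outlier.
_, p_mwu = mannwhitneyu(group_a, group_b, alternative="two-sided")

# Welch's t-test: compares means without assuming equal variances.
_, p_welch = ttest_ind(group_a, group_b, equal_var=False)

print(f"MWU p-value:   {p_mwu:.4f}")
print(f"Welch p-value: {p_welch:.4f}")
```

Here the groups are fully separated in rank, so the Mann–Whitney U test reports a small p-value, whereas the outlier widens the standard error enough that Welch's t-test does not reach the 0.05 level.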

Returned format#

Each value in the returned dictionary is a DataFrame with pairwise results.

Typical columns include (depending on pairwise_posthoc implementation):

Column	Description
group1	First group
group2	Second group
pval	Raw p-value
qval	FDR-corrected p-value
effect_size	Pairwise effect size
mean_1	Mean expression in group1
mean_2	Mean expression in group2
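A per-gene table with these columns can be filtered with ordinary pandas operations. The sketch below builds a small table by hand rather than calling bullkpy; the subtype labels and all numbers are hypothetical, and the actual column set may differ depending on the pairwise_posthoc implementation.

```python
import pandas as pd

# Hypothetical result table for one gene; values are made up for illustration.
table = pd.DataFrame({
    "group1": ["Basal", "Basal", "LumA"],
    "group2": ["LumA", "LumB", "LumB"],
    "pval": [0.0004, 0.03, 0.6],
    "qval": [0.0012, 0.045, 0.6],
})

# Keep pairs that pass a 5% FDR threshold, most significant first.
significant = table[table["qval"] < 0.05].sort_values("qval")
pairs = list(zip(significant["group1"], significant["group2"]))
print(pairs)
```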

Parameters#

Gene selection#

genes
List of gene names to test. All genes must be present in adata.var_names.

Group definition#

groupby
Categorical column in adata.obs defining groups.

Expression source#

layer
Expression layer to use (e.g. "log1p_cpm"). If None, uses adata.X.

Statistical test#

method
Pairwise test to apply:
• "mwu" (default)
• "ttest"

Examples#

Pairwise testing after global association#

import bullkpy as bk

# Global test
res = bk.tl.rank_genes_categorical(
    adata,
    groupby="Subtype",
    group="Basal",
)

# Posthoc comparisons for selected genes
posthoc = bk.tl.posthoc_per_gene(
    adata,
    genes=["TP53", "E2F1", "MYC"],
    groupby="Subtype",
)

Inspect pairwise results for one gene

posthoc["TP53"]

Use t-test instead of MWU

posthoc = bk.tl.posthoc_per_gene(
    adata,
    genes=["CDKN2A"],
    groupby="Project_ID",
    method="ttest",
)

Typical workflow#

1. Global test.
rank_genes_categorical or cat_cat_association

2. Select genes of interest.
Based on q-value and effect size

3. Posthoc testing.
posthoc_per_gene

4. Visualization.
• boxplots / violins
• effect size heatmaps
• pairwise significance tables

Notes#

• This function does not apply multiple-testing correction across genes (only within each gene’s pairwise comparisons).
• For many genes × many groups, computation can be expensive.
• Non-parametric tests are recommended for heterogeneous bulk cohorts.
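If correction across genes is needed, one option is to pool the raw p-values from all per-gene tables and adjust them together. The sketch below does this with a plain-Python Benjamini–Hochberg helper. The helper `bh_adjust`, the nested-dictionary layout, the subtype labels, and the p-values are all hypothetical stand-ins for the `pval` columns of the returned tables.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    qvals = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping q-values monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        qvals[i] = running_min
    return qvals


# Hypothetical raw p-values per gene and group pair (made-up numbers).
raw = {
    "TP53": {("Basal", "LumA"): 0.01, ("Basal", "LumB"): 0.04},
    "MYC": {("Basal", "LumA"): 0.03, ("Basal", "LumB"): 0.005},
}

# Pool every (gene, pair) p-value, adjust once, and map the results back.
keys = [(gene, pair) for gene, table in raw.items() for pair in table]
pooled = [raw[gene][pair] for gene, pair in keys]
global_q = dict(zip(keys, bh_adjust(pooled)))

for (gene, pair), q in global_q.items():
    print(gene, pair, round(q, 4))
```

Pooling treats every gene-by-pair comparison as one family of tests, which is stricter than the within-gene correction and may or may not be appropriate for a given analysis.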