Pairwise posthoc#

bullkpy.tl.pairwise_posthoc(df, *, group_col='grp', value_col='y', method='mwu', correction='bh', dropna=True)

Pairwise post-hoc tests between groups.

Parameters:

df – DataFrame with columns [group_col, value_col]
method – “mwu” (Mann-Whitney U, two-sided) or “ttest” (Welch t-test)
correction – currently only “bh” (Benjamini-Hochberg)
dropna – drop rows with NA in group/value

Returns:

DataFrame with columns: group1, group2, n1, n2, pval, qval, effect, delta_mean, delta_median

Pairwise post-hoc statistical tests between groups.

This function performs all pairwise comparisons between categories in a grouping column, using either a non-parametric or parametric test. It is typically used after a global association test to identify which groups differ from each other.

What it does#

Computes pairwise group comparisons for a numeric variable
Supports:
•	Mann–Whitney U test (default, non-parametric)
•	Welch’s t-test (parametric)
Reports:
•	p-values
•	BH-adjusted q-values
•	effect sizes
•	mean and median differences
Returns a tidy DataFrame, easy to plot or export
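
The behaviour listed above can be sketched with SciPy. This is a minimal illustration under stated assumptions, not bullkpy's implementation: the helper name pairwise_sketch and the toy data are hypothetical, and only sample sizes and raw p-values are computed.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

# Toy data (hypothetical): three groups, one numeric value per sample.
df = pd.DataFrame({
    "grp": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "y": [1.0, 1.2, 0.9, 1.1, 1.3,
          2.0, 2.2, 1.9, 2.1, 2.4,
          1.0, 1.1, 1.4, 0.8, 1.2],
})


def pairwise_sketch(df, group_col="grp", value_col="y", method="mwu"):
    """Illustrative helper: one row per unordered pair of groups."""
    groups = {k: v[value_col].to_numpy() for k, v in df.groupby(group_col)}
    rows = []
    for g1, g2 in combinations(sorted(groups), 2):
        x, y = groups[g1], groups[g2]
        if method == "mwu":
            # Two-sided Mann-Whitney U test
            pval = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
        else:
            # Welch's t-test (unequal variances)
            pval = stats.ttest_ind(x, y, equal_var=False).pvalue
        rows.append({"group1": g1, "group2": g2,
                     "n1": len(x), "n2": len(y), "pval": pval})
    return pd.DataFrame(rows)


pairs = pairwise_sketch(df)
```

With k groups this produces k(k − 1)/2 rows, which is why a multiple-testing correction is applied to the resulting p-values.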

When to use#

Typical use cases include:

Post-hoc analysis after:
•	rank_genes_categorical
•	Kruskal–Wallis / ANOVA-like tests
Pairwise comparison of:
•	gene expression across multiple groups
•	signature scores
•	QC metrics

Parameters#

df
Input DataFrame containing:
•	one column with group labels
•	one column with numeric values

group_col
Column defining groups (default: “grp”)

value_col
Numeric column to test (default: “y”)

method
Statistical test:
•	“mwu” – Mann–Whitney U test (two-sided, default)
•	“ttest” – Welch two-sample t-test

correction
Multiple-testing correction method. Currently only “bh” (Benjamini–Hochberg FDR, default) is supported.

dropna
Whether to drop rows with a missing group or value (default: True)

Output#

Returns a DataFrame with one row per pairwise comparison and columns:

Column	Description
group1, group2	Compared groups
n1, n2	Sample sizes
pval	Raw p-value
qval	BH-adjusted p-value
effect	Effect size (rank-biserial correlation for “mwu”, Cohen’s d for “ttest”)
delta_mean	Mean(group1) − Mean(group2)
delta_median	Median(group1) − Median(group2)

Results are sorted by increasing qval.
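
For readers unfamiliar with how BH-adjusted q-values relate to raw p-values, the standard Benjamini–Hochberg step-up adjustment can be sketched as below. The helper name bh_adjust is hypothetical, and whether bullkpy computes qval with exactly this code is an assumption.

```python
import numpy as np


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (illustrative helper)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, starting from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(scaled, 0.0, 1.0)
    return q


pvals = [0.01, 0.04, 0.03]
qvals = bh_adjust(pvals)  # array([0.03, 0.04, 0.04])
```

Because the adjustment depends on the number of tests m, q-values change when groups are added or removed, even if the raw p-value of a given pair does not.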

Effect sizes#

MWU
Rank-biserial correlation
•	Range: −1 to +1
•	Sign indicates direction of shift

t-test
Cohen’s d (approximate, pooled SD)
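
Both effect sizes can be sketched from their textbook definitions. The helper names are hypothetical, and the sign convention used here (positive when the first group tends to be larger) is an assumption that may differ from what bullkpy reports.

```python
import numpy as np
from scipy import stats


def rank_biserial(x, y):
    """Rank-biserial correlation: r = 2 * U1 / (n1 * n2) - 1."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    # U1 from the rank sum of the first sample in the pooled ranking
    ranks = stats.rankdata(np.concatenate([x, y]))
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2.0
    return 2.0 * u1 / (n1 * n2) - 1.0


def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
r_ab = rank_biserial(a, b)  # -1.0: every value in a is below every value in b
d_ab = cohens_d(a, b)       # -3.0: means differ by 3 pooled SDs
```

Swapping the two groups flips the sign of both measures, so the direction should always be read together with group1 and group2.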

Examples#

Pairwise tests for a single gene

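import numpy as np
import pandas as pd
from bullkpy.tl import pairwise_posthoc

# `adata` is assumed to be an already-loaded AnnData object.
# Densify the expression column first in case X is a sparse matrix.
x = adata[:, "TP53"].X
x = x.toarray() if hasattr(x, "toarray") else np.asarray(x)

df = pd.DataFrame({
    "grp": adata.obs["Project_ID"],
    "y": x.ravel(),
})

post = pairwise_posthoc(df, method="mwu")
post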

Post-hoc after categorical association

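import bullkpy as bk

# Global association test across subtypes
res = bk.tl.rank_genes_categorical(
    adata,
    groupby="Subtype",
)

# Pairwise post-hoc tests for selected genes, keyed by gene name
posthoc = {}
for g in ["TP53", "RB1"]:
    posthoc[g] = bk.tl.posthoc_per_gene(
        adata,
        genes=[g],
        groupby="Subtype",
    )[g]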

Notes#

•	Requires ≥2 samples per group
•	MWU is recommended for:
	•	small sample sizes
	•	non-Gaussian distributions
•	t-test assumes approximate normality
•	q-values are computed across all pairwise tests
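
Since each group needs at least two samples, it can help to filter the input before testing. The sketch below uses plain pandas on hypothetical toy data. How pairwise_posthoc itself treats undersized groups is not stated here, so this pre-filter is a precaution rather than a description of its behaviour.

```python
import pandas as pd

# Toy data (hypothetical): group C has one sample, group B has one missing value.
df = pd.DataFrame({
    "grp": ["A", "A", "B", "B", "B", "C"],
    "y": [1.0, 1.5, 2.0, 2.2, None, 3.0],
})

# Drop rows with a missing group or value, then count samples per group
clean = df.dropna(subset=["grp", "y"])
counts = clean["grp"].value_counts()

# Keep only groups with at least two remaining samples
keep = counts[counts >= 2].index
filtered = clean[clean["grp"].isin(keep)]
```

Counting after dropping missing values matters: a group can look large enough in the raw table and still fall below two usable samples.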