Pairwise posthoc#
- bullkpy.tl.pairwise_posthoc(df, *, group_col='grp', value_col='y', method='mwu', correction='bh', dropna=True)[source]#
Pairwise post-hoc tests between groups.
- Parameters:
df (DataFrame with columns [group_col, value_col])
method (“mwu” (Mann-Whitney U, two-sided) or “ttest” (Welch t-test))
correction (currently only “bh” (Benjamini-Hochberg))
dropna (drop rows with NA in group/value)
- Returns:
DataFrame with columns – group1, group2, n1, n2, pval, qval, effect, delta_mean, delta_median
Pairwise post-hoc statistical tests between groups.
This function performs all pairwise comparisons between categories in a grouping column, using either a non-parametric or parametric test. It is typically used after a global association test to identify which groups differ from each other.
What it does#
Computes pairwise group comparisons for a numeric variable
Supports: • Mann–Whitney U test (default, non-parametric) • Welch’s t-test (parametric)
Reports: • p-values • BH-adjusted q-values • effect sizes • mean and median differences
Returns a tidy DataFrame, easy to plot or export
When to use#
Typical use cases include:
Post-hoc analysis after: • rank_genes_categorical • Kruskal–Wallis / ANOVA–like tests
Pairwise comparison of: • gene expression across multiple groups • signature scores • QC metrics
Parameters#
df
Input DataFrame containing:
• one column with group labels
• one column with numeric values
group_col
Column defining groups (default: “grp”)
value_col
Numeric column to test (default: “y”)
method
Statistical test:
• “mwu” – Mann–Whitney U test (two-sided, default)
• “ttest” – Welch two-sample t-test
correction
Multiple-testing correction method:
• “bh” – Benjamini–Hochberg FDR (default)
dropna
Whether to drop rows with missing group or value
Output#
Returns a DataFrame with one row per pairwise comparison and columns:
Column |
Description |
|---|---|
group1, group2 |
Compared groups |
n1, n2 |
Sample sizes |
pval |
Raw p-value |
qval |
BH-adjusted p-value |
effect |
Effect size (rank-biserial or Cohen’s d) |
delta_mean |
Mean(group1) − Mean(group2) |
delta_median |
Median(group1) − Median(group2) |
Results are sorted by increasing qval.
Effect sizes#
MWU
Rank-biserial correlation
• Range: −1 to +1
• Sign indicates direction of shift
t-test
Cohen’s d (approximate, pooled SD)
Examples#
Pairwise tests for a single gene
from bullkpy.tl import pairwise_posthoc
df = pd.DataFrame({
"grp": adata.obs["Project_ID"],
"y": adata[:, "TP53"].X.ravel(),
})
post = pairwise_posthoc(df, method="mwu")
post
Post-hoc after categorical association
res = bk.tl.rank_genes_categorical(
adata,
groupby="Subtype",
)
posthoc = {}
for g in ["TP53", "RB1"]:
posthoc[g] = bk.tl.posthoc_per_gene(
adata,
genes=[g],
groupby="Subtype",
)[g]
Notes#
• Requires ≥2 samples per group
• MWU is recommended for:
• small sample sizes
• non-Gaussian distributions
• t-test assumes approximate normality
• q-values are computed across all pairwise tests
See also#
• tl.rank_genes_categorical
• tl.posthoc_per_gene
• pl.violin
• pl.rankplot