Gene Association#
bullkpy.pl.gene_association(adata, *, gene, groupby, layer='log1p_cpm', kind='violin', order=None, rotate_xticklabels=45, figsize=None, panel_size=(4.2, 3.2), show_points=True, point_size=2.0, point_alpha=0.35, palette='Set2', annotate_posthoc=True, posthoc_method='mwu', posthoc_alpha=0.05, max_brackets=6, bracket_height=0.06, save=None, show=True)
Gene vs categorical obs association plot (Scanpy-like panels).
gene can be a string or a list of genes; a list produces a row of panels.
Violin or box plot, with optional strip points.
Optional automatic pairwise post-hoc tests with significance brackets (BH-corrected).
Returns (fig, axes).
Gene-vs-category expression plot with optional pairwise post-hoc testing.
This helper makes Scanpy-like panels showing the distribution of one gene (or multiple genes) across the categories of a categorical obs column. It supports violin or box plots, optional jittered points, and optional pairwise post-hoc tests annotated as significance brackets.
Example Gene Association plot
What it does#
1) Selects groups
Uses adata.obs[groupby] converted to strings.
Category order:
If order is None: uses the categorical order from pd.Categorical(grp).categories
Else: uses order exactly (as strings).
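The category-resolution rule above can be sketched with pandas alone. This is a minimal illustration, not the library's code: resolve_categories is a hypothetical name, and the only behavior assumed is what the two rules above state.

```python
import pandas as pd


def resolve_categories(groups, order=None):
    """Return (grp, cats): string-cast group labels and the category order.

    Hypothetical helper mirroring the rule described in the text.
    """
    grp = pd.Series(groups).astype(str)
    if order is None:
        # pd.Categorical infers categories as the sorted unique values
        cats = list(pd.Categorical(grp).categories)
    else:
        # Use the caller's order exactly, cast to strings
        cats = [str(c) for c in order]
    return grp, cats


grp, cats = resolve_categories(["Basal", "Luminal", "Basal", "NE-like"])
grp2, cats2 = resolve_categories([2, 10, 2], order=[2, 10])
```

Because labels are cast to strings first, the inferred default order is lexicographic: numeric-looking labels such as "10" sort before "2". Passing order explicitly avoids that.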
2) Extracts expression for each gene
For each gene panel, it calls: _get_gene_vector(adata, g, layer=layer).
Expression is then plotted per category.
3) Plots distribution per group
Using seaborn:
kind="violin" → sns.violinplot(..., cut=0, inner="quartile")
kind="box" → sns.boxplot(...)
Optional points overlay:
sns.stripplot(..., jitter=0.25, color="k")
Axis formatting:
Title = gene name
y-label = “Expression” if layer is None, else the layer name
x-label removed
tick rotation controlled by rotate_xticklabels
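The plotting step can be approximated for a single panel as follows. This is a sketch under stated assumptions, not the function's actual body: plot_panel, the column names "grp" and "y", and the demo data are all hypothetical, and the palette argument is omitted for brevity.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_panel(ax, df, cats, gene, kind="violin", layer="log1p_cpm",
               show_points=True, rotate_xticklabels=45):
    """Draw one gene panel (hypothetical helper; df has columns 'grp' and 'y')."""
    if kind == "violin":
        sns.violinplot(data=df, x="grp", y="y", order=cats,
                       cut=0, inner="quartile", ax=ax)
    elif kind == "box":
        sns.boxplot(data=df, x="grp", y="y", order=cats, ax=ax)
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    if show_points:
        # Jittered black points drawn over the distribution
        sns.stripplot(data=df, x="grp", y="y", order=cats, jitter=0.25,
                      color="k", size=2.0, alpha=0.35, ax=ax)
    ax.set_title(gene)
    ax.set_ylabel("Expression" if layer is None else layer)
    ax.set_xlabel("")
    ax.tick_params(axis="x", labelrotation=rotate_xticklabels)
    return ax


# Synthetic demo data: three groups of 20 observations each
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "grp": np.repeat(["A", "B", "C"], 20),
    "y": rng.normal(size=60),
})
fig, ax = plt.subplots(figsize=(4.2, 3.2))
plot_panel(ax, demo, ["A", "B", "C"], gene="GENE1")
```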
4) Optional post-hoc pairwise tests + brackets
If annotate_posthoc=True and there are ≥2 categories:
Runs pairwise tests with BH correction:
post = pairwise_posthoc(
    df, group_col="grp", value_col="y",
    method=posthoc_method, correction="bh"
)
Adds significance brackets for the most significant comparisons:
_add_brackets(
    ax, post,
    order=cats,
    alpha=posthoc_alpha,
    max_brackets=max_brackets,
    bracket_height=bracket_height,
)
If post-hoc annotation fails for a gene, it emits a warning and continues plotting.
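To make the testing step concrete, here is a standalone sketch of all-pairs testing with Benjamini–Hochberg adjustment, built on SciPy. It is not the implementation of pairwise_posthoc: the function names pairwise_tests and bh_adjust and the output columns group1, group2 and pval are assumptions for illustration (only qval is named in this page).

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy import stats


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D array."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    if n == 0:
        return p
    idx = np.argsort(p)
    scaled = p[idx] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    q = np.empty(n)
    q[idx] = np.clip(scaled, 0.0, 1.0)
    return q


def pairwise_tests(df, group_col="grp", value_col="y", method="mwu"):
    """Test every pair of groups, then BH-adjust across all pairs."""
    groups = {name: sub[value_col].to_numpy()
              for name, sub in df.groupby(group_col)}
    rows = []
    for a, b in combinations(sorted(groups), 2):
        if method == "mwu":
            p = stats.mannwhitneyu(groups[a], groups[b],
                                   alternative="two-sided").pvalue
        elif method == "ttest":
            # equal_var=False selects Welch's t-test
            p = stats.ttest_ind(groups[a], groups[b], equal_var=False).pvalue
        else:
            raise ValueError(f"unknown method: {method!r}")
        rows.append((a, b, p))
    out = pd.DataFrame(rows, columns=["group1", "group2", "pval"])
    out["qval"] = bh_adjust(out["pval"].to_numpy())
    return out.sort_values("qval").reset_index(drop=True)


# Synthetic demo: group C is shifted upward relative to A and B
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "grp": np.repeat(["A", "B", "C"], 15),
    "y": np.concatenate([rng.normal(0, 1, 15),
                         rng.normal(0, 1, 15),
                         rng.normal(3, 1, 15)]),
})
post = pairwise_tests(df, method="mwu")
```

With k categories there are k(k-1)/2 comparisons, so the adjustment matters more as the number of groups grows.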
Parameters#
Required#
adata
AnnData object containing expression and metadata.
gene
str or sequence of strings. If multiple genes are provided, a row of panels is created.
groupby
Categorical adata.obs key defining groups on the x-axis.
Expression source#
layer
Layer name to pull expression from. If None, _get_gene_vector should fall back to adata.X.
Plot controls#
kind
“violin” (default) or “box”.
order
Explicit category order for groupby. Useful for controlling display order.
rotate_xticklabels
Rotation (degrees) for x tick labels (default 45).
figsize
Full figure size. If None, computed as (panel_size[0] * n_genes, panel_size[1]).
panel_size
Per-panel size used when figsize=None.
palette
Seaborn palette name (default “Set2”).
Points overlay#
show_points
Add jittered points with stripplot.
point_size, point_alpha
Styling for points.
Post-hoc annotation#
annotate_posthoc
If True, compute all pairwise comparisons and add brackets.
posthoc_method
“mwu” (Mann–Whitney U, two-sided) or “ttest” (Welch t-test), passed to pairwise_posthoc.
posthoc_alpha
Significance threshold on BH-adjusted qval used for bracket display.
max_brackets
Maximum number of brackets to draw per panel (prevents clutter).
bracket_height
Vertical spacing factor for bracket stacking.
Output controls#
save
If provided, saves the figure to this path.
show
If True, calls plt.show().
Returns#
(fig, axes)
fig: Matplotlib Figure
axes: np.ndarray of Axes (even if only one gene).
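Since axes is an array even for a single gene, downstream code can loop over panels without special-casing. The sketch below reproduces that contract with plain Matplotlib; make_panels is a hypothetical stand-in, and using squeeze=False is an assumption about how such a layout could be built, not a statement about the library's internals.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def make_panels(n_genes, figsize=None, panel_size=(4.2, 3.2)):
    """Create one row of panels; always return a 1-D ndarray of Axes."""
    if figsize is None:
        # Default described above: width scales with the number of genes
        figsize = (panel_size[0] * n_genes, panel_size[1])
    fig, axes = plt.subplots(1, n_genes, figsize=figsize, squeeze=False)
    return fig, axes.ravel()


fig, axes = make_panels(1)
for ax in axes:  # works identically for one panel or many
    ax.set_ylim(0, 10)
```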
Output interpretation#
Each panel shows the distribution of expression across groups.
If post-hoc is enabled, brackets indicate significant pairwise differences:
computed with pairwise_posthoc(…)
corrected with Benjamini–Hochberg (BH/FDR)
displayed for comparisons with qval <= posthoc_alpha (up to max_brackets)
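The display rule (qval <= posthoc_alpha, capped at max_brackets) amounts to a filter on the post-hoc table. The sketch below shows one way to express it with pandas; select_brackets, the group1/group2 columns and the example values are hypothetical, and ranking by smallest qval first is taken from the "most significant comparisons" wording earlier on this page.

```python
import pandas as pd


def select_brackets(post, alpha=0.05, max_brackets=6):
    """Keep significant comparisons, most significant first, up to a cap."""
    sig = post[post["qval"] <= alpha].sort_values("qval")
    return sig.head(max_brackets).reset_index(drop=True)


# Made-up post-hoc table for illustration only
post = pd.DataFrame({
    "group1": ["A", "A", "B", "A"],
    "group2": ["B", "C", "C", "D"],
    "qval": [0.20, 0.001, 0.03, 0.05],
})
shown = select_brackets(post, alpha=0.05, max_brackets=2)
```

Here three comparisons pass the threshold (0.05 is included because the rule is "less than or equal"), but only the two with the smallest qval are kept.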
Notes / tips#
Use a normalized expression layer (e.g. “log1p_cpm”) so that values are comparable across samples.
For many categories, consider setting max_brackets lower (e.g. 3–5) to keep the plot readable.
For a comparison that is robust to outliers, prefer posthoc_method="mwu".
Examples#
Single gene
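import bullkpy as bk

# adata: an AnnData object with a "log1p_cpm" layer and a "Subtype" column in .obs
fig, axes = bk.pl.gene_association(
    adata,
    gene="DLL3",
    groupby="Subtype",
    layer="log1p_cpm",
    kind="violin",
)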
Multiple genes in one row
fig, axes = bk.pl.gene_association(
    adata,
    gene=["ASCL1", "NEUROD1", "POU2F3", "YAP1"],
    groupby="Subtype",
    layer="log1p_cpm",
    panel_size=(4.0, 3.2),
)
Enforce category order + Welch t-test post-hoc
fig, axes = bk.pl.gene_association(
    adata,
    gene="SOX10",
    groupby="Subtype",
    order=["Luminal", "Basal", "NE-like"],
    posthoc_method="ttest",
    posthoc_alpha=0.01,
    max_brackets=4,
)