Rank genes groups dotplot#
- bullkpy.pl.rank_genes_groups_dotplot(adata, *, groupby, groups=None, key='rank_genes_groups', n_genes=5, sort_by='scores', unique=True, use_abs=False, values_to_plot='expression', layer='log1p_cpm', fraction_layer='counts', expr_threshold=0.0, min_in_group_fraction=None, max_in_group_fraction=None, standard_scale='auto', swap_axes=True, dendrogram_top=True, dendrogram_rows=False, row_dendrogram_position='right', row_spacing=0.75, cmap='Reds', save=None, show=True)[source]#
Scanpy-like dotplot of ranked genes per group.
- Enhancements:
values_to_plot: - “expression” -> dot color = mean expression per group (your dotplot default) - “logfoldchanges” -> dot color = log2FC from rank_genes_groups
min/max fraction filters (computed from fraction_layer + expr_threshold)
standard_scale=”auto” -> uses “var” for expression mode (Scanpy-like)
Dotplot for visualizing top ranked genes per group from
adata.uns[key] (typically created by rank_genes_groups). This is a convenience
wrapper that:
selects the top
n_genesper group from the ranking table,optionally filters genes by within-group detection fraction, and
calls
dotplot()to render a multi-group gene summary.
It supports two color modes:
expression (mean expression per group; classic Scanpy-style dotplot).
logfoldchanges (dot color encodes DE log2FC instead of expression).
What it does.#
Read ranking results. Loads a long-form table via:
ank_genes_groups_df_all(adata, key=key, groups=groups, sort_by=sort_by).
Expected columns include at least:
group, gene, scores, log2FC (and often pval, qval).
Rank genes per group.
Default ranking uses descending scores.
If use_abs=True, ranks by abs(scores).
Optional fraction-based gene filtering.
If min_in_group_fraction and/or max_in_group_fraction are set:
It computes, for candidate genes, the fraction of samples within each group with expression > expr_threshold.
Fraction is computed from: fraction_layer if present in adata.layers, else falls back to layer or adata.X.
Rows failing the fraction constraints are removed before selecting top genes.
This helps remove:
genes expressed in too few samples (dropouts / unstable markers)
genes expressed in almost all samples (uninformative housekeeping-like).
Select top genes per group.
Keeps the first n_genes per group after sorting (and filtering).
If unique=True, the same gene will not be reused across multiple groups (first group “claims” it).
Render a dotplot (two modes).
A) values_to_plot=”expression” (default): Calls dotplot() on the original adata:
Dot color = mean expression per group (from layer)
Dot size = fraction of samples expressing the gene (from fraction_layer + expr_threshold)
standard_scale=”auto” maps to “var” (Scanpy-like) for expression mode.
B) values_to_plot=”logfoldchanges”: Constructs a temporary AnnData where:
ad_tmp.X[i, gene] = log2FC(group_of_sample_i, gene) (constant within each group).
Then calls dotplot() so that:
Dot color behaves like “mean value per group” → equals log2FC
Dot size still uses detection fraction from the original counts/frac layer.
This keeps the size meaningful (detection) while coloring by effect size.
Parameters#
Inputs / selection#
groupby (str, required).
adata.obs[groupby] defines the groups (clusters / conditions) used for plotting
and for matching log2FC values.
groups (Sequence[str] | None). Subset of groups to include. None uses what’s in the ranking table.
key (str, default “rank_genes_groups”).
Location of ranking results in adata.uns[key].
n_genes (int, default 5).
Number of genes selected per group.
sort_by (“scores” | “logfoldchanges” | “pvals_adj” | “pvals”).
Passed through to rank_genes_groups_df_all(…) for initial ordering.
unique (bool, default True). If True, prevents the same gene from appearing in multiple group panels.
use_abs (bool, default False).
If True, ranks by abs(scores) rather than signed scores.
Color mode#
values_to_plot (“expression” | “logfoldchanges”, default “expression”). Controls what drives dot color:
“expression”: mean expression per group
“logfoldchanges”: log2FC from ranking results
Expression / fraction settings (used by dotplot)#
layer (str | None, default “log1p_cpm”).
Expression layer used for mean expression when values_to_plot=”expression”.
fraction_layer (str | None, default “counts”).
Layer used to compute detection fraction (dot size).
Falls back to layer or adata.X if missing.
expr_threshold (float, default 0.0). A sample “expresses” the gene if value > expr_threshold in fraction_layer (or fallback).
Fraction filters (gene selection)#
min_in_group_fraction (float | None). Keep genes expressed in at least this fraction of samples within the group.
max_in_group_fraction (float | None). Keep genes expressed in at most this fraction of samples within the group.
If filtering removes all candidates, a ValueError is raised advising to relax thresholds.
Scaling#
standard_scale (“auto” | “var” | “group” | None, default “auto”). Passed to dotplot() as standard_scale:
“auto” → “var” when plotting expression, else None for logFC mode
“var” → z-score per gene across groups
“group”→ z-score per group across genes
None → no scaling
Layout (forwarded to dotplot)#
swap_axes (bool, default True). Scanpy-like orientation: genes on y-axis, groups on x-axis.
dendrogram_top (bool, default True). Cluster columns (typically groups) and show top dendrogram.
dendrogram_rows (bool, default False).
Cluster rows (typically genes) and show row dendrogram.
row_dendrogram_position (“right” | “left” | “outer_left”).
Placement for row dendrogram if enabled.
row_spacing (float, default 0.75).
Compresses/expands vertical spacing between rows (useful for long gene lists).
cmap (str, default “Reds”).
Colormap for dot color (expression or log2FC).
Output#
save (str | Path | None). Save the plot via _savefig.
show (bool, default True).
Whether to display the figure.
Returns#
Returns whatever dotplot() returns: (fig, ax). (Exact ax type depends on your dotplot implementation.).
Notes#
Requires matching group labels: logFC mode assumes the group labels in the ranking table correspond to adata.obs[groupby] categories.
fraction_layer should be comparable across samples (counts or normalized counts). If using raw counts, expr_threshold=0 is typical.
In logfoldchanges mode, genes without a stored log2FC for a given group are assigned 0.0 for that group (so they appear neutral).
Examples#
Classic Scanpy-style: expression colored, fraction sized
bk.pl.rank_genes_groups_dotplot(
adata,
groupby="leiden",
n_genes=5,
)
Color by log2FC instead of expression
bk.pl.rank_genes_groups_dotplot(
adata,
groupby="Subtype",
n_genes=8,
values_to_plot="logfoldchanges",
cmap="RdBu_r",
)
Filter to robust markers (expressed in ≥20% of samples in-group)
bk.pl.rank_genes_groups_dotplot(
adata,
groupby="leiden",
n_genes=6,
min_in_group_fraction=0.20,
fraction_layer="counts",
expr_threshold=0.0,
)
Remove “too ubiquitous” genes (expressed in >95% within group)
bk.pl.rank_genes_groups_dotplot(
adata,
groupby="leiden",
n_genes=6,
min_in_group_fraction=0.10,
max_in_group_fraction=0.95,
)