QC metrics#

bullkpy.pl.qc_metrics(adata, *, color='pct_counts_mt', vars_to_plot=('total_counts', 'n_genes_detected', 'pct_counts_mt', 'pct_counts_ribo'), log1p_total_counts=True, log1p_n_genes=False, point_size=20.0, alpha=0.8, figsize=(10, 7), save=None, show=True)

Plot bulk RNA-seq QC metrics (robust to missing columns).

  • If total_counts + n_genes_detected exist: scatter (library size vs detected genes)

  • Otherwise: skip scatter and show histograms only

Plot a compact set of bulk RNA-seq QC diagnostics from columns in adata.obs. The function is robust to missing QC columns: it will plot what is available and warn about missing variables.

Figure: Example QC metrics plot

What it does#

Depending on what exists in adata.obs, this function produces:

  1. Scatter plot (if possible): library size vs detected genes.

  • x: total_counts (optionally log1p)

  • y: n_genes_detected (optionally log1p)

  • optional point coloring by an obs column (color=).

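  2. Histograms (whenever at least one variable exists): up to four QC distributions from vars_to_plot. Below, vars_use refers to the entries of vars_to_plot that are present in adata.obs, in the same order: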

  • histogram of the 1st variable (vars_use[0])

  • histogram of the 2nd variable (vars_use[1])

  • optional overlay histogram of the 3rd variable on the first histogram (twin y-axis)

  • optional overlay histogram of the 4th variable on the second histogram (twin y-axis).

If either total_counts or n_genes_detected is missing, the scatter is skipped and only histograms are shown.
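The pairing of variables to histogram panels described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name assign_histograms is hypothetical, and it assumes that overlays are chosen by position in the filtered list (vars_use), as the description suggests.

```python
def assign_histograms(obs_columns, vars_to_plot):
    """Map available QC variables to two histogram panels (hypothetical helper).

    Assumption: vars_use is vars_to_plot filtered to columns that exist,
    and the 3rd/4th entries overlay the 1st/2nd histograms respectively.
    """
    available = set(obs_columns)
    vars_use = [v for v in vars_to_plot if v in available]

    panels = []
    for i in range(min(2, len(vars_use))):
        # The overlay for panel i is the variable two positions later, if any.
        overlay = vars_use[i + 2] if len(vars_use) > i + 2 else None
        panels.append({"primary": vars_use[i], "overlay": overlay})
    return panels


defaults = ("total_counts", "n_genes_detected", "pct_counts_mt", "pct_counts_ribo")
for panel in assign_histograms(defaults, defaults):
    print(panel)
```

Under this assumption, dropping a column from adata.obs shifts the later variables forward, so the overlay shown on each histogram can change when a metric is missing.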

Requirements#

adata.obs must contain at least one of the entries in vars_to_plot.

For the scatter panel, adata.obs must include both:

  • total_counts

  • n_genes_detected.

If none of vars_to_plot exists, the function raises a KeyError with a suggestion to run a QC computation step first (e.g., bk.pp.qc_metrics).
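A pre-flight check that mirrors these requirements might look like the sketch below. The function name check_qc_columns and the exact error message are assumptions made for illustration; only the rules themselves (at least one requested column present, both total_counts and n_genes_detected needed for the scatter) come from the text above.

```python
def check_qc_columns(obs_columns, vars_to_plot):
    """Return (present_vars, has_scatter) or raise KeyError (hypothetical helper)."""
    available = set(obs_columns)
    present = [v for v in vars_to_plot if v in available]
    if not present:
        # Mirrors the documented behavior: fail early with a hint.
        raise KeyError(
            "None of the requested QC columns were found in adata.obs; "
            "consider running a QC step first (e.g., bk.pp.qc_metrics)."
        )
    # The scatter panel needs both columns, regardless of vars_to_plot.
    has_scatter = {"total_counts", "n_genes_detected"} <= available
    return present, has_scatter


present, has_scatter = check_qc_columns(
    ["total_counts", "pct_counts_mt"],
    ("total_counts", "n_genes_detected", "pct_counts_mt"),
)
print(present, has_scatter)
```

With a real AnnData object, list(adata.obs.columns) would be passed as obs_columns.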

Parameters#

Core inputs#

adata (anndata.AnnData). Annotated data matrix with QC metrics stored in adata.obs.

Plot selection#

vars_to_plot (Sequence[str]). Ordered list of QC columns to try plotting. The function will use only those that actually exist in adata.obs, in the same order.

Coloring#

color (str | None, default "pct_counts_mt"). Optional adata.obs column used to color points in the scatter plot.

  • If missing, coloring is disabled with a warning.

  • If categorical/object, values are converted to category codes (numeric coloring).
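The conversion of a non-numeric column to category codes can be illustrated with pandas. This is a minimal sketch, not the function's actual code: the helper color_values is hypothetical, and it simplifies the rule to "any non-numeric column is converted to codes".

```python
import warnings

import pandas as pd


def color_values(obs, color):
    """Return numeric values for scatter coloring, or None (hypothetical helper)."""
    if color is None:
        return None
    if color not in obs.columns:
        # Documented behavior: a missing column disables coloring with a warning.
        warnings.warn(f"Column {color!r} not found in obs; coloring disabled.")
        return None
    col = obs[color]
    if not pd.api.types.is_numeric_dtype(col):
        # Categories are sorted, then each value is replaced by its integer code.
        return col.astype("category").cat.codes.to_numpy()
    return col.to_numpy()


obs = pd.DataFrame({"Subtype": ["B", "A", "B"], "pct_counts_mt": [1.0, 2.0, 3.0]})
print(color_values(obs, "Subtype"))
print(color_values(obs, "pct_counts_mt"))
```

Because codes are plain integers, the resulting colors distinguish groups but do not label them. The group names would have to be recovered from the original column.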

Scatter transforms#

log1p_total_counts (bool, default True).
If True, uses log1p(total_counts) on the x-axis of the scatter.

log1p_n_genes (bool, default False).
If True, uses log1p(n_genes_detected) on the y-axis of the scatter.
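To make the effect of the two flags concrete, the sketch below applies log1p, that is log(1 + x), to the scatter coordinates using only the standard library. The helper scatter_coords is hypothetical and the input values are made up for illustration.

```python
import math


def scatter_coords(total_counts, n_genes,
                   log1p_total_counts=True, log1p_n_genes=False):
    """Return (x, y) lists after the optional log1p transforms (hypothetical helper)."""
    x = [math.log1p(v) for v in total_counts] if log1p_total_counts else list(total_counts)
    y = [math.log1p(v) for v in n_genes] if log1p_n_genes else list(n_genes)
    return x, y


# Illustrative values: library sizes spanning several orders of magnitude.
x, y = scatter_coords([0, 999, 9_999_999], [0, 850, 18_000])
print(x)  # log1p keeps zero at zero and compresses large values
print(y)  # unchanged with the default log1p_n_genes=False
```

log1p is defined at zero, so samples with no counts remain plottable, which a plain logarithm would not allow.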

Styling#

point_size (float, default 20.0).
Marker size for scatter points.

alpha (float, default 0.8).
Transparency for scatter points and histograms.

figsize (tuple[float, float], default (10, 7)).
Overall figure size in inches.

Output#

save (str | Path | None, default None).
If provided, saves the figure to this path via _savefig.

show (bool, default True).
If True, calls plt.show().

Layout behavior#

If scatter is available (total_counts and n_genes_detected exist).
A 2×2 grid where:

  • Left column: one large scatter axis spanning both rows

  • Right column: two histogram axes stacked vertically.

If scatter is not available.
A 2×2 grid where:

  • Top row: histogram axis spanning both columns

  • Bottom row: histogram axis spanning both columns.

A warning is emitted indicating that the scatter was skipped.
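Both layouts can be reproduced with a matplotlib GridSpec. The sketch below is one way to build such a grid and is not necessarily how the function constructs it. The helper make_qc_layout is hypothetical, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def make_qc_layout(has_scatter, figsize=(10, 7)):
    """Create the 2x2-grid layout described above (hypothetical helper)."""
    fig = plt.figure(figsize=figsize)
    gs = fig.add_gridspec(2, 2)
    if has_scatter:
        ax_scatter = fig.add_subplot(gs[:, 0])  # left column, both rows
        ax_h1 = fig.add_subplot(gs[0, 1])       # top-right histogram
        ax_h2 = fig.add_subplot(gs[1, 1])       # bottom-right histogram
        axes = [ax_scatter, ax_h1, ax_h2]
    else:
        ax_h1 = fig.add_subplot(gs[0, :])       # top row, both columns
        ax_h2 = fig.add_subplot(gs[1, :])       # bottom row, both columns
        axes = [ax_h1, ax_h2]
    return fig, axes


fig, axes = make_qc_layout(has_scatter=True)
print(len(axes))
plt.close(fig)
```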

Returns#

  • fig (matplotlib.figure.Figure).

  • axes (np.ndarray of matplotlib.axes.Axes). Array containing the axes that were created:

      • [ax_scatter, ax_h1, ax_h2] when scatter exists

      • [ax_h1, ax_h2] when scatter is skipped (the function filters out None axes before returning)
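Because the returned array has a different length in the two cases, calling code that customizes individual panels needs to branch on it. The helper below is a hypothetical convenience, not part of the library. It assumes only the two shapes listed above and raises for anything else.

```python
def split_axes(axes):
    """Return (ax_scatter, ax_h1, ax_h2), with ax_scatter=None if it was skipped.

    Hypothetical helper; assumes the array holds either 3 or 2 axes as documented.
    """
    axes = list(axes)
    if len(axes) == 3:
        ax_scatter, ax_h1, ax_h2 = axes
    elif len(axes) == 2:
        ax_scatter = None
        ax_h1, ax_h2 = axes
    else:
        raise ValueError(f"Expected 2 or 3 axes, got {len(axes)}")
    return ax_scatter, ax_h1, ax_h2


# Stand-in strings are used here in place of real matplotlib axes.
print(split_axes(["scatter", "hist1", "hist2"]))
print(split_axes(["hist1", "hist2"]))
```

In practice the input would be the axes value returned by the plotting call, for example to set a title on ax_h1 or draw a threshold line on ax_scatter when it is not None.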

Notes and tips#

This function is designed for quick QC inspection, not filtering.
Use the scatter + histograms to choose thresholds (e.g., minimum library size).
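Once a candidate threshold has been read off the plots, it can help to count how many samples it would exclude before any filtering is applied. The sketch below does this in plain Python. The helper name and the library sizes are made up for illustration; with real data the values would come from adata.obs["total_counts"].

```python
def samples_below(values, threshold):
    """Return the indices of samples whose value is below the threshold."""
    return [i for i, v in enumerate(values) if v < threshold]


# Illustrative library sizes only, not real data.
total_counts = [12_000_000, 9_500_000, 150_000, 11_200_000, 80_000]

for min_counts in (100_000, 1_000_000):
    flagged = samples_below(total_counts, min_counts)
    print(f"min_counts={min_counts}: {len(flagged)} sample(s) flagged, indices {flagged}")
```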

Typical QC columns in adata.obs for bulk RNA-seq include:

  • total_counts

  • n_genes_detected

  • pct_counts_mt

  • pct_counts_ribo

Examples#

Basic usage

import bullkpy as bk
bk.pl.qc_metrics(adata)

Color by subtype (if present) and keep the gene axis untransformed (log1p_n_genes=False is already the default, shown here explicitly)

bk.pl.qc_metrics(
    adata,
    color="Subtype",
    log1p_n_genes=False,
)

Plot a custom set of QC variables

bk.pl.qc_metrics(
    adata,
    vars_to_plot=("total_counts", "pct_counts_mt", "pct_counts_ribo"),
)

Save to file

bk.pl.qc_metrics(adata, save="qc_metrics.png", show=False)