Genes vs. mitochondrial fraction#

bullkpy.pl.genes_vs_mt_fraction(adata, *, x='pct_counts_mt', y='n_genes_detected', groupby=None, min_mt=None, max_mt=None, min_genes=None, max_genes=None, logx=False, logy=True, figsize=(5.5, 4.5), save=None, show=True)

Scatter plot for QC inspection: mitochondrial fraction (x-axis) vs. number of detected genes (y-axis), with optional thresholds.

This plot complements library size–based QC by focusing on the relationship between detected gene count and mitochondrial fraction, which is particularly useful for identifying low-quality or degraded samples.

What it does#

Plots each sample as a point:

  • x-axis: mitochondrial fraction (default: pct_counts_mt)

  • y-axis: number of detected genes (default: n_genes_detected)

Optionally:

  • Colors samples by a categorical variable

  • Applies QC thresholds on mitochondrial fraction and/or gene count

  • Highlights QC-failing samples

Automatically reports the number of QC failures in the title when thresholds are provided.

This view is especially useful when:

  • Samples with high mt fraction show reduced gene complexity

  • Library-size effects have already been inspected separately

Requirements#

adata.obs must contain:

  • the column specified by x (default: "pct_counts_mt")

  • the column specified by y (default: "n_genes_detected")

If groupby is provided, it must also exist in adata.obs.
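
A quick pre-flight check can catch a missing column before plotting. The helper below is a minimal sketch and not part of bullkpy: missing_obs_columns is a hypothetical name, and the only assumption is that the column names of adata.obs are available as an iterable of strings (for example adata.obs.columns).

```python
def missing_obs_columns(obs_columns, x="pct_counts_mt", y="n_genes_detected", groupby=None):
    """Return the required column names that are absent from obs_columns.

    Hypothetical helper for illustration; not part of bullkpy.
    """
    required = [x, y]
    if groupby is not None:
        required.append(groupby)
    available = set(obs_columns)
    return [name for name in required if name not in available]


# Stand-in for list(adata.obs.columns); the names are made up for the example.
columns = ["pct_counts_mt", "total_counts", "Subtype"]
print(missing_obs_columns(columns, groupby="Subtype"))  # prints ['n_genes_detected']
```

With a real object this would be called as missing_obs_columns(adata.obs.columns, groupby="Subtype"); an empty list means all required columns are present.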

Parameters#

Axes#

x (str, default "pct_counts_mt"). Column in adata.obs used for the x-axis (mitochondrial fraction).

y (str, default "n_genes_detected"). Column in adata.obs used for the y-axis (gene complexity).

Grouping#

groupby (str | None, default None). Optional categorical column in adata.obs used to color samples (e.g. batch, subtype).

QC thresholds#

Samples are considered QC-pass only if all specified conditions are met.

min_mt (float | None, default None). Minimum allowed mitochondrial fraction (rarely used).
max_mt (float | None, default None). Maximum allowed mitochondrial fraction (commonly used).
min_genes (float | None, default None). Minimum allowed number of detected genes.
max_genes (float | None, default None). Maximum allowed number of detected genes (occasionally useful to remove extreme outliers).

When any threshold is provided:

  • QC-failing samples are visually highlighted

  • The plot title includes the number of failing samples
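
The pass/fail rule described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not bullkpy's implementation: qc_pass is a hypothetical helper, and it treats the thresholds as inclusive bounds, which this page does not specify.

```python
def qc_pass(mt, genes, *, min_mt=None, max_mt=None, min_genes=None, max_genes=None):
    """Return one bool per sample: True if every provided threshold is met.

    Hypothetical helper; bounds are assumed to be inclusive. A threshold
    left as None is not checked.
    """
    def within(value, low, high):
        return (low is None or value >= low) and (high is None or value <= high)

    return [
        within(m, min_mt, max_mt) and within(g, min_genes, max_genes)
        for m, g in zip(mt, genes)
    ]


# Made-up values standing in for adata.obs["pct_counts_mt"] and
# adata.obs["n_genes_detected"].
mt_pct = [2.0, 12.5, 4.0, 9.0]
n_genes = [15000, 14000, 8000, 13000]

passed = qc_pass(mt_pct, n_genes, max_mt=10.0, min_genes=12000)
n_fail = passed.count(False)
print(passed, n_fail)
```

Here the second sample fails on mitochondrial fraction and the third on gene count, so two samples would be counted as QC failures.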

Scaling#

logx (bool, default False). Apply log scaling to the x-axis (usually unnecessary for percentages).
logy (bool, default True). Apply log scaling to the y-axis (recommended for gene counts).

Figure and output#

figsize (tuple[float, float], default (5.5, 4.5)). Figure size in inches.
save (str | Path | None, default None). If provided, saves the figure to this path via _savefig.
show (bool, default True). If True, calls plt.show().

Returns#

  • fig (matplotlib.figure.Figure). The created figure.

  • ax (matplotlib.axes.Axes). The scatter plot axis.

Interpretation tips#

High mt fraction + low gene count.
Strong indicator of degraded RNA or failed library prep.

Low mt fraction + high gene count.
Typically high-quality samples.

Diagonal trend.
A negative association between mt fraction and gene count is expected in many datasets; samples that deviate strongly from it are worth inspecting.
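
As a rough illustration of the first two tips, the sketch below labels a sample from its two QC values. The function name, the label strings, and the cutoffs are hypothetical and chosen only for this example; combinations that the tips do not cover are returned as "unclassified" rather than guessed.

```python
def interpret_sample(mt, genes, *, max_mt, min_genes):
    """Map one sample to the interpretation tips, given two cutoffs.

    Hypothetical helper; max_mt and min_genes are user-chosen cutoffs.
    """
    high_mt = mt > max_mt
    low_genes = genes < min_genes
    if high_mt and low_genes:
        return "possible degraded RNA or failed library prep"
    if not high_mt and not low_genes:
        return "typically high quality"
    return "unclassified"


# Made-up (mt fraction, detected genes) pairs.
for mt, genes in [(15.0, 6000), (3.0, 16000), (15.0, 16000)]:
    print(mt, genes, interpret_sample(mt, genes, max_mt=10.0, min_genes=12000))
```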

This plot is best interpreted together with:

  • library_size_vs_genes

  • mt_fraction_vs_counts

  • qc_metrics

Examples#

Basic QC plot

import bullkpy as bk

bk.pl.genes_vs_mt_fraction(adata)

Apply mitochondrial and gene-count thresholds

bk.pl.genes_vs_mt_fraction(
    adata,
    max_mt=10.0,
    min_genes=12000,
)

Color by subtype

bk.pl.genes_vs_mt_fraction(
    adata,
    groupby="Subtype",
    max_mt=8.0,
)

Save figure without displaying

bk.pl.genes_vs_mt_fraction(
    adata,
    max_mt=7.5,
    save="genes_vs_mt_qc.png",
    show=False,
)