Correlation heatmap

bullkpy.pl.corr_heatmap(adata, *, layer='log1p_cpm', method='pearson', use='samples', groupby=None, groups=None, col_colors=None, cmap='vlag', center=0.0, vmin=None, vmax=None, figsize=None, show_labels=False, dendrogram=True, add_col_color_legend=True, legend_title=None, legend_fontsize=8, legend_title_fontsize=9, legend_max_cols=1, remove_cbar_label=True, dendrogram_gap=0.004, right_margin=0.82, save=None, show=True)

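Correlation heatmap for sample–sample similarity (QC) or gene–gene similarity (use="genes").

Layout options:

remove_cbar_label (default: True) removes the vertical colorbar label
dendrogram_gap pulls the left dendrogram closer to the heatmap
add_col_color_legend (default: True) adds a legend for the col_colors categories on the right; legend_title, legend_fontsize, legend_title_fontsize and legend_max_cols control its appearance
the dendrogram=False branch works correctly (it no longer references an undefined cg)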

This function computes a correlation matrix from an expression matrix (typically a log-normalized layer) and visualizes it as a heatmap. When use="samples" (default), it is a convenient QC plot to detect:

batch structure
outlier samples
mislabeled groups
strong technical gradients
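To put a number on the "outlier samples" check, each sample's typical correlation to the others can be compared with the rest of the cohort. The sketch below is a heuristic, not a bullkpy function: flag_low_correlation_samples and the 3-MAD threshold are illustrative assumptions, and the input is any square sample × sample correlation matrix as a NumPy array.

```python
import numpy as np


def flag_low_correlation_samples(corr, n_mad=3.0):
    """Flag samples whose median correlation to the other samples is unusually low.

    Hypothetical helper (not part of bullkpy).
    corr:  square sample x sample correlation matrix.
    n_mad: number of median absolute deviations below the cohort median
           that counts as "low" (an arbitrary, illustrative threshold).
    """
    corr = np.asarray(corr, dtype=float)
    n = corr.shape[0]
    # Drop the diagonal (self-correlation is always 1); row i keeps its n-1 partners.
    off_diag = corr[~np.eye(n, dtype=bool)].reshape(n, n - 1)
    per_sample = np.median(off_diag, axis=1)
    center = np.median(per_sample)
    mad = np.median(np.abs(per_sample - center))
    # If every sample is equally similar, there is nothing to flag.
    if mad == 0:
        return np.zeros(n, dtype=bool)
    return per_sample < center - n_mad * mad


# Three similar samples and one that correlates poorly with all of them.
corr = np.array([
    [1.00, 0.95, 0.96, 0.40],
    [0.95, 1.00, 0.94, 0.42],
    [0.96, 0.94, 1.00, 0.38],
    [0.40, 0.42, 0.38, 1.00],
])
flags = flag_low_correlation_samples(corr)
```

A sample flagged this way is a candidate for inspection (mislabeling, a failed library, a strong batch effect), not automatically something to drop.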

When use="genes", it produces a gene–gene correlation heatmap (useful for small gene panels).

What it does

Extracts a matrix from adata.layers[layer] (or adata.X if layer=None)
Computes a correlation matrix using Pearson or Spearman
Optionally subsets and orders samples using groupby / groups
Optionally annotates columns/rows with color bars from one or more obs keys (col_colors)
Plots either a clustered heatmap (seaborn clustermap) if dendrogram=True, or a standard heatmap if dendrogram=False

Dependency: requires seaborn for plotting.
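The correlation step can be sketched with NumPy and pandas. This is a simplified illustration, not bullkpy's implementation: the function name correlation_matrix is hypothetical, and it assumes the AnnData layout (rows = samples, columns = genes), so sample–sample correlation means correlating rows and gene–gene correlation means correlating columns.

```python
import numpy as np
import pandas as pd


def correlation_matrix(X, names, method="pearson", use="samples"):
    """Hypothetical sketch of the correlation step (not bullkpy's code).

    X:     samples x genes matrix, e.g. adata.layers["log1p_cpm"] or adata.X.
    names: sample names (use="samples") or gene names (use="genes").
    """
    # Densify scipy sparse matrices, which AnnData layers often are.
    if hasattr(X, "toarray"):
        X = X.toarray()
    X = np.asarray(X, dtype=float)
    if use == "samples":
        # DataFrame.corr correlates columns, so put samples in the columns.
        df = pd.DataFrame(X.T, columns=names)
    elif use == "genes":
        df = pd.DataFrame(X, columns=names)
    else:
        raise ValueError("use must be 'samples' or 'genes'")
    # method="spearman" ranks each column (ties averaged) before correlating.
    return df.corr(method=method)
```

With an AnnData object this would be called as correlation_matrix(adata.layers["log1p_cpm"], list(adata.obs_names)) for samples, or with list(adata.var_names) and use="genes".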

Parameters

Core

adata
AnnData object containing expression matrix and annotations.

layer (default: "log1p_cpm")
Layer to use for correlations; None uses adata.X.
Use a log-scale normalized layer: on raw counts, correlations are dominated by the most highly expressed genes.

method (default: "pearson")
Correlation metric:

"pearson" – linear correlation (fast)
"spearman" – rank correlation (more robust to non-linearity / outliers)
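The difference between the two metrics shows up on a monotonic but non-linear relationship and on a single extreme value. This NumPy-only sketch computes Spearman as the Pearson correlation of ranks; the double-argsort ranking it uses assumes there are no ties.

```python
import numpy as np


def spearman(x, y):
    """Spearman correlation as Pearson correlation of ranks (assumes no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]


x = np.arange(1, 11, dtype=float)

# Monotonic but non-linear: Pearson is below 1, Spearman is exactly 1.
y = x ** 3
pearson_r = np.corrcoef(x, y)[0, 1]
spearman_r = spearman(x, y)

# One extreme value: Pearson drops sharply, Spearman is unaffected
# because the ordering of the values does not change.
y_outlier = x.copy()
y_outlier[-1] = 1000.0
pearson_outlier = np.corrcoef(x, y_outlier)[0, 1]
spearman_outlier = spearman(x, y_outlier)
```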

use (default: "samples")
Which correlation matrix to compute:

"samples": sample × sample correlation (QC use-case)
"genes": gene × gene correlation (for smaller panels).

Subsetting / ordering (samples only)

groupby
Categorical obs key used to subset and order samples.

groups
Optional list of groups to keep (and define ordering).
If provided, only those groups are used and ordered as given.
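A minimal pandas sketch of the subset-and-order behaviour described for groupby / groups, under stated assumptions: subset_and_order is a hypothetical helper, obs is a DataFrame like adata.obs, and when groups is None it keeps every group in sorted order (bullkpy may instead follow the categorical order of the obs column).

```python
import pandas as pd


def subset_and_order(obs, groupby, groups=None):
    """Return sample names kept and ordered by obs[groupby] (hypothetical helper)."""
    labels = obs[groupby].astype(str)
    if groups is None:
        # Assumption: without `groups`, keep all groups in sorted order.
        groups = sorted(labels.unique())
    groups = [str(g) for g in groups]
    keep = labels[labels.isin(groups)]
    # Position of each sample's group in `groups`; a stable sort keeps
    # samples within the same group in their original order.
    rank = keep.map({g: i for i, g in enumerate(groups)})
    return list(rank.sort_values(kind="mergesort").index)


obs = pd.DataFrame(
    {"condition": ["treated", "control", "treated", "other", "control"]},
    index=["s1", "s2", "s3", "s4", "s5"],
)
order = subset_and_order(obs, "condition", ["control", "treated"])
```

The returned names can slice AnnData in that order, e.g. adata[subset_and_order(adata.obs, "condition", ["control", "treated"])].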

Annotations (samples only)

col_colors
One or more obs keys used to draw color annotations aligned to samples.
Values are converted to categorical strings and mapped to colors automatically.
Examples:

col_colors="batch"
col_colors=["batch", "sex", "tumor_type"]
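One way to turn obs keys into color annotations is sketched below. The palette and the helper annotation_colors are assumptions for illustration, not bullkpy's internals; the returned DataFrame (one column per key, one row per sample) is a shape that seaborn's clustermap accepts for col_colors, and the per-key lookup tables hold what a category legend needs.

```python
import pandas as pd

# Illustrative palette (assumption); bullkpy's own colors may differ.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
           "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf"]


def annotation_colors(obs, keys):
    """Map each obs key to a column of colors (hypothetical helper).

    Returns (colors, lut): colors is a samples x keys DataFrame of hex colors;
    lut maps each key to its {category: color} dictionary.
    """
    if isinstance(keys, str):
        keys = [keys]
    columns, lut = {}, {}
    for key in keys:
        # Convert values to categorical strings, as described above.
        values = obs[key].astype(str)
        categories = sorted(values.unique())
        # Cycle through the palette if there are more categories than colors.
        mapping = {c: PALETTE[i % len(PALETTE)] for i, c in enumerate(categories)}
        columns[key] = values.map(mapping)
        lut[key] = mapping
    return pd.DataFrame(columns, index=obs.index), lut


obs = pd.DataFrame(
    {"batch": ["b1", "b2", "b1"], "sex": ["F", "M", "M"]},
    index=["s1", "s2", "s3"],
)
colors, lut = annotation_colors(obs, ["batch", "sex"])
```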

Plot appearance

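cmap (default: "vlag"), center (default: 0.0), vmin, vmax
Colormap and color scaling of the heatmap: center is the value placed at the middle of the diverging colormap; vmin / vmax fix the color limits (None: inferred from the data).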

figsize
Figure size in inches. If None, chosen automatically based on matrix size.

show_labels
Whether to show row/column labels (off by default for readability).

dendrogram (default: True)
If True, performs hierarchical clustering and shows dendrograms.
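For intuition about the resulting order, this sketch reproduces seaborn clustermap's default clustering (average linkage on Euclidean distances between rows of the matrix) with SciPy. It assumes bullkpy passes the correlation matrix to clustermap without overriding method or metric, which is not stated here; clustered_order is a hypothetical helper.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage


def clustered_order(corr):
    """Row order after hierarchical clustering (hypothetical helper)."""
    corr = np.asarray(corr, dtype=float)
    # Each row of the correlation matrix is treated as an observation,
    # mirroring clustermap's defaults: method="average", metric="euclidean".
    Z = linkage(corr, method="average", metric="euclidean")
    # leaves_list gives the left-to-right leaf order of the dendrogram.
    return leaves_list(Z)


# Samples 0 and 2 resemble each other, as do samples 1 and 3.
corr = np.array([
    [1.00, 0.10, 0.90, 0.20],
    [0.10, 1.00, 0.15, 0.95],
    [0.90, 0.15, 1.00, 0.10],
    [0.20, 0.95, 0.10, 1.00],
])
order = clustered_order(corr)
```

Similar samples end up adjacent in the order, which is what makes blocks (batches, groups) visible in the clustered heatmap.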

save
Path to save the plot.

show
Whether to display the plot.

Returns

When dendrogram=True (default):

cg — seaborn ClusterGrid object (contains cg.fig, cg.ax_heatmap, etc.)

When dendrogram=False:

(fig, ax) — matplotlib Figure and Axes

Tip: if you want a consistent return type, keep dendrogram=True.
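If both return types must be handled, a small helper can normalize them to a matplotlib Figure. result_figure is a hypothetical helper, not part of bullkpy; it prefers the ClusterGrid.figure attribute (newer seaborn versions) and falls back to .fig.

```python
def result_figure(result):
    """Return the Figure from either corr_heatmap return type (hypothetical helper)."""
    # dendrogram=False returns a (fig, ax) tuple.
    if isinstance(result, tuple):
        fig, _ax = result
        return fig
    # dendrogram=True returns a seaborn ClusterGrid; newer seaborn exposes
    # `.figure`, older versions only `.fig`.
    fig = getattr(result, "figure", None)
    return fig if fig is not None else result.fig
```

Assuming the object is still returned when show=False, result_figure(bk.pl.corr_heatmap(adata, show=False)).savefig("corr.png", dpi=150) then works for either value of dendrogram.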

Examples

Sample–sample QC correlation heatmap (default)

import bullkpy as bk

bk.pl.corr_heatmap(adata)

Spearman correlation (robust to outliers)

bk.pl.corr_heatmap(adata, method="spearman")

Subset/order samples by group and annotate batch

bk.pl.corr_heatmap(
    adata,
    groupby="condition",
    groups=["control", "treated"],
    col_colors="batch",
    show_labels=False,
)

Multiple annotation bars (batch + clinical)

bk.pl.corr_heatmap(
    adata,
    col_colors=["batch", "sex", "tumor_stage"],
)

Gene–gene correlation heatmap (small gene panel)

bk.pl.corr_heatmap(
    adata,
    use="genes",
    method="pearson",
    show_labels=True,
)

For use="genes", consider filtering adata beforehand to a manageable number of genes: the matrix has one row and one column per gene, so its size grows quadratically with the panel.
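A simple way to get a manageable panel is to keep the most variable genes of the layer used for plotting. top_variable_genes is a hypothetical helper, and variance ranking is only one possible criterion (a curated marker panel is often easier to interpret).

```python
import numpy as np


def top_variable_genes(X, n_top=50):
    """Column indices of the n_top most variable genes (hypothetical helper).

    X: samples x genes matrix, e.g. a log-normalized layer.
    """
    # Densify scipy sparse matrices first.
    if hasattr(X, "toarray"):
        X = X.toarray()
    X = np.asarray(X, dtype=float)
    var = X.var(axis=0)  # per-gene variance across samples
    n_top = min(n_top, X.shape[1])
    # argsort is ascending: reverse it so the highest variance comes first.
    return np.argsort(var)[::-1][:n_top]
```

For example, idx = top_variable_genes(adata.layers["log1p_cpm"], 50) followed by bk.pl.corr_heatmap(adata[:, idx], use="genes", show_labels=True).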

Disable clustering (plain heatmap)

bk.pl.corr_heatmap(
    adata,
    dendrogram=False,
    show_labels=False,
)

Notes / recommendations

Use log-scale expression (e.g. log1p_cpm) for meaningful correlations.
Spearman can be preferable when expression distributions are heavy-tailed or have outliers.
For large datasets, correlation matrices can become visually dense; consider:
• subsetting to relevant groups
• turning labels off (show_labels=False)
• using groupby to enforce an interpretable ordering
• reducing to representative samples (or averaging within groups)
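Averaging within groups can be sketched as follows: compute the mean expression profile of each group, then correlate the group means. group_mean_correlation is a hypothetical helper operating on a samples × genes matrix; it is not a corr_heatmap option.

```python
import numpy as np
import pandas as pd


def group_mean_correlation(X, labels, method="pearson"):
    """Correlate group-average profiles instead of samples (hypothetical helper).

    X:      samples x genes matrix (e.g. a log-normalized layer).
    labels: one group label per sample, e.g. adata.obs["condition"].
    """
    df = pd.DataFrame(np.asarray(X, dtype=float))
    # Mean profile per group: groups x genes.
    means = df.groupby(np.asarray(labels)).mean()
    # Transpose so groups are columns, then correlate: groups x groups.
    return means.T.corr(method=method)


X = [[1, 2, 3], [1, 2, 3], [3, 2, 1], [3, 2, 1]]
c = group_mean_correlation(X, ["a", "a", "b", "b"])
```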