Sample correlation clustergram
bullkpy.pl.sample_correlation_clustergram(adata, *, layer='log1p_cpm', method='pearson', linkage_method='average', col_colors=None, palette='tab20', figsize=None, show_labels=False, save=None, show=True)
Sample–sample correlation clustergram for bulk expression QC and exploratory analysis.
This plot is often more interpretable than a raw distance heatmap because the heatmap shows correlations directly (range [-1, 1]), while hierarchical clustering is performed on distance = 1 − correlation.
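Take samples i and j with expression values x_{ig} over genes g = 1, …, G. The two quantities follow the standard definitions below. The Pearson form is shown; the Spearman variant applies the same formula to ranks.

```latex
r_{ij} = \frac{\sum_{g=1}^{G} (x_{ig} - \bar{x}_i)(x_{jg} - \bar{x}_j)}
              {\sqrt{\sum_{g=1}^{G} (x_{ig} - \bar{x}_i)^2}\,\sqrt{\sum_{g=1}^{G} (x_{jg} - \bar{x}_j)^2}},
\qquad
d_{ij} = 1 - r_{ij} \in [0, 2]
```

So d_ij = 0 for perfectly correlated samples, 1 for uncorrelated samples, and 2 for perfectly anti-correlated samples.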
What it does
Extracts a sample × gene matrix
X = _get_matrix(adata, layer=layer, use="samples")
Optional Spearman transformation.
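If method="spearman":
Expression values are rank-transformed.
Pearson correlation is then computed on the ranked data.
This matches the standard definition of Spearman correlation when each sample's values are ranked across genes, because the samples are the variables being correlated.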
Computes the sample–sample correlation matrix
C = np.corrcoef(X) # shape: (n_samples, n_samples)
Values range from -1 (anti-correlated) to +1 (perfectly correlated).
Builds a clustering distance
distance = 1.0 - correlation
Converted to condensed form with squareform.
Hierarchical clustering is performed with scipy.cluster.hierarchy.linkage.
Plots a seaborn clustergram.
Heatmap values = correlation
Color map = "vlag" (diverging, centered at 0)
Dendrograms reflect clustering on 1 − correlation.
Optionally annotates samples with metadata.
col_colors adds color bars for adata.obs columns.
Legends are drawn manually to the right of the plot.
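The computation steps above (optional rank transform, sample–sample correlation, 1 − r distance, linkage) can be sketched with NumPy and SciPy. This is a minimal illustration, not bullkpy's implementation. It assumes a dense samples × genes array in place of the private _get_matrix helper, and the function name correlation_linkage is hypothetical. The distance matrix is explicitly symmetrized and given an exact zero diagonal. SciPy's squareform validates both properties by default, and 1 − np.corrcoef(X) can miss them through floating-point round-off, which would raise a ValueError.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata


def correlation_linkage(X, method="pearson", linkage_method="average"):
    """Sketch: sample-sample correlation matrix and a linkage on 1 - r.

    X is assumed to be a dense (n_samples, n_genes) array. Hypothetical
    helper for illustration; not bullkpy's implementation.
    """
    X = np.asarray(X, dtype=float)
    if method == "spearman":
        # Rank each sample's values across genes (row-wise); Pearson on
        # these ranks is the Spearman correlation between samples.
        X = rankdata(X, axis=1)
    elif method != "pearson":
        raise ValueError("method must be 'pearson' or 'spearman'")

    # Rows are the variables, so C has shape (n_samples, n_samples).
    C = np.corrcoef(X)

    # Distance in [0, 2]. Force exact symmetry and a zero diagonal so that
    # squareform's default validity checks are not tripped by round-off.
    D = 1.0 - C
    D = (D + D.T) / 2.0
    np.fill_diagonal(D, 0.0)

    # squareform turns the square matrix into the condensed vector linkage expects.
    Z = linkage(squareform(D), method=linkage_method)
    return C, Z


# Usage on synthetic data: 4 samples x 200 genes.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
X = np.vstack([base + rng.normal(scale=s, size=200) for s in (0.1, 0.1, 2.0, 2.0)])
C, Z = correlation_linkage(X, method="spearman")
print(C.shape)         # (4, 4)
print(leaves_list(Z))  # left-to-right sample order of the dendrogram
```

leaves_list(Z) returns the dendrogram's leaves from left to right. This should correspond to the sample order a clustergram built on the same linkage displays.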
Parameters
Core computation
adata (AnnData): Input object.
layer (str | None, default "log1p_cpm"): Expression layer used for correlations. Passed to _get_matrix.
method ("pearson" | "spearman", default "pearson"): Correlation type:
"pearson": linear correlation.
"spearman": rank-based correlation (robust to outliers; captures monotonic, possibly non-linear, trends).
linkage_method (str, default "average"): Linkage method used for hierarchical clustering.
Common choices: "average", "complete", "single".
Metadata annotations
col_colors (Sequence[str] | None, default None): List of adata.obs columns to show as color annotations above the heatmap.
Only categorical columns are meaningful here.
palette (str, default "tab20"): Palette used to map metadata categories to colors.
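The plotting and annotation steps can be approximated directly with seaborn.clustermap, passing the precomputed linkage for both axes. Several details here are assumptions for illustration, and bullkpy's own code may differ: the helper name plot_corr_clustergram, the input layout (a correlation DataFrame plus an obs-like metadata DataFrame), the fixed [-1, 1] color range, and the legend placement. Each categorical column is mapped to colors from the palette and passed as a col_colors DataFrame, which seaborn aligns to the heatmap columns by index. Each column also gets its own figure legend, built from matplotlib Patch handles.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.patches import Patch
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def plot_corr_clustergram(C, obs, col_colors=None, palette="tab20",
                          linkage_method="average"):
    """Sketch: correlation heatmap clustered on 1 - r, with metadata bars.

    C   : DataFrame of sample-sample correlations (index == columns == samples)
    obs : DataFrame of sample metadata indexed by sample name
    Hypothetical helper; not bullkpy's plotting code.
    """
    D = 1.0 - C.to_numpy()
    D = (D + D.T) / 2.0
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D), method=linkage_method)

    colors, legends = None, []
    if col_colors:
        colors = pd.DataFrame(index=C.columns)
        for col in col_colors:
            values = obs.loc[C.columns, col]
            cats = pd.Categorical(values).categories
            # One palette color per category of this metadata column.
            lut = dict(zip(cats, sns.color_palette(palette, len(cats))))
            colors[col] = values.map(lut)
            legends.append((col, lut))

    cg = sns.clustermap(
        C, row_linkage=Z, col_linkage=Z, col_colors=colors,
        cmap="vlag", center=0, vmin=-1, vmax=1,  # fixed range: a sketch choice
    )
    # One legend per metadata column, stacked to the right of the figure.
    for i, (col, lut) in enumerate(legends):
        handles = [Patch(facecolor=c, label=str(k)) for k, c in lut.items()]
        cg.fig.legend(handles=handles, title=col, loc="upper left",
                      bbox_to_anchor=(1.0, 1.0 - 0.3 * i))
    return cg


# Usage on synthetic data: 6 samples x 50 genes, one metadata column.
samples = [f"S{i}" for i in range(6)]
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 50))
C = pd.DataFrame(np.corrcoef(X), index=samples, columns=samples)
obs = pd.DataFrame({"Batch": ["a", "a", "b", "b", "c", "c"]}, index=samples)
cg = plot_corr_clustergram(C, obs, col_colors=["Batch"])
order = [samples[i] for i in cg.dendrogram_col.reordered_ind]  # clustered order
plt.close(cg.fig)
```

As with the documented return value, cg.ax_heatmap and cg.fig give access to the heatmap axis and the figure.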
Display
figsize ((w, h) | None): If None, auto-sized based on number of samples:
w = max(6.0, min(16.0, 0.18 * n_samples + 4.0))
figsize = (w, w)
show_labels (bool, default False): Whether to show sample names on the axes.
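A few sample counts show how the figsize auto-sizing rule behaves. The function below only restates the formula given for figsize; the name auto_figsize is hypothetical. Widths are in inches, as for any matplotlib figsize. The width stays at the 6-inch floor up to 11 samples and reaches the 16-inch cap from 67 samples onward.

```python
def auto_figsize(n_samples: int) -> tuple[float, float]:
    """Restate the documented rule: square figure, width clamped to [6, 16] inches."""
    w = max(6.0, min(16.0, 0.18 * n_samples + 4.0))
    return (w, w)


print(auto_figsize(5))    # (6.0, 6.0): below the floor, clamped up
print(auto_figsize(20))   # about (7.6, 7.6)
print(auto_figsize(100))  # (16.0, 16.0): capped
```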
Output
save (str | Path | None): If provided, saves the figure using _savefig.
show (bool, default True): Whether to display the figure with plt.show().
Returns
cg: seaborn.matrix.ClusterGrid.
Main heatmap axis: cg.ax_heatmap
Figure: cg.fig
Requirements
seaborn
scipy (pdist, squareform, linkage)
Raises ImportError if dependencies are missing.
Interpretation guide
Red (positive) values → samples with similar expression profiles.
Blue (negative) values → anti-correlated samples.
Block-diagonal structure → coherent sample groups (often biological subtypes).
Mixed blocks or striping → batch effects or gradual expression gradients.
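A block-diagonal pattern can be checked numerically by cutting the same kind of tree into a fixed number of groups with scipy.cluster.hierarchy.fcluster. This is one option and not something the function itself does. The sketch below uses synthetic data with two planted groups of three samples each. With a real plot, the tree seaborn used should be available as cg.dendrogram_col.linkage on the returned ClusterGrid.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n_genes = 500
base_a = rng.normal(size=n_genes)  # synthetic profile of group A
base_b = rng.normal(size=n_genes)  # independent profile of group B
# Samples 0-2 are noisy copies of A, samples 3-5 noisy copies of B.
X = np.vstack(
    [base_a + 0.2 * rng.normal(size=n_genes) for _ in range(3)]
    + [base_b + 0.2 * rng.normal(size=n_genes) for _ in range(3)]
)

# Same distance and default linkage as the clustergram: 1 - Pearson r, average.
D = 1.0 - np.corrcoef(X)
D = (D + D.T) / 2.0
np.fill_diagonal(D, 0.0)
Z = linkage(squareform(D), method="average")

# Cut the tree into (at most) two flat clusters; labels start at 1.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # samples 0-2 share one label, samples 3-5 the other
```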
Best practices
For bulk RNA-seq QC, this plot is often preferable to distance heatmaps.
Use:
method="pearson" for general similarity.
method="spearman" when outliers or non-linear monotonic trends are suspected.
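Combine with metadata annotations, for example col_colors=["Subtype", "Batch"], to assess confounding visually: if samples cluster by Batch rather than by Subtype, batch effects may dominate the correlation structure.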
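Two synthetic samples illustrate the robustness argument for method="spearman". Their expression is perfectly monotonically related across genes except for one extreme value. Pearson is pulled down by that single value. Spearman uses only ranks, so it is unaffected, because the extreme value does not change that gene's rank. In this function, x and y would be two samples' expression profiles across genes.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Two synthetic samples over 20 genes: y increases with x everywhere...
x = np.arange(1.0, 21.0)
y = x.copy()
y[-1] = 1000.0  # ...but the top gene has one extreme value in sample y.

r_pearson = pearsonr(x, y)[0]
r_spearman = spearmanr(x, y)[0]
print(f"Pearson:  {r_pearson:.2f}")   # Pearson:  0.40
print(f"Spearman: {r_spearman:.2f}")  # Spearman: 1.00
```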
Examples
Pearson correlation QC
import bullkpy as bk

bk.pl.sample_correlation_clustergram(
    adata,
    layer="log1p_cpm",
    method="pearson",
    col_colors=["Subtype", "Batch"],
)
Spearman correlation (robust)
bk.pl.sample_correlation_clustergram(
    adata,
    layer="log1p_cpm",
    method="spearman",
    col_colors=["Patient"],
)
Compact overview (no labels)
bk.pl.sample_correlation_clustergram(
    adata,
    show_labels=False,
)