Leiden resolution scan#
- bullkpy.tl.leiden_resolution_scan(adata, *, true_key, resolutions=None, base_key='leiden', store_key='leiden_scan', use_rep='X_pca', n_pcs=20, n_neighbors=15, metric='euclidean', recompute_neighbors=False, inplace=True, random_state=0)
Scan multiple Leiden resolutions and score each clustering against a ground-truth categorical label using ARI, NMI, and Cramér’s V.
Notes
Uses bk.tl.neighbors() to build adata.obsp['connectivities'] if it is missing.
Uses bk.tl.cluster(method='leiden') to compute clusters for each resolution.
Computes ARI and NMI (if scikit-learn is available) and Cramér’s V (always).
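The optional-dependency behaviour in these notes can be sketched as follows. This is a minimal illustration rather than bullkpy's actual source: the helper name score_ari_nmi is hypothetical, and only adjusted_rand_score and normalized_mutual_info_score are real scikit-learn functions.

```python
import math

# Try the optional dependency once at import time.
try:
    from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score
    HAVE_SKLEARN = True
except ImportError:
    HAVE_SKLEARN = False


def score_ari_nmi(labels_true, labels_pred):
    """Hypothetical helper: return ARI and NMI, or NaN for both without scikit-learn."""
    if not HAVE_SKLEARN:
        return {"ARI": math.nan, "NMI": math.nan}
    return {
        "ARI": float(adjusted_rand_score(labels_true, labels_pred)),
        "NMI": float(normalized_mutual_info_score(labels_true, labels_pred)),
    }


# Same partition with swapped label names.
scores = score_ari_nmi([0, 0, 1, 1], [1, 1, 0, 0])
print(scores)
```

Returning NaN instead of raising keeps the result table the same shape whether or not scikit-learn is installed.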
This utility evaluates multiple Leiden resolutions and quantifies how well the resulting clusters match a known categorical label (e.g. cell type, subtype, batch) using external validation metrics.
What it does#
For each Leiden resolution, the function:
Computes (or reuses) a kNN graph.
Runs Leiden clustering.
Compares cluster labels to a reference annotation.
Computes agreement metrics:
• ARI (Adjusted Rand Index)
• NMI (Normalized Mutual Information)
• Cramér’s V (always available)
Returns a tidy table summarizing clustering performance.
Optionally stores results in adata.uns.
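The steps above can be sketched as a plain loop. This is a toy outline under stated assumptions, not the library's implementation: scan_resolutions, cluster_fn, and purity are hypothetical names, cluster_fn stands in for graph construction plus Leiden, purity stands in for the real metrics, and the exact formatting of the key column is assumed.

```python
from collections import Counter, defaultdict


def purity(true_labels, cluster_labels):
    """Stand-in score: fraction of samples in their cluster's majority class."""
    members = defaultdict(list)
    for t, c in zip(true_labels, cluster_labels):
        members[c].append(t)
    majority = sum(Counter(ts).most_common(1)[0][1] for ts in members.values())
    return majority / len(true_labels)


def scan_resolutions(resolutions, cluster_fn, truth, score_fn, base_key="leiden"):
    """Hypothetical scan loop: one result row per resolution, sorted ascending."""
    rows = []
    for res in sorted(resolutions):
        labels = cluster_fn(res)  # stand-in for kNN graph + Leiden
        # Samples with a missing reference label are excluded from scoring.
        pairs = [(t, c) for t, c in zip(truth, labels) if t is not None]
        rows.append({
            "resolution": res,
            "key": f"{base_key}_{res}",  # assumed naming format
            "n_clusters": len(set(labels)),
            "score": score_fn([t for t, _ in pairs], [c for _, c in pairs]),
            "n_used": len(pairs),
            "n_missing": len(truth) - len(pairs),
        })
    return rows


# Toy data: a 1-D value binned more finely as the "resolution" grows.
values = [0.1, 0.2, 0.6, 0.7, 1.1, 1.2]
truth = ["A", "A", "B", "B", None, "C"]
rows = scan_resolutions([2, 1], lambda res: [int(v * res) for v in values], truth, purity)
for row in rows:
    print(row)
```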
When to use#
Use leiden_resolution_scan when you want to:
• Choose an optimal Leiden resolution
• Benchmark clustering against known labels
• Detect over- or under-clustering
• Quantify clustering robustness across resolutions
Typical use cases:
• Validate clustering against known subtypes
• Tune resolution before downstream analysis
• Compare embeddings or representations
Parameters#
adata
AnnData object with an embedding or representation
true_key
Column in adata.obs containing ground-truth labels
(e.g. “Subtype”, “Batch”, “CellType”).
resolutions
Sequence of Leiden resolutions to scan
• Default: np.linspace(0.2, 2.0, 10)
base_key
Base name for clustering columns in adata.obs.
Each resolution creates a column like “{base_key}_{resolution}”.
store_key
Key under adata.uns to store results (if inplace=True)
use_rep
Representation used for neighbor graph construction
(e.g. “X_pca”)
n_pcs
Number of principal components to use (if applicable)
n_neighbors
Number of neighbors for kNN graph
metric
Distance metric for neighbor search
recompute_neighbors
Force recomputation of the neighbors graph
inplace
If True, store results in adata.uns[store_key]
random_state
Random seed for Leiden clustering
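To make the defaults concrete, the sketch below builds the documented default grid with NumPy and collects the keyword defaults from the signature into a dictionary. The true_key value "Subtype" is only an example label column, and the final call is left commented out because it needs bullkpy and a prepared adata object.

```python
import numpy as np

# Documented default grid: 10 evenly spaced resolutions from 0.2 to 2.0.
resolutions = np.linspace(0.2, 2.0, 10)
print(np.round(resolutions, 2))

# Keyword defaults copied from the signature; true_key is an example value.
scan_kwargs = dict(
    true_key="Subtype",
    resolutions=resolutions,
    base_key="leiden",
    store_key="leiden_scan",
    use_rep="X_pca",
    n_pcs=20,
    n_neighbors=15,
    metric="euclidean",
    recompute_neighbors=False,
    inplace=True,
    random_state=0,
)

# With bullkpy installed and an AnnData object at hand:
# df = bk.tl.leiden_resolution_scan(adata, **scan_kwargs)
```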
Metrics explained#
Adjusted Rand Index (ARI)
• Measures similarity between two clusterings
• Adjusted for chance (random labelings score close to 0)
• Range: –0.5 to 1
• Higher = better agreement
• Requires scikit-learn
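For intuition, ARI can be computed from pair counts in a contingency table. The following is a standard-library sketch of the textbook formula, not the code bullkpy runs (the notes on this page indicate ARI comes from scikit-learn); the function name is illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Textbook ARI from pair counts; illustrative only."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of samples that share a cell, a true class, or a predicted cluster.
    same_cell = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / comb(n, 2)  # chance-level agreement
    maximum = (same_true + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (same_cell - expected) / (maximum - expected)


ari_same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])   # same split, renamed labels
ari_cross = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])  # splits that cut across each other
print(ari_same, ari_cross)
```

The first pair scores 1 because ARI ignores label names; the second scores about -0.5, showing that agreement worse than chance yields negative values.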
Normalized Mutual Information (NMI)
• Information-theoretic similarity
• Range: 0 to 1
• Higher = better agreement
• Requires scikit-learn
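A standard-library sketch of NMI follows, using mutual information divided by the arithmetic mean of the two entropies (the default normalization in scikit-learn). Whether bullkpy uses exactly this normalization is an assumption, and the function names are illustrative.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label vector."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(labels_a, labels_b):
    """MI normalized by the arithmetic mean of the entropies; illustrative only."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual_info = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    mean_entropy = (entropy(labels_a) + entropy(labels_b)) / 2
    if mean_entropy == 0:
        return 1.0  # both labelings are a single group
    return mutual_info / mean_entropy


nmi_same = normalized_mutual_info([0, 0, 1, 1], [1, 1, 0, 0])   # same split, renamed labels
nmi_indep = normalized_mutual_info([0, 0, 1, 1], [0, 1, 0, 1])  # statistically independent
print(nmi_same, nmi_indep)
```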
Cramér’s V
• Association measure for categorical variables
• Range: 0 to 1
• Always computed (no external dependencies)
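Cramér's V can be written with the standard library alone, which is consistent with it needing no external dependencies. The sketch below uses the plain definition V = sqrt(chi2 / (n * (min(r, c) - 1))); whether bullkpy applies a bias correction is not stated here, so treat this as an assumption.

```python
from collections import Counter
from math import nan, sqrt


def cramers_v(labels_a, labels_b):
    """Uncorrected Cramér's V from the chi-squared statistic; illustrative only."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    chi2 = 0.0
    for a, n_a in count_a.items():
        for b, n_b in count_b.items():
            expected = n_a * n_b / n          # count expected under independence
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(count_a), len(count_b)) - 1
    if k == 0:
        return nan  # undefined when either variable has a single category
    return sqrt(chi2 / (n * k))


v_same = cramers_v([0, 0, 1, 1], [1, 1, 0, 0])    # same split, renamed labels
v_indep = cramers_v([0, 0, 1, 1], [0, 1, 0, 1])   # independent labelings
v_oversplit = cramers_v([0, 0, 1, 1], [0, 1, 2, 3])  # every sample in its own cluster
print(v_same, v_indep, v_oversplit)
```

In this toy case the over-split clustering still reaches V = 1 because every cluster is pure, so it is worth reading V alongside a chance-adjusted metric when looking for over-clustering.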
Output#
Returns a tidy DataFrame with one row per resolution:
| Column | Description |
|---|---|
| resolution | Leiden resolution |
| key | obs column storing cluster labels |
| n_clusters | Number of clusters |
| ARI | Adjusted Rand Index |
| NMI | Normalized Mutual Information |
| cramers_v | Cramér’s V |
| n_used | Samples used in scoring |
| n_missing | Samples excluded (NA labels) |
Sorted by increasing resolution.
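One way to use the returned table is to pick the row with the highest score. The sketch below works on a small DataFrame with the documented column names; the numbers are made up for illustration and are not results from any dataset. Falling back to cramers_v when ARI is entirely NaN is a suggestion, not library behaviour.

```python
import pandas as pd

# Made-up values for illustration only; they are not results from any dataset.
df = pd.DataFrame(
    {
        "resolution": [0.2, 0.6, 1.0, 1.4],
        "n_clusters": [2, 4, 6, 9],
        "ARI": [0.31, 0.58, 0.52, 0.40],
        "cramers_v": [0.62, 0.74, 0.77, 0.80],
    }
)

# ARI is NaN without scikit-learn, so fall back to Cramér's V in that case.
metric = "ARI" if df["ARI"].notna().any() else "cramers_v"
best = df.loc[df[metric].idxmax()]
best_resolution = float(best["resolution"])
print(f"best {metric} at resolution {best_resolution} ({int(best['n_clusters'])} clusters)")
```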
Examples#
Basic resolution scan
import bullkpy as bk

df = bk.tl.leiden_resolution_scan(
    adata,
    true_key="Subtype",
)
df
Custom resolution grid
df = bk.tl.leiden_resolution_scan(
    adata,
    true_key="CellType",
    resolutions=[0.2, 0.4, 0.6, 0.8, 1.0],
)
Plot ARI vs resolution
import matplotlib.pyplot as plt
plt.plot(df["resolution"], df["ARI"], marker="o")
plt.xlabel("Leiden resolution")
plt.ylabel("ARI")
plt.show()
Notes#
The neighbors graph is computed once unless recompute_neighbors=True
Missing labels in true_key are automatically excluded from scoring
If scikit-learn is unavailable:
• ARI and NMI are returned as NaN
• Cramér’s V is still computed
Clustering columns (leiden_{resolution} with the default base_key) remain in adata.obs
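Because the per-resolution columns stay in adata.obs, it can be useful to drop the ones that are no longer needed once a resolution is chosen. The sketch below uses a plain pandas DataFrame as a stand-in for adata.obs; the column names leiden_0.2 and leiden_0.4 are assumptions about how the resolution is formatted in the name.

```python
import pandas as pd

# Stand-in for adata.obs; the column names are assumed, not taken from bullkpy output.
obs = pd.DataFrame(
    {
        "Subtype": ["A", "B", "A"],
        "leiden_0.2": ["0", "0", "1"],
        "leiden_0.4": ["0", "1", "2"],
    }
)

base_key = "leiden"
keep = "leiden_0.4"  # hypothetical column for the chosen resolution

# Collect every scan column, then drop all but the one to keep.
scan_cols = [c for c in obs.columns if c.startswith(f"{base_key}_")]
obs = obs.drop(columns=[c for c in scan_cols if c != keep])
print(list(obs.columns))

# With a real AnnData object the same idea would be:
# adata.obs = adata.obs.drop(columns=[c for c in scan_cols if c != keep])
```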
See also#
• tl.neighbors
• tl.cluster
• pl.rankplot
• pl.violin
• scanpy.tl.leiden