UMAP graph#
- bullkpy.tl.umap_graph(adata, *, graph_key='connectivities', use_rep='X_pca', n_pcs=20, min_dist=0.5, spread=1.0, n_components=2, random_state=0, init='spectral', negative_sample_rate=5, n_epochs=None)[source]#
Compute UMAP embedding strictly from a precomputed neighbor graph.
- Requires:
adata.obsp[graph_key] (typically ‘connectivities’ from bk.tl.neighbors)
- Uses:
adata.obsm[use_rep] only for initialization (spectral/random), not for graph construction.
- Stores:
adata.obsm[‘X_umap_graph’]
adata.uns[‘umap_graph’]
Compute a UMAP embedding strictly from a precomputed neighbor graph (e.g. Leiden/UMAP workflow where the graph is fixed and you want the embedding to reflect that exact graph).
Unlike bk.tl.umap, which embeds directly from a representation (e.g. PCA), umap_graph uses adata.obsp[graph_key] as the source of neighborhood structure. The chosen representation in adata.obsm[use_rep] is used only for initialization.
When to use#
Use umap_graph when you want:
a UMAP embedding that matches a specific neighbors graph already computed (e.g. after tuning n_neighbors, metric, batch correction, etc.)
reproducible embeddings from a stored graph (the structure is fixed in obsp)
Scanpy-like consistency: neighbors → clustering → umap where UMAP is “based on the graph”
Requirements#
This function requires:
adata.obsp[graph_key] exists.
Typically created by bk.tl.neighbors(adata) as “connectivities”.adata.obsm[use_rep] exists.
Typically created by bk.tl.pca(adata) as “X_pca” (used only for init).umap-learn installed:
pip install umap-learn
If any requirement is missing, a clear KeyError/ImportError is raised.
Parameters#
Graph and initialization#
graph_key
Key in adata.obsp containing the neighbor graph.
Default: “connectivities”.
use_rep
Key in adata.obsm used only to initialize the embedding.
Default: “X_pca”.
n_pcs
Number of dimensions from use_rep to use for initialization.
Default: 20.
UMAP layout parameters#
min_dist
Minimum spacing between points in the embedding.
Smaller → tighter clusters.
Default: 0.5.
spread
Overall scale of the embedding.
Default: 1.0.
n_components
Output dimensionality (2D or 3D).
Default: 2.
init.
Initialization method:
“spectral” (recommended, graph/structure-aware)
“random”. Default: “spectral”.
random_state
Random seed for reproducibility.
Default: 0.
Optimization controls#
negative_sample_rate
UMAP optimization parameter; larger can improve separation at cost of speed.
Default: 5.
n_epochs
Number of training epochs.
If None, umap-learn chooses a default; this implementation falls back to a safe default in some versions.
Default: None.
Output#
After running, the following are created:
adata.obsm[“X_umap_graph”]. Array of shape (n_obs, n_components) with graph-based UMAP coordinates.
adata.uns[“umap_graph”]. Stores parameters for provenance, e.g.:
{
"params": {
"mode": "graph",
"graph_key": "connectivities",
"use_rep_init": "X_pca",
"n_pcs_init": 20,
"min_dist": 0.5,
"spread": 1.0,
"n_components": 2,
"random_state": 0,
"init": "spectral",
"negative_sample_rate": 5,
"n_epochs": None,
}
}
Notes / caveats#
The graph is treated as fixed: changing use_rep or n_pcs changes only the initialization, not which neighbors exist.
This implementation uses some umap-learn internals (fit_embed_data, graph) for compatibility across versions. If you upgrade umap-learn and see errors, pinning a known working version may help.
Ensure adata.obsp[graph_key] is a proper sparse connectivities matrix (CSR recommended). The function will coerce to CSR and remove explicit zeros.
Examples#
Standard neighbors → UMAP-from-graph
bk.tl.pca(adata)
bk.tl.neighbors(adata, n_neighbors=15, n_pcs=20)
bk.tl.umap_graph(adata)
Tighter embedding, more optimization
bk.tl.umap_graph(
adata,
min_dist=0.1,
negative_sample_rate=10,
n_epochs=800,
)
3D graph UMAP
bk.tl.umap_graph(adata, n_components=3)