# UMAP embedding

bullkpy.tl.umap(adata, *, n_neighbors=15, n_pcs=20, use_rep='X_pca', min_dist=0.5, spread=1.0, n_components=2, metric='euclidean', random_state=0, init='spectral')

Compute UMAP embedding from a representation (default: PCA).

This mirrors Scanpy practice: UMAP is computed from X_pca using n_neighbors and min_dist, with values kept consistent with the neighbors graph settings.

Stores:
  • adata.obsm["X_umap"]

  • adata.uns["umap"]

This function is intentionally lightweight and bulk-friendly: it does not recompute neighbors internally, but instead embeds samples directly from an existing low-dimensional representation such as X_pca.

## When to use

Use umap when you want to:

  • visualize samples in 2D or 3D after PCA

  • mirror Scanpy-style workflows (pca → umap)

  • control UMAP parameters explicitly (neighbors, distance, metric, etc.).

This function assumes you have already computed PCA (or another representation) and stored it in adata.obsm.

## Parameters

### Input / representation

adata
AnnData object with samples in rows.

use_rep
Key in adata.obsm containing the representation to embed.
Default: "X_pca".
If the key is missing, a KeyError is raised.

n_pcs
Number of dimensions from use_rep to use.
If None, all dimensions are used.
Default: 20.
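
The interaction between use_rep and n_pcs can be sketched as follows. This is a minimal illustration, not bullkpy's actual code: `select_representation` is a hypothetical helper, and a plain dict stands in for adata.obsm.

```python
import numpy as np


def select_representation(obsm, use_rep="X_pca", n_pcs=20):
    """Return the matrix to embed (hypothetical helper, not bullkpy's code)."""
    if use_rep not in obsm:
        raise KeyError(f"{use_rep!r} not found in adata.obsm; compute it first.")
    X = np.asarray(obsm[use_rep])
    # n_pcs=None keeps every column; otherwise keep the leading n_pcs columns.
    return X if n_pcs is None else X[:, :n_pcs]


# A plain dict stands in for adata.obsm: 100 samples, 30 PCs.
obsm = {"X_pca": np.random.default_rng(0).normal(size=(100, 30))}
X20 = select_representation(obsm)                # first 20 columns
X_all = select_representation(obsm, n_pcs=None)  # all 30 columns
```

In this sketch, asking for more dimensions than the representation holds silently returns all of them, because NumPy slicing does not raise; how bullkpy itself handles that case is not stated here.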

### UMAP parameters

n_neighbors
Size of the local neighborhood used to learn the manifold. Smaller values emphasize local structure; larger values emphasize global structure.
Typical values: 10–50.
Default: 15.

min_dist
Minimum distance between embedded points.
Smaller values → tighter clusters.
Default: 0.5.

spread
Effective scale of the embedded points. Together with min_dist, it determines how tightly points are clumped.
Default: 1.0.

n_components
Output dimensionality of the embedding.

  • 2 → 2D UMAP

  • 3 → 3D UMAP

Default: 2.

metric
Distance metric used in the high-dimensional space.
Default: "euclidean".

random_state
Random seed for reproducibility.
Default: 0.

init
Initialization method for UMAP.
Common values: "spectral", "random".
Default: "spectral".
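
To make the roles of min_dist and spread more concrete, the sketch below evaluates the target curve that umap-learn fits its low-dimensional similarity kernel to. It assumes bullkpy forwards both parameters to umap-learn unchanged; `target_membership` is an illustrative name, not part of either library.

```python
import math


def target_membership(distance, min_dist=0.5, spread=1.0):
    """Target similarity for two embedded points at the given distance.

    Points closer than min_dist count as fully similar; beyond that,
    similarity decays exponentially at a rate set by spread.
    """
    if distance < min_dist:
        return 1.0
    return math.exp(-(distance - min_dist) / spread)


# Compare the documented default (0.5) with a tighter setting (0.1).
for min_dist in (0.1, 0.5):
    values = [round(target_membership(d, min_dist=min_dist), 3) for d in (0.0, 0.5, 1.0, 2.0)]
    print(f"min_dist={min_dist}: {values}")
```

A smaller min_dist shrinks the plateau at 1.0, so points are allowed to sit closer together, which is why lower values give tighter clusters. A larger spread slows the decay and stretches the layout.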

## Output

After running, the following fields are populated:

Embedding

  • adata.obsm["X_umap"]
    NumPy array of shape (n_obs, n_components) containing the UMAP coordinates.

Metadata

  • adata.uns["umap"]
    Dictionary storing UMAP parameters for provenance and reproducibility:

{
  "params": {
    "use_rep": "X_pca",
    "n_neighbors": 15,
    "n_pcs": 20,
    "min_dist": 0.5,
    "spread": 1.0,
    "n_components": 2,
    "metric": "euclidean",
    "random_state": 0,
    "init": "spectral",
  }
}
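
Because the stored keys match the function's keyword arguments, the record can be reused to repeat or vary a run. The sketch below is one possible way to do that; `rerun_kwargs` is a hypothetical helper, and a plain dict stands in for adata.uns.

```python
# Stand-in for adata.uns after a default run (values copied from the listing above).
uns = {
    "umap": {
        "params": {
            "use_rep": "X_pca",
            "n_neighbors": 15,
            "n_pcs": 20,
            "min_dist": 0.5,
            "spread": 1.0,
            "n_components": 2,
            "metric": "euclidean",
            "random_state": 0,
            "init": "spectral",
        }
    }
}


def rerun_kwargs(uns, **overrides):
    """Build keyword arguments for a repeat call from the stored parameters."""
    params = dict(uns["umap"]["params"])  # copy so the stored record is untouched
    params.update(overrides)
    return params


kwargs = rerun_kwargs(uns, n_components=3)
# bk.tl.umap(adata, **kwargs) would then repeat the run in 3D, assuming the
# stored keys map one-to-one onto the function's keyword arguments.
```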


## Dependencies
- Requires umap-learn:

```bash
pip install umap-learn
```

If umap-learn is not installed, an informative ImportError is raised.
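
An optional-dependency guard of this kind is commonly written as below. This is a generic pattern rather than bullkpy's implementation: `require` is a hypothetical helper and the error wording is invented for illustration.

```python
import importlib


def require(module_name, pip_name):
    """Import an optional dependency or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name!r} is required for this function. "
            f"Install it with: pip install {pip_name}"
        ) from err


# umap-learn is installed as "umap-learn" but imported as "umap".
try:
    umap = require("umap", "umap-learn")
except ImportError as err:
    print(err)
```

Deferring the import to call time keeps the rest of the package usable when umap-learn is absent.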

## Examples

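Standard PCA → UMAP workflow

```python
import bullkpy as bk

# adata: an AnnData object with a "log1p_cpm" layer
bk.tl.pca(adata, layer="log1p_cpm", n_comps=30)
bk.tl.umap(adata)  # embeds the first 20 PCs by default
```

Use more neighbors and tighter clusters

```python
bk.tl.umap(
    adata,
    n_neighbors=30,
    min_dist=0.1,
)
```

3D UMAP from the first 50 PCs

```python
# Assumes X_pca holds at least 50 components (the PCA call above stored only 30).
bk.tl.umap(
    adata,
    n_components=3,
    n_pcs=50,
)
```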

## Notes / caveats

  • This function embeds directly from the chosen representation; it does not rebuild or reuse a neighbors graph.

  • For consistency across analyses, you should usually use the same n_pcs here as in PCA-based downstream steps (clustering, neighbors, etc.).

  • UMAP is stochastic; runs are reproducible only when random_state is fixed (default: 0).