Neighbors

bullkpy.tl.neighbors(adata, *, n_neighbors=15, n_pcs=20, use_rep='X_pca', metric='euclidean', key_added='neighbors')

Compute a kNN graph on samples (obs) using a PCA representation.

Stores:

adata.obsp["distances"] (CSR sparse)
adata.obsp["connectivities"] (CSR sparse; Gaussian kernel)
adata.uns[key_added] with parameters

Compute a k-nearest-neighbor (kNN) graph between samples from a low-dimensional representation (typically PCA). The workflow closely mirrors Scanpy's neighbors step, but is implemented explicitly for bulk datasets, where each observation is a sample.

This function is the backbone for downstream steps such as Leiden clustering, graph-based UMAP, and clustering quality scans.

Overview

neighbors builds two sample–sample graphs:

Distances graph: a sparse matrix of pairwise distances to the k nearest neighbors.
Connectivities graph: a symmetrized, locally scaled Gaussian-kernel graph, suitable for:
- Leiden / Louvain clustering
- Graph-based UMAP
- Graph-based metrics

Both graphs are stored in adata.obsp.

Requirements

Before calling neighbors, you must have:

A low-dimensional representation in adata.obsm[use_rep]
- Typically created by bk.tl.pca(adata)
No missing samples: every row of adata.obsm[use_rep] must be valid (for example, no NaN values).

If adata.obsm has no key use_rep, a KeyError is raised.
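A minimal sketch of the checks described above, written as a hypothetical helper `check_neighbors_input` (not part of bullkpy). It takes a plain mapping in place of `adata.obsm` so it runs without AnnData; treating rows that contain NaN or inf as the "missing samples" condition is an assumption.

```python
import numpy as np


def check_neighbors_input(obsm, use_rep="X_pca"):
    """Hypothetical pre-flight check for the requirements listed above.

    `obsm` stands in for `adata.obsm`: any mapping from key to a 2-D array.
    """
    if use_rep not in obsm:
        # Mirrors the documented behavior: a missing representation is a KeyError.
        raise KeyError(f"{use_rep!r} not in adata.obsm; run bk.tl.pca(adata) first")
    X = np.asarray(obsm[use_rep], dtype=float)
    if X.ndim != 2:
        raise ValueError(f"{use_rep!r} must be 2-D (n_obs x n_dims), got {X.shape}")
    # Assumed meaning of "no missing samples": every row is finite (no NaN/inf).
    bad = np.flatnonzero(~np.isfinite(X).all(axis=1))
    if bad.size:
        raise ValueError(
            f"{bad.size} sample(s) with NaN/inf in {use_rep!r}, first at row {bad[0]}"
        )
    return X


# Usage with toy data: 10 samples x 5 components.
obsm = {"X_pca": np.random.default_rng(0).normal(size=(10, 5))}
X = check_neighbors_input(obsm)
```

Raising early with a row index makes it easier to trace which sample broke the neighbor search than a failure deep inside the kNN query.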

Parameters

Core parameters

n_neighbors
Number of nearest neighbors (k).
Will be clipped to n_obs - 1 automatically.
Default: 15.

use_rep
Key in adata.obsm containing the representation used for neighbor search.
Default: "X_pca".

n_pcs
Number of dimensions from use_rep to use.
If None, uses all available dimensions.
Default: 20.

metric
Distance metric used to find neighbors:

"euclidean" (default)
"cosine" (implemented via L2 normalization + Euclidean distance)

key_added
Key under which parameters are stored in adata.uns.
Default: "neighbors".
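How n_neighbors and n_pcs narrow the input can be sketched as follows. `resolve_inputs` is a hypothetical helper, not bullkpy's code; taking the leading n_pcs columns, and keeping all columns when n_pcs exceeds the available dimensions (which is simply what NumPy slicing does), are assumptions.

```python
import numpy as np


def resolve_inputs(X_rep, n_neighbors=15, n_pcs=20):
    """Hypothetical helper applying the documented n_neighbors / n_pcs rules."""
    n_obs = X_rep.shape[0]
    # Documented: k is clipped to n_obs - 1 (a sample is not its own neighbor).
    k = min(n_neighbors, n_obs - 1)
    # n_pcs=None keeps every column; otherwise keep the leading n_pcs columns.
    # Assumption: asking for more columns than exist keeps all of them.
    X = X_rep if n_pcs is None else X_rep[:, :n_pcs]
    return X, k


X_rep = np.zeros((8, 30))      # toy representation: 8 samples x 30 components
X, k = resolve_inputs(X_rep)
print(X.shape, k)              # (8, 20) 7
```

With only 8 samples the default of 15 neighbors cannot be honored, so k falls to 7 and every sample ends up connected to every other one.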

Method details

Neighbor search

Uses scipy.spatial.cKDTree for fast kNN queries.
Queries k + 1 neighbors per sample and discards each sample's match with itself.
Symmetrizes distances using the minimum distance between each pair of samples, so the resulting matrix is symmetric.
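A minimal sketch of these three steps with NumPy and SciPy, assuming Euclidean distances and no duplicate samples. `knn_distances` is hypothetical, and it reads "symmetrized using the minimum distance" as: keep an edge if either sample lists the other, and take the smaller stored distance when both do. bullkpy's actual code may differ.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree


def knn_distances(X, k):
    """Sketch of a symmetric kNN distance graph (not bullkpy's implementation)."""
    n_obs = X.shape[0]
    # Ask for k + 1 neighbors: the closest "neighbor" of each point is itself.
    dist, idx = cKDTree(X).query(X, k=k + 1)
    # Drop the self column. Assumes no exact duplicate samples; with duplicates,
    # a twin at distance 0 could occupy column 0 instead of the point itself.
    dist, idx = dist[:, 1:], idx[:, 1:]

    rows = np.repeat(np.arange(n_obs), k)
    cols, vals = idx.ravel(), dist.ravel()

    # Symmetrize: list every edge in both orientations, then keep the minimum
    # distance per (row, col) pair, so an edge survives if either side lists it.
    r = np.concatenate([rows, cols])
    c = np.concatenate([cols, rows])
    v = np.concatenate([vals, vals])
    key = r * n_obs + c
    order = np.lexsort((v, key))                 # sort by pair key, then distance
    key, v = key[order], v[order]
    first = np.concatenate(([True], key[1:] != key[:-1]))  # smallest per pair
    key, v = key[first], v[first]
    return csr_matrix((v, (key // n_obs, key % n_obs)), shape=(n_obs, n_obs))


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))      # toy data: 40 samples x 5 components
D = knn_distances(X, k=10)
```

For a symmetric metric such as Euclidean, both directions of a mutual edge carry the same value, so the minimum only matters for deciding which edges exist; every row ends up with at least k stored neighbors.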

Local scaling → connectivities

Distances are converted to connectivities using a locally scaled Gaussian kernel:

For each sample i, the local scale \(\sigma_i\) is the distance to its k-th nearest neighbor.
The connectivity between samples i and j, at distance \(d_{ij}\), is
\[
w_{ij} = \exp\left(-\frac{d_{ij}^2}{2 \sigma_i \sigma_j}\right)
\]

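Because each \(\sigma_i\) adapts to the local density around sample i, edge weights remain comparable between dense and sparse regions of the sample space, which makes the graph well suited to clustering and visualization.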
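The kernel above can be sketched as follows. `gaussian_connectivities` is a hypothetical helper, not bullkpy's code; it assumes Euclidean distances and no duplicate samples (a zero σ would cause a division by zero). Because the formula is symmetric in i and j, symmetrizing the graph reduces to taking the union of edges found from either side.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree


def gaussian_connectivities(X, k):
    """Sketch of w_ij = exp(-d_ij^2 / (2 * sigma_i * sigma_j)) on a kNN graph."""
    n_obs = X.shape[0]
    dist, idx = cKDTree(X).query(X, k=k + 1)
    dist, idx = dist[:, 1:], idx[:, 1:]   # drop self-neighbors
    sigma = dist[:, -1]                   # sigma_i = distance to k-th neighbor

    rows = np.repeat(np.arange(n_obs), k)
    cols = idx.ravel()
    d = dist.ravel()
    w = np.exp(-d**2 / (2 * sigma[rows] * sigma[cols]))
    W = csr_matrix((w, (rows, cols)), shape=(n_obs, n_obs))
    # The kernel is symmetric in i and j, so an edge found from either side
    # has the same weight; maximum() takes the union of both directions.
    return W.maximum(W.T).tocsr()


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))      # toy data: 50 samples x 5 components
W = gaussian_connectivities(X, k=15)
```

Weights lie in (0, 1]. For the k-th neighbor of i, \(d_{ij} = \sigma_i\), so the weight is \(\exp(-\sigma_i / (2\sigma_j))\), about exp(-1/2) ≈ 0.61 when the two local scales are similar.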

Stored results

After running neighbors, the following fields are populated:

In adata.obsp

adata.obsp["distances"]
- CSR sparse matrix
- Shape: (n_obs, n_obs)
- Symmetric nearest-neighbor distances
adata.obsp["connectivities"]
- CSR sparse matrix
- Shape: (n_obs, n_obs)
- Symmetric, Gaussian-kernel-weighted neighbor graph

In adata.uns[key_added] (shown here for the default key_added="neighbors")

adata.uns["neighbors"] = {
    "params": {
        "n_neighbors": 15,
        "n_pcs": 20,
        "use_rep": "X_pca",
        "metric": "euclidean",
    }
}
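A hypothetical helper `check_neighbor_graphs` (not part of bullkpy) that tests the properties listed above and returns a list of problems. The check that connectivity weights lie in (0, 1] follows from the kernel formula in Method details and is an assumption about the stored values.

```python
import numpy as np
import scipy.sparse as sp


def check_neighbor_graphs(distances, connectivities, n_obs, tol=1e-12):
    """Hypothetical check of the documented obsp contents; returns problems found."""
    problems = []
    for name, G in (("distances", distances), ("connectivities", connectivities)):
        if not (sp.issparse(G) and G.format == "csr"):
            problems.append(f"{name}: expected a CSR sparse matrix")
            continue
        if G.shape != (n_obs, n_obs):
            problems.append(f"{name}: shape {G.shape}, expected {(n_obs, n_obs)}")
            continue
        # Both graphs are documented as symmetric.
        if G.nnz and abs(G - G.T).max() > tol:
            problems.append(f"{name}: not symmetric")
    # Assumption: with w_ij = exp(-d_ij^2 / (2 sigma_i sigma_j)), weights lie in (0, 1].
    if sp.issparse(connectivities) and connectivities.nnz:
        w = connectivities.data
        if w.min() <= 0 or w.max() > 1:
            problems.append("connectivities: weights outside (0, 1]")
    return problems


# Toy two-sample graphs that satisfy every documented property.
D = sp.csr_matrix(np.array([[0.0, 1.0], [1.0, 0.0]]))
W = sp.csr_matrix(np.array([[0.0, 0.6], [0.6, 0.0]]))
print(check_neighbor_graphs(D, W, n_obs=2))  # []
```

On a real object it might be called as `check_neighbor_graphs(adata.obsp["distances"], adata.obsp["connectivities"], adata.n_obs)`; an empty list means none of the checked properties were violated.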

Typical workflow

import bullkpy as bk

# adata: an AnnData object with samples in obs
# 1. Dimensionality reduction
bk.tl.pca(adata)

# 2. Build neighbors graph
bk.tl.neighbors(
    adata,
    n_neighbors=15,
    n_pcs=20,
    metric="euclidean",
)

# 3. Downstream analyses
bk.tl.cluster(adata, method="leiden")
bk.tl.umap_graph(adata)

Notes and caveats

This function operates on samples (obs), not genes.
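For cosine distance, vectors are L2-normalized first, so neighbor rankings match cosine-similarity rankings. Because the search runs on unit vectors, the distances it produces are sqrt(2(1 - cos θ)) rather than 1 - cos θ.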
The connectivities graph is symmetric and weighted, unlike a raw directed kNN graph.
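Re-running neighbors overwrites adata.obsp["distances"] and adata.obsp["connectivities"], even with a different key_added; only the parameter entry in adata.uns is stored under key_added.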
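The cosine note can be checked numerically. This standalone NumPy sketch on toy random data (not bullkpy code) shows that after L2 normalization the squared Euclidean distance equals 2(1 - cos θ), so nearest-by-distance and most-similar-by-cosine give the same neighbor order.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))      # toy data: 6 samples x 4 components

# L2-normalize each sample (row) to unit length.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)

cos_sim = Xn @ Xn.T                                               # cosine similarity
eucl = np.linalg.norm(Xn[:, None, :] - Xn[None, :, :], axis=-1)  # Euclidean on unit vectors

# For unit vectors ||u - v||^2 = 2 * (1 - cos(u, v)): distance falls as similarity rises.
identity_holds = np.allclose(eucl**2, 2 * (1 - cos_sim))

# Neighbors of sample 0 (self excluded): nearest by distance vs. most similar by cosine.
by_distance = np.argsort(eucl[0])[1:]
by_cosine = np.argsort(-cos_sim[0])[1:]
print(identity_holds, np.array_equal(by_distance, by_cosine))
```

Since the map from cos θ to sqrt(2(1 - cos θ)) is strictly decreasing, a Euclidean kNN search on normalized vectors returns exactly the cosine nearest neighbors, which is why a KD-tree can serve the cosine metric.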