Neighbors#
- bullkpy.tl.neighbors(adata, *, n_neighbors=15, n_pcs=20, use_rep='X_pca', metric='euclidean', key_added='neighbors')#
Compute kNN graph on samples (obs) using PCA representation.
- Stores:
adata.obsp["distances"] (CSR sparse)
adata.obsp["connectivities"] (CSR sparse; Gaussian kernel)
adata.uns[key_added] with parameters
Compute a k-nearest-neighbor (kNN) graph between samples using a low-dimensional representation (typically PCA), closely mirroring Scanpy’s neighbors workflow but implemented in a bulk-friendly, explicit way.
This function is the backbone for downstream steps such as Leiden clustering, graph-based UMAP, and clustering quality scans.
Overview#
neighbors builds two sample–sample graphs:
Distances graph.
Sparse matrix of pairwise distances to the k nearest neighbors.
Connectivities graph.
A symmetrized, locally scaled Gaussian kernel graph, suitable for:
Leiden / Louvain clustering
Graph-based UMAP
Graph-based metrics
Both graphs are stored in adata.obsp.
Requirements#
Before calling neighbors, you must have:
A low-dimensional representation in adata.obsm[use_rep]
Typically created by bk.tl.pca(adata)
No missing samples (all rows in use_rep must be valid).
If use_rep is missing, a KeyError is raised.
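The requirements above can be checked before building the graph. The following is a minimal sketch, not part of bullkpy: `check_representation` is a hypothetical helper, a plain dict stands in for adata.obsm, and "valid" is assumed to mean that a row contains no NaN or infinite values.

```python
import numpy as np

def check_representation(obsm, use_rep="X_pca"):
    """Hypothetical pre-flight check; not a bullkpy function."""
    if use_rep not in obsm:
        # Same exception type the documentation describes for a missing key.
        raise KeyError(f"{use_rep!r} not found in adata.obsm; run PCA first")
    X = np.asarray(obsm[use_rep], dtype=float)
    # Assumption: a "valid" row is one with only finite entries.
    bad_rows = np.flatnonzero(~np.isfinite(X).all(axis=1))
    if bad_rows.size:
        raise ValueError(f"{bad_rows.size} rows contain non-finite values")
    return X

# A plain dict stands in for adata.obsm in this sketch.
obsm = {"X_pca": np.random.default_rng(0).normal(size=(6, 3))}
X = check_representation(obsm)
```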
Parameters#
Core parameters#
n_neighbors
Number of nearest neighbors (k).
Will be clipped to n_obs - 1 automatically.
Default: 15.
use_rep
Key in adata.obsm containing the representation used for neighbor search.
Default: "X_pca".
n_pcs
Number of dimensions from use_rep to use.
If None, uses all available dimensions.
Default: 20.
metric
Distance metric used to find neighbors:
"euclidean" (default)
"cosine" (implemented via L2 normalization + Euclidean distance).
key_added
Key under which parameters are stored in adata.uns.
Default: "neighbors".
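How n_neighbors and n_pcs shape the input to the neighbor search can be sketched as follows. `prepare_inputs` is a hypothetical helper written for illustration. It assumes that n_pcs selects the leading columns of the representation, and it mirrors the documented clipping of n_neighbors to n_obs - 1.

```python
import numpy as np

def prepare_inputs(X, n_neighbors=15, n_pcs=20):
    """Hypothetical helper: select dimensions and clip k as described above."""
    X = np.asarray(X, dtype=float)
    n_obs = X.shape[0]
    if n_pcs is not None:
        # Assumption: the first n_pcs columns are used. Slicing past the
        # last column simply keeps every available dimension.
        X = X[:, :n_pcs]
    # k can never exceed the number of other samples.
    k = min(n_neighbors, n_obs - 1)
    return X, k

rng = np.random.default_rng(0)
X_pca = rng.normal(size=(10, 30))   # 10 samples, 30 components
X_used, k = prepare_inputs(X_pca)   # keeps 20 columns, clips k to 9
```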
Method details#
Neighbor search
Uses scipy.spatial.cKDTree for fast kNN queries
Queries k + 1 neighbors and removes self-neighbors
Distances are symmetrized using the minimum distance between pairs
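The three steps above can be sketched with SciPy. This is an illustration under stated assumptions, not the bullkpy source: `knn_distances` is a hypothetical name, the data are assumed to contain no duplicate rows (so the first hit of each query is the point itself), and the pairwise minimum is taken with an explicit loop for readability.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree

def knn_distances(X, k):
    """Hypothetical sketch: symmetric sparse kNN distance matrix."""
    n_obs = X.shape[0]
    # Query k + 1 neighbors, because the nearest hit is the point itself.
    dist, idx = cKDTree(X).query(X, k=k + 1)
    dist, idx = dist[:, 1:], idx[:, 1:]  # drop the self-neighbor column

    # Keep one entry per unordered pair, holding the minimum distance seen.
    edges = {}
    for i in range(n_obs):
        for j, d in zip(idx[i], dist[i]):
            key = (min(i, int(j)), max(i, int(j)))
            edges[key] = min(float(d), edges.get(key, np.inf))

    pairs = np.array(list(edges.keys()))
    vals = np.array(list(edges.values()))
    # Write each pair in both directions so the matrix is symmetric.
    rows = np.concatenate([pairs[:, 0], pairs[:, 1]])
    cols = np.concatenate([pairs[:, 1], pairs[:, 0]])
    data = np.concatenate([vals, vals])
    return csr_matrix((data, (rows, cols)), shape=(n_obs, n_obs))

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
D = knn_distances(X, k=5)
```

After symmetrization a row can hold more than k entries, because a sample may be among the k nearest neighbors of points that are not among its own.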
Local scaling → connectivities
Distances are converted to connectivities using a locally scaled Gaussian kernel:
For each sample i, the local scale \(\sigma_i\) is the distance to its k-th nearest neighbor.
Connectivity between i and j:
\[
w_{ij} = \exp\left(-\frac{d_{ij}^2}{2 \sigma_i \sigma_j}\right)
\]
This produces a smooth, robust graph suitable for clustering and visualization.
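The kernel above can be applied to the stored edges of a symmetric distance graph as in the sketch below. `gaussian_connectivities` is a hypothetical name, and the way the demo builds the distance graph (union of directed kNN edges) is an assumption made to keep the example short. It assumes no duplicate rows, so every local scale is positive.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.spatial import cKDTree

def gaussian_connectivities(D, sigma):
    """Apply w_ij = exp(-d_ij^2 / (2 * sigma_i * sigma_j)) to each stored edge."""
    D = D.tocoo()
    scale = 2.0 * sigma[D.row] * sigma[D.col]
    w = np.exp(-(D.data ** 2) / scale)
    return csr_matrix((w, (D.row, D.col)), shape=D.shape)

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
n_obs, k = X.shape[0], 5

dist, idx = cKDTree(X).query(X, k=k + 1)
dist, idx = dist[:, 1:], idx[:, 1:]      # drop the self-neighbor column
sigma = dist[:, -1]                      # distance to the k-th nearest neighbor

rows = np.repeat(np.arange(n_obs), k)
D = csr_matrix((dist.ravel(), (rows, idx.ravel())), shape=(n_obs, n_obs))
# Euclidean distance is symmetric, so taking the maximum against the
# implicit zeros of the transpose only fills in missing directions.
D = D.maximum(D.T.tocsr())

W = gaussian_connectivities(D, sigma)    # symmetric weights in (0, 1]
```

Because both the distances and the product \(\sigma_i \sigma_j\) are symmetric in i and j, the resulting weights are symmetric without a separate symmetrization step.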
Stored results#
After running neighbors, the following fields are populated:
In adata.obsp
adata.obsp["distances"]
CSR sparse matrix
Shape: (n_obs, n_obs)
Symmetric nearest-neighbor distances
adata.obsp["connectivities"]
CSR sparse matrix
Gaussian-kernel–weighted neighbor graph
In adata.uns[key_added]
adata.uns["neighbors"] = {
    "params": {
        "n_neighbors": 15,
        "n_pcs": 20,
        "use_rep": "X_pca",
        "metric": "euclidean",
    }
}
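Since both graphs are CSR matrices, the neighbors of a single sample can be read directly from one row. The sketch below uses a small hand-written matrix as a stand-in for adata.obsp["distances"]; `neighbors_of` is a hypothetical helper, not a bullkpy function.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy symmetric distance graph standing in for adata.obsp["distances"].
dense = np.array([
    [0.0, 1.0, 0.0, 2.0],
    [1.0, 0.0, 1.5, 0.0],
    [0.0, 1.5, 0.0, 0.5],
    [2.0, 0.0, 0.5, 0.0],
])
distances = csr_matrix(dense)

def neighbors_of(graph, i):
    """Return neighbor indices and distances of sample i, nearest first."""
    start, end = graph.indptr[i], graph.indptr[i + 1]
    order = np.argsort(graph.data[start:end])
    return graph.indices[start:end][order], graph.data[start:end][order]

nbrs, d = neighbors_of(distances, 0)  # nbrs -> [1, 3], d -> [1.0, 2.0]
```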
Typical workflow#
import bullkpy as bk

# adata: an AnnData object with samples in obs

# 1. Dimensionality reduction
bk.tl.pca(adata)

# 2. Build neighbors graph
bk.tl.neighbors(
    adata,
    n_neighbors=15,
    n_pcs=20,
    metric="euclidean",
)

# 3. Downstream analyses
bk.tl.cluster(adata, method="leiden")
bk.tl.umap_graph(adata)
Notes and caveats#
This function operates on samples (obs), not genes.
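For cosine distance, vectors are L2-normalized first. Euclidean distance between unit vectors is a monotonic function of cosine similarity, so the neighbor ranking is the same as ranking by cosine similarity.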
The connectivities graph is symmetric and weighted, unlike a raw directed kNN graph.
Re-running neighbors will overwrite existing distances and connectivities.
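The cosine note above rests on the identity \(\lVert u - v \rVert^2 = 2 - 2\cos(u, v)\) for unit vectors u and v. The following NumPy check on random data is an independent illustration and does not call bullkpy.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))

# L2-normalize each row so every sample lies on the unit sphere.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)

cos_sim = Xn @ Xn.T
sq_euclid = ((Xn[:, None, :] - Xn[None, :, :]) ** 2).sum(axis=-1)

# Squared Euclidean distance on unit vectors equals 2 - 2 * cosine similarity.
identity_holds = np.allclose(sq_euclid, 2.0 - 2.0 * cos_sim)

# Sorting by increasing distance and by decreasing similarity
# therefore yields the same neighbor order for every sample.
order_euclid = np.argsort(sq_euclid, axis=1)
order_cosine = np.argsort(-cos_sim, axis=1)
same_ranking = np.array_equal(order_euclid, order_cosine)
```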
See also#
• bk.tl.pca – compute PCA representation
• bk.tl.cluster – Leiden/Louvain clustering
• bk.tl.umap_graph – UMAP embedding from the neighbor graph
• bk.tl.leiden_resolution_scan – resolution benchmarking