Gene-gene correlations#

bullkpy.tl.gene_gene_correlations(adata, *, gene, genes=None, layer='log1p_cpm', method='pearson', top_n=50, min_abs_r=None, use_abs=True, batch_key=None, batch_mode='none', covariates=None)[source]#

Correlate one gene with all other genes or a subset. Returns table: (gene, r, pval, qval, n).

Correlate one gene against all other genes (or a specified subset).

This function computes correlations between a single query gene and many other genes, returning the strongest associations ranked by correlation strength. It is useful for:

  • identifying co-expressed genes

  • exploring pathway membership

  • validating gene modules

  • hypothesis-driven correlation analysis

Purpose#

gene_gene_correlations answers the question:

Which genes are most correlated with this gene of interest?

Compared to top_gene_gene_correlations, this function is: • O(G) instead of O(G²) • focused on a single anchor gene • safe to run on genome-wide data

Parameters#

adata
AnnData object containing expression data.

gene
Query gene name. Must be present in adata.var_names.

genes
Optional list of target genes to correlate against.
If None, all genes except the query gene are used.

layer Expression layer to use (default: “log1p_cpm”).

method
Correlation method: • “pearson” • “spearman”

top_n
Maximum number of correlated genes to return (default: 50).

min_abs_r
Optional minimum absolute correlation threshold.

use_abs
If True (default), ranking is based on |r|.
If False, ranking uses signed correlation.

batch_key
Optional obs column specifying batch labels.

batch_mode
Batch handling strategy: • “none”: ignore batch structure • “within”: compute correlations within batches • “residual”: regress out batch effects before correlation

covariates
Optional numeric obs columns to regress out prior to correlation.

What is computed#

For each gene: • correlation coefficient (r) • p-value • FDR-adjusted q-value • number of samples used (n)

If batch handling is enabled, correlations are computed using batch-aware residualization or within-batch aggregation.

Returned value#

A DataFrame sorted by correlation strength:

column

description

query_gene

Anchor gene

gene

Correlated gene

r

Correlation coefficient

pval

Raw p-value

qval

FDR-adjusted p-value

n

Number of samples used

method

Correlation method

batch_key

Batch column used

batch_mode

Batch handling strategy

Examples#

Genome-wide correlation for a single gene

bk.tl.gene_gene_correlations(
    adata,
    gene="TP53",
)

Restrict to a pathway

bk.tl.gene_gene_correlations(
    adata,
    gene="MYC",
    genes=hallmark_cell_cycle,
    top_n=20,
)

With batch correction

bk.tl.gene_gene_correlations(
    adata,
    gene="CDKN1A",
    batch_key="Batch",
    batch_mode="residual",
)

Require strong correlations only

bk.tl.gene_gene_correlations(
    adata,
    gene="BRCA1",
    min_abs_r=0.5,
)

Interpretation notes#

•	High correlation suggests co-regulation, not causation
•	Batch effects can strongly inflate correlations
•	Always inspect results visually (scatter plots recommended)
•	Use alongside:
•	tl.top_gene_gene_correlations
•	pl.scatter
•	pl.heatmap

See also#

•	tl.top_gene_gene_correlations
•	tl.association
•	pl.scatter
•	pl.rankplot