Partial correlations#

bullkpy.tl.partial_corr(adata, *, x, y, covariates, method='pearson')[source]#: Partial correlation: residualize x and y on covariates (obs columns) then correlate.

Compute a partial correlation between two variables by first regressing out one or more covariates taken from adata.obs.

This is useful when you want to quantify the association between x and y after removing the linear effects of known confounders (e.g. batch, library size, purity scores).

What it does#

Builds a design matrix from adata.obs[covariates] (with an intercept).
Residualizes x and y with respect to these covariates.
Computes a correlation between the residuals.

Conceptually:

[ x^* = x - \hat{x}(\text{covariates}), \quad y^* = y - \hat{y}(\text{covariates}) ]

[ \text{partial corr}(x, y \mid \text{covariates}) = \text{corr}(x^, y^) ]

Parameters#

adata
AnnData object containing covariates in .obs.

x
1D numeric array (length = number of observations).

y
1D numeric array (length = number of observations).

covariates
List of obs column names to regress out.
These should typically be numeric (categorical covariates should be encoded beforehand).

method
Correlation method: • “pearson” (default) • “spearman”

Returns#

A tuple:

(r, pval, n)

r — partial correlation coefficient
pval — two-sided p-value
n — number of observations used in the correlation

Examples#

Partial correlation between two genes controlling for library size

x = adata.layers["log1p_cpm"][:, adata.var_names.get_loc("TP53")]
y = adata.layers["log1p_cpm"][:, adata.var_names.get_loc("MDM2")]

r, p, n = bk.tl.partial_corr(
    adata,
    x=x,
    y=y,
    covariates=["libsize"],
)

Gene–obs partial correlation adjusting for batch and purity

x = bk.tl._get_gene_vector(adata, "PDCD1", layer="log1p_cpm")
y = adata.obs["signature_Tcell"].to_numpy()

r, p, n = bk.tl.partial_corr(
    adata,
    x=x,
    y=y,
    covariates=["Batch", "purity"],
    method="spearman",
)

Notes#

Covariates are treated linearly (ordinary least squares residualization).
This function does not handle missing values explicitly; NaNs in x, y, or covariates should be removed or handled upstream.
For large-scale analyses, prefer higher-level wrappers such as: • top_gene_obs_correlations • top_gene_gene_correlations which can call partial correlation internally via batch/covariate logic.