Association#

bullkpy.tl.association(adata, *, x, y, layer='log1p_cpm')#

Minimal association dispatcher (categorical focus).

  • gene vs categorical obs -> gene_categorical_association. A rank_genes_groups_fast-style test is not applicable here because it needs a target group, so the global scan is run for that categorical obs instead.

  • categorical vs categorical -> categorical_association

For numeric correlations, keep using correlations.py utilities.

Unified dispatcher for association analyses between genes and annotations.

This function provides a single entry point to test associations between:

  • a gene and a categorical annotation

  • a numeric obs column and a categorical annotation

  • two categorical annotations

It automatically detects the types of x and y and routes the analysis to the appropriate lower-level function.
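The type detection described above might look roughly like the following. This is a minimal sketch, not bullkpy's actual implementation: the helper name `classify_key`, the precedence (obs columns are checked before gene names), and the rule that boolean and non-numeric columns count as categorical are all assumptions made for illustration. Plain pandas objects stand in for adata.obs and adata.var_names.

```python
import pandas as pd


def classify_key(key, var_names, obs):
    """Hypothetical helper: label `key` as 'gene', 'numeric_obs' or 'categorical_obs'."""
    if key in obs.columns:
        col = obs[key]
        # Assumption: booleans are treated as categories, not as numbers.
        if pd.api.types.is_numeric_dtype(col) and not pd.api.types.is_bool_dtype(col):
            return "numeric_obs"
        return "categorical_obs"
    if key in set(var_names):
        return "gene"
    raise KeyError(f"{key!r} is neither an obs column nor a gene name")


# Toy stand-ins for adata.obs and adata.var_names.
obs = pd.DataFrame(
    {
        "Subtype": pd.Categorical(["A", "B", "A"]),
        "Batch": ["b1", "b2", "b1"],
        "age": [61.0, 47.0, 55.0],
    }
)
var_names = pd.Index(["TP53", "KRAS"])

kinds = {k: classify_key(k, var_names, obs) for k in ["TP53", "age", "Subtype", "Batch"]}
print(kinds)
```

With a rule like this, "TP53" resolves to a gene, "age" to a numeric obs column, and "Subtype" and "Batch" to categorical obs columns, which is the information the dispatcher needs to pick a lower-level function.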

What it does#

Depending on the nature of x and y, association dispatches to:

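===============  ===============  ============================
x                y                Dispatched function
===============  ===============  ============================
gene             categorical obs  gene_categorical_association
numeric obs      categorical obs  obs_categorical_association
categorical obs  categorical obs  categorical_association
===============  ===============  ============================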

The goal is to let users write:

bk.tl.association(adata, x="TP53", y="Subtype")

instead of worrying about which specific association function to call.

Parameters#

adata
AnnData object containing expression data and annotations

x
Name of a gene (adata.var_names) or
name of an obs column (adata.obs)

y
Name of a gene (adata.var_names) or
name of an obs column (adata.obs)

layer
Expression layer to use when x or y refers to a gene
(default: 'log1p_cpm')

method
Statistical method to use (passed through to the underlying function).
Common values:
  • "kruskal", "anova" (multi-group)
  • "mwu", "ttest" (two-group)
  • "auto" (use defaults of the dispatched function)
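To make the method names concrete, the sketch below maps each of them to the SciPy test it most plausibly refers to. This mapping is an assumption, as is the `run_test` helper and its "auto" rule (a rank-based test chosen by the number of groups); the defaults actually used by the dispatched bullkpy functions may differ, so check their documentation.

```python
from scipy import stats

# Assumed correspondence between the method names and SciPy tests.
TESTS = {
    "kruskal": stats.kruskal,     # Kruskal-Wallis H test, two or more groups
    "anova": stats.f_oneway,      # one-way ANOVA, two or more groups
    "mwu": stats.mannwhitneyu,    # Mann-Whitney U test, exactly two groups
    "ttest": stats.ttest_ind,     # independent two-sample t-test, exactly two groups
}


def run_test(groups, method="auto"):
    """Hypothetical helper: run one test over a list of per-group value lists."""
    if method == "auto":
        # Assumption: pick a rank-based test according to the number of groups.
        method = "mwu" if len(groups) == 2 else "kruskal"
    if method in ("mwu", "ttest") and len(groups) != 2:
        raise ValueError(f"{method!r} needs exactly two groups, got {len(groups)}")
    res = TESTS[method](*groups)
    return method, float(res.statistic), float(res.pvalue)


# Made-up expression values for three groups.
groups = [[1.2, 2.3, 1.9, 2.8], [3.1, 3.8, 4.0, 3.5], [5.2, 4.9, 5.8, 6.1]]
print(run_test(groups))        # three groups, so "auto" resolves to "kruskal" here
print(run_test(groups[:2]))    # two groups, so "auto" resolves to "mwu" here
```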

Dispatch logic (conceptual)#

gene ↔ categorical obs
     → gene_categorical_association (single gene)

numeric obs ↔ categorical obs
     → obs_categorical_association (single variable)

categorical obs ↔ categorical obs
     → categorical_association

Numeric–numeric and gene–numeric associations are intentionally excluded from this dispatcher and should be handled by correlation utilities instead.
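The routing rules above can be sketched as a small lookup table. This is an illustration only: the `ROUTES` table, the `route` helper, the kind labels, the exception type, and the order sensitivity (the categorical annotation must be passed as y) are assumptions, and only the three target function names come from this page.

```python
# Hypothetical routing table keyed by (kind of x, kind of y).
ROUTES = {
    ("gene", "categorical_obs"): "gene_categorical_association",
    ("numeric_obs", "categorical_obs"): "obs_categorical_association",
    ("categorical_obs", "categorical_obs"): "categorical_association",
}


def route(kind_x, kind_y):
    """Return the name of the function a (kind_x, kind_y) pair would dispatch to."""
    try:
        return ROUTES[(kind_x, kind_y)]
    except KeyError:
        # Unsupported pairs are rejected rather than guessed at.
        raise ValueError(
            f"Unsupported combination: {kind_x} vs {kind_y}; "
            "use the correlation utilities for numeric comparisons."
        ) from None


print(route("gene", "categorical_obs"))
print(route("categorical_obs", "categorical_obs"))
```

Keeping the supported pairs in one table makes the excluded cases explicit: anything absent from the table, such as gene vs numeric obs, fails immediately with a message pointing to the correlation utilities.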

Returned value#

The return type depends on the dispatched function:

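=========================  ============
Case                       Return type
=========================  ============
gene ↔ categorical         pd.DataFrame
numeric obs ↔ categorical  pd.DataFrame
categorical ↔ categorical  dict
=========================  ============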

See the documentation of the underlying function for exact structure.
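Because the return type varies, calling code may want to branch on it. The sketch below shows one way to do that; the `describe_result` helper is hypothetical, and the column names and dict keys in the toy inputs are invented placeholders, not the real output structure of any bullkpy function.

```python
import pandas as pd


def describe_result(result):
    """Hypothetical helper: summarize a result without assuming its exact layout."""
    if isinstance(result, pd.DataFrame):
        return f"DataFrame with {result.shape[0]} row(s) and columns {list(result.columns)}"
    if isinstance(result, dict):
        return f"dict with keys {sorted(result)}"
    raise TypeError(f"Unexpected result type: {type(result).__name__}")


# Placeholder results; the column names and keys are made up for illustration.
table_like = pd.DataFrame({"gene": ["TP53"], "pval": [0.01]})
dict_like = {"statistic": 3.2, "pval": 0.07}

print(describe_result(table_like))
print(describe_result(dict_like))
```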

Examples#

Gene vs categorical annotation

bk.tl.association(
    adata,
    x="TP53",
    y="Subtype",
)

Equivalent to:

bk.tl.gene_categorical_association(
    adata,
    genes=["TP53"],
    groupby="Subtype",
)

Numeric obs vs categorical annotation

bk.tl.association(
    adata,
    x="age",
    y="Subtype",
)

Equivalent to:

bk.tl.obs_categorical_association(
    adata,
    obs_keys=["age"],
    groupby="Subtype",
)

Categorical vs categorical

bk.tl.association(
    adata,
    x="Batch",
    y="Subtype",
)

Equivalent to:

bk.tl.categorical_association(
    adata,
    key1="Batch",
    key2="Subtype",
)

Error handling#

The function raises informative errors when:

  • x or y cannot be resolved to a gene or an obs column

  • unsupported combinations are requested (e.g. gene ↔ numeric)

  • required columns are missing

Design notes#

•	This is a dispatcher, not a statistical method itself
•	It favors explicitness over magic: unsupported cases are rejected
•	Intended for interactive exploration and pipelines
•	Keeps the public API minimal while preserving flexibility

See also#

•	tl.gene_categorical_association
•	tl.obs_categorical_association
•	tl.categorical_association
•	tl.rank_genes_categorical
•	pl.violin