Add metadata#

bullkpy.io.add_metadata(adata, metadata_file, *, index_col, sep='\t', low_memory=False, how='left')#

Add sample metadata to an AnnData object.

Parameters:
  • adata – AnnData object with samples in .obs.

  • metadata_file – Path to metadata file (tsv, csv, or xlsx).

  • index_col – Column in metadata that matches adata.obs_names.

  • sep – Column separator for tsv/csv files.

  • low_memory – The pandas low_memory parameter.

  • how – Merge strategy:

      - "left": keep all samples in adata (default)

      - "inner": keep only samples present in metadata

Returns:

AnnData – The same AnnData object with updated .obs.

Add sample-level metadata to an existing AnnData object.

bk.io.add_metadata merges clinical, experimental, or annotation data into adata.obs, aligning rows by sample identifiers.

Purpose#

This function is used to attach sample annotations (clinical variables, phenotypes, batch information, mutation status, etc.) to an AnnData object created from a count matrix.

It is typically run immediately after read_counts.


Supported metadata formats#

  • Tab-separated files (.tsv)

  • Comma-separated files (.csv)

  • Excel files (.xls, .xlsx)

The metadata file must contain one column identifying samples, which will be matched to adata.obs_names.
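
The three formats can be read with plain pandas. The sketch below is only an illustration of one way to do it, not bullkpy's implementation: read_metadata_table is a hypothetical helper, and choosing the reader from the file extension is an assumption. Reading Excel files with pandas also needs an Excel engine such as openpyxl to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_metadata_table(path, index_col, sep="\t"):
    """Hypothetical helper (not part of bullkpy): load a metadata table."""
    suffix = Path(path).suffix.lower()
    if suffix in {".xls", ".xlsx"}:
        # Excel files ignore `sep`; pandas needs an Excel engine for this call.
        table = pd.read_excel(path)
    else:
        # .tsv and .csv use the same reader; only the separator differs.
        table = pd.read_csv(path, sep=sep)
    # Store sample IDs as strings so they compare cleanly with adata.obs_names.
    table[index_col] = table[index_col].astype(str)
    return table.set_index(index_col)


# Usage with a small comma-separated file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "metadata.csv"
    csv_path.write_text("Sample_ID,Age\nTCGA-01,64\nTCGA-02,71\n")
    meta = read_metadata_table(csv_path, index_col="Sample_ID", sep=",")

print(meta)
```

Note that the separator has to be passed explicitly for comma-separated files, which mirrors the sep='\t' default in the signature above.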


Basic usage#

import bullkpy as bk

adata = bk.io.read_counts("counts.tsv")

adata = bk.io.add_metadata(
    adata,
    "metadata.tsv",
    index_col="Sample_ID",
)

After running this function, all metadata columns will be available in adata.obs.

Sample ID column#

The column specified by index_col must contain sample IDs identical to adata.obs_names.

Example metadata table:

Sample_ID   Project_ID   Age   Sex
TCGA-01     LUAD         64    F
TCGA-02     LUSC         71    M

adata = bk.io.add_metadata(
    adata,
    "clinical.tsv",
    index_col="Sample_ID",
)
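
Because the IDs must match exactly, it can help to compare them before merging. The following is a minimal sketch using pandas only; obs_names stands in for adata.obs_names, and the sample IDs and variable names are made up for illustration.

```python
import pandas as pd

# Stand-in for adata.obs_names (illustrative values).
obs_names = pd.Index(["TCGA-01", "TCGA-02", "TCGA-03"])

# Metadata whose third ID carries a trailing space, so it is not identical.
metadata = pd.DataFrame(
    {
        "Sample_ID": ["TCGA-01", "TCGA-02", "TCGA-03 "],
        "Sex": ["F", "M", "F"],
    }
)

meta_ids = pd.Index(metadata["Sample_ID"].astype(str))

# IDs found on both sides, and IDs in the AnnData that the metadata lacks.
matched = obs_names.intersection(meta_ids)
missing_in_metadata = obs_names.difference(meta_ids)

# Stripping stray whitespace is one common way to repair such mismatches.
cleaned_ids = pd.Index(metadata["Sample_ID"].astype(str).str.strip())
missing_after_cleaning = obs_names.difference(cleaned_ids)

print(len(matched), "matched;", list(missing_in_metadata), "missing")
```

Here "TCGA-03" counts as missing until the trailing space is removed, which is the kind of mismatch worth fixing in the metadata file itself.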

Merge strategies#

The how argument controls how samples are retained during the merge.

Keep all samples (default)#

adata = bk.io.add_metadata(
    adata,
    "metadata.tsv",
    index_col="Sample_ID",
    how="left",
)

  • Keeps all samples in adata

  • Samples without metadata will contain NaN values

Keep only samples with metadata#

adata = bk.io.add_metadata(
    adata,
    "metadata.tsv",
    index_col="Sample_ID",
    how="inner",
)

  • Drops samples not present in the metadata file

  • Useful when metadata completeness is required
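
The difference between the two strategies can be seen on a toy table. This sketch uses pandas DataFrame.join as a stand-in for the merge and does not claim to reproduce bullkpy's internals; the obs and metadata frames and their values are invented for the example.

```python
import pandas as pd

# Toy stand-in for adata.obs with three samples.
obs = pd.DataFrame(
    {"batch": ["A", "A", "B"]},
    index=pd.Index(["S1", "S2", "S3"], name="sample"),
)

# Metadata covering only two of the three samples.
metadata = pd.DataFrame(
    {"Sex": ["F", "M"]},
    index=pd.Index(["S1", "S3"], name="sample"),
)

# "left": every sample is kept; S2 has no metadata, so Sex is NaN there.
left = obs.join(metadata, how="left")

# "inner": only samples present on both sides remain, so S2 is dropped.
inner = obs.join(metadata, how="inner")

print(left)
print(inner)
```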

Warnings and diagnostics#

The function provides informative warnings:

  • Duplicated sample IDs in metadata

  • Missing metadata for some samples

  • Number of samples successfully matched

Example warning:

WARNING: 12 samples in AnnData are missing metadata (showing up to 5)
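
The same three checks can be run by hand when debugging a metadata file. The sketch below uses pandas only; the variable names, sample IDs, and printed summary are assumptions for illustration and are not bullkpy's actual log output.

```python
import pandas as pd

# Stand-ins for adata.obs_names and the metadata ID column (illustrative).
obs_names = pd.Index(["S1", "S2", "S3", "S4"])
meta_ids = pd.Series(["S1", "S2", "S2", "S9"])

# IDs that occur more than once in the metadata.
duplicated_ids = meta_ids[meta_ids.duplicated()].unique().tolist()

# Which AnnData samples have a metadata row at all.
has_metadata = obs_names.isin(set(meta_ids))
missing = obs_names[~has_metadata].tolist()
n_matched = int(has_metadata.sum())

print(f"duplicated IDs in metadata: {duplicated_ids}")
print(f"samples missing metadata: {len(missing)} (first 5: {missing[:5]})")
print(f"samples matched: {n_matched} of {len(obs_names)}")
```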

Output#

The function modifies and returns the same AnnData object:

  • Metadata columns are appended to adata.obs

  • Existing .obs columns are preserved

  • Index order remains consistent with sample order

adata.obs.head()

Notes and best practices#

  • Metadata columns with mixed types may be imported as object

  • For large metadata tables, consider sanitizing dtypes before saving to .h5ad (e.g. convert categorical variables to category)

  • This function does not modify .X, .var, or .layers
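
As a rough illustration of dtype sanitizing, the sketch below cleans a toy table standing in for adata.obs. The column names and values are invented, and which columns should become categorical or numeric is a per-dataset decision, so treat this as one possible approach rather than a recipe.

```python
import pandas as pd

# Toy stand-in for adata.obs after merging; Age mixes numbers and text,
# so pandas stores it with the generic object dtype.
obs = pd.DataFrame(
    {
        "Sex": ["F", "M", "F"],
        "Age": [64, "71", "unknown"],
    },
    index=["S1", "S2", "S3"],
)

# Categorical variables: convert to the pandas category dtype.
obs["Sex"] = obs["Sex"].astype("category")

# Mixed numeric/text columns: coerce to numbers; unparseable entries become NaN.
obs["Age"] = pd.to_numeric(obs["Age"], errors="coerce")

print(obs.dtypes)
```

With a real AnnData object, the same assignments would be applied to adata.obs before writing the file.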