Set raw counts#

bullkpy.pp.set_raw_counts(adata, *, layer='counts', overwrite=False)#

Store the current adata.X in adata.layers[layer] as raw counts.

Call this once, right after reading counts and before any normalization.

set_raw_counts is a convenience function that preserves the original count matrix before any normalization or transformation is applied. This mirrors the recommended Scanpy workflow of keeping raw data accessible for QC, filtering, and reference.

What it does#

•	Copies the current adata.X matrix into adata.layers[layer]
•	Intended to be called immediately after reading count data
•	Prevents accidental overwriting of existing raw layers by default

Parameters#

adata
AnnData object containing raw counts in adata.X.

layer
Name of the layer where raw counts will be stored (default: "counts").

overwrite
If False (default), the function will not overwrite an existing layer and will emit a warning instead.
If True, any existing layer with the same name will be replaced.
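
The copy and overwrite behavior described above can be illustrated with a minimal sketch. This is not bullkpy's actual source: set_raw_counts_sketch is a hypothetical re-implementation based only on this page, the warning text is invented, and a SimpleNamespace holding nested lists stands in for a real AnnData object so that the example runs with the standard library alone.

```python
import copy
import warnings
from types import SimpleNamespace


def set_raw_counts_sketch(adata, *, layer="counts", overwrite=False):
    """Hypothetical stand-in for set_raw_counts; not the library's code."""
    if layer in adata.layers and not overwrite:
        # Documented default: leave the existing layer untouched and warn.
        warnings.warn(
            f"layers[{layer!r}] already exists; pass overwrite=True to replace it."
        )
        return
    # Store an independent copy so later changes to X cannot alter the layer.
    adata.layers[layer] = copy.deepcopy(adata.X)


# Stand-in for AnnData: any object with .X and a dict-like .layers.
adata = SimpleNamespace(X=[[5, 0], [3, 7]], layers={})
set_raw_counts_sketch(adata)

# Pretend a normalization step has replaced X.
adata.X = [[0.5, 0.0], [0.3, 0.7]]

# Second call with the default overwrite=False: a warning, layer unchanged.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    set_raw_counts_sketch(adata)
kept = copy.deepcopy(adata.layers["counts"])

# Explicit overwrite: the layer now holds the normalized values.
set_raw_counts_sketch(adata, overwrite=True)
replaced = adata.layers["counts"]
```

In this sketch the second call leaves the stored counts intact, while the call with overwrite=True replaces them with normalized values. That loss of the original counts is why overwriting should be a deliberate choice.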

Recommended usage#

Call this function once right after loading the count matrix and before any of the following steps:
•	normalization (CPM / TPM)
•	log transformation
•	batch correction
•	filtering

This ensures that downstream steps can always reference unmodified counts.

Examples#

Basic usage after reading counts

import bullkpy as bk

adata = bk.io.read_counts("counts.tsv")
bk.pp.set_raw_counts(adata)

Raw counts are now available as:

adata.layers["counts"]

Overwrite an existing raw layer (not recommended unless intentional)

bk.pp.set_raw_counts(adata, overwrite=True)

Notes#

•	This function does not check whether the data in adata.X are truly raw. It assumes the user calls it at the appropriate time.
•	Downstream functions such as filter_samples and filter_genes can reference the stored raw counts via layer="counts".
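
Because the function does not validate its input, a caller may want a quick sanity check before storing the matrix. The helper below is a hypothetical sketch, not part of bullkpy: looks_like_raw_counts is an invented name, and it only tests whether every value is non-negative and integer-valued. This is a heuristic, since some quantification tools report fractional "expected counts" that would fail it. Nested lists stand in for the matrix; for a real NumPy or SciPy matrix a vectorized check would be more appropriate.

```python
def looks_like_raw_counts(rows):
    """Heuristic: True if every value is non-negative and integer-valued."""
    return all(
        value >= 0 and float(value).is_integer()
        for row in rows
        for value in row
    )


raw = [[5, 0], [3, 7]]                  # plausible raw counts
normalized = [[0.5, 0.0], [0.3, 0.7]]   # already transformed values

raw_ok = looks_like_raw_counts(raw)
normalized_ok = looks_like_raw_counts(normalized)
```

A check like this could run just before calling set_raw_counts, so that a matrix which has already been normalized is caught before it is stored as "raw".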