Make Obs .h5ad safe#

Force adata.obs columns into HDF5-safe formats so that adata.write() succeeds.

Description#

make_obs_h5ad_safe sanitizes columns in adata.obs to ensure they can be written to an .h5ad file without errors.

This function modifies adata.obs in place by converting problematic dtypes into safe, serializable representations:
• categorical columns → categorical with string categories • object columns → elementwise string conversion • index and column names → strings

It is intended as a last-mile safety step before calling adata.write().

What problems does it fix?#

This function prevents common HDF5 write errors such as:
• TypeError: Can’t implicitly convert non-string objects to strings • Object dtype has no native HDF5 equivalent • mixed Python types inside object columns • categorical columns with non-string categories • non-string obs index or column names

Unlike heuristic sanitizers, this function applies explicit, deterministic rules that are known to be safe for AnnData serialization.

Parameters#

adata
AnnData object whose .obs table will be sanitized.

cols

  • list[str] | None

  • Subset of .obs columns to sanitize.
    If None (default), all columns in adata.obs are processed.

Returns#

  • AnnData.

  • The same AnnData object with modified .obs.

Rules applied#

Column type –> Action category.–> Convert categories to strings, then recast as categorical.
object –> Convert each value to string.
None –> Converted to empty string “ “
NaN –> Converted to empty string “ “

Additionally:
• adata.obs.columns are forced to strings • adata.obs_names are forced to strings

Example usage#

# Minimal safe write
bk.io.make_obs_h5ad_safe(adata)
adata.write("clean.h5ad")

# Targeted cleanup
bk.io.make_obs_h5ad_safe(
    adata,
    cols=["clinical_notes", "mutation_status", "free_text_comments"]
)

# Combined with diagnostic
bad, _ = bk.io.find_bad_obs_cols_by_write(adata)

if bad:
    cols = [c for c, _ in bad]
    bk.io.make_obs_h5ad_safe(adata, cols=cols)

adata.write("safe.h5ad")

When should I use this?#

Use make_obs_h5ad_safe when:
• adata.write() fails due to .obs issues • metadata came from heterogeneous sources (Excel, TCGA, clinical tables) • you want a guaranteed HDF5-safe export

Do not use it if:
• you need to preserve exact Python objects • you rely on non-string categorical categories • you want to keep lists/dicts as structured data

Notes.#

•	This function modifies adata in place
•	Only .obs is affected
•	Does not touch .var, .uns, .layers, or .obsm

If adata.write() still fails after this step, the issue is likely in another AnnData component (e.g. .uns or .obsm).