Optimal transport: ot

OT utilities for aligning batches and modalities. The functions below match what you see in the tutorials; refer to the notebooks for full, runnable examples.

  • integrate: batch correction for single-modality or cross-modality data (RNA or ATAC).

For a basic scRNA-seq dataset integration:

adata, metrics = scb.ot.integrate(
    adata,
    preset="rna",
    obsm_key="X_pca",
    batch_key="batch",
    out_key="X_ot"
)

For stable tuning, use the meta-parameter interface:

adata, metrics = scb.ot.integrate(
    adata,
    preset="rna",
    epsilon=0.03,
    tau=0.40,
    knn_scale=1.0,
    batch_strength=1.0,
    gate_temperature=1.0,
    # optional supervision:
    label_key="semi_cell_type",
    unlabeled_category="Unknown",
    sup_strength=0.10,
)
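To build intuition for the epsilon value above, here is a minimal NumPy sketch of entropic OT via Sinkhorn iterations. It assumes that epsilon plays the conventional role of an entropic regularization strength. The source does not define it, so treat this as an illustration of the general technique rather than scb's implementation. The names sinkhorn_plan, sharp, and diffuse are hypothetical.

```python
import numpy as np

def sinkhorn_plan(cost, epsilon, n_iter=500):
    """Entropic OT plan between two uniformly weighted point clouds.

    Illustrative only: assumes `epsilon` is the entropic regularization
    strength, which is the usual convention but is not stated in the docs.
    """
    n, m = cost.shape
    a = np.full(n, 1.0 / n)        # uniform source weights
    b = np.full(m, 1.0 / m)        # uniform target weights
    K = np.exp(-cost / epsilon)    # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iter):
        u = a / (K @ v)            # rescale rows toward marginal a
        v = b / (K.T @ u)          # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Two small synthetic "batches" in a 2-D embedding.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = rng.normal(size=(5, 2)) + 0.5

# Squared Euclidean cost, scaled to [0, 1] so epsilon is comparable across data.
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

sharp = sinkhorn_plan(cost, epsilon=0.03)   # low regularization
diffuse = sinkhorn_plan(cost, epsilon=1.0)  # high regularization
```

In entropic OT, a smaller epsilon concentrates the plan on fewer pairings, while a larger one spreads mass more evenly across candidate matches.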

For unpaired RNA/ATAC workflows, compute a shared PCA with pp.coembed_pca and then run ot.integrate(preset="anchor", obsm_key="X_pca_shared", batch_key="modality", reference_category="reference") to align query cells to the reference.
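The anchor call described above can be laid out in the same block form as the other examples. The keyword names and values are taken from the sentence above. The variable anchor_kwargs is hypothetical, and the unpacking into adata, metrics is assumed to mirror the other calls on this page. The arguments to pp.coembed_pca are not shown in the source, so that step is omitted here.

```python
# Keyword arguments exactly as named in the text for the unpaired workflow.
anchor_kwargs = dict(
    preset="anchor",
    obsm_key="X_pca_shared",         # shared embedding produced by pp.coembed_pca
    batch_key="modality",            # column that distinguishes the modalities
    reference_category="reference",  # category treated as the alignment target
)

# Assumed usage, mirroring the other calls on this page (requires scb and an
# AnnData object that already holds the shared PCA):
# adata, metrics = scb.ot.integrate(adata, **anchor_kwargs)
```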

For paired RNA/ATAC workflows, use the paired preset so OT sees each cell’s matched views directly. Call:

adata, metrics = scb.ot.integrate(
    adata,
    preset="paired",
    obsm_key="X_pca",
    view_key="X_lsi",
    batch_key="batch",
    out_key="X_ot"
)

In this call, obsm_key and view_key point to the RNA PCA and ATAC LSI embeddings, respectively, so the barycentric objective uses the paired measurements directly.

Scaling options

For ultra-large datasets, use centroid-level OT:

adata, metrics = scb.ot.integrate(
    adata,
    preset="centroid",
    obsm_key="X_pca",
    batch_key="batch",
    out_key="scBIOT",
)
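The general idea behind centroid-level OT can be sketched in plain NumPy: summarize each batch by a handful of cluster centroids, solve a small OT problem between the centroids, and move each cell along with its centroid. This is a sketch under assumptions, not the preset's actual algorithm, which the source does not describe. All function names, the choice of k-means, and the Sinkhorn solver are hypothetical stand-ins.

```python
import numpy as np

def sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a and b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = sq_dists(x, centroids).argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):               # leave empty clusters where they are
                centroids[j] = members.mean(axis=0)
    labels = sq_dists(x, centroids).argmin(axis=1)
    return centroids, labels

def sinkhorn(cost, a, b, epsilon=0.1, n_iter=300):
    """Entropic OT plan with marginals a (rows) and b (columns)."""
    K = np.exp(-cost / epsilon)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def centroid_align(query, reference, k=8, epsilon=0.1):
    """Shift query cells toward the reference using OT between centroids only."""
    cq, lq = kmeans(query, k)
    cr, lr = kmeans(reference, k, seed=1)
    a = np.bincount(lq, minlength=k) / len(query)      # cluster-size weights
    b = np.bincount(lr, minlength=k) / len(reference)
    cost = sq_dists(cq, cr)
    cost = cost / cost.max()
    plan = sinkhorn(cost, a, b, epsilon)               # k x k, not n x m
    row_mass = np.maximum(plan.sum(axis=1, keepdims=True), 1e-12)
    mapped = plan @ cr / row_mass      # barycentric image of each query centroid
    return query + (mapped - cq)[lq]   # each cell moves with its centroid

# Synthetic example: the query batch is a shifted copy of the reference distribution.
rng = np.random.default_rng(0)
reference = rng.normal(size=(300, 5))
query = rng.normal(size=(200, 5)) + 3.0
aligned = centroid_align(query, reference)
```

The transport problem here is k by k rather than cells by cells, which is why this style of approach scales to very large datasets. The trade-off is that within-cluster structure is translated rigidly rather than transported cell by cell.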

If you want centroid OT while keeping another preset’s OT hyperparameters, enable the flag:

adata, metrics = scb.ot.integrate(
    adata,
    preset="anchor",
    obsm_key="X_pca",
    batch_key="batch",
    out_key="X_ot",
    centroid_ot=True,
)

For a faster approximate OT run on large datasets, enable the approximate OT solver while keeping your preset’s data keys:

adata, metrics = scb.ot.integrate(
    adata,
    preset="atac",
    obsm_key="X_lsi",
    batch_key="batchname_all",
    out_key="X_ot",
    approximate_ot=True,
)

OT backend controls

All OT entry points share the use_gpu/gpu_device and ot_backend knobs.