scbiot.pp.add_iterative_lsi

scbiot.pp.add_iterative_lsi(adata, n_components=51, drop_first_component=True, tfidf_layer='tfidf', add_key='X_lsi', outlier_quantiles=(0.02, 0.98), **lsi_kwargs)

Convenience wrapper that runs iterative LSI on ATAC peak counts and stores the resulting embedding in adata.obsm[add_key].

Parameters

adata:

ATAC AnnData with peak counts in .X, or in the layer named by the layer keyword argument (passed through **lsi_kwargs).

n_components:

Number of LSI components to compute.

drop_first_component:

Drop the first component (often depth-associated) when True.

tfidf_layer:

Ignored; retained for backward compatibility.

add_key:

Key in adata.obsm to store the LSI embedding.

outlier_quantiles:

Quantiles for winsorizing TF values before IDF scaling. Set to None to disable.

**lsi_kwargs:

Additional keyword arguments forwarded to lsi_transform (for example, n_iter, topN, layer, per_cluster_union).
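
As a rough picture of what outlier_quantiles and drop_first_component control, the following is a minimal single-pass sketch in plain NumPy. It is not the scbiot implementation: toy_lsi is a hypothetical helper, the TF-IDF formula is a generic textbook variant, clipping only the nonzero TF values is an assumption, and so is the choice that dropping the first component leaves n_components - 1 columns. The iterative part handled by lsi_transform is omitted.

```python
import numpy as np

def toy_lsi(counts, n_components=5, drop_first_component=True,
            outlier_quantiles=(0.02, 0.98)):
    """Single-pass TF-IDF + SVD on a dense cells x peaks matrix (illustrative)."""
    # Term frequency: each cell's counts divided by that cell's total.
    tf = counts / counts.sum(axis=1, keepdims=True)
    if outlier_quantiles is not None:
        # Assumption: winsorize by clipping nonzero TF values to the quantile range.
        lo, hi = np.quantile(tf[tf > 0], outlier_quantiles)
        tf = np.where(tf > 0, np.clip(tf, lo, hi), 0.0)
    # Inverse document frequency: down-weight peaks that are open in many cells.
    idf = np.log1p(counts.shape[0] / (counts > 0).sum(axis=0))
    x = np.log1p(tf * idf * 1e4)
    # Truncated SVD via a full SVD, which is fine for a toy matrix.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    emb = (u * s)[:, :n_components]
    # Assumption: dropping the first component leaves n_components - 1 columns.
    return emb[:, 1:] if drop_first_component else emb

# Synthetic cells whose sequencing depth varies six-fold.
rng = np.random.default_rng(0)
depth = rng.uniform(0.5, 3.0, size=200)
counts = rng.poisson(depth[:, None] * 0.2, size=(200, 500)).astype(float)

full = toy_lsi(counts, drop_first_component=False)
trimmed = toy_lsi(counts, drop_first_component=True)

# Correlation between the first component and total counts per cell.
r = np.corrcoef(full[:, 0], counts.sum(axis=1))[0, 1]
print(full.shape, trimmed.shape, abs(r))
```

On this synthetic data the first component mostly tracks total counts per cell, which is the usual motivation for discarding it before clustering or integration.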

Returns

np.ndarray

The LSI embedding written to adata.obsm[add_key].

Notes

This function no longer precomputes a TF-IDF layer and passes it on. Doing so would normalize the data twice, which is also why tfidf_layer is ignored.
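
To illustrate why a second normalization pass matters, the sketch below applies a generic TF-IDF transform once and then twice to the same toy count matrix. The tfidf helper is hypothetical and uses a textbook formula, which is not necessarily the variant used by lsi_transform.

```python
import numpy as np

def tfidf(mat):
    """Textbook TF-IDF on a dense cells x peaks matrix (illustrative only)."""
    tf = mat / mat.sum(axis=1, keepdims=True)
    idf = np.log1p(mat.shape[0] / (mat > 0).sum(axis=0))
    return tf * idf

rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(30, 40)).astype(float)
counts[0, :] += 1  # guarantee every peak is open in at least one cell
counts[:, 0] += 1  # guarantee every cell has at least one count

once = tfidf(counts)
twice = tfidf(once)  # what a precomputed TF-IDF layer would effectively cause

# The IDF weights end up applied twice, so the two results differ.
print(np.allclose(once, twice))
```

Feeding raw counts and letting the transform run exactly once avoids this compounding of the IDF weights.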

Examples

Basic usage:

>>> import scbiot as scb
>>> # Remove promoter-proximal peaks
>>> adata_top = scb.pp.remove_promoter_proximal_peaks(adata, f"{dir}/inputs/gencode.vM25.chr_patch_hapl_scaff.annotation.gtf.gz")
>>> # Select highly variable peaks
>>> scb.pp.find_variable_features(adata_top, batch_key="batchname_all")
>>> # Iterative LSI (no precomputed TF-IDF layer is needed)
>>> scb.pp.add_iterative_lsi(adata_top, n_components=31, n_iter=2, drop_first_component=True, add_key="X_lsi")
>>> # Copy the embedding (stored in .obsm) to adata_atac
>>> adata_atac.obsm['X_lsi'] = adata_top.obsm['X_lsi']

Parameters:
  • adata (anndata.AnnData)

  • n_components (int)

  • drop_first_component (bool)

  • tfidf_layer (str)

  • add_key (str)

  • outlier_quantiles (Tuple[float, float] | None)

  • lsi_kwargs (Any)

Return type:

numpy.ndarray