scbiot.pp.annotate_gene_activity

scbiot.pp.annotate_gene_activity(atac, gtf_file, *, peak_chrom_col='chrom', peak_start_col='chromStart', peak_end_col='chromEnd', gene_biotypes=('protein_coding', 'lncRNA'), promoter_up=2000, promoter_down=0, include_gene_body=False, weight_by_distance=True, tss_decay_bp=2000, prefer_gene_name=True, promoter_priority=True, verbose=True)

Build a gene-activity AnnData by assigning ATAC peaks to genes.

Parameters

atac:

ATAC AnnData with peak counts in .X (or .layers["counts"]).

gtf_file:

Path to the GTF annotation used for gene coordinates.

peak_chrom_col / peak_start_col / peak_end_col:

Column names in atac.var that hold the peak coordinates. If these columns are missing, the function falls back to standard column names or parses the coordinates from atac.var_names.

gene_biotypes:

Gene biotypes to retain from the GTF, matched against the gene_biotype or gene_type attribute.

promoter_up / promoter_down:

Upstream/downstream distances (bp) from TSS for promoter regions.

include_gene_body:

Include gene body overlaps in addition to promoters.

weight_by_distance:

Weight peak contributions by distance to the TSS.

tss_decay_bp:

Length scale (bp) for exponential decay when weight_by_distance is True.

prefer_gene_name:

Prefer gene_name over gene_id when naming genes.

promoter_priority:

Prefer promoter overlaps when a peak maps to both promoter and gene body.

verbose:

Emit progress logging when True.
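To make the peak-to-gene parameters above more concrete, here is a minimal sketch of how coordinate parsing, promoter windows, distance weighting, and promoter priority could fit together. Everything in it is an assumption for illustration: the helper names (parse_peak_name, promoter_window, overlaps, distance_weight, resolve_hits) are hypothetical and not part of scbiot, the accepted peak-name formats are guesses, the promoter window is assumed to be strand-aware, and the weight is assumed to be exp(-d / tss_decay_bp) with d measured from the peak midpoint to the TSS. The actual implementation may differ on any of these points.

```python
import math
import re

# Hypothetical helpers for illustration only; none of these are scbiot functions.

# Assumed peak-name formats: "chr1:100-200", "chr1-100-200", "chr1_100_200".
_PEAK_RE = re.compile(r"^(?P<chrom>.+?)[:_-](?P<start>\d+)[-_](?P<end>\d+)$")


def parse_peak_name(name):
    """Split a peak name into (chrom, start, end)."""
    m = _PEAK_RE.match(name)
    if m is None:
        raise ValueError(f"cannot parse peak name: {name!r}")
    start, end = int(m.group("start")), int(m.group("end"))
    if end < start:
        raise ValueError(f"peak end precedes start: {name!r}")
    return m.group("chrom"), start, end


def promoter_window(tss, strand, promoter_up=2000, promoter_down=0):
    """Strand-aware promoter interval around a TSS (assumed convention)."""
    if strand == "+":
        start, end = tss - promoter_up, tss + promoter_down
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = tss - promoter_down, tss + promoter_up
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return max(start, 0), end


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def distance_weight(peak_start, peak_end, tss, tss_decay_bp=2000):
    """Assumed weight: exp(-d / tss_decay_bp), d = |peak midpoint - TSS|."""
    midpoint = (peak_start + peak_end) / 2
    return math.exp(-abs(midpoint - tss) / tss_decay_bp)


def resolve_hits(hits, promoter_priority=True):
    """hits is a list of (gene, kind) with kind 'promoter' or 'gene_body'.

    With promoter_priority, gene-body hits are dropped whenever the
    peak also has at least one promoter hit.
    """
    promoter_hits = [h for h in hits if h[1] == "promoter"]
    if promoter_priority and promoter_hits:
        return promoter_hits
    return list(hits)


chrom, start, end = parse_peak_name("chr1:9500-9900")
window = promoter_window(10000, "+")          # 2000 bp upstream, 0 bp downstream
in_promoter = overlaps(start, end, *window)   # peak falls inside the window
weight = distance_weight(start, end, 10000)   # decays with distance to the TSS
kept = resolve_hits([("GeneA", "promoter"), ("GeneB", "gene_body")])
```

Under these assumptions, a peak sitting exactly on the TSS gets weight 1, and a peak whose midpoint is tss_decay_bp away gets weight 1/e, which is one way to read "length scale" in the tss_decay_bp description.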

Returns

AnnData

Gene-activity matrix with genes as variables and cells as observations. Includes var["n_peaks"] and uns["provenance"] metadata.
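As a rough picture of what such a matrix contains, the sketch below aggregates per-cell peak counts into per-gene scores using a set of peak-to-gene weights, and counts how many peaks contribute to each gene. It assumes that gene activity is a weighted sum of peak counts and that n_peaks is the number of peaks assigned to a gene; neither detail is confirmed by this page, and the function name gene_activity is hypothetical.

```python
def gene_activity(counts, peak_gene_weights, n_genes):
    """Aggregate a cells x peaks count table into a cells x genes table.

    counts            : list of rows, one per cell, each a list of peak counts
    peak_gene_weights : dict mapping (peak_index, gene_index) -> weight
    n_genes           : number of genes (columns of the result)
    """
    activity = [[0.0] * n_genes for _ in counts]
    n_peaks = [0] * n_genes
    for (peak, gene), weight in peak_gene_weights.items():
        n_peaks[gene] += 1  # one more peak contributes to this gene
        for cell, row in enumerate(counts):
            activity[cell][gene] += weight * row[peak]
    return activity, n_peaks


# Two cells, three peaks; peaks 0 and 1 map to gene 0, peak 2 maps to gene 1.
counts = [
    [1, 0, 2],
    [0, 3, 1],
]
weights = {(0, 0): 1.0, (1, 0): 0.5, (2, 1): 1.0}
activity, n_peaks = gene_activity(counts, weights, n_genes=2)
```

In this toy input, gene 0 receives contributions from two peaks and gene 1 from one, which is the kind of per-gene bookkeeping that var["n_peaks"] appears to record.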

Examples

Basic usage:

>>> import scbiot as scb
>>> # Download a GTF from GENCODE: https://www.gencodegenes.org/human/
>>> ga = scb.pp.annotate_gene_activity(atac, f"{dir}/inputs/gencode.vM25.chr_patch_hapl_scaff.annotation.gtf.gz")

Parameters:
  • atac (anndata.AnnData)

  • gtf_file (Path | str)

  • peak_chrom_col (str)

  • peak_start_col (str)

  • peak_end_col (str)

  • gene_biotypes (tuple[str, ...])

  • promoter_up (int)

  • promoter_down (int)

  • include_gene_body (bool)

  • weight_by_distance (bool)

  • tss_decay_bp (int)

  • prefer_gene_name (bool)

  • promoter_priority (bool)

  • verbose (bool)

Return type:

anndata.AnnData