scbiot.pp.annotate_gene_activity
- scbiot.pp.annotate_gene_activity(atac, gtf_file, *, peak_chrom_col='chrom', peak_start_col='chromStart', peak_end_col='chromEnd', gene_biotypes=('protein_coding', 'lncRNA'), promoter_up=2000, promoter_down=0, include_gene_body=False, weight_by_distance=True, tss_decay_bp=2000, prefer_gene_name=True, promoter_priority=True, verbose=True)
Build a gene-activity AnnData by assigning ATAC peaks to genes.
Parameters
- atac:
ATAC AnnData with peak counts in .X (or .layers["counts"]).
- gtf_file:
Path to the GTF annotation used for gene coordinates.
- peak_chrom_col / peak_start_col / peak_end_col:
Column names in atac.var for peak coordinates. If missing, the function falls back to standard columns or parses atac.var_names.
- gene_biotypes:
Gene biotypes to retain from the GTF (gene_biotype or gene_type).
- promoter_up / promoter_down:
Upstream/downstream distances (bp) from the TSS for promoter regions.
- include_gene_body:
Include gene body overlaps in addition to promoters.
- weight_by_distance:
Weight peak contributions by distance to the TSS.
- tss_decay_bp:
Length scale (bp) for exponential decay when weight_by_distance is True.
- prefer_gene_name:
Prefer gene_name over gene_id when naming genes.
- promoter_priority:
Prefer promoter overlaps when a peak maps to both promoter and gene body.
- verbose:
Emit progress logging when True.
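The promoter window and distance weighting described above can be illustrated with a small sketch. This is not scbiot's implementation: the helper names promoter_window and distance_weight are hypothetical, the strand-aware window follows the usual convention, and the weight exp(-d / tss_decay_bp) is only one plausible reading of "exponential decay" with a length scale.

```python
import math


def promoter_window(tss, strand, promoter_up=2000, promoter_down=0):
    """Return (start, end) of the promoter region around a TSS.

    Hypothetical helper. "Upstream" means lower coordinates on the '+'
    strand and higher coordinates on the '-' strand.
    """
    if strand == "+":
        start, end = tss - promoter_up, tss + promoter_down
    elif strand == "-":
        start, end = tss - promoter_down, tss + promoter_up
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    # Clamp at zero so windows near the chromosome start stay valid.
    return max(start, 0), end


def distance_weight(peak_start, peak_end, tss, tss_decay_bp=2000):
    """Assumed weighting: exp(-d / tss_decay_bp), with d measured from
    the peak midpoint to the TSS."""
    midpoint = (peak_start + peak_end) / 2
    return math.exp(-abs(midpoint - tss) / tss_decay_bp)


window_plus = promoter_window(10_000, "+")           # 2 kb upstream on '+'
window_minus = promoter_window(10_000, "-")          # 2 kb upstream on '-'
w_at_tss = distance_weight(9_900, 10_100, 10_000)    # peak centred on the TSS
w_one_scale = distance_weight(7_900, 8_100, 10_000)  # one length scale away

print(window_plus, window_minus, w_at_tss, round(w_one_scale, 3))
```

Under this assumed formula, a peak centred on the TSS gets full weight and a peak one tss_decay_bp away contributes about 37% as much, so a larger tss_decay_bp lets distal peaks count more.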
Returns
- AnnData
Gene-activity matrix with genes as variables and cells as observations. Includes var["n_peaks"] and uns["provenance"] metadata.
Examples
Basic usage:
>>> import scbiot as scb
>>> # download a GTF from GENCODE: https://www.gencodegenes.org/human/
>>> # (the file name below is GENCODE mouse release M25; use the annotation that matches your genome)
>>> ga = scb.pp.annotate_gene_activity(atac, f"{dir}/inputs/gencode.vM25.chr_patch_hapl_scaff.annotation.gtf.gz")
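If the peak coordinates exist only in atac.var_names, it may be clearer to add explicit coordinate columns before calling the function than to rely on the fallback parser. The sketch below assumes peak names such as chr1:1000-2000, chr1-1000-2000, or chr1_1000_2000. The helper parse_peak_name is hypothetical, and scbiot's own parser may accept different formats.

```python
import re

# Assumed peak-name formats: "chr1:1000-2000", "chr1-1000-2000", "chr1_1000_2000".
# The lazy chromosome group lets names such as "chr1_KI270706v1_random" survive.
_PEAK_RE = re.compile(r"^(?P<chrom>.+?)[:\-_](?P<start>\d+)[\-_](?P<end>\d+)$")


def parse_peak_name(name):
    """Split a peak name into (chrom, start, end). Hypothetical helper."""
    match = _PEAK_RE.match(name)
    if match is None:
        raise ValueError(f"cannot parse peak name: {name!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError(f"peak end precedes start: {name!r}")
    return match["chrom"], start, end


names = ["chr1:1000-2000", "chr2-500-900", "chrX_10_20"]
parsed = [parse_peak_name(n) for n in names]
print(parsed)

# With a real AnnData object, the default column names from the signature
# could then be filled in like this (left commented out, since it needs `atac`):
# chroms, starts, ends = zip(*(parse_peak_name(n) for n in atac.var_names))
# atac.var["chrom"] = list(chroms)
# atac.var["chromStart"] = list(starts)
# atac.var["chromEnd"] = list(ends)
```

Explicit columns make the coordinates easy to check before gene assignment, and they match the default peak_chrom_col, peak_start_col, and peak_end_col values.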