Alternative poly-adenylation (APA) analysis

Hari Krishna Yalamanchili; Callison Edward Alcott; Zhandong Liu; Huda Yahya Zoghbi

Improve Research Reproducibility A Bio-protocol resource

Home
Protocols

Preprint

Alternative poly-adenylation (APA) analysis

ZL Zhandong Liu

HZ Huda Yahya Zoghbi

*Contributed equally to this work Last updated date: Dec 2, 2020 Views: 819 Forks: 0

An abbreviated version of this protocol was published in eLife in Apr, 2020

Partial loss of CFIm25 causes learning deficits and aberrant neuronal alternative polyadenylation

Download PDF

Ask a question

How to cite

Favorite

Alternative polyadenylation (APA) analysis: Python and R scripts used to perform the alternative polyadenylation analysis are available in “Supplementary file 2: Alternative polyadenylation analysis code,” found in the “Additional files” section of the manuscript. This document serves as the help file to execute the codes.

Dependency packages:

Python packages: pybedtools, pandas, concurrent.futures R Packages: DRIMSeq # BiocManager::install("DRIMSeq") fastp: https://github.com/OpenGene/fastp.git

pigz: https://github.com/madler/pigz.git

bowtie2: https://github.com/BenLangmead/bowtie2.git samtools: https://github.com/samtools/samtools.git

Usage: PACSeqAnalysisPolyA_DB.py <config file>

Configuration file: User can specify raw data directory, reference genome, reference polyA annotations, key analysis parameters and other output parameters in the configuration file, saved as a “yaml” file.

# Header

APSeq:

# base directory of raw data (fastq data in zipped gz format) #

baseDir: RawData

# output directory – all the results will be saved to this directory #

outDir: TestAPAOutput

# maximum CPU cores to use #

mc: "10"

# Path to reference bowtie2 index #

ref_genome: /Index/Human_hg38_UCSC/hg38

# Path to reference genome fasta sequence #

ref_fasta: / Index/Human_hg38_UCSC/hg38.fasta

# Path to reference gene bed file #

ref_bed: /Index/Human_hg38_UCSC/hg38.ucsc.refseq.bed

# Path to reference PolyA data (PolyA_DB) bed file – from PolyADB#

ref_polyA: /mnt/data/Index/PolyA_DB3/hg38.PASsignal.bed

# AP block – reads within this distance of an annotated polyA site are considered. Typically, a read’s length #

APAblock: 69

# Merge AP blocks distance –Cluster annotated polyA sites within this distance #

mddb: 0

# Merge features distance – reads within this distance are pooled as one polyA site #

md: 15

# AP site anchor/feature overlap - reads that overlap by at least an anchor distance are assigned to the same polyA peak #

anchor: 1

# control samples: Should be same as zipped (gz) fastq files, excluding file extensions - if raw files are WT1.fastq.gz, WT2.fastq.gz, or WT3.fastq.gz, replace the file extensions #

ct: WT1,WT2,WT3

# treatment samples: Should be same as zipped (gz) fastq files excluding file extensions #

tr: MT1,MT2,MT3

# poverA – fraction of samples required to have a minimum read count rc (option bellow) #

pA: 0.60

# Read cutoff – at least the pA fraction of samples in either of the conditions should have a minimum of rc reads #

rc: 5

# Minimum Reads per polyA site #

M: 1

# file key – output file name head#

fkey: AltPA_Results

Test environment: Ubuntu 16.04.6, Python 3.6.8, bowtie2 v2.2.6, samtools v1.4.1, bedtools v2.25.0, fastp v0.19.4. Default parameters should work for most of the datasets. However, we recommend adjusting the pA, rc, and M parameters to account for data variability and sequencing depth.

How to cite：

Readers should cite both the Bio-protocol preprint and the original research article where this protocol was used:

Yalamanchili, H K, Alcott, C E, Liu, Z and Zoghbi, H Y(2020). Alternative poly-adenylation (APA) analysis. Bio-protocol Preprint. bio-protocol.org/prep672.
Alcott, C. E., Yalamanchili, H. K., Ji, P., van der Heijden, M. E., Saltzman, A., Elrod, N., Lin, A., Leng, M., Bhatt, B., Hao, S., Wang, Q., Saliba, A., Tang, J., Malovannaya, A., Wagner, E. J., Liu, Z. and Zoghbi, H. Y.(2020). Partial loss of CFIm25 causes learning deficits and aberrant neuronal alternative polyadenylation. eLife. DOI: 10.7554/eLife.50895