Cap analysis of gene expression (CAGE) sequencing

David Brocks
Christopher R. Schmidt
Michael Daskalakis
Hyo Sik Jang
Nakul M. Shah
Daofeng Li
Jing Li
Bo Zhang
Yiran Hou
Sara Laudato
Daniel B. Lipka
Johanna Schott
Holger Bierhoff
Yassen Assenov
Monika Helf
Alzbeta Ressnerova
Md Saiful Islam
Anders M. Lindroth
Simon Haas
Marieke Essers
Charles D. Imbusch
Benedikt Brors
Ina Oehme
Olaf Witt
Michael Lübbert
Jan-Philipp Mallm
Karsten Rippe
Rainer Will
Dieter Weichenhan
Georg Stoecklin
Clarissa Gerhäuser
Christopher C. Oakes
Ting Wang
Christoph Plass

CAGE was performed in two independent experiments on normal and treated NCI-H1299 cells using the CAGE™ Preparation Kit from DNAFORM.jp according to the manufacturer's instructions. Enrichment of capped RNAs over uncapped ribosomal transcripts was used to assess sample quality. Samples with at least 400-fold enrichment over ribosomal RNA were sequenced on the Illumina HiSeq 2000 system in 50 bp single-end (replicate 1) and 100 bp paired-end (replicate 2) mode by the DKFZ Genomics and Proteomics Core Facility.

Raw sequencing data were processed as follows. Multiplexed samples were separated by barcode. Reads were trimmed at the first position to remove non-specific guanines [53], and the 100 bp paired-end reads were additionally trimmed to 50 bp. Reads were then aligned against the reference genome (hg19) using HISAT [54] version 0.1.6-beta. Only uniquely mapped reads were retained, and for SB939 and DAC+SB939 the files were down-sampled to 25 × 10^6 aligned reads.

The resulting BAM files were loaded into CAGEr version 1.10.0 [55], and CTSSs were called using the following parameters: sequencingQualityThreshold = 20, mappingQualityThreshold = 20. After simple TPM normalization, tag clusters were generated with the clusterCTSS function using the paraclu method (threshold = 0.1, nrPassThreshold = 2, thresholdIsTpm = TRUE, removeSingletons = TRUE, keepSingletonsAbove = 0.2, minStability = 2, maxLength = 100, reduceToNonoverlapping = TRUE). Finally, consensus TSSs across all conditions and replicates were created using the aggregateTagClusters function (tpmThreshold = 0.3, qLow = NULL, qUp = NULL, maxDist = 100, excludeSignalBelowThreshold = FALSE). Importantly, no confounding effects of the underlying sequencing protocol on TSS expression were observed (Supplementary Fig. 3a).

Distance to the nearest Gencode GRCh37.p13 annotated TSS was calculated using HOMER [50], and statistical analysis was performed in DESeq version 1.18.0 [48].
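The read trimming and simple TPM normalization described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study, which relied on CAGEr in R. The function names `trim_read` and `simple_tpm` are hypothetical, and the order of the two trimming steps (truncate to 50 bp first, then drop the first base) is an assumption, since the text does not specify it.

```python
def trim_read(seq, qual, max_len=50):
    """Truncate a read to max_len bases, then drop its first base.

    The first base of a CAGE read is often a non-templated G.
    Assumption: truncation happens before the first base is removed,
    so 50 bp and 100 bp reads both end up 49 bases long.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq[:max_len][1:], qual[:max_len][1:]


def simple_tpm(ctss_counts):
    """Scale raw CTSS tag counts to tags per million (TPM).

    ctss_counts maps a CTSS identifier (hypothetical format, for example
    'chr1:1000:+') to the raw tag count observed in one sample.
    """
    total = sum(ctss_counts.values())
    if total == 0:
        raise ValueError("sample contains no tags")
    return {ctss: count * 1e6 / total for ctss, count in ctss_counts.items()}


if __name__ == "__main__":
    print(trim_read("GACGTACGT", "IIIIIIIII", max_len=6))
    print(simple_tpm({"chr1:1000:+": 1, "chr1:1050:+": 3}))
```

Simple TPM scaling divides each count by the library size only, so it removes differences in sequencing depth without modelling the shape of the tag-count distribution.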
Size factors were calculated to normalize TSS expression, and dispersion estimates for each gene were obtained using the estimateDispersions function with the following parameters: method = "per-condition", sharingMode = "maximum". Differential expression between control cells and DAC-, SB939-, SAHA-, and DAC+SB-treated cells was assessed by testing the difference between the base means of the two conditions (nbinomTest). TSSs with Benjamini-Hochberg adjusted q-values below 0.05 were considered significantly differentially expressed.
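For readers unfamiliar with the two normalization and multiple-testing steps named above, the following Python sketch shows the standard median-of-ratios size factor calculation and the Benjamini-Hochberg adjustment. It is an illustration under stated assumptions, not the DESeq code used in the study; the function names `size_factors` and `benjamini_hochberg` and the input layouts are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts is a list of rows (one per TSS), each a list of raw counts
    with one entry per sample. Rows containing a zero are skipped,
    because their geometric mean across samples is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    print(size_factors([[10, 20], [100, 200], [5, 10]]))
    qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    print([q < 0.05 for q in qvals])
```

Dividing each sample's raw counts by its size factor puts samples on a common scale. A TSS is then called significant when its adjusted value falls below the chosen cutoff, 0.05 in the text above.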
