Identification of genetic variation with bulk tissue RNA-seq

Min Kyung Lee
Nasim Azizgolshani
Joshua A. Shapiro
Lananh N. Nguyen
Fred W. Kolling
George J. Zanazzi
Hildreth Robert Frost
Brock C. Christensen

RNA was collected using the Qiagen RNeasy Plus kit (Catalog ID: 74034, Qiagen, Hilden, Germany). RNA-seq libraries were prepared following the Takara Pico v3 low-input protocol and sequenced on an Illumina NextSeq 500.

Raw RNA-seq reads were trimmed for polyA sequences and low-quality bases using cutadapt (v2.4) [51]. Reads were aligned to the human genome (hg38) using STAR (v2.7.2b) [52]. Duplicate reads were identified with MarkDuplicates, and other alignment quality control metrics were collected with CollectRnaSeqMetrics, both in Picard Tools [53]. Reads containing N operations in their CIGAR strings (that is, reads spanning splice junctions) were split using the SplitNCigarReads function in the Genome Analysis Toolkit (GATK) [54,55]. Base quality scores were recalibrated against known variants from the GATK resource bundle using the BaseRecalibrator and ApplyBQSR functions in GATK [54,55]. Somatic SNVs and indels were called with Mutect2 in tumor-only mode [54,55]. Only variants with a total read depth of at least 10, an allele frequency of at least 5%, and at least 5 reads supporting the alternate allele were kept for analysis. Variants on sex or mitochondrial chromosomes, at RNA editing sites, in RepeatMasker regions, or present in the GATK Panel of Normals reference were then removed. The remaining variants were annotated using the Funcotator function in GATK [54,55].
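The depth and allele-frequency thresholds above, together with the exclusion of sex and mitochondrial chromosomes, can be illustrated with a minimal Python sketch. This is not the authors' code. It assumes a single-sample VCF as produced by a tumor-only Mutect2 run, with an `AD` (allelic depths) entry in the FORMAT column and hg38-style contig names (`chrX`, `chrY`, `chrM`). The function names `passes_filters` and `filter_records` are hypothetical. The sketch also assumes that total depth and allele frequency are computed from the `AD` counts; the protocol does not say whether the `DP` and `AF` fields were used instead. Filtering of RNA editing sites, RepeatMasker regions, and Panel of Normals variants is not covered here.

```python
# Thresholds stated in the protocol text.
MIN_DEPTH = 10       # total read depth at the site
MIN_ALT_DEPTH = 5    # reads supporting the alternate allele
MIN_AF = 0.05        # alternate allele frequency (5%)

# Assumption: hg38-style contig names with a "chr" prefix.
EXCLUDED_CONTIGS = {"chrX", "chrY", "chrM"}


def passes_filters(vcf_line: str) -> bool:
    """Return True if a single-sample VCF record meets all thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom = fields[0]
    if chrom in EXCLUDED_CONTIGS:
        return False

    # Pair FORMAT keys (column 9) with the sample values (column 10).
    fmt = dict(zip(fields[8].split(":"), fields[9].split(":")))

    # AD lists the reference depth first, then one depth per alternate allele.
    # Assumption: only the first alternate allele is evaluated.
    allelic_depths = [int(x) for x in fmt["AD"].split(",")]
    total_depth = sum(allelic_depths)
    alt_depth = allelic_depths[1]
    if total_depth == 0:
        return False

    allele_fraction = alt_depth / total_depth
    return (
        total_depth >= MIN_DEPTH
        and alt_depth >= MIN_ALT_DEPTH
        and allele_fraction >= MIN_AF
    )


def filter_records(lines):
    """Yield header lines unchanged and only the records that pass."""
    for line in lines:
        if line.startswith("#") or passes_filters(line):
            yield line
```

For example, `filter_records(open("calls.vcf"))` would stream the retained lines from an uncompressed VCF (the file name is a placeholder). In practice, a dedicated VCF library or `bcftools` would handle multi-allelic sites and compressed input more robustly than this line-based sketch.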
