Whole exome sequencing of paired normal-tumour samples

Yun Xu
Kai Liu
Cong Li
Minghan Li
Xiaoyan Zhou
Menghong Sun
Liying Zhang
Sheng Wang
Fangqi Liu
Ye Xu

To perform comprehensive genomic and transcriptomic analyses, we carefully reviewed the tissue specimens held in the Tissue Bank of our hospital. Frozen and paraffin-embedded tissues were available for 35 pMMR/MSI-H cases, which underwent paired normal-tumour WES alongside RNA sequencing (Figure S2a).

Genomic DNA was extracted from tumour tissues and lymphocytes using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany). The DNA concentration was then quantified using the Qubit 3.0 (Thermo Fisher Scientific, Inc., Waltham, MA, USA), following the manufacturer's instructions. Libraries were prepared using the SureSelect Human All Exon Kit V6 (Agilent Technologies, Santa Clara, California, USA), following the manufacturer's protocol. The quality of the captured libraries was assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies), and the libraries were then sequenced on the NovaSeq 6000 system (Illumina, San Diego, California, USA), according to the manufacturer's guidelines.

Raw sequencing reads were preprocessed with trim_galore (http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) before subsequent analysis: (1) adapter trimming; (2) removal of reads in which the N bases reached a certain percentage (default length of 8 bp); (3) removal of reads in which low-quality bases (default quality threshold value ≤ 20) exceeded a certain proportion (default 40%); (4) sliding window trimming, in which the bases in a sliding window (default 4 bp) with a mean quality below the cutting quality (default 20) were cut.

The cleaned reads were aligned to the reference human genome (build hg38, https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/) using Sentieon bwa-mem [16]. Subsequent processing, including sorting reads and marking duplicates, was performed according to the best practices of the GATK Toolkit v4 [17,18] (https://gatk.broadinstitute.org/hc/en-us). Sequencing depth and coverage were obtained using qualimap [19]. To identify all variants, we used two somatic mutation callers for single nucleotide variants (SNVs) and indels: Mutect2 [20] and Strelka2 [21]. To improve specificity, a panel-of-normals filter was used to remove background germline variation and artifacts. Mutect2 was run on BAM files that had undergone base quality score recalibration with GATK4 (v4.1.1.0). Somatic mutations were then annotated using VEP [22].

To obtain reliable mutation calls, we used a two-step approach. First, we retained only mutations identified by both callers (Mutect2 and Strelka2). Second, we applied three additional filtering criteria: (1) variant allele frequency (VAF) ≥ 8%; (2) sequencing depth in the region ≥ 8; (3) sequence reads in support of the variant call ≥ 2. Tumour mutation burden (TMB), defined as the number of somatic mutations per Mb, was calculated with pyTMB (https://github.com/bioinfo-pf-curie/TMB). Samples with more than 10 mutations/Mb were labelled as TMB-H.
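The two-step filtering and the TMB-H labelling described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline and does not reproduce pyTMB. The `Call` record, the function names, and the `callable_mb` denominator are hypothetical and introduced only for illustration; the protocol does not state the size of the region used to normalise the mutation count. The sketch also assumes that depth and supporting-read counts are taken from the Mutect2 record, and that variants are matched between callers on chromosome, position, reference allele, and alternate allele.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical, simplified somatic call (not a real VCF record)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    depth: int      # total tumour reads covering the site
    alt_reads: int  # tumour reads supporting the variant allele

    @property
    def key(self):
        # Identity used to match the same variant across the two callers.
        return (self.chrom, self.pos, self.ref, self.alt)

    @property
    def vaf(self):
        # Variant allele frequency; 0.0 when the site has no coverage.
        return self.alt_reads / self.depth if self.depth else 0.0


def consensus_filter(mutect2_calls, strelka2_calls,
                     min_vaf=0.08, min_depth=8, min_alt_reads=2):
    """Step 1: keep calls reported by both callers.
    Step 2: apply the VAF, depth, and supporting-read thresholds.
    Assumption: metrics are read from the Mutect2 record."""
    strelka2_keys = {call.key for call in strelka2_calls}
    return [
        call for call in mutect2_calls
        if call.key in strelka2_keys
        and call.vaf >= min_vaf
        and call.depth >= min_depth
        and call.alt_reads >= min_alt_reads
    ]


def tumour_mutation_burden(n_mutations, callable_mb):
    """Mutations per Mb. `callable_mb` is an assumed input: the size in Mb
    of the region over which mutations were counted."""
    if callable_mb <= 0:
        raise ValueError("callable_mb must be positive")
    return n_mutations / callable_mb


def is_tmb_high(tmb, threshold=10.0):
    """TMB-H means strictly more than 10 mutations/Mb."""
    return tmb > threshold


if __name__ == "__main__":
    mutect2 = [
        Call("chr1", 100, "A", "T", depth=50, alt_reads=10),  # passes all
        Call("chr2", 200, "G", "C", depth=40, alt_reads=2),   # VAF 5%
        Call("chr3", 300, "C", "T", depth=6, alt_reads=3),    # depth < 8
        Call("chr4", 400, "T", "G", depth=30, alt_reads=6),   # Mutect2 only
    ]
    strelka2 = [
        Call("chr1", 100, "A", "T", depth=48, alt_reads=9),
        Call("chr2", 200, "G", "C", depth=41, alt_reads=2),
        Call("chr3", 300, "C", "T", depth=6, alt_reads=3),
    ]
    kept = consensus_filter(mutect2, strelka2)
    print([call.key for call in kept])
    print(is_tmb_high(tumour_mutation_burden(400, 35.0)))
```

In this toy input only the chr1 variant survives: the chr2 call falls below the 8% VAF threshold, the chr3 call has insufficient depth, and the chr4 call is reported by a single caller. In practice the calls would be parsed from the Mutect2 and Strelka2 VCF files, and the TMB denominator would come from the capture design or the callable region rather than a hard-coded value.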
