Data Preprocessing

Maria Angela Diroma
Alessandra Modi
Martina Lari
Luca Sineo
David Caramelli
Stefania Vai

The complete bioinformatics pipeline is shown in Figure 1. Guidelines for all command-line tools used are provided as Supplementary Data. After a quality check with FastQC (RRID:SCR_014583, v0.11.7), paired-end sequencing data in FASTQ format were first merged using the Clip&Merge function (v1.7.6) of the EAGER software (v1.92.37) (Peltzer et al., 2016), which also removed adapters from both paired- and single-end reads. Reads shorter than 30 bases were discarded (-l 30), and the minimum base quality for quality trimming was set to 30 (-q 30). Adapters were clipped even when only a single nucleotide aligned with the adapter sequence (-m 1).
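To make the two thresholds above concrete, the following is a minimal Python sketch of quality trimming followed by length filtering. It is not the Clip&Merge implementation: it assumes Phred+33 quality encoding and simple trimming of low-quality bases from the 3' end, and it performs no adapter clipping or read merging. The function names (`trim_3prime`, `preprocess`) and the `(name, sequence, quality)` record layout are hypothetical and chosen only for illustration.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq, qual, min_q=30):
    """Remove trailing bases whose quality is below min_q.

    Assumption: plain 3'-end trimming; the real tool may trim differently.
    """
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def preprocess(records, min_q=30, min_len=30):
    """Trim each (name, seq, qual) record, then drop reads shorter than min_len."""
    kept = []
    for name, seq, qual in records:
        seq, qual = trim_3prime(seq, qual, min_q)
        if len(seq) >= min_len:
            kept.append((name, seq, qual))
    return kept
```

In this sketch the length filter is applied after trimming, so a 35-base read that loses 10 low-quality bases at its 3' end falls below the 30-base cutoff and is discarded.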

Figure 1. Computational pipeline for ancient mitochondrial DNA (mtDNA) analysis. Our computational pipeline comprises five main steps: (1) read preprocessing, alignment, and postprocessing; (2) contamination analysis by schmutzi and consensus sequence assembly; (3) variant calling by GATK Mutect2, variant filtering, and consensus sequence assembly; (4) haplogroup prediction; (5) variant annotation. Reads were aligned to the revised Cambridge Reference Sequence (rCRS) alone to obtain suitable input for schmutzi, whereas mtDNA reads aligned to the whole genome (hg19 with rCRS as the mitochondrial reference sequence) were used for variant calling. AD10, minimum variant allele depth = 10; AF50, minimum allele fraction = 50%; RD10, minimum reference allele depth = 10.
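The three filter labels defined in the caption can be read as simple threshold checks. The sketch below expresses each one as a separate predicate; it is an illustration only, not the authors' filtering code. It assumes a biallelic site where allele fraction is the variant depth divided by the sum of reference and variant depths, and it deliberately does not combine the predicates, because the text does not state how the filters are applied together. The `Site` class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Read depths at one position (hypothetical container for illustration)."""
    ref_depth: int  # reads supporting the reference allele
    alt_depth: int  # reads supporting the variant allele


def allele_fraction(site):
    """Variant allele fraction, assuming only reference and variant reads."""
    total = site.ref_depth + site.alt_depth
    return site.alt_depth / total if total else 0.0


def passes_ad10(site, min_alt_depth=10):
    """AD10: variant allele depth of at least 10."""
    return site.alt_depth >= min_alt_depth


def passes_af50(site, min_fraction=0.5):
    """AF50: allele fraction of at least 50%."""
    return allele_fraction(site) >= min_fraction


def passes_rd10(site, min_ref_depth=10):
    """RD10: reference allele depth of at least 10."""
    return site.ref_depth >= min_ref_depth
```

For example, a site with 5 reference reads and 20 variant reads meets the AD10 and AF50 thresholds but not RD10.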
