FASTQ files were trimmed to remove low-quality reads using PRINSEQ (42) and aligned to the most likely inferred ancestor of the MTBC (20) using the BWA-MEM algorithm (43). Alignments with a mean coverage of less than 20× per base were filtered out. Variant calling was performed using SAMtools and VarScan (44). Because of the low variability found in M. tuberculosis, and to avoid mapping errors and false SNPs, a variant was filtered out if (i) it was supported by fewer than 20 reads, (ii) it was found at a frequency of less than 0.9, (iii) it was found near an indel (10-bp window), or (iv) it was found in an area with a high accumulation of variants (more than three variants in a defined 10-bp window). Variants were annotated using SnpEff (45). Variants present in PE/PPE genes, phages, or repeated sequences were also filtered out, as these regions tend to accumulate SNPs owing to mapping errors. High-quality variant calls were combined into a nonredundant variant list, which was used to retrieve the most likely allele in each strain and to generate a variant alignment.
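
The four variant-level filters (i) to (iv) can be illustrated with a minimal Python sketch. This is not the authors' code: the `Variant` record, the function names, and the constants are hypothetical, and the text leaves some details open. The sketch assumes that "near an indel" means within 10 bp on either side of an indel position, that the density filter counts all called variants (SNPs and indels) in any 10-bp stretch, and that indels themselves are not kept in the final SNP list.

```python
from dataclasses import dataclass

# Thresholds taken from the text; the names are illustrative only.
MIN_READS = 20               # (i) minimum number of supporting reads
MIN_FREQ = 0.9               # (ii) minimum variant frequency
WINDOW = 10                  # (iii)/(iv) window size in bp
MAX_VARIANTS_PER_WINDOW = 3  # (iv) more than this many is "high accumulation"


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal record for one called variant."""
    pos: int
    supporting_reads: int
    frequency: float
    is_indel: bool = False


def dense_positions(positions):
    """Return positions lying in some 10-bp window holding more than three variants."""
    pts = sorted(set(positions))
    k = MAX_VARIANTS_PER_WINDOW
    flagged = set()
    for i in range(len(pts) - k):
        # pts[i] .. pts[i + k] are k + 1 variants; they share a window
        # of WINDOW bp when their span is at most WINDOW - 1.
        if pts[i + k] - pts[i] < WINDOW:
            flagged.update(pts[i:i + k + 1])
    return flagged


def filter_snps(variants):
    """Apply filters (i)-(iv) and return the SNPs that pass all of them."""
    variants = list(variants)
    indel_positions = [v.pos for v in variants if v.is_indel]
    dense = dense_positions(v.pos for v in variants)
    kept = []
    for v in variants:
        if v.is_indel:
            continue  # assumption: only SNPs go into the final list
        if v.supporting_reads < MIN_READS:
            continue  # (i) too few supporting reads
        if v.frequency < MIN_FREQ:
            continue  # (ii) frequency below 0.9
        if any(abs(v.pos - p) <= WINDOW for p in indel_positions):
            continue  # (iii) within the assumed 10-bp window of an indel
        if v.pos in dense:
            continue  # (iv) high local accumulation of variants
        kept.append(v)
    return kept
```

For example, calling `filter_snps` on a list of `Variant` records for one strain returns only the SNPs that pass all four criteria. The annotation-based exclusion of PE/PPE genes, phages, and repeated sequences would be a separate step and is not covered here.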

