Whole-exome sequencing was performed as previously described (Nissen et al., 2018). Variant Call Format (VCF) files were filtered as described in Supplementary Method 1, and the filtering process is illustrated in Figure 1. To assess the specificity of the variants identified in the cohort, the same filter settings were applied to a cohort of HSE patients.
Figure 1. Whole-exome sequencing (WES) filtering diagram. In brief, we included exonic variants predicted to be rare (present in <0.1% of reference genomes), excluded variants with a CADD score <15 (Kircher et al., 2014), and included frameshift variants, in-frame insertions or deletions (indels), and stop codon changes, for which no CADD score is available. Lastly, we included genes with high evolutionary conservation, defined as a phyloP p-value ≤ 0.01. The variant filtering was verified by random sampling, and the BAM files of all variants identified in Ingenuity Variant Analysis (IVA) were manually inspected in the UCSC Genome Browser or IGV so that only variants with high sequencing quality were retained.
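The variant-level criteria above can be expressed as a single predicate. The Python sketch below is illustrative only and does not reproduce the IVA filtering used in the study: the `Variant` record, its field names, and the consequence labels are hypothetical, and it assumes that the phyloP cut-off applies to every variant class and that a variant of any other class with a missing CADD score is excluded.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical consequence labels for variant classes that have no CADD score
# and are therefore retained without a CADD check.
NO_CADD_CLASSES = {
    "frameshift",
    "inframe_insertion",
    "inframe_deletion",
    "stop_gained",
    "stop_lost",
}


@dataclass
class Variant:
    gene: str
    consequence: str       # e.g. "missense", "frameshift" (hypothetical labels)
    exonic: bool
    allele_freq: float     # frequency in reference genomes, 0.0 to 1.0
    cadd: Optional[float]  # None when no CADD score is available
    phylop_p: float        # phyloP conservation p-value


def passes_filter(v: Variant, max_af: float = 0.001, min_cadd: float = 15.0,
                  max_phylop_p: float = 0.01) -> bool:
    """Return True if a variant passes the variant-level filters."""
    if not v.exonic:
        return False
    # Rare: present in <0.1% of reference genomes.
    if v.allele_freq >= max_af:
        return False
    # Frameshifts, in-frame indels and stop codon changes bypass the CADD check;
    # other variants need CADD >= 15 (a missing score fails, by assumption).
    if v.consequence not in NO_CADD_CLASSES:
        if v.cadd is None or v.cadd < min_cadd:
            return False
    # Conservation: phyloP p-value <= 0.01 (assumed to apply to all classes).
    return v.phylop_p <= max_phylop_p
```

Keeping the thresholds as keyword arguments makes it easy to rerun the same predicate with identical settings on a second cohort, as was done for the HSE patients.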
The bioinformatic evaluation comprised two parts: evaluation of the identified variants and evaluation of the variant-containing genes. First, after the variants had been identified in IVA, additional tools were used to eliminate variants unlikely to contribute to the phenotype. SIFT (Kumar et al., 2009) and PolyPhen-2 (Adzhubei et al., 2010; Andersen et al., 2017) scores were extracted from the IVA analysis. The PolyPhen-2 score predicts the possible impact of an amino acid substitution on the structure and function of a human protein and represents the probability that the substitution is damaging (Table 1). The SIFT score predicts whether an amino acid substitution affects protein function and ranges from 0.0 (deleterious) to 1.0 (tolerated) (Supplementary Table S3). Second, the Mutation Significance Cut-off (MSC; 99% confidence interval, HGMD database source) (Belov et al., 2003) was calculated for each variant-containing gene, and genes were included when they carried a variant with a high predicted phenotypic impact (possibly damaging), defined as a combined annotation dependent depletion (CADD) score greater than the gene's MSC. This quantitative approach, with gene-level, gene-specific cut-off values, improves the performance of variant-level methods, CADD in particular. Third, to estimate genetic intolerance, variant-containing genes were evaluated based on gene function, the ExAC RVIS (Residual Variation Intolerance Score based on Exome Aggregation Consortium data) (Petrovski et al., 2013), the gene damage index (GDI) (Itan et al., 2015), the ExAC missense Z score, and the LoF pLI (Lek et al., 2016; Supplementary Table S3). Genes for which three or more scores did not predict the variant to be disease-causing or the gene to be intolerant were excluded, unless the remaining scores, together with gene function, were highly relevant to the PPM phenotype.
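The gene-level steps (the MSC comparison and the rule excluding genes with three or more non-supporting scores) can be sketched in Python as below. All cut-offs are illustrative assumptions, not values reported in this protocol: SIFT ≤ 0.05 and pLI ≥ 0.9 follow common conventions, and the others are placeholders. The GDI is assumed to be available as a categorical Low/Medium/High prediction, missing scores are not counted, and the hypothetical `functionally_relevant` flag stands in for the manual judgement of gene function.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative cut-offs (assumptions, not values reported in the study).
SIFT_MAX = 0.05        # SIFT <= 0.05: predicted deleterious
POLYPHEN2_MIN = 0.5    # PolyPhen-2 >= 0.5: treated as damaging (placeholder)
RVIS_MAX = 0.0         # RVIS < 0: more intolerant than average
MISSENSE_Z_MIN = 3.09  # missense Z >= 3.09: treated as constrained (placeholder)
PLI_MIN = 0.9          # pLI >= 0.9: treated as LoF-intolerant


@dataclass
class Candidate:
    gene: str
    cadd: Optional[float]          # variant CADD score
    msc: Optional[float]           # gene-specific MSC
    sift: Optional[float]          # 0.0 deleterious to 1.0 tolerated
    polyphen2: Optional[float]     # probability that the substitution is damaging
    rvis: Optional[float]          # ExAC RVIS
    gdi_prediction: Optional[str]  # "Low", "Medium" or "High" (assumed format)
    missense_z: Optional[float]    # ExAC missense Z score
    pli: Optional[float]           # ExAC LoF pLI
    functionally_relevant: bool = False  # hypothetical manual-curation flag


def above_msc(c: Candidate) -> bool:
    """Step 2: the variant's CADD score must exceed the gene-specific MSC."""
    if c.cadd is None or c.msc is None:
        return True  # assumption: a missing CADD score (e.g. indels) does not drop the gene
    return c.cadd > c.msc


def unsupportive_scores(c: Candidate) -> int:
    """Step 3: count available scores that do not support pathogenicity or intolerance."""
    checks = [
        (c.sift, lambda s: s <= SIFT_MAX),
        (c.polyphen2, lambda s: s >= POLYPHEN2_MIN),
        (c.rvis, lambda s: s < RVIS_MAX),
        (c.gdi_prediction, lambda s: s != "High"),  # high GDI: gene tolerates much damage
        (c.missense_z, lambda s: s >= MISSENSE_Z_MIN),
        (c.pli, lambda s: s >= PLI_MIN),
    ]
    # Missing scores (None) are skipped rather than counted against the gene.
    return sum(1 for value, supports in checks
               if value is not None and not supports(value))


def keep_gene(c: Candidate) -> bool:
    """Combine steps 2 and 3 into a keep/exclude decision for one gene."""
    if not above_msc(c):
        return False
    if unsupportive_scores(c) >= 3:
        # Exclude unless gene function is highly relevant to the phenotype.
        return c.functionally_relevant
    return True
```

Counting non-supporting scores, rather than requiring every score to agree, mirrors the "three or more" rule while tolerating disagreement between individual predictors.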
Rare variants identified in patients with paralytic poliomyelitis.
In addition, we performed an independent second layer of analysis by searching for single nucleotide polymorphisms (SNPs) in IFNAR1, TLR3, and IFIH1, genes that have been associated with increased susceptibility to enterovirus infections. In this analysis, no filtering pipeline was applied other than a biological filter restricted to IFNAR1, TLR3, and IFIH1.
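The gene-restricted search amounts to a simple filter over annotated VCF records. The sketch below assumes, hypothetically, that each record stores its gene symbol in an INFO key named `GENE`; real annotation tools store gene symbols in their own formats, so the key and the parsing would need adapting, and a dedicated VCF parser would be preferable to manual string splitting for production use.

```python
from typing import Dict, Iterable, Iterator, Tuple, Union

TARGET_GENES = {"IFNAR1", "TLR3", "IFIH1"}


def parse_info(info: str) -> Dict[str, Union[str, bool]]:
    """Parse a VCF INFO column into a dict; flag fields map to True."""
    fields: Dict[str, Union[str, bool]] = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def target_gene_variants(
    vcf_lines: Iterable[str],
    genes=TARGET_GENES,
    gene_key: str = "GENE",  # hypothetical INFO key holding the gene symbol
) -> Iterator[Tuple[str, int, str, str, str]]:
    """Yield (CHROM, POS, REF, ALT, gene) for records annotated to a target gene."""
    for line in vcf_lines:
        # Skip meta-information lines, the header line and blank lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, _filter, info = cols[:8]
        gene = parse_info(info).get(gene_key)
        # No frequency, CADD or conservation filter: only the gene list applies.
        if gene in genes:
            yield chrom, int(pos), ref, alt, gene
```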