A total of 1,141 unrelated, self-reported Chinese individuals were enrolled for exome sequencing for rare disease diagnosis or complex disease research from 2012 to 2019. Exome sequencing was performed on genomic DNA derived from peripheral blood or buccal mucosa using Illumina sequencing platforms with different exome capture kits (S1 Table). The processing of raw exome sequencing data is described in detail in the Supplementary Methods (S1 Text). Briefly, variant calling was performed using a pipeline based on the Genome Analysis Toolkit (GATK), and human leukocyte antigen (HLA) typing was performed using HLA typing from High-quality Dictionary (HLA-HD) [14,15]. The exome sequencing dataset was subjected to stringent quality control (QC) procedures at the sample, variant, and genotype levels, and the output data were annotated using wANNOVAR [16]. To avoid over-representation of disease-associated variants, samples collected from subjects with respiratory diseases were removed for the CFTR analysis, and samples from subjects with neuromuscular disorders were removed for the RYR1 analysis. In this study, a rare variant was defined as a variant with a Genome Aggregation Database (gnomAD) global allele frequency (AF) <1%. A missense variant was considered deleterious when it had a Phred-scaled Combined Annotation Dependent Depletion (CADD) score ≥20 [17], a Rare Exome Variant Ensemble Learner (REVEL) score ≥0.7, or a PREDICT score ≥0.6, whereas a loss-of-function (LoF) variant was considered deleterious when it had a Phred-scaled CADD score ≥20 or a Loss-Of-Function Transcript Effect Estimator (LOFTEE) annotation of “high-confidence” [18–20].
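The rare-variant and deleteriousness criteria above can be expressed as a small set of threshold checks. The following Python sketch is illustrative only and is not the authors' pipeline: the `Variant` record, its field names, the consequence labels `"missense"` and `"lof"`, and the `"HC"` code for a LOFTEE high-confidence call are assumptions made for this example. Treating a variant with no gnomAD entry as rare, and treating a missing score as not meeting its threshold, are likewise assumptions that the text does not specify.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text above.
RARE_AF_THRESHOLD = 0.01   # gnomAD global AF < 1%
CADD_THRESHOLD = 20.0      # Phred-scaled CADD >= 20
REVEL_THRESHOLD = 0.7      # REVEL >= 0.7
PREDICT_THRESHOLD = 0.6    # PREDICT >= 0.6


@dataclass
class Variant:
    """Hypothetical annotated-variant record; field names are illustrative."""
    consequence: str                    # "missense" or "lof" (assumed labels)
    gnomad_af: Optional[float] = None   # None = not observed in gnomAD
    cadd_phred: Optional[float] = None
    revel: Optional[float] = None
    predict: Optional[float] = None
    loftee: Optional[str] = None        # "HC" assumed to mean high-confidence


def _at_least(score: Optional[float], threshold: float) -> bool:
    # Assumption: a missing score never satisfies a threshold.
    return score is not None and score >= threshold


def is_rare(v: Variant) -> bool:
    # Assumption: variants absent from gnomAD are treated as rare.
    return v.gnomad_af is None or v.gnomad_af < RARE_AF_THRESHOLD


def is_deleterious(v: Variant) -> bool:
    if v.consequence == "missense":
        # Any one of the three predictors is sufficient.
        return (
            _at_least(v.cadd_phred, CADD_THRESHOLD)
            or _at_least(v.revel, REVEL_THRESHOLD)
            or _at_least(v.predict, PREDICT_THRESHOLD)
        )
    if v.consequence == "lof":
        # CADD or a LOFTEE high-confidence call is sufficient.
        return _at_least(v.cadd_phred, CADD_THRESHOLD) or v.loftee == "HC"
    # Other consequence types are not covered by the stated criteria.
    return False
```

For example, `is_deleterious(Variant("missense", cadd_phred=12.0, revel=0.75))` evaluates to `True` because the REVEL criterion alone is met, even though the CADD score is below 20.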