Study cohorts and targeted sequencing

We initially targeted 4045 ASD probands from the Chinese population [ACGC cohort described previously (41)] for CSDE1 sequencing. ACGC patients were diagnosed primarily according to DSM-IV and/or DSM-5 criteria, and additional comorbid conditions were documented where possible. Peripheral blood was collected from all patients with ASD and, where available, their parents after informed consent was obtained. Genomic DNA was extracted from whole blood using a standard proteinase K digestion and phenol-chloroform method. In the second stage, targeted sequencing was performed on a larger international cohort [described elsewhere (42)] of 10,745 patients with a primary diagnosis of ASD and/or ID/DD. Informed consent was obtained from all participants. This study was approved by the Institutional Review Board (IRB) of Central South University.

Targeted sequencing of CSDE1 was performed using single-molecule molecular inversion probe (smMIP) technology, a highly cost-effective targeted sequencing method (15). In brief, MIPs were designed using MIPgen with an updated scoring algorithm. Amplification of the captured DNA was performed as previously reported (41). Libraries were sequenced on the Illumina HiSeq 2000 platform. After removal of incorrect read pairs and low-quality reads, clean reads were aligned to the GRCh37/hg19 reference genome with BWA-MEM (v0.7.13) (43). Single-nucleotide variants and indels were called with FreeBayes (v0.9.14) (44). Variants with more than 10-fold sequence coverage and a quality score above 20 (QUAL > 20) were annotated with SeattleSeq Annotation 138 (45) using the GRCh37/hg19 reference. LGD variants and rare missense variants in CSDE1 (minor allele frequency < 1% in ExAC) were selected for validation by Sanger sequencing in patients and, where available, their parents.
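The filtering and selection criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The `Variant` record, the function names, and the functional-class labels in `LGD_CLASSES` are hypothetical, since the text does not give the annotation vocabulary. The sketch also assumes that the 1% ExAC frequency cutoff applies to missense variants only and that variants absent from ExAC count as rare; neither point is stated explicitly.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the text
MIN_DEPTH = 10    # variants must exceed 10-fold sequence coverage
MIN_QUAL = 20.0   # QUAL > 20
MAX_MAF = 0.01    # minor allele frequency < 1% in ExAC

# Hypothetical labels for likely gene-disrupting (LGD) classes;
# the real annotation output may use different terms.
LGD_CLASSES = {"frameshift", "stop-gained", "splice-donor", "splice-acceptor"}


@dataclass
class Variant:
    """Illustrative record for one annotated variant call (hypothetical)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float                 # call quality (VCF QUAL column)
    depth: int                  # sequence coverage at the site
    function: str               # annotated functional class
    exac_maf: Optional[float]   # None if the variant is absent from ExAC


def passes_quality(v: Variant) -> bool:
    """Keep calls with coverage > 10x and QUAL > 20 (strict inequalities)."""
    return v.depth > MIN_DEPTH and v.qual > MIN_QUAL


def selected_for_validation(v: Variant) -> bool:
    """Select LGD variants and rare missense variants for validation."""
    if not passes_quality(v):
        return False
    if v.function in LGD_CLASSES:
        return True
    if v.function == "missense":
        # Assumption: absence from ExAC is treated as rare.
        return v.exac_maf is None or v.exac_maf < MAX_MAF
    return False


def select(variants: list[Variant]) -> list[Variant]:
    """Return the subset of calls that would go forward to validation."""
    return [v for v in variants if selected_for_validation(v)]
```

Keeping the quality filter separate from the functional selection mirrors the order described in the text: calls are filtered on coverage and quality before annotation, and candidates are chosen only afterwards.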

