GTEx v7 was used for external replication (ref. 21). We downloaded the GTEx genotype data from dbGaP (accession phs000424.v7.p2) and imputed C4 alleles in samples of European ancestry, as determined by genetic principal component analysis. We obtained transcript-level counts from www.gtexportal.org and derived gene-level counts using the tximport package in R. Briefly, RNA-seq reads had been aligned to the hg19 reference genome with STAR 2.4.2a, and transcript-level counts quantified with RSEM v1.2.22. We started with the samples and features that were used for the GTEx eQTL analyses. We then dropped samples from non-brain tissues and from tissues with a different sample preparation (i.e. cortex and cerebellar hemisphere). We also dropped samples with a history of disease possibly affecting the brain, and then filtered for features with counts per million (CPM) > 0.1 in at least 25% of samples. Gene-level counts were then normalized using trimmed mean of M-values (TMM) normalization in edgeR and log2-transformed to match PsychENCODE. Each brain region was then assessed for outlier samples, defined as those with standardized sample network connectivity Z scores < −3, and these outliers were removed. These quality control steps resulted in 20,765 features based on Gencode v19 annotations and 920 samples across ten brain regions, of which 540 samples had imputed C4 alleles.
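The feature filter and the outlier criterion described above can be illustrated with a minimal Python sketch. It is not the authors' R code. It assumes a plain genes-by-samples list of raw counts, omits TMM normalization, and assumes that sample connectivity is the summed signed adjacency ((1 + Pearson r) / 2)^2 between samples, which is one common convention and is not stated in the text. The function names are hypothetical.

```python
import math
from statistics import mean, stdev


def cpm(counts):
    """Convert a genes x samples count matrix to counts per million."""
    n_samples = len(counts[0])
    lib_sizes = [sum(gene[j] for gene in counts) for j in range(n_samples)]
    return [[1e6 * gene[j] / lib_sizes[j] for j in range(n_samples)]
            for gene in counts]


def filter_features(counts, min_cpm=0.1, min_frac=0.25):
    """Indices of genes with CPM > min_cpm in at least min_frac of samples."""
    keep = []
    for i, row in enumerate(cpm(counts)):
        frac = sum(v > min_cpm for v in row) / len(row)
        if frac >= min_frac:
            keep.append(i)
    return keep


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity_z(expr):
    """Standardized sample connectivity for a genes x samples matrix.

    Assumption: adjacency between samples is ((1 + r) / 2) ** 2.
    """
    n = len(expr[0])
    cols = [[gene[j] for gene in expr] for j in range(n)]
    k = [sum(((1 + pearson(cols[i], cols[j])) / 2) ** 2
             for j in range(n) if j != i)
         for i in range(n)]
    mu, sd = mean(k), stdev(k)
    return [(v - mu) / sd for v in k]


def outlier_samples(expr, z_cut=-3.0):
    """Indices of samples whose connectivity Z score falls below z_cut."""
    return [i for i, z in enumerate(connectivity_z(expr)) if z < z_cut]
```

In this sketch the Z scores are computed within whatever matrix is passed in, so applying it per brain region, as in the text, means calling `outlier_samples` once for each region's samples.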
We next regressed out biological and technical covariates, except the region and subject terms, using a linear mixed model fitted with the lme4 package in R. We entered region, age, sex, 13 seqPCs (the top 13 principal components of sequencing QC metrics from RNA-SeQC), RNA integrity number (RIN), ischemic time, interval of onset to death for immediate cause, Hardy Scale, and body refrigeration status as fixed effects, and subject as a random intercept term. To evaluate the relationship between several non-genetic factors and C4A gene expression, we added three genetic PCs, brain pH, and a covariate of interest (e.g. BMI, weight, height, smoking status, or drinking status) as fixed effects to the above model. Significance was assessed by a likelihood ratio test (LRT) comparing the full model, which included the effect in question, against a null model without it.
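The arithmetic behind the likelihood ratio test can be sketched as follows. This is an illustration only: it assumes the two log-likelihoods come from maximum-likelihood (not REML) fits of the nested mixed models, and that the covariate of interest adds a single parameter, so the statistic is referred to a chi-square distribution with one degree of freedom. A multi-level covariate would need more degrees of freedom. The function name and inputs are hypothetical.

```python
import math


def lrt_pvalue_1df(loglik_null, loglik_full):
    """Likelihood ratio test for one added parameter.

    Returns (statistic, p-value). For a chi-square variable with one
    degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (loglik_full - loglik_null)
    stat = max(stat, 0.0)  # guard against tiny negative values from rounding
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

The log-likelihoods themselves would come from the fitted full and null models; this sketch does not fit the mixed models.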
Due to the relatively limited sample size of GTEx (i.e. fewer than 10 samples for CN < 2 and CN > 2 in each brain region), we focused on samples with a C4A copy number of two in subsequent analyses. We constructed a C4A-seeded network using frontal cortical samples (N = 36) and combined this with the above PsychENCODE control-only network (N = 145) using the Olkin-Pratt (OP) fixed-effect meta-analytical approach as implemented in the metacor R package.
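As a rough illustration of how two correlation estimates might be combined, the sketch below applies the commonly used closed-form approximation to the Olkin-Pratt estimator, G(r) ≈ r[1 + (1 − r²) / (2(n − 4))], and pools the corrected values with sample-size weights. The weighting scheme is an assumption made for this example and may differ from what the metacor package does. The function names are hypothetical, and any correlation values passed in are placeholders, not results from the study.

```python
def olkin_pratt(r, n):
    """Approximate Olkin-Pratt bias-corrected correlation for sample size n."""
    return r * (1.0 + (1.0 - r * r) / (2.0 * (n - 4)))


def pooled_fixed_effect(rs, ns):
    """Sample-size-weighted mean of the corrected correlations.

    Assumption: weights are the sample sizes themselves.
    """
    corrected = [olkin_pratt(r, n) for r, n in zip(rs, ns)]
    return sum(n * g for n, g in zip(ns, corrected)) / sum(ns)
```

For the two cohorts in the text, a call would take the form `pooled_fixed_effect([r_gtex, r_psychencode], [36, 145])`, applied to each gene's correlation with C4A in turn.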