As part of preprocessing, we ensured that each dataset was appropriately normalized and log2-transformed. Unless otherwise noted in the study summary (Supplemental Information), we used all datasets as preprocessed by the original authors. As negative values interfere with the calculation of the geometric mean based iSEXS scores, we raised any gene expression matrix with negative values above zero by adding the absolute value of the minimum value. We ensured that all datasets were log transformed, and performed log-2 transformation if necessary.
In datasets where sex annotation was not available, we imputed sex labels using expression of two Y chromosome genes RPS4Y1 and KDM5D as well as XIST, a lncRNA on the X chromosome that is highly expressed in females and mediates X-inactivation (Toker et al., 2016). We performed K-means clustering with two centroids on the expression values of RPS4Y1, KDM5D, and XIST. We assigned the centroid with higher XIST expression and lower RPS4Y1 and KDM5D expression as female. We removed samples from the analysis if their sex label in the GEO did not match their imputed sex label. Of the 21 datasets with available sex labels, 12 contained at least one mislabeled sample with 87 mislabeled samples total (Supplemental Information: Study Summaries). Microarray probes were mapped to Entrez Gene Identifiers (IDs). If a probe mapped to more than one gene, we expanded the expression of that probe to correspond to all relevant genes (Ramasamy et al., 2008).
We used MetaIntegrator to apply two meta-analysis methods (combining effect sizes and combining p values) as described previously (Andres-Terre et al., 2015, Bongen et al., 2018, Khatri et al., 2013). We calculated effect sizes as Hedges’ adjusted g and summarized using random effects inverse variance model, where each effect size was weighted by the inverse of the variance in that study. We corrected the p values for the summary effect size of each gene for multiple hypothesis testing using Benjamini-Hochberg false discovery rate (FDR) (Benjamini and Hochberg, 1995).
We performed meta-analysis by combining p values using Fisher’s sum of logs (Fisher, 1934). For each gene, we summed the logarithm of the one-sided hypothesis testing p values across k studies and compared the result to a -distribution with 2k degrees of freedom.
Using the method described by Hedges and Pigott, we found our discovery datasets had 90% statistical power at p value of 0.01 for detecting effect size of 0.38, 0.44, 0.54, and 0.78 in the presence of no, low, moderate, or high heterogeneity, respectively (Hedges and Pigott, 2001).
We arbitrarily defined females as “cases” and males as “controls” so that genes with positive effect sizes were expressed at higher level in females (female-associated iSEXS genes) and genes with negative effect sizes were expressed at a higher level in males (male-associated iSEXS genes). We defined the immune Sex Expression Signature (iSEXS) as differentially expressed genes with an absolute effect size ≥ 0.4 and a false discovery rate (FDR) ≤ 5% in the discovery cohorts. We defined the iSEXS score of a sample as the geometric mean of female-associated iSEXS gene expression minus the geometric mean of male-associated iSEXS gene expression. We scaled and centered the iSEXS score values within each dataset (mean = 0, standard deviation = 1).
We grouped the iSEXS genes located on sex chromosomes into XY-iSEXS, and those located on autosomes into Autosomal-iSEXS. We identified chromosome location for each gene in iSEXS using the NCBI Entrez Gene annotations from the biomaRt R package Version 2.30 (Durinck et al., 2009). We obtained the X-escape genes from a previously published study by Tukiainen and colleagues (Tukiainen et al., 2017). We used χ2 test to compute enrichment of X chromosome and Y chromosome genes.
As described previously, we generated immune cell type specific effect sizes, which are available in the MetaSignature database (http://metasignature.stanford.edu) (Haynes et al., 2017, Vallania et al., 2018). These immune cell type specific effect sizes were generated by combining 6,160 samples from 166 gene expression datasets. For each cell-type-gene pair, we computed the immune cell specificity score as Hedge’s g effect sizes of the expression of that gene in that cell type (case) versus all other cell types (controls).
Immune cell type proportions were estimated from bulk blood gene expression using cell mixture deconvolution. We utilized the immunoStates basis matrix and a linear regression model, as described previously (Bongen et al., 2018, Roy Chowdhury et al., 2018, Vallania et al., 2018).
We used t tests to compare the distributions of two groups. To calculate significance of the interactions between age and sex we used two-way Anova with an unbalanced design using Type-III sums of squares. To convert age into a factor individuals were binned into two groups: younger (18-40 years old) and older (50+ years old).
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.