Bayesian network input genes

Jordan N Reed
Jiansheng Huang
Yong Li
Lijiang Ma
Dhanush Banka
Martin Wabitsch
Tianfang Wang
Wen Ding
Johan LM Björkegren
Mete Civelek

Ideally, we would probe gene-gene interactions at a genome-wide level, but Bayesian network construction is computationally intensive; we therefore limited this analysis to a subset of <10,000 genes that are more likely to regulate body fat distribution (Fig 1). We prioritized three classes of putative regulators of body fat distribution: (1) genes that are co-expressed with other genes in adipose tissue, (2) genes proximal to body fat distribution GWAS loci (Pulit et al, 2019), and (3) genes that are putatively regulated by the transcription factor KLF14 (Small et al, 2018).

For a gene to be connected to others in a co-expression network, it must be expressed in the measured dataset, must vary between samples, and must be correlated with the expression of other genes. These properties are well suited to Bayesian network construction and can indicate gene function in the tissue of interest; therefore, we constructed adipose tissue co-expression networks for all eight datasets and identified genes connected in the corresponding STARNET and GTEx networks. We used the Python package iterativeWGCNA (Greenfest-Allen et al, 2017 Preprint) to obtain modules of co-expressed genes in each dataset. Weighted gene co-expression network analysis (Langfelder & Horvath, 2008) uses correlations found within the data to determine which groups of genes are highly correlated and likely co-regulated. First, we computed the correlations between all pairs of genes. We raised these correlation coefficients to an empirically determined power to accentuate the differences between strong and weak correlations. Next, we performed hierarchical clustering on the correlation matrix to define modules of highly correlated genes. We then assessed the quality of this clustering and iteratively reassigned genes to the modules in which they fit best. Lowly expressed or uncorrelated genes were not assigned to a module. We identified which genes were assigned to modules in each of the eight datasets. We then compared the GTEx and STARNET module assignments for each depot and sex, and, in each of the four depot and sex groups, identified the genes assigned to modules in both datasets. We then took the union of these four genesets as the co-expressed geneset, which contained 7,928 genes and made up the bulk of the input to Bayesian network construction (Table S1).
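The correlation, soft-thresholding, and clustering steps described above can be illustrated with a minimal sketch. It is not the iterativeWGCNA implementation: it omits the topological overlap measure, dynamic tree cutting, and the iterative reassignment step, and it assumes an unsigned network (absolute correlations). The function names, the power of 6, the merge threshold of 0.5, and the toy expression values are all hypothetical choices made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a gene that does not vary is treated as uncorrelated
    return cov / sqrt(vx * vy)

def adjacency(expr, power):
    """Soft-thresholded adjacency: |correlation| raised to a power."""
    genes = list(expr)
    return {(g, h): abs(pearson(expr[g], expr[h])) ** power
            for g in genes for h in genes if g != h}

def modules(expr, power=6, min_adjacency=0.5, min_size=2):
    """Average-linkage agglomerative clustering on the adjacency values.

    Clusters are merged while their mean pairwise adjacency is at least
    min_adjacency (assumed cutoff); clusters smaller than min_size are
    dropped, so those genes are left without a module.
    """
    adj = adjacency(expr, power)
    clusters = [[g] for g in expr]

    def linkage(a, b):
        return sum(adj[(g, h)] for g in a for h in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, i, j = max((linkage(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if best < min_adjacency:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters if len(c) >= min_size]

# Toy data (hypothetical): genes A, B, C are tightly correlated,
# D is only weakly correlated, and E does not vary between samples.
example_expr = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10.5],
    "C": [5, 4, 3, 2, 1],
    "D": [3, 1, 4, 1, 5],
    "E": [2, 2, 2, 2, 2],
}
example_modules = modules(example_expr)
```

In this toy example, raising the correlations to a power pushes the weak A-D correlation (about 0.35) close to zero while leaving the strong correlations near one, which is why D and the invariant gene E end up without a module.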

We have previously demonstrated that KLF14 expression regulates fat distribution in both female mice and humans (Yang et al, 2022). SNP rs4731702 is significantly associated with KLF14 expression in cis in adipose tissue of multiple cohorts (Civelek et al, 2017; Small et al, 2018). The same variant is also associated in trans with the expression of 385 genes across the genome (Table S1). We hypothesized that KLF14’s effect on fat distribution is mediated by the genes it regulates, and we included these 385 putative KLF14 target genes in the input geneset.

The largest WHRadjBMI GWAS meta-analysis to date was performed in individuals of primarily European ancestry and discovered 346 loci associated with WHRadjBMI (Pulit et al, 2019). Multiple sources have determined that the functional gene is the nearest gene to the locus in ∼70% of cases (Nasser et al, 2021), so we used HaploReg (Ward & Kellis, 2016) to identify 443 genes overlapping or nearest to the lead SNP (and SNPs in LD with it at r² > 0.8) of the 346 WHRadjBMI GWAS loci. Further, we used two studies that identified high-quality candidate genes using colocalization methods (Civelek et al, 2017; Raulerson et al, 2019), in which the SNPs associated with WHRadjBMI also affect the expression of 59 candidate genes, which are more likely to be functional (Table S1). In total, we considered this combined set of 495 genes as WHRadjBMI GWAS genes in this study. Although this set does not contain all possible causal genes, it is likely enriched for them.

The union of the weighted gene co-expression network analysis module genes, KLF14 targets, and putative GWAS genes made up the input to Bayesian network construction. For each dataset, the expression values of these 8,492 genes were discretized into “low,” “medium,” and “high” bins using k-means clustering.
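As a rough illustration of these two steps, the sketch below takes the union of three gene sets and then discretizes one gene's expression values into three bins with a one-dimensional k-means. The source does not specify the k-means implementation or its initialization, so the quantile-based starting centers, the function names, and the toy gene names and values are assumptions made for this example only.

```python
def kmeans_1d(values, k=3, max_iter=100):
    """Plain Lloyd's k-means on a list of numbers (k >= 2).

    Starting centers are evenly spaced order statistics of the data,
    which is an assumed initialization, not the authors' choice.
    """
    srt = sorted(values)
    centers = [srt[round(i * (len(srt) - 1) / (k - 1))] for i in range(k)]
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        new_centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def discretize(values, labels=("low", "medium", "high")):
    """Label each value by its nearest center, with centers sorted ascending."""
    centers = sorted(kmeans_1d(values, k=len(labels)))
    return [labels[min(range(len(centers)), key=lambda c: abs(v - centers[c]))]
            for v in values]

# Hypothetical toy gene sets standing in for the three input sources.
coexpressed_genes = {"GENE1", "GENE2", "GENE3"}
klf14_targets = {"GENE3", "GENE4"}
gwas_genes = {"GENE2", "GENE5"}
input_genes = coexpressed_genes | klf14_targets | gwas_genes

# Hypothetical expression values for one gene across nine samples.
example_values = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9, 10.1, 9.8, 10.0]
example_bins = discretize(example_values)
```

Because the sets overlap, the union is smaller than the sum of the individual set sizes, which is consistent with the reported total of 8,492 genes being less than 7,928 + 385 + 495.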
