Estimating TR activity by network component analysis (NCA)

Karin Ortmayr
Sébastien Dubuis
Mattia Zampieri

Originally established by Liao et al. (ref. 29), NCA provides a mathematical framework for reconstructing the regulatory signals of transcriptional regulators (TRs), i.e., TR activity, from gene expression profiles. Here, we adopted the sparseNCA implementation by Noor et al. (ref. 67; MATLAB code downloaded from https://sites.google.com/site/aminanoor/softwares). This methodology uses a mathematical model to approximate TR-target regulatory interactions and integrates prior network information with the expression of target genes across multiple conditions to regress the activity of the respective TRs, delivering a relative measure of TR activity. We obtained normalized gene expression profiles across the NCI-60 cell lines from Gene Expression Omnibus (accession number GSE32474), containing 54,675 mRNA probes. The TRRUST database (ref. 24) served as the source of TR-target gene interactions in humans, including 748 human TRs and 1,975 non-TR gene targets. Intersecting these two resources, we assembled a network of 2,209 unique genes corresponding to 5,490 mRNA probes that match target genes of 728 TRs in the TRRUST database (Supplementary Fig. 5).

We implemented a bootstrapping approach to account for the incompleteness of the regulatory network (i.e., missing regulatory interactions) and for the possibility of multiple optima in the solution space. Of note, even if current knowledge of TR-target genes is incomplete, a few gene targets can be sufficient to estimate relative TR activities using this approach. For each TR, we randomly selected 48 additional TRs and constructed a sub-network containing these 49 TRs and their target genes. Because growth rate has a pleiotropic effect on gene expression, here reflected in the correlation between the first principal component of the gene expression data and cell line growth rates (Supplementary Fig. 5), we decoupled TR activity from the confounding effect of growth rate by adding an additional TR that targets all genes. This fictitious TR mimics the general effect of proliferation rate on transcription. As a result, each TR is embedded in a sub-network of 50 TRs and their target genes from the full network. Ten such sub-networks were created randomly for each TR, and NCA was applied to each of them. In this bootstrapping scheme, each TR was sampled in 490 sub-networks on average (permutations; min. 423, max. 556 data points per TR).

In the final data set, we normalized the calculated TR activity to the maximum across all permutations, and then calculated the median TR activity and its standard deviation for each TR and cell line (Supplementary Fig. 5). It is worth noting that the estimates obtained with this approach are determined only up to an unknown scaling factor; hence, we obtain a relative measure of activity for each of the 728 TRs across the NCI-60 cell lines (Supplementary Data 2).
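The sampling and aggregation steps described above can be illustrated with a minimal Python sketch. This is not the authors' MATLAB code and it does not run NCA itself. The function names, the `GROWTH_TR` label, and the toy network are hypothetical. The normalization is one possible reading of "normalized to the maximum across all permutations": here each TR's estimates are divided by their largest absolute value over all permutations and cell lines, which is an assumption.

```python
import random
from collections import Counter
from statistics import median, stdev

GROWTH_TR = "GROWTH"  # hypothetical label for the fictitious TR targeting all genes


def build_subnetworks(network, n_partners=48, n_repeats=10, seed=0):
    """Build n_repeats random sub-networks per focal TR.

    network maps each TR name to the set of its target genes.
    Each sub-network holds the focal TR, n_partners other TRs, and one
    fictitious TR that targets every gene in the sub-network.
    """
    rng = random.Random(seed)
    trs = sorted(network)
    subnetworks = []
    for focal in trs:
        others = [t for t in trs if t != focal]
        for _ in range(n_repeats):
            members = [focal] + rng.sample(others, n_partners)
            sub = {tr: set(network[tr]) for tr in members}
            # The fictitious growth TR is connected to all genes present.
            sub[GROWTH_TR] = set().union(*sub.values())
            subnetworks.append(sub)
    return subnetworks


def count_appearances(subnetworks):
    """Count in how many sub-networks each real TR appears."""
    counts = Counter()
    for sub in subnetworks:
        counts.update(tr for tr in sub if tr != GROWTH_TR)
    return counts


def summarize_activity(estimates):
    """Summarize one TR's activity over its permutations.

    estimates is a list of runs, each a list with one value per cell line.
    Assumption: values are scaled by the largest absolute value over all runs.
    Returns per-cell-line medians and sample standard deviations.
    """
    scale = max(abs(v) for run in estimates for v in run)
    if scale == 0:
        raise ValueError("all activity estimates are zero")
    normalized = [[v / scale for v in run] for run in estimates]
    per_cell_line = list(zip(*normalized))
    medians = [median(col) for col in per_cell_line]
    spreads = [stdev(col) for col in per_cell_line]
    return medians, spreads


# Toy network with 60 TRs (the study used 728); gene names are made up.
toy_network = {f"TR{i}": {f"g{i}", f"g{(i + 1) % 60}"} for i in range(60)}
subs = build_subnetworks(toy_network)
counts = count_appearances(subs)
print(len(subs))                           # 60 TRs x 10 repeats -> 600
print(sum(counts.values()) / len(counts))  # -> 490.0
```

The mean number of appearances per TR does not depend on network size: every sub-network contains 49 real TRs and there are 10 sub-networks per TR, so the average is 10 × 49 = 490, which matches the reported value. The individual counts vary around that mean because the 48 partner TRs are drawn at random.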
