Multi-omics correlation analysis of pathological signatures

Rui Cao, Fan Yang, Si-Cong Ma, Li Liu, Yu Zhao, Yan Li, De-Hua Wu, Tongxin Wang, Wei-Jia Lu, Wei-Jing Cai, Hong-Bo Zhu, Xue-Jun Guo, Yu-Wen Lu, Jun-Jie Kuang, Wen-Jing Huan, Wei-Min Tang, Kun Huang, Junzhou Huang, Jianhua Yao, Zhong-Yi Dong

The pathological signatures generated by our model were the occurrence histograms in PALHI and the TF-IDF feature vectors in BoW. To identify the top pathological signatures, the importance of each signature was measured by its contribution weight to the final WSI-level prediction. The top signatures were tested for significance with Wilcoxon rank-sum tests and then carried forward to genomic and transcriptomic correlation analysis.
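
As an illustration of the significance test named above, the following is a minimal sketch of a two-sided Wilcoxon rank-sum test using the normal approximation. It assumes the signature values are compared between two groups of slides (the grouping, for example by MSI status, is an assumption and is not stated in this section), and it applies no tie or continuity correction. The function name `rank_sum_test` and the toy values are hypothetical.

```python
import math

def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (z, p). Tied values share their average rank; no tie or
    continuity correction is applied, so this is only a rough sketch.
    """
    pooled = list(a) + list(b)
    order = sorted(range(len(pooled)), key=lambda i: pooled[i])
    ranks = [0.0] * len(pooled)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and pooled[order[end + 1]] == pooled[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    n1, n2 = len(a), len(b)
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail area
    return z, p

# Toy signature values for two groups of slides (not real data).
z, p = rank_sum_test([0.1, 0.2, 0.3], [0.4, 0.5, 0.6])
print(z, p)
```

With samples this small, an exact test would be preferable to the normal approximation; the sketch is only meant to show how the rank sum is formed and compared with its expectation.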

The DNA mutation profile of TCGA-COAD was retrieved from cBioPortal 28. Synonymous mutations were excluded from the subsequent correlation analysis. A gene set was defined as deficient if any of its member genes carried a non-synonymous mutation.
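
The deficiency rule above can be sketched as follows. This is a minimal illustration, assuming that mutation records are available as (sample, gene, variant class) tuples and that synonymous mutations are labeled "Silent"; the record layout, the label, the function name `deficient_samples`, and the example gene set are assumptions, not details given in the text.

```python
# Assumed label(s) for synonymous variants; adjust to the actual annotation.
SYNONYMOUS_CLASSES = {"Silent"}

def deficient_samples(mutations, gene_set):
    """Return the samples with a non-synonymous mutation in any gene of gene_set.

    mutations: iterable of (sample_id, gene, variant_class) tuples.
    """
    genes = set(gene_set)
    deficient = set()
    for sample_id, gene, variant_class in mutations:
        # One qualifying mutation in one member gene is enough.
        if gene in genes and variant_class not in SYNONYMOUS_CLASSES:
            deficient.add(sample_id)
    return deficient

# Toy records, not real TCGA-COAD data.
toy_mutations = [
    ("S1", "MLH1", "Missense_Mutation"),
    ("S2", "MSH2", "Silent"),             # synonymous, so it is ignored
    ("S3", "TP53", "Nonsense_Mutation"),  # gene outside the set
]
example_gene_set = ["MLH1", "MSH2", "MSH6", "PMS2"]  # illustrative only
print(deficient_samples(toy_mutations, example_gene_set))  # {'S1'}
```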

Previous literature has reported a relationship between MSI and several mutation indexes, including INDEL and tumor mutation burden (TMB) 29. INDEL mutations are variants caused by sequence insertion (INS) or deletion (DEL), and were quantified as the frequency of INS and DEL mutations. Because the mutation data were profiled by whole-exome sequencing, TMB was calculated as the total number of somatic nonsynonymous mutations divided by the size of the exonic region of the genome 30. To explore the relationship between the pathological signatures and these known genomic biomarkers, all variables were first normalized to a range of 0 to 1 and then visualized in a heat map using the R package pheatmap, with unsupervised clustering performed by Ward's minimum variance method.
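
The mutation indexes and the 0-to-1 normalization described above can be sketched as follows. The exome size used here (38 Mb) is an assumed placeholder, since the text does not give the value. Reading "frequency" as the fraction of a sample's mutations that are INS or DEL is likewise one possible interpretation, and it could instead mean a raw count. All function names are hypothetical.

```python
EXOME_SIZE_MB = 38.0  # assumed placeholder; the text does not state the size used

def tmb(n_nonsynonymous, exome_size_mb=EXOME_SIZE_MB):
    """Somatic nonsynonymous mutations per megabase of exonic sequence."""
    return n_nonsynonymous / exome_size_mb

def indel_frequency(variant_types):
    """Fraction of a sample's mutations whose type is INS or DEL."""
    if not variant_types:
        return 0.0
    n_indel = sum(1 for v in variant_types if v in ("INS", "DEL"))
    return n_indel / len(variant_types)

def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant vector maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Toy values, not real data.
print(tmb(76))                                         # 2.0
print(indel_frequency(["SNP", "INS", "DEL", "SNP"]))   # 0.5
print(min_max_scale([2, 4, 6]))                        # [0.0, 0.5, 1.0]
```

Scaling each variable separately keeps signatures and genomic indexes with very different ranges comparable on one heat map color scale.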

The mRNA expression profile of TCGA-COAD, retrieved from cBioPortal, was normalized using the RSEM method 31. Weighted gene co-expression network analysis (WGCNA) is a bioinformatics method that uses expression data to identify modules of genes with highly coordinated expression changes 32. We first constructed a gene co-expression network from the mRNA expression profile using the R package WGCNA, setting the soft threshold of the network to the recommended value selected by the function pickSoftThreshold (Figure S1). With the minimum module size set to 100 and all other parameters left at their defaults, we identified 24 transcriptomic modules (Figure S2). The biological functions of the modules were annotated by the Gene Ontology (GO) over-representation test using the R package clusterProfiler 33, with the Benjamini-Hochberg method used to adjust P values to control the false discovery rate. Only GO terms with adjusted P values lower than 0.05 were considered significantly enriched in a particular module. We then calculated Spearman's rank correlation coefficient for each pair of module and pathological signature to identify the modules of interest.
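
Two of the statistical steps above, Benjamini-Hochberg adjustment and Spearman's rank correlation, can be sketched in plain Python as follows. This is not the clusterProfiler or WGCNA implementation. The function names are hypothetical, and the sketch assumes each module is summarized by one value per sample (for example a module eigengene), which the text does not specify.

```python
import math

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed inputs.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Toy inputs, not real data.
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
print(spearman([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))
```

Because Spearman's coefficient depends only on ranks, it is unaffected by monotone rescaling of either variable, which suits comparisons between expression-derived module values and model-derived signature weights.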

An immune cytolytic activity (CYT) score, defined as the geometric mean of the transcript levels of GZMA and PRF1 34, and a CD8+ T-effector gene set (CD8A, IFNG, GZMA, PRF1, CXCL9, CXCL10, TBX21, GZMB) 35 were quantified from the RNA-seq data and then correlated with the pathological signatures to characterize their association with anti-tumor immunity.
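
The CYT score defined above reduces to a one-line calculation, sketched here for a single sample. The function name `cyt_score` and the input values are hypothetical. How the CD8+ T-effector gene set was summarized into a score is not stated in the text, so it is not sketched.

```python
import math

def cyt_score(gzma, prf1):
    """Geometric mean of GZMA and PRF1 transcript levels for one sample."""
    return math.sqrt(gzma * prf1)

# Toy expression values, not real RNA-seq data.
print(cyt_score(4.0, 9.0))  # 6.0
```

A geometric mean is low whenever either gene is weakly expressed, so a high score requires both genes to be expressed.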
