Comparisons with MOFA+ and totalVI

Yuhan Hao, Stephanie Hao, Erica Andersen-Nissen, William M. Mauck, III, Shiwei Zheng, Andrew Butler, Maddie J. Lee, Aaron J. Wilk, Charlotte Darby, Michael Zager, Paul Hoffman, Marlon Stoeckius, Efthymia Papalexi, Eleni P. Mimitou, Jaison Jain, Avi Srivastava, Tim Stuart, Lamar M. Fleming, Bertrand Yeung, Angela J. Rogers, Juliana M. McElrath, Catherine A. Blish, Raphael Gottardo, Peter Smibert, Rahul Satija

To assess the performance of our WNN method alongside other recently proposed multimodal integration tools, we compared the results of WNN, Total Variational Inference (totalVI, version 0.6.7) (Gayoso et al., 2019), and Multi-Omics Factor Analysis v2 (MOFA+, version 1.1) (Argelaguet et al., 2020) on the BMNC dataset. We followed the recommended settings and workflows for both totalVI and MOFA+, and describe our parameter choices below.

For totalVI, we used the RNA and ADT count matrices as input. We used the subsample_genes function to select 4,000 variable genes and trained the model for 500 epochs, as suggested in the totalVI tutorial (https://scvi-tools.org/en/stable/tutorials/totalvi.html). All other parameters were left at their default settings. We then identified nearest neighbors and performed UMAP visualization on the learned latent space.
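As a rough illustration of the neighbor-identification step on a learned latent space, the sketch below finds each cell's nearest neighbors by brute force. It is not the implementation used in the analysis: the Euclidean metric, the value of k, the exclusion of each cell from its own neighbor list, and the function name knn_indices are all assumptions made for this example, and the latent coordinates are toy values.

```python
import math


def knn_indices(latent, k):
    """For each cell, return the indices of its k nearest other cells.

    Assumes Euclidean distance in the latent space and excludes the cell
    itself; both choices are illustrative, not taken from the protocol.
    """
    neighbors = []
    for i, xi in enumerate(latent):
        # Distance from cell i to every other cell, paired with that cell's index.
        dists = [(math.dist(xi, xj), j) for j, xj in enumerate(latent) if j != i]
        dists.sort()
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


# Toy latent space: two well-separated groups of three cells each.
latent = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
]
neighbors = knn_indices(latent, k=2)
```

In this toy example each cell's two neighbors come from its own group, which is the property a useful latent space should have for cells of the same type.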

For MOFA+, we used the same normalization method as Seurat to facilitate direct comparison. As recommended in the MOFA+ tutorial (https://raw.githack.com/bioFAM/MOFA2_tutorials/master/R_tutorials/10x_scRNA_scATAC.html), we used the z-scored data (‘scaled’ data) from the two assays as view1 and view2 for MOFA+. All other parameters were set to default or recommended settings. We identified nearest neighbors, and performed UMAP visualization based on the learned factors.
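For readers unfamiliar with z-scoring, the following minimal sketch shows what scaling each feature across cells means. It assumes features are rows, cells are columns, and the sample standard deviation is used; the function name zscore_features is hypothetical, and Seurat's own scaling may differ in details that are not reproduced here.

```python
import statistics


def zscore_features(matrix):
    """Z-score each feature (row) across cells (columns).

    Uses the sample standard deviation (an assumption for this sketch).
    Features with zero variance are set to 0.0 to avoid division by zero.
    """
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)
        scaled.append([(x - mean) / sd if sd > 0 else 0.0 for x in row])
    return scaled


# Toy normalized expression: two features measured in three cells.
normalized = [
    [1.0, 2.0, 3.0],
    [4.0, 4.0, 4.0],  # constant feature
]
scaled = zscore_features(normalized)
```

After scaling, every non-constant feature has mean 0 and unit variance, so features from the two assays are on a comparable scale when passed as separate views.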

The UMAP plots in Figures S2A and S2B show the results of all three methods, alongside independent RNA and protein analyses in Seurat for comparison. The methods generally reveal similar sets of cell types, but with important differences. For example, regulatory T cells, defined by CD25 expression, separate from other cells only in the WNN UMAP. Figure S2B shows that this is because CD25+ cells form a distinct cluster only in the WNN analysis.

To move beyond visualization and quantify the performance of each method, we averaged the CD25 expression level across the calculated multimodal neighbors of each cell, returning a vector of predicted values. We quantified performance using the correlation between predicted and measured values (Pearson, Figure 2D; Spearman, Figure S2). For CD25, WNN analysis achieved the highest correlation, as CD25+ cells are correctly identified as neighbors of other CD25+ cells in the dataset. We repeated this analysis for all protein features and found that WNN analysis consistently achieved the highest correlation. We also repeated the analysis for all transcriptomic features (Figure S2) and observed similar performance across all methods. We note that transcriptomic correlations were much lower overall, likely due to the substantial technical noise inherent to scRNA-seq data.
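The neighbor-averaging evaluation can be sketched in a few lines. The example below is an illustration under stated assumptions, not the code used for the figures: the function names, the toy CD25 values, and the two hand-written neighbor lists are hypothetical, and each cell is assumed to be excluded from its own neighbor set.

```python
import math


def predict_from_neighbors(values, neighbors):
    """Predict each cell's value as the mean of that value over its neighbors."""
    return [sum(values[j] for j in nbrs) / len(nbrs) for nbrs in neighbors]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy measured CD25 levels: cells 0-2 are CD25-low, cells 3-5 are CD25-high.
cd25 = [0.1, 0.0, 0.2, 2.9, 3.1, 3.0]

# Neighbor graph that keeps CD25-high cells together.
coherent_neighbors = [[1, 2], [0, 2], [0, 1], [4, 5], [3, 5], [3, 4]]
# Neighbor graph that mixes CD25-low and CD25-high cells.
mixed_neighbors = [[1, 3], [0, 4], [0, 5], [4, 0], [3, 1], [3, 2]]

r_coherent = pearson(predict_from_neighbors(cd25, coherent_neighbors), cd25)
r_mixed = pearson(predict_from_neighbors(cd25, mixed_neighbors), cd25)
```

When neighbors share the same CD25 status, the predicted values track the measured ones and the correlation is high; when the graph mixes CD25-high and CD25-low cells, the predictions are pulled toward the overall mean and the correlation drops. Repeating the calculation per feature gives one correlation per feature and per method.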
