Gaussian process assumption

Yuri Ahuja, Jun Wen, Chuan Hong, Zongqi Xia, Sicong Huang, Tianxi Cai

We assume the dense representations of patients’ EHRs (i.e. patient embeddings) over time follow a Gaussian process:
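The display equation here did not survive extraction. Consistent with the mean and covariance functions μi(t) and Σi(t) defined in the next paragraph, the assumption can be written (a reconstruction, not necessarily the authors' exact display) as:

```latex
X_{i,t} \sim \mathcal{GP}\bigl(\mu_i(t),\ \Sigma_i(t)\bigr)
```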

Since the feature embeddings V are engineered to approximately follow a multivariate normal distribution as described in the Producing feature embeddings section of the Supplementary Materials, it is reasonable to assume Xi,t to be a Gaussian process over time t. We further specify the mean and covariance functions μi(t) and Σi(t), respectively. For some parameters θGP = {μ0, μ1, μ2, μ3, μ4, μ5, μH, μYH, σk, αk, τk, ρkl : k = 1, …, p; l = 1, …, p}, we assume:
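The specification above can be illustrated with a small simulation. The sketch below assumes a hypothetical linear form for the mean in (Yit, Hi, t), the stated marginal variance σk² scaled by Hi, and AR(1) residuals with autocorrelation τk; the authors' exact parameterization of μi(t) is given by the omitted display equation, so the coefficient names and the linear mean form here are illustrative assumptions only.

```python
# Hedged simulation of the assumed generative model for one patient's
# embedding trajectory. The linear mean in (Y_it, H_i, t) is a hypothetical
# stand-in for the authors' (omitted) parameterization of mu_i(t).
import numpy as np

rng = np.random.default_rng(1)
p, t_max = 3, 20
mu0, muY, muH, mut = rng.normal(size=(4, p))  # hypothetical mean coefficients
sigma = np.abs(rng.normal(1.0, 0.2, size=p))  # baseline scales sigma_k
tau = rng.uniform(0.2, 0.8, size=p)           # autocorrelations tau_k

def simulate_patient(Y, H):
    """Simulate X_{i,t,k} for t = 0..t_max-1 under the assumed model."""
    X = np.empty((t_max, p))
    # Initial residual with the stated marginal variance sigma_k^2 * H_i
    eps = rng.normal(0.0, sigma * np.sqrt(H))
    for t in range(t_max):
        mean = mu0 + muY * Y[t] + muH * H + mut * t  # hypothetical mu_i(t)
        X[t] = mean + eps
        # AR(1) step; innovation variance keeps Var(eps) at sigma_k^2 * H
        eps = tau * eps + rng.normal(0.0, sigma * np.sqrt(H * (1 - tau**2)))
    return X

Y = rng.integers(0, 2, size=t_max)  # phenotype status over time
X = simulate_patient(Y, H=2.0)
```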

In summary, we assume that patient i’s expected embedding at time t, μi(t), is a function of Yit, Hi, and t. We assume that the marginal variance of embedding component k can be represented by some baseline σk2 scaled by Hi. We denote the correlation between embedding components k and l as ρkl, which we assume to be constant over time. Between timepoints, we employ a first-order univariate autoregressive (AR(1)) kernel structure such that the residual at t, ϵi,t,k=Xi,t,k-E(Xi,t,k|Yi,Hi), is a linear function of its preceding value ϵi,t-1,k with autocorrelation coefficient τk:
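The display equation for this recursion is missing from the extraction. A standard AR(1) specification consistent with the stated marginal variance σk²·Hi and autocorrelation τk is the following, where the innovation variance is our assumption, chosen so the marginal variance is stationary:

```latex
\epsilon_{i,t,k} = \tau_k\,\epsilon_{i,t-1,k} + \delta_{i,t,k},
\qquad \delta_{i,t,k} \sim N\!\bigl(0,\ \sigma_k^2 H_i (1 - \tau_k^2)\bigr)
```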

r ∈ [0,1] is an autoregression regularization hyperparameter separately tuned via fivefold cross-validation maximizing the AUROC of Yi,t predictions: r = 0 ignores intertemporal correlation, while r = 1 denotes undampened autoregression. We chose first-degree autoregression over higher-degree models for computational ease and to mitigate overfitting. We provide a sensitivity analysis with respect to the choice of k in k-fold cross-validation in Supplementary Fig. S2, which demonstrates no significant effect of k on predictive accuracy.
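The tuning procedure can be sketched as a grid search over r with fivefold cross-validation scored by AUROC. The smoothing-plus-logistic-regression model below is an illustrative stand-in on synthetic data, not the authors' actual prediction pipeline; only the r grid, the fivefold split, and the AUROC criterion come from the text.

```python
# Hedged sketch: tune the autoregression dampening r in [0, 1] by fivefold
# cross-validation maximizing AUROC. The model itself is a toy stand-in.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, t_max, p = 200, 10, 5
X = rng.normal(size=(n, t_max, p))           # synthetic embeddings over time
beta = rng.normal(size=p)
y = (X[:, -1, :] @ beta + rng.normal(size=n) > 0).astype(int)

def ar_smooth(X, r):
    """AR(1)-style dampened smoothing: r = 0 ignores history entirely."""
    S = X.copy()
    for t in range(1, X.shape[1]):
        S[:, t, :] = r * S[:, t - 1, :] + X[:, t, :]
    return S[:, -1, :]  # features at the prediction timepoint

def cv_auroc(r, n_splits=5):
    """Mean out-of-fold AUROC of a logistic model on r-smoothed features."""
    feats = ar_smooth(X, r)
    aucs = []
    for tr, te in KFold(n_splits, shuffle=True, random_state=0).split(feats):
        clf = LogisticRegression(max_iter=1000).fit(feats[tr], y[tr])
        aucs.append(roc_auc_score(y[te], clf.predict_proba(feats[te])[:, 1]))
    return float(np.mean(aucs))

grid = np.linspace(0.0, 1.0, 11)
best_r = max(grid, key=cv_auroc)
```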
