2.6. Clinical Accuracy Assessment

Quanzeng Wang
Yangling Zhou
Pejman Ghassemi
David McBride
Jon P. Casamento
T. Joshua Pfefer

The clinical accuracy of IRTs can be evaluated in two ways. One is to determine whether IRTs can accurately measure body temperature within a specific temperature range, referred to in this paper as temperature measurement accuracy. The other is to determine whether IRTs can separate subjects with EBT from those without EBT, referred to in this paper as diagnostic performance.

We evaluated the temperature measurement accuracy of IRTs using several approaches. Because no standard covers clinical study data analysis for IRTs, standards for thermometers were used to inform our methodology. The standards ISO 80601-2-56:2017 [34] and ASTM E1965-98:2016 [33] define three key metrics: clinical bias (Δcb), the standard deviation (SD) of Δcb (σΔcb), and clinical repeatability (σr). Δcb is the mean difference between Toral and Tref over all subjects in the testing set, and it indicates the systematic error of the device under test. Measurement precision was evaluated using σΔcb, which is based on the SD of the differences between Toral and Tref. A value equal to 2 × σΔcb is often called the limit of agreement (LA), as it shows the magnitude of potential disagreement between the outputs of two devices used on the same human subject. Difference plots were used to illustrate Δcb and σΔcb.
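
As an illustration of the bias and precision metrics described above, the following is a minimal sketch rather than the authors' analysis code. The function name `bias_and_precision` and the temperature values are hypothetical, and the use of the sample SD (`ddof=1`) is an assumption, since the text does not say which SD estimator was applied. Clinical repeatability (σr) is omitted because it requires repeated measurements per subject.

```python
import numpy as np

def bias_and_precision(t_oral, t_ref):
    """Return (clinical bias, SD of the differences, limit of agreement).

    t_oral: IRT-derived temperatures, one per subject (deg C)
    t_ref:  reference temperatures for the same subjects (deg C)
    """
    diff = np.asarray(t_oral, dtype=float) - np.asarray(t_ref, dtype=float)
    bias = float(diff.mean())        # mean of (Toral - Tref)
    sd = float(diff.std(ddof=1))     # assumption: sample SD (n - 1 denominator)
    return bias, sd, 2.0 * sd        # LA taken as 2 x SD, as in the text

# Hypothetical paired readings for four subjects
t_oral = [36.9, 37.4, 38.1, 36.5]
t_ref = [36.8, 37.6, 38.0, 36.7]
bias, sd, la = bias_and_precision(t_oral, t_ref)
print(f"bias={bias:.3f} C, SD={sd:.3f} C, LA={la:.3f} C")
```

In a difference plot, `bias` would typically be drawn as a horizontal line, with `bias - la` and `bias + la` as the surrounding band.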

The root-mean-square (RMS) difference between Toral and Tref, $A_{rms} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(T_{oral} - T_{ref})^2}$, where n is the number of subjects, is another metric used to assess clinical measurement accuracy in medical devices [51]. While Arms will not indicate the direction of error (e.g., overestimate or underestimate) or the error distribution, it does quantify the cumulative magnitude of error. We implement it here to provide a single accuracy metric that combines the impact of bias and precision, as well as to ensure that positive and negative local bias values do not cancel out to give an erroneous impression of strong performance, as can occur with Δcb.
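
The cancellation effect mentioned above can be shown with a short sketch. The function name `rms_difference` and the data are hypothetical, chosen so that positive and negative errors of equal size average to zero while the RMS difference stays at the size of each individual error.

```python
import numpy as np

def rms_difference(t_oral, t_ref):
    """Root-mean-square difference between paired temperature readings."""
    diff = np.asarray(t_oral, dtype=float) - np.asarray(t_ref, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical data: errors of +0.3 C and -0.3 C alternate across subjects
t_ref = np.array([37.0, 37.0, 37.0, 37.0])
t_oral = t_ref + np.array([0.3, -0.3, 0.3, -0.3])

mean_bias = float(np.mean(t_oral - t_ref))   # errors cancel in the mean
a_rms = rms_difference(t_oral, t_ref)        # errors do not cancel here
print(f"mean bias={mean_bias:.3f} C, RMS difference={a_rms:.3f} C")
```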

Regression analysis [50] can also provide useful insight into the quality of temperature measurements. We generated scatter plots of Toral against Tref and fit linear trendlines to the data; these trendlines were then compared with the ideal line (i.e., Toral = Tref). Pearson correlation coefficients (r values) were also obtained to quantify the degree of linear correlation between Toral and Tref.
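
A linear trendline and Pearson r of this kind could be obtained as sketched below. This is an assumed implementation using NumPy's `polyfit` and `corrcoef`; the function name `trendline_and_r` and the data are hypothetical, and the source does not state which software was used for the fits.

```python
import numpy as np

def trendline_and_r(t_ref, t_oral):
    """Fit Toral = slope * Tref + intercept and return (slope, intercept, r)."""
    x = np.asarray(t_ref, dtype=float)
    y = np.asarray(t_oral, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # least-squares straight line
    r = np.corrcoef(x, y)[0, 1]              # Pearson correlation coefficient
    return float(slope), float(intercept), float(r)

# Hypothetical data lying exactly on Toral = 0.8 * Tref + 7.4
t_ref = [36.5, 37.0, 37.5, 38.0, 38.5]
t_oral = [36.6, 37.0, 37.4, 37.8, 38.2]
slope, intercept, r = trendline_and_r(t_ref, t_oral)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.3f}")
```

The ideal line corresponds to a slope of 1 and an intercept of 0. In this made-up example r is 1 even though the slope is below 1, which shows why the trendline and the correlation coefficient are reported together.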

In addition to methods focused on temperature measurement accuracy, we also implemented diagnostic performance assessment techniques to evaluate the fever screening effectiveness of each IRT. These analyses involved calculation of sensitivity (true positive rate, Se = TP/P, where TP and P are the numbers of true positive and condition positive subjects, respectively) and specificity (true negative rate, Sp = TN/N, where TN and N are the numbers of true negative and condition negative subjects, respectively). The focus of this approach is to determine whether febrile subjects can be detected given a specific reference temperature threshold (Tthresh). The value of Tthresh was set to 37.5 °C to define P (Tref > Tthresh) and N (Tref < Tthresh) for fever screening [2,27]. We also defined a cutoff temperature (Tcut) to determine positive or negative results based on Toral. Based on P, N, predicted P (Toral > Tcut) and predicted N (Toral < Tcut) for all subjects, TP (Toral > Tcut and Tref > Tthresh) and TN (Toral < Tcut and Tref < Tthresh) were obtained to calculate Se and Sp. At each Tcut, a pair of Se/Sp values was determined. An ROC curve for each facial temperature location was generated from 1000 Tcut values equally spaced between 30 °C and 40 °C. The area under the ROC curve (AUC), a combined measure of Se and Sp, was calculated to provide an aggregate measure of performance, where the maximum AUC of 1 indicates perfect diagnostic performance in differentiating diseased from non-diseased subjects [52,53]. The value of $\sqrt{(1 - Se)^2 + (1 - Sp)^2}$, denoted dSeSp, is the distance between the point (1 − Sp, Se) and the point (0, 1), which corresponds to the perfect 1 − Sp and Se values [52]. The smaller the dSeSp value, the better the performance. The value of dSeSp at Tcut = Tthresh = 37.5 °C was used to evaluate the fever screening performance.
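
The screening metrics above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and data are hypothetical, readings exactly equal to a threshold are counted as positive (an assumption, since the text does not state how ties are handled), and the AUC is computed with the trapezoidal rule, which the source does not specify.

```python
import numpy as np

T_THRESH = 37.5  # reference fever threshold from the text (deg C)

def se_sp(t_oral, t_ref, t_cut, t_thresh=T_THRESH):
    """Sensitivity and specificity at a single cutoff temperature."""
    t_oral = np.asarray(t_oral, dtype=float)
    t_ref = np.asarray(t_ref, dtype=float)
    cond_pos = t_ref >= t_thresh   # assumption: equality counts as positive
    pred_pos = t_oral >= t_cut     # assumption: equality counts as positive
    tp = np.sum(pred_pos & cond_pos)
    tn = np.sum(~pred_pos & ~cond_pos)
    return float(tp / cond_pos.sum()), float(tn / (~cond_pos).sum())

def roc_auc(t_oral, t_ref, n_cut=1000, lo=30.0, hi=40.0):
    """AUC from n_cut equally spaced cutoffs (trapezoidal rule, an assumption)."""
    cuts = np.linspace(lo, hi, n_cut)
    pairs = np.array([se_sp(t_oral, t_ref, c) for c in cuts])
    # Se and 1 - Sp both fall as the cutoff rises; reverse so they ascend
    tpr = pairs[::-1, 0]
    fpr = 1.0 - pairs[::-1, 1]
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def d_se_sp(se, sp):
    """Distance from (1 - Sp, Se) to the ideal ROC point (0, 1)."""
    return float(np.hypot(1.0 - se, 1.0 - sp))

# Hypothetical data for eight subjects (four febrile by the reference)
t_ref = [36.6, 36.9, 37.2, 37.4, 37.6, 37.9, 38.3, 38.8]
t_oral = [36.7, 37.0, 37.6, 37.3, 37.4, 38.0, 38.1, 38.6]

se, sp = se_sp(t_oral, t_ref, t_cut=T_THRESH)
auc = roc_auc(t_oral, t_ref)
d = d_se_sp(se, sp)
print(f"Se={se:.2f}, Sp={sp:.2f}, AUC={auc:.4f}, dSeSp={d:.3f}")
```

Both classes must be present in the data, otherwise the divisions by P or N are undefined.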
