Data pre-processing and feature extraction

Wonju Seo; You-Bin Lee; Seunghyun Lee; Sang-Man Jin; Sung-Min Park

This method is extracted from research article: BMC Med Inform Decis Mak, Nov 2019

A machine-learning approach to predict postprandial hypoglycemia

DOI: 10.1186/s12911-019-0943-4

Each CGM time series was represented as a sequence, where the i^th CGM time series is given by:

$$CGM_{i,:} = \{CGM_{i,t} \mid t = 1, \ldots, N_i\} \quad (1)$$

where N_i is the length of CGM_i,:. For each time series, missing CGM data points were interpolated by the spline method [20] only if fewer than 3 consecutive CGM data points were missing. Missing CGM data points are reported when the device fails its calibration process [32]. A CGM measurement is taken every 5 min, so CGM_i,t=n denotes the CGM data point at the (5 × n)^th minute of the i^th CGM time series. In our study, we took the CGM data points after meal announcements, and each such data point is represented as in Eq. 2:

$$CGM_{i,j,t} = CGM_{i,\,meal_{i,j}+t}, \quad t \in \{1, \ldots, W\} \quad (2)$$

where meal_i,j is the time of the j^th meal announcement of the i^th CGM time series, and W is the postprandial period.
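The gap-filling rule and the postprandial indexing can be illustrated with a minimal Python sketch. The function names, the use of `None` for a missing reading, and the index-based meal position are assumptions made for illustration. Linear interpolation is used here only as a dependency-free stand-in for the spline method cited in the text; it is not the authors' implementation.

```python
def fill_short_gaps(series, max_gap=2):
    """Fill runs of missing readings (None) that are at most max_gap long.

    max_gap=2 mirrors "fewer than 3 consecutive missing points".
    NOTE: linear interpolation is a stand-in for the spline method.
    Longer gaps, and gaps touching either end of the series, stay None.
    """
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap
        gap_len = end - start
        if gap_len <= max_gap and start > 0 and end < n:
            left, right = filled[start - 1], filled[end]
            for k in range(start, end):
                frac = (k - start + 1) / (gap_len + 1)
                filled[k] = left + frac * (right - left)
    return filled


def postprandial_window(series, meal_index, window):
    """Return series[meal_index + t] for t = 1..window (truncated at the end)."""
    last = min(meal_index + window, len(series) - 1)
    return [series[k] for k in range(meal_index + 1, last + 1)]
```

Here `meal_index` plays the role of meal_i,j expressed in 5-min sample steps, and `window` plays the role of W.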

We first analyzed the CGM trends in all selected patients’ data to identify meaningful features for postprandial hypoglycemia. One subset of patients experienced postprandial hypoglycemia when they had a small peak or no peak in CGM, probably because the meal was small or contained only a small portion of carbohydrate (Fig. 1a and b). Another group of patients experienced hypoglycemia when the CGM increased steeply and then dropped right after the peak; this reaction probably occurred when the patients ingested carbohydrates with a high glycemic index or when the pre-meal rapid-acting insulin was injected too late (Fig. 1c). Insulin injected before a preceding meal can also affect the glucose level after the current meal. In other cases, a decrease in CGM in spite of meal ingestion may have been caused by the insulin on board and was associated with future hypoglycemic episodes (Fig. 1d).

Fig. 1 Representative CGM time-series data showing different reactions of selected patients’ glucose levels after meals. Blue line: CGM time-series data points; red line and transparent red box: CGM data point <3.9 mmol/L (70 mg/dL); magenta filled circle: CGM data point at the meal; red filled circle: peak CGM data point after the meal; green filled circle: CGM data point at the time of prediction. Clinical explanations: (a) No CGM peak, which could occur because the patient ate a small amount of carbohydrates in the meal. (b) Low peak after the meal followed by a rapid fall in glucose, which could occur because the patient ate a small amount of carbohydrates in the meal. (c) Steep peak followed by a rapid fall in glucose, which could occur when the patient ate foods rich in carbohydrates with a high glycemic index or injected rapid-acting insulin later than he or she should have. (d) A rapid fall and then no peak after the meal, which could occur when the insulin injected before the previous meal is still active (insulin on board).

We used the observations above to define features for predicting hypoglycemia near mealtime. The first feature is the ‘rate of increase in glucose’ (RIG), which is the rate of glucose increase from a meal to a peak:

$$RIG_{i,j,t} = \frac{CGM_{i,j,peak_t} - CGM_{i,j,0}}{TD_{meal-to-peak}} \quad (3)$$

where CGM_i,j,peak_t is the highest CGM data point between the time of the j^th meal announcement of the i^th CGM time series and the prediction time t, CGM_i,j,0 is the CGM data point at the j^th meal announcement, and TD_{meal−to−peak} is the time difference between the meal announcement and the peak. The RIG is updated until the peak CGM data point is found after the meal announcement. If there is no peak CGM data point, the RIG is set to 0. By this definition, an RIG of zero implies that there was no increase in glucose after the meal.
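As a minimal sketch of the RIG described above, the function below takes a list whose element 0 is the CGM value at the meal announcement and whose element k is the value k samples later. Two points are assumptions rather than details stated in the text: "no peak" is interpreted as "no value up to time t exceeds the meal-time value", and the time difference is expressed in minutes using the 5-min sampling interval. The function name is hypothetical.

```python
def rate_of_increase_in_glucose(cgm_from_meal, t, sample_minutes=5):
    """RIG at prediction step t for one meal.

    cgm_from_meal[0] is the reading at the meal announcement;
    cgm_from_meal[k] is the reading k samples (k * sample_minutes) later.
    """
    segment = cgm_from_meal[: t + 1]
    # Index of the highest reading so far; ties resolve to the earliest index.
    peak_idx = max(range(len(segment)), key=lambda k: segment[k])
    if peak_idx == 0:
        # Assumption: nothing rose above the meal-time value, so "no peak".
        return 0.0
    rise = segment[peak_idx] - segment[0]
    return rise / (peak_idx * sample_minutes)
```

With readings in mmol/L, the result is in mmol/L per minute. Because the peak is re-evaluated at every t, the value keeps updating while glucose is still rising and stays fixed once the highest point has passed.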

Since the change in CGM data points is large before hypoglycemia occurs (Fig. 1), we defined the second feature, the glucose rate of change (GRC), as:

$$GRC_{i,j,t} = CGM_{i,j,t} - CGM_{i,j,t-1} \quad (4)$$

where CGM_i,j,t is the CGM data point at the time of prediction from the j^th meal announcement of the i^th CGM time series, and CGM_i,j,t−1 is the CGM data point immediately prior to the time of prediction. Since the GRC captures the near-instantaneous change in CGM data points around the time of prediction, it can be used to predict hypoglycemia [26, 33]. The third feature is the CGM data point at the time of prediction (CGM_i,j,t) from the j^th meal announcement of the i^th CGM time series. To define labels, we took into account the presence of a hypoglycemia alert value [34, 35] at the 30-min prediction horizon (i.e., CGM_i,j,t+6). If CGM_i,j,t+6 < 3.9 mmol/L (70 mg/dL), we set Label_i,j,t = 1. Otherwise, we set Label_i,j,t = 0 (Fig. 2).
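The GRC and the labeling rule can be sketched in a few lines of Python. The constant and function names are hypothetical, readings are assumed to be in mmol/L, and the GRC is taken as the plain difference between the two most recent readings, as described in the text.

```python
HYPO_THRESHOLD_MMOL_L = 3.9  # hypoglycemia alert value (70 mg/dL)
HORIZON_STEPS = 6            # 30-min horizon at one reading per 5 min


def glucose_rate_of_change(cgm_from_meal, t):
    """Difference between the reading at step t and the one just before it."""
    return cgm_from_meal[t] - cgm_from_meal[t - 1]


def hypoglycemia_label(cgm_from_meal, t):
    """1 if the reading 30 min after step t is below the alert value, else 0."""
    future = cgm_from_meal[t + HORIZON_STEPS]
    return 1 if future < HYPO_THRESHOLD_MMOL_L else 0
```

Both functions index a list whose element 0 is the reading at the meal announcement, so `t` matches the subscript t used in the text.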

Fig. 2 The three features and the 30-min prediction horizon. Blue line: CGM time-series data points; red line: CGM data point <3.9 mmol/L (70 mg/dL); magenta filled circle: CGM data point at the meal; red filled circle: peak CGM data point after the meal; green filled circle: CGM data point at the time of prediction; black arrow: rate of increase in glucose (RIG); red arrow: glucose rate of change (GRC); transparent yellow box: observational window; transparent green box: the 30-min prediction horizon.

We obtained all available CGM data points between 5 min and 3.5 h after meal announcements (i.e., from CGM_i,j,1 to CGM_i,j,42). The corresponding hypoglycemia alert values, which occur from 35 min to 4 h after meal announcements, were included (i.e., from Label_i,j,1 to Label_i,j,42). Although postprandial hypoglycemia can occur later than 4 h after a meal, we chose the window of 35 min to 4 h because extending the window beyond 4 h decreases the prediction accuracy of the algorithm. Moreover, well-established algorithms already exist for predicting fasting or nocturnal hypoglycemia [25, 36], so the clinical need for a dedicated postprandial algorithm is greatest during the first 4 h after each meal, a period that is typically difficult to cover with existing nocturnal hypoglycemia prediction algorithms, which were developed in the setting of gradual changes in blood glucose levels.
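The relationship between the sample indices and the stated time windows can be checked with a short calculation. The variable names are illustrative only.

```python
SAMPLE_MINUTES = 5   # one CGM reading every 5 min
HORIZON_STEPS = 6    # 30-min prediction horizon
STEPS = range(1, 43)  # t = 1, ..., 42

# Minutes after the meal announcement at which each feature is observed.
feature_minutes = [SAMPLE_MINUTES * t for t in STEPS]

# Minutes after the meal announcement at which each label is read.
label_minutes = [SAMPLE_MINUTES * (t + HORIZON_STEPS) for t in STEPS]
```

The features therefore span 5 to 210 min (3.5 h) and the labels span 35 to 240 min (4 h), matching the windows given in the text.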

The data processing and feature extraction were performed in the following steps. First, from the i^th CGM time series, the j^th meal announcement was selected and the CGM data points from CGM_i,j,1 to CGM_i,j,42 were sampled. Second, from the sampled series, the features CGM_i,j,t, RIG_i,j,t, and GRC_i,j,t were extracted while increasing t from 1 to 42. The label was obtained from the CGM data point at the 30-min prediction horizon (i.e., CGM_i,j,t+6).

The first and second steps were repeated for the 107 CGM time series around mealtimes, yielding the samples D = {(CGM_i,j,t, RIG_i,j,t, GRC_i,j,t, Label_i,j,t) with i = 1,...,107, j = 1,...,M_i, and t = 1,...,42}, where M_i is the total number of meal announcements in the i^th CGM time series. Before training our models, each extracted feature was normalized with a MinMax Scaler.
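The two steps and the normalization can be put together in a rough end-to-end sketch for a single meal. All names are hypothetical, the series is assumed to be gap-free, and the running-maximum interpretation of the peak is an assumption. The min-max scaling is written out by hand to keep the example dependency-free; the source does not say which implementation of the MinMax Scaler was used.

```python
def build_samples(cgm, meal_index, n_steps=42, horizon=6,
                  threshold=3.9, sample_minutes=5):
    """Return (CGM, RIG, GRC, Label) tuples for one meal announcement.

    cgm is one gap-free CGM series (mmol/L, one reading per 5 min);
    meal_index is the position of the meal announcement in that series.
    """
    samples = []
    for t in range(1, n_steps + 1):
        if meal_index + t + horizon >= len(cgm):
            break  # not enough future data to read the label
        segment = cgm[meal_index: meal_index + t + 1]
        peak_idx = max(range(len(segment)), key=lambda k: segment[k])
        if peak_idx == 0:
            rig = 0.0  # assumption: no rise above the meal-time value
        else:
            rig = (segment[peak_idx] - segment[0]) / (peak_idx * sample_minutes)
        grc = segment[t] - segment[t - 1]
        label = 1 if cgm[meal_index + t + horizon] < threshold else 0
        samples.append((segment[t], rig, grc, label))
    return samples


def min_max_scale(samples, n_features=3):
    """Scale the first n_features columns to [0, 1]; labels are left as-is."""
    lows = [min(s[k] for s in samples) for k in range(n_features)]
    highs = [max(s[k] for s in samples) for k in range(n_features)]
    scaled = []
    for s in samples:
        feats = tuple(
            0.0 if highs[k] == lows[k]
            else (s[k] - lows[k]) / (highs[k] - lows[k])
            for k in range(n_features)
        )
        scaled.append(feats + (s[-1],))
    return scaled
```

To build the full set D, `build_samples` would be called for every meal announcement in every series and the results concatenated before scaling. In a train/test setting the minimum and maximum would normally be taken from the training data only, a detail the source does not specify.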

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
