The oral hygiene analysis is performed with deep learning-based image processing algorithms: object detection, which determines whether the input image is an oral image and localizes the oral region, and instance segmentation, which extracts the dental plaque regions. The analysis uses datasets categorized into oral and nonoral images: 2000 oral images captured with the device and stored on the server, and 2000 nonoral Pascal Visual Object Classes (VOC) images [23] containing no teeth or gums. Figure 4 illustrates a flowchart of the oral hygiene analysis.
Figure 4. Flowchart of the oral hygiene analysis.
For object detection, ground-truth bounding box annotations are first created for each image, outlining a region of interest (RoI) that contains only the teeth and gums. The single shot multibox detector (SSD) [24], one of the most popular deep learning models for object detection, is then applied to predict a bounding box for the RoI within each image. The datasets are divided into a 2000-image training set (1000 oral and 1000 nonoral images) and a 2000-image test set (1000 oral and 1000 nonoral images), and the SSD model is trained on the training set. To improve learning performance, the training set is augmented with random sample cropping, photometric distortion (random transformations in the HSV color space), rotation, and mirroring. All images are then resized to 300×300 pixels, and their RGB values are normalized to the range 0 to 1.
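As a concrete illustration, the following sketch sets up a two-class SSD (background plus oral RoI) and runs one training step. It uses the torchvision SSD300/VGG16 implementation as a stand-in for the model described here; the initialization, box coordinates, and label values are all hypothetical.

```python
import torch
from torchvision.models.detection import ssd300_vgg16

# Two-class SSD: class 0 = background, class 1 = oral RoI (teeth and gums).
# weights=None / weights_backbone=None keep the sketch self-contained;
# the actual initialization used in the study is not specified here.
model = ssd300_vgg16(weights=None, weights_backbone=None, num_classes=2)

# SSD300 expects 300x300 inputs with RGB values normalized to [0, 1].
images = [torch.rand(3, 300, 300)]
targets = [{
    "boxes": torch.tensor([[60.0, 90.0, 240.0, 210.0]]),  # illustrative RoI box (x1, y1, x2, y2)
    "labels": torch.tensor([1]),                           # 1 = oral region
}]

model.train()
loss_dict = model(images, targets)   # classification + localization losses
loss = sum(loss_dict.values())
loss.backward()
```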
To precisely extract the red fluorescence-emitting dental plaque regions from the oral image, an instance segmentation technique is required that can detect and delineate multiple instances of a single class.
In this study, the Mask region-based convolutional neural network (R-CNN) [25] is used for instance segmentation; it extends Faster R-CNN [26] with a small fully convolutional network that predicts a segmentation mask for each RoI in parallel with object detection. The RoI images extracted by the SSD model are used as the input data for Mask R-CNN training.
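A minimal sketch of this setup, assuming the torchvision Mask R-CNN implementation: the COCO-pretrained detection and mask heads are replaced with two-class heads (background plus plaque). The class layout and hidden size are illustrative, not taken from the study.

```python
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Start from COCO-pretrained weights, as described in the training setup.
model = maskrcnn_resnet50_fpn(weights="DEFAULT")

num_classes = 2  # background + plaque (assumed class layout)

# Replace the box head for the two-class problem.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask head (the small fully convolutional network) likewise.
in_channels_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_channels_mask, 256, num_classes)
```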
Pixel-level annotation for Mask R-CNN training was performed on plaque areas emitting red fluorescence in the tooth images according to three criteria:
1. Connected-component labeling, which groups pixels into components based on pixel connectivity, is performed on the plaque areas (see the sketch after this list).
2. Plaque areas that span several teeth are divided at the tooth boundaries.
3. Plaque thinly distributed between teeth is classified as a single instance, as an exception to the second criterion.
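A minimal sketch of connected-component labeling as used in the first criterion, run on a hypothetical binary plaque mask; SciPy is used here for illustration, as no specific implementation is named above.

```python
import numpy as np
from scipy import ndimage

# Hypothetical 8x8 binary plaque mask with two disconnected blobs.
plaque_mask = np.zeros((8, 8), dtype=np.uint8)
plaque_mask[1:3, 1:4] = 1   # one plaque blob
plaque_mask[5:7, 5:8] = 1   # a second, disconnected blob

# label() groups pixels into components based on pixel connectivity;
# the default structuring element uses 4-connectivity in 2D.
labels, num_components = ndimage.label(plaque_mask)
print(num_components)  # -> 2 candidate plaque instances
```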
The annotated images are converted to the Common Objects in Context (COCO) data format [27] and augmented with rotation and aspect-ratio conversion.
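For reference, a single plaque instance in COCO format looks roughly like the following; the field names follow the COCO specification, while all values (IDs, polygon coordinates, box) are hypothetical.

```python
# One COCO-format annotation entry for a single plaque instance.
annotation = {
    "id": 1,                    # annotation ID (hypothetical)
    "image_id": 42,             # ID of the RoI image (hypothetical)
    "category_id": 1,           # e.g., 1 = "plaque"
    "segmentation": [           # polygon outlining the instance mask
        [120.0, 80.0, 135.0, 82.0, 133.0, 95.0, 118.0, 93.0]
    ],
    "bbox": [118.0, 80.0, 17.0, 15.0],  # [x, y, width, height]
    "area": 230.0,              # mask area in pixels
    "iscrowd": 0,               # 0 = individual instance
}
```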
The Mask R-CNN model is initialized with parameters pretrained on the COCO dataset, and the model is trained with stochastic gradient descent on its loss function.
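A sketch of that training step, assuming the `model` from the earlier Mask R-CNN sketch and a `data_loader` yielding images with COCO-style targets; the hyperparameters are illustrative, as they are not reported here.

```python
import torch

# Illustrative SGD hyperparameters (not taken from the study).
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4
)

model.train()
for images, targets in data_loader:
    # Mask R-CNN returns a dict of RPN, box, and mask losses in training mode.
    loss_dict = model(images, targets)
    loss = sum(loss_dict.values())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```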
This research protocol was approved by the institutional review board (IRB #ERI19046) of Seoul National University Dental Hospital. To protect users' privacy, no personally identifiable information such as name, age, or gender is included in the images, and all visible parts of each image except the mouth and the device are mosaicked to make them indistinguishable.