To complement the DRoSB database and to compare the performance of Mask-RCNN with that of U-Net networks, we added three databases for the training of Mask-RCNN: the cBaD 2017 database provided for the ICDAR 2017 competition [4], DIVA-HisDB, and HOME-Alcar. These databases are diverse, draw on a large number of historical texts, and offer a wide variety of annotation styles.
Boillet et al. [17] performed an in-depth comparative study of U-Net on a large number of databases. Their results made it clear that the cBaD database of ICDAR 2017, and especially its READ-Complex subset, is the most discriminative for the segmentation of historical texts. This justifies our choice of this dataset, even though its line segmentation annotations are very imprecise, as can be seen in Figure 5. The lines are delimited by rather loose bounding boxes that cut off the ascenders and descenders of letters. Such annotations are sufficient for detecting the baseline, but they are unsuitable for extracting lines of text for transcription.
Therefore, we also chose DIVA-HisDB, which provides more complex polygons that follow the text line closely, although not as tightly as possible. To match the DIVA annotation style, we applied our LMP algorithm, described in Section 3.3.3, to the Mask-RCNN output masks. This gave us masks as close as possible to the text lines recognized by Mask-RCNN. We then applied a dilation to these masks (with a kernel of 2 × 8 and 4 iterations) to obtain masks slightly larger than the text lines.
Finally, we included the HOME-Alcar database because it lies somewhere between the cBaD database (it contains cBaD-style annotations) and what we would expect in terms of masks (polygons that enclose all the information of the line). For this dataset, we evaluated Mask-RCNN on the raw output masks.
Figure 5. Example of poor mask precision in the annotations of the public cBaD 2017 database. The green box represents the ground-truth mask.
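The dilation applied to the DIVA-HisDB masks (kernel of 2 × 8, 4 iterations) can be illustrated with a minimal pure-Python sketch. This is not the implementation used in the study: the function name `dilate`, the reading of "2 × 8" as 2 rows by 8 columns, and the placement of the kernel anchor at its center are all assumptions made for illustration. In practice, an image library routine such as OpenCV's `cv2.dilate` would normally be used.

```python
def dilate(mask, kernel_h=2, kernel_w=8, iterations=4):
    """Binary dilation of a 2D mask (list of rows of 0/1) with a full rectangular kernel.

    Assumptions (not stated in the text): the kernel is kernel_h rows by
    kernel_w columns and its anchor sits at (kernel_h // 2, kernel_w // 2).
    """
    h, w = len(mask), len(mask[0])
    ay, ax = kernel_h // 2, kernel_w // 2
    for _ in range(iterations):
        out = [[0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue
                # A foreground pixel switches on every output pixel
                # whose kernel window covers it.
                for i in range(kernel_h):
                    for j in range(kernel_w):
                        yy, xx = y - (i - ay), x - (j - ax)
                        if 0 <= yy < h and 0 <= xx < w:
                            out[yy][xx] = 1
        mask = out
    return mask


# Toy example: a single foreground pixel on a 20 x 50 canvas.
mask = [[0] * 50 for _ in range(20)]
mask[10][20] = 1
dilated = dilate(mask)
grown_area = sum(map(sum, dilated))  # number of foreground pixels after dilation
```

Because the kernel is wider than it is tall, each iteration grows the mask more along the writing direction than across it, which gives a mask slightly larger than the text line without merging it too readily with the lines above and below.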
On the basis of Boillet et al.'s results [17], we selected three state-of-the-art U-Net architectures, dhSegment [20], ARU-Net [21], and Doc-UFCN [22], as baselines against which to compare the performance of Mask-RCNN. In the present study, we used the performance values reported by these authors in Tables 4 and 5 of their article. Note, however, that we replicated these trainings in a pilot study and found approximately the same values and, most importantly, the same ranking of the networks. In our comparisons, we also included the Dilated-FCN network tested in [33] for line segmentation on cBaD and DIVA-HisDB. Unfortunately, the authors of [33] reported only a pixel-level metric, the F1-score (see the next section). The comparison of this U-Net network with Mask-RCNN is therefore more limited.
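For readers unfamiliar with the pixel-level F1-score, the following minimal sketch computes it from a predicted and a ground-truth binary mask using the standard definitions of precision and recall. The function name `pixel_f1` and the convention of returning 0.0 when there are no true positives are assumptions made for illustration; the evaluation code used in [33] or in this study may differ.

```python
def pixel_f1(pred, truth):
    """Pixel-level F1-score between two binary masks of identical shape."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1  # text pixel correctly predicted
            elif p:
                fp += 1  # predicted as text, background in the ground truth
            elif t:
                fn += 1  # text pixel missed by the prediction
    if tp == 0:
        return 0.0  # assumed convention when nothing is correctly predicted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example: 3 true positives, 1 false positive, 1 false negative.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0]]
score = pixel_f1(pred, truth)
```

Since this metric counts pixels rather than lines, it does not indicate whether individual text lines were separated correctly, which is why a comparison based on it alone is more limited.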