For the training subset, we applied online augmentation with light spatial shearing (, ), horizontal flipping (), in-plane rotation (, ), and scaling (, ) [82]. Any spatial transformation of the ground truth data was performed using projections of the numerical coordinate representations to prevent information loss by discrete interpolation. Before network propagation, the image intensities were normalized to the value range of using min–max normalization given by . Since the original images have different spatial resolutions and aspect ratios, they were brought to common dimensions of by bi-cubic down-sampling. The original aspect ratio was maintained, and the images were zero-padded to the target resolution.
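The preprocessing steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The target range [0, 1] for the min–max normalization, the top-left placement of the padded image, and all function names are assumptions made for illustration. The bi-cubic resampling itself is omitted, and only the aspect-preserving geometry is shown.

```python
import numpy as np


def min_max_normalize(image, eps=1e-8):
    """Linearly rescale intensities so that min -> 0 and max -> 1.

    The [0, 1] target range is an assumption of this sketch.
    """
    image = np.asarray(image, dtype=np.float64)
    lo, hi = image.min(), image.max()
    # eps guards against division by zero for constant images.
    return (image - lo) / max(hi - lo, eps)


def letterbox_geometry(height, width, target_h, target_w):
    """Return the scale factor and resized size that fit the image into
    the target dimensions while keeping the original aspect ratio."""
    scale = min(target_h / height, target_w / width)
    new_h = int(round(height * scale))
    new_w = int(round(width * scale))
    return scale, new_h, new_w


def zero_pad(image, target_h, target_w):
    """Place an already resized 2D image on a zero canvas.

    Top-left placement is an assumption; centering would work equally well.
    """
    canvas = np.zeros((target_h, target_w), dtype=image.dtype)
    canvas[: image.shape[0], : image.shape[1]] = image
    return canvas


def transform_points(points_xy, matrix_2x3):
    """Apply a 2x3 affine matrix to (N, 2) x/y coordinates.

    Working on the numerical coordinates avoids rasterizing and
    interpolating the ground truth.
    """
    pts = np.asarray(points_xy, dtype=np.float64)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])
    return homogeneous @ np.asarray(matrix_2x3, dtype=np.float64).T
```

For example, a 200 × 100 image fitted into a hypothetical 64 × 64 target is scaled by 0.32 to 64 × 32 and then padded along the width. The same scale, written as the affine matrix `[[0.32, 0, 0], [0, 0.32, 0]]`, maps the keypoint coordinates.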
After augmentation, the numerical ground truth was converted to its respective spatial representation. For that purpose, the bone outline polygons were converted to binary segmentation masks at the original image resolution. The numerical keypoint coordinates were translated to spatial activation maps (heatmaps) by sampling a bivariate Gaussian function without normalization factor (i.e., at keypoint coordinate) and a standard deviation of . Similarly, the line heatmaps were generated by evaluating the point-to-line distance of the line-surrounding points in a locally bounded hull. The line-orthogonal width of this hull was defined as . The point and line heatmaps were truncated at , and the remaining values were set to zero.
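The heatmap generation can be sketched as follows, again as an illustration under stated assumptions, not the authors' code. The function names, the Gaussian falloff for the line heatmap, and the reading of the truncation as "values below a threshold are set to zero" are assumptions. The locally bounded hull is not built explicitly here but results implicitly from the truncation. The sketch also assumes that the two line endpoints are distinct.

```python
import numpy as np


def point_heatmap(height, width, center_xy, sigma, threshold):
    """Sample an unnormalized bivariate Gaussian centered on a keypoint."""
    ys, xs = np.mgrid[0:height, 0:width]
    d2 = (xs - center_xy[0]) ** 2 + (ys - center_xy[1]) ** 2
    # No 1 / (2 * pi * sigma^2) factor, so the peak value is exactly 1.
    heat = np.exp(-d2 / (2.0 * sigma ** 2))
    # Assumed truncation rule: small activations are set to zero.
    heat[heat < threshold] = 0.0
    return heat


def line_heatmap(height, width, start_xy, end_xy, sigma, threshold):
    """Heatmap from the distance of each pixel to a line segment.

    The Gaussian profile over the distance is an assumption of this sketch.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    p = np.stack([xs, ys], axis=-1).astype(np.float64)
    a = np.asarray(start_xy, dtype=np.float64)
    b = np.asarray(end_xy, dtype=np.float64)
    ab = b - a
    # Project every pixel onto the segment and clamp to its endpoints.
    t = np.clip(((p - a) @ ab) / (ab @ ab), 0.0, 1.0)
    closest = a + t[..., None] * ab
    d2 = ((p - closest) ** 2).sum(axis=-1)
    heat = np.exp(-d2 / (2.0 * sigma ** 2))
    heat[heat < threshold] = 0.0
    return heat
```

Because the Gaussian is evaluated directly on the continuous keypoint and line coordinates, sub-pixel positions produced by the augmentation are preserved until this final rasterization step.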