It is well known that training a deep learning model requires a huge number of samples: hundreds of thousands, or even millions. In addition, training a model from scratch is tremendously expensive, both in terms of required computational power and in terms of processing time. Luckily, the availability of a pre-trained model allows the so-called "fine-tuning" process, in which a model trained on some huge dataset (millions of samples) is specialized on different data; this further training requires far fewer resources.

In our case, we started from a ResNet101 (a convolutional neural network 101 layers deep) pre-trained on MS COCO (Microsoft Common Objects in Context), a dataset containing hundreds of thousands of images belonging to 80 different classes [50]. In other words, a network trained to detect objects belonging to the 80 MS COCO classes was retrained to specialize on the grape class. The availability of pre-trained MS COCO weights makes it easier to start the training, since those weights can be used as a starting point for training a variation of the network. We used Google Colab, a cloud service that provides free access to computing resources, including GPUs. The experiments were executed on a virtual machine with 12 GB of RAM, an Nvidia Tesla P100 GPU (16 GB), and 68 GB of disk space. We performed fine-tuning (Goodfellow et al. [21]) using the GrapeCS-ML dataset images. The dataset was divided into training (set 1, containing more than 1000 images), validation (set 2, containing more than 500 images), and test (sets 3, 4, and 5, together containing nearly 500 images) parts; see Table 2 for further details.

Table 2. Number of images contained in the GrapeCS-ML dataset and in the internal dataset.

The internal dataset collected at the University of Sassari contains 451 images taken all around Sardinia, showing clusters of the main cultivars grown on the island. Specifically, of the 451 photos, almost 200 depict the Cannonau and Vermentino cultivars. Each photo represents a different biotype or clone obtained following two important experimental works: mass and clonal selection for cv. Cannonau [51] and varietal comparison for cv. Vermentino [52].

The other photos were collected mainly in the collection vineyards of the University of Sassari, where all the regional varieties registered in the national register of Italian vine varieties are grown [53].

Regarding the presence of different varieties, we point out the main difference with respect to similar works. Introducing several different varieties probably contributes to generalization, but this contribution is difficult to evaluate if examples of all the varieties appear in the training, validation, and test sets at the same time. Our approach is quite different: there are notable differences, in terms of varieties, between the training, validation, and test splits of the Australian GrapeCS-ML dataset, and, above all, a second test dataset was created with yet further varieties (the Italian internal dataset).

Image dimensions in the first four sets of the GrapeCS-ML dataset are almost always 480 × 640 or 640 × 480. Conversely, image dimensions in set 5 of the GrapeCS-ML dataset and in the internal dataset vary widely, from 480 × 640 up to 3024 × 4032, 4608 × 3456, and more (see Table 3). Since both of these sets are used to test the system, consistent results would demonstrate robustness even to considerable variations in size. To be processed by the Mask R-CNN framework, all images are automatically resized to 1024 × 1024 pixels. The aspect ratio is preserved, so a non-square image is padded with zeros.
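The resize-and-pad step can be sketched in a few lines of NumPy. This is a simplified illustration (the function name is ours, and it uses nearest-neighbour resampling for brevity; a real pipeline would use bilinear interpolation), but it shows how the aspect ratio is preserved and the remainder zero-padded.

```python
import numpy as np

def resize_pad_square(img, target=1024):
    """Resize the longer side to `target`, preserving aspect ratio, then zero-pad to a square."""
    h, w = img.shape[:2]
    scale = target / max(h, w)
    new_h, new_w = round(h * scale), round(w * scale)
    # Nearest-neighbour resize via index mapping (illustrative; use bilinear in practice).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows][:, cols]
    # Centre the resized image on a zero (black) square canvas.
    out = np.zeros((target, target) + img.shape[2:], dtype=img.dtype)
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out
```

For example, a 480 × 640 image is scaled to 768 × 1024 and then padded with a 128-pixel black band above and below.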

Table 3. Number of images (in brackets) for each image size in the GrapeCS-ML dataset and in the internal dataset.

To expand the size of the training part of the dataset, we used a technique called "data augmentation", through which many modified versions of the images are created by horizontal flipping, translation, and the addition of artificial blur and contrast changes (a few augmentation examples are shown in Figure 6).

Figure 6. Examples of training dataset augmentation: (a) original image; (b) horizontal flipping; (c) image blurring.
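The augmentations listed above can be sketched in plain NumPy. The function names are ours and the implementations are deliberately minimal (a real pipeline would use a dedicated augmentation library); they illustrate flipping, translation, blurring, and contrast adjustment on a `uint8` image array.

```python
import numpy as np

def hflip(img):
    # Mirror left-right; any masks/bounding boxes must be flipped consistently.
    return img[:, ::-1]

def translate(img, dx, dy):
    # Shift the image by (dx, dy) pixels, filling uncovered pixels with zeros.
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def box_blur(img):
    # Naive 3x3 box blur with edge replication (stands in for a Gaussian blur).
    h, w = img.shape[:2]
    p = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge").astype(np.float32)
    acc = sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return (acc / 9).astype(np.uint8)

def adjust_contrast(img, factor):
    # Scale each pixel's deviation from the global mean, then clip to [0, 255].
    mean = img.mean()
    return np.clip(mean + factor * (img.astype(np.float32) - mean), 0, 255).astype(np.uint8)
```

Applying a random subset of such transforms to each training image at every epoch yields the many modified versions described in the text.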

This technique considerably extends the number of samples presented to the network during the training phase and, accordingly, increases its detection and generalization capabilities. Moreover, variations in blurring, color, and brightness are a major problem in the field of computer vision. While other works try to limit those variations as much as possible, we instead tried to include as many variations as possible in our training through dataset augmentation, so that the system "learns" to detect a grape bunch under as many different conditions as possible. It is worth noting that we used only set 1 for training because of its high numerosity: more than 1000 images, which is half of the entire GrapeCS-ML dataset. Training the network on a single variety could quickly lead to overfitting, but this risk is balanced by the use of data augmentation and by the high number of varieties present in the validation set.
