Statistical analysis

Agnese Aguzzoni
Michele Bassi
Emanuela Pignotti
Peter Robatscher
Francesca Scandellari
Werner Tirler
Massimo Tagliavini

After logarithmic transformation of the data, two models were tested to identify statistically significant differences among cultivation areas. The first was a linear regression model with a single fixed effect, the cultivation area. The second was a linear mixed model in which, in addition to the fixed effect (cultivation area), the sampling site was included as a random effect to account for the hierarchical structure of the data. The two models were compared by analysis of variance (ANOVA), and the outputs of the model with the lowest Akaike information criterion (AIC) were retained. The significance level was set at P = 0.05. Tukey's honestly significant difference (HSD) post hoc test was applied for multiple comparisons among cultivation areas.
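As a sketch of the two model structures described above, they can be written as follows. The notation is an assumption introduced here for illustration and is not taken from the original article: $y$ is the log-transformed value of a measured variable, $\alpha_i$ the fixed effect of cultivation area $i$, $b_j$ the random effect of sampling site $j$ (assumed to be nested within a cultivation area), $k$ the number of estimated parameters and $\hat{L}$ the maximised likelihood.

$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \sim N(0, \sigma^2)$$

$$y_{ijl} = \mu + \alpha_i + b_j + \varepsilon_{ijl}, \qquad b_j \sim N(0, \sigma_b^2), \quad \varepsilon_{ijl} \sim N(0, \sigma^2)$$

$$\mathrm{AIC} = 2k - 2\ln\hat{L}$$

Under this formulation, the mixed model adds one variance parameter ($\sigma_b^2$), and it is preferred only when the resulting gain in likelihood is large enough to lower the AIC.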

To improve the identification of sample origin based on the results of the multi-chemical approach, multivariate data analysis was performed using a supervised classification method, linear discriminant analysis (LDA). A first model was developed from the multi-chemical composition of 117 apple samples divided into four main groups according to their cultivation areas. A second model was then developed by restricting the analysis to the multi-chemical composition of South Tyrolean apples PGI (51 apple samples). For this second model, the grouping factor was the cultivation district (three groups). Prior to LDA, the data were centred and scaled. The discriminant models were validated by 'leave-one-out' cross-validation. The confusion matrix was used to calculate the sensitivity, specificity, precision, false discovery rate and balanced accuracy of each model by applying the following equations:

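$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{False discovery rate} = \frac{FP}{FP + TP}$$

$$\text{Balanced accuracy} = \frac{\text{Sensitivity} + \text{Specificity}}{2}$$

where TP stands for true positive, TN for true negative, FP for false positive, and FN for false negative [41].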
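The performance metrics named above can be illustrated with a minimal Python sketch. The original analysis was carried out in R, so this is not the authors' code. It assumes the standard definitions of each metric, a one-vs-rest treatment of each class, and a confusion matrix whose rows are true classes and whose columns are predicted classes. The function names, labels and counts are hypothetical and were chosen only for illustration.

```python
from typing import Dict, Sequence


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def one_vs_rest_metrics(
    confusion: Sequence[Sequence[int]], labels: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Per-class metrics from a square confusion matrix.

    Assumed layout: rows are true classes, columns are predicted classes.
    """
    if len(confusion) != len(labels) or any(len(row) != len(labels) for row in confusion):
        raise ValueError("confusion matrix must be square and match the labels")

    total = sum(sum(row) for row in confusion)
    metrics: Dict[str, Dict[str, float]] = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]                          # class i predicted as class i
        fn = sum(confusion[i]) - tp                   # class i predicted as another class
        fp = sum(row[i] for row in confusion) - tp    # other classes predicted as class i
        tn = total - tp - fn - fp                     # everything else
        sensitivity = ratio(tp, tp + fn)
        specificity = ratio(tn, tn + fp)
        metrics[label] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "precision": ratio(tp, tp + fp),
            "false_discovery_rate": ratio(fp, tp + fp),
            "balanced_accuracy": (sensitivity + specificity) / 2,
        }
    return metrics


# Hypothetical counts for three made-up groups (not data from the study).
example_labels = ["area_A", "area_B", "area_C"]
example_confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
example_metrics = one_vs_rest_metrics(example_confusion, example_labels)

for label in example_labels:
    print(label, {name: round(value, 3) for name, value in example_metrics[label].items()})
```

In a leave-one-out setting, the confusion matrix passed to such a function would be built from the held-out predictions, so that every sample is classified by a model that did not see it during fitting.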

The statistical analysis was performed using the computing environment R (R Core Team, Vienna, Austria, 2016).
