The results of the QBA were calculated for each adjective by reading off the length (mm) on the visual analogue scale with a ruler. Subsequently, the scores in mm were divided by the total length of the visual analogue scale. Thus, the dataset contained one percentage score for each adjective and each farm visit to every farm (e.g., farm 1, farm visit 1: ‘happy’ 52%; range for each adjective: 0–100%). The sum of the 20 adjectives in one farm visit did not add up to 100% because the adjectives were not mutually exclusive, and therefore one animal could be rated on more than one adjective at the same time. The results of the ISS were expressed as a percentage of the total active behavior during each farm visit to every farm (e.g., farm 1, farm visit 1: positive social behavior: 5%, negative social behavior: 4%, use of enrichment material: 10%, investigation of the pen: 6%, other active behavior: 75%; sum: 100%). Finally, the percentage of animals within each category of the HAR and ST was calculated for each farm visit to every farm (e.g., farm 1, farm visit 1: HAR category 0: 90%; HAR category 1: 6%; HAR category 2: 4%; sum: 100%). The results of QBA, ISS, HAR, and ST represent a random sample of the population within a farm and provide an overview of farm-level dynamics. Consequently, for each of the five farm visits, the dataset contained thirteen observations, equivalent to the number of farms included in the present study.

A pairwise comparison was carried out between farm visit 1 (F1; day 0), which served as the reference, and the subsequent farm visits (F2–F5; day 3, week 7, month 5, month 10) for the adjectives of the QBA, the categories of the ISS, and each category of the HAR and ST. To this end, Spearman’s rank correlation coefficient (RS) and the intraclass correlation coefficient (ICC) were calculated as reliability parameters, whereas the smallest detectable change (SDC) and the limits of agreement (LoA) were calculated as agreement parameters. All statistical analyses were performed using the statistical software SAS® 9.4 [18]. The RS was calculated with the procedure PROC CORR. The procedure PROC GLM was applied to calculate the ICC. The SDC, which is derived from the ICC model, and the LoA were calculated using the formulas explained below. The QBA was further analyzed by means of principal component analysis (PCA). The statistics are described in detail in the following.
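To make the data layout concrete, the following sketch builds a hypothetical score matrix of this shape (13 farms by 5 farm visits for a single indicator) and extracts the paired scores for one comparison. The authors worked in SAS 9.4; this Python version with made-up values is for illustration only.

```python
import numpy as np

# Hypothetical scores (percentages as decimals, 0.0-1.0) for one indicator,
# e.g., the QBA adjective 'happy': rows = 13 farms, columns = 5 farm visits.
rng = np.random.default_rng(42)
scores = rng.random((13, 5))

f1 = scores[:, 0]  # reference farm visit F1 (day 0)
f2 = scores[:, 1]  # farm visit F2 (day 3); F3-F5 are compared the same way
```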

The RS is a non-parametric measure of rank correlation [19]. Rank correlations range from −1.00 to 1.00, with positive correlations closer to 1.00 providing greater confidence in test−retest reliability. The RS is calculated as

$$R_S = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

with $d_i$ being the difference between the ranks of each $(x_i, y_i)$ data pair and $n$ being the number of data pairs [19]. In the present study, following the guidance of Martin and Bateson [8], an RS equal to or greater than 0.40 was interpreted as acceptable reliability and an RS equal to or greater than 0.70 as good reliability.
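The authors computed the RS with SAS PROC CORR; the sketch below reproduces the same quantity in Python, once with scipy and once by hand from the formula above (the two agree in the absence of ties). The paired scores are invented for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative scores of one adjective on 13 farms at F1 (x) and F2 (y).
x = np.array([0.52, 0.40, 0.61, 0.33, 0.70, 0.45, 0.58,
              0.25, 0.49, 0.66, 0.38, 0.55, 0.43])
y = np.array([0.50, 0.44, 0.59, 0.30, 0.68, 0.47, 0.60,
              0.28, 0.51, 0.62, 0.35, 0.57, 0.40])

rs, _ = spearmanr(x, y)  # library version

# Hand calculation from the formula above (valid in the absence of ties):
rank = lambda v: np.argsort(np.argsort(v)) + 1  # ranks 1..n
d = rank(x) - rank(y)                           # rank differences d_i
n = len(x)
rs_manual = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))

print(f"RS = {rs:.2f} (scipy) vs. {rs_manual:.2f} (formula)")
# >= 0.40: acceptable reliability; >= 0.70: good reliability [8]
```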

Variance is the basis of the ICC: the ICC relates the variance between study objects (farms) to the sum of the variance between study objects and the measurement error [20]. For the analysis of variance, the following two-way model according to Shrout and Fleiss [21] and McGraw and Wong [22] was applied:

$$X_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$$

with $X_{ij}$ being the measured value, $\mu$ the general average value, $\alpha_i$ the random effect of the differences between the study objects (farms), $\beta_j$ the fixed effect of the farm visits, and $\varepsilon_{ij}$ the general error term.

The ICC was calculated according to the formula for consistency published by de Vet et al. [20]:

$$ICC_{consistency} = \frac{\sigma^2_{farms}}{\sigma^2_{farms} + \sigma^2_{residual}}$$

with $\sigma^2_{farms}$ representing the variance of the study objects (farms) and $\sigma^2_{residual}$ the residual variance.

By definition, the ICC ranges between 0.00 and 1.00, where a value of 0.00 indicates a total absence of reliability and a value of 1.00 indicates perfect reliability. Regarding interpretation, an ICC equal to or greater than 0.40 implied acceptable reliability and an ICC equal to or greater than 0.70 implied good reliability [22].
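The authors obtained the variance components with SAS PROC GLM. As a stand-in, the numpy sketch below estimates them from the ANOVA mean squares of the two-way model above and returns the consistency ICC (ICC(3,1) in the Shrout and Fleiss scheme) together with the residual variance, which is reused below for the SDC and the LoA. Function and variable names are illustrative.

```python
import numpy as np

def icc_consistency(scores):
    """Consistency ICC for a farms-by-visits score matrix (one value per
    cell), derived from the two-way ANOVA mean squares; also returns the
    residual variance sigma^2_residual."""
    n, k = scores.shape                               # n farms, k farm visits
    grand = scores.mean()
    ss_farm = k * ((scores.mean(axis=1) - grand) ** 2).sum()
    ss_visit = n * ((scores.mean(axis=0) - grand) ** 2).sum()
    ss_resid = ((scores - grand) ** 2).sum() - ss_farm - ss_visit
    ms_farm = ss_farm / (n - 1)
    var_resid = ss_resid / ((n - 1) * (k - 1))        # sigma^2 residual
    var_farm = max((ms_farm - var_resid) / k, 0.0)    # sigma^2 farms
    return var_farm / (var_farm + var_resid), var_resid

# e.g., for one pairwise F1-vs-F2 comparison (paired visits as columns):
# icc, var_resid = icc_consistency(np.column_stack([f1, f2]))
```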

The SDC is an expression of the measurement error, which comprises the residual variance. According to de Vet et al. [20], the SDC is calculated as

$$SDC = 1.96 \times \sqrt{2\sigma^2_{residual}}$$

with $\sigma^2_{residual}$ being the residual variance.

The SDC represents the smallest change in a score that can be detected beyond the measurement error. The values of the SDC are expressed in the measurement unit of the indicators under assessment; in the present study, this unit was a percentage expressed as a decimal number. In line with the simple agreement coefficient calculated by de Vet et al. [20], an SDC smaller than or equal to 0.10 indicated acceptable agreement and an SDC smaller than or equal to 0.05 good agreement.
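A minimal sketch of this step, taking the residual variance returned by the ICC function above (the example value is invented):

```python
import numpy as np

def sdc(var_resid):
    """Smallest detectable change from the residual variance (de Vet et al. [20])."""
    return 1.96 * np.sqrt(2.0 * var_resid)

# Example: a residual variance of 0.0008 on the 0-1 scale gives
# SDC ~= 0.078, i.e., about 8 percentage points -> acceptable (<= 0.10).
```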

The LoA were calculated according to the following formula, which corresponds to de Vet et al. [20]:

$$LoA = \bar{d} \pm 1.96 \times \sqrt{2\sigma^2_{residual}}$$

with $\bar{d}$ being the mean difference between two farm visits and $\sigma^2_{residual}$ representing the residual variance.

The LoA estimate the differences between two sets of measurements, in this case the differences between the measurements obtained at two farm visits, together with the standard deviation of these differences. Most of the differences are expected to lie within roughly two standard deviations of the mean difference. In this study, the LoA are expressed as relative frequencies between −1.00 and 1.00. The interpretation was again based on the simple agreement coefficient of de Vet et al. [20]: an interval within −0.10 to 0.10 was interpreted as acceptable agreement, an interval within −0.05 to 0.05 as good agreement.
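Under the two-way model above, the standard deviation of the per-farm differences between two visits estimates $\sqrt{2\sigma^2_{residual}}$, so the Bland–Altman style computation below is consistent with the residual-variance formula. The function is a sketch, not the authors' SAS code.

```python
import numpy as np

def limits_of_agreement(x, y):
    """LoA between two farm visits: mean per-farm difference +/- 1.96
    standard deviations of those differences."""
    d = np.asarray(y) - np.asarray(x)
    half_width = 1.96 * d.std(ddof=1)
    return d.mean() - half_width, d.mean() + half_width

# Intervals within [-0.10, 0.10] were read as acceptable agreement,
# within [-0.05, 0.05] as good agreement.
```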

For better understanding, the term ‘reliability’ is used throughout the manuscript when referring to the results of the reliability parameters RS and ICC, whereas the term ‘agreement’ is applied to the results of the agreement parameters SDC and LoA. The differentiation between reliability and agreement parameters and their interpretation is discussed further in the section ‘Reliability and agreement parameters’. The final evaluation, which summarizes all statistical parameters, is covered by the term ‘test−retest reliability’.

According to the definitions introduced above, acceptable test−retest reliability was obtained when RS and ICC were equal to or greater than 0.40, when the SDC was equal to or smaller than 0.10, and when the LoA were within −0.10 to 0.10. Ideally, an indicator achieved acceptable test−retest reliability when all statistical parameters reached their thresholds for acceptability. However, two exceptional cases were defined. On the one hand, test−retest reliability was rated as acceptable when the repeated farm visits were close to each other, as indicated by acceptable agreement in the agreement parameters (SDC equal to or smaller than 0.10 and LoA within −0.10 to 0.10). On the other hand, test−retest reliability was evaluated as acceptable when the farms could be distinguished from each other across the repeated farm visits, as indicated by acceptable reliability in the reliability parameters (RS and ICC equal to or greater than 0.40).
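One possible encoding of these decision rules (a sketch, not the authors' code): an indicator passes if the reliability parameters, the agreement parameters, or both reach their thresholds.

```python
def acceptable_test_retest(rs, icc, sdc_value, loa):
    """Decision rules as described above; loa is the (lower, upper) interval."""
    reliable = rs >= 0.40 and icc >= 0.40           # reliability parameters
    agrees = sdc_value <= 0.10 and loa[0] >= -0.10 and loa[1] <= 0.10
    return reliable or agrees
```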

The PCA was performed for further analysis of the QBA, as advised by Wemelsfelder et al. [23,24]. For this purpose, the procedure PROC FACTOR was applied. Following Temple et al. [10], the raw data were transformed into a correlation matrix. A separate PCA was calculated for each farm visit (F1–F5), and no rotation was applied. The first two principal components (PC1 and PC2), each with an eigenvalue greater than 1.00, were used for the comparison. Each adjective received a factor loading on PC1 and PC2, a dimensionless number between −1.00 and 1.00. Finally, the factor loadings on PC1 and on PC2 were compared between the farm visits (F1 vs. F2, F1 vs. F3, F1 vs. F4, F1 vs. F5) by means of the RS, with F1 serving as the reference as in the previous analyses. An RS equal to or greater than 0.40 was evaluated as acceptable correlation and an RS equal to or greater than 0.70 as good correlation [8]. Furthermore, the RS was used to determine the correlations between the adjectives of the QBA: a correlation matrix based on the RS was used to sort the adjectives into groups. The underlying hypothesis was that the expressive quality of behavior can be subdivided into distinct groups, which may achieve varying degrees of test−retest reliability. The PCs, calculated as explained above, were then compared for each group of adjectives.
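The following sketch mirrors this procedure under the stated assumptions: an unrotated PCA on the correlation matrix of the 20 adjectives per farm visit, followed by an RS comparison of the PC1 loadings between two visits. The authors used SAS PROC FACTOR; this numpy version with random placeholder data is illustrative only.

```python
import numpy as np
from scipy.stats import spearmanr

def loadings_first_two_pcs(scores):
    """Unrotated PCA on the correlation matrix of the adjectives; returns
    the factor loadings on PC1 and PC2 (values in [-1, 1])."""
    corr = np.corrcoef(scores, rowvar=False)       # adjectives x adjectives
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]              # largest eigenvalue first
    top_vals = eigvals[order[:2]]                  # both should exceed 1.00
    top_vecs = eigvecs[:, order[:2]]
    return top_vecs * np.sqrt(top_vals)            # loadings, shape (20, 2)

# Hypothetical QBA data: 13 farms x 20 adjectives per farm visit.
rng = np.random.default_rng(0)
f1_loadings = loadings_first_two_pcs(rng.random((13, 20)))
f2_loadings = loadings_first_two_pcs(rng.random((13, 20)))

# Compare the PC1 loadings of F1 and F2 across the 20 adjectives.
# Note: PC signs are arbitrary, so loadings may need sign alignment first.
rs_pc1, _ = spearmanr(f1_loadings[:, 0], f2_loadings[:, 0])
```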
