Statistical analyses

Alexandra Morton, Richard Routledge, Stacey Hrushowy, Molly Kibenge, Frederick Kibenge

The data files used in the following analyses are available in S3 and S4 Tables.

The relationship between the viral screening results and exposure to salmon farms was first examined using a cluster analysis on the proportions of PRV-positive test results within farmed fish (Atlantic salmon and steelhead) and the nine wild fish regions (all species combined). Additionally, logistic regression analyses were used to: (a) probe for potential underlying causes for the geographic patterns in these proportions, (b) generate leads for further investigation, and (c) check whether any apparent patterns could be attributable to other causes. The logistic regression analysis focused on levels of exposure to salmon farms, return migration challenge, and host species. Lastly, the proportions of Atlantic salmon testing positive for PRV were assessed for inter-annual variation using likelihood-based inference.

Because so little is known about the potential epidemiological interactions between farmed and wild salmon in the North Pacific, we took an exploratory approach to our analyses. In keeping with the spirit of exploratory data analysis [43], we adopted a flexible approach to the selection of statistical methods and models, and put forward our conclusions as hypotheses worthy of further attention.

To perform the cluster analysis on the regional proportions of PRV-positive results, we applied the agglomerative, hierarchical clustering method based on cluster centroids as implemented in the SAS® CLUSTER procedure, SAS software, Version 9.4. In keeping with commentary in SAS 2013, the centroid method was selected to avoid giving too much influence to the much larger proportion of PRV-positive fish in the farmed Atlantic salmon category.
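The centroid method described above can be illustrated with a minimal pure-Python sketch. It is not the SAS CLUSTER procedure and does not reproduce the analysis: the function name `centroid_linkage`, the group labels, and the proportions are hypothetical placeholders. At each step it merges the two clusters whose centroids are closest and replaces them with a size-weighted centroid. SAS reports squared distances for this method, which gives the same merge order for one-dimensional data.

```python
from itertools import combinations


def centroid_linkage(values):
    """Agglomerative clustering of 1-D values using centroid linkage.

    `values` maps a label to a number (here, a proportion).
    Returns the merge history as (members_a, members_b, centroid_distance).
    """
    # Each cluster is (member labels, centroid, size).
    clusters = [((label,), float(v), 1) for label, v in values.items()]
    history = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the closest centroids.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: abs(clusters[ij[0]][1] - clusters[ij[1]][1]),
        )
        a, b = clusters[i], clusters[j]
        size = a[2] + b[2]
        # The merged centroid is the size-weighted mean of the two centroids.
        centroid = (a[1] * a[2] + b[1] * b[2]) / size
        history.append((a[0], b[0], abs(a[1] - b[1])))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((a[0] + b[0], centroid, size))
    return history


# Hypothetical proportions, invented for illustration only.
toy_proportions = {
    "farmed": 0.95,
    "region_A": 0.40,
    "region_B": 0.44,
    "region_C": 0.05,
    "region_D": 0.12,
}
merge_history = centroid_linkage(toy_proportions)
for members_a, members_b, distance in merge_history:
    print(members_a, "+", members_b, f"(centroid distance {distance:.3f})")
```

With these invented numbers, the group with the much larger proportion is the last to join the tree, so it does not pull the other groups' centroids toward it.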

The logistic regression analysis was conducted solely on the wild fish. The factors of primary interest were: salmon farm exposure, migration challenge, and host species.

The number of categories was restricted to avoid the potential for over-parameterization. Farm exposure and migration challenge were categorized as low or high as described above. Host species were reduced to four taxonomic units among the wild fish by combining lineages that had not diverged prior to approximately 7.5 million years ago [44, 45]: chinook-coho salmon with 168 samples, chum-pink salmon with 175 samples, sockeye salmon with 220 samples, and rainbow-cutthroat trout with 38 samples.

Two other factors, life stage and year, were included in the logistic regression analysis to probe for potential confounding effects. The wild salmon life stages were divided into two categories: juveniles (192 fish) and adults (409 fish).
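As a quick consistency check on the sample sizes quoted in the text, the host-species and life-stage breakdowns can be summed and compared. The counts below are taken from the text; the variable names are illustrative, and the check assumes both breakdowns cover the same set of wild fish.

```python
# Sample counts quoted in the text for the wild fish.
species_counts = {
    "chinook-coho salmon": 168,
    "chum-pink salmon": 175,
    "sockeye salmon": 220,
    "rainbow-cutthroat trout": 38,
}
life_stage_counts = {"juvenile": 192, "adult": 409}

species_total = sum(species_counts.values())
life_stage_total = sum(life_stage_counts.values())

print(species_total, life_stage_total)  # prints: 601 601
```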

Observations used in the logistic regression analysis were limited to 2012 and 2013, the years for which farmed and wild salmon were concurrently sampled in sufficient numbers. There were insufficient data to extend formal inferences to other years: there were too few degrees of freedom, and the standard assumption of independence between years that underlies the usual random-effects models would have been compromised if, for example, fish returning at ages 4 and 5 from the same cohort had both been exposed to the same PRV source at an earlier life stage. Furthermore, Taksdal [46] highlights the potential both for differences in virulence between virus subtypes and for relatively abrupt changes in the viral subtypes present, which could produce sudden jumps in the proportions of positive tests. Both of these events would reduce the comparability of years in which only farmed Atlantic salmon or only wild salmon were collected. Such complex behavior calls for more elaborate modelling. Hence, inferences were limited to 2012–2013, year effects were treated as fixed, and Oweekeno Lake data were not included in the analysis.

Furthermore, there were sufficient numbers of observations to assess the main effects of each of the factors, but not necessarily for interactions between them (see S1 File for further explanation).

Finally, a random effect associated with the within-cluster correlation of fish obtained from the same location and year was included to account for potential dependency in PRV presence among fish sampled from the same effective host population. This term additionally compensates for cross-contamination within a sampling event, as this would have had a comparable impact to the contagious spread of virus within a school of fish before they were caught.
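For orientation, a generic random-intercept logistic model of the kind described here can be written as follows. This is only a sketch under assumptions: the symbols and the main-effects-only form are ours, and the authors' actual specification may differ.

```latex
% i indexes fish, j indexes location-year clusters (notation assumed here).
\begin{aligned}
\operatorname{logit}\,\Pr(Y_{ij} = 1)
  &= \beta_0
   + \beta_{\text{farm}}\,F_{ij}
   + \beta_{\text{mig}}\,M_{ij}
   + \beta_{\text{sp}(ij)}
   + \beta_{\text{stage}}\,L_{ij}
   + \beta_{\text{year}}\,T_{ij}
   + u_j, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2).
\end{aligned}
```

Here $Y_{ij}$ indicates a PRV-positive test, $F_{ij}$, $M_{ij}$, $L_{ij}$ and $T_{ij}$ are indicators for high farm exposure, high migration challenge, adult life stage and year, $\beta_{\text{sp}(ij)}$ is the effect of the fish's host-species group, and $u_j$ is the shared random effect that induces within-cluster correlation.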

A more formal description of the statistical model is provided in S1 Text.

We used a stepwise approach to our logistic regression (starting with a full model) to screen for potentially influential factors. To reduce the likelihood of deleting potentially important variables in this exploratory analysis, we planned to remove, at each deletion step, the variable with the highest p-value from the model only if its p-value exceeded 0.10. Competing methods based on AIC and other similar measures of goodness of fit were complicated by occasional cases of missing information on some variables. Hence, the stepwise approach was more appropriate, and generated a preferred model after only two deletion steps. All mixed-effects logistic regression inferences were performed using the SAS GLIMMIX procedure as implemented in SAS software, Version 9.4.
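The deletion rule described above (drop the variable with the highest p-value, but only if that p-value exceeds 0.10) can be sketched as a short loop. This is an illustration, not the authors' SAS code: `backward_eliminate`, `fit_pvalues`, the placeholder variable names, and the p-values are all hypothetical, and in practice `fit_pvalues` would refit the mixed-effects model and return one test per factor.

```python
def backward_eliminate(variables, fit_pvalues, threshold=0.10):
    """Backward stepwise deletion starting from the full model.

    `fit_pvalues(kept)` is a caller-supplied (hypothetical) function that
    refits the model with the variables in `kept` and returns a dict
    mapping each variable to its p-value.
    """
    kept = list(variables)
    dropped = []
    while kept:
        pvalues = fit_pvalues(kept)  # refit with the current variables
        worst = max(kept, key=lambda v: pvalues[v])
        if pvalues[worst] <= threshold:
            break  # no remaining variable exceeds the threshold
        kept.remove(worst)
        dropped.append(worst)
    return kept, dropped


# Invented p-values for placeholder variables, for illustration only.
# A real refit would generally change the p-values at every step.
TOY_P = {"var_a": 0.01, "var_b": 0.03, "var_c": 0.04, "var_d": 0.45, "var_e": 0.22}


def toy_fit(current):
    return {v: TOY_P[v] for v in current}


kept_vars, dropped_vars = backward_eliminate(list(TOY_P), toy_fit)
print("kept:", kept_vars)
print("dropped (in order):", dropped_vars)
```

Setting the threshold at 0.10 rather than the conventional 0.05 makes the loop stop earlier, which is the conservative behaviour the text aims for in an exploratory analysis.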

We also formally compared the proportions of PRV-positive tests for the farmed Atlantic salmon between 2012 and 2013. Because multiple fish were purchased from the same outlet on the same day, we needed to account for potential dependence within such clusters of sampled fish generated by factors such as a common farm of origin and cross-contamination in processing and handling during harvest. We did so by incorporating a random effect term similar to that used in the logistic regression model. Details are provided in S2 Text.
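To show the general idea of a likelihood-based comparison of two proportions, the sketch below computes a likelihood-ratio test for two independent binomial samples. It is a deliberate simplification: it omits the random effect for clusters of fish that the analysis above includes, so it would overstate the evidence when fish within a cluster are correlated. The function names and the counts are hypothetical.

```python
import math


def binom_loglik(k, n, p):
    """Binomial log-likelihood, omitting the constant binomial coefficient."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def lr_test_two_proportions(k1, n1, k2, n2):
    """Likelihood-ratio test of equal proportions in two independent samples.

    Returns (G, p_value), where G = 2 * (loglik_alt - loglik_null) is
    compared with a chi-square distribution on 1 degree of freedom.
    """
    p1, p2 = k1 / n1, k2 / n2
    p0 = (k1 + k2) / (n1 + n2)  # pooled estimate under the null hypothesis
    ll_alt = binom_loglik(k1, n1, p1) + binom_loglik(k2, n2, p2)
    ll_null = binom_loglik(k1, n1, p0) + binom_loglik(k2, n2, p0)
    g = max(2.0 * (ll_alt - ll_null), 0.0)  # guard against rounding below 0
    # Survival function of chi-square with 1 df: P(X > g) = erfc(sqrt(g / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Hypothetical counts (positives, fish tested) for two years.
g_stat, p_val = lr_test_two_proportions(45, 50, 20, 50)
print(f"G = {g_stat:.2f}, p = {p_val:.2g}")
```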
