2.3. Sequence Analysis

Segundo Fuentes; Adrian J. Gibbs; Mohammad Hajizadeh; Ana Perez; Ian P. Adams; Cesar E. Fribourg; Jan Kreuze; Adrian Fox; Neil Boonham; Roger A. C. Jones



This method is extracted from research article: Viruses, Apr 2021

The Phylogeography of Potato Virus X Shows the Fingerprints of Its Human Vector

DOI: 10.3390/v13040644


Genomic sequences were edited using BioEdit [64] to extract their five gene regions: replicase, gp2 (25K), gp3 (12K), gp4 (8K) and gp5 (coat protein, CP). The sequences of each gene region were aligned, using the encoded amino acids (aa) as a guide, by the TranslatorX online server [65] (http://translatorx.co.uk; accessed on 1 June 2019) with its Multiple Alignment using Fast Fourier Transform (MAFFT) option [66]. The alignments were appended sequentially to form an alignment of concatenates with all genes in the same reading frame. A separate CP alignment was made from all of the PVX CP genes downloaded from GenBank together with the new CP genes, from which 45 near-duplicate Peruvian sequences had first been removed for computing convenience.
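The concatenation step can be illustrated with a minimal Python sketch. It is not the authors' BioEdit/TranslatorX workflow: the in-memory format (one dictionary per gene, mapping isolate name to aligned nucleotide string), the function name and the toy sequences are all assumptions made for illustration. The check that each gene alignment is a whole number of codons long is one simple way to keep every gene in the same reading frame.

```python
def concatenate_gene_alignments(gene_alignments):
    """Append aligned gene regions end to end, keeping every gene in frame.

    gene_alignments: list of dicts mapping isolate name -> aligned nt string,
    given in genome order (a hypothetical in-memory format, not BioEdit's).
    """
    if not gene_alignments:
        raise ValueError("no alignments supplied")
    isolates = sorted(gene_alignments[0])
    parts = {name: [] for name in isolates}
    for index, alignment in enumerate(gene_alignments):
        if sorted(alignment) != isolates:
            raise ValueError(f"alignment {index} has a different set of isolates")
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"alignment {index} is not rectangular")
        if lengths.pop() % 3 != 0:
            # A codon-guided alignment should be a whole number of codons long.
            raise ValueError(f"alignment {index} would break the reading frame")
        for name in isolates:
            parts[name].append(alignment[name])
    return {name: "".join(chunks) for name, chunks in parts.items()}


# Toy data (invented for illustration only).
replicase = {"isolate_1": "ATGGCA", "isolate_2": "ATG---"}
coat_protein = {"isolate_1": "TCATAA", "isolate_2": "TCGTAA"}
concats = concatenate_gene_alignments([replicase, coat_protein])
```

Because each per-gene alignment was built on codons, appending them in genome order keeps codon positions 1, 2 and 3 aligned across the whole concatenate.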

The concatenated sequences (concats) were tested for phylogenetic anomalies using the full suite of options in the Recombination Detection Program RDP4 [67] with default parameters [68,69,70,71,72,73,74,75,76,77]. Anomalies found by fewer than five methods and with a random probability greater than 10⁻⁵ were ignored. Models for Maximum Likelihood (ML) analysis were compared using MEGA7 [78]. The best-fit models were GTR + Γ₄ + I [79] for nucleotide (nt) sequences and LG + Γ₄ + I [80] for aa sequences.
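The filtering rule for recombination signals can be written as a small predicate. The sketch below is an assumption-laden illustration, not RDP4 output handling: the event records, their field names and the example values are invented. It reads the rule literally, so an anomaly is dropped only when both conditions hold; if the stricter reading was intended (drop when either condition holds), the `and` would become `or`.

```python
MIN_METHODS = 5          # threshold stated in the text
MAX_PROBABILITY = 1e-5   # threshold stated in the text


def is_ignored(n_methods, probability):
    """Literal reading: ignore when found by fewer than five methods
    AND the random probability is greater than 1e-5."""
    return n_methods < MIN_METHODS and probability > MAX_PROBABILITY


def retained_anomalies(events):
    """Keep only the events that are not ignored by the rule above."""
    return [e for e in events if not is_ignored(e["methods"], e["p"])]


# Hypothetical event records (field names and values are invented).
events = [
    {"id": "event_1", "methods": 7, "p": 3.2e-12},
    {"id": "event_2", "methods": 3, "p": 4.0e-3},
    {"id": "event_3", "methods": 6, "p": 8.0e-7},
]
kept = [e["id"] for e in retained_anomalies(events)]
```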

Phylogenetic trees were calculated using the neighbor-joining (NJ) option in ClustalX [81] and/or the ML method in Phylogenetic Maximum Likelihood (PhyML) 3.0 [82]. In PhyML, the statistical support for tree topologies was assessed using the Shimodaira and Hasegawa (SH) method [83]. Trees were drawn using FigTree Version 1.3 (http://tree.bio.ed.ac.uk/software/figtree/; accessed on 12 May 2018) and a commercial graphics package. PATRISTIC [84] was used to check for mutational saturation by comparing the patristic distances of the nt phylogenies with those of the aa sequences they encoded; the results were confirmed by the method of Xia [85]. The BlastN and BlastP online facilities of GenBank [86] were used to search for potexvirus sequences with which to compare, and also to root, the PVX phylogenies.
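The idea behind the saturation check can be sketched in Python. This is not the PATRISTIC program: the tree representation (each node mapped to its parent and branch length), the function names and the toy branch lengths are assumptions for illustration. A patristic distance is the sum of branch lengths on the path joining two tips; pairing the nt and aa distances for the same tips gives the points one would plot, and nt distances that level off while aa distances keep increasing would suggest saturation.

```python
from itertools import combinations


def distances_to_ancestors(tree, tip):
    """Map the tip and each of its ancestors to the path length from the tip."""
    distances = {tip: 0.0}
    node, total = tip, 0.0
    while tree[node] is not None:       # the root maps to None
        parent, branch_length = tree[node]
        total += branch_length
        distances[parent] = total
        node = parent
    return distances


def patristic_distance(tree, tip_a, tip_b):
    """Sum of branch lengths on the path between two tips."""
    from_a = distances_to_ancestors(tree, tip_a)
    from_b = distances_to_ancestors(tree, tip_b)
    shared = set(from_a) & set(from_b)
    # With non-negative branch lengths the minimum is reached at the
    # most recent common ancestor.
    return min(from_a[node] + from_b[node] for node in shared)


def paired_distances(nt_tree, aa_tree, tips):
    """(nt distance, aa distance) for every pair of tips."""
    return [
        (patristic_distance(nt_tree, a, b), patristic_distance(aa_tree, a, b))
        for a, b in combinations(tips, 2)
    ]


# Toy trees with the same topology (branch lengths are invented).
nt_tree = {"root": None, "n1": ("root", 0.10),
           "A": ("n1", 0.20), "B": ("n1", 0.30), "C": ("root", 0.50)}
aa_tree = {"root": None, "n1": ("root", 0.02),
           "A": ("n1", 0.05), "B": ("n1", 0.05), "C": ("root", 0.10)}
pairs = paired_distances(nt_tree, aa_tree, ["A", "B", "C"])
```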

The program DnaSP v.6.10.01 [87] was used to analyze genetic differences between selected populations of sequences. We used it to estimate average pairwise nt diversity (π), number of synonymous sites (SS), number of non-synonymous sites (NS), mean synonymous substitutions per synonymous site (dS), mean non-synonymous substitutions per non-synonymous site (dN) and ratio of non-synonymous nt diversity to synonymous nt diversity (dN/dS). Genes were concluded to be under positive, neutral or negative selection when their dN/dS ratios were >1, =1 and <1, respectively. Tajima's D statistical test was used to identify non-random evolutionary events such as population expansion, bottlenecks and selection by comparing the estimated number of segregating sites with the mean pairwise difference among sequences [88]. DnaSP v.6.10.01 was also used to assess the extent of genetic differentiation of PVX populations, measured as the amount of gene flow between them. This was done using the coefficient of genetic differentiation F_ST (the inter-populational component of genetic variation, or the standardized variance in allele frequencies across populations) [89] and the gene flow parameter Nm (the product of the effective population number and rate of migration among populations) [90].
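To make the comparison behind Tajima's D concrete, the following Python sketch computes the number of segregating sites, the mean pairwise difference, π per site and D using the standard formulas of Tajima's test. It is not DnaSP and will not reproduce its handling of gaps or ambiguous bases: it assumes a gap-free, unambiguous alignment of equal-length sequences, and the function names and toy sequences are invented for illustration.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """Number of alignment columns with more than one nucleotide."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """Pi per site: mean pairwise differences divided by alignment length."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D from segregating sites and mean pairwise differences."""
    n, s = len(seqs), segregating_sites(seqs)
    if n < 4 or s == 0:
        # The variance term is zero for n = 3, and D is undefined when S = 0.
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    k = mean_pairwise_differences(seqs)
    return (k - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Toy alignment (invented for illustration only).
toy = ["AAAA", "AAAT", "AATT", "ATTT"]
s_sites = segregating_sites(toy)
k_mean = mean_pairwise_differences(toy)
d_value = tajimas_d(toy)
```

A negative D means there are more segregating sites than the mean pairwise difference would predict, as expected after a population expansion; a positive D indicates the reverse.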

The TempEst program [91] was used to check for a linear temporal signal in all the dated sequences, and separately in those of Cluster B. The 'Least Squares Dating' (LSD) method (Version lsd-0.3beta) of To et al. [92] was used to estimate the time to the most recent common ancestor (TMRCA) of Cluster B. The statistical significance of correlation coefficients was calculated using the Social Science Statistics online site (https://www.socscistatistics.com/pvalues/pearsondistribution.aspx; accessed on 3 August 2020). Some alignments were separated into three sub-alignments using NSplitter (https://github.com/HarryGibbs/NSplitter; accessed on 3 August 2020): one contained all the codon positions that had changed only synonymously, another the codons that included at least one non-synonymous change, and the third the codons that had not changed.
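A linear temporal signal of the kind TempEst looks for can be illustrated by an ordinary least-squares regression of root-to-tip distance on sampling date. The sketch below is not TempEst or LSD (in particular, it does not search for the best-fitting root); the function name, the returned keys and the toy values are assumptions for illustration. The slope approximates the evolutionary rate, Pearson's r measures the strength of the signal, and the x-intercept gives only a rough indication of the root date.

```python
from math import sqrt


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date."""
    n = len(dates)
    if n != len(distances) or n < 3:
        raise ValueError("need matching lists with at least 3 dated tips")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    if sxx == 0 or syy == 0:
        raise ValueError("dates and distances must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {
        "rate": slope,                      # substitutions/site/year
        "r": sxy / sqrt(sxx * syy),         # Pearson correlation coefficient
        # Date at which the fitted distance is zero (None if the slope is 0).
        "x_intercept": -intercept / slope if slope != 0 else None,
    }


# Toy data (invented): sampling years and root-to-tip distances.
fit = root_to_tip_regression([1990, 2000, 2010, 2020],
                             [0.02, 0.03, 0.04, 0.05])
```

A correlation coefficient obtained this way is the kind of value whose significance can then be looked up for the given number of dated tips.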

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
