To further test admixture and introgression, we used the maximum pseudolikelihood estimation of species networks applying quartets under incomplete lineage sorting [SNaQ (32)] using the PhyloNetworks analysis for phylogenetic networks available at For this, we used the high-coverage data on 10 individuals: Arctic cod, Pacific cod, Greenland cod, walleye pollock, and six Atlantic cod [one individual from Sable Bank, Nova Scotia; one individual from Trinity Bay, Newfoundland; a PanI heterozygote from Iceland (referred to as Hybrid); a coastal ecotype from Iceland (Coastal3, PanI AA); and two individuals from the North Sea (both PanI AA and likely coastal ecotype)]. The Polar cod specimen sequenced at a lower coverage was also included. We did not have a pure frontal fish in the high-coverage data and therefore made a “composite” individual. We used 10 low-coverage individuals that grouped together, and we defined as a “Frontal” taxon by the posterior membership probabilities of a discriminant analysis of principal components (DAPC), analysis [see (29)]. We aligned the fastq files of these 10 individuals on the gadmor2 reference genome using bwa mem as if they were one individual. We also did the same using four individuals classified as a “coastal” taxon and referred to it as Coastal4 (29). For the network analysis, we aligned the sequences onto the new gadmor2 assembly (49) and also included the reference genome as a 14th taxon (referred to as gadmor2). The reference genome is from the North-East Arctic cod stock and is of a migratory frontal ecotype. We used ANGSD -doFasta 3, which calls bases by the effective base depth, to generate fasta files from bam alignments. These were used as input for the Tree Incongruence Checking in R (TICR) pipeline precursor to the SNaQ analysis.

Briefly, the TICR pipeline (see and references therein) works as follows. First, a quick parsimony analysis finding parsimony informative sites is done using PAUP. PAUP is then used to estimate parsimony scores on every possible breakpoint in a sequence alignment. The parsimony scores are used by the minimum description length program mdl to partition the sequence into “recombinational genes” by placing breakpoints such that each partition shows the same tree topology. Next, MrBayes is used to find posterior distributions of the partitions or recombinational genes generated by mdl. The program mbsum transforms the MrBayes output into a format of concordance factors suitable for Bayesian concordance analysis, which estimates the proportion of the genome supporting each topology with the program BUCKy. The program Quartet MaxCut estimates a population tree from a split of highest concordance factors of four-taxon sets (quartets). Last, PhyloNetworks implements the SNaQ method (32) to estimate a phylogenetic network and hybridization from quartet concordance factors. The bootstrap support, the percentage of bootstrap networks in which a clade is a hybrid or a sister to a hybrid, can also be estimated. We visualized the networks using PhyloNetworks and edited using inkscape.

On the basis of a multispecies coalescent, the internal branch lengths (t) of the population tree yield expected concordance factors under incomplete lineage sorting for the major [(AB)(CD)] form (expectation 1 – 2/3et) and two minor [(AC)(BD) and (AD)(BC)] forms (expectation 1/3et each) of each quartet of taxa. A deviation of observed concordance factors from expectations is then evidence for reticulation. The test is more general but analogous to the ABBA-BABA test with the major quartet form representing the BBAA pattern (the inner taxa) and the two minor forms, the ABBA and BABA patterns.

We used these options if they differed from default options for the above analysis. For mdl, we used --block-size 100 as the length of the smallest possible partition for mdl and forced a break every 5000 parsimony informative sites with option --forced-break 5000. Each block of 5000 parsimony-informative characters is then run independently through mdl. The number of mdl generated recombinational genes ranged from 2985 for LG23 to 6246 for LG04 (table S6). For MrBayes, we used the default mb-block.txt of the TICR pipeline and cleaned up the results by removing runs that did not meet MCMC convergence criteria with –remove 0.05 before re-running MrBayes. We used default options for BUCKy and SNaQ. For bootstrapping SNaQ networks, we ran 200 bootstrap replicates with 10 runs each using convergence criteria ftolAbs = 0.00001, xtolAbs = 0.01, and xtolRel = 0.1 that are less strict than criteria for the initial runs to avoid excessive computer time.

The computations in this paper were run on the Odyssey cluster supported by the FAS Division of Science, Research Computing Group at Harvard University. Some computations were run on the Mimir bioinformatics server and the Garðar and the Garpur high-performance clusters at the University of Iceland.

DNA sequences were deposited in the sequence read archive (SRA), databank with accession number SRP065670. We requested adherence to the Fort Lauderdale principles of data sharing granting us rights of the first publication.

Note: The content above has been extracted from a research article, so it may not display correctly.

Please log in to submit your questions online.
Your question will be posted on the Bio-101 website. We will send your questions to the authors of this protocol and Bio-protocol community members who are experienced with this method. you will be informed using the email address associated with your Bio-protocol account.

We use cookies on this site to enhance your user experience. By using our website, you are agreeing to allow the storage of cookies on your computer.