A mitogenome was diagnosed as resulting from misidentification if the sequence ended up at the same incorrect position in all gene trees and if the sequence was not placed on a long branch. Identification was only attempted at the level of species. Subspecies typically do not form reciprocally monophyletic groups (Zink 2004), which is expected because the phenotypic differences on which subspecies tend to be based may be ephemeral or represent local adaptations, and often originate before coalescence of mitochondrial DNA (Patten and Remsen 2017). If reference sequences were unresolved (i.e., did not show reciprocal monophyly of species), the identity of the mitogenome was scored as “could not be verified.”
Mitochondrial introgression may produce the same effect as misidentification: a mismatch between the species label of a mitogenome and its position in a mitochondrial gene tree. We did not attempt to differentiate between these processes because we lacked access to morphological (e.g., specimen) data or nuclear DNA evidence for the relevant samples. However, we noted those cases where the species were known to hybridize (McCarthy 2006).
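The decision rule for misidentification can be sketched as a small function. This is only an illustration under stated assumptions, not the procedure used to produce the results: the function name, the category strings other than “could not be verified,” and the representation of a tree position as the name of the species the sequence groups with are all hypothetical.

```python
def diagnose_identity(label, placements, on_long_branch, references_resolved=True):
    """Apply the misidentification rule to one mitogenome.

    label               -- species name attached to the mitogenome
    placements          -- dict mapping gene name -> species the sequence
                           groups with in that gene tree (assumed encoding)
    on_long_branch      -- True if the sequence sits on a long branch
    references_resolved -- False if reference sequences did not show
                           reciprocal monophyly of species
    """
    if not placements:
        raise ValueError("at least one gene-tree placement is required")
    if not references_resolved:
        return "could not be verified"

    placed_with = set(placements.values())
    if placed_with == {label}:
        return "label supported"

    # Same incorrect position in all gene trees, and not on a long branch.
    same_wrong_position = len(placed_with) == 1 and label not in placed_with
    if same_wrong_position and not on_long_branch:
        # Introgression yields the same pattern and is not separated here.
        return "misidentification or introgression"

    # Conflicting placements or a long branch point to another problem type.
    return "other problem suspected"


example = diagnose_identity(
    label="species A",
    placements={"ND2": "species B", "COI": "species B", "cytochrome b": "species B"},
    on_long_branch=False,
)
```

In this sketch, a sequence that groups with different species in different gene trees is deliberately not called a misidentification, since that pattern is the one used below to diagnose chimeras.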
A mitogenome was diagnosed as a chimera if different fragments showed a close match to different species. For each chimera we identified the homospecific (if any) and heterospecific fragments by direct comparison with sequences of these species or those of closely related species. The combined length of these fragments was used to estimate the heterospecific proportion of the mitogenome. We also calculated the sequence divergence between the homo- and heterospecific fragments. Where reference sequences of ND2, COI or cytochrome b allowed identification of the homo- or heterospecific fragment(s) but no authentic full mitogenome of these species was available, we did not identify the homo- or heterospecific fragments, and did not calculate the homo- or heterospecific proportions of the mitogenome.
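The two quantities described for chimeras can be illustrated as follows. This is a minimal sketch under assumptions that are not stated in the text: fragments are given as 1-based inclusive (start, end) coordinates, the heterospecific proportion is taken as heterospecific length divided by the combined length of all assigned fragments, and divergence is computed as an uncorrected p-distance over aligned sites, skipping gaps and ambiguous bases. Function names are hypothetical.

```python
def fragment_length(fragments):
    """Total length of (start, end) fragments, 1-based and inclusive."""
    return sum(end - start + 1 for start, end in fragments)


def heterospecific_proportion(homo_fragments, hetero_fragments):
    """Share of the assigned sequence that matches another species."""
    hetero_len = fragment_length(hetero_fragments)
    combined = fragment_length(homo_fragments) + hetero_len
    if combined == 0:
        raise ValueError("no fragments were assigned")
    return hetero_len / combined


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    unambiguous = set("ACGT")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguity codes rather than counting them as changes.
        if a in unambiguous and b in unambiguous:
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Made-up coordinates: 900 homospecific and 100 heterospecific positions.
proportion = heterospecific_proportion([(1, 900)], [(901, 1000)])
divergence = p_distance("ACGTACGT", "ACGTACGA")
```

A model-corrected distance could be substituted for the p-distance; the uncorrected form is used here only because it is the simplest to state.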
A mitogenome was diagnosed as having sequencing errors or possible numts if there were multiple insertions or deletions in at least one of the three PCGs, or if the divergent part(s) of the sequence—as established from side-by-side comparison with sequences from conspecifics (which we assumed to be trustworthy)—did not closely match any species (as verified with BlastN). No distinction was made between sequencing errors and numts due to difficulties in formally diagnosing the latter (which requires matching the fragment with a known nuclear copy; Nacer and do Amaral 2017).
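The indel criterion can be sketched as a count of gap runs in a pairwise alignment against a conspecific reference. This is an illustration only: the function names are hypothetical, "multiple" is assumed to mean two or more events, and each uninterrupted run of gaps in one sequence is counted as a single insertion or deletion.

```python
def count_indel_events(query_aln, ref_aln):
    """Count insertion/deletion events in a pairwise alignment.

    A run of gap characters in the query (relative to the reference)
    counts as one deletion; a run of gaps in the reference counts as
    one insertion in the query.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    events = 0
    previous = None
    for q, r in zip(query_aln, ref_aln):
        if q == "-" and r != "-":
            state = "deletion"
        elif r == "-" and q != "-":
            state = "insertion"
        else:
            state = None
        # A new event starts whenever the gap state begins or switches type.
        if state is not None and state != previous:
            events += 1
        previous = state
    return events


def has_multiple_indels(alignments_by_gene, min_events=2):
    """True if at least one gene shows min_events or more indel events.

    min_events=2 is an assumed reading of "multiple".
    """
    return any(
        count_indel_events(query, ref) >= min_events
        for query, ref in alignments_by_gene.values()
    )


# Toy alignments, not real data.
flagged = has_multiple_indels({
    "ND2": ("AC--GTA-C", "ACGTGTAAC"),
    "COI": ("ACGT", "ACGT"),
})
```

The second criterion, a divergent region that matches no species, depends on a database search and is not covered by this sketch.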
In cases where a species-specific DNA fragment of a PCG was located at a nonhomologous position, or partially duplicated, the problem was classified as “incorrect sequence assembly.”
If a sequence was correctly identified on GenBank but incorrectly identified in the paper, the sequence was diagnosed as being “mislabeled in paper.”
A sequence that was published under the correct name in the paper but incorrectly on GenBank was diagnosed as being “mislabeled on GenBank.”