Data description

Matthieu Willems; Etienne Lord; Louise Laforest; Gilbert Labelle; François-Joseph Lapointe; Anna Maria Di Sciullo; Vladimir Makarenkov

This method is extracted from research article: BMC Evol Biol, Sep 2016

Using hybridization networks to retrace the evolution of Indo-European languages

DOI: 10.1186/s12862-016-0745-6

Several important studies dedicated to the classification of Indo-European (IE) languages [7, 8, 10, 29] have examined the data from the database of 84 IE languages organized by Dyen and colleagues [46]. The Dyen database contains word forms for the meanings of the 200-meaning Swadesh list [19]. This list is one of a few lists of fundamental meanings compiled by M. Swadesh in the 1940s and 1950s. It is often used in lexicostatistics, which focuses on the quantitative evaluation of lexical cognates, and in glottochronology, which focuses on dating the divergence times of natural languages. Linguists have used Swadesh lists to assess the degree of chronological separation between languages by comparing words, because the lists contain universal, stable items with low levels of borrowing [7, 8]. However, although the use of Swadesh lists may reduce the level of borrowing to a certain degree, it cannot exclude all borrowings [21].
For each of the 200 basic meanings of the Swadesh list, the Dyen database contains the corresponding word forms in 84 IE languages. These word forms have been grouped into cognate sets [46]. Two word forms were identified as cognate if they share an uninterrupted evolutionary history characterized by the presence of a common ancestral form. Word forms resulting from borrowing (e.g., the English word fruit, which was borrowed from Old French) and those related by accidental similarity (e.g., the word form bad exists in both English and Farsi, which linguists consider an accidental similarity) were placed in a separate class. When it was difficult to differentiate between cognates and word forms resulting from borrowing or accidental similarity, the corresponding word forms, albeit not numerous, were categorized as doubtful cognates. This database was used, for instance, by Gray and Atkinson [7] and Atkinson and Gray [47] to infer evolutionary trees of IE languages.
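
The grouping of word forms into cognate sets described above can be pictured with a small data structure. The following Python sketch is only an illustration under stated assumptions: the records, the set identifiers (hand_1, fruit_1, and so on), the one-form-per-language-per-meaning simplification, and the function names are all hypothetical and are not taken from the Dyen database or from the authors' software. A set identifier of None stands for a word form placed outside the cognate sets (a borrowing or an accidental similarity). The second function computes a simple lexicostatistic quantity, the fraction of compared meanings for which two languages fall in the same cognate set.

```python
from collections import defaultdict

# Toy records: (meaning, language, word_form, cognate_set_id).
# Set ids are invented for this sketch. None marks a form kept outside
# the cognate sets (a borrowing or an accidental similarity).
RECORDS = [
    ("hand", "English", "hand", "hand_1"),
    ("hand", "German", "Hand", "hand_1"),
    ("hand", "French", "main", "hand_2"),
    ("hand", "Spanish", "mano", "hand_2"),
    ("fruit", "English", "fruit", None),  # borrowed from Old French
    ("fruit", "French", "fruit", "fruit_1"),
    ("fruit", "Spanish", "fruta", "fruit_1"),
]


def cognate_sets(records):
    """Group (language, word_form) pairs by cognate set id, skipping None."""
    sets = defaultdict(set)
    for _meaning, language, form, set_id in records:
        if set_id is not None:
            sets[set_id].add((language, form))
    return dict(sets)


def shared_cognate_fraction(records, lang_a, lang_b):
    """Fraction of meanings attested in both languages whose forms share a set.

    Assumes at most one word form per language and meaning. Returns None
    when the two languages have no meaning in common.
    """
    by_lang = defaultdict(dict)
    for meaning, language, _form, set_id in records:
        by_lang[language][meaning] = set_id
    common = by_lang[lang_a].keys() & by_lang[lang_b].keys()
    if not common:
        return None
    shared = sum(
        1
        for m in common
        if by_lang[lang_a][m] is not None
        and by_lang[lang_a][m] == by_lang[lang_b][m]
    )
    return shared / len(common)


if __name__ == "__main__":
    print(cognate_sets(RECORDS))
    print(shared_cognate_fraction(RECORDS, "French", "Spanish"))
    print(shared_cognate_fraction(RECORDS, "English", "French"))
```

In this toy setting a borrowed form never counts as shared, which mirrors the practice of keeping borrowings apart from true cognates. A real analysis would also need to handle several word forms per meaning and the doubtful cognates.
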
To reconstruct our hybridization networks, we also considered additional linguistic resources (Douglas Harper’s Online Etymology Dictionary [48], the IE Lexical Cognacy Database (IELex) [49] and the IE etymological dictionaries collection [50]), which include relevant etymological information regarding loanwords and accidental similarities. Using these resources, we modified some of the original cognate sets created by Dyen et al. [46]. Specifically, the loanwords set aside by Dyen and colleagues were added to the corresponding cognate sets (i.e., the cognate sets containing the donor forms of these loanwords). In some rare cases, original cognate sets that included doubtful cognates were either merged or eliminated. In total, our modified database included 1315 cognate sets. It is available at: http://www.trex.uqam.ca/biolinguistics.
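
The two kinds of modification described above (adding a loanword to the cognate set of its donor form, and merging two cognate sets) can be sketched as simple set operations. This is a minimal illustration, not the authors' procedure: the set identifiers, the toy data, and the function names add_loanwords and merge_sets are hypothetical, and the donor set of each loanword is supplied by hand here, whereas the authors derived it from the etymological resources listed above. The French form stands in for the Old French donor of English fruit.

```python
def add_loanwords(cognate_sets, loanwords):
    """Return a copy of cognate_sets with each loanword added to its donor set.

    cognate_sets maps a set id to a set of (language, word_form) pairs.
    loanwords is a list of (language, word_form, donor_set_id) triples.
    """
    updated = {set_id: set(members) for set_id, members in cognate_sets.items()}
    for language, form, donor_set_id in loanwords:
        if donor_set_id not in updated:
            raise KeyError(f"unknown donor set: {donor_set_id}")
        updated[donor_set_id].add((language, form))
    return updated


def merge_sets(cognate_sets, keep_id, absorb_id):
    """Return a copy in which the members of absorb_id are moved into keep_id."""
    updated = {set_id: set(members) for set_id, members in cognate_sets.items()}
    updated[keep_id] |= updated.pop(absorb_id)
    return updated


# Hypothetical toy data; set ids are invented for this sketch.
SETS = {
    "fruit_1": {("French", "fruit")},
    "hand_1": {("English", "hand")},
    "hand_1b": {("German", "Hand")},
}
LOANWORDS = [("English", "fruit", "fruit_1")]  # English fruit, from Old French

with_loans = add_loanwords(SETS, LOANWORDS)
merged = merge_sets(with_loans, "hand_1", "hand_1b")

if __name__ == "__main__":
    print(merged)
```

Both functions return new dictionaries rather than editing their input, so the original cognate sets stay available for comparison with the modified ones.
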

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
