It is appropriate at this point to define “occurrence”. Butler and Barrett [15] defined it as “the presence of a particular taxon at a particular locality”. In this work, an occurrence is the presence of a particular taxon at a particular locality and time. Although Butler and Barrett’s [15] concept of occurrence is, in practice, usually also specific regarding time, there are exceptions. One example is the occurrence of Spinosauridae in the Late Jurassic of Tanzania. The PaleoDB counts the presence of two isolated teeth attributable to spinosaurids as a single occurrence. However, because Buffetaut [36] indicated that they come from different stratigraphic levels with distinct ages, we treat each tooth as a separate occurrence, so our dataset contains two occurrences of spinosaurids in Tanzania (S1 Dataset). We adopted the same procedure whenever possible.
This raises further questions. For instance, how should two localities that belonged to the same paleoenvironment be treated? If they are counted individually, we may overestimate the presence of a particular taxon in a particular environment (Fig 2). We call these “possibly paralogous occurrences”. Possibly paralogous occurrences mainly comprise localities that belong to the same geological formation and are close to each other. Close localities classified as different paleoenvironments (i.e., terrestrial and coastal, terrestrial and marine, or coastal and marine) are not possibly paralogous (Fig 2). For practical purposes, we considered all occurrences from the same stratigraphic unit and age that are attributable to the same broad paleoenvironmental category to be possibly paralogous occurrences (S1 Dataset). Distinct fossils from close localities that lack detailed stratigraphic data were also considered possibly paralogous occurrences.
Consider two distinct localities, A and B, indicated by dark stars. At a given time t1, A and B are placed in distinct paleoenvironments, coastal and terrestrial, respectively. However, at t3, A and B are part of the same broad ecosystem, so counting these localities as distinct occurrences overrepresents, within the dataset, a fossil taxon present in both localities in this paleoenvironment. Thus, distinct localities and occurrences from the same stratigraphic units and ages and classified as the same broad paleoenvironment are considered possibly paralogous occurrences. In addition, locality B is part of terrestrial paleoecosystems at both t1 and t3, so those paleoecosystems may be the same throughout the time span between t1 and t3. However, as usual, the sedimentary record and, consequently, the fossil record may be fragmentary and doubtful (t2), so it is not possible to track the entire paleoenvironmental history of locality B and, hence, to be sure whether it represents the same paleoenvironment at t1 and t3.
Occurrences could also be paralogous in relation to time. This possibility is real because one paleoenvironment might have existed long enough to be represented in different stratigraphic levels. However, because sedimentation is episodic and stratigraphic sequences contain many gaps, the hypothesis that these paleoenvironments are temporally unrelated and distinct from each other cannot be ruled out (Fig 2). Because it is virtually impossible to evaluate all of these parameters, and because many sedimentary deposits lack a detailed stratigraphic analysis, we limited the concept of paralogy to the criteria mentioned above.
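The paralogy criteria adopted above (same stratigraphic unit, same age, same broad paleoenvironmental category) can be illustrated with a minimal sketch. It is not the procedure used to build S1 Dataset; the field names, the `Occurrence` class, and the `paralogy_groups` function are hypothetical, and grouping per taxon is our assumption. The example records loosely mirror localities A and B of Fig 2 with placeholder unit names. The additional rule for close localities lacking detailed stratigraphic data is not covered.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One taxon at one locality and time (hypothetical field names)."""
    taxon: str
    locality: str
    strat_unit: str   # stratigraphic unit
    age: str          # age or time slice
    paleoenv: str     # broad category: "terrestrial", "coastal" or "marine"


def paralogy_groups(occurrences):
    """Return lists of occurrences that are possibly paralogous.

    Assumption: two occurrences are possibly paralogous when they share
    taxon, stratigraphic unit, age and broad paleoenvironmental category.
    """
    groups = defaultdict(list)
    for occ in occurrences:
        key = (occ.taxon, occ.strat_unit, occ.age, occ.paleoenv)
        groups[key].append(occ)
    # Only groups with two or more members represent possible paralogy.
    return [members for members in groups.values() if len(members) > 1]


# Placeholder data: A and B differ in paleoenvironment at t1 but share
# the same broad paleoenvironment at t3.
example = [
    Occurrence("Taxon X", "A", "Unit 1", "t1", "coastal"),
    Occurrence("Taxon X", "B", "Unit 1", "t1", "terrestrial"),
    Occurrence("Taxon X", "A", "Unit 2", "t3", "terrestrial"),
    Occurrence("Taxon X", "B", "Unit 2", "t3", "terrestrial"),
]
groups = paralogy_groups(example)
```

In this sketch only the two t3 records form a possibly paralogous group; the t1 records stay separate because their paleoenvironmental categories differ.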
Another issue that pervades this kind of analysis is taxonomic: different authors make different taxonomic attributions. We followed recent taxonomic reviews and phylogenies for our taxonomic assignments (e.g., [37–39]). However, different assignments are sometimes symptomatic of the fragmentary nature of the fossil record. Furthermore, some occurrences listed in the PaleoDB are based on references that did not figure the material attributed to a particular taxon. This was the case for some full papers and abstracts published in annals (e.g., [40–44]). These occurrences were considered dubious. Again, for practical purposes, occurrences based on a single tooth, which make up a substantial portion of our dataset, were also treated as dubious (e.g., [45–48]). One special case is that of the post-Cenomanian Brazilian occurrences of carcharodontosaurids. Because their identities have been questioned owing to their temporal mismatch with other global occurrences, they were also considered dubious. All dubious occurrences are indicated in S1 Dataset.
Treating some occurrences as possibly paralogous and/or dubious has practical implications. As detailed below, we performed statistical tests both including and excluding these problematic occurrences. When two or more occurrences were considered possibly paralogous, they were counted only once in the tests excluding paralogy, a procedure we call “synonymization of occurrences”. When the paralogy was between valid and dubious occurrences, the occurrence resulting from the synonymization procedure was no longer considered dubious. Likewise, when the paralogy was between occurrences belonging to different taphonomic categories, the combined occurrence was assigned to category 1 after synonymization. In short, the number of occurrences analyzed by the tests excluding both paralogous and dubious ones was not simply the total number minus the number of possible paralogies and dubious records, especially when considering the taphonomic categories (see below).
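The synonymization rules above can be sketched as follows. This is only an illustration under stated assumptions: the `Record` class, the `group` label, the integer coding of taphonomic categories, and the `synonymize` function are hypothetical, and we assume a merged occurrence keeps its category when all members share it. The data are placeholders, not records from S1 Dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    group: str          # hypothetical label shared by possibly paralogous records
    dubious: bool       # True if the occurrence is flagged as dubious
    taph_category: int  # taphonomic category, coded as an integer (assumption)


def synonymize(records):
    """Merge each possibly paralogous group into a single occurrence."""
    by_group = defaultdict(list)
    for rec in records:
        by_group[rec.group].append(rec)

    merged = []
    for group, members in by_group.items():
        categories = {m.taph_category for m in members}
        merged.append(Record(
            group=group,
            # Dubious only if every member is dubious: a valid + dubious
            # pair yields an occurrence that is no longer dubious.
            dubious=all(m.dubious for m in members),
            # Mixed taphonomic categories collapse to category 1.
            taph_category=categories.pop() if len(categories) == 1 else 1,
        ))
    return merged


# Placeholder data: g1 pairs a valid and a dubious record of different
# categories; g2 is a lone dubious record; g3 is a lone valid record.
records = [
    Record("g1", dubious=False, taph_category=2),
    Record("g1", dubious=True, taph_category=3),
    Record("g2", dubious=True, taph_category=2),
    Record("g3", dubious=False, taph_category=1),
]
merged = synonymize(records)
kept = [r for r in merged if not r.dubious]  # excludes paralogy and dubious records
```

With these placeholder records, four occurrences reduce to three after synonymization and two remain once dubious ones are excluded, whereas naive subtraction (four, minus one redundant paralogous record, minus two dubious records) would give one. This is the sense in which the final count is not a simple difference.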