Each trait was isolated from the original Excel® database and modelled to establish how secondary data should be automatically generated (Table 1). The relevant columns and rows were first exported as a CSV file and then processed using Python regular expressions. An overview of the process used to extract primary data is presented in Figure 1. For categorical traits, where one or more categories could be present, a keyword search was performed on the primary data. If the keyword search was successful, the category value was assigned; otherwise, expert curation was required to avoid any ambiguity. For numerical traits, three types of values were mostly present: (i) a single number, which was extracted as is; (ii) an interval, written as two numbers separated by a hyphen, in which case the mean of the two numbers was calculated; and (iii) multiple numbers (single values or intervals), in which case only the first value (or the mean of the first interval) was stored. Some traits contained only qualitative data, such as egg buoyancy or spawning substrate, whereas others contained both numerical and qualitative values, such as egg diameter or larval size upon hatching (31). In the latter case, both numerical and categorical extractions may be performed, but only the relevant data type is displayed in STOREFISH 2.0 (Table 1).
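The extraction rules described above can be illustrated with a minimal Python sketch. It is not the code used to build STOREFISH 2.0: the function names, the regular expression, and the keyword dictionary are hypothetical and chosen only for illustration. The sketch also assumes that a categorical entry matching more than one category is treated as ambiguous and left for expert curation, and that numbers are non-negative decimals separated by a hyphen or an en dash.

```python
import re

# Hypothetical pattern: one number, optionally followed by a hyphen (or en dash)
# and a second number, which together form an interval.
NUMBER_OR_INTERVAL = re.compile(
    r"(\d+(?:\.\d+)?)(?:\s*[-\u2013]\s*(\d+(?:\.\d+)?))?"
)


def extract_numeric(primary):
    """Return the first number in the text, or the mean of the first interval.

    Returns None when the text contains no number.
    """
    match = NUMBER_OR_INTERVAL.search(primary)
    if match is None:
        return None
    low = float(match.group(1))
    if match.group(2) is None:
        return low  # case (i): a single number, kept as is
    return (low + float(match.group(2))) / 2  # case (ii): mean of the interval


def extract_category(primary, keywords):
    """Return the single category whose keywords occur in the text.

    Returns None when no category matches, or when several match (assumed
    here to be ambiguous), so that the entry can be sent to expert curation.
    """
    text = primary.lower()
    hits = {
        category
        for category, words in keywords.items()
        if any(word in text for word in words)
    }
    return hits.pop() if len(hits) == 1 else None


# Hypothetical keyword lists for an egg buoyancy trait.
BUOYANCY_KEYWORDS = {
    "pelagic": ["pelagic", "float"],
    "demersal": ["demersal", "sink"],
}
```

Because `search` stops at the first match, case (iii) is covered without extra logic: for an entry such as `"10-12; 14"` only the first interval is read. The keyword search is a plain substring test, so real keyword lists would need to be chosen to avoid accidental matches inside longer words.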