Data Set

Mohsen Shahhosseini
Guiping Hu
Sotirios V. Archontoulis

Historically observed county-level corn yields for the years 2000–2018 were obtained from the USDA National Agricultural Statistics Service (NASS, 2019). We developed a data set containing observed corn yields together with management features (plant population and planting date) and environment features (weather and soil).

Plant population: plant population in plants per acre, downloaded from USDA NASS.

Planting progress (planting date): the weekly cumulative percentage of corn planted within each state (NASS, 2019).

Weather: seven weather features, aggregated weekly, downloaded from Daymet (Thornton et al., 2012):

Daily minimum air temperature in degrees Celsius.

Daily maximum air temperature in degrees Celsius.

Daily total precipitation in millimeters per day.

Shortwave radiation in watts per square meter.

Water vapor pressure in pascals.

Snow water equivalent in kilograms per square meter.

Day length in seconds per day.

Soil: the following soil features were considered in this study: soil organic matter, sand content, clay content, soil pH, soil bulk density, wilting point, field capacity, saturation point, and hydraulic conductivity. Because these properties vary across the soil profile, separate values were used for different soil layers, yielding 180 soil features for the selected locations. Soil data were downloaded from the Web Soil Survey (Soil Survey Staff et al., 2019).

Yield: annual county-level corn yields, downloaded from the USDA National Agricultural Statistics Service (NASS, 2019).
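Planting progress is reported as a weekly cumulative percentage rather than as a single date, and the text does not say how, or whether, it was summarized. As one hypothetical illustration, the sketch below estimates the week by which a chosen share of a state's crop was planted by linearly interpolating between weekly reports. The function name `week_reaching`, the `(week, cumulative_percent)` input layout, and the 50% default threshold are assumptions, not part of the protocol.

```python
def week_reaching(progress, threshold=50.0):
    """Estimate the (fractional) week at which cumulative planting first
    reaches `threshold` percent, by linear interpolation between reports.

    Hypothetical helper: `progress` is a list of (week, cumulative_percent)
    pairs sorted by week. Returns None if the threshold is never reached.
    """
    prev_week, prev_pct = None, None
    for week, pct in progress:
        if pct >= threshold:
            if prev_week is None:
                # Threshold already met at the first report.
                return float(week)
            # Here prev_pct < threshold <= pct, so the denominator is positive.
            frac = (threshold - prev_pct) / (pct - prev_pct)
            return prev_week + frac * (week - prev_week)
        prev_week, prev_pct = week, pct
    return None
```

For example, reports of 45% planted in week 18 and 70% in week 19 give an estimated 50% planting point at week 18.2.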
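The weather entry aggregates seven daily Daymet variables to a weekly time step, producing one feature per variable per week. The aggregation statistic is not stated, so this minimal sketch assumes a weekly mean for every variable (a weekly total would be an equally plausible choice for precipitation). The dictionary keys reuse Daymet's variable short names (`tmin`, `tmax`, `prcp`, `srad`, `vp`, `swe`, `dayl`, plus the day-of-year column `yday`); the 52-week split, the folding of leftover days into week 52, and the `tmax_w1`-style feature names are assumptions for illustration.

```python
from collections import defaultdict

# Daymet short names for the seven weather variables listed above.
WEATHER_VARS = ("tmin", "tmax", "prcp", "srad", "vp", "swe", "dayl")


def week_of(doy, n_weeks=52):
    """Map a 1-based day of year to a 1-based week index.

    Assumption: days beyond the last full week (e.g., day 365) are
    folded into the final week rather than forming a short week 53.
    """
    return min((doy - 1) // 7 + 1, n_weeks)


def weekly_features(daily):
    """Aggregate daily records into weekly-mean features.

    `daily` is a list of dicts, each with a 'yday' key and one key per
    weather variable. Returns a flat dict such as {'tmax_w1': 4.0, ...}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for rec in daily:
        w = week_of(rec["yday"])
        for var in WEATHER_VARS:
            sums[(var, w)] += rec[var]
            counts[(var, w)] += 1
    # One column per (variable, week) pair, sorted by variable then week.
    return {f"{var}_w{w}": sums[(var, w)] / counts[(var, w)]
            for (var, w) in sorted(sums)}
```

With a full year of records this yields 7 × 52 = 364 weather columns per county-year under these assumptions.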
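The soil entry turns layered profile data into a fixed set of columns, one per property per layer, which is how a handful of soil properties expands into many features. A minimal sketch of that flattening follows, assuming each location's profile is a dict keyed by a hypothetical depth label; the short codes in `SOIL_PROPS` are illustrative names for the nine listed properties, not Web Soil Survey field names.

```python
# Hypothetical short codes for the nine soil properties listed above:
# organic matter, sand, clay, pH, bulk density, wilting point,
# field capacity, saturation point, hydraulic conductivity.
SOIL_PROPS = ("om", "sand", "clay", "ph", "bd", "wp", "fc", "sat", "ksat")


def flatten_soil(profile):
    """Turn {layer_label: {property: value}} into one flat feature dict.

    Produces one column per property per layer, e.g. 'clay_0-5cm'.
    The column count is len(SOIL_PROPS) times the number of layers.
    """
    features = {}
    for layer, props in profile.items():
        for prop in SOIL_PROPS:
            features[f"{prop}_{layer}"] = props[prop]
    return features
```

Because soil properties do not change from year to year in this setup, the flattened features for a county can be reused for every year of that county's records.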

The resulting data set consists of 5,342 observations of annual average corn yield for 293 counties across three states in the US Corn Belt, together with the 597 input features described above. These components were chosen as explanatory features because yield performance is driven mainly by environment, genotype, and management. Weather and soil features account for the environment component, and plant population and planting date account for management; because no publicly available genotype data set exists, the effect of genotype on yield performance is not considered. Many of the input parameters used in this study may not be available in other parts of the world. In such cases, we recommend using the gridded public soil or weather databases that drive global crop production models (Rosenzweig et al., 2013; Hengl et al., 2014; Elliott et al., 2015; Han et al., 2019).
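Each observation pairs one county-year yield with features stored at different levels: weather per county and year, soil per county (static), and planting progress per state and year. Since 5,342 is fewer than the 293 × 19 = 5,567 possible county-years for 2000–2018, not every county contributes every year. The sketch below shows one way such a table could be joined; it assumes county keys are `(state, county_name)` tuples, that plant population is also available per state and year (the text states the state level only for planting progress), and that county-years lacking any input are dropped. The function `assemble` and all key layouts are hypothetical.

```python
def assemble(yields, weather, soil, management):
    """Build one row per county-year that has all inputs (hypothetical).

    yields:     {(county, year): yield}
    weather:    {(county, year): {feature: value}}
    soil:       {county: {feature: value}}          # static across years
    management: {(state, year): {feature: value}}   # state-level inputs
    County keys are assumed to be (state, county_name) tuples.
    """
    rows = []
    for (county, year), y in sorted(yields.items()):
        state = county[0]
        parts = (weather.get((county, year)),
                 soil.get(county),
                 management.get((state, year)))
        if any(p is None for p in parts):
            continue  # assumption: drop county-years with missing inputs
        row = {"county": county, "year": year, "yield": y}
        for p in parts:
            row.update(p)
        rows.append(row)
    return rows
```

Keeping soil keyed by county alone, rather than copying it into every year beforehand, avoids duplicating the static soil columns until the final join.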
