Three major data sources were used to create our SHDI database. First, we approached statistical offices, including Eurostat, the statistical office of the European Union (https://ec.europa.eu/eurostat), by email or through their websites to obtain data. Second, we downloaded data from the Area Database of the Global Data Lab (https://www.globaldatalab.org). Third, we downloaded data from the HDI website of the Human Development Report Office of the United Nations Development Program (http://hdr.undp.org). The ‘SHDI Start’ data file (Data Citation 1) lists, for each country, the data source(s) used for the subnational values of the indicators, the years for which data is available, the number of subnational regions and the population size. Below we discuss the three main data sources in more detail.
For most EU countries, the data was derived from the Eurostat database (https://ec.europa.eu/eurostat/data/database). The definition of subnational areas used by Eurostat is based on the NUTS classification (Nomenclature of territorial units for statistics, https://ec.europa.eu/eurostat/web/nuts), a hierarchical system for dividing up the economic territory of the EU. NUTS1 regions are the major socio-economic regions and NUTS2 regions are the basic regions for the application of regional policies. For most EU countries, data was used at the NUTS2 level. For Germany and the UK, the NUTS2 level is so detailed that data at the NUTS1 level was used instead. For some EU countries, no subnational data could be obtained from Eurostat and other sources had to be used. For Estonia, Ireland, Lithuania, Latvia, Malta and Slovenia, data from their national statistical offices was used. For Cyprus and Luxembourg, no subnational data could be obtained.
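The NUTS-level rule described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually used to build the database: the function names are hypothetical, and the sketch relies on the convention that a NUTS code consists of a two-letter country code followed by one character per level (so NUTS1 codes have three characters and NUTS2 codes have four).

```python
# Countries for which the text reports NUTS1 data; every other country
# covered by Eurostat is taken at NUTS2. "DE" and "UK" are the Eurostat
# country codes for Germany and the United Kingdom.
NUTS1_COUNTRIES = {"DE", "UK"}


def nuts_level(code: str) -> int:
    """Return the NUTS level implied by the code length (country code = 0)."""
    return len(code) - 2


def select_regions(codes):
    """Keep NUTS1 regions for DE/UK and NUTS2 regions for other countries."""
    selected = []
    for code in codes:
        country = code[:2]
        wanted_level = 1 if country in NUTS1_COUNTRIES else 2
        if nuts_level(code) == wanted_level:
            selected.append(code)
    return selected
```

For example, given the codes `DE1`, `DE11`, `FR1` and `FR10`, the sketch keeps `DE1` (NUTS1, Germany) and `FR10` (NUTS2, France).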
Eurostat data for mean years of schooling was available for 2000–2017, for expected years of schooling for 2013–2016, for GDP in Euros PPP for 2004–2016, and for life expectancy at birth for 1990–2016.
For Australia, Canada, China, Croatia, Japan, New Zealand, South Korea, Russia, and the USA, data from national statistical offices was used. For South Korea and Russia, no usable educational data could be derived from their statistical offices, so data on education was derived from survey datasets instead. For Russia, data from the European Social Survey for 2012 and 2017 was used. For South Korea, data from the World Values Survey 2010 was used.
Since 2016, the Global Data Lab (GDL) has provided freely downloadable subnational development indicators for low- and middle-income countries (LMICs) through its Area Database (GDL-AD; https://www.globaldatalab.org/areadata). These indicators are constructed by aggregation from representative survey and census datasets. The major data sources used by GDL for this purpose are Demographic and Health Surveys (DHS, https://www.dhsprogram.com), UNICEF Multiple Indicator Cluster Surveys (MICS, http://mics.unicef.org) and datasets from population censuses distributed by IPUMS International (https://international.ipums.org). These sources provide large samples, often 50,000 to 100,000 or more respondents, containing information on all household members. For LMICs for which these sources are not available, GDL uses other, country-specific surveys, or less comprehensive data sources such as the Afrobarometer or AmericasBarometer surveys (http://www.afrobarometer.org, http://www.americasbarometer.org), which include only adults instead of complete households.
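As a rough illustration of what aggregation from survey data to the subnational level involves, the sketch below computes a weighted regional mean from individual-level records. It is a minimal example under stated assumptions, not GDL's actual procedure: the record layout, the field names (`region`, `weight`, `years_schooling`) and the use of a simple sample-weighted mean are all hypothetical.

```python
from collections import defaultdict


def regional_weighted_mean(records, value_key,
                           region_key="region", weight_key="weight"):
    """Aggregate individual-level records to weighted means per region.

    Records whose value is missing (None) are skipped. All field names
    are hypothetical placeholders for whatever a given survey provides.
    """
    weighted_sums = defaultdict(float)
    weight_totals = defaultdict(float)
    for record in records:
        value = record.get(value_key)
        if value is None:
            continue  # respondent has no usable value for this indicator
        weight = record[weight_key]
        weighted_sums[record[region_key]] += weight * value
        weight_totals[record[region_key]] += weight
    # Only report regions that have a positive total weight.
    return {
        region: weighted_sums[region] / weight_totals[region]
        for region in weighted_sums
        if weight_totals[region] > 0
    }
```

With large samples of the size mentioned above, such regional means can be computed for each subnational area separately; with small adult-only samples, the same calculation rests on far fewer observations per region.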
For most LMICs, GDL-AD provides the two indicators needed for creating the educational index, mean years of schooling and expected years of schooling. However, the indicators needed for the health and income dimensions are usually not available in the required form in household survey and census datasets. The subnational values of these indicators for LMICs are therefore estimated using data on child mortality and household wealth that is derived from GDL-AD.
The third source used for constructing the SHDI database is the database of national development indicators maintained by the Human Development Report Office of the United Nations Development Program (http://hdr.undp.org/en/data). This database contains time series for the period 1990–2017 for the HDI, its dimension indices, and the indicators used for creating the dimension indices, plus a large number of other socio-economic, health, education, demographic and environmental indicators. From this database, we derived the national data that is used to scale the SHDI indicators to their UNDP values.
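One simple way to scale subnational values to a national figure is sketched below. The text here does not specify the scaling formula, so the proportional rescaling shown (multiplying every regional value by one factor so that the population-weighted mean equals the national value) is an assumption made for illustration, and the function and argument names are hypothetical.

```python
def scale_to_national(values, populations, national_value):
    """Rescale regional values so their population-weighted mean
    equals the national value.

    ASSUMPTION: a single proportional factor is applied to all regions.
    This illustrates the idea of scaling only; it is not taken from
    the source text.
    """
    total_population = sum(populations[region] for region in values)
    weighted_mean = sum(
        values[region] * populations[region] for region in values
    ) / total_population
    factor = national_value / weighted_mean
    return {region: value * factor for region, value in values.items()}
```

A proportional factor preserves the relative differences between regions while forcing the aggregate to match the national figure.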
For Kosovo, Somalia and Taiwan, no national data was available in the UNDP database. For Kosovo, data for 2015 was derived from the national Human Development Report 2015 (ref. 8). For Somalia, national GDP per capita was derived from the World Bank’s World Development Indicators (http://wdi.worldbank.org), and schooling data was derived from the national Human Development Report 2012 (ref. 9) for 2012 and from GDL-AD for 2006. For Taiwan, data from the Taiwanese Directorate General of Budget, Accounting and Statistics was used (http://eng.stat.gov.tw/ct.asp?xItem=25280&ctNode=6032&mp=5). Taiwan and Hong Kong are included in the SHDI Database among the provinces of China.