A corpus of names was obtained from the UK ‘local BMD’ project (http://www.ukbmd.org.uk/local), an ongoing volunteer effort to transcribe the local indices of the UK births, marriages and deaths (BMD) registers for digital preservation. BMD registration began in England and Wales in 1837, and became compulsory with the Births and Deaths Registration Act 1875. Each quarter, copies of the locally generated BMD indices are sent to the General Register Office in London, where they are re-transcribed to form a national catalogue. However, the data is not publicly available in a form amenable to large-scale analysis: the websites hosting the records permit bulk download of only 25 years’ worth of records at a time, and only for a single initial letter (for example, the subset of records with surnames beginning with A). To obtain the dataset used here, 1716 files spanning all years and regions had to be downloaded individually.
Data was downloaded on 12th September 2016 and collated from all participating areas in the UK local BMD project: the cities, counties and regions of Bath, Berkshire, Cheshire, Cumbria, Lancashire, North Wales, Staffordshire, West Midlands, Wiltshire, and Yorkshire (Table 1). Each of these areas constitutes a separate record transcription project run by volunteers, and the scale of the volunteer effort differs between areas. As such, the data is non-uniform both in the number of records per geographical region and in depth of coverage over time. Several of these projects (Berkshire, Cumbria, North Wales) are not actively maintained, and contain no new birth records for the 4–5 years prior to data collation. The available fields for each birth record were first name, middle name(s), surname, year of birth, district in which the birth was registered, and identification number. The data includes 143,259 unique names from approx. 22 million individuals over 177 years, from 1838 (the first complete year of BMD registration) to 2014. This corresponds to approximately 130,000 to 230,000 records per year from 1838 to 1950, 25,000 to 100,000 records per year from 1951 to 2000, and 5000 to 15,000 records per year from 2001 to 2014. We therefore assume its scope is sufficiently broad to be representative of UK naming patterns.
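The summary figures above (records per year and the number of unique names) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `summarise`, the `(first_name, year)` record layout, and the whitespace and capitalisation normalisation are assumptions made for illustration only, and the sample records are invented.

```python
from collections import Counter

# Study period as stated in the text: 1838 to 2014 inclusive.
FIRST_YEAR, LAST_YEAR = 1838, 2014


def summarise(records):
    """Tally records per year and collect unique names.

    `records` is an iterable of (first_name, year) tuples. This layout is
    hypothetical; the real downloads carry more fields (middle names,
    surname, district, identification number).
    """
    per_year = Counter()
    names = set()
    for first_name, year in records:
        # Drop records outside the study period (e.g. the partial year 1837).
        if not (FIRST_YEAR <= year <= LAST_YEAR):
            continue
        # Assumed normalisation: trim whitespace and unify capitalisation.
        name = first_name.strip().title()
        if not name:
            continue  # skip records with no usable name
        per_year[year] += 1
        names.add(name)
    return per_year, names


# Invented sample records, for illustration only.
sample = [
    ("Mary", 1838),
    ("mary ", 1838),
    ("John", 1900),
    ("Ann", 1837),   # outside the study period
    ("", 1900),      # missing name
    ("Zoe", 2014),
]
per_year, names = summarise(sample)
n_years = LAST_YEAR - FIRST_YEAR + 1  # inclusive span of the study period
```

Counting the span inclusively gives the 177 years quoted in the text. How name variants are normalised affects the count of unique names, so that step would need to match whatever convention the analysis actually adopts.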