Each of the four administrative health datasets used in the evaluation contained multiple event-based records per person. The datasets comprised ten years of Western Australian (WA) hospital admission data (n = 6,772,949), ten years of New South Wales (NSW) hospital admission data (n = 19,874,083), three years of NSW public emergency department (ED) presentation data (n = 4,304,459) and three years of South Australian (SA) ED presentation data (n = 813,839). Each dataset contained errors typical of administrative data, including missing and incorrect identifiers, and identifiers that change over time. Each dataset had previously been deduplicated (that is, the records within each dataset belonging to the same individual had been identified) to a high quality by the jurisdictional linkage units (the Centre for Health Record Linkage, the Western Australian Data Linkage Branch, and SANT Data Link for the NSW, WA and SA datasets, respectively). These linkage units used a variety of deduplication methods, including probabilistic record linkage, intensive manual review of created links, and quality assurance procedures to analyse and review potential errors [28, 29]. The links created by these linkage units have been further validated through their regular use in academic and government research [2]. The linkage units provided the present study with the results of their matching processes, allowing us to use these results as a ‘gold standard’ against which to compare the results of our own deduplication of these datasets. The data were made available as part of proof-of-concept work for the Population Health Research Network [30].
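To illustrate how deduplication output can be compared against gold-standard links of this kind, the following is a minimal sketch of one common approach, pairwise precision and recall. It is an assumption for illustration only: this section does not state which quality measures were used, and the function names, record IDs and group IDs below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def within_group_pairs(group_of):
    """Return the set of unordered record pairs that share a group ID.

    group_of maps a record ID to the ID of the person (group) it was
    assigned to. All names here are hypothetical.
    """
    members = defaultdict(list)
    for record_id, group_id in group_of.items():
        members[group_id].append(record_id)
    pairs = set()
    for records in members.values():
        # Sorting gives each pair a canonical order so the sets compare cleanly.
        pairs.update(combinations(sorted(records), 2))
    return pairs


def pairwise_quality(predicted, gold):
    """Compare predicted groupings with gold-standard groupings, pair by pair."""
    predicted_pairs = within_group_pairs(predicted)
    gold_pairs = within_group_pairs(gold)
    true_positives = len(predicted_pairs & gold_pairs)
    # With no pairs on one side, the corresponding measure is taken to be 1.0.
    precision = true_positives / len(predicted_pairs) if predicted_pairs else 1.0
    recall = true_positives / len(gold_pairs) if gold_pairs else 1.0
    return precision, recall


# Toy data (invented): five records, two people in the gold standard.
gold = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B"}
predicted = {"r1": "x", "r2": "x", "r3": "y", "r4": "y", "r5": "z"}

precision, recall = pairwise_quality(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the gold standard contains four within-person pairs and the predicted grouping contains two, of which one is correct. Enumerating pairs explicitly is only practical when each person has a modest number of records; for datasets of the size described above, pair counts would more likely be computed from group sizes.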