The MDP

Jing Li, Gang Yu, Wen Ding, Jian Huang, Zheming Li, Zhu Zhu, Dejian Wang, Jie Zhang, Jing Wang, Jianwei Yin

As a one-stop platform for clinical researchers, the MDP provides compliant, multidimensional, and high-quality medical data, together with effective, convenient, and visualization-oriented research tools. The MDP comprises three parts: the data acquisition layer, the middle platform, and the application layer.

The data acquisition layer acquires the medical data required by research projects. The acquired data span several categories, such as clinical data from health information technology systems (e.g., electronic medical records, hospital information systems, laboratory information systems, and picture archiving and communication systems), omics data from researchers (e.g., genomics, metabonomics, proteomics, immunomics, and ultrasomics), and data from other sources (e.g., biobanks, wearables, electronic health records, epidemiology, climate, and environment).

Different techniques [e.g., database batch push, application programming interface (API) transmission, and uploads of files and tables] can be adopted for data acquisition according to the actual conditions. The gathered data are stored as “raw data” in the data acquisition layer. Raw data are then processed by a privacy protection module, which removes unnecessary patient-identifying information from each data entry and encrypts what remains.
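The article does not specify how the privacy protection module is implemented; the following is a minimal sketch of the general idea under stated assumptions. The field names (`name`, `phone`, `patient_id`, etc.) are hypothetical, and salted hashing stands in for the unspecified encryption step so that the example needs only the standard library; a production system would instead use keyed, reversible encryption under proper key management.

```python
import hashlib

# Hypothetical field policy; a real deployment would derive this from a
# governance policy rather than hard-coding it.
DROP_FIELDS = {"name", "phone", "address"}   # direct identifiers, deleted outright
ENCRYPT_FIELDS = {"patient_id"}              # kept, but pseudonymized

def desensitize(record: dict, salt: str = "demo-salt") -> dict:
    """Remove unnecessary private fields and pseudonymize the remaining sensitive keys."""
    clean = {}
    for key, value in record.items():
        if key in DROP_FIELDS:
            continue  # delete unnecessary private information
        if key in ENCRYPT_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
            clean[key] = digest[:16]  # irreversible pseudonym (stand-in for encryption)
        else:
            clean[key] = value  # clinical payload is kept as-is
    return clean

raw = {"name": "Alice", "patient_id": "P001", "phone": "555-0100",
       "diagnosis": "J45.901", "lab_glucose": 5.4}
print(desensitize(raw))
```

Because the pseudonym is derived deterministically from the original identifier, records belonging to the same patient remain linkable across datasets without exposing the identifier itself.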

After this preprocessing, the data are classified as desensitized data and stored in duplicate in the database. The system monitors each copy of the desensitized data; if one copy is lost or corrupted, it can be recovered from the other.
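A minimal sketch of this duplication-and-recovery idea, assuming a simple two-copy in-memory store with content checksums; the `DuplicatedStore` class and its layout are illustrative, not the MDP's actual storage design:

```python
import copy
import hashlib

def checksum(record: dict) -> str:
    """Content hash used to detect loss or corruption in a stored copy."""
    return hashlib.sha256(repr(sorted(record.items())).encode()).hexdigest()

class DuplicatedStore:
    """Keep two copies of each desensitized record; repair one from the other."""
    def __init__(self):
        self.primary, self.replica, self.digests = {}, {}, {}

    def put(self, key: str, record: dict) -> None:
        self.primary[key] = copy.deepcopy(record)
        self.replica[key] = copy.deepcopy(record)
        self.digests[key] = checksum(record)  # reference digest taken at write time

    def get(self, key: str) -> dict:
        expected = self.digests[key]
        if key in self.primary and checksum(self.primary[key]) == expected:
            return self.primary[key]
        # Primary copy lost or corrupted: recover it from the replica.
        self.primary[key] = copy.deepcopy(self.replica[key])
        return self.primary[key]
```

In a production database this role is typically played by replication plus integrity checks; the sketch only shows why two supervised copies suffice to recover from the loss of either one.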

As the core layer of the MDP, the middle platform manages data quality, the research data repository, and system configuration.

The quality of the data in the data acquisition layer is not sufficient for clinical research in terms of completeness, accuracy, and consistency, so a closed-loop mechanism is designed for data quality improvement. The data quality management module can discover and resolve most data quality problems. The workflow of data quality improvement is shown in Figure 2.

Workflow of data quality management. AI, artificial intelligence.

Problem discovery, problem locating, problem solving, and solution verification together constitute the process group of data quality management. Most of these processes are triggered automatically by the system according to established criteria; when a data quality problem is discovered by a researcher, the process can also be initiated manually. Once a process is launched, the metadata management module locates the root cause of the problem and visualizes it on lineage diagrams. Depending on the scenario, problems are addressed by the system or by platform administrators, and the solutions must be verified by system administrators or researchers. All processes are recorded in the system log and presented in problem reports.
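The discover–locate–solve–verify loop described above can be sketched as follows. The rule set, record fields, and `fix` callback are hypothetical placeholders for the system's established criteria and for the system/administrator remediation step; they are not taken from the MDP itself.

```python
# Hypothetical quality rules: each maps a quality dimension to a check that
# returns a problem description, or None if the record passes.
RULES = {
    "completeness": lambda r: "missing diagnosis" if not r.get("diagnosis") else None,
    "accuracy": lambda r: ("glucose out of range"
                           if not (0 < r.get("lab_glucose", 1) < 50) else None),
}

def quality_loop(record: dict, fix) -> tuple:
    """Run discovery -> locating -> solving -> verification, logging each step."""
    log = []
    for dimension, rule in RULES.items():
        problem = rule(record)  # discovery (automatic, per established criteria)
        if problem is None:
            continue
        log.append(f"discovered [{dimension}]: {problem}")  # locating via rule name
        record = fix(record, dimension)  # solving (by system or administrator)
        if rule(record) is None:         # verification of the solution
            log.append(f"verified [{dimension}]: resolved")
        else:
            log.append(f"verified [{dimension}]: still open")
    return record, log

def demo_fix(r: dict, dimension: str) -> dict:
    """Illustrative remediation: fill a missing diagnosis with a placeholder code."""
    r = dict(r)
    if dimension == "completeness":
        r["diagnosis"] = "UNK"
    return r

record, log = quality_loop({"diagnosis": "", "lab_glucose": 5.4}, demo_fix)
print(log)
```

The returned log plays the role of the system log and problem report described in the text: every discovery and verification is recorded, whether or not the problem was resolved.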

Because the data quality management module works continuously, the completeness, accuracy, and consistency of the medical data on the platform improve steadily.

High-quality medical data are stored in the research data repository (RDR) for research projects. The workflow of data processing from raw data to the RDR is shown in Figure 3. The data directory is created automatically by the system, and databases and knowledge graphs for specific diseases can also be generated. Metadata of the medical data (including, but not limited to, data category, quantity, data source, and update time) are analyzed statistically and visualized by the business intelligence module.
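As an illustration of the kind of statistics the business intelligence module might compute over such metadata, here is a small sketch; the catalog entries and field names (`category`, `source`, `updated`) are invented for the example and do not reflect the MDP's actual schema.

```python
from collections import Counter
from datetime import date

def summarize_metadata(entries: list) -> dict:
    """Aggregate per-category quantities and the latest update time."""
    counts = Counter(e["category"] for e in entries)
    latest = max(e["updated"] for e in entries)
    return {"quantity_by_category": dict(counts), "last_update": latest}

# Hypothetical data directory entries.
catalog = [
    {"category": "EMR", "source": "HIS", "updated": date(2023, 5, 1)},
    {"category": "EMR", "source": "LIS", "updated": date(2023, 5, 3)},
    {"category": "genomics", "source": "researcher", "updated": date(2023, 4, 20)},
]
print(summarize_metadata(catalog))
```

Summaries like these (counts per category, freshness per source) are what a dashboard would visualize so that researchers can judge at a glance whether the repository covers their project's needs.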

Dataflow of the Medical Data Governance System. AI, artificial intelligence; RDR, research data repository.
