A System to Easily Manage Metadata in Biomedical Research Labs Based on Open-source Software

Manuel A. Castro-Alamancos

doi:10.21769/BioProtoc.4404

Peer-reviewed

A System to Easily Manage Metadata in Biomedical Research Labs Based on Open-source Software

Manuel A. Castro-Alamancos

Published: May 5, 2022, Volume 12, Issue 9. DOI: 10.21769/BioProtoc.4404

Reviewed by: Palaniappan Sivasankar, Anonymous reviewer(s)

Abstract

In most biomedical labs, researchers gather metadata (i.e., all details about the experimental data) in paper notebooks, spreadsheets, or, sometimes, electronic notebooks. When the data are analyzed, the related details usually go into other notebooks or spreadsheets, generating yet more metadata. The records rapidly become complex and disjointed, and keeping track of them can be daunting. Organizing all the relevant data and related metadata for analysis, publication, sharing, or deposit into archives can be time-consuming, difficult, and prone to errors. Keeping metadata in a centralized system that contains all details from the start greatly simplifies the process. While lab management software is available, it can be costly and inflexible. The system described here is based on a popular, freely available, open-source wiki platform. It provides a simple but powerful way for biomedical research labs to set up a metadata management system linking the whole research process. The system enhances efficiency, transparency, reliability, and rigor, which are key factors in improving reproducibility. The flexibility afforded by the system simplifies implementation of specialized lab requirements and future needs. The protocol presented here describes how to create the system from scratch and how to use it for gathering basic metadata, and it provides a fully functional version for perusal by the reader.

Graphical abstract:

Lab Metadata Management System.

Keywords: Metadata, Lab management, Data, Database, Rigor, Reproducibility

Background

The process of acquiring, analyzing, and sharing research data is complex. As currently implemented by most biomedical research labs, this process is prone to significant errors, leading to problems with scientific rigor and reproducibility. These negative outcomes signify wasted efforts and investments that frustrate both researchers and funding agencies. The problem has prompted the NIH to make rigor and reproducibility a major consideration in peer review. The academic literature now addresses the problem directly: a simple PubMed search (rigor AND reproducibility) yields hundreds of hits, most of which are from the last few years (e.g., Landis et al., 2012; Steward and Balice-Gordon, 2014; Bandrowski and Martone, 2016; France, 2016; Sahoo et al., 2016; Yates, 2016; Baxter and Burwell, 2017; Williams et al., 2017; Borghi and Van Gulick, 2018; Botker et al., 2018; Brown et al., 2018; Dingledine, 2018; Gulinello et al., 2018; Lee and Kitaoka, 2018; Plant et al., 2018; Yosten et al., 2018; Prager et al., 2019; Turner, 2019). Even the popular press has covered the problem in many high-profile articles (New York Times; The Economist, https://www.economist.com/briefing/2013/10/18/trouble-at-the-lab; The Atlantic, https://www.theatlantic.com/magazine/archive/2015/09/a-scientific-look-at-bad-science/399371/, etc.). While the causes of this crisis are complex, a main part of the problem lies in the way basic biomedical research labs are accustomed to managing (gathering, organizing, storing, accessing, and sharing) the details about experiments (metadata). Essentially, there are no established methods for doing this, and labs are on their own. While there are several electronic laboratory notebooks in the marketplace, most of these have significant limitations, such as limited scope, focus on a particular subfield, high cost, and inflexibility.

This protocol describes a simple method to set up a freely available lab management system for biomedical research labs that provides an easy way to store, access, peruse, and organize metadata. Lab metadata are all the information required to understand the data generated by the lab. They include details about subjects, samples, materials, chemicals, methods (protocols), data files, analyses, etc. Without proper metadata, the data derived from any experiment are useless or, at a minimum, can lead to misrepresentations and faulty conclusions. With just a few clicks in a browser (e.g., Chrome), the system presented in this protocol allows the user to know, for example: (i) exactly what was done (detailed protocols), who did it, and when it was done, (ii) what samples and/or subjects were used, (iii) what materials, chemicals, drugs, and equipment were employed, and (iv) where the data files are stored. This information can go back as many years as the system has been used in the lab. The system described has been operational for several years in the author’s lab, and the experience has convinced everyone who has used it that this approach is the best way to enhance efficiency, transparency, reliability, and rigor, and is therefore likely to improve reproducibility.

The system described has several important features. First, it is based on freely available, open-source software called DokuWiki, which is widely employed for many purposes. It is valued for its simplicity, while providing a high level of security, including access-control list (ACL) permissions. Second, a wiki is the ideal platform for a research lab since, by definition, it is a repository of knowledge. Third, by running on a popular wiki platform, the system can be enhanced through hundreds of available plugins that allow users to add features required for their specific needs. In addition, specialized plugins can be developed and easily integrated. Fourth, the system can be easily deployed by lab personnel who have only basic computing knowledge. Having lab members manage their own experimental metadata, with minimal effort, is practical and provides both flexibility and control. Fifth, the system can be set up on a network-attached storage (NAS) server, a personal computer, or a web hosting service. However, a NAS running in the local lab network is most desirable, and is the option described in the present protocol. Sixth, all the metadata in the system are stored in standard, universally accessible formats. For example, part of the system includes a relational database (SQLite), which can be directly visualized through aggregations on the wiki platform, or accessed with SQL commands from third-party software commonly used in research labs (e.g., LabVIEW, MATLAB, OriginLab, Python, R, etc.).
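As an illustration of the sixth point, the following minimal sketch shows how a Python script might query an SQLite metadata database with plain SQL. It uses only the standard-library sqlite3 module. The table names (subjects, experiments), the column names, and the sample records are hypothetical and are built in memory for the example; a real lab database would use whatever schema the lab has defined and would be opened from its file path instead.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical lab schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE subjects (
            subject_id TEXT PRIMARY KEY,
            species    TEXT NOT NULL,
            strain     TEXT
        );
        CREATE TABLE experiments (
            experiment_id TEXT PRIMARY KEY,
            subject_id    TEXT NOT NULL REFERENCES subjects(subject_id),
            protocol      TEXT NOT NULL,
            performed_by  TEXT NOT NULL,
            performed_on  TEXT NOT NULL,  -- ISO 8601 date, sorts chronologically
            data_path     TEXT NOT NULL
        );
        """
    )
    # Made-up records, for illustration only.
    conn.executemany(
        "INSERT INTO subjects VALUES (?, ?, ?)",
        [("M001", "mouse", "C57BL/6J"), ("M002", "mouse", "C57BL/6J")],
    )
    conn.executemany(
        "INSERT INTO experiments VALUES (?, ?, ?, ?, ?, ?)",
        [
            ("E002", "M001", "recording_v2", "user_b", "2021-03-09", "drive_A/E002.dat"),
            ("E001", "M001", "recording_v1", "user_a", "2021-03-02", "drive_A/E001.dat"),
            ("E003", "M002", "recording_v2", "user_a", "2021-03-10", "drive_B/E003.dat"),
        ],
    )
    conn.commit()
    return conn


def experiments_for_subject(conn, subject_id):
    """Return what was done, by whom, when, and where the data are, oldest first."""
    cursor = conn.execute(
        """
        SELECT experiment_id, protocol, performed_by, performed_on, data_path
        FROM experiments
        WHERE subject_id = ?
        ORDER BY performed_on
        """,
        (subject_id,),  # parameter binding avoids building SQL from strings
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for row in experiments_for_subject(connection, "M001"):
        print(row)
```

The same query could be issued from any of the other packages listed above through their own SQLite connectors; the point of the sketch is only that the metadata remain reachable with standard SQL.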

There are many reasons why organizing all lab metadata in a centralized system is useful. First, in most cases, individuals gather their metadata in paper notebooks, spreadsheets, or, sometimes, electronic notebooks. Moreover, when the data are analyzed, the related details usually go into other notebooks or spreadsheets, and even more metadata are generated. The records rapidly become complex and disjointed. Having all lab metadata available for perusal in one location makes the process simpler. Second, most labs use an open approach by which each lab member independently organizes the metadata they generate. However, “to be organized” means different things to different people. Moreover, having metadata dispersed, and organized following different reasoning and in different formats, is highly inefficient and prone to errors. A lab system that provides a flexible, coherent, and logical structure eliminates guessing about how to be organized. Third, a typical scenario is that the lab needs to repeat a successful procedure exactly as it was done several years ago by someone who is no longer in the lab or who may not recall the details. Having to troubleshoot a complex procedure again wastes effort and can cause significant problems. The system presented here provides a central location to track and update protocols for all lab procedures, and to detail them with media (images, videos, tables, etc.), eliminating uncertainty. Fourth, there are a number of electronic notebook services available in the marketplace, but these can be costly and may offer little flexibility or control. The system described here is free, highly adaptable, and runs within the lab. As already noted, a wiki is the ideal repository for a lab metadata management system. Fifth, being able to instantly access metadata in a database greatly facilitates not only manual perusal, but also automated data analyses through scripts, eliminating common human errors. Paper notebooks are no longer required, but can be used as support if deemed useful. Sixth, compliance is a required and ever-increasing institutional burden on research labs. The system described here implements standard record keeping, such as animal usage and breeding (related to Institutional Animal Care and Use Committee [IACUC] protocols) and controlled drug usage (related to Drug Enforcement Administration [DEA] licenses). Seventh, the process of organizing metadata for analyses, publication, and output to data archives can be very time-consuming, sometimes difficult, and prone to a high degree of errors. By having all metadata, from the start, in a centralized system that contains all experimental details, the process is greatly simplified. For example, the process of uploading specific data with its metadata to data archives can be easily automated. Finally, reproducibility is a recognized major problem in biomedical research labs and in other science fields. The rigor implemented by deploying the system described here should result in significant improvements in reproducibility.
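To make the seventh point concrete, here is a hedged sketch of how the preparation of a deposit might be scripted once the metadata sit in a database. It collects the records for a requested set of experiments into a JSON manifest and reports any record that is incomplete or missing. The experiments table, its columns, the REQUIRED_FIELDS list, and the manifest layout are all assumptions made for this example; an actual archive specifies its own required fields and submission format, and no archive API is shown.

```python
import json
import sqlite3

# Hypothetical required fields; a real archive defines its own list.
REQUIRED_FIELDS = ("experiment_id", "subject_id", "protocol", "performed_on", "data_path")


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory stand-in for the lab database (made-up records)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # rows can be converted to dicts by column name
    conn.execute(
        "CREATE TABLE experiments ("
        "experiment_id TEXT PRIMARY KEY, subject_id TEXT, protocol TEXT, "
        "performed_on TEXT, data_path TEXT)"
    )
    conn.executemany(
        "INSERT INTO experiments VALUES (?, ?, ?, ?, ?)",
        [
            ("E001", "M001", "recording_v1", "2021-03-02", "drive_A/E001.dat"),
            ("E002", "M001", "recording_v2", "2021-03-09", "drive_A/E002.dat"),
            ("E003", "M002", "recording_v2", "2021-03-10", None),  # path never entered
        ],
    )
    return conn


def build_manifest(conn, experiment_ids):
    """Return (manifest_json, problems) for the requested experiments."""
    ids = list(experiment_ids)
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT * FROM experiments WHERE experiment_id IN ({placeholders}) "
        "ORDER BY experiment_id",
        ids,
    ).fetchall()

    records, problems, found = [], [], set()
    for row in rows:
        record = dict(row)
        found.add(record["experiment_id"])
        missing = [name for name in REQUIRED_FIELDS if not record.get(name)]
        if missing:
            # Incomplete records are reported, not silently deposited.
            problems.append(f"{record['experiment_id']}: missing {', '.join(missing)}")
        else:
            records.append(record)
    for exp_id in ids:
        if exp_id not in found:
            problems.append(f"{exp_id}: not found")
    return json.dumps({"experiments": records}, indent=2), problems


if __name__ == "__main__":
    manifest, issues = build_manifest(make_demo_db(), ["E001", "E003", "E999"])
    print(manifest)
    print(issues)
```

Because each piece of metadata was entered once at acquisition time, a script of this kind only has to select and check records; nothing is retyped, which is where transcription errors usually enter.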

The system is easy to set up and use. From the moment the data are gathered, researchers input metadata using previously defined, easily selectable terms that refer to defined variables and procedures (method descriptions), which are described in detail within the system and constantly updated by lab members. The maxim of the system is that any piece of metadata is input only once and becomes immediately accessible for any purpose. Moreover, many software packages commonly used in labs (e.g., LabVIEW, MATLAB, OriginLab, Python, R, etc.) can communicate directly with the system, which limits the need for human intervention during data acquisition or analysis. This protocol describes how to set up a fully functional system; a completed operational example is included for perusal. Readers should first peruse the example (step A in Procedure) and learn how to use the system (step B). Readers who decide to implement the system in their lab should proceed to the instructions on how to host DokuWiki on their local NAS (step H; preferred) or on a hosting service (step J), followed by the instructions on how to create their lab metadata system (step K). The focus of the protocol presented here is on basic experimental metadata. It does not cover metadata generated during analyses, because these are best handled in combination with programming software, and will be described elsewhere. The system provides biomedical researchers with an easy way to be rigorous about managing metadata, so their efforts can focus on the complexities of the science.

Materials and Reagents

Hard drives used to set up the NAS (WD Red Pro, WD6003FFBX, 6 TB)
External drives used for final data storage (WD Elements, WDBU6Y0050BBK-WESN, 5 TB)

Equipment

NAS server (e.g., Synology DS3617xs). Any server that can host DokuWiki will work. The size of the server depends on the lab data requirements and on how the lab plans to store the data. If the files generated by the lab are small, it is possible to store them on the server and have them readily accessible there. If the data files are generally large, it is more practical to store the raw data on duplicate external drives (i.e., the same data on two drives), which are inexpensive, and to access those drives when needed. The options are flexible depending on the lab requirements. The important consideration is that the system knows the location of the data.
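When raw data are kept on duplicate external drives, it can be useful to confirm periodically that the two copies still agree. The sketch below is one possible way to do this and is not part of the protocol: it walks the first drive and compares SHA-256 checksums with the files at the same relative paths on the second drive. The function names and the folder layout are hypothetical, and the check runs in one direction only (files present only on the second drive are not reported).

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large recordings do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(drive_a, drive_b):
    """List (relative_path, reason) for files on drive_a that drive_b lacks or differs on."""
    drive_a, drive_b = Path(drive_a), Path(drive_b)
    mismatches = []
    for file_a in sorted(p for p in drive_a.rglob("*") if p.is_file()):
        relative = file_a.relative_to(drive_a)
        file_b = drive_b / relative
        if not file_b.is_file():
            mismatches.append((relative.as_posix(), "missing on second drive"))
        elif sha256_of(file_a) != sha256_of(file_b):
            mismatches.append((relative.as_posix(), "contents differ"))
    return mismatches


if __name__ == "__main__":
    # Temporary folders stand in for the two external drives.
    with tempfile.TemporaryDirectory() as root:
        first, second = Path(root) / "drive_1", Path(root) / "drive_2"
        first.mkdir()
        second.mkdir()
        (first / "E001.dat").write_bytes(b"raw data")
        (second / "E001.dat").write_bytes(b"raw data")
        (first / "E002.dat").write_bytes(b"more raw data")
        print(find_mismatches(first, second))
```

The relative paths reported by such a check are the same kind of location information that the metadata system is expected to record for each data file.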

Software

DokuWiki (www.DokuWiki.org)

Procedure

How to cite

Castro-Alamancos, M. A. (2022). A System to Easily Manage Metadata in Biomedical Research Labs Based on Open-source Software. Bio-protocol 12(9): e4404. DOI: 10.21769/BioProtoc.4404.
