Published: May 5, 2022, Volume 12, Issue 9 DOI: 10.21769/BioProtoc.4404 Views: 1855
Reviewed by: Palaniappan Sivasankar, Anonymous reviewer(s)
Abstract
In most biomedical labs, researchers gather metadata (i.e., all details about the experimental data) in paper notebooks, spreadsheets, or, sometimes, electronic notebooks. When data analyses occur, the related details usually go into other notebooks or spreadsheets, and still more metadata are generated. The records rapidly become complex and disjointed, and keeping track of them can be daunting. Organizing all the relevant data and related metadata for analysis, publication, sharing, or deposit into archives can be time-consuming, difficult, and prone to errors. Having metadata in a centralized system that contains all details from the start greatly simplifies the process. While lab management software is available, it can be costly and inflexible. The system described here is based on a popular, freely available, and open-source wiki platform. It provides a simple but powerful way for biomedical research labs to set up a metadata management system linking the whole research process. The system enhances efficiency, transparency, reliability, and rigor, which are key factors in improving reproducibility. The flexibility afforded by the system simplifies implementation of specialized lab requirements and future needs. The protocol presented here describes how to create the system from scratch and how to use it for gathering basic metadata, and it provides a fully functional version for perusal by the reader.
Graphical abstract:
Lab Metadata Management System.
Background
The process of acquiring, analyzing, and sharing research data is complex. As currently implemented by most biomedical research labs, this process is prone to significant errors, leading to problems with scientific rigor and reproducibility. These negative outcomes signify wasted efforts and investments that frustrate both researchers and funding agencies. The problem has prompted the NIH to make rigor and reproducibility a major consideration in peer review. There are now hundreds of academic references directly addressing this problem. A simple PubMed search (rigor AND reproducibility) yields hundreds of hits, most of which are from the last few years (e.g., Landis et al., 2012; Steward and Balice-Gordon, 2014; Bandrowski and Martone, 2016; France, 2016; Sahoo et al., 2016; Yates, 2016; Baxter and Burwell, 2017; Williams et al., 2017; Borghi and Van Gulick, 2018; Botker et al., 2018; Brown et al., 2018; Dingledine, 2018; Gulinello et al., 2018; Lee and Kitaoka, 2018; Plant et al., 2018; Yosten et al., 2018; Prager et al., 2019; Turner, 2019). Even the popular press has covered this problem in many high-profile articles (New York Times; The Economist, https://www.economist.com/briefing/2013/10/18/trouble-at-the-lab; The Atlantic, https://www.theatlantic.com/magazine/archive/2015/09/a-scientific-look-at-bad-science/399371/, etc.). While the causes of this crisis are complex, a main part of the problem is associated with the way basic biomedical research labs are accustomed to managing (gathering, organizing, storing, accessing, and sharing) the details about experiments (metadata). Essentially, there are no established methods for doing this, and labs are on their own. While there are several electronic laboratory notebooks in the marketplace, most of these have significant limitations, such as limited scope, focus on a particular subfield, high cost, and inflexibility.
This protocol describes a simple method to set up a freely available lab management system for biomedical research labs that provides an easy way to store, access, peruse, and organize metadata. Lab metadata are all the information required to understand the data generated by the lab. Metadata include details about subjects, samples, materials, chemicals, methods (protocols), data files, analyses, etc. Without proper metadata, the data derived from any experiment are useless or, at a minimum, can lead to misrepresentations and faulty conclusions. With just a few clicks in a browser (e.g., Chrome), the system presented in this protocol allows the user to know, for example: (i) exactly what was done (detailed protocols), who did it, and when it was done, (ii) what samples and/or subjects were used, (iii) what materials, chemicals, drugs, and equipment were employed, and (iv) where the data files are stored. This information can go back as many years as the system has been used in the lab. The system described has been operational for several years in the author’s lab, and the experience has convinced everyone who has used the system that this approach is the best way to enhance efficiency, transparency, reliability, and rigor, and is therefore likely to improve reproducibility.
The system described has several important features. First, it is based on freely available, open-source software called DokuWiki, which is widely employed for many purposes. It is valued for its simplicity, while providing a high level of security, including access-control list (ACL) permissions. Second, a wiki is the ideal platform for a research lab since, by definition, it is a repository of knowledge. Third, by running on a popular wiki platform, the system can be enhanced through the availability of hundreds of plugins that allow users to add features required for their specific needs. In addition, specialized plugins can be developed and easily integrated. Fourth, the system can be easily deployed by lab personnel who have only basic computing knowledge. Having lab members manage their own experimental metadata, with minimal effort, is practical and provides both flexibility and control. Fifth, the system can be set up on a network-attached storage (NAS) server, a personal computer, or a web hosting service. However, a NAS running in the local lab network is most desirable, and is the option described in the present protocol. Sixth, all the metadata in the system are stored in standard, universally accessible formats. For example, part of the system includes a relational database (sqlite3), which can be directly visualized through aggregations on the wiki platform, or accessed using SQL commands through a connection from third-party software commonly used in research labs (e.g., LabVIEW, MATLAB, OriginLab, Python, R, etc.).
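As a minimal sketch of the SQL access route mentioned above, the following Python snippet queries a SQLite database with the standard-library sqlite3 module. The table name sessions, its columns, and the sample rows are hypothetical and chosen only for illustration; the actual schema depends on how a lab defines its metadata. An in-memory database stands in for the lab's database file.

```python
import sqlite3


def build_example_db():
    """Create an in-memory stand-in for the lab database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets rows be read by column name
    conn.execute(
        """
        CREATE TABLE sessions (
            session_id   INTEGER PRIMARY KEY,
            subject      TEXT NOT NULL,
            protocol     TEXT NOT NULL,
            experimenter TEXT NOT NULL,
            session_date TEXT NOT NULL,
            data_path    TEXT NOT NULL
        )
        """
    )
    # Made-up rows, inserted out of date order on purpose.
    conn.executemany(
        "INSERT INTO sessions "
        "(subject, protocol, experimenter, session_date, data_path) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("mouse_002", "open_field", "AB", "2021-03-02", "drive1/s0002.dat"),
            ("mouse_001", "avoidance", "CD", "2021-03-05", "drive1/s0003.dat"),
            ("mouse_001", "open_field", "AB", "2021-03-01", "drive1/s0001.dat"),
        ],
    )
    conn.commit()
    return conn


def sessions_for_subject(conn, subject):
    """Return what was done, by whom, when, and where the data are, oldest first."""
    rows = conn.execute(
        "SELECT session_date, protocol, experimenter, data_path "
        "FROM sessions WHERE subject = ? ORDER BY session_date",
        (subject,),
    ).fetchall()
    return [dict(row) for row in rows]


if __name__ == "__main__":
    conn = build_example_db()
    for row in sessions_for_subject(conn, "mouse_001"):
        print(row)
```

In practice, the call to sqlite3.connect would point at the lab's database file rather than ":memory:". The parameterized query (the ? placeholder) keeps the subject name out of the SQL string itself.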
There are many reasons why having all lab metadata organized in a centralized system is useful. First, in most cases, individuals gather their metadata in paper notebooks, spreadsheets, or, sometimes, electronic notebooks. Moreover, when data analyses occur, the related details usually go into other notebooks or spreadsheets, and even more metadata are generated. The records rapidly become complex and disjointed. Having all lab metadata available for perusal in one location makes the process simpler. Second, most labs use an open approach by which each lab member independently organizes the metadata they generate. However, “to be organized” means different things to different people. Moreover, having metadata dispersed, organized according to different reasoning, and stored in different formats is highly inefficient and prone to errors. A lab system that provides a flexible, coherent, and logical structure eliminates guessing about how to be organized. Third, a typical scenario is that the lab needs to repeat a successful procedure exactly as it was done several years ago by someone who is no longer in the lab or who may not recall the details. Having to troubleshoot a complex procedure again can lead to wasted effort and significant problems. The system presented here provides a central location to track and update protocols for all lab procedures, and to detail them with media (images, videos, tables, etc.), eliminating uncertainty. Fourth, there are a number of electronic notebook services available in the marketplace, but these can be costly and may offer little flexibility or control. The system described here is free, highly adaptable, and runs within the lab. As already noted, a wiki is the ideal repository for a lab metadata management system. Fifth, being able to instantly access metadata in a database greatly facilitates not only manual perusal but also automated data analyses through scripts, eliminating common human errors. Paper notebooks are no longer required, but can be used as support if deemed useful. Sixth, compliance is a required and ever-increasing institutional burden on research labs. The system described here implements standard record keeping, such as animal usage and breeding records related to Institutional Animal Care and Use Committee (IACUC) protocols, and controlled drug usage records related to Drug Enforcement Administration (DEA) licenses. Seventh, the process of organizing metadata for analyses, publication, and output to data archives can be very time-consuming and sometimes difficult, and is prone to a high degree of errors. Having all metadata, from the start, in a centralized system that contains all experimental details greatly simplifies the process. For example, the process of uploading specific data with its metadata to data archives can be easily automated. Finally, reproducibility is a recognized major problem in biomedical research labs and in other science fields. The rigor implemented by deploying the system described here should result in significant improvements in reproducibility.
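To illustrate how preparing data and metadata for an archive might be automated, here is a small Python sketch that pairs selected data files with their metadata records and writes them out as a JSON manifest. The function name build_manifest, the record fields, and the manifest layout are all assumptions made for this example; a real archive would define its own required format.

```python
import json


def build_manifest(records, selected_paths):
    """Pair each selected data file with its metadata record as a JSON string."""
    by_path = {record["data_path"]: record for record in records}
    missing = [path for path in selected_paths if path not in by_path]
    if missing:
        # Refuse to export data files that have no metadata on record.
        raise KeyError(f"no metadata recorded for: {missing}")
    files = [by_path[path] for path in selected_paths]
    return json.dumps({"files": files}, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical records, e.g., as returned by a query on the lab database.
    records = [
        {"data_path": "drive1/s0001.dat", "subject": "mouse_001",
         "protocol": "open_field", "session_date": "2021-03-01"},
        {"data_path": "drive1/s0002.dat", "subject": "mouse_002",
         "protocol": "open_field", "session_date": "2021-03-02"},
    ]
    print(build_manifest(records, ["drive1/s0001.dat"]))
```

Raising an error for files without metadata is a deliberate choice in this sketch: it prevents a data file from reaching an archive without the details needed to interpret it.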
The system is easy to set up and use. Researchers input metadata from the moment the data are gathered, using previously defined and easily selectable terms that refer to defined variables and procedures (method descriptions), which are described in detail within the system and constantly updated by lab members. The maxim of the system is that any piece of metadata is input only once and becomes immediately accessible for any purpose. Moreover, many programming software packages commonly used in labs (e.g., LabVIEW, MATLAB, OriginLab, Python, R, etc.) can communicate directly with the system, which limits the need for human intervention during data acquisition or analysis. This protocol describes how to set up a fully functional system; a completed operational example is included for perusal. Readers should first peruse the example (step A in Procedure) and learn how to use the system (step B). Readers who decide to implement the system in their lab should proceed to the instructions on how to host DokuWiki on their local NAS (step H; preferred) or on a hosting service (step J), followed by the instructions on how to create their lab metadata system (step K). The focus of the protocol presented here is on basic experimental metadata. It does not include how to handle metadata generated during analyses, because this is done best in combination with programming software and will be described elsewhere. The system provides biomedical researchers with an easy way to be rigorous about managing metadata, so their efforts can focus on the complexities of the science.
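The principle of entering metadata using previously defined terms can be illustrated generically in Python. The sketch below is not how the wiki-based system implements it; it uses a SQLite foreign key as a stand-in technique, so that a session can only refer to a protocol that has already been defined once. All table, column, and function names are hypothetical.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a controlled list of protocol terms."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE protocols ("
        "name TEXT PRIMARY KEY, "
        "description TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE sessions ("
        "session_id INTEGER PRIMARY KEY, "
        "subject TEXT NOT NULL, "
        "protocol TEXT NOT NULL REFERENCES protocols(name))"
    )
    return conn


def define_protocol(conn, name, description):
    """Define a term once; sessions then refer to it by name."""
    conn.execute(
        "INSERT INTO protocols (name, description) VALUES (?, ?)",
        (name, description),
    )
    conn.commit()


def log_session(conn, subject, protocol):
    """Record a session; return False if the protocol term was never defined."""
    try:
        conn.execute(
            "INSERT INTO sessions (subject, protocol) VALUES (?, ?)",
            (subject, protocol),
        )
    except sqlite3.IntegrityError:
        conn.rollback()
        return False
    conn.commit()
    return True


if __name__ == "__main__":
    conn = open_db()
    define_protocol(conn, "open_field", "Hypothetical method description.")
    print(log_session(conn, "mouse_001", "open_field"))  # defined term
    print(log_session(conn, "mouse_001", "open_feild"))  # misspelled term
```

Because the method description lives in one place, a typo in a session entry is rejected instead of silently creating a second, undocumented procedure.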
Materials and Reagents
NAS drives used to set up the NAS (WD Red Pro, WD6003FFBX, 6 TB)
External drives used for final data storage (WD Elements, WDBU6Y0050BBK-WESN, 5 TB)
Equipment
NAS server (e.g., Synology DS3617xs). Any server that can host DokuWiki will work. The size of the server depends on the lab data requirements and on how the lab plans to store the data. If the files generated by the lab are small, it is possible to store them on the server and have them readily accessible there. If the data files generated are generally large, it is more practical to store the raw data on duplicate external drives (i.e., the same data on two drives), which are inexpensive, and to access those drives when needed. The options are flexible depending on the lab requirements. The important consideration is that the system knows the location of the data.
Software
DokuWiki (www.DokuWiki.org)
Procedure
Article information
Copyright
© 2022 The Authors; exclusive licensee Bio-protocol LLC.
How to cite
Castro-Alamancos, M. A. (2022). A System to Easily Manage Metadata in Biomedical Research Labs Based on Open-source Software. Bio-protocol 12(9): e4404. DOI: 10.21769/BioProtoc.4404.
Category
Bioinformatics and computational biology
Bioengineering