2.4. Bioinformatics analyses

Xingxing Li
Shunda Liang
Dan Zhang
Miao He
Hong Zhang

Raw sequence reads were filtered by the bioinformatics analysis software to remove low-quality and low-complexity reads, duplicate reads, reads shorter than 50 bp, contaminant reads, and human sequence data (Li and Durbin, 2009; Bolger et al., 2014). The remaining sequence data were aligned to a microbial database (covering bacteria, viruses, fungi, and parasites) that was designed by a technology company and is comparable to the National Center for Biotechnology Information (NCBI) Nucleotide and Genome databases, to determine the species and relative abundance of pathogens. The final database included approximately 13,000 genomes. Pathogen lists were selected from three references: 1) Manual of Clinical Microbiology, 2) Johns Hopkins ABX Guide, and 3) clinical case reports or academic studies recently published in peer-reviewed journals. RPM-r was defined as the reads per million (RPM) of a given organism in the clinical sample divided by the RPM of that organism in the negative control. A bacterial or fungal pathogen was reported as positive if the RPM-r was ≥ 5 and the RPM was greater than 10 for bacteria or greater than 2 for fungi (Miller et al., 2019; Zinter et al., 2019). A viral detection was considered positive when three or more non-overlapping regions of the genome were covered.
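
The reporting criteria above can be illustrated with a minimal Python sketch. It is not the software used in the study. All function names are hypothetical, and two points the text leaves open are filled in as assumptions: a negative-control RPM of zero is treated as an infinite RPM-r whenever the sample RPM is positive, and the viral criterion is interpreted as the largest number of aligned read intervals that do not overlap one another. Parasites are left out because no threshold is given for them.

```python
import math

# Thresholds taken from the text.
RPM_RATIO_MIN = 5.0                           # RPM-r must be >= 5
RPM_MIN = {"bacteria": 10.0, "fungi": 2.0}    # RPM must exceed these values
VIRAL_MIN_REGIONS = 3                         # non-overlapping regions required


def rpm(organism_reads, total_reads):
    """Reads per million: organism reads scaled to one million total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return organism_reads * 1_000_000 / total_reads


def rpm_ratio(sample_rpm, control_rpm):
    """RPM-r: sample RPM divided by negative-control RPM.

    Assumption (not stated in the text): if the organism is absent from
    the negative control, the ratio is infinite for a positive sample RPM.
    """
    if control_rpm == 0:
        return math.inf if sample_rpm > 0 else 0.0
    return sample_rpm / control_rpm


def is_positive(kind, sample_rpm, control_rpm):
    """Apply the bacterial or fungal reporting rule."""
    if kind not in RPM_MIN:
        raise ValueError("no RPM threshold given for " + repr(kind))
    return (rpm_ratio(sample_rpm, control_rpm) >= RPM_RATIO_MIN
            and sample_rpm > RPM_MIN[kind])


def count_nonoverlapping(intervals):
    """Largest number of pairwise non-overlapping half-open [start, end)
    intervals, found greedily by earliest end coordinate."""
    count = 0
    last_end = -math.inf
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            count += 1
            last_end = end
    return count


def viral_positive(intervals):
    """Viral rule: at least three non-overlapping covered regions."""
    return count_nonoverlapping(intervals) >= VIRAL_MIN_REGIONS
```

For example, under these assumptions `is_positive("bacteria", 60.0, 10.0)` passes both thresholds (RPM-r of 6, RPM above 10), while `is_positive("bacteria", 8.0, 1.0)` fails because the RPM does not exceed 10 even though the RPM-r is 8.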
