To construct the skin microbiome gene catalog, sequencing reads from this study and from the HMP were processed (quality control, removal of human sequences, assembly, and gene prediction) using the pipeline shown in Supplementary Fig. 1. SOAPnuke [62] was used for quality control, and SOAPaligner2 [63] was used to identify and remove human reads sharing > 95% similarity with the human reference genome (hg19) [11]. Consistent with previous findings, on average 80% of reads were of human rather than microbial origin (Supplementary Fig. 2b). High-quality reads were assembled de novo with SPAdes (version 3.13.0) [64] using multiple k-mer sizes (k = 21, 33, 55, 77, and 99). Genes were predicted ab initio on all assembled scaffolds with MetaGeneMark (version 3.26) [65]. The predicted genes were then clustered at the nucleotide level with CD-HIT (version 4.5.4) [66] using the parameters -G 0 -M 90000 -r 0 -t 0 -c 0.95 -aS 0.90, so that genes sharing greater than 95% identity over greater than 90% overlap were treated as redundant. This yielded a two-cohort non-redundant gene catalog (2CGC) of 13,324,649 genes.
To further ensure the completeness of the gene catalog, we performed three additional steps. First, 2CGC was aligned against the National Center for Biotechnology Information non-redundant nucleotide database (NCBI-NT, downloaded in August 2018), and genomes from 931 genera (2,761 prokaryotic, 112 fungal, and 479 viral genomes) were identified as present in 2CGC (Table S2). Second, we downloaded the complete or draft genomes of these microbes, predicted their coding regions with MetaGeneMark, pooled the predicted genes, and removed redundant genes with CD-HIT, yielding 7,496,818 non-redundant genes, which we refer to as the sequenced gene catalog (SGC). Finally, the 2CGC and SGC gene catalogs were combined using CD-HIT.
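The redundancy rule used in the clustering step (greater than 95% identity over greater than 90% overlap) can be illustrated with a toy greedy clustering in the spirit of CD-HIT: genes are processed longest first, and each becomes a new cluster representative unless it matches an existing one. This is a minimal Python sketch, not CD-HIT's actual algorithm. CD-HIT uses short-word filtering and banded gapped alignment, whereas this sketch approximates identity with a hypothetical ungapped, forward-strand-only best-offset comparison (`best_ungapped_hit`); all function names are illustrative, and treating the thresholds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    identity: float        # matching bases / aligned length
    coverage_short: float  # aligned length / length of the shorter sequence


def best_ungapped_hit(a: str, b: str) -> Hit:
    """Hypothetical stand-in for alignment: slide the shorter sequence along
    the longer one (forward strand only, no gaps) and keep the offset with
    the most matching bases."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    best, best_matches = Hit(0.0, 0.0), -1
    for offset in range(-len(short) + 1, len(long_)):
        # short[i] is compared with long_[i + offset] wherever both exist
        start = max(0, -offset)
        end = min(len(short), len(long_) - offset)
        matches = sum(short[i] == long_[i + offset] for i in range(start, end))
        if matches > best_matches:
            aligned = end - start
            best = Hit(matches / aligned, aligned / len(short))
            best_matches = matches
    return best


def is_redundant(hit: Hit, min_identity: float = 0.95,
                 min_coverage: float = 0.90) -> bool:
    # Thresholds from the text; inclusive bounds are an assumption here
    return hit.identity >= min_identity and hit.coverage_short >= min_coverage


def greedy_cluster(genes: dict[str, str]) -> dict[str, str]:
    """Map each gene ID to the ID of its cluster representative."""
    # Longest first; sorted() keeps input order for equal lengths
    order = sorted(genes, key=lambda gid: len(genes[gid]), reverse=True)
    representatives: list[str] = []
    assignment: dict[str, str] = {}
    for gid in order:
        for rep in representatives:
            if is_redundant(best_ungapped_hit(genes[gid], genes[rep])):
                assignment[gid] = rep
                break
        else:
            # No representative matched: this gene starts a new cluster
            representatives.append(gid)
            assignment[gid] = gid
    return assignment


genes = {
    "g1": "A" * 50 + "C" * 50,  # 100 bp
    "g2": "A" * 50 + "C" * 45,  # 95 bp fragment of g1
    "g3": "G" * 100,            # unrelated sequence
}
print(greedy_cluster(genes))  # {'g1': 'g1', 'g3': 'g3', 'g2': 'g1'}
```

Here g2 is fully contained in g1 (100% identity over 100% of its length), so it is collapsed into g1's cluster, while g3 shares no bases with g1 and becomes its own representative. Processing longest sequences first means each representative is the longest member of its cluster, which is the convention CD-HIT follows.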
Genes present in at least ten samples were retained to form the final iHSMGC, which comprises 10,930,638 genes.
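The prevalence filter can be sketched in Python as follows. How a gene was judged to exist in a sample is not stated here, so this sketch assumes a hypothetical input that maps each sample ID to the gene IDs detected in it (for example, genes with at least one mapped read). Each gene is counted at most once per sample, so a gene detected many times in a single sample still contributes only one sample to its prevalence.

```python
from collections import Counter
from typing import Iterable, Mapping


def prevalence_filter(detected: Mapping[str, Iterable[str]],
                      min_samples: int = 10) -> set[str]:
    """Return the IDs of genes detected in at least `min_samples` samples."""
    counts: Counter[str] = Counter()
    for gene_ids in detected.values():
        counts.update(set(gene_ids))  # set(): count a gene once per sample
    return {gid for gid, n in counts.items() if n >= min_samples}


# Toy data: 12 samples; "a" is in all of them, "b" in 10, "c" in 9,
# and "d" appears 15 times but only in sample s0.
detected = {
    f"s{i}": (["a"]
              + (["b"] if i < 10 else [])
              + (["c"] if i < 9 else [])
              + (["d"] * 15 if i == 0 else []))
    for i in range(12)
}
print(sorted(prevalence_filter(detected)))  # ['a', 'b']
```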
