Read trimming and contig assembly

Patrick A. de Jonge
Koen Wortelboer
Torsten P. M. Scheithauer
Bert-Jan H. van den Born
Aeilko H. Zwinderman
Franklin L. Nobrega
Bas E. Dutilh
Max Nieuwdorp
Hilde Herrema

Post-sequencing data analysis was identical for the WGS and VLP datasets. Analysis began with adapter trimming and quality control of sequencing reads using fastp v0.23.1 [79] with standard settings. Trimmed reads were mapped to the human genome (GRCh37) using bowtie2 v2.4.0 [80], which showed that samples contained 0.13 ± 0.26% human reads. High-quality reads were then assembled per sample (i.e., 196 WGS and 48 VLP assemblies) into contigs using metaSPAdes v3.14.1 [81]. For each sample, we selected contigs longer than 5,000 bp for further analysis. In addition, among contigs between 1,500 and 5,000 bp, we identified circular contigs by checking for identical terminal ends with a custom R script that employed the Biostrings R package v3.12 [82]. Assemblies yielded a total of 9,108,147 contigs that were either circular or longer than 5,000 bp. Three VLP samples were subsampled differently because their assemblies ran into memory issues: S038 and S192 (subsampled to 40 million read pairs) and S069 (subsampled to 25 million read pairs).
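The contig selection rule described above can be illustrated with a minimal Python sketch. This is not the authors' script, which was written in R with Biostrings and is not shown here. The function names, the overlap bounds `min_overlap` and `max_overlap`, and the inclusive treatment of the 1,500 and 5,000 bp limits are all assumptions made for illustration; the protocol does not state how long the identical terminal ends had to be.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict of {contig name: sequence}."""
    contigs = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                contigs[name] = "".join(chunks)
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        contigs[name] = "".join(chunks)
    return contigs


def terminal_overlap(seq, min_overlap=10, max_overlap=200):
    """Return the length of the longest prefix that is identical to the
    suffix of the same length, searched within [min_overlap, max_overlap].
    Returns 0 if no such overlap exists. Both bounds are assumed values."""
    seq = seq.upper()
    upper = min(max_overlap, len(seq) // 2)
    for k in range(upper, min_overlap - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


def select_contigs(contigs, min_len=1500, long_len=5000, min_overlap=10):
    """Label contigs kept for analysis.

    - longer than long_len: kept as "long"
    - min_len..long_len (assumed inclusive) with identical terminal ends:
      kept as "circular"
    - everything else: dropped
    """
    selected = {}
    for name, seq in contigs.items():
        n = len(seq)
        if n > long_len:
            selected[name] = "long"
        elif n >= min_len and terminal_overlap(seq, min_overlap) > 0:
            selected[name] = "circular"
    return selected


if __name__ == "__main__":
    repeat = "ATGCGTACGTTAGC"  # hypothetical 14 bp terminal repeat
    demo = {
        "long_contig": "ACGT" * 1500,                    # 6,000 bp
        "circular_contig": repeat + "C" * 2000 + repeat,  # 2,028 bp
        "linear_contig": "A" * 1000 + "C" * 1000,         # 2,000 bp
    }
    for name, label in select_contigs(demo).items():
        print(name, label)
```

In practice the contigs would be read from each per-sample assembly FASTA, for example with `read_fasta(open(path))`, where `path` is a placeholder for the assembly output file. Note that a simple prefix/suffix comparison also flags low-complexity contigs (such as homopolymer runs) as circular, so a real pipeline may need additional filtering.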
