The workflow for this study is presented in S1 Fig. A read count table was generated from the raw sequencing reads. After trimming with fastp v.0.20.0 (default parameters, [34]), FASTQ-formatted reads were aligned to the genome of F. prausnitzii strain A2-165 (Genome Assembly ASM273414v1) using BWA v.0.7.17 [35], allowing a single mismatch per read. The SAM-formatted alignments were then sorted and converted to BAM files using SAMtools v.1.10 [36]. The number of reads per transcript in each sample was counted using htseq-count (HTSeq v.0.12.4) [37] and GFF-formatted gene annotations downloaded from NCBI. We checked the distribution of raw counts and performed principal component analysis on each dataset (S2 Fig). Gene expression values were normalized using the DESeq2 package v.1.34.0 in R [38]. The count table, containing 2,950 genes (S1 Table), was filtered to eliminate non-expressed genes. The resulting final dataset contained 2,902 genes and was further processed with WGCNA [39]. The quality of the expression matrix was evaluated by hierarchical clustering based on the distance between samples, measured using Spearman’s correlation. No outliers were detected (S3 Fig).
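The filtering and sample-distance steps can be illustrated with a minimal Python sketch. It is not the authors' code: the analysis itself was run in R with DESeq2 and WGCNA. Two choices here are assumptions, since the text does not state them: a gene is treated as non-expressed when its counts sum to zero across all samples (the hypothetical `min_total` parameter), and the distance between two samples is taken as 1 minus their Spearman correlation. The gene names and counts are toy values.

```python
from statistics import mean


def filter_expressed(counts, min_total=1):
    """Keep genes whose summed count across samples is at least min_total.

    Assumption: the source does not give the filtering threshold.
    """
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_distance_matrix(counts):
    """Pairwise sample distances, assumed here to be 1 - Spearman's rho."""
    genes = sorted(counts)
    n_samples = len(counts[genes[0]])
    # One column of gene counts per sample, converted to ranks.
    ranked = [
        average_ranks([counts[g][s] for g in genes]) for s in range(n_samples)
    ]
    return [
        [1.0 - pearson(ranked[i], ranked[j]) for j in range(n_samples)]
        for i in range(n_samples)
    ]


# Toy count table: gene -> counts in three samples (illustrative values only).
toy_counts = {
    "geneA": [10, 12, 3],
    "geneB": [0, 0, 0],
    "geneC": [5, 6, 20],
    "geneD": [100, 90, 1],
    "geneE": [7, 8, 8],
}

expressed = filter_expressed(toy_counts)
distances = spearman_distance_matrix(expressed)
```

In this toy table the first two samples rank the retained genes identically, so their distance is small, while the third sample reverses the ranking and lies far from both. A matrix of this kind is what a hierarchical clustering routine would take as input when screening for outlier samples.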