The pangenome of the Blautia dataset was reconstructed with Roary (Page et al., 2015), panX (Ding et al., 2018), and PEPPAN (Zhou et al., 2020). Input files for all three programs were generated with Prokka (default settings, --kingdom Bacteria) (Seemann, 2014): GFF files were used for Roary and PEPPAN, and GenBank files for panX. Roary was run with the options ‘-e -n -p 24 -v -r -i 80 --group_limit 100000’; PEPPAN and panX were run with default options. The output from PEPPAN was parsed using PEPPAN_parser with the settings ‘-t -c -a 95’. The output of the previous step, namely allele.fna, PEPPAN.gff, and PEPPAN.gene_content.Rtab, was then processed with Python scripts to generate a multi-FASTA file containing the pangenome. The rarefaction curves for this pangenome were taken from the file PEPPAN.gene_content.curve and plotted using pandas (Reback et al., 2020) and matplotlib (Hunter, 2007). For panX, the file geneCluster.json was parsed with pandas to generate a gene presence/absence matrix, from which basic statistics about the pangenome were obtained. To estimate whether the pangenome is open or closed, this matrix was passed to the R package micropan v2.1 (Snipen and Liland, 2015) and an alpha value was estimated.
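The matrix and openness steps can be illustrated with a minimal Python sketch. It is not the authors' script and makes several assumptions. The `clusters` dictionary is a hypothetical, simplified stand-in for the content of geneCluster.json, whose actual layout should be inspected before adapting the code. The function names are illustrative. The alpha value is assumed to be the decay parameter of Heaps' law, and it is approximated here by a log-log least-squares fit on permutation-averaged counts of new gene clusters, which is a simplification and will not reproduce micropan's values exactly. Only the standard library is used so that the sketch runs anywhere; the same matrix could be built with pandas.

```python
import math
import random

# Hypothetical, simplified input: cluster id -> genomes carrying a member gene.
# The real geneCluster.json layout differs and must be checked before reuse.
clusters = {
    "GC1": ["A", "B", "C", "D"],
    "GC2": ["A", "B", "C"],
    "GC3": ["A", "B"],
    "GC4": ["C", "D"],
    "GC5": ["A"],
    "GC6": ["B"],
    "GC7": ["C"],
    "GC8": ["D", "D"],  # paralogs in one genome count as a single presence
}


def presence_absence(clusters):
    """Return (sorted genome list, {cluster: {genome: 0 or 1}})."""
    genomes = sorted({g for members in clusters.values() for g in members})
    matrix = {
        cid: {g: int(g in members) for g in genomes}
        for cid, members in clusters.items()
    }
    return genomes, matrix


def summarize(genomes, matrix):
    """Basic pangenome statistics derived from the presence/absence matrix."""
    n = len(genomes)
    occupancy = [sum(row.values()) for row in matrix.values()]
    return {
        "genomes": n,
        "pangenome": len(occupancy),
        "core": sum(o == n for o in occupancy),    # present in every genome
        "unique": sum(o == 1 for o in occupancy),  # present in one genome
    }


def heaps_alpha(genomes, matrix, n_perm=200, seed=0):
    """Approximate the Heaps' law alpha with a log-log least-squares fit.

    Simplified assumption: fit log(mean new clusters) against log(N) for
    N >= 2, averaging over random genome orderings. This is not the
    optimizer used by micropan.
    """
    rng = random.Random(seed)
    gene_sets = {g: {c for c, row in matrix.items() if row[g]} for g in genomes}
    n = len(genomes)
    new_totals = [0.0] * n  # summed count of new clusters at each position
    for _ in range(n_perm):
        order = genomes[:]
        rng.shuffle(order)
        seen = set()
        for k, g in enumerate(order):
            new_totals[k] += len(gene_sets[g] - seen)
            seen |= gene_sets[g]
    xs, ys = [], []
    for k in range(1, n):  # skip the first genome, where every cluster is new
        mean_new = new_totals[k] / n_perm
        if mean_new > 0:
            xs.append(math.log(k + 1))
            ys.append(math.log(mean_new))
    if len(xs) < 2:
        raise ValueError("not enough data points to fit a slope")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return -slope


genomes, matrix = presence_absence(clusters)
stats = summarize(genomes, matrix)
alpha = heaps_alpha(genomes, matrix)
print(stats)
print(f"approximate alpha: {alpha:.2f}")
```

Under the usual reading of Heaps' law, an alpha below 1 suggests an open pangenome and an alpha above 1 a closed one. With a toy input of four genomes the estimate is not meaningful and only demonstrates the mechanics.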