2.4 Bioinformatic analysis

Brendha Truccollo; Paul Whyte; Catherine Burgess; Declan Bolton

Improve Research Reproducibility A Bio-protocol resource

Home
Protocols

Concise Method

2.4 Bioinformatic analysis

BT Brendha Truccollo

PW Paul Whyte

CB Catherine Burgess

DB Declan Bolton

This method is extracted from research article: PLoS One, Mar 2021

Genetic characterisation of a subset of Campylobacter jejuni isolates from clinical and poultry sources in Ireland

DOI: 10.1371/journal.pone.0246843

Request a Protocol

Ask a question

Favorite

Reads were adapter trimmed using Trimmomatic 0.30 with a sliding window quality cut-off of Q15 [19]. They were de novo assembled on SPAdes v.3.14 with the—careful command [20]. The quality of the assemblies was assessed on QUAST v5.0.2 using C. jejuni subsp. jejuni NCTC 11168 as a reference [21, 22]. Kraken v.2.0 was used for taxonomic classification [23]. Annotation was carried out in Prokka v.1.14 with -usegenus and -genus Campylobacter parameters [24]. Multilocus sequence typing (MLST) was carried out on PubMLST on the BIGSdb database [25].

Annotated genomes were separated into three groups for pan-genomic analysis on Roary [26]: one group comprising all 30 genomes in the dataset; clinical, comprising 15 C. jejuni genomes from clinical human infections; and broiler, comprising the remaining 15 C. jejuni genomes isolated from broilers. The purpose of this was for analysing the overall pangenome of these isolates and to compare differences in the pangenome of the clinical and broiler C. jejuni genomes in this study. Roary was run with standard parameters and without paralog splitting. The query_pan_genome command was used to produce files containing information on the core and accessory genes of each group. To isolate the genes that were uniquely present in clinical C. jejuni or broiler C. jejuni genomes, query_pan_genome -a difference was used. Sequences for all core, accessory and unique genes within each group were extracted from the pan_genome_reference.fa output. The sequences were concatenated and submitted to eggNOG Mapper v.2 for functional annotation using default parameters and with a taxonomic scope limited to Epsilonproteobacteria [27]. The outputs of this analysis can be found in S2 Appendix. The pangenome results were visualised on Phandango using the gene_presence_absence.csv and Newick tree outputs from Roary [28], while accumulation curves were constructed on RStudio v.1.1.463 (RStudio Inc., Massachussets, US) using the create_pan_genome_plots.R script [26].

Whole genome differences were visualised by aligning each assembled genome to NCTC 11168 on the Blast Ring Image Generator (BRIG), with an upper identity of 70% and a lower identity of 50% and default parameters [29].

The core_genome_alignment.aln output from Roary was used to generate a maximum likelihood tree on RaxML v.8.2.12 with 500 bootstraps with standard parameters and -m GTRGAMMA -p 12435 -f -a and -x 12345 commands [30]. The resulting tree was visualised on iTOL v.5 [31]. To create a distance matrix, the core genome alignment was read on RStudio v.1.1.463 (RStudio Inc., Massachussets, US) using adagenet, and genetic distances were computed using ape equipped with Tamura and Nei 1993’s model [32–34].

ABRicate (https://github.com/tseemann/abricate) equipped with VFDB and Victors was used to screen for genes associated with virulence factors including motility [35, 36], adhesion, invasion, colonisation, lipooligosaccharide (LOS) synthesis, capsule synthesis and thermotolerance (Chen et al., 2016; Sayers et al., 2019). Antimicrobial resistance was screened using ABRicate equipped with ARG-ANNOT, CARD, Resfinder and NCBI AMRFinder Plus (Feldgarden et al., 2019; Gupta et al., 2014; Jia et al., 2017; Zankari et al., 2012). Discrepancies in the assignment of antimicrobial resistance genes between databases were addressed by selecting the gene with the highest coverage and identity. For both AMR and virulence datasets, genes with less than 80% coverage or identity were discarded. The raw outputs for these analyses are provided in S3 Appendix.

ISFinder was used to investigate the presence of insertion sequences in the genomes of the isolates in this study. Results were filtered by e-value (<0.001), bit score (>40) and identity (90%) [37].

Assemblies were filtered to remove contigs <200bp using BBMap v.38.22 [38] prior to submitting each genome to Bioproject PRJNA665357.

This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

Post a Question

0 Q&A

Share your protocol with your peers.

Submit a Preprint Protocol