NGS analysis and Shannon entropy

Katina D Hulme, Anjana C Karawita, Cassandra Pegg, Myrna JM Bunte, Helle Bielefeldt-Ohmann, Conor J Bloxham, Silvie Van den Hoecke, Yin Xiang Setoh, Bram Vrancken, Monique Spronken, Lauren E Steele, Nathalie AJ Verzele, Kyle R Upton, Alexander A Khromykh, Keng Yih Chew, Maria Sukkar, Simon Phipps, Kirsty R Short

The haplotypes of each sample were reconstructed for each gene segment using a previously published pipeline (Cacciabue et al., 2020). In brief, FastQC (Andrews, 2010) was used for quality control of the raw NGS paired-end reads, and BBTools (Bushnell, 2014) was used to remove adapters and filter out low-quality reads. The trimmed reads were then aligned with Bowtie2 (Langmead and Salzberg, 2012) to the reference sequence of the influenza strain used as the inoculum. The Samtools suite (Li et al., 2009) was used to sort and index the alignment files and to generate depth and coverage statistics. Finally, CliqueSNV (Knyazev, 2020) was used to infer the haplotypes and their frequencies for all eight gene segments in each sample.
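The per-sample steps above can be sketched as an ordered list of command lines. This is a minimal dry-run sketch in Python, not the published pipeline: the file names, the `build_pipeline_commands` helper and the BBDuk trimming values (`ktrim=r`, `k=23`, `mink=11`, `hdist=1`, `tpe`, `tbo`, `qtrim=rl`, `trimq=20`) are hypothetical examples, and the CliqueSNV options (`-m snv-illumina`, `-in`, `-outDir`) should be checked against the README of the CliqueSNV version in use.

```python
import shlex
from pathlib import Path
from typing import List


def build_pipeline_commands(sample: str, r1: str, r2: str, ref_index: str,
                            adapters: str, outdir: str = "results") -> List[List[str]]:
    """Return the per-sample commands, in execution order, as argument lists.

    Hypothetical helper: file names and parameter values are illustrative,
    not the settings used in the study.
    """
    out = Path(outdir) / sample
    t1 = str(out / "trimmed_R1.fastq.gz")
    t2 = str(out / "trimmed_R2.fastq.gz")
    sam = str(out / "aligned.sam")
    bam = str(out / "aligned.sorted.bam")
    return [
        # 1. Quality-control report for the raw paired-end reads
        #    (FastQC expects the -o directory to exist already)
        ["fastqc", "-o", str(out), r1, r2],
        # 2. Adapter trimming and quality filtering with BBDuk (BBTools);
        #    the k-mer and quality settings are assumed example values
        ["bbduk.sh", f"in1={r1}", f"in2={r2}", f"out1={t1}", f"out2={t2}",
         f"ref={adapters}", "ktrim=r", "k=23", "mink=11", "hdist=1",
         "tpe", "tbo", "qtrim=rl", "trimq=20"],
        # 3. Align trimmed pairs to the inoculum reference
        #    (index built beforehand with bowtie2-build)
        ["bowtie2", "-x", ref_index, "-1", t1, "-2", t2, "-S", sam],
        # 4. Sort and index the alignment; depth/coverage tables go to stdout
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        ["samtools", "depth", "-a", bam],
        ["samtools", "coverage", bam],
        # 5. Haplotype and frequency inference with CliqueSNV
        ["java", "-jar", "clique-snv.jar", "-m", "snv-illumina",
         "-in", sam, "-outDir", str(out / "cliquesnv")],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them
    for cmd in build_pipeline_commands("sample01", "sample01_R1.fastq.gz",
                                       "sample01_R2.fastq.gz", "inoculum_idx",
                                       "adapters.fa"):
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings keeps the commands safe to pass to `subprocess.run` one at a time, and it makes the order of the steps easy to inspect before anything is executed. The `samtools depth` and `samtools coverage` output would need to be captured from stdout.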

Shannon entropy (an abundance-based diversity measure) was calculated using the R package QSutils (Guerrero-Murillo and Font, 2020). The analysis made the following assumptions: each paired-end read in an alignment derives from a true viral haplotype in the original population; the occurrence of variants in a given gene segment is independent of the other gene segments; and each haplotype in the actual population has an equal chance of being sampled.
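To make the abundance-based calculation concrete, the sketch below computes H as minus the sum of p_i ln p_i over haplotype proportions p_i. It is a plain-Python illustration, not the QSutils code. It assumes the natural logarithm and takes haplotype read counts or frequencies (for example, the frequencies reported by CliqueSNV) as input. The function name and the example frequencies are hypothetical.

```python
import math
from typing import Sequence


def shannon_entropy(abundances: Sequence[float]) -> float:
    """Shannon entropy H = -sum(p_i * ln p_i) of haplotype abundances.

    `abundances` may be read counts or frequencies; they are rescaled to
    proportions p_i first. Zero-abundance haplotypes contribute nothing.
    """
    if any(a < 0 for a in abundances):
        raise ValueError("abundances must be non-negative")
    total = sum(abundances)
    if total <= 0:
        raise ValueError("at least one haplotype must have positive abundance")
    h = 0.0
    for a in abundances:
        if a > 0:
            p = a / total
            h -= p * math.log(p)  # natural logarithm, so H is in nats
    return h


if __name__ == "__main__":
    # Hypothetical haplotype frequencies for one gene segment
    print(round(shannon_entropy([0.5, 0.25, 0.25]), 4))  # 1.0397
```

A population dominated by a single haplotype gives H = 0. For n haplotypes at equal frequency, H reaches its maximum of ln n, so higher values indicate a more diverse quasispecies.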

Multidimensional scaling (MDS; LMDS v1.0, R package) was used to visualise distance matrices and so explore the dynamics of haplotype distance within a given quasispecies population across donor and recipient hosts (i.e. C57BL/6 mice infected with lung homogenate collected at 6 d.p.i. from either asthmatic or non-asthmatic mice) (Jombart, 2016). Variants were called with IRMA (Iterative Refinement Meta-Assembler) (Shepard et al., 2016), and each variant in the coding sequence of the PB1 gene was translated into its corresponding amino acid residue and compared with the reference.
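The idea behind the MDS plots is to place haplotypes in a low-dimensional space so that distances in the plot approximate the pairwise distances. The sketch below shows classical (Torgerson) MDS in NumPy. It is a swapped-in technique: it does not call the LMDS package, which, as we understand it, implements landmark MDS, an approximation of classical MDS intended for large matrices. The choice of Hamming distance between aligned haplotypes is also an assumption for illustration, as are the helper names `hamming_matrix` and `classical_mds` and the example sequences.

```python
import numpy as np
from typing import Sequence


def hamming_matrix(haplotypes: Sequence[str]) -> np.ndarray:
    """Pairwise Hamming distances between equal-length aligned haplotypes."""
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("haplotypes must be aligned to the same length")
    arr = np.array([list(h) for h in haplotypes])
    return (arr[:, None, :] != arr[None, :, :]).sum(axis=2).astype(float)


def classical_mds(dist: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical (Torgerson) MDS: embed an n x n distance matrix in k dimensions."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    if d.shape != (n, n):
        raise ValueError("distance matrix must be square")
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centred Gram matrix
    evals, evecs = np.linalg.eigh(b)             # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]            # keep the k largest
    # Negative eigenvalues (non-Euclidean input) are clipped to zero
    return evecs[:, top] * np.sqrt(np.clip(evals[top], 0.0, None))


if __name__ == "__main__":
    haps = ["ACGTAC", "ACGTAA", "ACGAAA", "TCGAAA"]  # hypothetical haplotypes
    coords = classical_mds(hamming_matrix(haps), k=2)
    print(coords.shape)  # (4, 2)
```

When the input distances are Euclidean, classical MDS recovers the configuration exactly, up to rotation and reflection. For other distances, such as Hamming distances, the plot is an approximation, and the clipped negative eigenvalues indicate how much structure is lost.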

All scripts generated for this study can be found at https://github.com/akaraw/Hulme_et_al (Karawita, 2021) and the workflow for haplotype reconstruction can be found in Supplementary file 3.

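Within-host alpha diversity was measured using Shannon's entropy (H):

$$H = -\sum_{i=1}^{n} p_i \ln p_i$$

where n is the number of haplotypes in the sample and p_i is the relative frequency of haplotype i.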
