Background
The functional impact of amino-acid substitutions is often unclear in the absence of functional data. Genomic and in silico prediction methods can nevertheless provide clues of loss of function (a mutation at a site that is highly conserved among homologous proteins in a database) or change of function (the position of the mutated site in a structural model of the protein, or the predicted impact of the specific amino-acid change). Gain-of-function substitutions are harder to predict but may be more likely when mutations are concentrated at specific sites. PROVEAN (Choi et al., 2012) is an in silico tool that predicts loss of function from the conservation of the mutated site across a large database of protein homologs.
Software and datasets
PROVEAN v1.1.5
PROVEAN dependencies: NCBI BLAST v2.2.30, CD-HIT v4.8.1
SeqKit v0.15.0
Procedure
Install PROVEAN and the non-redundant (nr) protein database using the source code, the database link (ftp://ftp.jcvi.org/pub/data/provean/nr_Aug_2011/) and the instructions provided at https://www.jcvi.org/research/provean#downloads (last accessed 15 July 2024).
To run PROVEAN:
1) Generate a two-column list of non-synonymous mutations (column 1: LOCUS_TAG; column 2: amino-acid substitution in short format; see the example input file listed under Supplementary information).
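The mutation list can be checked before running PROVEAN. The Python sketch below is not part of the original protocol: it assumes tab-separated columns and single-substitution notation such as A123V (one-letter reference residue, position, one-letter alternative residue). The function name and locus tags are hypothetical, and nonsense mutations or indels would need a broader pattern.

```python
import re
from collections import defaultdict

# Assumed short format for one substitution: reference residue (one-letter code),
# position, alternative residue, e.g. "A123V". Stop codons and indels are not accepted.
SUBSTITUTION = re.compile(r"^[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]$")


def read_mutation_list(lines):
    """Group substitutions by locus tag from two-column, tab-separated lines."""
    by_locus = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        locus_tag, mutation = fields
        if not SUBSTITUTION.match(mutation):
            raise ValueError(f"line {lineno}: unexpected mutation format {mutation!r}")
        by_locus[locus_tag].append(mutation)
    return dict(by_locus)


if __name__ == "__main__":
    # Hypothetical locus tags and substitutions, for illustration only.
    example = ["GENE_0001\tA123V", "GENE_0001\tG45D", "GENE_0002\tP10L"]
    print(read_mutation_list(example))
```

Grouping by locus tag gives one set of substitutions per protein, which is convenient because PROVEAN scores variants against a single query sequence at a time.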
2) Concatenate the amino-acid sequences of the mutated genes into a multi-FASTA file.
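Step 2 can be sketched in Python as follows, assuming the proteome is available as a FASTA file whose header IDs (the first word after ">") are the same locus tags used in the mutation list. The function names and example IDs are hypothetical; SeqKit's `seqkit grep` with a pattern file is a command-line alternative.

```python
def parse_fasta(lines):
    """Return {record_id: sequence}; the ID is the first word of each header line."""
    records = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:].split()[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def to_multifasta(records, wanted_ids, width=60):
    """Return the requested records as multi-FASTA text with wrapped sequences."""
    out = []
    # dict.fromkeys drops repeated locus tags while keeping their first-seen order.
    for rid in dict.fromkeys(wanted_ids):
        if rid not in records:
            raise KeyError(f"no sequence found for {rid}")
        seq = records[rid]
        out.append(f">{rid}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(f"{line}\n" for line in out)
```

Usage: pass the proteome lines to `parse_fasta`, then call `to_multifasta` with the locus tags from the mutation list and write the returned text to a file.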
3) Run PROVEAN using the script run-provean.sh (see Supplementary information). Concatenate the PROVEAN output files using the command: “grep -T -A 100 --no-group-separator SCORE */*.out | grep -v SCORE | sed 's/out-/out/' > all_provean.out”. Because -A 100 keeps at most 100 lines after each SCORE header, increase this value for genes carrying more than 100 substitutions.
4) The concatenated output is a tab-separated file that can be further analysed in R (see Data analysis). Substitutions are classified as “deleterious” if the PROVEAN score is -2.5 or less and “neutral” otherwise.
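The concatenated file can also be parsed outside R. This Python sketch assumes that each line of all_provean.out holds the source file path, the variant and the PROVEAN score as tab-separated fields, with empty fields (for example from grep -T padding) ignored. It applies the -2.5 cutoff described above. The exact layout depends on run-provean.sh and should be checked against a real output file; the function names are hypothetical.

```python
DELETERIOUS_CUTOFF = -2.5  # PROVEAN score threshold used in this protocol


def classify(score, cutoff=DELETERIOUS_CUTOFF):
    """Return 'deleterious' if score <= cutoff, otherwise 'neutral'."""
    return "deleterious" if score <= cutoff else "neutral"


def parse_all_provean(lines):
    """Parse all_provean.out lines into (source, variant, score, prediction) tuples.

    Assumed layout: source file path first, variant and score as the last two
    tab-separated fields. Lines whose last field is not numeric are skipped.
    """
    results = []
    for line in lines:
        # Drop empty fields, e.g. extra tabs introduced by grep -T.
        fields = [f.strip() for f in line.rstrip("\n").split("\t") if f.strip()]
        if len(fields) < 3:
            continue
        source, variant, raw_score = fields[0], fields[-2], fields[-1]
        try:
            score = float(raw_score)
        except ValueError:
            continue  # not a score line
        results.append((source, variant, score, classify(score)))
    return results
```

Usage: `parse_all_provean(open("all_provean.out"))` returns one tuple per scored substitution; a score of exactly -2.5 is called deleterious, matching the “-2.5 or less” rule.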
Data analysis
An example of downstream analysis is provided as a supplementary file.
References
Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. Predicting the Functional Effect of Amino Acid Substitutions and Indels. PLOS ONE. 2012;7(10):e46688.
Supplementary information
The following files are available in the GitHub repository: https://github.com/stefanogg/staph_adaptation_paper
run-provean.sh script
Example of input file for run-provean.sh
Example of downstream analysis of the Provean output