PolyPhen-2 combines information on sequence features, multiple alignments with homologous proteins, and structural parameters to predict the impact of a SNP on protein function.
For the sequence-based assessment, PolyPhen-2 tries to identify the query protein as an entry in the UniProtKB/Swiss-Prot database. Using the feature table of the corresponding entry, PolyPhen-2 checks whether a given SNP occurs at a functionally relevant site, e.g. whether the SNP lies within a transmembrane, signal peptide, or binding region.
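The feature-table check described above amounts to asking whether the substituted residue falls inside any annotated region. The following is a minimal sketch of that idea, not PolyPhen-2's actual implementation; the `Feature` type, the `features_at` helper, and the example annotations are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """One annotated region of a protein (hypothetical, simplified record)."""
    kind: str   # e.g. "SIGNAL", "TRANSMEM", "BINDING"
    start: int  # first residue of the region, 1-based, inclusive
    end: int    # last residue of the region, 1-based, inclusive


def features_at(position: int, features: List[Feature]) -> List[str]:
    """Return the kinds of all features that cover the given residue position."""
    return [f.kind for f in features if f.start <= position <= f.end]


# Made-up annotations for illustration only; not taken from a real entry.
example_features = [
    Feature("SIGNAL", 1, 22),
    Feature("TRANSMEM", 45, 67),
    Feature("BINDING", 120, 120),
]

print(features_at(50, example_features))  # residue inside the transmembrane region
print(features_at(30, example_features))  # residue outside every annotated region
```

A substitution whose position returns a non-empty list would be flagged as lying in a functionally relevant site.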
Similar to SIFT, PolyPhen-2 also assesses the degree of conservation of the position where the SNP occurs, using a multiple sequence alignment of homologous sequences. For each of the two amino acid variants at that position, PolyPhen-2 calculates a position-specific independent counts (PSIC) score. The difference between the PSIC scores of the two variants describes the impact of a particular amino acid substitution: the higher the PSIC score difference, the greater the functional impact the substitution is likely to have.
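Once profile scores are available for a position, the comparison itself is simple arithmetic. The sketch below assumes the PSIC scores for one alignment column are already computed and stored in a dictionary; it does not reproduce the PSIC calculation. The `psic_difference` function, the use of the absolute difference, and the example scores are assumptions made for illustration.

```python
from typing import Dict


def psic_difference(column_scores: Dict[str, float], wild_type: str, variant: str) -> float:
    """Absolute difference between the profile scores of two amino acids.

    column_scores maps one-letter amino acid codes to precomputed scores
    for a single alignment position (assumed input, not computed here).
    """
    return abs(column_scores[wild_type] - column_scores[variant])


# Illustrative scores for one position; these numbers are invented.
column = {"L": 1.2, "I": 0.8, "P": -0.9}

conservative = psic_difference(column, "L", "I")  # similar residues, small difference
radical = psic_difference(column, "L", "P")       # dissimilar residues, large difference
print(conservative < radical)
```

Under this reading, the leucine-to-proline substitution would be ranked as more likely to affect function than the leucine-to-isoleucine substitution.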
A BLAST search of the query sequence against protein structure databases is carried out to identify corresponding 3D protein structures. If corresponding structures are found, they are used to assess whether the SNP is likely to disrupt the hydrophobic core, interactions with ligands, or other important features of the protein.
Finally, all parameters are combined and empirical prediction rules are applied to decide whether the SNP is damaging or benign.
PolyPhen-2 is available online at http://genetics.bwh.harvard.edu/pph2/. We used the ‘Batch query’ option and submitted the list of genomic coordinates and variants of our 191 filtered nsSNPs.
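A batch submission of this kind can be prepared with a few lines of scripting. The sketch below assumes the batch form accepts one variant per line as chromosome, position, and the two alleles in the form `chr1:1267483 G/A`; this format should be checked against the current PolyPhen-2 help pages before use. The `format_batch_line` helper and the example variants are hypothetical placeholders, not the variants analysed here.

```python
from typing import Iterable, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, reference allele, alternate allele)


def format_batch_line(chrom: str, pos: int, ref: str, alt: str) -> str:
    """Format one variant as 'chrN:position REF/ALT' (assumed input format)."""
    if not chrom.startswith("chr"):
        chrom = f"chr{chrom}"
    return f"{chrom}:{pos} {ref.upper()}/{alt.upper()}"


def format_batch(variants: Iterable[Variant]) -> str:
    """Join all variants into a newline-separated block for pasting into the form."""
    return "\n".join(format_batch_line(*v) for v in variants)


# Placeholder variants for illustration only.
example_variants = [("1", 1267483, "G", "A"), ("chrX", 200100, "c", "t")]
print(format_batch(example_variants))
```

The resulting text block can be pasted into the batch form or saved to a file for upload.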