Phylogenetic tree of GPCRs and mapping of G protein coupling data

Tilman Flock
Alexander S. Hauser
Nadia Lund
David E. Gloriam
Santhanam Balaji
M. Madan Babu

GPCR sequence alignments were constructed for each GPCR class (A, B and C, as defined in the Guide to Pharmacology/IUPHAR database; sequences were retrieved through the IUPHAR API using a Python script). The initial alignment within each class was made using MSAProbs (ref. 52) and then manually adjusted using the GPCRdb numbering scheme (ref. 4) as a guide. The within-class alignments were then trimmed by removing N- and C-terminal overhanging residues and any residues of large ICL3 insertions beyond the first ten to fifteen. Because a cross-class sequence alignment is not straightforward given the low sequence similarity across GPCR classes, a structural alignment of the highest-resolution structures of each GPCR class was used to cross-align the individual class alignments. The structural alignment was constructed using Mustang (ref. 53) with 4EIY (aa2ar_human) and 4BVN (adrb1_melga) representing Class A, 4K5Y (crfr1_human) representing Class B, and 4OO9 (grm5_human) representing Class C. This structural alignment was first integrated manually with the Class A alignment, and the Class B and Class C alignments were then integrated manually in turn to obtain a cross-class “super alignment” (CCSA). The CCSA was validated against a recent cross-GPCR-class structural alignment (ref. 4). Using the CCSA, we first built an approximate maximum-likelihood (ML) phylogenetic tree with FastTree (ref. 54), which served as the starting tree for generating the final ML tree in MEGA7 (ref. 55).
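
The trimming step can be illustrated with a minimal Python sketch. It assumes each class alignment is held as a dictionary of equal-length gapped strings; the occupancy threshold (`min_occupancy`) and the loop column span passed to `cap_loop_columns` are hypothetical parameters. The original trimming was done manually, and capping ICL3 per column rather than per residue is a simplification.

```python
from typing import Dict

GAP_CHARS = "-."  # characters treated as alignment gaps


def trim_terminal_overhangs(alignment: Dict[str, str],
                            min_occupancy: float = 0.5) -> Dict[str, str]:
    """Drop leading and trailing columns in which fewer than `min_occupancy`
    of the sequences carry a residue (i.e. N-/C-terminal overhangs)."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")

    def occupancy(col: int) -> float:
        # Fraction of sequences with a residue (not a gap) in this column.
        return sum(s[col] not in GAP_CHARS for s in seqs) / len(seqs)

    start = 0
    while start < length and occupancy(start) < min_occupancy:
        start += 1
    end = length
    while end > start and occupancy(end - 1) < min_occupancy:
        end -= 1
    return {name: s[start:end] for name, s in alignment.items()}


def cap_loop_columns(alignment: Dict[str, str], loop_start: int,
                     loop_end: int, keep: int = 15) -> Dict[str, str]:
    """Keep only the first `keep` columns of the loop span [loop_start, loop_end)
    and delete the remaining columns of that span (hypothetical ICL3 cap)."""
    cut_from = min(loop_start + keep, loop_end)
    return {name: s[:cut_from] + s[loop_end:] for name, s in alignment.items()}


if __name__ == "__main__":
    toy = {"r1": "--MKTAYW--", "r2": "-AMKTAYWL-", "r3": "--MKSAYW-Q"}
    print(trim_terminal_overhangs(toy))
    # {'r1': 'MKTAYW', 'r2': 'MKTAYW', 'r3': 'MKSAYW'}
```

In practice the ICL3 column span would come from a segment annotation of a reference receptor; here it is simply passed in by hand.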

G protein-coupling data and GPCR classifications were retrieved from the IUPHAR/BPS Guide to Pharmacology SQL database (May 2016; ref. 56) as described above. R was used to prepare the coupling data for visualisation as concentric circles on the phylogenetic tree (Fig. 5c; Extended Data Fig. 8) in iTOL version 3 (ref. 57), the latest version at the time. To investigate sequence composition and sequence conservation, and to search for physicochemical and sequence patterns, the GPCR and G protein alignments were analysed in R using the bio3d (ref. 58) and ape (ref. 42) packages.
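
The coupling data were prepared in R; as a rough Python equivalent, the sketch below builds one iTOL colour-strip dataset per Gα family, so that loading all four datasets produces concentric rings around the tree. The header keywords follow the iTOL DATASET_COLORSTRIP template as we understand it and should be checked against the current template; the coupling dictionary, colours and receptor identifiers (which must match the tree's leaf labels) are hypothetical.

```python
from typing import Dict, Set

# Gα families in the order used for the coupling profiles.
FAMILIES = ("Gs", "Gi/o", "Gq/11", "G12/13")


def itol_colorstrip(coupling: Dict[str, Set[str]], family: str, color: str) -> str:
    """Return an iTOL DATASET_COLORSTRIP text in which every receptor that
    couples to `family` is coloured `color`; other receptors are omitted."""
    if family not in FAMILIES:
        raise ValueError(f"unknown G protein family: {family}")
    lines = [
        "DATASET_COLORSTRIP",
        "SEPARATOR TAB",
        f"DATASET_LABEL\t{family}",
        f"COLOR\t{color}",
        "DATA",
    ]
    for receptor in sorted(coupling):
        if family in coupling[receptor]:
            # Each data line: leaf ID, colour, label.
            lines.append(f"{receptor}\t{color}\t{family}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical coupling data keyed by tree leaf label.
    coupling = {"ADRB2": {"Gs", "Gi/o"}, "HRH1": {"Gq/11"}}
    print(itol_colorstrip(coupling, "Gs", "#1f77b4"))
```

Keeping one dataset per family, rather than one multi-column dataset, makes it easy to toggle or reorder individual rings in iTOL.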

To reconstruct the most likely ancestral coupling profiles across all clades of the final ML tree of human GPCRs, the Gα–GPCR coupling data were mapped onto the CCSA as described above. We first created a “coupling profile” for each receptor from the IUPHAR coupling information. The profile is a four-dimensional vector (Gs, Gi/o, Gq/11, G12/13) that takes the value 1 (couples) or 0 (does not couple) in each dimension. Treating this profile as the “trait” of each receptor, we combined it with the final ML tree to estimate ancestral coupling probabilities using BayesTraits V2.0 (ref. 59; http://www.evolution.rdg.ac.uk/BayesTraits.html). For each clade in the ML tree, we used the Markov chain Monte Carlo (MCMC) option in BayesTraits with 100,000 iterations to obtain the probability of ancestral coupling to each of the four Gα families. These probabilities were converted into binary values: an ancestral node was assigned “1” (ancestral coupling to the given G protein) if the coupling probability was greater than or equal to 0.7, and “0” (no such coupling) otherwise. This yielded a coupling profile for each ancestral node in the tree, analogous to the individual receptor profiles described above. Then, for each GPCR and each clade to which it belongs, we required that (i) the clade contains 30 or fewer GPCRs, so that the ancestral receptor investigated is neither very recent nor very ancient, and (ii) both the ancestral node and the individual receptors within the clade have coupling information (i.e. their profiles are not all 0s). The ML tree was traversed using a custom-written Perl script.
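
The binarisation and clade filter can be sketched in Python as follows. The 0.7 cut-off, the 30-receptor clade limit and the Gα family order come from the text; parsing BayesTraits output into per-family probabilities is not shown, and reading criterion (ii) as requiring every receptor in the clade to have a non-zero profile is our interpretation.

```python
from typing import Dict, Iterable, Sequence, Tuple

FAMILIES = ("Gs", "Gi/o", "Gq/11", "G12/13")  # profile dimension order
THRESHOLD = 0.7      # ancestral probability cut-off from the protocol
MAX_CLADE_SIZE = 30  # clade size limit from the protocol

Profile = Tuple[int, ...]


def receptor_profile(couples_to: Iterable[str]) -> Profile:
    """Four-dimensional 0/1 vector for one receptor from its coupling partners."""
    partners = set(couples_to)
    return tuple(int(f in partners) for f in FAMILIES)


def ancestral_profile(probs: Dict[str, float], threshold: float = THRESHOLD) -> Profile:
    """Binarise per-family ancestral coupling probabilities (>= threshold -> 1)."""
    return tuple(int(probs[f] >= threshold) for f in FAMILIES)


def clade_qualifies(node_profile: Profile, member_profiles: Sequence[Profile],
                    max_size: int = MAX_CLADE_SIZE) -> bool:
    """Criteria (i) and (ii): a small enough clade whose ancestral node and
    member receptors all carry coupling information (no all-zero profile)."""
    if len(member_profiles) > max_size:
        return False
    return any(node_profile) and all(any(p) for p in member_profiles)
```

A probability of exactly 0.7 is mapped to 1, matching the "greater than or equal to" rule in the text.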
We considered a GPCR to have altered its coupling tendency relative to one of its ancestral receptors if their coupling profiles did not match. The number of such instances was recorded and used to infer the fraction of receptors that have altered their coupling selectivity during evolution.
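
The mismatch count can be sketched as a recursive tree traversal (in Python here, rather than the authors' Perl). The `Node` class and its fields are hypothetical, and counting a receptor as altered when its profile differs from at least one qualifying ancestor is our reading of the text; building the tree from the MEGA7 output and attaching the binarised ancestral profiles are not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Profile = Tuple[int, ...]


@dataclass
class Node:
    profile: Profile                  # binary coupling profile (Gs, Gi/o, Gq/11, G12/13)
    name: Optional[str] = None        # receptor name for leaves
    children: List["Node"] = field(default_factory=list)


def clade_members(node: Node, cache: Dict[int, List[Node]]) -> List[Node]:
    """Leaves (receptors) below `node`, memoised by object identity
    (dataclass instances with eq=True are not hashable)."""
    key = id(node)
    if key not in cache:
        if not node.children:
            cache[key] = [node]
        else:
            cache[key] = [leaf for child in node.children
                          for leaf in clade_members(child, cache)]
    return cache[key]


def count_altered(root: Node, max_clade_size: int = 30) -> Tuple[int, int]:
    """Return (altered, assessed): receptors whose profile differs from at least
    one qualifying ancestor, out of receptors with any qualifying ancestor."""
    cache: Dict[int, List[Node]] = {}

    def qualifies(clade: Node) -> bool:
        members = clade_members(clade, cache)
        return (len(members) <= max_clade_size
                and any(clade.profile)
                and all(any(m.profile) for m in members))

    altered = assessed = 0

    def visit(node: Node, ancestors: List[Node]) -> None:
        nonlocal altered, assessed
        if not node.children:
            qualifying = [a for a in ancestors if qualifies(a)]
            if qualifying:
                assessed += 1
                if any(a.profile != node.profile for a in qualifying):
                    altered += 1
            return
        for child in node.children:
            visit(child, ancestors + [node])

    visit(root, [])
    return altered, assessed
```

The fraction of receptors with altered coupling selectivity is then `altered / assessed`.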

The aminergic, purinergic, chemokine, S1P-related and V2R-related receptors (Extended Data Fig. 9), together with the adrenergic and adenosine receptors, were selected as representative groups of evolutionarily related receptors. The groups comprise (i) Purinergic cluster: P2RY1, P2RY2, P2RY4, P2RY6, P2RY11; (ii) V2R-related cluster: V1BR, V1AR, V2R, OXYR, NPSR1, GNRHR, PKR1, PKR2; (iii) S1P-related cluster: CNR1, CNR2, LPAR1, LPAR2, LPAR3, S1PR1, S1PR2, S1PR3, S1PR4, S1PR5; (iv) Chemokine cluster: CCR9, CCR7, CCR10, CXCR4, CXCR6, CCR6, CXCR3, CXCR5, CXCR2, CCR3, CCR1, CCR5, CCR2, CCR4, CCR8, CX3C1, XCR1, CXCR1; (v) Aminergic cluster: 5HT1A, 5HT1B, 5HT1D, 5HT1E, 5HT1F, 5HT2A, 5HT2B, 5HT2C, 5HT4R, 5HT5A, 5HT6R, 5HT7R, ACM1, ACM2, ACM3, ACM4, ACM5, ADA1A, ADA1B, ADA1D, ADA2A, ADA2B, ADA2C, ADRB1, ADRB2, ADRB3, DRD1, DRD2, DRD3, DRD4, DRD5, HRH1, HRH2, HRH3, HRH4, TAAR; (vi) Adrenergic cluster: ADRB1, ADRB2, ADRB3; (vii) Adenosine cluster: AA1R, AA2AR, AA2BR, GP119. Structure-based sequence alignments, conservation statistics and residue property features for every receptor position in these groups were collected through the GPCRdb API (http://gpcrdb.org/services/reference/; refs. 4, 30) using Python scripts. Residue property groups associated with particular types of molecular interaction were defined as in GPCRdb (refs. 4, 30) [small: A, C, D, G, N, P, S, T, V; aromatic: F, W, Y, H; aliphatic-hydrophobic: A, V, I, L, M, C, P; positive charge: H, K, R; negative charge: D, E; hydrogen-bonding: D, E, H, K, N, Q, R, S, T, W, Y]. Interacting receptor positions were identified as described above. For each receptor group, we calculated molecular property signatures (Extended Data Fig. 9) associated with the ability to couple to a particular G protein family by comparing the coupling and non-coupling receptors within the group (using primary and secondary coupling data from the IUPHAR/BPS Guide to Pharmacology database).
Each signature is composed of a unique combination of residue positions, each with a distinct difference in residue property conservation (% in Gαx-coupling receptors minus % in Gαx non-coupling receptors). This calculation was performed using the pandas Python library (http://pandas.pydata.org/), and the selectivity signatures of residue properties were visualised using matplotlib (http://matplotlib.org/). Sequence patterns, selectivity determinants and sequence conservation (Fig. 5d and Extended Data Fig. 9c) were investigated using the Spial web server (http://www.mrclmb.cam.ac.uk/genomes/spial/; ref. 35) and visualised with WebLogo 3 (ref. 56). The Spial parameters used to generate Fig. 5d and Extended Data Fig. 9c were a conservation cut-off of 0.1 and specificity cut-offs of 0.25 for the V2R-clade panel and 0.50 for the Gs-binding panels.
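
A minimal pandas sketch of the signature calculation is shown below, using the GPCRdb property groups listed above. It assumes a table with one row per receptor and one column per generic residue position (the `3x50`-style labels and the toy residues are hypothetical), treats gaps as lacking every property, and leaves out the selection of interacting positions.

```python
import pandas as pd

# Residue property groups as listed in the protocol (GPCRdb definitions).
PROPERTY_GROUPS = {
    "small": "ACDGNPSTV",
    "aromatic": "FWYH",
    "aliphatic-hydrophobic": "AVILMCP",
    "positive charge": "HKR",
    "negative charge": "DE",
    "hydrogen-bonding": "DEHKNQRSTWY",
}


def property_signature(aln: pd.DataFrame, couples: pd.Series) -> pd.DataFrame:
    """Positions x properties table of (% in coupling) - (% in non-coupling).

    aln: one row per receptor, one column per generic position, values are
         one-letter residues ('-' for gaps, which match no property group).
    couples: boolean Series indexed by receptor, True if the receptor couples
             to the G protein family of interest.
    """
    couples = couples.loc[aln.index].astype(bool)
    signature = {}
    for prop, residues in PROPERTY_GROUPS.items():
        has_prop = aln.isin(list(residues))          # receptor x position booleans
        pct_coupling = has_prop[couples].mean() * 100
        pct_non_coupling = has_prop[~couples].mean() * 100
        signature[prop] = pct_coupling - pct_non_coupling
    return pd.DataFrame(signature)


if __name__ == "__main__":
    aln = pd.DataFrame({"3x50": list("RRRK"), "8x47": list("DEFN")},
                       index=["rec1", "rec2", "rec3", "rec4"])
    couples = pd.Series([True, True, False, False], index=aln.index)
    print(property_signature(aln, couples))
```

Positive values mark properties enriched among coupling receptors at a position and negative values those enriched among non-coupling receptors; the resulting table can be passed directly to a matplotlib heatmap.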
