We compared helix insertion energies into the endoplasmic reticulum membrane across kingdoms using DGpred [26] for a 19-residue window in the center of each helix. Since only a few sugar transporters have a determined structure available in the PDB, we transferred their helix annotations to an additional 138 sequences using alignments. All entries from the sugar porter family (TC 2.A.1.1) without any known structure were selected from the TCDB database [84]. We kept only the entries whose UniProt IDs were currently available. The phylogenetic tree was generated with Dendroscope.
As reference sequences, we used the following sugar transporter sequences (UniProt IDs in parentheses): Homo sapiens Solute carrier family 2 facilitated glucose transporter member 3 (P11169), Escherichia coli D-xylose-proton symporter (P0AGF4), Plasmodium falciparum Hexose transporter 1 (O97467), Staphylococcus epidermidis Glucose transporter (A0A0H2VG78) and Arabidopsis thaliana Sugar transport protein 10 (Q9LT15). The PDB accession numbers of the structures for these reference proteins are 4ZWC (P11169), 4GBZ (P0AGF4), 6RW3 (O97467), 4LDS (A0A0H2VG78) and 6H7D (Q9LT15).
Since the sequences are divergent and therefore difficult to align, we employed two strategies to improve the alignment: (1) using multiple references and (2) aligning profile HMMs instead of sequences. We built profile HMMs for the 143 sugar transporters by aligning each of them against the UniRef30 database (2020_06) [85] using hhblits (v3.3.0, parameter -mact 0.1) [44]. Each non-reference profile was aligned to the reference profiles using hhalign (parameter -glob) [86,87]. For each sequence, the reference with the highest pairwise alignment score was chosen to transfer its annotation. First, each helix center was inferred from the residue that aligned with the reference helix center; in three cases the helix center did not align, and we chose one of the directly adjacent residues instead. Second, to counteract misaligned helix positions, we refined each center by selecting the position with the minimal insertion energy, calculated with DGpred, within an offset of ±3 residues. For the resulting coordinates, we extracted the 19-residue helices and calculated their insertion energies with DGpred.
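The center transfer and refinement steps can be illustrated with a minimal Python sketch. It is not the code used in this study. It assumes that a pairwise alignment is available as two gapped strings of equal length and that positions are 0-based. The function names (`transfer_center`, `refine_center`, `window_around`) are hypothetical, and `toy_energy` is a placeholder that merely counts hydrophobic residues; it does not reproduce DGpred and would have to be replaced by real DGpred values. Preferring the smallest offset when energies tie is also an assumption.

```python
from typing import Callable, Optional, Tuple

WINDOW = 19          # helix length used in the text
HALF = WINDOW // 2   # 9 residues on each side of the center


def toy_energy(window: str) -> float:
    """Placeholder score, NOT DGpred: lower when more residues are hydrophobic."""
    return -float(sum(res in "AILMFVW" for res in window))


def window_around(seq: str, center: int) -> Optional[str]:
    """Return the 19-residue window centered on `center`, or None if it does not fit."""
    start, end = center - HALF, center + HALF + 1
    if start < 0 or end > len(seq):
        return None
    return seq[start:end]


def transfer_center(ref_aln: str, query_aln: str, ref_center: int) -> Optional[int]:
    """Map a reference residue index to the query residue aligned with it.

    Returns None when the reference center faces a gap in the query, in which
    case the caller has to fall back to an adjacent residue.
    """
    ref_pos = query_pos = -1
    for r, q in zip(ref_aln, query_aln):
        if r != "-":
            ref_pos += 1
        if q != "-":
            query_pos += 1
        if r != "-" and ref_pos == ref_center:
            return query_pos if q != "-" else None
    return None


def refine_center(
    seq: str,
    center: int,
    energy: Callable[[str], float] = toy_energy,
    max_offset: int = 3,
) -> Tuple[int, float]:
    """Pick the center within +/- max_offset whose window has minimal energy."""
    best: Optional[Tuple[int, float]] = None
    # Visit offsets in order of increasing distance so that ties keep the
    # position closest to the transferred center (an assumption of this sketch).
    for offset in sorted(range(-max_offset, max_offset + 1), key=abs):
        window = window_around(seq, center + offset)
        if window is None:
            continue
        score = energy(window)
        if best is None or score < best[1]:
            best = (center + offset, score)
    if best is None:
        raise ValueError("no 19-residue window fits around the given center")
    return best
```

In such a sketch, `transfer_center` would be called once per annotated helix of the best-scoring reference, and `refine_center` would then be applied to each transferred position before the final 19-residue window is extracted with `window_around`.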