ProteinNPT (Fig. 1) is a semi-supervised conditional pseudo-generative model that learns a joint representation of protein sequences and associated property labels. The model takes as input both the primary structure representation of the proteins and the corresponding labels for the property of interest. Let $\mathcal{D} = (X, Y)$ be the full training dataset, where $X \in \mathcal{V}^{N \times L}$ are protein sequences over the amino-acid vocabulary $\mathcal{V}$ (with $N$ the total number of labeled protein sequences and $L$ the sequence length), and $Y \in \mathbb{R}^{N \times M}$ the corresponding property labels (where $M$ is the number of distinct such labels, including true targets and auxiliary labels, as discussed in § 3.2). Depending on whether we are at training or inference time, we sample a batch of points and mask different parts of this combined input as per the procedure described later in this section. We separately embed protein sequences and labels, then concatenate the resulting sequence and label embeddings (each of dimension $d$) into a single tensor $Z \in \mathbb{R}^{N \times (L+M) \times d}$, which we feed into several ProteinNPT layers. A ProteinNPT layer (Fig. 1, right) learns a joint representation of protein sequences and labels by successively applying self-attention between residues and labels of a given sequence (row-attention), self-attention across sequences in the input batch at a given position (column-attention), and a feedforward layer. Each of these transforms is preceded by a LayerNorm operator, and we add residual connections to the output of each step. For the multi-head row-attention sub-layer, we linearly project the embeddings $Z_n$ of each labeled sequence into queries, keys and values for each attention head $h$ via the linear maps $W_h^Q$, $W_h^K$ and $W_h^V$ respectively. Mathematically, we have:
$$\mathrm{RowAttn}(Z_n) = \mathrm{concat}\big(H_1^n, \ldots, H_k^n\big)\, W^O, \qquad H_h^n = A_h\, V_h^n$$
where the concatenation is performed row-wise, $W^O$ mixes outputs from the $k$ different heads, and we use tied row-attention as defined in Rao et al. [2021], as the attention maps ought to be similar across labeled instances from the same protein family:
$$A_h = \mathrm{softmax}\!\left(\frac{1}{\sqrt{N d_h}} \sum_{n=1}^{N} Q_h^n \big(K_h^n\big)^{\top}\right)$$
where $Q_h^n = Z_n W_h^Q$, $K_h^n = Z_n W_h^K$, $V_h^n = Z_n W_h^V$, and $d_h$ is the per-head embedding dimension.
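To make tied row-attention concrete, here is a minimal NumPy sketch (function names and weight shapes are illustrative, not taken from the ProteinNPT codebase): one attention map per head is shared across all $N$ rows by summing the attention logits over the batch dimension with square-root normalization.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def tied_row_attention(Z, Wq, Wk, Wv, Wo):
    """Tied row-attention over a batch Z of shape (N, L+M, d).

    Attention logits are summed over the N rows and normalized by
    sqrt(N * d_h), so all rows share one attention map per head
    (square-root normalization, as in Rao et al. [2021]).
    Wq, Wk, Wv: (k, d, d_h); Wo: (k * d_h, d). Hypothetical shapes.
    """
    N = Z.shape[0]
    d_h = Wq.shape[-1]
    heads = []
    for h in range(Wq.shape[0]):
        Q = Z @ Wq[h]                      # (N, L+M, d_h)
        K = Z @ Wk[h]
        V = Z @ Wv[h]
        # Tie attention across rows: sum the logits over the batch dimension.
        logits = np.einsum("nid,njd->ij", Q, K) / np.sqrt(N * d_h)
        A = softmax(logits, axis=-1)       # (L+M, L+M), shared by all rows
        heads.append(np.einsum("ij,njd->nid", A, V))
    # Row-wise concatenation of heads, then mix with the output projection.
    return np.concatenate(heads, axis=-1) @ Wo  # (N, L+M, d)
```

The `einsum` over `n` is what ties the map: every labeled sequence in the batch contributes to, and then reuses, the same per-head attention pattern.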
We then apply column-attention as follows:
$$\mathrm{ColAttn}(Z_{:,j}) = \mathrm{concat}\big(H_1^j, \ldots, H_k^j\big)\, W^O, \qquad H_h^j = \mathrm{Attn}\big(Z_{:,j} W_h^Q,\; Z_{:,j} W_h^K,\; Z_{:,j} W_h^V\big)$$
where the concatenation is performed column-wise, $Z_{:,j} \in \mathbb{R}^{N \times d}$ denotes the embeddings at position $j$ across the batch, $W^O$ mixes outputs from the different heads, and we use the standard self-attention operator $\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\big(Q K^{\top} / \sqrt{d_h}\big)\, V$. Lastly, the feedforward sub-layer applies a row-wise feed-forward network:
$$\mathrm{FFN}(z) = W_2\, \mathrm{ReLU}(W_1 z + b_1) + b_2$$
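The column-attention and feedforward sub-layers can be sketched in the same style (again a NumPy illustration with hypothetical helper names; the LayerNorm and residual wrapping of each sub-layer, $Z + \mathrm{SubLayer}(\mathrm{LN}(Z))$, is omitted for brevity):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attn(Q, K, V):
    """Standard scaled dot-product attention."""
    return softmax(Q @ K.swapaxes(-1, -2) / np.sqrt(Q.shape[-1]), axis=-1) @ V

def column_attention(Z, Wq, Wk, Wv, Wo):
    """Self-attention across the N sequences at each of the L+M positions.

    Z: (N, L+M, d). Transposing puts positions first, so each position
    attends over the batch independently. Wq/Wk/Wv: (k, d, d_h); Wo: (k*d_h, d).
    """
    Zt = Z.swapaxes(0, 1)  # (L+M, N, d)
    heads = [attn(Zt @ Wq[h], Zt @ Wk[h], Zt @ Wv[h])
             for h in range(Wq.shape[0])]
    return (np.concatenate(heads, axis=-1) @ Wo).swapaxes(0, 1)  # (N, L+M, d)

def feedforward(Z, W1, b1, W2, b2):
    """Row-wise (position-wise) feed-forward network with ReLU."""
    return np.maximum(Z @ W1 + b1, 0.0) @ W2 + b2
```

Note the symmetry with row-attention: the same attention operator is applied, only the axis it mixes over changes from positions within a sequence to sequences within the batch.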
In the final stage, the learned embeddings from the last layer are used to predict both the masked tokens and the masked targets: the embeddings of masked targets are fed into an L2-penalized linear projection to predict the masked target values, while the embeddings of masked tokens are linearly projected and passed through a softmax activation to predict the corresponding original tokens.
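These two prediction heads admit a very small sketch (hypothetical names; the L2 penalty on the target head's weights enters the training loss, not this forward pass):

```python
import numpy as np

def predict_masked(Z_targets, Z_tokens, W_target, W_token):
    """Hypothetical prediction heads applied to final-layer embeddings.

    Z_targets: (n_masked_targets, d) embeddings at masked target slots;
    a linear projection (trained with an L2 weight penalty) yields scalar
    target predictions.
    Z_tokens: (n_masked_tokens, d) embeddings at masked token slots;
    a linear projection followed by softmax yields a distribution over
    the amino-acid vocabulary for each masked token.
    """
    target_preds = Z_targets @ W_target             # (n_masked_targets,)
    logits = Z_tokens @ W_token                     # (n_masked_tokens, vocab)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    token_probs = e / e.sum(axis=-1, keepdims=True)
    return target_preds, token_probs
```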
Figure 1: (Left) The model takes as input the primary structure of a batch of proteins of length $L$, along with the corresponding labels and, optionally, auxiliary labels (for simplicity we consider a single label here). Each input is embedded separately, then all resulting embeddings are concatenated into a single tensor. Several ProteinNPT layers are subsequently applied to learn a representation of the entire batch, which is ultimately used to predict both masked tokens and targets (depicted by question marks). (Right) A ProteinNPT layer alternates between tied row-attention and column-attention to learn rich embeddings of the labeled batch.