Attention and Gate-Augmented Mechanism

Xiao Wang, Sean T. Flannery, Daisuke Kihara

The constructed graphs are used as the input to the GNN. More formally, the graphs are given by the adjacency matrices A_1 and A_2 and the node features x^in = {x_1^in, x_2^in, ..., x_N^in} with x_i^in ∈ ℝ^F, where F is the dimension of the node features.

We first explain the attention mechanism of our GNN. Given the input node features x^in, the pure graph attention coefficient e_ij is defined in Eq. 3 and denotes the relative importance between the i-th and the j-th node:
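A plausible written-out form of Eq. 3, reconstructed from the definitions in the next paragraph (the transformed features x_i, x_j and the matrix E):

e_ij = x_i^T E x_j + x_j^T E x_i        (Eq. 3)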

where x_i and x_j are the transformed feature representations defined by x_i = W x_i^in and x_j = W x_j^in. W, E ∈ ℝ^{F×F} are learnable matrices in the GNN. e_ij and e_ji become identical, satisfying the symmetry of the graph, because the two terms x_i^T E x_j and x_j^T E x_i are added. The coefficient is only computed for pairs i and j with A_ij > 0.

Normalized attention coefficients are then computed for the elements of the adjacency matrices; for element (i, j) they take the following form:
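A plausible reconstruction of Eq. 4, combining a softmax-style normalization of e_ij over the neighborhood N_i with the adjacency entry A_ij (the exact form in the original article may differ):

a_ij = A_ij · exp(e_ij) / Σ_{k ∈ N_i} exp(e_ik)        (Eq. 4)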

where a_ij is the normalized attention coefficient for the i-th and j-th node pair, e_ij is the symmetric graph attention coefficient computed in Eq. 3, and N_i is the set of neighbors of the i-th node, i.e., the interacting nodes j with A_ij > 0. The purpose of Eq. 4 is to define the attention by considering both the physical structure of the interaction, A_ij, and the graph attention coefficient, e_ij, normalized over the neighborhood.

Based on the attention mechanism, the new feature of each node is updated by considering its neighboring nodes; it is a linear combination of the neighboring node features weighted by the final attention coefficients a_ij:
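A plausible form of this update (Eq. 5), reconstructed from the description above, where x_i′ denotes the updated feature of the i-th node:

x_i′ = Σ_{j ∈ N_i} a_ij x_j        (Eq. 5)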

Furthermore, a gate mechanism is applied to update the node feature, since it is known to significantly boost the performance of GNNs (Zhang et al., 2018). The basic idea is similar to that of ResNet (He et al., 2016), where the residual connection from the input helps to avoid information loss and alleviates the vanishing gradient problem of conventional backpropagation. The gated graph attention can be viewed as a linear combination of x_i′ and x_i, as defined in Eq. 6:
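A plausible reconstruction of Eq. 6, with the gate coefficient c_i defined in the next paragraph:

x_i^out = c_i · x_i′ + (1 − c_i) · x_i        (Eq. 6)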

where c_i = σ(D^T (x_i′ ‖ x_i) + b), D ∈ ℝ^{2F} is a weight vector whose dot product is taken with the vector x_i′ ‖ x_i, and b is a scalar bias. Both D and b are learnable parameters shared among the different nodes. x_i′ ‖ x_i denotes the concatenation of x_i′ and x_i.

We refer to this attention and gate-augmented mechanism as the gate-augmented graph attention layer (GAT), so we can simply write x_i^out = GAT(x_i^in, A). The node embedding can be updated iteratively by GAT, which aggregates information from the neighboring nodes.
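For concreteness, the following is a minimal PyTorch sketch of one gate-augmented graph attention layer as described above. It is an illustrative reconstruction, not the authors' implementation; the class name, tensor shapes, and the use of a dense adjacency matrix are assumptions.

import torch
import torch.nn as nn

class GateAugmentedGAT(nn.Module):
    # One gate-augmented graph attention layer (illustrative sketch).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)       # x_i = W x_i^in
        self.E = nn.Parameter(torch.empty(out_dim, out_dim))  # attention matrix E (Eq. 3)
        self.gate = nn.Linear(2 * out_dim, 1)                 # weight vector D and bias b (Eq. 6)
        nn.init.xavier_uniform_(self.E)

    def forward(self, x_in, A):
        # x_in: (N, F) node features; A: (N, N) adjacency matrix, A_ij > 0 for neighbors.
        x = self.W(x_in)                              # transformed node features
        e = x @ self.E @ x.t()                        # x_i^T E x_j for all pairs (i, j)
        e = e + e.t()                                 # symmetrize: e_ij = e_ji (Eq. 3)
        e = e.masked_fill(A <= 0, float('-inf'))      # restrict attention to neighbors
        a = torch.softmax(e, dim=-1) * A              # normalized coefficients weighted by A_ij (Eq. 4)
        x_new = a @ x                                 # aggregate neighboring features (Eq. 5)
        c = torch.sigmoid(self.gate(torch.cat([x_new, x], dim=-1)))  # gate coefficient c_i
        return c * x_new + (1.0 - c) * x              # gated residual update (Eq. 6)

# Example usage on a random toy graph (self-loops added so every node has a neighbor):
# layer = GateAugmentedGAT(in_dim=4, out_dim=8)
# A = (torch.rand(5, 5) > 0.5).float(); A.fill_diagonal_(1.0)
# out = layer(torch.rand(5, 4), A)   # out has shape (5, 8)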
