Attention and Gate-Augmented Mechanism

Xiao Wang, Sean T. Flannery, Daisuke Kihara

The constructed graphs are used as the input to the GNN. More formally, the graphs are given by the adjacency matrices A_1 and A_2 and the node features x^in = {x_1^in, x_2^in, ..., x_N^in} with x_i^in ∈ ℝ^F, where F is the dimension of the node features.

We first explain the attention mechanism of our GNN. Given the input node features x^in, the pure graph attention coefficient e_ij is defined in Eq. 3 and denotes the relative importance between the i-th and the j-th node:
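A plausible written-out form of Eq. 3, reconstructed from the definitions in the next paragraph (the transformed features x_i, x_j and the matrix E):

e_ij = x_i^T E x_j + x_j^T E x_i        (Eq. 3)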

where x_i and x_j are the transformed feature representations defined by x_i = W x_i^in and x_j = W x_j^in. W, E ∈ ℝ^{F×F} are learnable matrices in the GNN. e_ij and e_ji become identical, satisfying the symmetry of the graph, because the two terms x_i^T E x_j and x_j^T E x_i are added. The coefficient is only computed for pairs i and j with A_ij > 0.

Normalized attention coefficients are then computed for the elements of the adjacency matrices; for element (i, j) they take the following form:
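A plausible reconstruction of Eq. 4, combining a softmax-style normalization of e_ij over the neighborhood N_i with the adjacency entry A_ij (the exact form in the original article may differ):

a_ij = A_ij · exp(e_ij) / Σ_{k ∈ N_i} exp(e_ik)        (Eq. 4)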

where a_ij is the normalized attention coefficient for the i-th and j-th node pair, e_ij is the symmetric graph attention coefficient computed in Eq. 3, and N_i is the set of neighbors of the i-th node, i.e., the interacting nodes j with A_ij > 0. The purpose of Eq. 4 is to define the attention by considering both the physical structure of the interaction, A_ij, and the graph attention coefficient, e_ij, normalized over the neighborhood.

Based on the attention mechanism, the new feature of each node is updated by considering its neighboring nodes; it is a linear combination of the neighboring node features weighted by the final attention coefficients a_ij:
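A plausible form of this update (Eq. 5), reconstructed from the description above, where x_i′ denotes the updated feature of the i-th node:

x_i′ = Σ_{j ∈ N_i} a_ij x_j        (Eq. 5)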

Furthermore, a gate mechanism is applied to update the node feature, since it is known to significantly boost the performance of GNNs (Zhang et al., 2018). The basic idea is similar to that of ResNet (He et al., 2016), where the residual connection from the input helps to avoid information loss and alleviates the vanishing gradient problem of conventional backpropagation. The gated graph attention can be viewed as a linear combination of x_i′ and x_i, as defined in Eq. 6:
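A plausible reconstruction of Eq. 6, with the gate coefficient c_i defined in the next paragraph:

x_i^out = c_i · x_i′ + (1 − c_i) · x_i        (Eq. 6)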

where c_i = σ(D^T (x_i′ ‖ x_i) + b), D ∈ ℝ^{2F} is a weight vector whose dot product is taken with the vector x_i′ ‖ x_i, and b is a scalar bias. Both D and b are learnable parameters shared among the different nodes. x_i′ ‖ x_i denotes the concatenation of x_i′ and x_i.

We refer to this attention and gate-augmented mechanism as the gate-augmented graph attention layer (GAT), so we can simply write x_i^out = GAT(x_i^in, A). The node embedding can be updated iteratively by GAT, which aggregates information from the neighboring nodes.
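For concreteness, the following is a minimal PyTorch sketch of one gate-augmented graph attention layer as described above. It is an illustrative reconstruction, not the authors' implementation; the class name, tensor shapes, and the use of a dense adjacency matrix are assumptions.

import torch
import torch.nn as nn

class GateAugmentedGAT(nn.Module):
    # One gate-augmented graph attention layer (illustrative sketch).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)       # x_i = W x_i^in
        self.E = nn.Parameter(torch.empty(out_dim, out_dim))  # attention matrix E (Eq. 3)
        self.gate = nn.Linear(2 * out_dim, 1)                 # weight vector D and bias b (Eq. 6)
        nn.init.xavier_uniform_(self.E)

    def forward(self, x_in, A):
        # x_in: (N, F) node features; A: (N, N) adjacency matrix, A_ij > 0 for neighbors.
        x = self.W(x_in)                              # transformed node features
        e = x @ self.E @ x.t()                        # x_i^T E x_j for all pairs (i, j)
        e = e + e.t()                                 # symmetrize: e_ij = e_ji (Eq. 3)
        e = e.masked_fill(A <= 0, float('-inf'))      # restrict attention to neighbors
        a = torch.softmax(e, dim=-1) * A              # normalized coefficients weighted by A_ij (Eq. 4)
        x_new = a @ x                                 # aggregate neighboring features (Eq. 5)
        c = torch.sigmoid(self.gate(torch.cat([x_new, x], dim=-1)))  # gate coefficient c_i
        return c * x_new + (1.0 - c) * x              # gated residual update (Eq. 6)

# Example usage on a random toy graph (self-loops added so every node has a neighbor):
# layer = GateAugmentedGAT(in_dim=4, out_dim=8)
# A = (torch.rand(5, 5) > 0.5).float(); A.fill_diagonal_(1.0)
# out = layer(torch.rand(5, 4), A)   # out has shape (5, 8)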
