The constructed graphs are used as the input to the GNN. More formally, each graph consists of an adjacency matrix, $A_1$ or $A_2$, and the node features $x_i \in \mathbb{R}^F$, where $F$ is the dimension of the node feature.
We first explain the attention mechanism of our GNN. Given the input graph $(A, x)$, the pure graph attention coefficient, which denotes the relative importance between the $i$-th and the $j$-th node, is defined in Eq. 3:

$$e_{ij} = x_i'^{\top} E \, x_j' + x_j'^{\top} E \, x_i' \qquad (3)$$
where $x_i'$ and $x_j'$ are the transformed feature representations defined by $x_i' = W x_i$ and $x_j' = W x_j$. $W$ and $E$ are learnable matrices in the GNN. $e_{ij}$ and $e_{ji}$ become identical to satisfy the symmetrical property of the graph by adding $x_i'^{\top} E x_j'$ and $x_j'^{\top} E x_i'$. The coefficient is computed only for pairs $i$ and $j$ with $A_{ij} > 0$.
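As a concrete illustration, the symmetric coefficient of Eq. 3 can be sketched in NumPy. This is a minimal sketch, not the authors' code; the names `X`, `W`, and `E` are our assumptions for the node-feature matrix and the two learnable matrices:

```python
import numpy as np

# Sketch of the symmetric attention coefficient in Eq. 3.
rng = np.random.default_rng(0)
F = 4                          # node feature dimension
n = 3                          # number of nodes
X = rng.normal(size=(n, F))    # node features, one row per node
W = rng.normal(size=(F, F))    # learnable feature transform
E = rng.normal(size=(F, F))    # learnable attention matrix

Xp = X @ W                     # transformed features x_i' = W x_i
# e_ij = x_i'^T E x_j' + x_j'^T E x_i'  (symmetric by construction)
S = Xp @ E @ Xp.T
e = S + S.T
```

Adding the score in both argument orders makes `e` symmetric regardless of `E`, which is the point of the construction.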
Attention coefficients are also computed for the elements of the adjacency matrices. For the element $(i, j)$, they are formulated as:

$$a_{ij} = A_{ij} \, \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})} \qquad (4)$$
where $a_{ij}$ is the normalized attention coefficient for the $i$-th and the $j$-th node pair, $e_{ij}$ is the symmetrical graph attention coefficient computed in Eq. 3, and $\mathcal{N}(i)$ is the set of neighbors of the $i$-th node, i.e., the interacting nodes $j$ with $A_{ij} > 0$. The purpose of Eq. 4 is to consider both the physical structure of the interaction, $A_{ij}$, and the normalized attention coefficient, $e_{ij}$, to define the attention.
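Under our reading of Eq. 4, the scores are softmax-normalized over each node's neighbors and then weighted by the adjacency entry, so non-interacting pairs receive zero attention. A hedged NumPy sketch (the function name and max-subtraction trick for numerical stability are ours):

```python
import numpy as np

def attention_coefficients(e, A):
    """Softmax of e over each node's neighbors, weighted by A (Eq. 4)."""
    # mask non-neighbors so they contribute nothing to the softmax
    masked = np.where(A > 0, e, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)  # stability
    w = np.exp(masked) * A                               # A_ij * exp(e_ij)
    return w / w.sum(axis=1, keepdims=True)

# toy symmetric adjacency with self-loops and uniform scores
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])
e = np.zeros((3, 3))
a = attention_coefficients(e, A)
```

Each row of `a` sums to one over the node's neighborhood, and entries for non-neighbors are exactly zero.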
Based on the attention mechanism, the new feature of each node is updated by considering its neighboring nodes, as a linear combination of the neighboring node features with the final attention coefficient $a_{ij}$:

$$x_i^{att} = \sum_{j \in \mathcal{N}(i)} a_{ij} \, x_j' \qquad (5)$$
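With the coefficients in matrix form, the update above is a single matrix product. A small sketch with hand-picked values (the variable names are ours):

```python
import numpy as np

# Eq. 5: each node's new feature is the attention-weighted sum of its
# neighbors' transformed features. Rows of `a` sum to 1 by Eq. 4.
a = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
Xp = np.arange(12, dtype=float).reshape(3, 4)  # transformed features x_j'
X_att = a @ Xp                                 # x_i_att = sum_j a_ij x_j'
```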
Furthermore, a gate mechanism is applied to update the node feature, since it is known to significantly boost the performance of GNNs (Zhang et al., 2018). The basic idea is similar to that of ResNet (He et al., 2016), where the residual connection from the input helps to avoid information loss, alleviating the vanishing-gradient problem of conventional backpropagation. The gated graph attention can be viewed as a linear combination of $x_i^{att}$ and $x_i$, as defined in Eq. 6:

$$x_i^{new} = z_i \, x_i^{att} + (1 - z_i) \, x_i \qquad (6)$$
where $z_i = \sigma\!\left(D \cdot (x_i^{att} \,\Vert\, x_i) + b\right)$, $D$ is a weight vector that is multiplied (dot product) with the vector $(x_i^{att} \,\Vert\, x_i)$, and $b$ is a constant value. Both $D$ and $b$ are learnable parameters and are shared among different nodes. $(x_i^{att} \,\Vert\, x_i)$ denotes the concatenation of $x_i^{att}$ and $x_i$.
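The gate can be sketched as a scalar per node that interpolates between the attended feature and the original input. This is a minimal sketch under our assumptions (a scalar gate, with `D` of length 2F shared across nodes):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gated_update(X_att, X, D, b):
    """Eq. 6: z_i = sigmoid(D . [x_i_att ; x_i] + b), then interpolate."""
    cat = np.concatenate([X_att, X], axis=1)  # concatenation (x_att || x)
    z = sigmoid(cat @ D + b)[:, None]         # one scalar gate per node
    return z * X_att + (1.0 - z) * X          # residual-style mixing

rng = np.random.default_rng(1)
F, n = 4, 3
X = rng.normal(size=(n, F))       # input features
X_att = rng.normal(size=(n, F))   # attended features from Eq. 5
D = rng.normal(size=2 * F)        # learnable, shared among nodes
b = 0.0                           # learnable constant
out = gated_update(X_att, X, D, b)
```

Because $z_i \in (0, 1)$, each output component is a convex combination of the corresponding components of $x_i^{att}$ and $x_i$, which is what preserves information from the input.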
We refer to the combined attention and gate-augmented mechanism as the gate-augmented graph attention layer (GAT). Then, we can simply denote $x^{new} = \mathrm{GAT}(x, A)$. The node embedding can be iteratively updated by $x^{(l+1)} = \mathrm{GAT}(x^{(l)}, A)$, which aggregates information from neighboring nodes.
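Putting the pieces together, one layer and its iterative application can be sketched end to end. All parameter names (`W`, `E`, `D`, `b`) and the stacking of three layers are our illustrative assumptions, not the authors' exact architecture:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gat_layer(X, A, W, E, D, b):
    """One gate-augmented graph attention layer (Eqs. 3-6), sketched."""
    Xp = X @ W                                   # transformed features
    S = Xp @ E @ Xp.T
    e = S + S.T                                  # symmetric scores (Eq. 3)
    masked = np.where(A > 0, e, -np.inf)
    masked = masked - masked.max(axis=1, keepdims=True)
    w = np.exp(masked) * A
    a = w / w.sum(axis=1, keepdims=True)         # attention (Eq. 4)
    X_att = a @ Xp                               # neighbor aggregation (Eq. 5)
    cat = np.concatenate([X_att, X], axis=1)
    z = sigmoid(cat @ D + b)[:, None]            # gate (Eq. 6)
    return z * X_att + (1.0 - z) * X

rng = np.random.default_rng(2)
F, n = 4, 5
X = rng.normal(size=(n, F))
# chain graph with self-loops as a toy adjacency matrix
A = np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
params = [(rng.normal(size=(F, F)), rng.normal(size=(F, F)),
           rng.normal(size=2 * F), 0.0) for _ in range(3)]
for W, E, D, b in params:                        # x^(l+1) = GAT(x^(l), A)
    X = gat_layer(X, A, W, E, D, b)
```

Stacking layers widens each node's receptive field: after $l$ layers, a node's embedding aggregates information from its $l$-hop neighborhood.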