Distance pruning and topology refinement aim to further improve the performance of node embedding. The distance constraint indicates the existence of edges and non-edges between some node pairs, whereby clamping the edge probability of these node pairs leads to a clearer network topology . Then, we take instead of , and repeat the edge probability learning process to gradually refine the node embedding matrix .
Given the distance constraint , we may calculate two deterministic sets: an edge set and a non-edge set . The calculation is based on Observation 1 and Observation 2 [11].
For any given nodes , and in an undirected and unweighted graph , if , then holds.
Let be the sets of nodes with the same distance from . For two given nodes and , if for any node , , then holds.
Note that Observation 2 needs observed distances to all the other nodes in , which cannot be met under our assumption. Therefore, only contains the observed direct neighbors of the distance monitor nodes .
After the calculation of and , we clamp the probability of edges in to 1, and the probability of non-edges in to 0. Let denote the non-edge mask matrix where if . Then, the distance pruning process can be represented as follows:
Then, we assign the adjacency matrix of as the masked . We ignore in LFD-NC as .
We summarize our algorithm in Algorithm 1.
The time complexity of LFD-NC is the same as that of GCN. The complexity of line 1 in Algorithm 1 is . In GCN, it is usually satisfied that ; thus, the complexity of line 3 is . The complexity of lines 4 and 5 is . Since , the total complexity of LFD-NC is dominated by GCN.
end for
Do you have any questions about this protocol?
Post your question to gather feedback from the community. We will also invite the authors of this article to respond.