2.5. Distance Pruning and Topology Refinement

Qiang Wei; Guangmin Hu


This method is extracted from the research article published in Entropy (Basel), June 2021:

Unifying Node Labels, Features, and Distances for Deep Network Completion

DOI: 10.3390/e23060771


Distance pruning and topology refinement aim to further improve the quality of the node embedding. The distance constraint $D$ reveals the existence of edges and non-edges between some node pairs, so clamping the edge probabilities of these pairs yields a cleaner network topology $G_D$. We then use $G_D$ in place of $G_L$ and repeat the edge-probability learning process to gradually refine the node embedding matrix $H$.

Given the distance constraint $D$, we can compute two deterministic sets: an edge set $E_D \subset M_Z$ and a non-edge set $E_D^{\emptyset} \subset M_Z$. The computation relies on Observations 1 and 2 from [11].

Observation 1. For any nodes $u$, $v$, and $w$ in an undirected, unweighted graph $G(V, E)$, if $|d_{uv} - d_{uw}| \geq 2$, then $(w, v) \notin E$.
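The following is a minimal Python sketch of how Observation 1 could be turned into non-edges. It assumes, as a hypothetical data layout, that the observed distances of each monitor $u$ are stored as a dict mapping node index to $d_{uv}$. Pairs are reported as sorted tuples, and the restriction of the result to the candidate pairs in $M_Z$ is left out.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def observation1_non_edges(dist_from_u: Dict[int, int]) -> Set[Tuple[int, int]]:
    """Pairs (v, w) with v < w that Observation 1 rules out as edges.

    dist_from_u maps each node with an observed distance to d_{uv}.
    """
    non_edges: Set[Tuple[int, int]] = set()
    for v, w in combinations(sorted(dist_from_u), 2):
        # Adjacent nodes in an unweighted, undirected graph differ in their
        # distance to u by at most 1, so a gap of 2 or more excludes (v, w).
        if abs(dist_from_u[v] - dist_from_u[w]) >= 2:
            non_edges.add((v, w))
    return non_edges


def non_edge_set(distances: Dict[int, Dict[int, int]]) -> Set[Tuple[int, int]]:
    """Union of the Observation 1 non-edges over all distance monitors."""
    result: Set[Tuple[int, int]] = set()
    for dist_from_u in distances.values():
        result |= observation1_non_edges(dist_from_u)
    return result
```

For the path graph 0-1-2-3 monitored from node 0, the distances `{0: 0, 1: 1, 2: 2, 3: 3}` exclude the pairs `(0, 2)`, `(0, 3)`, and `(1, 3)`.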

Observation 2. Let $L_u^i = \{v \mid d_{uv} = i, v \in V\}$ be the set of nodes at distance $i$ from $u$. For two nodes $v \in L_u^i$ and $w \in L_u^{i+1}$, if $(x, w) \notin E$ for every node $x \in L_u^i \setminus \{v\}$, then $(v, w) \in E$.
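Below is a Python sketch of Observation 2 for a single source node $u$. It illustrates the observation only, and it assumes distances from $u$ to every node are known, which is a stronger condition than LFD-NC can satisfy. Known non-edges are passed in as a set of `frozenset` pairs. This layout is hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def observation2_edges(
    dist_from_u: Dict[int, int], non_edges: Set[FrozenSet[int]]
) -> Set[FrozenSet[int]]:
    """Edges implied by Observation 2 for one source node u.

    dist_from_u must give d_{uv} for EVERY node v in the graph;
    non_edges holds pairs {x, w} already known not to be edges.
    """
    layers: Dict[int, List[int]] = defaultdict(list)
    for node, d in dist_from_u.items():
        layers[d].append(node)  # L_u^d: nodes at distance d from u

    edges: Set[FrozenSet[int]] = set()
    for i in sorted(layers):
        for w in layers.get(i + 1, []):
            for v in layers[i]:
                others = [x for x in layers[i] if x != v]
                # w needs at least one neighbour in L_u^i; if every other
                # candidate is excluded, that neighbour must be v.
                if all(frozenset((x, w)) in non_edges for x in others):
                    edges.add(frozenset((v, w)))
    return edges
```

When every layer holds a single node, as in a path graph seen from one end, each node at distance $i+1$ must attach to the unique node at distance $i$, so all edges are recovered without any known non-edges.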

Note that Observation 2 requires the distances from $u$ to all other nodes in $G$ to be observed, and our assumption does not guarantee this. Therefore, $E_D$ contains only the edges between the distance monitor nodes $V_{OD}$ and their observed direct neighbors, that is, the nodes at observed distance 1.
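Under that restriction, building $E_D$ reduces to collecting every monitor/node pair at observed distance 1. The sketch below assumes the same hypothetical layout as before: a dict from each monitor $u \in V_{OD}$ to its observed distances.

```python
from typing import Dict, FrozenSet, Set


def monitor_edges(distances: Dict[int, Dict[int, int]]) -> Set[FrozenSet[int]]:
    """E_D as described: pairs {u, v} with u a distance monitor and d_uv == 1."""
    return {
        frozenset((u, v))
        for u, dist_from_u in distances.items()
        for v, d in dist_from_u.items()
        if d == 1  # v is an observed direct neighbour of monitor u
    }
```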

After computing $E_D$ and $E_D^{\emptyset}$, we clamp the probability of each edge in $E_D$ to 1 and that of each non-edge in $E_D^{\emptyset}$ to 0. Let $M_{non\text{-}edge} = [m(w, v)] \in \{0, 1\}^{N \times N}$ denote the non-edge mask matrix, where $m(w, v) = 1$ if $(w, v) \in E_D^{\emptyset}$ and $m(w, v) = 0$ otherwise. The distance pruning process can then be written as

$$A_{G_X}[M_{non\text{-}edge}] = 0,$$

that is, every entry of $A_{G_X}$ whose mask value is 1 is set to 0.

The adjacency matrix $A_{G_D}$ of $G_D$ is then set to the masked $A_{G_X}$. In LFD-NC we ignore $E_D$, because $|E_D| \ll |E_D^{\emptyset}|$.
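The masking step can be sketched in plain Python. The sketch assumes that nodes are indexed $0, \dots, N-1$ and that matrices are nested lists. These are illustrative choices, not the authors' implementation. `non_edge_mask` builds $M_{non\text{-}edge}$ symmetrically, since the graph is undirected, and `distance_prune` returns $A_{G_D}$.

```python
from typing import Iterable, List, Tuple


def non_edge_mask(n: int, non_edges: Iterable[Tuple[int, int]]) -> List[List[int]]:
    """M_non-edge as an n x n 0/1 matrix: 1 marks a known non-edge."""
    mask = [[0] * n for _ in range(n)]
    for v, w in non_edges:
        # The graph is undirected, so mark both orientations.
        mask[v][w] = 1
        mask[w][v] = 1
    return mask


def distance_prune(a_gx: List[List[float]], mask: List[List[int]]) -> List[List[float]]:
    """Return A_GD: a copy of A_GX with every masked entry clamped to 0."""
    return [
        [0.0 if m else p for p, m in zip(row, mask_row)]
        for row, mask_row in zip(a_gx, mask)
    ]
```

For a 3-node example with the single non-edge `(0, 2)`, the entries at positions `[0][2]` and `[2][0]` become `0.0` and all other probabilities are left unchanged.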

We summarize our algorithm in Algorithm 1.

The time complexity of LFD-NC is the same as that of GCN. The complexity of line 1 in Algorithm 1 is $O(|M_Z|)$. In GCN, it usually holds that $F > F^{(1)} \geq F^{(2)}$, so the complexity of line 3 is $O(N^2 F + N F^2)$. The complexity of lines 4 and 5 is $O(|M_Z|)$. Since $|M_Z| < N^2$, the total complexity of LFD-NC is dominated by the GCN step in line 3.
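Summing the per-line costs stated above gives the bound below. This tally is a sketch that treats the number of refinement rounds $R$ as a constant and counts the loop control, the reassignment in line 6, and the output as negligible bookkeeping. These are assumptions.

$$
\underbrace{O(|M_Z|)}_{\text{line 1}} + R\left[\underbrace{O(N^2 F + N F^2)}_{\text{line 3}} + \underbrace{O(|M_Z|)}_{\text{lines 4 and 5}}\right] = O\!\left(R\,(N^2 F + N F^2)\right),
$$

because $|M_Z| < N^2 \leq N^2 F$. With $R$ fixed, this reduces to $O(N^2 F + N F^2)$, the cost of the GCN step.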

Algorithm 1. LFD-NC

1: $A_{G_L} = A_O + W_L$

2: for $r = 1, 2, \dots, R$ do

3: $\quad H = f(A_{G_L}, X)$

4: $\quad A_{G_X} = A_O + P_X$

5: $\quad A_{G_X}[M_{non\text{-}edge}] = 0$

6: $\quad A_{G_L} = A_{G_D} = A_{G_X}$

7: end for

8: output $A_{G_D}$
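The loop in Algorithm 1 can be sketched in Python as shown below. This is not the authors' code. `embed` stands in for the GCN $f$, and `edge_prob` stands in for the step that turns $H$ into edge probabilities $P_X$. Both are hypothetical callables supplied by the caller. $W_L$ is taken as given, and matrices are plain nested lists.

```python
from typing import Any, Callable, List

Matrix = List[List[float]]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply_mask(a: Matrix, mask: List[List[int]]) -> Matrix:
    """Zero every entry whose mask value is 1, i.e. A[M] = 0."""
    return [[0.0 if m else p for p, m in zip(ra, rm)] for ra, rm in zip(a, mask)]


def refine_topology(
    a_o: Matrix,
    w_l: Matrix,
    x: Any,
    mask: List[List[int]],
    rounds: int,
    embed: Callable[[Matrix, Any], Any],  # hypothetical stand-in for the GCN f
    edge_prob: Callable[[Any], Matrix],   # hypothetical stand-in for H -> P_X
) -> Matrix:
    """Sketch of Algorithm 1; returns A_GD after `rounds` refinement rounds."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    a_gl = mat_add(a_o, w_l)               # line 1: A_GL = A_O + W_L
    for _ in range(rounds):                # line 2: for r = 1..R
        h = embed(a_gl, x)                 # line 3: H = f(A_GL, X)
        a_gx = mat_add(a_o, edge_prob(h))  # line 4: A_GX = A_O + P_X
        a_gx = apply_mask(a_gx, mask)      # line 5: A_GX[M_non-edge] = 0
        a_gl = a_gd = a_gx                 # line 6: A_GL = A_GD = A_GX
    return a_gd                            # line 8: output A_GD
```

With stand-ins that ignore their input and return fixed probabilities, the result is simply $A_O + P_X$ with the masked entries zeroed. In an actual run, `embed` would be the GCN applied to the refined adjacency $A_{G_L}$ in each round.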

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
