2.2. Method for multiple source localization based on the pre-set observations

Xue Yang; Zhiliang Zhu; Hai Yu; Yuli Zhao; Ying Wang

Improve Research Reproducibility A Bio-protocol resource

Home
Protocols

Concise Method

2.2. Method for multiple source localization based on the pre-set observations

XY Xue Yang

ZZ Zhiliang Zhu

HY Hai Yu

YZ Yuli Zhao

YW Ying Wang

This method is extracted from research article: Phys Lett A, Sep 2020

Locating multiple information sources in social networks based on the naming game

DOI: 10.1016/j.physleta.2020.126908

Ask a question

Favorite

We denote $S = {s_{1}, . . ., s_{k}} \subset V$ , which contains the actual information sources. Our goal is to identify set S from the network topology and the local sources provided by the observations. In the previous Section, we introduced a method for selecting observation nodes. In this Section, we divide the observations into sets according to the simplified network $G_{2}$ . Then, we find the source of each observation set according to the information provided by observations.

The localization of multiple information sources is more complicated than identifying a single source. This is because different observations may receive information from different sources in the case of multiple sources. In this section, we discuss how to divide observations into several sets. Compared with the observations in different set, the observations in the same set have higher probability of receiving information from the same source. We overcome this challenge through label propagation. We propose the observation division algorithm in Algorithm 1 to find some suitable observation sets. First, we take all the observations as the sources to carry out reverse label propagation and find the nodes that can spread the information to each observation in a certain number of steps. In a network with diameter $〈 d 〉$ , the longest distance between any node and its source is $〈 d 〉$ . Without any loss of generality, we choose $〈 d 〉 / 2$ steps to find the nodes that can spread the information to each observation. Thus, if a node receives labels from multiple observations, these observations are considered to be propagated by the same source, that is, they belong to the same set. We denote an $N \times N$ matrix OD to record the observation labels received by each node. Initially, if $j \in O$ and $i = j$ , $OD (i, j) = 1$ , else $OD (i, j) = 0$ , where $1 < i, j < N$ . It means that label propagation starts at each observation. Subsequently, the value of $OD (i, j)$ changes from 0 to 1, indicating that node i propagates the label of observation j. After $〈 d 〉 / 2$ rounds of tag propagation, we obtain the matrix OD for the result of the reverse label propagation, and the i-th row of this matrix contains the observations that can be spread by node i in $〈 d 〉 / 2$ steps. It is straightforward that if a node can spread the information to an observation, then it also can spread the information to the nodes that the observation can reach. Thus, we reorganize the result matrix in an iterative way. Then, we eliminate all-zero rows and the rows that can be contained by others. Finally, we obtain the matrix OD in which the non-zero items in each row indicate the observations getting information from the same source node. A parameter $r_{OD}$ is defined as the number of rows of matrix OD, and it also represents the number of observation sets.

After dividing the set of observations, another problem to be solved is to find the source of each set. It is obvious that the observations receive information later than their local source. However, it is impossible to set all the nodes as observations. In general, the longer the propagation path from the source to the node, later the node receives the information. Hence, in each observation set, for each observation $o_{t}$ , we update the probability that a node is the source of the observation set with the difference between the distance of the node to $o_{t}$ and the distance of the node to its local source set. When all the observations are considered, the node with the highest probability is the source of the underlying set. In detail, in set $k (1 < k < r_{OD})$ , We denote $\vec{p_{k}^{(t)}} = (p_{k, 1}^{(t)}, . . . p_{k, N}^{(t)})$ as a probability vector, where $p_{k, i}^{(t)}$ represents the probability that node i is the source of the k-th observation set when considering the information provided by the first t observations. Initially, any node in set V is equally likely to be the source, i.e. for $\forall i \in 1, 2, \dots, N$ , $p_{k, i},^{(0)}$ equals $1 / N$ . Then, for each observation $o_{t}$ , we quantify the effect of the information provided by $o_{t}$ on the value of $p_{k}^{(t)}$ . If $OD (k, o_{t}) = 0$ , it means that $o_{t}$ does not belong to the k-th observation set. Thus, we have $p_{k}^{(t)} = p_{k}^{(t - 1)}$ . If $OD (k, o_{t}) = 1$ , it means that $o_{t}$ belongs to the k-th set. It is obvious that the node in set $N L S_{o_{t}}$ must not be the source. We denote $[u, v]$ as the shortest path between nodes u and v. Let $d (u, v)$ be the length of the shortest path between u and v in $G_{2}$ . When $d (i, o_{t})$ =inf, node i can not propagate the information to the observation $o_{t}$ . Thus, node i must also not be the propagation source. We denote Q as the set of all nodes that satisfy $d (i, o_{t})$ =inf $(i \in V)$ . For the sake of calculation, we denote $T = L S_{o_{t}} \cup Q$ and update the value of each $p_{k}^{(t)}$ according to (1).

where $n_{z}$ represents the number of nodes which are not in T.

Moreover, we denote $d (i, L S_{o_{t}}) = {d (i, l s_{o_{t}, r}) | \max_{l s_{o_{t}, r} \in L S_{o_{t}}} (d (i, l s_{o_{t}, r},)), d (i, l s_{o_{t}, r} \neq \inf}$ as the distance from node i to set $L S_{o_{t}}$ . Then, we denote $η = d (i, o_{t}) - d (i, L S_{o_{t}})$ as the deduction between $d (i, o_{t})$ and $d (i, L S_{o_{t}})$ , and η is larger than zero. The node in $L S_{o_{t}}$ would receive the information earlier than node $o_{t}$ . Thus, the larger the value of η, the greater the probability that node i is the propagation source of the k-th observation set. In order to avoid negative distance difference, we define $ϵ = \min_{l s_{o_{t}, r} \in L S_{o_{t}}} η$ . Based on this, we further update the probability that node i is the source of the k-th observation set.

where $α = \sum_{r = 1}^{t} OD (k, r)$ , it means that $o_{t}$ is the α-th observation in the k-th set.

When t equals $| O |$ , the information provided by each observation is evaluated in the estimation process of the source node, that is, the probability update process stopped. The node with the biggest probability of (3) is considered as the source of the k-th observation set.

Finally, a set of the estimated multiple information sources is obtained by merging the source of each observation set, which we denote as $Ω = {β_{1}, . . ., β_{r_{OD}}} \subset V$ .

Since January 2020 Elsevier has created a COVID-19 resource centre with free information in English and Mandarin on the novel coronavirus COVID-19. The COVID-19 resource centre is hosted on Elsevier Connect, the company's public news and information website. Elsevier hereby grants permission to make all its COVID-19-related research that is available on the COVID-19 resource centre - including this research content - immediately available in PubMed Central and other publicly funded repositories, such as the WHO COVID database with rights for unrestricted research re-use and analyses in any form or by any means with acknowledgement of the original source. These permissions are granted for free by Elsevier for as long as the COVID-19 resource centre remains active.

Do you have any questions about this protocol?

Post your question to gather feedback from the community. We will also invite the authors of this article to respond.

Post a Question

0 Q&A

Share your protocol with your peers.

Submit a Preprint Protocol