We denote by $S$ the set that contains the actual information sources. Our goal is to identify the set $S$ from the network topology and the local sources provided by the observations. In the previous section, we introduced a method for selecting observation nodes. In this section, we divide the observations into sets according to the simplified network $G'$, and then find the source of each observation set from the information provided by its observations.
The localization of multiple information sources is more complicated than identifying a single source, because with multiple sources different observations may receive information from different sources. In this section, we discuss how to divide the observations into several sets such that observations in the same set have a higher probability of having received information from the same source than observations in different sets. We overcome this challenge through label propagation: the observation division procedure in Algorithm 1 finds suitable observation sets. First, we take all the observations as seeds and carry out reverse label propagation to find the nodes that can spread information to each observation within a certain number of steps. In a network with diameter $D$, the longest distance between any node and its source is $D$; without loss of generality, we therefore choose $D$ steps to find the nodes that can spread information to each observation. Thus, if a node receives labels from multiple observations, these observations are considered to have been reached from the same source, that is, they belong to the same set. We use an $N \times M$ matrix $OD$ ($N$ nodes, $M$ observations) to record the observation labels received by each node. Initially, $OD_{ij} = 1$ if node $i$ is observation $o_j$ and $OD_{ij} = 0$ otherwise, where $1 \le i \le N$ and $1 \le j \le M$; that is, label propagation starts at each observation. Subsequently, the value of $OD_{ij}$ changes from 0 to 1 when node $i$ propagates the label of observation $o_j$. After $D$ rounds of label propagation, we obtain the matrix $OD$ as the result of the reverse label propagation, and the $i$-th row of this matrix contains the observations that node $i$ can reach within $D$ steps. It is straightforward that if a node can spread information to an observation, it can also spread information to the nodes that the observation can reach. Thus, we reorganize the rows of the result matrix in an iterative way, and then eliminate all-zero rows and rows that are contained in other rows.
Finally, we obtain the matrix $OD$ in which the non-zero entries of each row indicate observations that received information from the same source node. We define $K$ as the number of rows of $OD$; it also represents the number of observation sets.
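As a concrete illustration, the observation-division step can be sketched in Python. This is a minimal sketch under our own assumptions, not the authors' implementation: the graph is an adjacency dict, the function name `divide_observations` is ours, and the row-merging step is realized by simply keeping the maximal label sets.

```python
from collections import deque

def divide_observations(adj, observations, D):
    """Sketch of observation division via reverse label propagation.

    adj:          node -> list of neighbours (undirected graph)
    observations: list of observation nodes
    D:            propagation horizon (e.g. the network diameter)
    Returns a list of observation sets (as Python sets).
    """
    # Reverse label propagation: a D-step BFS from each observation
    # marks every node that could have spread information to it.
    labels = {v: set() for v in adj}   # node -> observation labels received
    for o in observations:
        dist = {o: 0}
        queue = deque([o])
        while queue:
            u = queue.popleft()
            if dist[u] == D:
                continue
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for v in dist:
            labels[v].add(o)
    # A node holding several labels ties those observations to a common
    # potential source. Drop empty rows and rows contained in others.
    rows = sorted((s for s in labels.values() if s), key=len, reverse=True)
    sets = []
    for s in rows:
        if not any(s <= t for t in sets):
            sets.append(s)
    return sets
```

On a path graph 1–7 with observations 1 and 7, a horizon of 2 keeps the two observations in separate sets, while a horizon of 3 merges them into one set because node 4 can reach both within 3 steps.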
After dividing the set of observations, the remaining problem is to find the source of each set. It is obvious that an observation receives information later than its local source. However, it is impossible to set all the nodes as observations. In general, the longer the propagation path from the source to a node, the later the node receives the information. Hence, in each observation set, for each observation $o_t$, we update the probability that a node is the source of the set using the difference between the distance from the node to $o_t$ and the distance from the node to the local source set of $o_t$. When all the observations have been considered, the node with the highest probability is taken as the source of the underlying set. In detail, for the $k$-th set $O_k$, we denote by $p_t^k$ a probability vector, where $p_t^k(i)$ represents the probability that node $i$ is the source of the $k$-th observation set when considering the information provided by the first $t$ observations. Initially, any node in set $V$ is equally likely to be the source, i.e., $p_0^k(i) = 1/N$ for every $i \in V$. Then, for each observation $o_t$, we quantify the effect of the information provided by $o_t$ on the value of $p_t^k$. If $o_t \notin O_k$, observation $o_t$ does not belong to the $k$-th observation set, and we simply set $p_t^k = p_{t-1}^k$. If $o_t \in O_k$, observation $o_t$ belongs to the $k$-th set. It is obvious that a node in $O_k$ must not be the source. We denote by $P(u, v)$ the shortest path between nodes $u$ and $v$, and let $d'(u, v)$ be the length of the shortest path between $u$ and $v$ in $G'$. When $d'(i, o_t) = \infty$, node $i$ cannot propagate information to the observation $o_t$; thus, node $i$ must also not be the propagation source. We denote by $Q$ the set of all nodes that satisfy $d'(i, o_t) = \infty$. For the sake of calculation, we denote $T = O_k \cup Q$ and update the value of each $p_t^k(i)$ according to (1):

$$p_t^k(i) = \begin{cases} 0, & i \in T, \\ p_{t-1}^k(i) + \dfrac{1}{N_{\bar{T}}} \sum_{j \in T} p_{t-1}^k(j), & i \notin T, \end{cases} \qquad (1)$$
where $N_{\bar{T}}$ represents the number of nodes which are not in $T$.
Moreover, we denote by $d'(i, ls_t)$ the distance from node $i$ to the local source set $ls_t$ of observation $o_t$. Then, we denote by $\eta_i = d'(i, o_t) - d'(i, ls_t)$ the difference between the two distances. A node in $ls_t$ receives the information earlier than node $o_t$; thus, the larger the value of $\eta_i$, the greater the probability that node $i$ is the propagation source of the $k$-th observation set. In order to avoid negative distance differences, we define $\hat{\eta}_i = \max(\eta_i, 0)$. Based on this, we further update the probability that node $i$ is the source of the $k$-th observation set according to (2):

$$p_t^k(i) = \frac{(\alpha - 1)\, p_{t-1}^k(i) + \hat{\eta}_i \big/ \sum_{j \notin T} \hat{\eta}_j}{\alpha}, \qquad (2)$$
where $\alpha$ is the index such that $o_t$ is the $\alpha$-th observation in the $k$-th set.
When $t$ equals $M$, the information provided by every observation has been evaluated in the estimation process, and the probability update process stops. The node with the biggest probability is considered the source of the $k$-th observation set, i.e.,

$$\hat{s}_k = \arg\max_{i \in V} p_M^k(i). \qquad (3)$$
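The per-set estimation can likewise be sketched in Python. This is a hedged sketch under our own assumptions, not the paper's exact update rule: `local_source` is assumed to map each observation to its known local source node, distances are plain BFS hop counts, infeasible nodes and the observations themselves are ruled out with their probability mass redistributed over the remaining candidates, and the non-negative distance differences are averaged over the observations of the set.

```python
from collections import deque

def bfs_dist(adj, src):
    """Shortest-path (hop) distances from src in an adjacency-dict graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def estimate_source(adj, obs_set, local_source):
    """Sketch of per-set source estimation (our assumptions, see text).

    obs_set:      the observations of one set
    local_source: observation -> its local source node
    Returns the node with the highest final probability.
    """
    nodes = list(adj)
    p = {i: 1.0 / len(nodes) for i in nodes}   # uniform prior over nodes
    for t, o in enumerate(obs_set, start=1):
        d_o = bfs_dist(adj, o)                 # distances to the observation
        d_ls = bfs_dist(adj, local_source[o])  # distances to its local source
        # Rule out nodes that cannot reach o, plus the observations themselves;
        # redistribute their probability mass over the surviving candidates.
        excluded = {i for i in nodes if i not in d_o} | set(obs_set)
        alive = [i for i in nodes if i not in excluded]
        mass = sum(p[i] for i in excluded)
        for i in excluded:
            p[i] = 0.0
        for i in alive:
            p[i] += mass / len(alive)
        # Non-negative distance difference: large when node i lies "behind"
        # the local source, i.e. on the source side of the observation.
        eta = {i: max(d_o[i] - d_ls.get(i, 0), 0) for i in alive}
        z = sum(eta.values()) or 1.0
        for i in alive:                        # running average over observations
            p[i] = ((t - 1) * p[i] + eta[i] / z) / t
    return max(p, key=p.get)
```

On a path graph 1–7 with observation set {4, 6} and local sources 3 and 5 respectively, the estimate falls on the source side of both observations (a node in {1, 2, 3}); exact ties are broken by node iteration order.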
Finally, the set of estimated information sources, which we denote as $\hat{S} = \{\hat{s}_1, \dots, \hat{s}_K\}$, is obtained by merging the estimated source of each observation set.