For performance assessment, matching results were compared against the manual annotation.
We calculated the performance metrics for FLAP from three counts: the number of addresses that are correct matches, the number of address samples, and the number of addresses that can be matched in manual annotation (see Section 2.10).
As described in Section 2.6, FLAP also outputs the probability of a match as a score. We use this score to illustrate the usability of FLAP. For this analysis, a match was counted as correct if any of the five highest-scoring candidates was a correct mapping. Given the score and a score threshold, the confidence score is cross-tabulated against matching correctness.
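The cross-tabulation of score against correctness can be illustrated with a minimal sketch. The function name `cross_tabulate`, the cell labels, the sample data, and the use of `>=` for the threshold comparison are all assumptions made for illustration; they are not taken from the FLAP implementation.

```python
from collections import Counter


def cross_tabulate(scores, correct, threshold):
    """Count samples by (score >= threshold, match is correct).

    Assumption: a score equal to the threshold counts as "high".
    """
    counts = Counter()
    for score, is_correct in zip(scores, correct):
        counts[(score >= threshold, bool(is_correct))] += 1
    # Hypothetical cell labels for the 2x2 table.
    return {
        "high_correct": counts[(True, True)],
        "high_incorrect": counts[(True, False)],
        "low_correct": counts[(False, True)],
        "low_incorrect": counts[(False, False)],
    }


# Made-up scores and correctness flags, for illustration only.
scores = [0.95, 0.80, 0.40, 0.30, 0.90]
correct = [True, True, False, True, False]
table = cross_tabulate(scores, correct, threshold=0.5)
print(table)
```

Each sample lands in exactly one of the four cells, so the cell counts always sum to the number of address samples.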
The following metrics are calculated:
A lift curve was generated to demonstrate the usability of FLAP at different confidence score thresholds (Figure 3).
Performance of FLAP at different thresholds of classifier score.
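As a rough sketch of how points on a lift curve could be computed, the example below uses one common definition of lift: the correct-match rate among samples at or above a threshold, divided by the overall correct-match rate. Whether FLAP's evaluation used exactly this definition is an assumption, and the function name `lift_at_thresholds` and the sample data are hypothetical.

```python
def lift_at_thresholds(scores, correct, thresholds):
    """Return (threshold, coverage, lift) tuples.

    coverage: fraction of samples with score >= threshold.
    lift: correct-match rate among those samples divided by the
          overall correct-match rate (an assumed, common definition).
    """
    n = len(scores)
    base_rate = sum(correct) / n
    points = []
    for t in thresholds:
        selected = [c for s, c in zip(scores, correct) if s >= t]
        if not selected or base_rate == 0:
            continue  # lift is undefined for this threshold
        rate = sum(selected) / len(selected)
        points.append((t, len(selected) / n, rate / base_rate))
    return points


# Made-up scores and correctness flags, for illustration only.
scores = [0.9, 0.8, 0.6, 0.4]
correct = [True, True, False, False]
points = lift_at_thresholds(scores, correct, thresholds=[0.5, 0.7])
for threshold, coverage, lift in points:
    print(f"threshold={threshold:.2f} coverage={coverage:.2f} lift={lift:.2f}")
```

A lift above 1 at a given threshold indicates that matches scoring at or above that threshold are correct more often than matches overall.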