We evaluated the performance of the final models using four approaches. First, a frame-wise evaluation compares the ball status in the ground truth to the prediction in each frame. As performance metrics, we computed the accuracy and F1-score for each model. Furthermore, we compared each prediction to a random-guessing baseline per match, given by the percentage of in-game frames; we refer to this difference as the knowledge gain, calculated for each match as the prediction accuracy minus the percentage of in-game frames. However, good frame-wise performance does not necessarily translate into correctly identified match interruptions, since all consecutive frames of a stoppage must carry the same label. For example, a single ball-in detection within a long streak of ball-out labels splits one stoppage into two.
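As a minimal sketch of the frame-wise evaluation, the metrics above can be computed as follows. The function name, the binary label encoding (1 = ball in play, 0 = ball out of play), and the choice of the ball-out class as the positive class for the F1-score are illustrative assumptions, not specified in the text.

```python
def frame_wise_metrics(y_true, y_pred):
    """Frame-wise accuracy, F1-score and knowledge gain.

    y_true, y_pred: sequences of 1 (ball in play) / 0 (ball out of play).
    The F1-score here treats ball-out frames as the positive class
    (an assumption; the paper does not state the positive class).
    """
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # F1-score for the ball-out class (label 0)
    tp = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0

    # Knowledge gain: accuracy minus the random-guessing baseline,
    # i.e. the share of in-game frames in this match.
    in_game_share = sum(y_true) / n
    knowledge_gain = accuracy - in_game_share
    return accuracy, f1, knowledge_gain
```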
Consequently, and second, model performance was evaluated stoppage-wise. The basic idea is to extract stoppages from the ground truth and the predictions and to search for matching pairs between them. Stoppages are extracted by identifying the frames at which the match was interrupted and resumed: the first and last frames with the ball-out label define the start and end of a stoppage, respectively. A suitable metric to compare predicted stoppages with actual ones is the intersection over union (IoU), which is common in object detection benchmarks31,32. The IoU of a pair of real and predicted stoppages is the overlap time between them divided by the overall time covered by both stoppages, i.e., the overlap time plus the sum of the non-overlapping durations. For our paper, two stoppages are matched if their IoU is at least 50%, which guarantees that each real stoppage is matched with at most one predicted stoppage and vice versa.
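The IoU computation and the 50% matching rule can be sketched as follows, assuming stoppages are represented as (start, end) frame intervals; the function names and interval representation are hypothetical.

```python
def interval_iou(a, b):
    """IoU of two stoppages given as (start, end) intervals on the frame axis."""
    overlap = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - overlap
    return overlap / union if union else 0.0

def match_stoppages(real, predicted, threshold=0.5):
    """Pair real and predicted stoppages whose IoU is at least `threshold`.

    With a threshold of 0.5, each stoppage can take part in at most one
    pair: no two intervals can both overlap the same third interval with
    an IoU above one half.
    """
    matches = []
    for r in real:
        for p in predicted:
            if interval_iou(r, p) >= threshold:
                matches.append((r, p))
    return matches
```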
A good model should be applicable to analysis tasks. Hence, the results of performance metrics computed from the predicted ball status must be comparable to those computed from the real data. The IoU metric assigns a predicted stoppage to the corresponding ground truth interruption even when the overlap is imperfect. Consequently, the predicted start and end points of each matched stoppage may be shifted relative to the true ones, which affects applications in video analysis tasks.
Third, we checked how much the predicted stoppages’ start and end points deviated from the ground truth by calculating the shifts between the real and predicted points for the start and end of each matched stoppage. A fundamental task in video analysis is to analyze standard situations; thus, we assumed that our predictions should fall within ±2 s of the true values. In that case, a practitioner could easily find and add time marks to analyze the execution of a standard situation.
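A sketch of this boundary-shift check is given below. The frame rate of 25 Hz is an assumption for illustration (the text does not state the tracking rate), and the function name and pair representation are likewise hypothetical.

```python
def boundary_shifts(matched_pairs, fps=25, tolerance_s=2.0):
    """Start/end shifts in seconds for matched stoppage pairs.

    matched_pairs: list of ((real_start, real_end), (pred_start, pred_end))
    in frames. fps is the tracking frame rate (25 Hz assumed here).
    Returns the signed shifts and the share of boundaries that fall
    within +/- tolerance_s of the ground truth.
    """
    shifts = []
    for (r_start, r_end), (p_start, p_end) in matched_pairs:
        shifts.append((p_start - r_start) / fps)  # start-point shift
        shifts.append((p_end - r_end) / fps)      # end-point shift
    within = sum(abs(s) <= tolerance_s for s in shifts) / len(shifts)
    return shifts, within
```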
Fourth, we evaluated the quality of the performance indicator Total Distance Covered (TDC) during the effective playing time (TDCE). TDC is one of the most common performance indicators for estimating workload33–35. TDCE represents the running activity while the ball is in play and can be interpreted as the match intensity. Since errors in the predicted ball status propagate into TDCE, we checked whether this error is acceptable for performance analysis. Therefore, we calculated TDCE for each player three times per match, using the ball status based on (1) the ground truth, (2) the AD prediction, and (3) a naïve approach, and compared the results. For the naïve approach, we approximated TDCE by multiplying the player’s real TDC for the whole match by the mean percentage of effective playing time across all matches. In our test matches, the mean percentage of effective playing time was 59.6 ± 6.0% (min. = 47.4%, max. = 69.7%). To reduce noise, only field players who played a full match were included in the analysis.
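The two ways of obtaining TDCE can be sketched as follows; the function names and the per-frame distance representation are illustrative assumptions, while the 59.6% mean share of effective playing time is taken from the test matches reported above.

```python
def tdce_from_status(frame_distances, ball_status):
    """TDCE from a ball-status sequence (ground truth or prediction):
    sum the per-frame distance covered only over in-play frames (status 1)."""
    return sum(d for d, s in zip(frame_distances, ball_status) if s == 1)

def naive_tdce(tdc_match, mean_effective_share=0.596):
    """Naive TDCE approximation: the player's whole-match TDC scaled by
    the mean share of effective playing time over all matches
    (59.6% in the paper's test matches)."""
    return tdc_match * mean_effective_share
```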