A multi-task learning network to recognize the numbers on jerseys of sports team players
When reporting on sports games live or remotely, commentators need to quickly recognize the numbers on players' jerseys, as this allows them to follow the action and relay it to their audience. However, identifying players in sports videos is not always easy, as these videos are often shot from a distance to capture the overall progression of the game. A further difficulty is the fast motion of the broadcast camera, which often results in motion blur.
Researchers at the University of Waterloo have recently developed a machine-learning technique that can automatically recognize the jersey numbers of players in images extracted from broadcast sports videos. This technique, presented in a paper pre-published on arXiv, could help to identify the jersey numbers of team players during sports events faster and more efficiently than existing computational methods.
"Sports jersey number recognition networks in existing literature consider jersey number recognition as a classification problem and either (1) consider the jersey numbers as separate classes (holistic representation), or (2) treat the two digits in a jersey number as two independent classes (digit-wise representation)," Kanav Vats, one of the researchers who carried out the study, told Tech Xplore. "For example, the jersey number '12' can be modeled by considering '12' as a separate class and also by splitting the number '12' into two constituent digits '1' and '2' and treating the two digits as separate classes."
Past studies have found that learning multiple output representations can improve the performance of deep neural networks. In other words, neural networks that are trained to focus on different aspects of the task they are learning to complete were found to perform better than those focusing on individual aspects of the task. Building on this idea, Vats and his colleagues designed a network that learns the holistic and digit-wise representations of a jersey number at the same time.
"The input to the Resnet34 backbone-based network is a single-player image," Vats said. "The network outputs three probability vectors. The first is the probability of the jersey number present in the image considering each jersey number in the dataset as a separate class, the second is the probability distribution of the first digit in the jersey number and the third is the probability of the second digit in the jersey number."
The researchers trained their neural network using a weighted sum of the cross-entropy losses of its three outputs. When they tested the network, they found that learning both holistic (e.g., '12') and digit-wise (e.g., '1' and '2' in '12') representations of numbers significantly improved its ability to recognize jersey numbers. In fact, their multi-task learning approach outperformed other techniques that focused on only the holistic representation or only the digit-wise representation.
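A weighted sum of cross-entropy losses over three outputs can be written in a few lines. The sketch below uses plain Python so the arithmetic is visible; the task names, the toy probability vectors, and the weights of 1.0 are hypothetical, since the article does not report the weights the researchers used.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class."""
    return -math.log(max(probs[target], 1e-12))  # clamp to avoid log(0)


def multi_task_loss(outputs, targets, weights):
    """Weighted sum of the per-task cross-entropy losses."""
    return sum(
        weights[task] * cross_entropy(outputs[task], targets[task])
        for task in weights
    )


# Toy predictions for one image (deliberately tiny class counts).
outputs = {
    "holistic": [0.1, 0.7, 0.2],
    "first_digit": [0.8, 0.2],
    "second_digit": [0.5, 0.5],
}
targets = {"holistic": 1, "first_digit": 0, "second_digit": 1}
weights = {"holistic": 1.0, "first_digit": 1.0, "second_digit": 1.0}  # assumed

loss = multi_task_loss(outputs, targets, weights)
print(round(loss, 4))
```

Raising or lowering one of the weights changes how strongly that output's errors influence training relative to the other two.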
"'When the multi-task loss function network we proposed was plugged into a network introduced in a previous study, it showed a significant improvement in performance," Vats said. "Notably, the multi-task loss function is also easy to implement in a modern deep learning library (such as Pytorch) and can be used for jersey number recognition in other sports such as soccer."
In the future, the neural network developed by this team of researchers could help to automatically identify jersey numbers in sports videos faster and more efficiently. In addition, Vats and his colleagues compiled a new dataset containing 54,251 annotated images of NHL players and their jersey numbers that could be used to train other techniques for jersey number and player recognition.
In their next studies, the researchers plan to improve their jersey number and player identification system further. For instance, they would like to devise a neural network that also takes into consideration the location of ice hockey players on the ice rink when trying to determine their identities.
"The current study does not take temporal context into account, so our future work will aim to improve player identification by using temporal video data for inferring the jersey number from broadcast clips," Vats said. "This can be done through a temporal convolutional network that can directly work on videos. The proposed multi-task loss function will be incorporated in the temporal network."
More information: Multi-task learning for jersey number recognition in ice hockey. arXiv:2108.07848 [cs.CV]. arxiv.org/abs/2108.07848
© 2021 Science X Network