System for drone surveillance: How violence is boxed

The illustration shows the skeleton corresponding to the humans in an image. The angles (shown in green for a few limbs) between the various limbs in this structure are used by the SVM to recognize humans engaged in violent activities. Credit: arXiv:1806.00746 [cs.CV]

Three researchers, Amarjot Singh (University of Cambridge), Devendra Patil (NIT Warangal, India) and SN Omkar (IISc Bangalore), are working on the use of a drone and artificial intelligence to spot people fighting in a crowd.

Their paper "Eye in the Sky: Real-time Drone Surveillance System (DSS) for Violent Individuals Identification using ScatterNet Hybrid Deep Learning Network" is on arXiv. A video shows how their system works.

DroneDJ summed up their approach, saying that they use an "off-the-shelf consumer [drone,] load it with AI and have it monitor a crowded area such as a sports stadium or a protest and look for acts of violence such as punching, kicking, strangling, shooting or stabbing."

Why bother? Are not standard CCTV cameras adequate? Standard CCTV cameras do not do the best job in monitoring violent criminals in large public areas. Enter drones.

The paper will appear in a workshop at IEEE Computer Vision and Pattern Recognition (CVPR) 2018 this month. The system detects violent individuals in real-time by processing the drone images in the cloud.

They addressed five types of violent acts in their paper: punching, kicking, strangling, shooting and stabbing.

Their research introduced what they refer to as "the aerial violent individual dataset used for training the deep network." They said they hope it will encourage other researchers interested in using deep learning for aerial surveillance.

James Vincent in The Verge explained that an algorithm trained using deep learning estimates the poses of humans in the video and matches them to postures the researchers have designated as violent. The video noted that violent people are marked with bounding boxes.
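To make the idea of comparing limb orientations concrete, here is a minimal sketch of how angles between limbs could be computed from 2-D keypoints of an estimated pose. It is an illustration only, not the authors' code: the keypoint names, the `LIMBS` table and the helper functions are hypothetical, and the resulting angle list merely stands in for the kind of feature vector that, according to the paper, is passed to an SVM.

```python
import math

# Hypothetical limb table: each limb joins two named keypoints.
LIMBS = {
    "upper_arm": ("shoulder", "elbow"),
    "forearm": ("elbow", "wrist"),
}

def limb_vector(pose, limb_name):
    """Return the 2-D vector from a limb's first keypoint to its second."""
    start, end = LIMBS[limb_name]
    (x0, y0), (x1, y1) = pose[start], pose[end]
    return (x1 - x0, y1 - y0)

def angle_between(u, v):
    """Unsigned angle in degrees (0 to 180) between two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return math.degrees(math.atan2(abs(cross), dot))

def limb_angles(pose, limb_pairs):
    """Build a feature list holding one angle per pair of limbs."""
    return [
        angle_between(limb_vector(pose, a), limb_vector(pose, b))
        for a, b in limb_pairs
    ]

# Assumed example poses in image coordinates (x, y).
bent_arm = {"shoulder": (0, 0), "elbow": (1, 0), "wrist": (1, 1)}
straight_arm = {"shoulder": (0, 0), "elbow": (1, 0), "wrist": (2, 0)}

bent_features = limb_angles(bent_arm, [("upper_arm", "forearm")])
straight_features = limb_angles(straight_arm, [("upper_arm", "forearm")])
```

In this toy example the bent arm yields an angle of 90 degrees between the upper arm and forearm and the straight arm yields 0 degrees. A classifier would be trained on many such angle lists labeled as violent or non-violent.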

How effective is their system? The level of accuracy goes down when more people enter the scene. James Vincent wrote: "However, the research needs to be taken with a pinch of salt, particularly with regard to its claims of accuracy. Singh and his colleagues report that their system was 94 percent accurate at identifying 'violent' poses, but they note that the more people that appear in frame, the lower this figure. (It fell to 79 percent accuracy when looking at 10 individuals.)"

Their work reflects a research interest in exploring ways to use machine learning to analyze live video footage. They plan to test it during two upcoming festivals in India, said DroneDJ.

The paper also introduced the Aerial Violent Individual (AVI) Dataset, which can benefit other researchers aiming to use deep learning for aerial surveillance applications.

In the bigger picture, it is obvious by now that the word "surveillance" in and of itself is a loaded term, and one thinks of repressive governments eager to silence protestors by putting them under lock and key for flimsy reasons. On the other hand, societies are coping with vandals, hate groups and kidnappings.

"Anything can be used for good. Anything can be used for bad," said Singh, lead researcher, in The Verge.


More information: Eye in the Sky: Real-time Drone Surveillance System (DSS) for Violent Individuals Identification using ScatterNet Hybrid Deep Learning Network, arXiv:1806.00746 [cs.CV] arxiv.org/abs/1806.00746

Abstract
Drone systems have been deployed by various law enforcement agencies to monitor hostiles, spy on foreign drug cartels, conduct border control operations, etc. This paper introduces a real-time drone surveillance system to identify violent individuals in public areas. The system first uses the Feature Pyramid Network to detect humans from aerial images. The image region with the human is used by the proposed ScatterNet Hybrid Deep Learning (SHDL) network for human pose estimation. The orientations between the limbs of the estimated pose are next used to identify the violent individuals. The proposed deep network can learn meaningful representations quickly using ScatterNet and structural priors with relatively fewer labeled examples. The system detects the violent individuals in real-time by processing the drone images in the cloud. This research also introduces the aerial violent individual dataset used for training the deep network which hopefully may encourage researchers interested in using deep learning for aerial surveillance. The pose estimation and violent individuals identification performance is compared with the state-of-the-art techniques.
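The three stages named in the abstract (human detection, pose estimation, then classification of the pose) can be pictured as a simple chain. The sketch below shows only that control flow under stated assumptions: `detect_humans`, `estimate_pose` and `is_violent` are hypothetical callables standing in for the Feature Pyramid Network, the SHDL network and the limb-orientation classifier, and the image is represented as plain nested lists rather than any real image type.

```python
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
Image = Sequence[Sequence[int]]   # rows of pixel values

def crop(image: Image, box: Box) -> List[Sequence[int]]:
    """Cut the region covered by a bounding box out of the image."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]

def flag_violent(
    image: Image,
    detect_humans: Callable[[Image], List[Box]],
    estimate_pose: Callable[[Any], Any],
    is_violent: Callable[[Any], bool],
) -> List[Box]:
    """Return the bounding boxes of people whose pose is classified as violent."""
    flagged = []
    for box in detect_humans(image):              # stage 1: find people
        pose = estimate_pose(crop(image, box))    # stage 2: pose per person
        if is_violent(pose):                      # stage 3: classify the pose
            flagged.append(box)
    return flagged
```

Passing the stages in as arguments keeps the sketch independent of any particular model. The returned boxes correspond to the bounding boxes the video uses to mark violent individuals.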
