September 19, 2018 feature

Fast object detection in videos using region-of-interest packing

by Ingrid Fadelli, Tech Xplore

Researchers at the Robert Bosch Center for Data Science and Artificial Intelligence and Center for Computational Brain Research, Indian Institute of Technology Madras, and Purdue University have recently developed a new method of reducing computational requirements for object detection in videos using neural networks. Their technique, called Pack and Detect (PaD), was outlined in a paper pre-published on arXiv.

Object detection is a key aspect of many computer vision applications, such as object tracking, video summarization, and video search. While recent advances in machine learning have produced increasingly accurate tools for this task, existing methods remain computationally intensive. For instance, processing a video at 300 x 300 resolution and 30 fps using the SSD300 object detection network, with VGG16 as its backbone, requires 1.87 trillion floating point operations per second (FLOPS).
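
To put that figure in per-frame terms, the short calculation below divides the quoted throughput by the frame rate. It is only arithmetic on the two numbers given above; the variable names are illustrative.

```python
# Figures quoted in the article for SSD300 with a VGG16 backbone.
FLOPS_REQUIRED = 1.87e12   # floating point operations per second
FRAMES_PER_SECOND = 30

# Work needed for one 300 x 300 frame.
flops_per_frame = FLOPS_REQUIRED / FRAMES_PER_SECOND

print(f"{flops_per_frame / 1e9:.1f} GFLOPs per frame")
```

This works out to roughly 62 billion floating point operations for every frame.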

The researchers observed, however, that in some cases most regions of a video frame are merely background, with salient objects occupying only a small fraction of the frame's area. They also found a strong temporal correlation between consecutive frames. Leveraging these observations, they proposed a new technique that reduces the computation needed to detect objects in videos.

"We were inspired by the foveal mechanism in both biological and artificial vision systems," Athindran Ramesh Kumar, one of the researchers who carried out the study, told TechXplore. "Previous efforts pertaining to the foveal attention mechanisms in artificial vision systems focus on only one region in the image or on one object at a time. We wondered how a vision system would be if it could focus on all salient regions in the scene at once."

The object detection method devised by the researchers is thus inspired by biological vision systems. Unlike previous attempts, however, their system packs all the regions of interest together into a single frame, instead of processing them sequentially.

"The objective of our work was to speed-up object detection in videos by focusing only on the salient regions in the frame and eliminating the background clutter," Balaraman Ravindran, another researcher who carried out the study, told TechXplore. "For eliminating background clutter, we exploited the temporal correlation between adjacent frames in a video. This is a property that video compression techniques use to reduce the storage and bandwidth requirements; we use it to speed up computation."

PaD, the object detection method proposed by Ravindran and his colleagues, works by processing frames at regular intervals in full size. These frames are referred to as "anchor frames." In all other frames, the tool identifies regions of interest based on where objects were located in the previous frame.
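
The anchor-frame schedule and the use of previous detections can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `anchor_interval` parameter and the `margin` padding around each previous detection are all assumptions introduced here, since the article does not say how the regions are derived in detail.

```python
from typing import List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def is_anchor_frame(frame_index: int, anchor_interval: int) -> bool:
    """Anchor frames occur at regular intervals and are processed at full size."""
    return frame_index % anchor_interval == 0


def expand_box(box: Box, margin: int, frame_w: int, frame_h: int) -> Box:
    """Pad a box by `margin` pixels (hypothetical parameter), clipped to the frame."""
    x0, y0, x1, y1 = box
    return (
        max(0, x0 - margin),
        max(0, y0 - margin),
        min(frame_w, x1 + margin),
        min(frame_h, y1 + margin),
    )


def regions_of_interest(
    prev_detections: List[Box], margin: int, frame_w: int, frame_h: int
) -> List[Box]:
    """Derive this frame's regions of interest from the previous frame's detections."""
    return [expand_box(b, margin, frame_w, frame_h) for b in prev_detections]


if __name__ == "__main__":
    previous = [(10, 10, 50, 50), (200, 120, 260, 200)]
    for index in range(6):
        if is_anchor_frame(index, anchor_interval=5):
            print(index, "anchor frame: run the detector on the full frame")
        else:
            print(index, "regions:", regions_of_interest(previous, 20, 300, 300))
```

The padding reflects the idea that objects move only slightly between consecutive frames, so a slightly enlarged copy of the previous box is likely to still contain the object.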

"These regions of interest are arranged together like in a collage, which is used as input for the object detector," Anand Raghunathan, one of the researchers that carried out the study, told TechXplore. "The detections are then mapped back to the locations in the original image. This method is faster because the collage images are of smaller size than the full frames. We leverage the flexibility of popular object detectors such as SSD300 to process images at both full size and smaller sizes."

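The collage step and the mapping back to the original image can be illustrated with a simple sketch. It assumes a basic row-by-row ("shelf") packing without resizing, which is a stand-in chosen for brevity and not necessarily the handcrafted algorithm the researchers used; `Slot`, `pack_rois` and `map_back` are hypothetical names. Returning `None` when the regions do not fit is also an assumption, meant to signal that a caller would have to process the full frame instead.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


@dataclass
class Slot:
    roi: Box  # where the region sits in the original frame
    x: int    # left edge of the region inside the collage
    y: int    # top edge of the region inside the collage


def pack_rois(rois: List[Box], collage_w: int, collage_h: int) -> Optional[List[Slot]]:
    """Place regions left to right in rows; return None if they do not fit."""
    slots: List[Slot] = []
    x = y = row_h = 0
    for roi in rois:
        w, h = roi[2] - roi[0], roi[3] - roi[1]
        if w > collage_w:
            return None
        if x + w > collage_w:          # row is full: start a new one below
            x, y, row_h = 0, y + row_h, 0
        if y + h > collage_h:
            return None
        slots.append(Slot(roi, x, y))
        x += w
        row_h = max(row_h, h)
    return slots


def map_back(det: Box, slots: List[Slot]) -> Optional[Box]:
    """Translate a detection from collage coordinates to original-frame coordinates."""
    cx, cy = (det[0] + det[2]) / 2, (det[1] + det[3]) / 2
    for s in slots:
        w, h = s.roi[2] - s.roi[0], s.roi[3] - s.roi[1]
        if s.x <= cx < s.x + w and s.y <= cy < s.y + h:
            dx, dy = s.roi[0] - s.x, s.roi[1] - s.y
            return (det[0] + dx, det[1] + dy, det[2] + dx, det[3] + dy)
    return None  # detection centre does not fall inside any packed region


if __name__ == "__main__":
    slots = pack_rois([(100, 100, 180, 160), (200, 50, 260, 150)], 150, 150)
    print(slots)
    print(map_back((90, 10, 130, 90), slots))
```

Each detection is assigned to the packed region that contains its centre, then shifted by the offset between that region's position in the collage and its position in the original frame.
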
The researchers evaluated their method on the ImageNet VID dataset and found that it sped up object detection by 1.25x, with less than a 1.6 percent drop in accuracy. They also observed that the smaller collage frames took almost three times less time to process than full-size frames, with the FLOP count reduced by a factor of four.
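
The gap between the per-frame gain and the overall gain can be understood with a simple weighted-average model. The sketch below is an illustration only: it ignores any overhead for building collages, and the article does not report what share of frames were processed at the smaller size, so the 0.3 used in the example is a hypothetical value, not a reported one.

```python
def overall_speedup(small_fraction: float, small_speedup: float) -> float:
    """Overall speed-up when only a fraction of frames is processed faster.

    small_fraction: share of frames processed at the smaller size (0 to 1).
    small_speedup:  how many times faster one such frame is processed.
    """
    remaining_time = (1.0 - small_fraction) + small_fraction / small_speedup
    return 1.0 / remaining_time


if __name__ == "__main__":
    # Hypothetical share of smaller frames; the 3x factor is the rounded
    # "almost three times" figure from the article.
    print(round(overall_speedup(0.3, 3.0), 2))
```

Under this simplified model, the overall speed-up stays well below the per-frame factor whenever a large share of frames still has to be processed at full size.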

Their study also highlighted two observations that could inform the development of faster and less computationally intensive methods of detecting objects in videos. First, objects of interest generally occupy only a small fraction of the pixels in a frame; second, adjacent frames in a video are strongly correlated.

"Our work can help make video analytics possible on resource-constrained devices at the edge of the Internet of Things by reducing computational requirements, or may improve the number of video streams that may be processed by a server in the cloud," Athindran said.

The study carried out by this team of researchers is an initial step toward the development of more effective object detection tools. They are now planning further investigations that could improve their method.

For instance, PaD currently selects anchor frames at regular intervals, but the researchers could develop a mechanism that identifies these key frames dynamically. They also plan to test their technique on more resource-constrained hardware, such as smartphones, wearable devices and smart home appliances.

"We handcrafted an algorithm to infer the regions of interest and form a collage image," Ravindran said. "But a fully neural system would have neural networks that generate the collage image based on the previous frame. This is a more ambitious line of future work."

More information: Pack and Detect: Fast object detection in videos using region-of-interest packing. arXiv:1809.01701v1 [cs.CV]. arxiv.org/abs/1809.01701

Citation: Fast object detection in videos using region-of-interest packing (2018, September 19) retrieved 30 June 2024 from https://techxplore.com/news/2018-09-fast-videos-region-of-interest.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
