
System extracts spoken language from video recording, converts it to searchable text


A team in South Korea has developed a new approach to searching through video content. The system, described in the International Journal of Computational Vision and Robotics, extracts the spoken words from a video recording, converts them to text, and then makes that text searchable. Importantly, the system therefore does not rely on embedded keywords, curated tags, or hashtags being associated with the video content.

The approach assumes that a video's dialogue or spoken commentary relates to the scenes that users might wish to find. It is, of course, unnecessary if the video already has subtitles baked in. Nevertheless, it should be a boon for anyone wishing to search the millions of hours of video held in databases, on streaming services, and elsewhere on the internet, and it could also help with cataloguing videos.

Kitae Hwang, In Hwan Jung, and Jae Moon Lee of the School of Computer Engineering at Hansung University in Seoul have developed an Android app for compatible smartphones. It is worth noting, however, that at least one other app already has the same name, so if this app is released on the Google Play Store, it will probably need to be renamed.

The new app uses FFmpeg to extract the audio track from a video and splits it into 10-second segments. Speech recognition then transcribes each segment, and the resulting text is indexed on the video's timeline. This, the team explains, creates a searchable timeline for the video.
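The segment-and-index step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the article does not give their FFmpeg invocation or name their speech recognizer, so the 16 kHz mono WAV output, the helper names (`plan_segments`, `ffmpeg_extract_cmd`, `build_timeline`) and the `transcribe` callable are all hypothetical stand-ins. The FFmpeg options used (`-ss`, `-i`, `-t`, `-vn`, `-ac`, `-ar`) are standard ones.

```python
import math
import subprocess
from dataclasses import dataclass
from typing import Callable, List

SEGMENT_SECONDS = 10  # segment length reported in the article


@dataclass
class Segment:
    start: float   # seconds from the start of the video
    end: float
    text: str = ""  # transcript, filled in once the segment is recognized


def plan_segments(duration_s: float, seg_len: float = SEGMENT_SECONDS) -> List[Segment]:
    """Cut the video's duration into consecutive fixed-length windows."""
    count = math.ceil(duration_s / seg_len)
    return [Segment(i * seg_len, min((i + 1) * seg_len, duration_s)) for i in range(count)]


def ffmpeg_extract_cmd(video_path: str, seg: Segment, out_path: str) -> List[str]:
    """Build (but do not run) an FFmpeg command extracting one audio window.

    Assumption: 16 kHz mono WAV, a common input format for speech recognizers.
    """
    return [
        "ffmpeg", "-y",
        "-ss", f"{seg.start:.3f}",          # seek to the window start
        "-i", video_path,
        "-t", f"{seg.end - seg.start:.3f}",  # window length
        "-vn",                               # drop the video stream
        "-ac", "1", "-ar", "16000",          # mono, 16 kHz
        out_path,
    ]


def build_timeline(
    video_path: str,
    duration_s: float,
    transcribe: Callable[[str], str],  # hypothetical: WAV path -> transcript
    run_cmd: Callable[[List[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
    seg_len: float = SEGMENT_SECONDS,
) -> List[Segment]:
    """Extract each 10-second window, transcribe it, and store the text on the timeline."""
    timeline = plan_segments(duration_s, seg_len)
    for i, seg in enumerate(timeline):
        wav_path = f"seg_{i:03d}.wav"
        run_cmd(ffmpeg_extract_cmd(video_path, seg, wav_path))
        seg.text = transcribe(wav_path)
    return timeline
```

At 10 seconds per window, a 20-minute video yields 120 segments. Passing `run_cmd` and `transcribe` in as parameters keeps the sketch independent of any particular speech engine and lets it be exercised without FFmpeg installed.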

For a 20-minute video, processing takes just two to three minutes and runs in the background while the video plays. The team points out that users can then search for specific terms and find every mention of them in the video.
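Once the transcripts sit on a timeline, finding "all mentions" reduces to scanning each segment's text. The sketch below is a hypothetical illustration, not the app's actual search: it assumes case-insensitive substring matching and reports a mm:ss timestamp plus a hit count per segment. Segments that have not been transcribed yet simply have empty text, so the search can run while background processing is still under way.

```python
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    start: float  # seconds from the start of the video
    text: str     # transcript for this window ("" if not yet transcribed)


def format_ts(seconds: float) -> str:
    """Render a timeline position as mm:ss."""
    m, s = divmod(int(seconds), 60)
    return f"{m:02d}:{s:02d}"


def search(timeline: List[Segment], term: str) -> List[Tuple[str, int]]:
    """Return (timestamp, hits) for every segment whose transcript contains term."""
    needle = term.lower()
    if not needle:  # an empty query would otherwise match everywhere
        return []
    results = []
    for seg in timeline:
        hits = seg.text.lower().count(needle)
        if hits:
            results.append((format_ts(seg.start), hits))
    return results


timeline = [
    Segment(0, "Today we discuss gradient descent."),
    Segment(10, "Gradient descent updates the weights."),
    Segment(20, ""),  # still being transcribed in the background
]
print(search(timeline, "gradient"))  # [('00:00', 1), ('00:10', 1)]
```

One limitation of fixed 10-second windows is that a phrase spoken across a segment boundary is split between two transcripts, so a multi-word query can miss it. Overlapping the windows slightly is one common workaround, though the article does not say whether the authors do this.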

The app could be useful in education, news analysis, and other areas where information-dense video must be searched quickly for specific content. For instance, students reviewing lecture recordings or journalists looking for particular statements in interviews could use it to jump straight to the relevant passage. Many other scenarios would also benefit from being able to search video in this way.

More information: Kitae Hwang et al, An implementation of searchable video player, International Journal of Computational Vision and Robotics (2024). DOI: 10.1504/IJCVR.2024.138324

Provided by Inderscience
Citation: System extracts spoken language from video recording, converts it to searchable text (2024, May 23) retrieved 19 June 2024 from
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.
