March 15, 2017 weblog

Baidu Research is keen on addressing transcription pain points

by Nancy Owano, Tech Xplore

(Tech Xplore)—Artificial intelligence-powered transcription software? How, where? Professionals in many sectors who have to cope with transcriptions of interviews and recorded statements know how tiring transcribing can be. Yet assignments have deadlines, and feeling tired is no excuse for getting the words wrong or leaving out chunks of what the speaker actually said.

Baidu Research has a fresh answer. They are introducing SwiftScribe. Tian Wu, project manager, made an announcement.

You can try it out at swiftscribe.ai/, and the page includes a link for signing up for the closed beta.

Yes, this is a beta launch. "To begin, we will invite between 30-50 transcriptionists to test the beta version." (That is, Baidu is inviting just 30 to 50 transcriptionists to participate.)

Why SwiftScribe? The team recognized transcription's pain point – that is, the time-consuming process of manually transcribing word-by-word.

For transcriptionists and the groups they work for, higher productivity would not be such a bad thing. Target users for SwiftScribe range from freelancers to transcriptionists working for transcription service companies and data entry specialists. They may work in industries requiring some or substantial transcription work, including medical, legal, business and media.

"It typically takes between four to six hours to transcribe one hour of audio data, and the going rate for transcriptions is somewhere around one dollar per audio minute."

Baidu Research aims to fix the pain point. The idea is that you use SwiftScribe to transcribe voice recordings quickly and easily.

Using SwiftScribe, the time a transcriptionist spends on a project is on average cut down by 40 percent. (VentureBeat wrote, "Wu's team believes SwiftScribe can help people transcribe audio 1.67 times faster—in 40 percent less time—than they would on their own. That would imply that they could do more work and ultimately get paid more for their work, Wu said.")
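The two figures in that quote describe the same claim: finishing in 40 percent less time means working in 0.6 of the original time, and 1 / 0.6 is about 1.67. The short Python sketch below checks that arithmetic and, as a rough illustration only, applies it to the figures quoted earlier (four to six hours per audio hour, about one dollar per audio minute). The function names are hypothetical, and the sketch assumes the per-minute rate stays the same and that the 40 percent saving applies evenly to every job; neither assumption comes from Baidu.

```python
def speedup_from_time_saved(fraction_saved: float) -> float:
    """Convert a fractional time reduction (0.40 = 40 percent) into a speed multiplier."""
    return 1.0 / (1.0 - fraction_saved)


def hourly_earnings(rate_per_audio_minute: float, work_hours_per_audio_hour: float) -> float:
    """Dollars earned per hour worked, given a per-audio-minute rate (illustrative helper)."""
    dollars_per_audio_hour = rate_per_audio_minute * 60
    return dollars_per_audio_hour / work_hours_per_audio_hour


TIME_SAVED = 0.40          # reduction quoted in the article
RATE_PER_AUDIO_MINUTE = 1  # "around one dollar per audio minute"

print(f"Speedup: {speedup_from_time_saved(TIME_SAVED):.2f}x")

# Four and six hours are the two ends of the quoted manual range.
for manual_hours in (4, 6):
    assisted_hours = manual_hours * (1 - TIME_SAVED)
    before = hourly_earnings(RATE_PER_AUDIO_MINUTE, manual_hours)
    after = hourly_earnings(RATE_PER_AUDIO_MINUTE, assisted_hours)
    print(f"{manual_hours} h per audio hour: ${before:.2f}/h -> ${after:.2f}/h")
```

Under those assumptions, earnings per hour worked would rise from a range of $10 to $15 to a range of roughly $16.67 to $25, which is the sense in which Wu said transcriptionists could ultimately get paid more.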

The technology drivers are speech recognition and editing tools.

The company is vocal about its speech recognition capability, with the engine Deep Speech 2. A neural network has been trained on audio, learning to associate sounds with words and phrases.

SwiftScribe said its product is in a space apart from its competitors, for "as users transcribe and make edits, the system can learn and improve along the way."

As VentureBeat noted, you will need to go in and make changes such as capitalizing, adding punctuation or changing spellings of certain words but, wrote VB's Jordan Novet, "Keyboard shortcuts help you more efficiently change the speed of audio, rewind, and add a line break."

Novet provided some background on how this announcement gels with past interest in speech recognition. "Baidu in the past few years has been honing its DeepSpeech software for speech recognition. Last year, the company introduced TalkType, an Android keyboard that, using DeepSpeech, puts speech input first and typing second, based on the idea that you can enter information more quickly when you say it than when you peck."

More information: swiftscribe.ai/

research.baidu.com/introducing … nscription-software/

Citation: Baidu Research is keen on addressing transcription pain points (2017, March 15) retrieved 17 July 2024 from https://techxplore.com/news/2017-03-baidu-keen-transcription-pain.html

