April 5, 2024

Manual transcription still beats AI: A comparative study on transcription services

by CISPA Helmholtz Center for Information Security

A research team from the Empirical Research Support (ERS) at CISPA Helmholtz Center for Information Security has conducted a systematic comparison of the most popular transcription services. The comparison involved 11 providers of manual as well as AI-based transcriptions.

The comparison shows that the AI-based services, despite generally good quality, still struggle with speaker attribution and produce discrepancies between recording and transcription that distort meaning. Whisper from OpenAI delivered the best results among the AI providers.

Interviews are a popular method for collecting scientific data. There is a basic distinction between quantitative and qualitative interviews. While the former are designed to obtain statistically usable information from a large number of participants with the help of standardized questionnaires, the latter aim to obtain interview data that allow for interpretation by the researchers.

A special type is the guided interview, which follows a prepared list of questions but allows the interviewer to deviate from it during the conversation. "In cybersecurity research, these interviews are utilized when exploring the patterns of action and interpretation of actors who operate through digital means," explains sociologist Dr. Rafael Mrowczynski from CISPA's Empirical Research Support (ERS) team. The ERS team advises the Center's researchers on methodological issues.

Converting an audio file into text

Transcription is a crucial step in qualitative data analysis. "The standard procedure is to convert the audio recordings of the interviews into text. It is important for the quality of the data that the transcriptions are adequate," Mrowczynski explains. Depending on the scientific field, there are different standards for transcription.

"In cybersecurity research, we usually work with transcripts that precisely reproduce the content of the conversation," says Mrowczynski. An adequate transcript, therefore, only contains the relevant spoken words. The researchers can obtain the transcript in two ways: Either it is created by the research team itself, or the task is outsourced to third-party providers.

Among the third-party providers, besides manual transcription, there has recently been real hype about automated, AI-based transcription. This is due to the exponential leaps in development and quality that AI applications have experienced in many areas over the last two years.

The researchers from CISPA's ERS team wanted to know which provider on the market achieves the best results and how automated, AI-based transcription performs in comparison with manual transcription. The goal was to be able to provide the researchers at CISPA and the cybersecurity community with a recommendation for working with qualitative interviews.

The approach of the ERS team

For their research project, Mrowczynski and his colleagues Dr. Maria Hellenthal, Dr. Rudolf Siegel, and Dr. Michael Schilling created a test dataset. This consisted of individual interviews lasting about ten minutes and group discussions with CISPA researchers in German and English. The content focused on the research field of cybersecurity.

"It was important that technical terms from the community were included so that the precision of the transcription could be assessed," Mrowczynski explains. Some of the interviews were additionally enhanced with background noise in order to reflect real settings in everyday research better.

The data were sent to eleven providers in December 2022. These comprised the manual transcription services Amberscript, GoTranscript, QualTranscribe, Rev, and Scribbl, as well as the AI-based transcription providers Amazon Transcribe, AssemblyAI, Audiotranskription.de, Google Cloud, Microsoft Azure, and Whisper by OpenAI.

For the assessment of the obtained transcripts, Mrowczynski and his colleagues created a reference transcript that served as the basis for the comparative analysis. The analysis itself then focused on two central criteria. First, the researchers assessed the word error rate, which indicates by how many words a transcript differs from the reference transcript. Second, the qualitative deviation from the reference transcript was coded manually.
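The word error rate can be illustrated with a short Python sketch. This is not the researchers' own tooling: it assumes the common definition of word error rate as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference, and it assumes a simple normalization of lowercasing and splitting on whitespace. The function name `word_error_rate` and the example sentence are hypothetical and chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are compared after lowercasing and splitting on
    whitespace; the study may have normalized its transcripts differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word missing (deletion)
                curr[j - 1] + 1,                  # extra hypothesis word (insertion)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Invented example sentence: one wrong word out of seven reference words.
reference = "we compare the hashes of both files"
hypothesis = "we compare the ashes of both files"
print(round(word_error_rate(reference, hypothesis), 3))
```

A single substituted word yields a low error rate even when it changes the meaning of the sentence, which may be one reason to pair this metric with a manual, qualitative coding of deviations.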

Manual transcription services beat AI

In their paper, Mrowczynski and his colleagues conclude that, in general, "most of the manual transcription services achieve a commendable level of performance, while AI-based services often show meaning-distorting discrepancies between recording and transcription."

The distortion of meaning can be clearly seen in technical terms. Mrowczynski explains, "In the transcript, for example, the term 'hashes' became 'ashes.' That is how we came up with the title of the paper."

OpenAI's Whisper achieved the best results among the AI-based providers. Most providers handled English better than German. Three providers did not offer transcription for German at all. Background noise generally had a negative effect on the result. The AI-based providers particularly had problems with speaker assignments.

In addition, the transcripts created by an AI had to be reformatted before it was possible to further process them in software for qualitative data analysis. However, the researchers point out that their analysis reflects the state of the art as of December 2022 and that current developments could not be taken into account.

The research was presented at the 2023 ACM SIGSAC Conference on Computer and Communications Security (CCS).

More information: Rudolf Siegel et al, Poster: From Hashes to Ashes - A Comparison of Transcription Services, Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security (2023). DOI: 10.1145/3576915.3624380

Provided by CISPA Helmholtz Center for Information Security

Citation: Manual transcription still beats AI: A comparative study on transcription services (2024, April 5) retrieved 30 April 2024 from https://techxplore.com/news/2024-04-manual-transcription-ai.html

