April 18, 2019

Why language technology can't handle Game of Thrones (yet)

Researchers from the Vrije Universiteit Amsterdam and the Dutch Royal Academy's Humanities Cluster evaluated four state-of-the-art tools for recognising names in text, to assess and improve their performance on popular fiction. They identify ways to raise the tools' accuracy at recognising names in one novel from 7% to 90%.

Natural language processing (NLP) tools are commonly used in many day-to-day applications such as Siri and Google, but how well these technologies actually work is not well understood. Researchers from the Vrije Universiteit Amsterdam and the Dutch Royal Academy's Humanities Cluster have performed a thorough evaluation of four different name recognition tools on 40 popular novels, including A Game of Thrones. Their analyses, published in PeerJ Computer Science, highlight the types of names and texts that these tools find particularly difficult to identify, as well as solutions for mitigating this. In addition, the researchers extracted social networks from the novels to explore differences in story structure. These insights can help make such technologies more robust to differences in genre and, for example, more useful to journalists who want to analyse large datasets such as the Panama Papers.

Many NLP tools are based on machine learning; that is, a computer program is trained to identify patterns in text from examples it has previously been fed. To recognise names in text, for example, such a program is fed many newspaper articles in which humans have meticulously marked the names. The program then has to 'learn' what a name looks like from its context (for instance, being preceded by Mr) or from the shape of the word (for instance, names in English generally start with a capital letter). The problem with applying a system trained on newspapers to novels is that novelists have much more narrative freedom than journalists, who need to stick to facts. Fiction authors can make up their own names, such as Tywin or R'hllor, or take descriptive character names straight from the dictionary, such as Grey Worm. These names do not behave like 'normal' names, so NLP systems have difficulty recognising them in text.
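The article does not say which features the four evaluated tools use. As a rough, hypothetical illustration of the 'context' and 'word shape' cues described above, the sketch below turns each token into a small feature dictionary of the kind a statistical name tagger might be trained on. The function names, the feature names and the honorific list are assumptions made for this example only.

```python
# Build a coarse "shape" for a word, one of the cues a tagger can learn from.
def word_shape(token: str) -> str:
    """Map each character to X (upper), x (lower) or d (digit), keep other
    characters as-is, and collapse runs of the same class: 'Tywin' -> 'Xx'."""
    shape = []
    for ch in token:
        if ch.isupper():
            cls = "X"
        elif ch.islower():
            cls = "x"
        elif ch.isdigit():
            cls = "d"
        else:
            cls = ch  # punctuation such as an apostrophe is kept literally
        if not shape or shape[-1] != cls:
            shape.append(cls)
    return "".join(shape)


# Hypothetical list of newspaper-style titles that often precede a name.
HONORIFICS = {"Mr", "Mrs", "Ms", "Dr"}


def token_features(tokens: list[str], i: int) -> dict:
    """Context and word-shape features for tokens[i] (hypothetical feature set)."""
    tok = tokens[i]
    prev = tokens[i - 1] if i > 0 else "<s>"                # sentence-start marker
    nxt = tokens[i + 1] if i + 1 < len(tokens) else "</s>"  # sentence-end marker
    return {
        "word.lower": tok.lower(),
        "shape": word_shape(tok),
        "starts_upper": tok[:1].isupper(),  # names usually start with a capital
        "prev.lower": prev.lower(),
        "prev_is_honorific": prev.rstrip(".") in HONORIFICS,  # e.g. "Mr" before a name
        "next.lower": nxt.lower(),
    }


# A toy sentence, not a quote from the novel.
tokens = ["Mr", "Tywin", "prayed", "to", "R'hllor"]
for i, tok in enumerate(tokens):
    print(tok, token_features(tokens, i))
```

In this toy sentence, Tywin carries the familiar newspaper cues (a capital letter and a preceding Mr), while R'hllor has an unusual shape, X'x, and no title before it, so a model trained only on news text may rarely have seen anything like it.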

The experiments performed by Niels Dekker (Trifork B.V.), Tobias Kuhn (Vrije Universiteit Amsterdam) and Marieke van Erp (KNAW Humanities Cluster) also highlight the flexibility of language and how names are contextualised in stories. Daenerys Targaryen, for instance, can be referred to as Daenerys or she, but she is also known as Dany, Daenerys Stormborn, Mother of Dragons, Khaleesi, the Unburnt and Mhysa. The social network created for A Game of Thrones shows, for example, that Dany is used by her friends, while the full form of her name, Daenerys, is used only by her enemies (in her absence).
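The article does not describe how the social networks were built. A common approach, assumed here, is to treat characters as nodes and link two characters whenever they are mentioned in the same sentence, after mapping every alias to one canonical name. The alias table, the function name and the example mentions below are hypothetical and serve only to show why aliases matter.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, hand-written alias table mapping surface forms to one
# canonical character name (forms taken from the paragraph above).
ALIASES = {
    "Daenerys Targaryen": "Daenerys",
    "Dany": "Daenerys",
    "Daenerys Stormborn": "Daenerys",
    "Mother of Dragons": "Daenerys",
    "Khaleesi": "Daenerys",
    "the Unburnt": "Daenerys",
    "Mhysa": "Daenerys",
}


def cooccurrence_network(sentences: list[list[str]], aliases: dict[str, str]) -> Counter:
    """Count, for each pair of characters, the sentences that mention both.

    `sentences` holds the name mentions found in each sentence (e.g. by an NER
    tool). Mentions missing from `aliases` are used as their own canonical name.
    """
    edges = Counter()
    for mentions in sentences:
        # Map aliases to canonical names; the set merges repeated mentions of the
        # same character, so nobody is linked to themselves.
        characters = sorted({aliases.get(m, m) for m in mentions})
        for a, b in combinations(characters, 2):
            edges[(a, b)] += 1
    return edges


# Illustrative mentions per sentence, not taken from the novel.
mentions = [["Dany", "Jorah"], ["Khaleesi", "Jorah"], ["Viserys", "Daenerys"]]
print(cooccurrence_network(mentions, ALIASES))  # aliases merged into one node
print(cooccurrence_network(mentions, {}))       # no alias table: one character, three nodes
```

Without the alias table, Dany, Khaleesi and Daenerys become separate nodes, which splits one character's connections across several nodes. Pronouns such as she are not handled at all; linking them to a character would additionally require coreference resolution.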

The research described in this publication shows that more attention should be paid to the performance of NLP tools and that there is still work to do before 'text' can be fully understood by computers.

More information: Dekker N, Kuhn T, van Erp M. 2019. Evaluating named entity recognition tools for extracting social networks from novels. PeerJ Computer Science 5:e189 doi.org/10.7717/peerj-cs.189

Provided by PeerJ

Citation: Why language technology can't handle Game of Thrones (yet) (2019, April 18) retrieved 3 July 2024 from https://techxplore.com/news/2019-04-language-technology-game-thrones.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
