January 22, 2020

Data mining hyphenated headlines: Improving named entity recognition

by David Bradley, Inderscience

Data mining and the extraction of knowledge from disparate sources is big data and big business. But how does search software cope when an entity is mentioned by only part of its name, or when a name is hyphenated that normally isn't? Research published in the International Journal of Intelligent Information and Database Systems details a new approach to improving named entity recognition and disambiguation in news headlines.

Jayendra Barua and Rajdeep Niyogi of the Department of Computer Science and Engineering at the Indian Institute of Technology in Roorkee, Uttarakhand, India, explain that their approach to analyzing current news headlines builds on a trained algorithm that has been taught to remove hyphens and to complete partial names, thereby resolving ambiguity.
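As a rough illustration of the two steps described above (removing hyphens, then completing partial names), the following Python sketch uses a hand-written alias table and a regular expression. It is not the researchers' trained algorithm, which the article does not describe in detail. The function names, the `ALIASES` table, and the company "Acme Widgets Ltd" are all hypothetical and chosen only for the example.

```python
import re

# Hypothetical alias table mapping a partial mention to a full entity name.
# A real system would learn or retrieve these; this is only an assumption
# made for illustration.
ALIASES = {
    "Acme Widgets": "Acme Widgets Ltd",
}


def split_hyphens(headline: str) -> str:
    """Replace hyphens that sit between two word characters with a space.

    This is deliberately naive: it also splits legitimate compounds
    such as "long-term".
    """
    return re.sub(r"(?<=\w)-(?=\w)", " ", headline)


def complete_names(headline: str, aliases: dict = ALIASES) -> str:
    """Expand partial mentions to full names using the alias table."""
    for partial, full in aliases.items():
        if full in headline:
            # The full name is already present; nothing to complete.
            continue
        pattern = r"\b" + re.escape(partial) + r"\b"
        # A function replacement avoids backslash handling in `full`.
        headline = re.sub(pattern, lambda m, f=full: f, headline)
    return headline


def normalize_headline(headline: str) -> str:
    """Apply hyphen removal first, then name completion."""
    return complete_names(split_hyphens(headline))


if __name__ == "__main__":
    print(normalize_headline("Acme-Widgets shares rise"))
```

Running hyphen removal before name completion matters in this sketch: the hyphenated form "Acme-Widgets" would not match the alias table until the hyphen has been replaced by a space.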

The team's evaluation of their novel approach shows that it works with approximately 10 percent greater accuracy than conventional systems and so could improve the automated retrieval of news associated with particular companies, organizations, events, public figures, and other entities of interest to those data mining the news. The system works well with newsfeeds, such as the RSS feeds generated by regularly updated websites. Headlines from such sources are often longer than conventional newspaper headlines but are nevertheless succinct, typically ten or fewer words long. Each word may therefore matter in a data mining context, so disambiguation is critical.
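To make the kind of input concrete, here is a minimal Python sketch that pulls headlines out of an RSS 2.0 document and counts their words, using only the standard library. The feed content in `SAMPLE_RSS` is invented for the example, and the helper names are hypothetical; the article does not say how the researchers collected their headlines.

```python
import xml.etree.ElementTree as ET

# Invented sample feed, used only for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>Acme-Widgets shares rise after merger talks</title></item>
    <item><title>Council approves new harbour bridge</title></item>
  </channel>
</rss>"""


def headlines_from_rss(xml_text: str) -> list:
    """Return the <title> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("title", default="").strip()
        for item in root.iter("item")
    ]


def word_count(headline: str) -> int:
    """Count whitespace-separated tokens in a headline."""
    return len(headline.split())


if __name__ == "__main__":
    for headline in headlines_from_rss(SAMPLE_RSS):
        print(word_count(headline), headline)
```

Note that a whitespace split counts a hyphenated name such as "Acme-Widgets" as a single token, which hints at why hyphenation can hide an entity from a word-based matcher.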

More information: Jayendra Barua et al. Improving named entity recognition and disambiguation in news headlines, International Journal of Intelligent Information and Database Systems (2020). DOI: 10.1504/IJIIDS.2019.104530

Journal information: International Journal of Intelligent Information and Database Systems

Provided by Inderscience

Citation: Data mining hyphenated headlines: Improving named entity recognition (2020, January 22) retrieved 17 July 2024 from https://techxplore.com/news/2020-01-hyphenated-headlines-entity-recognition.html
