July 30, 2018 feature

A new complex network-based approach to topic modeling

by Ingrid Fadelli , Tech Xplore

Researchers at Northwestern University, the University of Bath, and the University of Sydney have developed a new network approach to topic models, machine learning strategies that can discover abstract topics and semantic structures within text documents.

"One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts," the researchers explained in their study. "Topic models are one popular machine-learning approach that infers the latent topical structure of a collection of documents."

Topic models are currently being used to identify semantically related texts and classify documents within a number of fields, including sociology, history, linguistics, and psychology. The most commonly used method, latent Dirichlet allocation (LDA), is also used for bibliometrical, psychological and political analysis, as well as for image processing.

Despite its widespread success, LDA presents several flaws in the way it represents text, such as a lack of method to choose the number of topics, discrepancies with statistical properties of real texts and a lack of justification for the Bayesian prior, which in Bayesian statistical inference is the probability distribution expressed before evidence is presented.

A large portion of recent research into topic models has focused on creating more sophisticated versions of LDA that perform better or can effectively analyze particular aspects of documents.

The approach developed by this team of researchers stems from network theory, a theory used in physics and other scientific fields that provides techniques for analyzing graphs, as well as structures in systems with different interacting agents. Their new framework for topic modeling is based on the approach used to find communities in complex networks, which, in the context of network theory, is a graph with features that occur in modeling of real-life systems.

"I was working on natural language and topic modeling from the perspective of complex systems and complex networks," Martin Gerlach, postdoctoral fellow at Northwestern University told TechXplore. "The problems seemed very similar, yet the communities of computer science (topic modeling) and complex networks seemed to work largely independently. Being trained as a physicist, we wanted to show that two seemingly different problems could be reduced to the same underlying math."

Gerlach and his colleagues devised a new approach to identifying topical structures that relates to the problem of finding communities in complex networks. Their technique represents text corpora as bipartite networks, a class of complex networks that divide nodes into sets X and Y, only allowing connections between nodes in different sets.

"We mapped the problem of topic modeling to the problem of community detection in a network consisting of words and documents showing that they are mathematically equivalent," explained Gerlach.

The researchers' approach, which adapts existing community-detection methods, was found to be more versatile and principled than other existing topic models, for instance detecting the number of topics present in texts and hierarchically grouping both words and documents. Their method used a stochastic block model (SBM), a generative model for graphs that generally maps communities, subsets of items that are connected with one another.

"We solve some of the intrinsic and known problems of popular topic modeling algorithms such as LDA (e.g. how to determine the number of topics)," said Gerlach. "In addition, our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields."

The SBM approach developed by Gerlach and his colleagues could have interesting applications in other areas where machine learning is used, such as the analysis of genetic codes or images. In future, the researchers plan to continue exploring the potential of complex networks both within the context of text analysis and beyond.

"The equivalence between topic modeling and community detection allows to use insights gained in each of the communities and apply to the other domain," said Gerlach. "I hope to use these insights to gain a better understanding of these machine learning algorithms; why they work, and more importantly, under which conditions they do not work."

More information: A network approach to topic models, Martin Gerlach et al. A network approach to topic models, Science Advances (2018). DOI: 10.1126/sciadv.aaq1360

Journal information: Science Advances

Citation: A new complex network-based approach to topic modeling (2018, July 30) retrieved 29 June 2024 from https://techxplore.com/news/2018-07-complex-network-based-approach-topic.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

New algorithm can separate unstructured text into topics with high accuracy and reproducibility

88 shares

Feedback to editors

Researchers develop novel 3D printing strategy with controllable gradients porous structures

Jun 28, 2024

Researchers develop the fastest possible flow algorithm

Jun 28, 2024

Real-time modeling of 3D temperature distributions within nuclear microreactors to improve safety systems

Jun 28, 2024

Is ChatGPT the key to stopping deepfakes? Study asks LLMs to spot AI-generated images

Jun 27, 2024

Wireless receiver blocks interference for better mobile device performance

Jun 27, 2024

Researchers successfully develop domestic 6G antenna measurement system

Jun 27, 2024

Research shows how common plastics could passively cool and heat buildings with the seasons

Jun 27, 2024

Researchers suggest smart solution to harness waste heat from industry

Jun 27, 2024

Robotic hand with tactile fingertips achieves new dexterity feat

Jun 27, 2024

Help or hindrance? ER robots have potential to aid health care workers

Jun 27, 2024

Load comments (0)

A new complex network-based approach to topic modeling

Researchers develop novel 3D printing strategy with controllable gradients porous structures

Researchers develop the fastest possible flow algorithm

Real-time modeling of 3D temperature distributions within nuclear microreactors to improve safety systems

Is ChatGPT the key to stopping deepfakes? Study asks LLMs to spot AI-generated images

Wireless receiver blocks interference for better mobile device performance

Researchers successfully develop domestic 6G antenna measurement system

Research shows how common plastics could passively cool and heat buildings with the seasons

Researchers suggest smart solution to harness waste heat from industry

Robotic hand with tactile fingertips achieves new dexterity feat

Help or hindrance? ER robots have potential to aid health care workers

New algorithm can separate unstructured text into topics with high accuracy and reproducibility

Data science can tell us which political party is dominating

Physicists with green fingers estimate tree spanning rate in random networks

How community structure affects the resilience of a network

Does my algorithm work? There's no shortcut for community detection

Scientists teach neural network to identify a writer's gender

Researchers develop the fastest possible flow algorithm

Is ChatGPT the key to stopping deepfakes? Study asks LLMs to spot AI-generated images

Sony introduces AI for single-instrument accompaniment generation in music production

Mechanical computer relies on kirigami cubes, not electronics

New tool detects AI-generated videos with 93.7% accuracy

Researchers propose the next platform for brain-inspired computing

Phys.org

Medical Xpress

Science X

A new complex network-based approach to topic modeling

Researchers develop novel 3D printing strategy with controllable gradients porous structures

Researchers develop the fastest possible flow algorithm

Real-time modeling of 3D temperature distributions within nuclear microreactors to improve safety systems

Is ChatGPT the key to stopping deepfakes? Study asks LLMs to spot AI-generated images

Wireless receiver blocks interference for better mobile device performance

Researchers successfully develop domestic 6G antenna measurement system

Research shows how common plastics could passively cool and heat buildings with the seasons

Researchers suggest smart solution to harness waste heat from industry

Robotic hand with tactile fingertips achieves new dexterity feat

Help or hindrance? ER robots have potential to aid health care workers

Related Stories

New algorithm can separate unstructured text into topics with high accuracy and reproducibility

Data science can tell us which political party is dominating

Physicists with green fingers estimate tree spanning rate in random networks

How community structure affects the resilience of a network

Does my algorithm work? There's no shortcut for community detection

Scientists teach neural network to identify a writer's gender

Recommended for you

Researchers develop the fastest possible flow algorithm

Is ChatGPT the key to stopping deepfakes? Study asks LLMs to spot AI-generated images

Sony introduces AI for single-instrument accompaniment generation in music production

Mechanical computer relies on kirigami cubes, not electronics

New tool detects AI-generated videos with 93.7% accuracy

Researchers propose the next platform for brain-inspired computing

Your Privacy