September 5, 2018 feature

A neural network to extract knowledgeable snippets and documents

by Ingrid Fadelli, Tech Xplore

Every day, millions of articles are published on social media and other platforms, receiving vast numbers of clicks and shares from users navigating the web. Many of these articles contain useful information that, if extracted, could be used to compile knowledge databases or to deliver knowledge retrieval and question answering services.

Researchers at the Chinese Academy of Sciences (CAS) have developed a convolutional neural network (CNN)-based model to extract knowledgeable snippets and annotate documents. Their method, outlined in a paper pre-published on arXiv, performed better than the models it was compared against while requiring less training time.

In their paper, the researchers define the term "knowledgeable document" as "a document containing multiple knowledgeable snippets, which describe concepts, properties of entities, or the relations among entities." So far, most knowledge bases, such as YAGO or DBpedia, extract knowledge based on Wikipedia, WordNet, GeoNames, and other online resources. However, compared to social media platforms, these resources often contain limited and inflexible information.

"Another recent knowledge base, Probase, with 2.7 million concepts, was automatically harnessed from the so-far largest corpus, consisting of 326 million knowledgeable sentences extracted from 1.68 billion web pages," the researchers wrote in their paper. "However, these sentences are extracted only by the Hearst patterns. For extracting more knowledgeable snippets to construct more comprehensive knowledge bases, semantic-based methods are needed to complement the previous pattern-based ones."
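Hearst patterns are lexico-syntactic templates such as "X such as Y and Z", whose matches signal that Y and Z are kinds of X. The minimal Python sketch below implements just that one template with a regular expression; the pattern and the function name `hearst_such_as` are illustrative assumptions, not Probase's actual extraction rules. It also shows the limitation the researchers point to: a sentence that states an is-a fact in other words yields nothing.

```python
import re
from typing import List, Tuple

# Illustrative single Hearst pattern (an assumption, not Probase's rule set):
# "<hypernym> such as <hyponym>, <hyponym> and/or <hyponym>"
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+, )*\w+(?: (?:and|or) \w+)?)")


def hearst_such_as(sentence: str) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) pairs matched by the 'such as' template."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1)
        # Split the enumerated list on commas and the final "and"/"or".
        hyponyms = re.split(r", | and | or ", match.group(2))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms if hyponym)
    return pairs


print(hearst_such_as("He likes fruits such as apples, pears and plums."))
# → [('apples', 'fruits'), ('pears', 'fruits'), ('plums', 'fruits')]
print(hearst_such_as("An apple is an edible fruit."))
# → []  (same kind of fact, but no template match)
```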

Knowledgeable snippets and articles could also be used to develop knowledge retrieval and question answering services. These services could, for instance, answer questions from users who are looking for help with a particular problem. With these applications in mind, the researchers at CAS set out to develop a CNN-based model that can analyze the semantics of a document, determine whether or not it is knowledgeable, and extract knowledgeable snippets of information from it.
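Under the paper's definition, a document counts as knowledgeable when it contains enough knowledgeable snippets. The sketch below illustrates only that relationship: `is_knowledgeable` stands in for a trained snippet classifier, the toy classifier is hypothetical, and the threshold `min_snippets=2` is an assumed value. The researchers' CNN performs snippet extraction and document annotation jointly rather than through a hand-set rule like this one.

```python
from typing import Callable, List, Tuple


def annotate_document(
    snippets: List[str],
    is_knowledgeable: Callable[[str], bool],
    min_snippets: int = 2,
) -> Tuple[bool, List[str]]:
    """Extract knowledgeable snippets, then label the document by counting them.

    `is_knowledgeable` is a stand-in for a trained snippet classifier and
    `min_snippets` is an assumed threshold for "enough" snippets.
    """
    extracted = [s for s in snippets if is_knowledgeable(s)]
    return len(extracted) >= min_snippets, extracted


# Hypothetical toy classifier: flag simple "X is a Y" sentences.
def toy_classifier(s: str) -> bool:
    return " is a " in s


doc = ["A whale is a mammal.", "I saw one today.", "A tomato is a fruit."]
print(annotate_document(doc, toy_classifier))
# → (True, ['A whale is a mammal.', 'A tomato is a fruit.'])
```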

"Specifically, we propose SSNN, a joint CNN-based model, to understand the abstract concept of documents in different domains collaboratively and judge whether a document is knowledgeable or not," the researchers explain in their paper. "In more detail, the network structure of SSNN is 'low-level Sharing, high-level Splitting,' in which the low-level layers are shared for different domains while the high-level layers beyond the CNN are trained separately to perceive the differences of different domains."
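The "low-level sharing, high-level splitting" idea can be pictured as one shared encoder feeding several domain-specific output heads. The forward-pass-only Python sketch below assumes word embeddings plus a single 1-D convolution with max-over-time pooling as the shared layers, and a single logistic layer per domain standing in for the split high-level layers. The class name, layer sizes, random initialization and domain names are hypothetical, training is omitted, and this is not the authors' SSNN implementation.

```python
import math
import random
from typing import Dict, List


class SharedSplitCNN:
    """Toy shared-encoder / per-domain-head text classifier (forward pass only).

    Shared: word embeddings + one 1-D convolution with ReLU and max-over-time pooling.
    Split: one logistic output layer per content domain.
    All sizes and the initialization scheme are illustrative assumptions.
    """

    def __init__(self, vocab_size: int, domains: List[str], emb_dim: int = 8,
                 n_filters: int = 4, window: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.emb_dim = emb_dim
        self.window = window
        # Shared low-level parameters, used by every domain.
        self.embeddings = [[rng.gauss(0.0, 0.1) for _ in range(emb_dim)]
                           for _ in range(vocab_size)]
        self.filters = [[[rng.gauss(0.0, 0.1) for _ in range(emb_dim)]
                         for _ in range(window)] for _ in range(n_filters)]
        # Split high-level parameters: a separate weight vector and bias per domain.
        self.heads: Dict[str, List[float]] = {
            d: [rng.gauss(0.0, 0.1) for _ in range(n_filters)] for d in domains}
        self.biases: Dict[str, float] = {d: 0.0 for d in domains}

    def encode(self, token_ids: List[int]) -> List[float]:
        """Shared encoder: convolve over token windows, apply ReLU, max-pool over time."""
        vectors = [self.embeddings[t] for t in token_ids]
        # Zero-pad short inputs so at least one full window exists.
        while len(vectors) < self.window:
            vectors.append([0.0] * self.emb_dim)
        features = []
        for filt in self.filters:
            best = 0.0  # ReLU floor: the pooled value is max(0, best window activation)
            for start in range(len(vectors) - self.window + 1):
                activation = sum(
                    w * x
                    for offset in range(self.window)
                    for w, x in zip(filt[offset], vectors[start + offset]))
                best = max(best, activation)
            features.append(best)
        return features

    def predict(self, token_ids: List[int], domain: str) -> float:
        """Probability that a document from `domain` is knowledgeable."""
        features = self.encode(token_ids)  # identical for every domain
        logit = self.biases[domain] + sum(
            w * f for w, f in zip(self.heads[domain], features))
        return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical domain names; the article says three WeChat domains but does not name them.
model = SharedSplitCNN(vocab_size=100, domains=["tech", "health", "finance"])
doc = [5, 17, 42, 8, 99]
for name in ["tech", "health", "finance"]:
    print(name, round(model.predict(doc, name), 4))
```

Every domain reuses the same `encode` output, so the shared layers are stored and computed once, while each head is free to learn how its own domain differs.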

The model devised by the researchers offers an end-to-end solution for annotating documents that does not require extensive, time-consuming feature engineering. The researchers also designed manual features and trained an SVM classifier to perform the same task.
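For contrast with the end-to-end CNN, a feature-engineering pipeline first turns each document into hand-picked numbers and then fits a linear SVM. In the sketch below, the cue phrases and both features are invented for illustration (the article does not list the researchers' manual features), and the SVM is trained by plain full-batch subgradient descent on the regularized hinge loss rather than with a dedicated solver.

```python
from typing import List, Sequence, Tuple

# Hypothetical cue phrases; the paper's actual manual features are not described here.
CUES = ("is a", "refers to", "such as", "consists of")


def manual_features(text: str) -> List[float]:
    """Toy hand-crafted features: cue-phrase rate, scaled mean sentence length, bias."""
    sentences = [s for s in text.split(".") if s.strip()]
    n = max(len(sentences), 1)
    cue_rate = sum(any(c in s.lower() for c in CUES) for s in sentences) / n
    mean_len = sum(len(s.split()) for s in sentences) / n / 20.0  # rough scaling
    return [cue_rate, mean_len, 1.0]  # constant 1.0 lets the model learn a bias


def train_linear_svm(data: Sequence[Tuple[List[float], int]], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 1000) -> List[float]:
    """Subgradient descent on lam/2 * ||w||^2 + mean hinge loss; labels are +1 or -1."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        grad = [lam * wi for wi in w]
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) < 1:  # inside the margin
                grad = [g - y * xi / len(data) for g, xi in zip(grad, x)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def svm_predict(w: List[float], x: List[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


train = [
    (manual_features("A whale is a mammal. Photosynthesis refers to how plants make food."), 1),
    (manual_features("Had lunch today. So tired."), -1),
]
w = train_linear_svm(train)
print(svm_predict(w, manual_features("A sonnet is a poem of fourteen lines.")))
```

Every feature here had to be chosen by hand, which is exactly the step the end-to-end CNN avoids.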

The researchers evaluated their model on a dataset of real documents from three content domains on WeChat, a Chinese messaging, social media and mobile payment platform developed by Tencent. The results were very promising: SSNN consistently outperformed other CNN models while requiring less training time and memory.
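One way to see why a single joint model saves memory is to count parameters. Building one CNN per domain duplicates the shared layers, whereas the joint model stores them once and duplicates only the domain-specific heads: with K domains, shared size S and head size H, that is S + K*H versus K*(S + H). The layer sizes below are hypothetical, not the paper's configuration, and the count says nothing about accuracy or wall-clock training time.

```python
def model_params(vocab_size: int, emb_dim: int, n_filters: int, window: int,
                 hidden: int, n_domains: int, joint: bool) -> int:
    """Count weights and biases for a CNN encoder plus per-domain heads (hypothetical sizes)."""
    # Shared low-level layers: embedding table + one convolution layer with biases.
    shared = vocab_size * emb_dim + n_filters * (window * emb_dim + 1)
    # High-level layers per domain: one hidden dense layer + a single output unit.
    head = n_filters * hidden + hidden + hidden + 1
    if joint:
        return shared + n_domains * head  # S + K*H
    return n_domains * (shared + head)    # K * (S + H)


sizes = dict(vocab_size=50_000, emb_dim=100, n_filters=128, window=3,
             hidden=64, n_domains=3)
print(model_params(**sizes, joint=True))   # 5063491
print(model_params(**sizes, joint=False))  # 15140547
```

Because the embedding table dominates, three separate models need roughly three times the parameters of the joint one in this configuration.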

"Compared with building multiple domain-specific CNNs, this joint model not only critically saves training time, but also improves the prediction accuracy visibly," the researchers wrote in their paper. "The superiority of the proposed model is demonstrated in a real dataset from Wechat public platforms."

In the future, the SSNN model proposed in this study could be used to build more comprehensive knowledge databases. It could also aid the development of innovative services that answer user queries quickly and exhaustively in real time.

More information: Hierarchical Neural Network for Extracting Knowledgeable Snippets and Documents. arXiv:1808.07228v1 [cs.CL]. arxiv.org/abs/1808.07228

Abstract
In this study, we focus on extracting knowledgeable snippets and annotating knowledgeable documents from Web corpus, consisting of the documents from social media and We-media. Informally, knowledgeable snippets refer to the text describing concepts, properties of entities, or relations among entities, while knowledgeable documents are the ones with enough knowledgeable snippets. These knowledgeable snippets and documents could be helpful in multiple applications, such as knowledge base construction and knowledge-oriented service. Previous studies extracted the knowledgeable snippets using the pattern-based method. Here, we propose the semantic-based method for this task. Specifically, a CNN based model is developed to extract knowledgeable snippets and annotate knowledgeable documents simultaneously. Additionally, a "low-level sharing, high-level splitting" structure of CNN is designed to handle the documents from different content domains. Compared with building multiple domain-specific CNNs, this joint model not only critically saves the training time, but also improves the prediction accuracy visibly. The superiority of the proposed method is demonstrated in a real dataset from Wechat public platform.

Citation: A neural network to extract knowledgeable snippets and documents (2018, September 5) retrieved 29 June 2024 from https://techxplore.com/news/2018-09-neural-network-knowledgeable-snippets-documents.html

