Researchers at UC Davis have recently developed a new machine-learning-based tool to verify multimedia rumors online. Their paper, pre-published on arXiv, proposes cross-lingual and cross-platform features for rumor verification, which leverage the semantic similarity between rumors and information on other websites. Their method can combine information from multiple languages to build a more complete picture of online news.
A growing number of people worldwide now go online to read the news and learn about what is happening in the world. However, social media platforms are largely unmoderated, resulting in the proliferation of fake news, which is often accompanied by fabricated or decontextualized multimedia content. False rumors can spread very quickly online, causing havoc and confusion among readers, so the development of tools to verify the authenticity of online information is of pressing importance.
"Our research is inspired by the increasing popularity of fake news attached by multimedia content in social networks," Weiming Wen, one of the graduate researchers who carried out the study, told Tech Xplore. "It is mainly about how to use NLP techniques to verify rumors with multimedia content. The basic idea is to solve the problem through machine learning—extracting specific features from this type of rumor and building a model to classify rumors as fake or real."
Past rumor verification research has used multimedia content as an input, relying on forensic features of images or videos to determine whether they had been tampered with. Although these features improved results, most of these studies could not use multimedia content to verify rumors on Twitter consistently.
One possible reason is that the multimedia content attached to fake news is often simply borrowed from authentic events and is somewhat semantically aligned with the text that accompanies it. In other words, the image itself is real, so a check for tampering finds nothing wrong, but it has been placed in an entirely different story to make the false rumor more credible.
The researchers at UC Davis proposed an alternative way of verifying rumors that leverages multimedia content by finding information associated with it on other news platforms.
Most existing rumor verification datasets are monolingual; for instance, they include only multimedia content presented with English text, or only with Chinese text. The researchers created a new cross-lingual, cross-platform rumor verification dataset (CCMR), comprising three sub-datasets: CCMR Twitter, CCMR Google and CCMR Baidu.
"When we say multimedia rumors, we mean tweets or other social media content that are not verified and have images or videos along with the text," Zhou Yu, an assistant professor at UC Davis who also worked on the study, told Tech Xplore. "Text and image are considered two different information channels. We are leveraging vision information in an innovative way, using it as a pivot to link news from different platforms and in different languages."
The features developed by the researchers embed both the rumor and the titles of associated web pages into 300-dimensional vectors, using a pre-trained multilingual sentence embedding. The researchers trained this embedding model on 453,000 pairs of parallel English and Chinese news texts, as well as microblogs, from the UM-Corpus dataset. The model can combine news from multiple languages, which makes rumor verification more effective.
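To make the idea of comparing embedded rumors and page titles more concrete, here is a minimal sketch in Python. It assumes cosine similarity as the comparison measure and mean, maximum and minimum as summary statistics; the article does not say which measure or statistics the researchers used. The function names and the three-dimensional toy vectors are hypothetical stand-ins for real 300-dimensional embeddings.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_features(
    rumor_vec: Sequence[float],
    title_vecs: Sequence[Sequence[float]],
) -> Dict[str, float]:
    """Summarize how close a rumor embedding is to the embeddings of related titles.

    The choice of mean/max/min is an assumption made for illustration only.
    """
    sims = [cosine_similarity(rumor_vec, t) for t in title_vecs]
    if not sims:
        return {"mean": 0.0, "max": 0.0, "min": 0.0}
    return {"mean": sum(sims) / len(sims), "max": max(sims), "min": min(sims)}


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for 300-dimensional sentence embeddings.
    rumor = [0.2, 0.7, 0.1]
    titles = [[0.1, 0.8, 0.1], [0.9, 0.0, 0.1]]
    print(similarity_features(rumor, titles))
```

Because a multilingual embedding places English and Chinese sentences in a shared vector space, the same comparison could in principle be applied to titles in either language.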
"Given a rumor attached with an image, we first search the image via Google Image to get a bunch of related posts," Wen explained. "We then extract features of this rumor by computing the similarity and agreement between the rumor and the searched posts. Finally, we use our pre-trained model to verify this rumor using its features."
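The three steps Wen describes (search the image, extract features, classify) can be sketched as a small pipeline in Python. This is an illustration under stated assumptions, not the researchers' implementation: the search, feature and classifier functions below are hypothetical toy stand-ins. In particular, the search returns canned titles rather than calling any real image search service, word overlap replaces the embedding-based similarity and agreement features, and a fixed threshold replaces the trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rumor:
    text: str
    image_url: str


def verify_rumor(
    rumor: Rumor,
    search_image: Callable[[str], List[str]],
    extract_features: Callable[[str, List[str]], Dict[str, float]],
    classify: Callable[[Dict[str, float]], str],
) -> str:
    """Run the three steps in order: image search, feature extraction, classification."""
    posts = search_image(rumor.image_url)
    features = extract_features(rumor.text, posts)
    return classify(features)


def toy_search(image_url: str) -> List[str]:
    # Hypothetical stand-in: canned titles instead of a real reverse image search.
    return ["flood hits coastal town", "coastal town flood rescue under way"]


def toy_features(text: str, posts: List[str]) -> Dict[str, float]:
    # Hypothetical stand-in: Jaccard word overlap instead of embedding similarity.
    words = set(text.lower().split())
    overlaps = []
    for post in posts:
        post_words = set(post.lower().split())
        union = words | post_words
        overlaps.append(len(words & post_words) / len(union) if union else 0.0)
    mean = sum(overlaps) / len(overlaps) if overlaps else 0.0
    return {"mean_overlap": mean}


def toy_classify(features: Dict[str, float]) -> str:
    # Hypothetical stand-in for a trained classifier; the 0.2 threshold is arbitrary.
    return "real" if features["mean_overlap"] >= 0.2 else "fake"


if __name__ == "__main__":
    rumor = Rumor(text="shark swims on highway", image_url="https://example.com/img.jpg")
    print(verify_rumor(rumor, toy_search, toy_features, toy_classify))
```

Passing the three steps in as functions keeps the sketch independent of any particular search service, embedding model or classifier, each of which could be swapped for a real component.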
When tested, machine learning methods that used the cross-lingual and cross-platform features proposed by the researchers achieved state-of-the-art rumor verification results. These features were also found to be compact and generalizable across languages.
"I think the most meaningful part of our study is that we developed a rumor verification framework that works specifically for multimedia rumors, which is extremely common, but has not been studied thoroughly," Wen said. "With this framework, we can efficiently verify multimedia rumors from platforms such as Facebook and Twitter."
This study could be an important milestone on the path to developing effective ways of validating online rumors that are accompanied by multimedia content. Moreover, the English-Chinese dataset put together by the researchers could be used in further research exploring methods for cross-lingual rumor verification.
"In future, we plan to generate reasons for our verification results about multimedia rumors," Wen said. "Besides classifying a rumor as fake, we also want to automatically generate a reason, such as 'this post is fake because it borrows an image from another event to prove its statement.'"