Fake news detector algorithm works better than a human


An algorithm-based system that identifies telltale linguistic cues in fake news stories could provide news aggregator and social media sites like Google News with a new weapon in the fight against misinformation.

The University of Michigan researchers who developed the system have demonstrated that it's comparable to and sometimes better than humans at correctly identifying stories.

In a recent study, it successfully found fakes up to 76 percent of the time, compared to a human success rate of 70 percent. In addition, their linguistic analysis approach could be used to identify fake news articles that are too new to be debunked by cross-referencing their facts with other stories.

Rada Mihalcea, the U-M computer science and engineering professor behind the project, said an automated solution could be an important tool for sites that are struggling to deal with an onslaught of fake news stories, often created to generate clicks or to manipulate public opinion.

Catching fake stories before they have real consequences can be difficult, as aggregator and social media sites today rely heavily on human editors who often can't keep up with the influx of news. In addition, current debunking techniques often depend on external verification of facts, which can be difficult with the newest stories. Often, by the time a story is proven fake, the damage has already been done.

Linguistic analysis takes a different approach, analyzing quantifiable attributes like grammatical structure, word choice, punctuation and complexity. It works faster than humans and it can be used with a variety of different news types.
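To make the idea concrete, a handful of such quantifiable attributes can be computed with plain Python. The specific features below (vocabulary diversity, punctuation density, sentence length) are illustrative assumptions, not the feature set the U-M system actually uses:

```python
import re
import string

def linguistic_features(text):
    """Extract a few simple, quantifiable cues from a news story.

    These features are illustrative stand-ins for the richer
    grammatical and stylistic features a real fake-news detector
    would compute.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words) or 1  # avoid division by zero on empty input
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "type_token_ratio": len(set(words)) / n_words,  # vocabulary diversity
        "punctuation_count": sum(text.count(c) for c in string.punctuation),
        "avg_sentence_length": n_words / max(len(sentences), 1),
    }

features = linguistic_features("Shocking! You won't believe what happened. Sad!")
print(features["word_count"])  # prints 7
```

Feature vectors like this one can then be fed to any standard classifier, which is what makes the approach fast and applicable to many kinds of text.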

"You can imagine any number of applications for this on the front or back end of a news or social media site," Mihalcea said. "It could provide users with an estimate of the trustworthiness of individual stories or a whole news site. Or it could be a first line of defense on the back end of a news site, flagging suspicious stories for further review. A 76 percent success rate leaves a fairly large margin of error, but it can still provide valuable insight when it's used alongside humans."

Linguistic algorithms that analyze written speech are fairly common today, Mihalcea said. The challenge to building a fake news detector lies not in building the algorithm itself, but in finding the right data with which to train that algorithm.

Fake news appears and disappears quickly, which makes it difficult to collect. It also comes in many genres, further complicating the collection process. Satirical news, for example, is easy to collect, but its use of irony and absurdity makes it less useful for training an algorithm to detect fake news that's meant to mislead.

Ultimately, Mihalcea's team created its own data, crowdsourcing an online team that reverse-engineered verified genuine news stories into fakes. This is how most actual fake news is created, Mihalcea said: by individuals who quickly write such stories in return for a monetary reward.

Study participants, recruited with the help of Amazon Mechanical Turk, were paid to turn short, actual news stories into similar but fake news items, mimicking the journalistic style of the articles. At the end of the process, the research team had a dataset of 500 real and fake news stories.

They then fed these labeled pairs of stories to an algorithm that performed a linguistic analysis, teaching itself to distinguish between real and fake news. Finally, the team turned the algorithm loose on a dataset of real and fake news pulled directly from the web, netting the 76 percent success rate.
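The training setup described above can be sketched with a toy classifier. Everything here is a hypothetical stand-in: the sample stories are invented, and a bag-of-words Naive Bayes model replaces whatever feature set and learning algorithm the team actually used:

```python
import math
import re
from collections import Counter

class NaiveBayesNewsClassifier:
    """Minimal bag-of-words Naive Bayes trained on labeled real (0)
    and fake (1) stories -- an illustrative stand-in for the paper's
    classifier over linguistic features."""

    def fit(self, texts, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(re.findall(r"[a-z']+", text.lower()))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, text):
        words = re.findall(r"[a-z']+", text.lower())
        scores = {}
        for label in (0, 1):
            total = sum(self.word_counts[label].values())
            # Log prior plus Laplace-smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
            for w in words:
                score += math.log(
                    (self.word_counts[label][w] + 1) / (total + len(self.vocab))
                )
            scores[label] = score
        return max(scores, key=scores.get)

# Toy stand-in for the 500-story crowdsourced dataset (label 1 = fake).
clf = NaiveBayesNewsClassifier().fit(
    [
        "City council approves budget after routine public hearing.",
        "Officials confirm the bridge repair is on schedule this quarter.",
        "SHOCKING secret cure they don't want you to know about!!",
        "You won't BELIEVE this unbelievable miracle discovery!!!",
    ],
    [0, 0, 1, 1],
)
print(clf.predict("Shocking miracle secret revealed"))  # prints 1
```

As with the real system, the classifier is then evaluated on held-out stories it has never seen, which is where the reported 76 percent figure comes from.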

The details of the new system and the dataset that the team used to build it are freely available, and Mihalcea says they could be used by news sites or other entities to build their own fake news detection systems. She says that future systems could be further honed by incorporating metadata such as the links and comments associated with a given online item.

A paper detailing the system will be presented Aug. 24 at the 27th International Conference on Computational Linguistics in Santa Fe, N.M. Mihalcea worked with U-M computer science and engineering assistant research scientist Veronica Perez-Rosas, psychology researcher Bennett Kleinberg at the University of Amsterdam and U-M undergraduate student Alexandra Lefevre.

The paper is titled "Automatic Detection of Fake News."


More information: Automatic Detection of Fake News, arXiv:1708.07104 [cs.CL] arxiv.org/abs/1708.07104
Citation: Fake news detector algorithm works better than a human (2018, August 21) retrieved 18 September 2019 from https://techxplore.com/news/2018-08-fake-news-detector-algorithm-human.html


User comments

Aug 21, 2018
Their definition of fake news is someone with a different viewpoint. Their censorship needs to stop. I hate to say this but it may take government intervention. They are trying to change an election for their personal purposes.

Aug 21, 2018
lol , so easy to create a '' Linguistic algorithm '' to detect fake news, just look for the letters CNN ,ABC , MSNBC

Aug 21, 2018
So Jimmy and noosebalm, my dear boys. You are "saying" that any opinion that is not obedient to your prejudices must be suppressed?

That none of us is to be permitted to vote against your dictates?

You seem to be claiming superiority above all the rest of us? In your state of awesomeness you intend to disregard any decision of which you do not approve?

That you would even make such a stupid comment as the one above? Just confirms your inferiority and low intelligence.

That you are typical altright fairy tales. Too cowardly to honorably compete on an honest playing field of multiple ideas and ideals.

Since you were sleeping through your Civics classes? The point of voting is that we each have an opportunity to influence government.

You fake patriots wrap yourselves in the flag and apple pie (messy!) as you parrot a few politically correct slogans. As you tear up the Constitution and deny Civil Rights to all who do not submit to your cult of fear and hate.

Aug 21, 2018
doesn't seem like it works for the scienceX sites lmao

Aug 21, 2018
The algorithm for determining fake news will eventually fall into the wrong hands and the bad guys will tweak their content until it passes the fake news test. The Russians are increasing their efforts to meddle with American democracy. Even more dangerous is that Russians operate actual real news sites in America, building themselves a trusted platform that can start spewing fake news covertly without the subscribers suspecting. People like jimmy and snoozebomb have already fallen victim to the previous wave of fake news. These new covertly fake news sites will be even more damaging. The Russian disinformation campaign is just getting started.

Aug 21, 2018
It's been my experience that someone who uses the word "truth" several times is probably trying to sell you some crap.

Aug 22, 2018
Interesting. It's essentially a lie detector that uses linguistic cues... I wonder if it only works on news stories.

Aug 22, 2018
Anyone who believes that the definition of "fake news" is someone who writes or says something you disagree with is deeply deluded and flat wrong. The definition of fake news relates to its factual accuracy. If for example, I post the news that I have discovered that 2+2=13, that would be "fake news". News you disagree with, dislike, or demeans your favorite political buffoon is only faker if it is factually incorrect.

On the other hand, the way the training samples were created doesn't lead me to have any confidence that the algorithm itself is useful at all. Using deep learning networks, it is easy to pick up the linguistic styles of the students who modified the original news stories, and I strongly suspect that is what is being measured.

Aug 22, 2018
So 24% of genuine (but probably unpopular) news will be classed as "fake"

Aug 22, 2018
It's essentially a lie detector that uses linguistic cues... I wonder if it only works on news stories

There are similar approaches for academic texts (and at a very basic level there's a filter that checks if you were drunk while typing and will either ask for permission before sending or delay sending by 12 hours).

One of the linguistic cues for fake news is probably the use of qualitative adjectives ("sad", "bad", "great", etc. ) and simple sentence structures over quantitative statements and elaborate logical constructs. Fake news targets the gut and not the mind (hence adjectives). Fake news is also geared towards the dumb (hence simple sentence structures).

In essence: Fake news is targeted at the Twitter user

Aug 22, 2018
@ wiils , you antifa creeps really suffer from that transference thing ie alex jones actually is suppressed

Aug 27, 2018
If this application depends on linguistic cues rather than fact checking, can it really distinguish fake news from sarcasm or satire?
