A trio of researchers from Cornell and Stanford has developed a computer algorithm capable of identifying antisocial behavior in website comment sections. In a paper uploaded to the preprint server arXiv, Justin Cheng, Cristian Danescu-Niculescu-Mizil and Jure Leskovec describe their algorithm, how they built it, and how they plan to improve its accuracy.
On the Internet, people who behave in antisocial ways in the comments sections of web content have come to be known as trolls, and, like those who dish out spam, they are a constant source of annoyance. Big-name websites such as CNN are working with academics to find ways to identify trolls and ban them before they cause too much trouble, because visitors who are harassed, angered, or annoyed often avoid the websites where it happened. In this new effort, the researchers built their troll-finding algorithm by analyzing typical troll behavior using data provided by CNN.com, Breitbart.com and IGN.com.
In scouring the data, which covered an 18-month period and included the comments of over 10,000 Future Banned Users (FBUs), the researchers compared the behavior of users who were eventually banned against users who never were, and discovered some patterns in troll behavior. First, troll posts were on average less literate than those of non-trolls, and they tended to become less literate the more the user posted to a site. Second, fellow posters were initially patient with trolls, but only up to a point; once that tolerance was exhausted, banning came quickly.
The researchers report that it was relatively easy to spot FBUs and to convert what they had found into something a computer could measure, starting with what they called an Automated Readability Index. After writing their algorithm and working out issues, the team reports that it could spot FBUs with 80 percent accuracy after just ten posts. That is not high enough for website owners, of course; banning non-trolls by mistake roughly 20 percent of the time could drive away visitors. It could, however, be used as a way to assist human moderators.
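The Automated Readability Index mentioned above is a standard readability formula that estimates the U.S. grade level needed to understand a text from average word and sentence length. The paper's actual feature pipeline is not reproduced here; the following is only a minimal sketch of the standard ARI formula, with simple regex-based tokenization as an assumption, showing how a post's literacy level could be scored:

```python
import re

def automated_readability_index(text: str) -> float:
    """Standard ARI formula:
    4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

    Higher scores mean the text demands a higher reading grade level;
    the researchers found troll posts tended to score as less literate.
    Tokenization here (regex splits) is a simplifying assumption.
    """
    # Split sentences on terminal punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words or not sentences:
        return 0.0
    chars = sum(len(w) for w in words)  # letters/digits only, no spaces
    return (4.71 * (chars / len(words))
            + 0.5 * (len(words) / len(sentences))
            - 21.43)
```

A score like this is cheap to compute per comment, which is one reason readability makes a practical first feature for an automated moderation aid.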