September 7, 2022
Cleaning up social media with machine learning
Adult, or pornographic, spam is a growing problem on social media. New research in the International Journal of Business Intelligence and Data Mining discusses how such content might be detected and removed quickly.
Deepali Dhaka, Surbhi Kakar, and Monica Mehrotra of Jamia Millia Islamia (Central University) in Jamia Nagar, New Delhi, India, explain how the experience of social media users in general, and of younger users in particular, might be improved if obscene spam content can be filtered quickly and effectively. Machine learning tools are often the way forward in detecting particular types of content, and the team has demonstrated that one such tool, XGBoost, can detect adult spam content with more than 90% accuracy. It was the most effective of the six classification algorithms the team tested and adapted for detecting pornographic spam on Twitter.
In other words, the classifier assigned the correct label to more than nine in every ten of the updates it was tested on. The team's approach needed to analyze just a small number of features (value system, the entropy of words, lexical diversity, and word embeddings) to be able to pluck adult spam updates from the general stream of updates on Twitter, one of the best-known social media platforms.
Detection works because everyday users of the platform generally discuss a wide variety of topics in different contexts and write and share in what might be called an organic manner. In contrast, spammers, and pornographic spammers in this case, tend to take a fixed or even entirely automated approach to their updates, with limited diversity of subject matter, as one would expect, and a very limited lexicon. These and other characteristics of spam messages make them recognizable to the algorithm.
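To make two of the named features concrete, the following minimal Python sketch computes word entropy and lexical diversity for a piece of text. It is an illustration only: this article does not give the researchers' exact feature definitions, so Shannon entropy over word frequencies and the type-token ratio are assumed here as common stand-ins, the whitespace tokenizer is deliberately crude, and both sample texts are invented.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; a real pipeline
    # would likely handle punctuation, URLs, hashtags, and mentions.
    return text.lower().split()


def word_entropy(tokens):
    """Shannon entropy, in bits, of the word-frequency distribution."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lexical_diversity(tokens):
    """Type-token ratio: number of distinct words divided by total words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample texts, invented for illustration.
organic = "just got back from the market and the mangoes were amazing today"
repetitive = "click here click here click here click here"

for label, text in [("organic", organic), ("repetitive", repetitive)]:
    tokens = tokenize(text)
    print(label, round(word_entropy(tokens), 3), round(lexical_diversity(tokens), 3))
```

On these two invented samples the repetitive text scores lower on both measures, which is the kind of signal a classifier could pick up on. In practice such values would be only some of the inputs to a model, alongside the other features the article mentions.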