Using machine learning to detect unreliable Facebook pages
A growing number of companies and individuals worldwide are creating Facebook pages for marketing and advertising purposes. Facebook lets them communicate with potential or existing customers free of charge and advertise new products, offers or services.
Yet, precisely because this service is free and easy to access, malicious users are using it to create deceptive pages. Detecting and identifying unreliable pages is of key importance, as it might help to warn users and reduce malicious activity on the platform.
Researchers worldwide have hence been trying to develop methods to detect and prevent deception on Facebook and other social media platforms. Panida Songram, a researcher at Mahasarakham University, in Thailand, has recently carried out a study investigating the use of supervised machine learning to classify Facebook pages as reliable or unreliable.
"This paper aims to detect and investigate the characteristics of unreliable and reliable Facebook pages," Songram wrote in her paper, which was published in Springer's Artificial Life and Robotics journal. "Effective machine learning models and feature selection methods are also investigated for detecting unreliable and reliable pages."
Songram extracted a vast number of features that could help to determine whether a page is reliable or not, including page details, information about a product or service, user responses and post behavior of the page administrator. She then trained a supervised machine learning tool to analyze these features and classify pages as reliable or unreliable.
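Supervised training of this kind needs labeled examples, and the study kept only pages on which all five annotators agreed. A minimal Python sketch of that filtering step follows; the function names, page identifiers and label strings are hypothetical and are not taken from the paper.

```python
def unanimous_label(votes):
    """Return the shared label if every annotator agrees, else None."""
    if not votes:
        return None
    return votes[0] if len(set(votes)) == 1 else None


def select_agreed_pages(annotations, n_annotators=5):
    """Keep only pages that all n_annotators labeled identically."""
    agreed = {}
    for page_id, votes in annotations.items():
        label = unanimous_label(votes)
        if len(votes) == n_annotators and label is not None:
            agreed[page_id] = label
    return agreed


# Hypothetical annotations: page id -> labels assigned by five users.
annotations = {
    "page_a": ["reliable"] * 5,
    "page_b": ["unreliable"] * 5,
    "page_c": ["reliable", "reliable", "unreliable", "reliable", "reliable"],
}

agreed_pages = select_agreed_pages(annotations)
print(agreed_pages)
```

Requiring full agreement discards ambiguous pages such as `page_c`, which trades a smaller dataset for cleaner labels.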
"First, Facebook pages are randomly collected and then they are labeled by five users," Songram explained in her paper. "Facebook pages with agreement of five users are selected and their information is retrieved using the Facebook Graph API. Next, features are extracted from the information and investigated in the experiments."
Songram evaluated the effectiveness of different classifiers in detecting unreliable and reliable pages. She found that the k-nearest neighbors (KNN) algorithm was the best classifier, achieving 88.67 percent accuracy. She also analyzed Facebook page features to better understand what typically characterizes reliable and unreliable pages.
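To illustrate how a k-nearest neighbors classifier assigns a label, here is a minimal sketch in plain Python. It is not the study's implementation: the two features, the toy values and the choice of k = 3 are assumptions made for illustration, and a real pipeline would normally scale features so that no single one dominates the distance.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the majority label among the k nearest training points."""
    distances = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test points whose predicted label matches the true one."""
    correct = sum(
        knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Toy data (assumed): [posts_per_week, days_since_last_post]
train_X = [[7.0, 1.0], [5.0, 2.0], [6.0, 3.0],
           [0.2, 90.0], [0.1, 120.0], [0.5, 60.0]]
train_y = ["reliable"] * 3 + ["unreliable"] * 3
test_X = [[4.0, 5.0], [0.3, 100.0]]
test_y = ["reliable", "unreliable"]

toy_accuracy = accuracy(train_X, train_y, test_X, test_y)
print(toy_accuracy)
```

The accuracy printed here only describes the toy data and says nothing about the figures reported in the paper.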
"For unreliable pages, the number of days between date of last post and retrieved date is high and the number of posts per week (post frequency) is very small," Songram wrote in her paper. "It indicates that unreliable pages are not active, while reliable pages are active."
Songram observed that significantly fewer people discuss unreliable pages online than reliable ones. A possible explanation is that users often realize these pages are unreliable and hence do not talk about them. Posts on reliable pages also contained far more URLs than those on unreliable pages, as well as more information about the company and its products or services.
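The two activity features Songram describes, the number of days between the last post and the retrieval date and the number of posts per week, can be derived from post timestamps. The sketch below shows one plausible way to compute them. The article does not give the paper's exact definition of post frequency, so the averaging window used here (first post to retrieval date, with a one-week minimum) is an assumption, as are the function name and sample dates.

```python
from datetime import datetime


def activity_features(post_times, retrieved_at):
    """Return (days_since_last_post, posts_per_week) for one page."""
    if not post_times:
        return None, 0.0
    first, last = min(post_times), max(post_times)
    days_since_last_post = (retrieved_at - last).days
    # Assumed window: first post to retrieval date, at least one week long.
    span_weeks = max((retrieved_at - first).days / 7.0, 1.0)
    posts_per_week = len(post_times) / span_weeks
    return days_since_last_post, posts_per_week


# Hypothetical page with one post per week for four weeks.
posts = [
    datetime(2018, 6, 2),
    datetime(2018, 6, 9),
    datetime(2018, 6, 16),
    datetime(2018, 6, 23),
]
features = activity_features(posts, retrieved_at=datetime(2018, 6, 30))
print(features)
```

Under these definitions an inactive page shows a large first value and a small second one, matching the pattern the study reports for unreliable pages.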
Using what she found to be the top 10 features for determining a Facebook page's reliability, Songram achieved a classification accuracy of 91.37 percent. In the future, her findings could aid the development of more effective tools to quickly detect unreliable Facebook pages.
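The article does not say which feature selection methods the paper used. As a rough illustration of the general idea of ranking features and keeping the top k, the sketch below scores each feature by how far apart the two class means are, relative to the feature's overall spread. The scoring rule, feature names and values are all assumptions chosen for illustration, not the study's method or data.

```python
from statistics import mean, pstdev


def separation_score(values, labels, positive="unreliable"):
    """Absolute gap between class means, divided by the overall spread."""
    pos = [v for v, y in zip(values, labels) if y == positive]
    neg = [v for v, y in zip(values, labels) if y != positive]
    spread = pstdev(values)
    if spread == 0 or not pos or not neg:
        return 0.0
    return abs(mean(pos) - mean(neg)) / spread


def top_k_features(rows, labels, names, k=10):
    """Return the k feature names with the highest separation score."""
    scores = {
        name: separation_score([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (assumed): one row per page, one column per feature.
names = ["posts_per_week", "days_since_last_post", "name_length"]
rows = [
    [7.0, 1.0, 12.0],
    [5.0, 3.0, 15.0],
    [0.2, 90.0, 13.0],
    [0.4, 120.0, 14.0],
]
labels = ["reliable", "reliable", "unreliable", "unreliable"]

selected = top_k_features(rows, labels, names, k=2)
print(selected)
```

In this toy example the uninformative `name_length` column is dropped because its class means coincide; a classifier would then be retrained on the selected columns only.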
More information: Panida Songram. Detection of unreliable and reliable pages on Facebook, Artificial Life and Robotics (2018). DOI: 10.1007/s10015-018-0509-z
© 2018 Science X Network