October 8, 2018
Social media data used to predict retail failure
Researchers have used a combination of social media and transport data to predict the likelihood that a given retail business will succeed or fail.
Using information from ten different cities around the world, the researchers, led by the University of Cambridge, have developed a model that can predict with 80% accuracy whether a new business will fail within six months. The results will be presented at the ACM Conference on Pervasive and Ubiquitous Computing (Ubicomp), taking place this week in Singapore.
While the retail sector has always been risky, the past several years have seen a transformation of high streets as more and more retailers fail. The model built by the researchers could be useful for both entrepreneurs and urban planners when determining where to locate their business or which areas to invest in.
"One of the most important questions for any new business is the amount of demand it will receive. This directly relates to how likely that business is to succeed," said lead author Krittika D'Silva, a Gates Scholar and Ph.D. student at Cambridge's Department of Computer Science and Technology. "What sort of metrics can we use to make those predictions?"
D'Silva and her colleagues used more than 74 million check-ins from the location-based social network Foursquare, covering Chicago, Helsinki, Jakarta, London, Los Angeles, New York, Paris, San Francisco, Singapore and Tokyo, together with data from 181 million taxi trips in New York and Singapore.
Using this data, the researchers classified venues according to the properties of the neighbourhoods in which they were located, the visit patterns at different times of day, and whether a neighbourhood attracted visitors from other neighbourhoods.
"We wanted to better understand the predictive power that metrics about a place at a certain point in time have," said D'Silva.
Whether a business succeeds or fails normally depends on a number of controllable and uncontrollable factors. Controllable factors might include the quality or price of the store's product, its opening hours and its customer satisfaction. Uncontrollable factors might include a city's unemployment rate, overall economic conditions and urban policies.
"We found that even without information about any of these uncontrollable factors, we could still use venue-specific, location-related and mobility-based features in predicting the likely demise of a business," said D'Silva.
The data showed that across all ten cities, venues that are popular around the clock, rather than just at certain points of day, are more likely to succeed. Additionally, venues that are in demand outside of the typical popular hours of other venues in the neighbourhood tend to survive longer.
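One way to make "popular around the clock" concrete is to score how evenly a venue's check-ins are spread over the day. The sketch below is an illustration only, not the researchers' published metric: the function name `hourly_spread`, the 24-bin hourly input and the use of normalised Shannon entropy are all assumptions.

```python
import math


def hourly_spread(checkins_by_hour):
    """Normalised Shannon entropy of check-ins across hourly bins.

    Hypothetical metric: 0.0 means every check-in falls in a single hour,
    1.0 means check-ins are spread perfectly evenly across all hours.
    """
    total = sum(checkins_by_hour)
    if total == 0 or len(checkins_by_hour) < 2:
        return 0.0
    entropy = 0.0
    for count in checkins_by_hour:
        if count > 0:
            p = count / total
            entropy -= p * math.log(p)
    # Divide by the maximum possible entropy so the score lies in [0, 1].
    return entropy / math.log(len(checkins_by_hour))


# Made-up counts for two imaginary venues (not data from the study).
lunch_only = [0] * 12 + [120] + [0] * 11
all_day = [5] * 24
print(hourly_spread(lunch_only), hourly_spread(all_day))
```

Under this assumed score, a venue visited only at lunchtime scores at the bottom of the range, while one with steady traffic all day scores at the top.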
The data also suggested that venues in diverse neighbourhoods, with multiple types of businesses, tend to survive longer.
While the ten cities had certain similarities, the researchers also had to account for their differences.
"The metrics that were useful predictors vary from city to city, which suggests that factors affect cities in different ways," said D'Silva. "As one example, that the speed of travel to a venue is a significant metric only in New York and Tokyo. This could relate to the speed of transit in those cities or perhaps to the rates of traffic."
To test the predictive power of their model, the researchers first had to determine whether a particular venue had closed within the time window of their data set. They then 'trained' the model on a subset of venues, telling the model what the features of those venues were in the first time window and whether the venue was open or closed in a second time window. They then tested the trained model on another subset of the data to see how accurate it was.
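The train-then-test procedure described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the researchers' model: each venue is reduced to one hypothetical feature score from the first time window, the "model" is a simple threshold rule standing in for whatever classifier the team used, and the data are synthetic.

```python
def split_venues(venues, test_every=4):
    """Hold out every `test_every`-th venue for testing; train on the rest."""
    train = [v for i, v in enumerate(venues) if i % test_every != 0]
    test = [v for i, v in enumerate(venues) if i % test_every == 0]
    return train, test


def train_threshold(train):
    """Pick the score threshold that classifies the most training venues correctly.

    Assumed rule: predict 'closed' when the score is below the threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({score for score, _ in train}):
        correct = sum((score < threshold) == closed for score, closed in train)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold, venues):
    """Fraction of venues whose predicted status matches the observed one."""
    hits = sum((score < threshold) == closed for score, closed in venues)
    return hits / len(venues)


# Synthetic venues: (score in first window, closed in second window).
# These values are invented for illustration, not taken from the study.
venues = [(i / 100, i < 40) for i in range(100)]
train, test = split_venues(venues)
threshold = train_threshold(train)
print(f"threshold={threshold:.2f}, held-out accuracy={accuracy(threshold, test):.2f}")
```

The key design point is that accuracy is measured only on venues the rule never saw during training, which is what makes the reported figure a test of prediction rather than of memorisation.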
According to the researchers, their model shows that when deciding when and where to open a business, it is important to look beyond the static features of a given neighbourhood and to consider the ways that people move to and through that neighbourhood at different times of day. They now want to consider how these features vary across different neighbourhoods in order to improve the accuracy of their model.