April 15, 2021
Neural network uses data on banking transactions for credit scoring
Researchers from Skoltech and a major European bank have developed a neural network that outperforms existing state-of-the-art solutions in using transactional banking data for customer credit scoring. The research was published in the proceedings of the 2020 IEEE International Conference on Data Mining (ICDM).
Machine learning algorithms are already extensively used in risk management, helping banks assess clients and their finances. "A modern human, in particular a bank client, continually leaves traces in the digital world. For instance, the client may add information about transferring money to another person in a payment system. Therefore, every person obtains a large number of connections that can be represented as a directed graph. Such a graph gives an additional information for client's assessment. An efficient processing and usage of the rich heterogeneous information about the connections between clients is the main idea behind our study," the authors write.
Maxim Panov, who heads the Statistical Machine Learning group, and Kirill Fedyanin from Skoltech and their colleagues showed that using data about money transfers between clients significantly improves the quality of credit scoring compared with algorithms that use only the target client's data. This could help banks make better offers to trustworthy clients while reducing the negative effect of fraudulent activity.
"One of the defining properties of a particular bank client is his or her social and financial interactions with other people. It motivated us to look at bank clients as a network of interconnected agents. Thus, the goal of the study was to find out whether the famous proverb "Tell me who your friends are and I will tell you who you are" applies to financial agents," Panov says.
Their edge weight-shared graph convolutional network (EWS-GCN) operates on graphs in which nodes correspond to anonymized identifiers of bank clients and edges represent interactions between them. The network aggregates information from these graphs to predict the credit rating of a target client. The main feature of the new approach is its ability to process the large-scale temporal graphs that appear in banking data as is, that is, without preprocessing, which is usually complex and leads to partial loss of the information contained in the data.
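To give a rough sense of what "aggregating information" over such a graph can mean, here is a minimal sketch of one generic message-passing step in plain Python. It is not the published EWS-GCN architecture, whose layer equations are not given in this article. The function names (`matvec`, `aggregate_step`), the mean-then-ReLU aggregation, the toy client identifiers, and the weight values are all assumptions made for illustration, and the temporal aspect of the data is not modeled. The one idea borrowed from the description above is that the same edge weights are shared across every transfer.

```python
def matvec(weights, vec):
    """Multiply a weight matrix (nested lists, one row per output) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def aggregate_step(node_feats, edges, w_node, w_edge):
    """One round of message passing over a directed transfer graph.

    node_feats: dict mapping client id -> list of client features
    edges: list of (sender, receiver, edge_features) tuples, one per transfer
    w_node, w_edge: weight matrices reused for every edge (hypothetical values)
    """
    hidden = len(w_node)
    summed = {n: [0.0] * hidden for n in node_feats}
    counts = {n: 0 for n in node_feats}
    for sender, receiver, edge_feat in edges:
        # Each message combines the sender's features with the transfer's features.
        from_node = matvec(w_node, node_feats[sender])
        from_edge = matvec(w_edge, edge_feat)
        for i in range(hidden):
            summed[receiver][i] += from_node[i] + from_edge[i]
        counts[receiver] += 1
    # Average the incoming messages, then apply a ReLU nonlinearity.
    return {
        n: [max(0.0, s / max(counts[n], 1)) for s in summed[n]]
        for n in node_feats
    }


# Toy graph: clients "a" and "b" each send one transfer to client "c".
node_feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
edges = [("a", "c", [2.0]), ("b", "c", [4.0])]
w_node = [[1.0, 0.0], [0.0, 1.0]]  # assumed 2x2 node weights
w_edge = [[0.5], [-1.0]]           # assumed 2x1 edge weights, shared by all edges
embeddings = aggregate_step(node_feats, edges, w_node, w_edge)
```

In this toy setup the representation of client "c" depends only on who sent it money and on the features of those transfers. In a trained model of this general kind, the weights would be learned and the resulting representation passed to a classifier that outputs the credit score.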
The researchers ran an extensive experimental comparison of six models, and the EWS-GCN model outperformed all its competitors. "The success of the model can be explained by the combination of three factors. First, the model processes rich transactional data directly and thus minimizes the loss of information contained in it. Second, the structure of the model is carefully designed to make the model expressive and efficiently parametrized, and finally, we have proposed a special training procedure for the whole pipeline," Panov notes.
He also says that for the model to be used in banking practice, it has to be very reliable. "Complex neural network models are under the threat of adversarial attacks and due to the lack of knowledge of this phenomenon in relation to our model, we cannot use it in the production process at the moment, leaving it for further research," Panov concludes.