Journal of Financial Risk Management
Vol.06 No.02(2017), Article ID:77205,9 pages
10.4236/jfrm.2017.62015

The P2P Risk Assessment Model Based on the Improved AdaBoost-SVM Algorithm

Jianhui Yang, Dongsheng Luo

School of Business and Management, South China University of Technology, Guangzhou, China

Copyright © 2017 by authors and Scientific Research Publishing Inc.

This work is licensed under the Creative Commons Attribution International License (CC BY 4.0).

http://creativecommons.org/licenses/by/4.0/

Received: May 31, 2017; Accepted: June 24, 2017; Published: June 27, 2017

ABSTRACT

The improved AdaBoost-SVM algorithm is used to classify peer-to-peer (P2P) online lending platforms as safe or risky. Because the SVM algorithm handles rare samples poorly and trains slowly, rule sampling is used to reduce classification noise. P2P risks are then identified by a combination of learning machines. The results show that the IAdaBoost algorithm improves the classification accuracy for risky platforms and that the classification error can be kept within 5%.

Keywords:

Peers-to-Peers, AdaBoost, SVM, The Combinations of Learning Machine, Rule Sampling

1. Introduction

In recent years, the development of domestic Internet finance has forced the traditional financial industry to reform rapidly. As global integration intensifies, modern finance takes on an increasingly complex form. The complexity of the financial system makes risk spread faster, and the scope of contagion between platforms is also growing.

Peer-to-peer (P2P) lending is an important form of Internet finance, and its risk contagion and risk measurement are of concern. Credit risk, the main problem faced by the P2P market, is largely associated with the fuzziness of risk factors. Measuring credit risk is an inherent risk management requirement of both the P2P and banking markets, and it is an important basis for the effective prevention of financial risk.

Domestic scholars studying network lending (P2P) have focused on platform operation modes and development trends, as well as on risk control and risk management in the industry. From the new perspective of “platform risk”, research in the P2P domain has been expanded (Ye, Li, & Xu, 2016). Wang analyzes risk regulation and prevention for P2P network lending platforms and the related policy considerations (Wang, 2016). Liu analyzes the risk characteristics of China’s P2P industry from the three perspectives of lenders, investors and platforms, and constructs an improved debtor risk assessment model (Liu, 2013). In studying P2P risk assessment, Luo builds quantitative methods and constructs an investor composition analysis model, a borrower credit risk analysis model and a multi-information-source loan assessment model to support investors’ decisions (Luo, 2012).

Foreign research on P2P network lending platforms mostly analyzes the behavior of borrowers in transactions and platform development trends. Considering the current research on the credit characteristics and loan success factors of the transaction parties, we mainly analyze the risk problems, the dislocation of the network, and the lack of supervision. These are the areas in which China lags behind Britain and the United States, which have complete and transparent credit systems. Moreover, their network lending (P2P) systems have been brought within the scope of supervision. Compared with foreign practice, domestic network lending (P2P) still needs improvement.

Given this difference between the domestic and foreign settings, P2P credit risk measurement and evaluation depends on data screening and model building. Commonly used machine learning models include the perceptron, K-nearest neighbor, decision tree, logistic regression, support vector machine (SVM), AdaBoost, hidden Markov model, conditional random field and so on. Applying machine learning algorithms to P2P risk assessment can effectively improve the evaluation and classification model. Training a traditional support vector machine is, in essence, a convex quadratic programming problem. For P2P risk measurement and assessment, we collect indicator data on P2P platforms and divide the platforms by risk, so as to filter out problem platforms.
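For reference, the convex quadratic program behind SVM training is usually written in the soft-margin form below. This is the standard textbook formulation rather than one stated in this paper; it assumes labels recoded as $y_i \in \{-1, +1\}$, and the symbols $w$ (weight vector), $b$ (bias) and $\xi_i$ (slack variables) are introduced here for illustration. $C$ is the penalty parameter.

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\quad\text{s.t.}\quad y_i\,(w^{\top}x_i + b) \ge 1-\xi_i,\quad \xi_i \ge 0,\quad i=1,\dots,N .
```

The objective is quadratic in $w$ and the constraints are linear, which is what makes the problem convex.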

A plain SVM places high demands on the sample set. A combined learning method instead generates multiple base classifiers through separate learning rounds and assembles them according to a certain strategy. The result of the combined classifier depends on the individual base classifiers, and the classification error can be effectively reduced by combining the characteristics of the various base classifiers.

Boosting is a commonly used and effective statistical learning method. In classification, it improves performance by changing the weights of the training samples, learning multiple classifiers, and combining these classifiers linearly. Applied to SVM, it can improve the separation of the sample set. It changes the probability distribution of the training data and calls a weak learning algorithm on the resulting series of distributions to learn a series of classifiers.

2. Statement of the Problem

Because P2P platforms carry substantial risk, we focus on how to build a model that measures it. The following sections analyze the sources of P2P risk and the credit evaluation index system, and address P2P risk assessment so that investors can avoid bad P2P platforms.

2.1. P2P Risk Assessment and Credit Index System

P2P network loan platforms face many sources of risk, including risks within the platform itself and the risk of contagion between platforms. Under the influence of many risk factors, the development and growth of P2P platforms will be seriously constrained.

There is currently no recognized standard or qualification for rating net loan platforms; each rating agency considers different dimensions and standards, so a rating cannot fully reflect the level of a platform. Table 1 shows the evaluation index systems used by third-party rating agencies for loan platforms. For example, the 360 Big Data Research Institute focuses on the background strength of the platform; the Dagong international rating report focuses more on inspecting debtor solvency; the evaluation system of the Institute of Finance of the Academy of Social Sciences focuses on the platform’s level of risk control; and the comprehensive evaluation system of Net Loan Home sets indicators that are not aimed at security (Yu, 2015).

The evaluation of network lending (P2P) platforms can draw on commercial bank credit rating methods. Table 2 compares the rating systems for commercial banks. The main systems are the CAMELS rating system of the United States Federal Financial Institutions Examination Council and the joint-stock commercial bank risk rating system issued by the China Banking Regulatory Commission; rating agencies, represented by Moody’s, also have mature commercial bank rating systems. The rating systems adopted by regulators and by international and domestic agencies can be summarized into six factors, such as capital adequacy, asset quality and management level, as shown in Table 2. Summarizing the rating systems of commercial banks, especially those of the regulatory authorities, is of great significance for the research in this paper (Li, Liu, & Chen, 2015).

Table 1. Evaluation index system adopted by third-party rating agencies for net loan platforms.

Source: Net Loan Home; Yu Jiamin, Network Lending (P2P) Platform Quantitative Monitoring Research.

Table 2. Comparison of the rating systems of commercial banks.

Source: Qi Fei (2012); Yu Jiamin, Network Lending (P2P) Platform Quantitative Monitoring Research.

2.2. SVM Algorithm Improvement

The SVM described below serves as the base classifier for P2P network loan platforms. The advantage of this method is that the number of classifiers is small and the algorithm is simple (Ju, Wang, & Yao, 2012). But there are some drawbacks to this approach:

(1) The base classifier has to be trained on all samples, so training is slow.

(2) Rare classes are handled poorly.

To address these problems, we adopt sampling-based training. Training on a subset of the samples avoids having the base classifier repeatedly learn the entire sample set. Its advantages are as follows:

(1) The base classifier repeatedly learns only part of the training sample, so training is faster.

(2) Sampling-based training covers most of the sample data, which keeps the classifier from ignoring rare classes.

Therefore, P2P platform classification also uses a similar sampling-based training method, to avoid the imbalance in the training set caused by special platform data.

In this paper, AdaBoost is applied to SVM classification: the sample set of each base classifier is drawn from the original data set, and the improved AdaBoost-SVM classifier is obtained through multiple iterations.
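Drawing the sample set of a base classifier from the original data can be sketched as weighted sampling with replacement. The paper does not spell out its sampling rule, so this is only one plausible reading; the function name `sample_subset` and its parameters are illustrative, not taken from the source.

```python
import random


def sample_subset(xs, ys, p, size, seed=None):
    """Draw `size` (x, y) pairs with replacement, with probability proportional to p.

    Assumption: p holds the current sample weights, so samples with a
    larger weight are more likely to enter the base classifier's training set.
    """
    rng = random.Random(seed)
    idx = rng.choices(range(len(xs)), weights=p, k=size)
    return [xs[i] for i in idx], [ys[i] for i in idx]
```

Because samples are drawn in proportion to their weight, a rare class with boosted weights still shows up in the subset even when the subset is much smaller than the full data set.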

2.3. AdaBoost Algorithm Detail

Let $\{(x_i, y_i) \mid i = 1, 2, \ldots, N\}$ denote a set of $N$ training samples. In the AdaBoost algorithm, the weight given to a base classifier is closely related to its error rate. The initial sample weights are equal; in each subsequent iteration, AdaBoost calculates the error rate of the classifier on the training set, adjusts the weight of each sample, and corrects the probability distribution of the training set.

Algorithm: The inputs are a sequence of $N$ labeled instances $(x_1, y_1), \ldots, (x_N, y_N)$, a distribution $D$ over the $N$ instances, the algorithm for training the base classifier, and the number of iterations $T$ (Dong, Geng, & Zhou, 2007; Joshi et al., 2002; Ju, Wang, & Yao, 2012; Tan, Steinbach, & Kumar, 2006). After initialization, the remaining steps are repeated for $t = 1, \ldots, T$.

(1) Initialization: assign the same initial weight to each sample: $w_i^1 = \frac{1}{N}$, $i = 1, \ldots, N$;

(2) Adjust the distribution: $P^t = \dfrac{w^t}{\sum_{i=1}^{N} w_i^t}$;

(3) Pass the distribution $P^t$ to the base classifier training algorithm, which returns the prediction $h_t : x \to [0, 1]$;

(4) Calculate the prediction error rate: $\varepsilon_t = \sum_{i=1}^{N} P_i^t \, |h_t(x_i) - y_i|$;

(5) Calculate the importance of the base classifier: $\alpha_t = \dfrac{\varepsilon_t}{1 - \varepsilon_t}$;

(6) Calculate the new weight vector: $w_i^{t+1} = w_i^t \, \alpha_t^{\,1 - |h_t(x_i) - y_i|}$.
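The six steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: labels are assumed to be 0/1, the base learner is passed in as a hypothetical callback `weak_learn`, and a toy threshold stump stands in for the SVM base classifier. The final weighted vote with weights $\log(1/\alpha_t)$ follows the classical AdaBoost formulation and is not spelled out in the text; the clamping of $\varepsilon_t$ is a numerical guard added here.

```python
import math


def adaboost(xs, ys, weak_learn, rounds, init_weights=None):
    """Run steps (1)-(6) for `rounds` iterations; labels ys are 0/1.

    weak_learn(xs, ys, p) is a hypothetical callback returning a function
    h(x) with values in [0, 1], trained under the distribution p.
    """
    n = len(xs)
    # Step (1): equal initial weights unless the caller supplies others.
    w = list(init_weights) if init_weights is not None else [1.0 / n] * n
    hyps, alphas = [], []
    for _ in range(rounds):
        total = sum(w)
        p = [wi / total for wi in w]                    # step (2)
        h = weak_learn(xs, ys, p)                       # step (3)
        loss = [abs(h(x) - y) for x, y in zip(xs, ys)]
        eps = sum(pi * li for pi, li in zip(p, loss))   # step (4)
        eps = min(max(eps, 1e-10), 1.0 - 1e-10)         # guard added here
        alpha = eps / (1.0 - eps)                       # step (5)
        # Step (6): correctly classified samples (loss 0) are scaled by alpha.
        w = [wi * alpha ** (1.0 - li) for wi, li in zip(w, loss)]
        hyps.append(h)
        alphas.append(alpha)
    return hyps, alphas


def predict(hyps, alphas, x):
    """Weighted vote; each base classifier votes with weight log(1 / alpha_t)."""
    votes = [math.log(1.0 / a) for a in alphas]
    score = sum(v * h(x) for v, h in zip(votes, hyps))
    return 1 if score >= 0.5 * sum(votes) else 0


def stump_learner(xs, ys, p):
    """Toy weak learner on 1-D inputs (a stand-in, not an SVM)."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (0, 1):
            h = lambda x, thr=thr, sign=sign: sign if x >= thr else 1 - sign
            err = sum(pi * abs(h(x) - y) for x, y, pi in zip(xs, ys, p))
            if best is None or err < best[0]:
                best = (err, h)
    return best[1]
```

A call such as `adaboost(xs, ys, stump_learner, rounds=5)` returns the base classifiers and their $\alpha_t$ values. Since $\alpha_t < 1$ whenever $\varepsilon_t < 1/2$, step (6) shrinks the weights of correctly classified samples, so the next round concentrates on the misclassified ones.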

In addition, the IAdaBoost algorithm builds on the idea of AdaBoost. To keep the base classifier from ignoring the rare class, the initial weight of each sample is set according to the size of the class it belongs to, which yields a classifier that is balanced across classes (Chew, Crisp, & Bogner, 2000; Wang & Le, 2005).
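The text does not give the exact initialization formula. One common convention, shown here purely as an assumption, gives every class the same total weight and splits it evenly within the class, so each sample of a rare class starts with a larger weight. The name `balanced_initial_weights` is illustrative.

```python
from collections import Counter


def balanced_initial_weights(ys):
    """Assumed scheme: w_i = 1 / (K * n_c), with K classes and n_c samples in class c.

    Each class then carries total weight 1 / K, and the weights sum to 1.
    """
    counts = Counter(ys)
    k = len(counts)
    return [1.0 / (k * counts[y]) for y in ys]
```

With three normal platforms and one problem platform, for example, the single problem platform would start with weight 1/2 and each normal platform with weight 1/6, instead of 1/4 each under the uniform initialization.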

3. Outcome of Practice

The empirical data come from the Network Loan Home platform (http://www.wdzj.com/) and cover September 21, 2016 to February 21, 2017, a total of 6 months of P2P network loan platform data. Table 3 is the data classification table for the sample set. The data sets are all divided equally. Six months of data alone is not enough for training, but with sampling we re-established a sufficient sample set. The training set is the first five months of data, and the test set is the last month. Using the three algorithms SVM, AdaBoost-SVM and IAdaBoost-SVM in turn, we compare how accurately each classifies problem platforms (closed platforms are labeled 1 and normal platforms 0).

The results of IAdaBoost-SVM, SVM and AdaBoost-SVM are compared. The parameter σ of the base classifiers is fixed at 6, the penalty parameter C is 100, the dimension is 507, and the number of iterations for both the IAdaBoost algorithm and the AdaBoost algorithm is 5 (Li, Liu, & Chen, 2015).
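The source does not name the kernel; a width parameter σ suggests a Gaussian (RBF) kernel, which is assumed in the sketch below. The parameterization $\exp(-\lVert x - z\rVert^2 / (2\sigma^2))$ is the common textbook one and may differ from the one the authors used; `rbf_kernel` is an illustrative name.

```python
import math


def rbf_kernel(x, z, sigma=6.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 * sigma^2)); sigma=6 as reported above."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

Under this assumption a larger σ makes the kernel flatter, so distant platforms still look similar to the classifier, while the penalty parameter C controls how strongly training errors are punished.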

3.1. Correct Classification Chart

From the classification results in Figure 1 and Figure 2, we can see that for small samples the improved IAdaBoost algorithm achieves higher classification accuracy than AdaBoost and SVM. When the sample set exceeds 1300 samples, the combined learner performs even better, and in some cases the accuracy on the test set is about 90%. This allows P2P risk measurement and risk forecasting to be carried out correctly.

The figures also show that the improvement from the IAdaBoost algorithm is more pronounced on rare data sets.

3.2. Predictive Effect

As can be seen from Figure 3 and Figure 4, under normalized conditions, the simulated normal platforms and problem platforms can be roughly separated by a hyperplane (blue for normal platforms, red for simulated problem platforms). The combined learner can effectively keep the error rate within 5% during the learning process. The final base classifiers and their weights are shown in Table 4. Boost 1, Boost 2 and Boost 5 have higher weights, more than 20% each; the rest have lower weights.

Table 3. SVM, AdaBoost-SVM and IAdaBoost-SVM sample set (divided into 5 samples).

Source: Network Loan Home platform (http://www.wdzj.com/).

Figure 1. Classification of SVM, AdaBoost-SVM and IAdaBoost-SVM in the sample set. Source: Network Loan Home platform (http://www.wdzj.com/).

Figure 2. Classification of SVM, AdaBoost-SVM and IAdaBoost-SVM in the sample collection. Source: Network Loan Home platform (http://www.wdzj.com/).

Figure 3. Scatter plot and error rate of sample set A and test set A under the normalized AdaBoost algorithm. Source: Network Loan Home platform (http://www.wdzj.com/).

Figure 4. Scatter plot and error rate of sample set B and test set B under the normalized IAdaBoost algorithm. Source: Network Loan Home platform (http://www.wdzj.com/).

Table 4. Base classifier and its weight ratio.

Source: Network Loan Home platform (http://www.wdzj.com/).
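The paper does not state how the weight ratios of the base classifiers are computed. One plausible reading, shown here as an assumption, is that each classifier's vote weight $\log(1/\alpha_t)$ is divided by the sum of all vote weights. The function name and the input values in the usage note are hypothetical and are not the values of Table 4.

```python
import math


def vote_weight_ratios(alphas):
    """Normalize the vote weights log(1 / alpha_t) so that they sum to 1.

    Assumes every alpha_t lies strictly between 0 and 1, that is, every
    base classifier has an error rate below 1/2.
    """
    votes = [math.log(1.0 / a) for a in alphas]
    total = sum(votes)
    return [v / total for v in votes]
```

For instance, hypothetical values `alphas = [0.25, 0.5, 0.5]` give ratios of one half, one quarter and one quarter: the classifier with the lowest error rate gets the largest share of the vote.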

4. Conclusion

The IAdaBoost algorithm proposed in this paper not only reduces the training sample, narrows the training range and handles unbalanced sample categories, but also removes some of the noisy data and selects reliable sample points for training. In addition, the initialization of the improved algorithm raises the weight of rare samples, which helps classify them correctly. Applied to risk assessment of P2P network loan platforms, it can effectively screen out problem platforms and thereby support risk management. The AdaBoost-SVM model also has shortcomings. The sample sets and training data should be more detailed, and there is still room to improve the sampling method. In addition, the initial weights of the algorithm could be preprocessed to speed up the model’s risk calculation.

Cite this paper

Yang, J. H., & Luo, D. S. (2017). The P2P Risk Assessment Model Based on the Improved AdaBoost-SVM Algorithm. Journal of Financial Risk Management, 6, 201-209. https://doi.org/10.4236/jfrm.2017.62015

References

1. Chew, H.-G., Crisp, D. J., Bogner, R. E. et al. (2000). Target Detection in Radar Imagery Using Support Vector Machines with Training Size Biasing. In Proceedings of the Sixth International Conference on Control, Automation, Robotics and Vision, Singapore.

2. Dong, L. H., Geng, G. H., & Zhou, M. Q. (2007). Design of Text Automatic Classifier Based on Boosting Algorithm. Computer Applications, No. 2, 384-386.

3. Joshi, M. V., Agarwal, R. C., & Kumar, V. (2002). Predict Rare Classes: Can Boosting Make Any Weak Learner Strong? In Proceedings of the Eighth ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD2002), Edmonton, Canada.

4. Ju, X., Wang, H., & Yao, H. L. (2012). Combinatorial Classifier Based on Boosting Support Vector Machine. Journal of Hefei University of Technology, No. 10, 1220-1222.

5. Li, Y. J., Liu, X. X., & Chen, P. (2015). Improved AdaBoost Algorithm and SVM Combinatorial Classifier. Computer Engineering and Applications, 44-32.

6. Liu, Z. T. (2013). China’s P2P Network Credit Risk Assessment. Nanning: Guangxi University.

7. Luo, C. Y. (2012). P2P Network Lending in the Investment Decision Model. Dalian: Dalian University of Technology.

8. Tan, P.-N., Steinbach, M., & Kumar, V. (2006). Introduction to Data Mining. Posts & Telecom Press.

9. Wang, F. (2016). China’s P2P Network Lending Platform Risk Regulation and Prevention. China Circulation Economy, 121-127.

10. Wang, Y. Z., & Le, S. B. (2005). Multiboot-Based Minimum Classification Error Algorithm. Small Microcomputers, No. 11, 1948-1950.

11. Ye, Q., Li, Z. Q., & Xu, W. H. (2016). Study on Risk Identification of P2P Network Borrowing Platform. Accounting Research, 38-45.

12. Yu, J. M. (2015). Network Lending (P2P) Platform Quantitative Monitoring Research. Guangzhou: South China University of Technology.