With improved machine learning models, studies on bankruptcy prediction show increasing accuracy. This paper applies three relatively recently developed methods (support vector machine, neural network with dropout, and autoencoder) to bankruptcy prediction on real-life data. The results show that, among the three, the neural network with added layers and dropout achieves the highest accuracy, and a comparison with the earlier methods (logistic regression, genetic algorithm, inductive learning) shows higher accuracy overall.

Machine learning is a subfield of computer science. It allows computers to build analytical models of data and find hidden insights automatically, without being explicitly programmed. It has been applied to many aspects of modern society, ranging from DNA sequence classification and credit card fraud detection to robot locomotion and natural language processing. Many of the tasks it solves are classification problems, and bankruptcy prediction is a typical example.

Machine learning was born from pattern recognition. Earlier work on this topic (machine learning for bankruptcy prediction) uses models including logistic regression, genetic algorithms, and inductive learning.

Logistic regression is a statistical method that allows researchers to build a predictive function from a sample. The model is best used for understanding how several independent variables influence a single outcome variable.

The genetic algorithm is based on natural selection and evolution. It can be used to extract rules in propositional and first-order logic, and to choose appropriate sets of if-then rules for complicated classification problems.

Inductive learning's main category is the decision tree algorithm. It identifies patterns in training data or earlier knowledge and then extracts generalized rules, which are then used in problem solving.

To see whether the accuracy of bankruptcy prediction can be further improved, we propose three newer models: support vector machine (SVM), neural network with dropout, and autoencoder.

The support vector machine is a supervised learning method that is especially effective in high-dimensional cases and is memory-efficient because it uses only a subset of the training points in the decision function. It also allows different kernel functions to be specified for the decision function.

Neural networks, unlike conventional computers, are expressive models that learn from examples. They contain multiple hidden layers and are therefore capable of learning very complicated relationships between inputs and outputs, and they operate significantly faster than conventional techniques. However, with limited training data, overfitting degrades the ultimate accuracy. To prevent this, a technique called dropout, which temporarily and randomly removes units (hidden and visible), is applied to the neural network.

The autoencoder, also known as the Diabolo network, is an unsupervised learning algorithm that sets the target values equal to the inputs. By doing this, it learns a compressed representation of the input, which improves accuracy and reduces the amount of training data required to learn these functions.

This paper is structured as follows. Section 2 describes the motivation for this idea. Section 3 describes relevant previous work. Section 4 formally describes the three models. In Section 5 we present our experimental results, including a parallel comparison among the three chosen models and a longitudinal comparison with the three older models. Section 6 concludes. Section 7 lists the references.

The three models we chose (SVM, neural network, autoencoder) are relatively new but have already been applied in many fields.

SVM has been used successfully in many real-world problems, such as text categorization, object tracking, and bioinformatics (protein classification, cancer classification). Text categorization is especially helpful in daily life: web searching and email filtering provide great convenience and work efficiency.

Neural networks learn from examples instead of algorithms, so they have been widely applied to problems where algorithmic methods are hard or impossible to apply.

Autoencoders are especially successful at difficult tasks like natural language processing (NLP). They have been used to solve previously seemingly intractable problems in NLP, including word embeddings, machine translation, document clustering, sentiment analysis, and paraphrase detection.

However, applications of the three models in economics or finance are comparatively hard to find. So we aim to find out whether they still work well in this field by running them on real-life data in a bankruptcy prediction task.

Another motivation is to find out whether the accuracy on this particular problem (bankruptcy prediction) can be improved over previous work, such as the discovery of experts' decision rules from qualitative bankruptcy data using genetic algorithms.

Machine learning enables computers to find insights in data automatically. The idea of using machine learning to predict bankruptcy has previously been explored, for example in Predicting Bankruptcy with Robust Logistic Regression by Richard P. Hauser and David Booth.

Another work, The Discovery of Experts' Decision Rules from Qualitative Bankruptcy Data Using Genetic Algorithms (2003) by Myoung-Jong Kim and Ingoo Han, uses the same dataset as we do. They apply older models: inductive learning algorithms (decision trees), genetic algorithms, and neural networks without dropout. Since the length of genomes in a GA is fixed, a given problem cannot always be easily encoded, and a GA gives no guarantee of finding the global optimum. The problem with inductive learning is its one-step-ahead node splitting without backtracking, which may generate a suboptimal tree. Decision trees can also be unstable, because small variations in the data may result in a completely different tree being generated.

The models we chose either contain a newly developed technique, like dropout, or are completely new models that have hardly been utilized in bankruptcy prediction.

This section describes the proposed three models.

Specifically, we use support vector classification (SVC), a subcategory of SVM, for this task. It constructs a separating hyper-plane, as shown in the figure.

Given training vectors $x_i \in \mathbb{R}^p$, $i = 1, \dots, n$, in two classes, and a label vector $y \in \{1, -1\}^n$, SVM solves the following problem:

$$\min_{\omega, b, \zeta} \; \frac{1}{2} \omega^{T} \omega + C \sum_{i=1}^{n} \zeta_i$$

subject to

$$y_i \left( \omega^{T} \phi(x_i) + b \right) \ge 1 - \zeta_i, \quad \zeta_i \ge 0, \; i = 1, \dots, n$$

Its dual is

$$\min_{\alpha} \; \frac{1}{2} \alpha^{T} Q \alpha - e^{T} \alpha$$

subject to

$$y^{T} \alpha = 0, \quad 0 \le \alpha_i \le C, \; i = 1, \dots, n$$

where $e$ is the vector of all ones, $C > 0$ is the upper bound, and $Q$ is an $n \times n$ positive semidefinite matrix with $Q_{ij} \equiv y_i y_j K(x_i, x_j)$, where $K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j)$ is the kernel.

Here the function $\phi$ implicitly maps the training vectors into a higher-dimensional space.

The decision function is:

$$\operatorname{sgn} \left( \sum_{i=1}^{n} y_i \alpha_i K(x_i, x) + \rho \right)$$
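As an illustration (not the paper's original code), the SVC formulation above can be exercised with scikit-learn on synthetic two-class data; the RBF kernel, C = 1.0, and the Gaussian-blob data are assumptions of this sketch:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs stand in for the classes y = -1, +1.
X = np.vstack([rng.normal(0.0, 1.0, (50, 6)), rng.normal(3.0, 1.0, (50, 6))])
y = np.array([-1] * 50 + [1] * 50)

# SVC solves the primal/dual optimization problem shown above.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

# decision_function returns the signed expression inside sgn(...);
# predict applies the sign to obtain the class label.
scores = clf.decision_function(X)
labels = clf.predict(X)
```

Only the support vectors (`clf.support_vectors_`) enter the decision function, which is what makes the model memory-efficient.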

A neural network is modelled as layers of interconnected neurons. Its structure is shown in the following figure.

As shown in the figure, each neuron computes a weighted sum of its inputs,

$$\xi = \sum_{i=1}^{n} w_i x_i,$$

and passes it through an activation function $f$. When the excitation level $\xi$ reaches the threshold $h$, the output $y$ (state) of the neuron is induced. This simulates the electric impulse generated by an axon.
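The threshold behaviour of a single neuron can be sketched in a few lines (a toy illustration; the weights and threshold here are made up):

```python
import numpy as np

def neuron(x, w, h):
    """Fire (return 1) when the excitation level xi = sum_i w_i * x_i
    reaches the threshold h; otherwise stay silent (return 0)."""
    xi = float(np.dot(w, x))
    return 1 if xi >= h else 0

w = np.array([0.5, -0.2, 0.8])
print(neuron(np.array([1.0, 1.0, 1.0]), w, h=1.0))  # xi = 1.1 >= 1.0, prints 1
print(neuron(np.array([1.0, 1.0, 0.0]), w, h=1.0))  # xi = 0.3 <  1.0, prints 0
```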

Dropout is a technique that further improves a neural network's accuracy. Without dropout, the feed-forward operation of a standard network is:

$$z^{(l+1)} = w^{(l+1)} y^{(l)} + b^{(l+1)},$$

$$y^{(l+1)} = f(z^{(l+1)}),$$

where $f$ is any activation function.

With dropout, the feed-forward operation becomes:

$$r_j^{(l)} \sim \operatorname{Bernoulli}(p),$$

$$\tilde{y}^{(l)} = r^{(l)} \odot y^{(l)},$$

$$z^{(l+1)} = w^{(l+1)} \tilde{y}^{(l)} + b^{(l+1)},$$

$$y^{(l+1)} = f(z^{(l+1)}),$$

where $\odot$ denotes the element-wise product and $p$ is the retention probability.
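A minimal NumPy sketch of the dropout feed-forward step (the shapes and the tanh activation are assumptions; note that p here is the retention probability, whereas the experimental tables quote the drop rate, i.e. 1 − p):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(y_l, W, b, p=0.5, f=np.tanh, train=True):
    """One dropout layer: thin the previous activations with Bernoulli(p)
    masks at training time; scale by p at test time."""
    if train:
        r = rng.binomial(1, p, size=y_l.shape)  # r_j ~ Bernoulli(p)
        y_thin = r * y_l                         # element-wise mask
    else:
        y_thin = p * y_l                         # expected-value scaling
    z = W @ y_thin + b                           # pre-activation
    return f(z)                                  # next layer's output

y_l = np.ones(4)
W, b = np.ones((3, 4)), np.zeros(3)
out = dropout_forward(y_l, W, b, p=0.5)
```

At test time no units are dropped; multiplying by p keeps the expected input to each unit the same as during training.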

Consider an n/p/n autoencoder.

It is defined by a class $\mathcal{A}$ of functions from $G^p$ to $F^n$ and a class $\mathcal{B}$ of functions from $F^n$ to $G^p$.

Define $X = \{x_1, \dots, x_m\}$ as a set of training vectors in $F^n$. When there are external targets, let $Y = \{y_1, \dots, y_m\}$ denote the corresponding set of target vectors in $F^n$. Finally, $\Delta$ is a distortion function (e.g., an $L_p$ norm or the Hamming distance) defined over $F^n$.

For any $A \in \mathcal{A}$ and $B \in \mathcal{B}$, the input vector $x \in F^n$ becomes the output vector $A \circ B(x) \in F^n$ through the autoencoder. The goal is to find $A \in \mathcal{A}$ and $B \in \mathcal{B}$ that minimize the overall distortion function:

$$\min_{A, B} E(A, B) = \min_{A, B} \sum_{t=1}^{m} E(x_t) = \min_{A, B} \sum_{t=1}^{m} \Delta \big( A \circ B(x_t), \, x_t \big)$$
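To make the objective concrete, here is a toy linear n/p/n autoencoder trained by gradient descent on the squared-error distortion (the sizes, learning rate, and linear choice of A and B are illustrative assumptions, not the paper's configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 6, 2, 100
X = rng.normal(size=(m, n))             # training vectors x_t in R^n

B = rng.normal(scale=0.1, size=(p, n))  # encoder B: R^n -> R^p
A = rng.normal(scale=0.1, size=(n, p))  # decoder A: R^p -> R^n

lr = 0.01
for _ in range(500):
    H = X @ B.T                          # codes B(x_t)
    R = H @ A.T                          # reconstructions A(B(x_t))
    E = R - X                            # residuals
    gA = (E.T @ H) / m                   # gradient of mean squared distortion w.r.t. A
    gB = ((E @ A).T @ X) / m             # gradient w.r.t. B
    A -= lr * gA
    B -= lr * gB
```

After training, the mean squared distortion `np.mean((X @ B.T @ A.T - X) ** 2)` is below its starting value, i.e. the learned pair (A, B) has reduced the objective above.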

Given training vectors $x_i \in \mathbb{R}^n$, $i = 1, \dots, l$, and a label vector $y \in \mathbb{R}^l$, a decision tree recursively partitions the samples so that those with the same labels are grouped together.

Let $Q$ represent the data at node $m$. For each candidate split $\theta = (j, t_m)$, consisting of a feature $j$ and a threshold $t_m$, the tree partitions the data into the subsets

$$Q_{\text{left}}(\theta) = \{ (x, y) \mid x_j \le t_m \}, \qquad Q_{\text{right}}(\theta) = Q \setminus Q_{\text{left}}(\theta)$$

An impurity function $H(\cdot)$, the choice of which depends on the task being solved (classification or regression), is used to calculate the impurity at $m$:

$$G(Q, \theta) = \frac{n_{\text{left}}}{N_m} H \big( Q_{\text{left}}(\theta) \big) + \frac{n_{\text{right}}}{N_m} H \big( Q_{\text{right}}(\theta) \big)$$

Choose the parameters that minimize the impurity:

$$\theta^{*} = \operatorname*{arg\,min}_{\theta} G(Q, \theta)$$

Then recurse on the subsets $Q_{\text{left}}(\theta^{*})$ and $Q_{\text{right}}(\theta^{*})$ until the maximum allowable depth is reached, $N_m < \min_{\text{samples}}$, or $N_m = 1$.
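This recursive impurity-minimizing split is what off-the-shelf decision tree implementations perform; a sketch with scikit-learn's DecisionTreeClassifier (the synthetic data and the Gini criterion are assumptions for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = (X[:, 0] > 0).astype(int)   # label determined by one feature's threshold

# H(.) is the Gini impurity here; each node greedily picks theta = (j, t_m)
# minimizing G(Q, theta), then recurses on the two resulting subsets.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)
print(tree.score(X, y))          # 1.0: the split on x_0 recovers the labels
```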

The data we used is summarized in the table below: 250 instances of qualitative bankruptcy data with six features, split into training, test, and validation sets. The subsequent tables report the accuracy of each model variant.

Data set | Dimensionality | Instances | Training Set | Test Set | Validation
---|---|---|---|---|---
Bankruptcy | 6 | 250 | 80% | 10% | 10%
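The 80%/10%/10% split in the table can be reproduced along these lines (the random seed and the stand-in arrays are assumptions; the real qualitative dataset has 250 instances and 6 features):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(250, 6))   # stand-in for the 250 x 6 qualitative data
y = rng.integers(0, 2, size=250)        # stand-in bankruptcy labels

idx = rng.permutation(250)
train, test, val = idx[:200], idx[200:225], idx[225:]   # 80% / 10% / 10%
X_train, y_train = X[train], y[train]
print(len(train), len(test), len(val))  # prints: 200 25 25
```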

variation | accuracy
---|---
truncate = 50 | 0.9899
truncate = 100 | 0.9933

variation | accuracy | loss
---|---|---
without dropout | 0.9867 | 0.0462
with dropout (dropout rate = 0.1) | 0.9867 | 0.0292
with dropout (dropout rate = 0.3) | 0.9933 | 0.0300
with dropout (dropout rate = 0.4) | 0.9933 | 0.0401
with dropout (dropout rate = 0.5) | 0.9933 | 0.0278
with dropout (dropout rate = 0.7) | 0.9933 | 0.0428
with dropout (dropout rate = 0.8) | 0.9867 | 0.0318

As shown in the tables below, adding hidden layers with dropout and increasing the truncation parameter further improve the neural network's accuracy, while the autoencoder performs best when paired with a decision tree.

Support vector machine, neural network with dropout, and autoencoder are three relatively new models applied here to the bankruptcy prediction problem. Their accuracies outperform those of the three older models (robust logistic regression, inductive learning algorithms, genetic algorithms). The improved aspects include control of overfitting, an improved probability of finding the global optimum, and the ability to handle large feature spaces. This paper compared and summarized the progress of machine learning models on bankruptcy prediction, and examined the performance of relatively new models that have rarely been applied in that field.

However, the three models also have drawbacks. SVM does not directly provide probability estimates; these must be calculated using an expensive five-fold cross-validation.

variation | accuracy | loss
---|---|---
two layers with dropout (dropout rate = 0.5) | 0.9933 | 0.0278
three layers (added layer with dense 200) with dropout (dropout rate = 0.5) | 0.9933 | 0.0221
four layers (added layer with dense 16) with dropout (dropout rate = 0.5) | 1.0000 | 0.0004

variation | accuracy | loss
---|---|---
truncate = 50, four layers (added layers with dense 16, 200), dropout rate = 0.5 | 0.9950 | 0.0389
truncate = 100, four layers (added layers with dense 16, 200), dropout rate = 0.5 | 1.0000 | 0.0004

variation | accuracy
---|---
with SVM | 0.9867
with decision tree | 0.9933

model | accuracy
---|---
robust logistic regression | 0.6944
inductive learning algorithms (decision tree) | 0.897
genetic algorithms | 0.94
neural networks without dropout | 0.903
SVM (truncate = 100) | 0.9933
neural network (four layers, dropout rate = 0.5, truncate = 100) | 1.0000
autoencoder (with decision tree) | 0.9933

Also, if the data sample is not big enough, especially when it is outnumbered by the number of features, SVM is likely to give bad performance. The autoencoder, in turn, may capture little of value when the most relevant information makes up only a small percentage of the input. Solutions to overcome these drawbacks are yet to be found.

Wang, N.X. (2017) Bankruptcy Prediction Using Machine Learning. Journal of Mathematical Finance, 7, 908-918. https://doi.org/10.4236/jmf.2017.74049