In the present digital era, data science techniques exploit artificial intelligence (AI) to help the people who start and run small and medium-sized enterprises (SMEs) make an impact and develop their businesses. Financial data science integrates the conventions of econometrics with the technological elements of data science. It uses machine learning (ML) and predictive and prescriptive analytics to understand financial data and solve related problems effectively. Smart technologies allow SMEs to make their processes smarter and their operations more efficient. At the same time, an effective tool is needed to help SMEs forecast business failure and financial crises. AI has become a familiar tool for many businesses because it focuses on designing intelligent decision-making tools that solve particular real-time problems. With this motivation, this paper presents a new AI-based optimal functional link neural network (FLNN) financial crisis prediction (FCP) model for SMEs. The proposed model involves preprocessing, feature selection, classification, and parameter tuning. In the initial stage, the financial data of the enterprises are collected and preprocessed to enhance data quality. Then, a novel chaotic grasshopper optimization algorithm (CGOA) based feature selection technique is applied to select the optimal features. Next, the FLNN model is employed to classify the feature-reduced data. Finally, the efficiency of the FLNN model is improved using the cat swarm optimizer (CSO) algorithm. A detailed experimental validation on the Polish dataset is carried out to assess the performance of the presented model. 
The experimental studies demonstrated that the CGOA-FLNN-CSO model achieved maximum prediction accuracies of 98.830%, 92.100%, and 95.220% on Years I-III of the Polish dataset, respectively.
Financial data science is the application of data science approaches to solve problems related to finance. Data science encompasses skills from computer science, mathematics, statistics, information visualization, graphic design, complex systems, communication, and business. It relies on scientific techniques and methods to extract useful patterns from both structured and unstructured data, and it typically includes predictive modeling, clustering, data wrangling, visualization, and dimensionality reduction. Meanwhile, small and medium-sized enterprises (SMEs) make up a major portion of the global economy [
For SMEs, owing to the lack of reliable data and other factors, assessing credit risk is complex and expensive for banks. Although banks use long-term relationships to collect soft data over time and so cope with scarce credit data, SMEs frequently face higher costs when accessing finance because of data opacity and a higher bankruptcy risk [
Financial crisis prediction (FCP) for SMEs is of major importance in making financial decisions. The financial state of a company, whether small or large, concerns its shareholders, the local community, and its employees, and it also affects the global economy and policymakers. Hence, the high social and financial costs of business bankruptcy have motivated many researchers to better understand the causes of bankruptcy and, ultimately, to detect business distress. Financial institutions and organizations have paid close attention to predicting the financial failure of firms. FCP is a highly significant application that helps a financial firm make optimal decisions, because poor decisions across an organization can lead to financial distress or bankruptcy, affecting clients, shareholders, vendors, and others. Recent growth in information technology (IT) makes it possible to collect diverse data on a firm's risk level under many conditions. When evaluating such large volumes of data, many clients rely on analysts' decisions. However, several factors affect the efficiency of these analyses.
AI and statistical methods have been employed to identify the essential factors of FCP. AI models are now used in different categories [
On the other hand, feature selection (FS) is regarded as a significant preprocessing stage in data mining (DM). It is mainly intended to remove redundant and irrelevant features from the original data. Various mathematical models and estimation methods are known to be employed for FCP. Depending on the evaluation criterion, FS methods are categorized as filter-based, wrapper, and embedded methods. Although wrapper methods are widely used, they face several challenges, such as dependence on the learner and high computational complexity. Embedded methods are more complex than wrapper methods because the feature subset is selected within the learning algorithm itself. Because of these limitations, filter methods are also employed; they score feature subsets with fixed criteria computed from the data rather than with a learner. Detecting an optimal feature subset among the available features is an NP-hard problem [
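For a sense of scale (a worked count, not a figure reported in the paper): with the 64 attributes of the Polish dataset used in the experiments, an exhaustive search would have to score every non-empty feature subset, which motivates heuristic search such as the CGOA used here:

$$
2^{64} - 1 = 18\,446\,744\,073\,709\,551\,615 \approx 1.84 \times 10^{19}
$$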
This paper presents a new AI-based optimal functional link neural network (FLNN) FCP model for SMEs. The proposed model involves preprocessing, feature selection, classification, and parameter tuning. First, the financial data of the SMEs are gathered and preprocessed to improve data quality. Then, a novel chaotic grasshopper optimization algorithm (CGOA) based feature selection technique is applied to select the optimal features. Next, the FLNN model is employed to classify the feature-reduced data, and its efficiency is improved using the cat swarm optimizer (CSO) algorithm. A detailed experimental validation on the Polish dataset is carried out to assess the performance of the presented model. In short, the paper's contributions can be summarized as follows.
Propose a CGOA-FLNN-CSO technique to predict the financial status of SMEs, integrating preprocessing, CGOA-based feature selection, FLNN-based classification, and CSO-based parameter tuning. Design a new CGOA technique that selects an optimal set of features to reduce computational complexity and enhance FCP performance. Derive a new FLNN-CSO technique for classification, in which the CSO algorithm optimizes the hyperparameters to boost predictive performance. Validate the predictive performance of the proposed technique on the Polish dataset and compare the results with recent state-of-the-art methods.
The remaining sections of the paper are organized as follows. Section 2 reviews existing work related to FCP. Section 3 elaborates on the proposed model, and Section 4 offers a detailed performance validation. Lastly, Section 5 concludes the study.
This section offers a comprehensive review of existing FCP models in the literature. Gregova et al. [
In Lu et al. [
Perboli et al. [
GOA was initially proposed in [
Based on the aforementioned description of grasshopper behavior, three operators govern the position update of each individual in the swarm: the social interaction operator ($S_i$), the gravity force operator ($G_i$), and the wind advection operator ($A_i$), as shown in
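For reference, the standard GOA formulation of these three operators is sketched below. The symbols $N$ (swarm size), $f$ (attraction intensity), $l$ (attractive length scale), $g$ (gravitational constant), $u$ (drift constant), and the unit vectors $\hat{e}_g$ (toward the center of the earth) and $\hat{e}_w$ (wind direction) follow that standard formulation and are assumptions with respect to this paper's own notation:

$$
X_i = S_i + G_i + A_i
$$

$$
S_i = \sum_{\substack{j=1 \\ j \neq i}}^{N} s(d_{ij})\,\hat{d}_{ij}, \qquad d_{ij} = \lVert x_j - x_i \rVert, \qquad \hat{d}_{ij} = \frac{x_j - x_i}{d_{ij}}
$$

$$
s(r) = f\,e^{-r/l} - e^{-r}, \qquad G_i = -g\,\hat{e}_g, \qquad A_i = u\,\hat{e}_w
$$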
However, the gravity operator is not considered, and the wind direction is assumed to always point toward the target. Later, the
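The modified update, which ignores gravity and replaces wind advection with the best solution found so far, can be sketched as follows. This is a minimal NumPy sketch assuming the standard GOA form $X_i^d = c\left(\sum_{j \neq i} c\,\frac{ub_d - lb_d}{2}\, s(|x_j^d - x_i^d|)\,\frac{x_j - x_i}{d_{ij}}\right) + \hat{T}_d$ with a linearly decreasing coefficient $c$. The helper names (`s_func`, `comfort_coefficient`, `goa_step`) are hypothetical, and the constants ($f = 0.5$, $l = 1.5$, $c_{max} = 1$, $c_{min} = 10^{-5}$) are commonly used defaults, not values reported in the paper.

```python
import numpy as np

def s_func(r, f=0.5, l=1.5):
    # Social force: repulsion at short range, attraction at medium range
    return f * np.exp(-r / l) - np.exp(-r)

def comfort_coefficient(t, max_iter, c_max=1.0, c_min=1e-5):
    # c shrinks linearly, shifting the swarm from exploration to exploitation
    return c_max - t * (c_max - c_min) / max_iter

def goa_step(X, target, lb, ub, t, max_iter):
    """One position update of all grasshoppers (gravity ignored, wind -> target)."""
    n, dim = X.shape
    c = comfort_coefficient(t, max_iter)
    X_new = np.empty_like(X)
    for i in range(n):
        social = np.zeros(dim)
        for j in range(n):
            if j == i:
                continue
            diff = X[j] - X[i]
            d = np.linalg.norm(diff) + 1e-12  # Euclidean distance d_ij, guarded against 0
            # Inner c scales the comfort zone; s() is applied per dimension
            social += c * (ub - lb) / 2.0 * s_func(np.abs(diff)) * diff / d
        # Outer c damps the social term; the best solution so far acts as the target
        X_new[i] = np.clip(c * social + target, lb, ub)
    return X_new
```

Some reference implementations additionally rescale distances before applying $s$; that detail is omitted here to keep the sketch close to the equation.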
The generation of the initial population in the search space plays a major role in GOA. The surveyed literature shows that several chaos-based GOA variants have been investigated for solving global optimization problems. In this study, chaotic maps are introduced into the initialization of the GOA optimization procedure to accelerate its global convergence. The chaotic map is used to balance exploitation and exploration and to reduce the repulsion and attraction forces among grasshoppers during optimization [
In this process, a large number of periodic components would be located in the thinner
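Since the specific chaotic map is not recoverable here, the sketch below assumes the logistic map $x_{k+1} = 4x_k(1 - x_k)$, a common choice for chaotic initialization, and maps each chaotic value from $[0, 1]$ into the search bounds. The function name `logistic_map_population` and the seeding scheme are hypothetical; the paper's map may differ.

```python
import numpy as np

def logistic_map_population(n, lb, ub, x0=None, seed=0):
    """Initial population generated by iterating the logistic map (r = 4)."""
    lb = np.asarray(lb, dtype=float)
    ub = np.asarray(ub, dtype=float)
    rng = np.random.default_rng(seed)
    # Start away from 0 and 1, which are degenerate points of the map
    x = rng.uniform(0.01, 0.99, size=lb.size) if x0 is None else np.asarray(x0, dtype=float)
    pop = np.empty((n, lb.size))
    for k in range(n):
        x = 4.0 * x * (1.0 - x)       # one chaotic iteration per individual
        pop[k] = lb + x * (ub - lb)   # map the chaotic value into [lb, ub]
    return pop
```

For feature selection, the bounds would typically be $[0, 1]$ per feature, so each row can later be thresholded into a feature mask.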
An important aspect of the CGOA-FS method is assessing the quality of the chosen feature subset. Since CGOA-FS is a wrapper-based technique, a learning method (i.e., a classifier) must be included in the evaluation procedure. In this study, an FLNN classifier is used as the evaluator, and the classification accuracy obtained with the chosen features is included in the fitness function. If the features in a subset are appropriate, the classification accuracy improves. Achieving high classification accuracy is the main aim of the CGOA-FS technique; another significant aim is reducing the number of selected features, since, for the same accuracy, a solution with fewer features is better. The CGOA-FS technique considers both of these conflicting aims.
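A common way to combine the two conflicting aims is a weighted sum of the classifier error and the fraction of selected features. The sketch below assumes that form, a weight $\alpha = 0.99$, and a 0.5 threshold for turning a continuous grasshopper position into a feature mask; these are illustrative assumptions, not the paper's stated settings. The `error_rate_fn` callable stands in for the FLNN evaluator.

```python
import numpy as np

def binarize(position, threshold=0.5):
    # Continuous position -> feature mask (1 = feature kept)
    return (np.asarray(position) > threshold).astype(int)

def fs_fitness(mask, error_rate_fn, alpha=0.99):
    """Weighted sum of classifier error and selected-feature ratio (lower is better)."""
    mask = np.asarray(mask)
    n_selected = int(mask.sum())
    if n_selected == 0:
        return float("inf")  # an empty subset cannot be evaluated by a classifier
    error = error_rate_fn(mask)  # e.g. 1 - accuracy of the wrapped classifier
    return alpha * error + (1.0 - alpha) * n_selected / mask.size
```

With equal classifier error, the smaller subset receives the lower (better) fitness, which is how the second aim enters the search.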
An ANN is a network of connected components inspired by the study of biological nervous systems. It is an attempt to build a machine that works in the same way as the human brain. The artificial units that mimic biological neurons are called nodes or neurons, and they are the components of the neural network (NN). ANNs can solve difficult nonlinear problems and can classify patterns by assigning each pattern to one group or another. Among the many ANN variants, the multilayer perceptron (MLP) is the best known. It consists of input, hidden, and output layers that are interconnected with each other.
FLNN is a class of feedforward neural network (FFNN) without a hidden layer. It introduces nonlinearity into its input structure through a functional expansion unit [
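A minimal FLNN sketch follows, assuming a trigonometric functional expansion (each input $x$ becomes $x$, $\sin(k\pi x)$, $\cos(k\pi x)$ for $k = 1, \dots$, `order`), a single sigmoid output unit, and plain gradient descent. The expansion type, its order, and the training rule are assumptions; in the proposed model the FLNN parameters are tuned with CSO instead. The class name `FLNN` and its arguments are hypothetical.

```python
import numpy as np

def trig_expand(X, order=2):
    """Expand each input x into [x, sin(k*pi*x), cos(k*pi*x)] for k = 1..order."""
    X = np.asarray(X, dtype=float)
    feats = [X]
    for k in range(1, order + 1):
        feats.append(np.sin(k * np.pi * X))
        feats.append(np.cos(k * np.pi * X))
    return np.hstack(feats)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class FLNN:
    """Functional link network: expanded inputs feed one output unit (no hidden layer)."""

    def __init__(self, order=2, lr=0.5, epochs=500):
        self.order, self.lr, self.epochs = order, lr, epochs

    def fit(self, X, y):
        phi = trig_expand(X, self.order)
        y = np.asarray(y, dtype=float)
        self.w = np.zeros(phi.shape[1])
        self.b = 0.0
        for _ in range(self.epochs):
            err = sigmoid(phi @ self.w + self.b) - y  # cross-entropy gradient w.r.t. pre-activation
            self.w -= self.lr * phi.T @ err / len(y)
            self.b -= self.lr * err.mean()
        return self

    def predict_proba(self, X):
        return sigmoid(trig_expand(X, self.order) @ self.w + self.b)

    def predict(self, X):
        return (self.predict_proba(X) >= 0.5).astype(int)
```

The expansion is what lets a network without hidden layers separate classes such as $|x| < 0.5$ on $[-1, 1]$, which no single linear threshold on $x$ can do.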
To properly tune the parameters of the FLNN model, the CSO algorithm is applied, thereby enhancing predictive performance. The CSO technique simulates the resting and tracing behaviors of cats. Cats appear lazy and spend most of their time resting; however, they constantly and deliberately monitor their surroundings, and when they notice a target, they move toward it rapidly. The CSO technique is therefore modeled on these two important behaviors and has two modes: seeking and tracing. Each cat represents a solution with its own position, fitness value, and flag. The position is composed of
The technique proceeds through the following steps to search for better solutions:
(1) Identify the upper and lower bounds of the solution set.
(2) Randomly create the cats (the initial solutions).
(3) Randomly classify the cats into seeking and tracing modes based on the mixture ratio (MR), which is selected from a predefined interval.
(4) Evaluate the fitness value of every cat using the domain-specific fitness function (FF). Then, select the best cat and store it in memory.
(5) Move the cats into either seeking or tracing mode.
(6) After the cats have undergone seeking or tracing mode, randomly reassign them to seeking or tracing mode for the next iteration according to MR.
(7) Check the termination criterion; if it is satisfied, end the program; otherwise, repeat Steps (4) to (6).
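The bookkeeping steps above (initialization, MR-based mode assignment, and remembering the best cat) can be sketched as follows. This assumes the usual CSO convention that MR is the fraction of cats placed in tracing mode and that the fitness is minimized, as with an error-rate FF; the function names are hypothetical.

```python
import numpy as np

def init_cats(n_cats, lb, ub, v_max, rng):
    """Steps (1)-(2): random positions within [lb, ub] and random velocities."""
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    pos = rng.uniform(lb, ub, size=(n_cats, lb.size))
    vel = rng.uniform(-v_max, v_max, size=(n_cats, lb.size))
    return pos, vel

def assign_modes(n_cats, mr, rng):
    """Steps (3) and (6): flag round(MR * n_cats) randomly chosen cats for tracing."""
    tracing = np.zeros(n_cats, dtype=bool)
    tracing[rng.permutation(n_cats)[: int(round(mr * n_cats))]] = True
    return tracing  # True = tracing mode, False = seeking mode

def remember_best(pos, fitness, memory=None):
    """Step (4): keep the best cat seen so far (minimization)."""
    k = int(np.argmin(fitness))
    if memory is None or fitness[k] < memory[1]:
        return pos[k].copy(), float(fitness[k])
    return memory
```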
Seeking Mode. This mode models the resting behavior of cats and relies on four basic parameters: the seeking memory pool (SMP), the seeking range of the selected dimension (SRD), the count of dimensions to change (CDC), and self-position considering (SPC). The user tunes these values by trial and error. SMP defines the size of each cat's seeking memory, that is, the number of candidate positions from which the cat's next position is chosen. For instance, when the SPC flag is set to true, each cat generates only (SMP-1) new candidates instead of SMP, because its current position is counted as one of the candidates. The seeking mode steps are as follows:
(1) Create up to SMP copies of the current position of
(2) For each copy, randomly select up to CDC dimensions to be mutated, and randomly add or subtract SRD-scaled amounts to their current values, replacing the old positions, as given by the following formula:
(3) Evaluate the fitness value of every candidate position.
(4) Select, according to probabilities, one candidate point as the cat's next position, where candidate points with a better fitness value (FS) have a higher chance of being selected, as demonstrated in
When the objective is to minimize, next
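The seeking-mode steps can be sketched as below, assuming the common CSO forms: the multiplicative mutation $x \leftarrow (1 \pm \mathrm{SRD} \cdot r)\,x$ with $r$ uniform in $[0, 1)$, and the selection probability $P_i = |FS_i - FS_b| / (FS_{max} - FS_{min})$ with $FS_b = FS_{max}$ for minimization. The probabilities are normalized to sum to one before sampling, and CDC is treated as an integer count; the function name and defaults are illustrative.

```python
import numpy as np

def seeking_mode(cat, fitness_fn, smp=5, srd=0.2, cdc=1, spc=True, rng=None):
    """Return the next position of one cat in seeking mode (minimization)."""
    rng = np.random.default_rng() if rng is None else rng
    dim = cat.size
    n_change = min(cdc, dim)
    candidates = [cat.copy()] if spc else []  # SPC: current position is a candidate
    n_copies = smp - 1 if spc else smp
    for _ in range(n_copies):
        cand = cat.copy()
        dims = rng.choice(dim, size=n_change, replace=False)  # dimensions to mutate
        sign = rng.choice([-1.0, 1.0], size=n_change)         # add or subtract
        cand[dims] = cand[dims] * (1.0 + sign * srd * rng.random(n_change))
        candidates.append(cand)
    candidates = np.array(candidates)
    fs = np.array([fitness_fn(c) for c in candidates])
    if np.isclose(fs.max(), fs.min()):
        probs = np.full(len(fs), 1.0 / len(fs))  # all candidates equally good
    else:
        # FS_b = FS_max when minimizing: lower fitness -> higher probability
        probs = np.abs(fs - fs.max()) / (fs.max() - fs.min())
        probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]
```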
Tracing Mode. This mode models the tracing behavior of cats. In the first iteration, random velocity values are assigned to every dimension of a cat's position; in later steps, however, the velocity values must be updated. Cats move in this mode as follows:
(1) Update the velocities
(2) If a velocity value exceeds the maximum value, set it equal to the maximum velocity.
(3) Update the position of
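The three tracing-mode steps can be sketched as follows, assuming the standard CSO velocity rule $v \leftarrow v + r \cdot c_1 \cdot (x_{best} - x)$ with $r$ uniform in $[0, 1)$ and a constant $c_1$ (2.0 here, an illustrative value). Clipping to $\pm v_{max}$ implements step (2); the function name is hypothetical.

```python
import numpy as np

def tracing_mode(pos, vel, best, v_max, c1=2.0, rng=None):
    """Move one cat toward the best cat found so far."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.random(pos.shape)
    vel = vel + r * c1 * (best - pos)   # (1) update velocity toward the best cat
    vel = np.clip(vel, -v_max, v_max)   # (2) limit each component to the maximum velocity
    pos = pos + vel                     # (3) update the position
    return pos, vel
```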
The proposed technique involves two essential processes: inner parameter optimization and outer performance estimation. In the inner parameter optimization, the penalty parameter C and the kernel bandwidth γ of the KELM are determined dynamically by the ISSO approach using 5-fold cross-validation (CV). Next, the optimal parameter pair (C, γ) is fed into the KELM prediction model, which performs classification in the outer loop using 10-fold CV. The classification error rate is used as the fitness function (FF).
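The inner/outer structure described above can be sketched as a nested cross-validation. This assumes the standard closed-form KELM ($\beta = (I/C + K)^{-1}T$ with an RBF kernel) and, as a plainly named substitute for ISSO, a random search over log-scaled $(C, \gamma)$; the search ranges, function names, and candidate count are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Gaussian kernel matrix exp(-gamma * ||a - b||^2)
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

class KELM:
    """Kernel ELM: beta = (I / C + K)^-1 T and f(x) = K(x, X_train) beta."""

    def __init__(self, C=1.0, gamma=1.0):
        self.C, self.gamma = C, gamma

    def fit(self, X, y):
        self.X_train = X
        T = np.where(y == 1, 1.0, -1.0)  # binary labels coded as +1 / -1
        K = rbf_kernel(X, X, self.gamma)
        self.beta = np.linalg.solve(np.eye(len(X)) / self.C + K, T)
        return self

    def predict(self, X):
        return (rbf_kernel(X, self.X_train, self.gamma) @ self.beta >= 0).astype(int)

def make_folds(n, k, rng):
    return np.array_split(rng.permutation(n), k)

def cv_error(X, y, C, gamma, folds):
    """Mean classification error over the given folds (the FF to minimize)."""
    idx = np.arange(len(X))
    errors = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        model = KELM(C, gamma).fit(X[train], y[train])
        errors.append(np.mean(model.predict(X[test]) != y[test]))
    return float(np.mean(errors))

def nested_cv(X, y, n_candidates=10, seed=0):
    rng = np.random.default_rng(seed)
    idx = np.arange(len(X))
    outer_errors = []
    for test in make_folds(len(X), 10, rng):  # outer loop: 10-fold estimate
        train = np.setdiff1d(idx, test)
        X_tr, y_tr = X[train], y[train]
        inner = make_folds(len(X_tr), 5, rng)  # inner loop: 5-fold parameter search
        # Random search over (C, gamma): a stand-in for the ISSO optimizer
        cands = [(10 ** rng.uniform(-2, 3), 10 ** rng.uniform(-3, 1)) for _ in range(n_candidates)]
        C, gamma = min(cands, key=lambda p: cv_error(X_tr, y_tr, p[0], p[1], inner))
        model = KELM(C, gamma).fit(X_tr, y_tr)
        outer_errors.append(np.mean(model.predict(X[test]) != y[test]))
    return float(np.mean(outer_errors))
```

Keeping the parameter search strictly inside each outer training split is what prevents the outer 10-fold error from being optimistically biased.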
The proposed model is simulated using Python 3.6.5. Its FCP performance is evaluated on the Polish dataset, which contains three years of data [
Polish Dataset | Source | # of instances | # of attributes | # of classes | Bankrupt/Non-Bankrupt |
---|---|---|---|---|---|
Year-I | UCI | 7027 | 64 | 2 | 271/6756 |
Year-II | UCI | 10173 | 64 | 2 | 400/9773 |
Year-III | UCI | 10503 | 64 | 2 | 495/10008 |
No. of Iterations (Polish Dataset Year-I, best cost) | CGOA-FS | GOA-FS | KHO-FS | GWO-FS |
---|---|---|---|---|
10 | 0.2246 | 0.4966 | 0.6984 | 1.3601 |
20 | 0.2237 | 0.4966 | 0.6915 | 1.3601 |
30 | 0.2237 | 0.4966 | 0.6915 | 0.9324 |
40 | 0.2237 | 0.4966 | 0.6915 | 0.9324 |
50 | 0.2206 | 0.4966 | 0.6807 | 0.9324 |
60 | 0.2206 | 0.4966 | 0.6807 | 0.9324 |
70 | 0.2206 | 0.4966 | 1.1943 | 0.9324 |
80 | 0.2206 | 0.4966 | 1.1943 | 0.9324 |
90 | 0.2205 | 0.4966 | 1.1943 | 0.9324 |
100 | 0.2205 | 0.4966 | 0.6637 | 0.9324 |
Average | 0.2219 | 0.4966 | 0.8381 | 1.0179 |

No. of Iterations (Polish Dataset Year-II, best cost) | CGOA-FS | GOA-FS | KHO-FS | GWO-FS |
---|---|---|---|---|
10 | 0.3422 | 0.6100 | 0.8000 | 1.0610 |
20 | 0.3422 | 0.6100 | 0.8000 | 1.0570 |
30 | 0.3422 | 0.6100 | 0.7800 | 1.0570 |
40 | 0.3422 | 0.6100 | 0.7800 | 1.0570 |
50 | 0.3357 | 0.6000 | 0.7800 | 1.0570 |
60 | 0.3357 | 0.6000 | 0.7800 | 1.0570 |
70 | 0.3357 | 0.6000 | 0.7800 | 1.0570 |
80 | 0.3148 | 0.5900 | 0.7800 | 1.0570 |
90 | 0.3148 | 0.5900 | 0.7800 | 1.0550 |
100 | 0.3131 | 0.5900 | 0.7800 | 1.0550 |
Average | 0.3319 | 0.6010 | 0.7840 | 1.0570 |

No. of Iterations (Polish Dataset Year-III, best cost) | CGOA-FS | GOA-FS | KHO-FS | GWO-FS |
---|---|---|---|---|
10 | 0.2542 | 0.5340 | 0.7190 | 0.9859 |
20 | 0.2542 | 0.5330 | 0.7190 | 0.9831 |
30 | 0.2542 | 0.5330 | 0.7190 | 0.9831 |
40 | 0.2542 | 0.5330 | 0.7190 | 0.9825 |
50 | 0.2433 | 0.5330 | 0.7170 | 0.9810 |
60 | 0.2433 | 0.5330 | 0.7150 | 0.9810 |
70 | 0.2411 | 0.5330 | 0.7150 | 0.9805 |
80 | 0.2411 | 0.5330 | 0.7150 | 0.9805 |
90 | 0.2408 | 0.5330 | 0.7150 | 0.9805 |
100 | 0.2403 | 0.5330 | 0.7150 | 0.9805 |
Average | 0.2467 | 0.5331 | 0.7168 | 0.9819 |
On the Polish dataset Year-III, the GWO-FS method performed worst, with an average best cost of 0.9819. The KHO-FS method achieved a somewhat better result, with an average best cost of 0.7168, and the GOA-FS method achieved moderate performance, with an average best cost of 0.5331. Finally, the CGOA-FS approach outperformed the existing methods with the lowest average best cost of 0.2467.
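As a quick arithmetic check of the Average row and the ranking stated above, the Year-III best-cost values can be averaged directly; the values below are copied from the Year-III table and the ranking uses the fact that a lower best cost is better.

```python
from statistics import mean

# Best-cost values at iterations 10, 20, ..., 100 (Polish dataset Year-III), from the table
year3 = {
    "CGOA-FS": [0.2542, 0.2542, 0.2542, 0.2542, 0.2433, 0.2433, 0.2411, 0.2411, 0.2408, 0.2403],
    "GOA-FS": [0.5340, 0.5330, 0.5330, 0.5330, 0.5330, 0.5330, 0.5330, 0.5330, 0.5330, 0.5330],
    "KHO-FS": [0.7190, 0.7190, 0.7190, 0.7190, 0.7170, 0.7150, 0.7150, 0.7150, 0.7150, 0.7150],
    "GWO-FS": [0.9859, 0.9831, 0.9831, 0.9825, 0.9810, 0.9810, 0.9805, 0.9805, 0.9805, 0.9805],
}

averages = {name: round(mean(costs), 4) for name, costs in year3.items()}
ranking = sorted(averages, key=averages.get)  # lower best cost is better

for name in ranking:
    print(f"{name}: {averages[name]:.4f}")
```

The computed averages agree with the table's Average row (0.2467, 0.5331, 0.7168, 0.9819), and the ordering matches the text.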
This paper has developed a new CGOA-FLNN-CSO algorithm to predict the financial status of SMEs. The proposed model involves preprocessing, CGOA-based feature selection, FLNN-based classification, and CSO-based parameter tuning. The inclusion of the CGOA algorithm for FS plays a vital role in improving predictive performance. At the same time, tuning the FLNN parameters with the CSO algorithm considerably enhances predictive performance. A detailed experimental validation on the Polish dataset was carried out to assess the performance of the presented model. The simulation results verified the effectiveness of the presented model over the compared methods in terms of best cost, sensitivity, specificity, accuracy, F-score, and Matthews correlation coefficient (MCC). The CGOA-FLNN-CSO model achieved maximum prediction accuracies of 98.830%, 92.100%, and 95.220% on Years I-III of the Polish dataset, respectively. In future, the predictive performance of the CGOA-FLNN-CSO algorithm could be further improved by using outlier detection approaches.