Suzhen Wang1, Zhanfeng Zhang1,*, Shanshan Geng1, Chaoyi Pang2
CMC-Computers, Materials & Continua, Vol.71, No.2, pp. 3721-3731, 2022, DOI:10.32604/cmc.2022.015378
- 07 December 2021
Abstract As society has developed, increasing amounts of data have been generated by various industries. The random forest algorithm, as a classification algorithm, is widely used because of its superior performance. However, when generating feature subspaces, the random forest algorithm selects features by simple random sampling, which cannot distinguish redundant features; this affects its classification accuracy and results in low computational efficiency in stand-alone mode. To address these problems, the present paper investigates optimizations of the algorithm on Spark. The improved random forest algorithm performs feature extraction according …