Imbalanced data classification using SVM based on simulated annealing featuring synthetic data generation and reduction
2023,
Hussein Ibrahim Hussein,
Said Amirul Anwar Ab Hamid@Ab Majid,
Muhammad Imran Ahmad
Imbalanced data classification is one of the major problems in machine learning. An imbalanced dataset typically has significant differences in the number of samples between its classes. In most cases, the performance of a machine learning algorithm such as the Support Vector Machine (SVM) degrades when dealing with an imbalanced dataset: classification is skewed toward the majority class, and minority-class samples are predicted poorly. In this paper, a hybrid approach is proposed that combines a data pre-processing technique with an SVM algorithm based on improved Simulated Annealing (SA). First, a data pre-processing technique that addresses the imbalance through resampling is proposed. In this technique, data are first synthetically generated to equalize the number of samples between classes, followed by a reduction step that removes redundant and duplicated data. The balanced dataset is then used to train an SVM. Since SVM training requires an iterative search for the best penalty parameter, an improved SA algorithm is proposed for this task. The improvement introduces a new acceptance criterion for candidate solutions in the SA algorithm to enhance the accuracy of the optimization process. Experiments on ten publicly available imbalanced datasets demonstrated higher classification accuracy for the proposed approach than for the conventional implementation of SVM. The proposed approach achieved an average accuracy of 89.65% for binary classification.
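The penalty-parameter search described in the abstract above can be illustrated with a minimal sketch of simulated annealing. The abstract does not specify the paper's new acceptance criterion, so this sketch assumes the standard Metropolis rule instead. The names `anneal_penalty` and `toy_score` are hypothetical, and `toy_score` is only a stand-in for the validation accuracy of an SVM trained with penalty parameter C on the balanced dataset.

```python
import math
import random


def anneal_penalty(score, c_init=1.0, t_init=1.0, cooling=0.95, steps=200, seed=0):
    """Search for a penalty parameter C that maximises score(C).

    Assumption: plain simulated annealing with the standard Metropolis
    acceptance rule, not the improved criterion proposed in the paper.
    """
    rng = random.Random(seed)
    current_c, current_score = c_init, score(c_init)
    best_c, best_score = current_c, current_score
    temperature = t_init
    for _ in range(steps):
        # Propose a neighbour by perturbing C on a log scale so it stays positive.
        candidate_c = current_c * math.exp(rng.gauss(0.0, 0.5))
        candidate_score = score(candidate_c)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current_c, current_score = candidate_c, candidate_score
        # Remember the best solution seen so far.
        if current_score > best_score:
            best_c, best_score = current_c, current_score
        # Geometric cooling schedule (an assumed choice).
        temperature *= cooling
    return best_c, best_score


def toy_score(c):
    """Hypothetical stand-in for validation accuracy, peaking at C = 10."""
    return 1.0 - 0.1 * (math.log10(c) - 1.0) ** 2


best_c, best_score = anneal_penalty(toy_score)
print(f"best C = {best_c:.3f}, score = {best_score:.4f}")
```

In practice, `score` would be replaced by a function that trains an SVM with the given C on the resampled data and returns a validation metric; the step size, cooling rate, and iteration count here are illustrative values only.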
A review of optimization algorithms in SVM parameters
2023,
Hussein Ibrahim Hussein,
Said Amirul Anwar Ab Hamid@Ab Majid
The support vector machine (SVM) is a widely known machine learning method that is very useful for regression and pattern classification. SVMs have been used successfully in several domains to address numerous real-world challenges. In this context, parameter optimisation for an SVM is a widely researched topic that has attracted attention from several research domains. Algorithms that facilitate optimisation have attracted greater interest than other algorithms. Such algorithms allow the optimal parameters for an SVM to be determined, after which the model can be adapted for several other applications. During the last two decades, several enhancements have been made to improve the optimisation of SVM models and thereby their performance. This paper focuses on the algorithms currently employed to optimise SVMs, in both their basic and modified forms. It presents a comprehensive analysis of these algorithms and aims to identify the present challenges in SVM parameter optimisation. The study cannot evaluate every detail; however, the significant theoretical aspects are covered with reference to the existing literature.