Imbalanced data classification using SVM based on simulated annealing featuring synthetic data generation and reduction

Hussein Ibrahim Hussein; Said Amirul Anwar Ab Hamid@Ab Majid; Muhammad Imran Ahmad

cris.virtual.department	Universiti Malaysia Perlis
cris.virtual.department	Universiti Malaysia Perlis
cris.virtualsource.department	5a2977ff-c732-4870-94ba-738508334227
cris.virtualsource.department	a19d4cbc-2489-4c8e-8287-251ae470e4c4
dc.contributor.author	Hussein Ibrahim Hussein
dc.contributor.author	Said Amirul Anwar Ab Hamid@Ab Majid
dc.contributor.author	Muhammad Imran Ahmad
dc.date.accessioned	2024-05-15T07:30:32Z
dc.date.available	2024-05-15T07:30:32Z
dc.date.issued	2023
dc.description.abstract	Imbalanced data classification is a major problem in machine learning. An imbalanced dataset has significant differences in the number of samples between its classes. In most cases, the performance of a machine learning algorithm such as the Support Vector Machine (SVM) degrades on an imbalanced dataset: classification is skewed toward the majority class, and minority-class samples are predicted poorly. This paper proposes a hybrid approach that combines a data pre-processing technique with an SVM tuned by an improved Simulated Annealing (SA) algorithm. First, the pre-processing technique resamples the imbalanced dataset: synthetic data are generated to equalize the number of samples between classes, followed by a reduction step that removes redundant and duplicated data. Next, an SVM is trained on the balanced dataset. Because SVM training requires an iterative search for the best penalty parameter, an improved SA algorithm is proposed for this task. The improvement introduces a new acceptance criterion for candidate solutions in the SA algorithm to enhance the accuracy of the optimization process. Experiments on ten publicly available imbalanced datasets show higher classification accuracy for the proposed approach than for the conventional implementation of SVM. The proposed approach achieves an average accuracy of 89.65% on binary classification.
dc.identifier.doi	10.32604/cmc.2023.036025
dc.identifier.uri	https://www.techscience.com/cmc/v75n1/51522/html
dc.identifier.uri	https://www.techscience.com/
dc.language.iso	en
dc.subject	Data Reduction
dc.subject	Imbalanced Data
dc.subject	Resampling Technique
dc.subject	Simulated Annealing
dc.subject	Support Vector Machine
dc.title	Imbalanced data classification using SVM based on simulated annealing featuring synthetic data generation and reduction
dc.type	Resource Types::text::journal::journal article
dspace.entity.type	Publication
oaire.citation.endPage	564
oaire.citation.issue	1
oaire.citation.startPage	547
oaire.citation.volume	75
oairecerif.author.affiliation	AlSafwa University College, Karbala, Iraq
oairecerif.author.affiliation	Universiti Malaysia Perlis
oairecerif.author.affiliation	Universiti Malaysia Perlis
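
The pipeline outlined in the abstract (synthetic generation, reduction, then a simulated-annealing search for the SVM penalty parameter) can be illustrated with a minimal sketch. This is not the authors' implementation. The record does not describe the paper's new acceptance criterion or its reduction rule, so the sketch assumes the textbook Metropolis criterion, a simple pairwise interpolation for generating synthetic samples, and duplicate removal after rounding. All function and parameter names (`oversample_minority`, `drop_duplicates`, `anneal_penalty`, `score_fn`) are hypothetical.

```python
import math
import random


def oversample_minority(minority, target_size, rng):
    """Grow the minority class by interpolating between random pairs of its samples.

    Assumption: a simplified stand-in for the paper's synthetic data generation.
    """
    synthetic = list(minority)
    while len(synthetic) < target_size:
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


def drop_duplicates(samples, decimals=6):
    """Simplified reduction step: keep the first copy of each sample after rounding."""
    seen, kept = set(), []
    for s in samples:
        key = tuple(round(v, decimals) for v in s)
        if key not in seen:
            seen.add(key)
            kept.append(s)
    return kept


def anneal_penalty(score_fn, log_c_range=(-3.0, 3.0), steps=200,
                   t_start=1.0, cooling=0.97, seed=0):
    """Search log10(C) with standard simulated annealing.

    score_fn(C) is assumed to return a validation score (higher is better).
    The acceptance rule is the textbook Metropolis criterion, not the paper's.
    """
    rng = random.Random(seed)
    lo, hi = log_c_range
    current = rng.uniform(lo, hi)
    current_score = score_fn(10.0 ** current)
    best, best_score = current, current_score
    temperature = t_start
    for _ in range(steps):
        # Propose a nearby value of log10(C), clipped to the search range.
        candidate = min(hi, max(lo, current + rng.gauss(0.0, 0.5)))
        candidate_score = score_fn(10.0 ** candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temperature *= cooling  # geometric cooling schedule
    return 10.0 ** best, best_score
```

In practice, `score_fn` might wrap a cross-validated SVM, for example by scoring scikit-learn's `SVC(C=c)` with `cross_val_score` on the resampled data. That wiring is an assumption about how one could use the sketch, not something stated in this record.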
