Comparison of classification models for breast cancer disease using multivariate analysis and data mining approaches

Nurul Ashyikin Ramli; Zalina Zahid; Siti Aida Sheikh Hussin; Noor Asiah Ramli


Journal

Applied Mathematics and Computational Intelligence (AMCI)

ISSN

2289-1323

Date Issued

2023-12

Author(s)

Nurul Ashyikin Ramli

Universiti Teknologi MARA

Zalina Zahid

Universiti Teknologi MARA

Siti Aida Sheikh Hussin

Universiti Teknologi MARA

Noor Asiah Ramli

Universiti Teknologi MARA

DOI

https://doi.org/10.58915/amci.v12i4.348

Handle (URI)

https://hdl.handle.net/20.500.14170/1573

Applied Mathematics and Computational Intelligence (AMCI)

https://ejournal.unimap.edu.my/index.php/amci/article/view/348/225

Abstract

Breast cancer is one of the main causes of cancer death among women. Early detection can significantly improve survival and quality of life. A variety of machine learning prediction algorithms combined with feature selection approaches have been shown to be useful for detecting breast cancer. However, classification accuracy remains a problem, and outliers are known to potentially affect it. To further improve classification accuracy, the K-Means approach was used to detect outliers. The main goal of this study was to examine breast cancer classification performance when feature selection methods were combined with K-Means. The experiments used the Coimbra breast cancer dataset, which consists of 116 instances and 10 attributes. Multivariate techniques, namely Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), and Discriminant Analysis (DA), were applied to reduce data dimensionality. Four data mining approaches, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR), were compared for classification. Performance was evaluated using accuracy, precision, specificity, and sensitivity. The results revealed that five combinations (PCA-DT, PCA-RF, KPCA-DT, KPCA-RF, and DA-RF) performed better across all four criteria after being combined with K-Means. Among these five, KPCA with DT outperformed the others, achieving the highest precision (76.47 percent) and specificity (71.43 percent). These results suggest that incorporating a feature selection method together with an outlier detection method is more efficient and beneficial for breast cancer classification.
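The abstract pairs K-Means outlier detection with four evaluation criteria. The sketch below illustrates those two pieces in plain Python (standard library only) under stated assumptions; it is not the authors' implementation. The abstract does not say how K-Means was turned into an outlier criterion, so the rule used here (flag a point whose distance to its own centroid exceeds the mean distance plus `z` population standard deviations), the initialization (first `k` points as starting centroids), and the names `kmeans`, `kmeans_outliers`, and `binary_metrics` are hypothetical choices for illustration. The dimension-reduction (PCA, KPCA, DA) and classification (DT, RF, SVM, LR) steps are left out; libraries such as scikit-learn provide them as `PCA`, `KernelPCA`, `LinearDiscriminantAnalysis`, `DecisionTreeClassifier`, `RandomForestClassifier`, `SVC`, and `LogisticRegression`.

```python
import math
import statistics
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's K-Means. Assumption: the first k points are the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [statistics.fmean(col) for col in zip(*members)]
    return centroids, labels


def kmeans_outliers(points: List[Point], k: int, z: float = 2.0) -> List[int]:
    """Indices of points that lie far from their own cluster centroid.

    Hypothetical rule: distance > mean + z * population stdev of all distances.
    """
    centroids, labels = kmeans(points, k)
    dists = [euclidean(p, centroids[lab]) for p, lab in zip(points, labels)]
    cutoff = statistics.fmean(dists) + z * statistics.pstdev(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, sensitivity and specificity from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),                  # (TP + TN) / all
        "precision": tp / (tp + fp) if tp + fp else 0.0,     # TP / (TP + FP)
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,   # TP / (TP + FN)
        "specificity": tn / (tn + fp) if tn + fp else 0.0,   # TN / (TN + FP)
    }


if __name__ == "__main__":
    # Toy 2-D data (not the Coimbra dataset): two tight groups plus one far point.
    data = [(0, 0), (10, 10), (0, 1), (1, 0), (1, 1), (10, 11), (11, 10), (11, 11), (30, 30)]
    print(kmeans_outliers(data, k=2))  # [8]
    # Made-up labels and predictions, purely to exercise the metric formulas.
    print(binary_metrics([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]))
```

On the toy data only the far point at index 8 is flagged. Because Euclidean distance is scale-sensitive, real features would normally be standardized before clustering; where exactly such filtering sits relative to PCA, KPCA, or DA is not stated in the abstract.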

