
Comparison of classification models for breast cancer disease using multivariate analysis and data mining approaches

Journal
Applied Mathematics and Computational Intelligence (AMCI)
ISSN
2289-1323
Date Issued
2023-12
Author(s)
Nurul Ashyikin Ramli
Universiti Teknologi MARA
Zalina Zahid
Universiti Teknologi MARA
Siti Aida Sheikh Hussin
Universiti Teknologi MARA
Noor Asiah Ramli
Universiti Teknologi MARA
DOI
https://doi.org/10.58915/amci.v12i4.348
Handle (URI)
https://ejournal.unimap.edu.my/index.php/amci/article/view/348/225
https://hdl.handle.net/20.500.14170/1573
Abstract
Among cancer types, breast cancer is one of the main causes of death in women. Early cancer detection can significantly increase survival and quality of life. A variety of machine learning prediction algorithms combined with feature selection approaches have been shown to be useful in the detection of breast cancer. However, problems with classification accuracy remain. Outliers are known to have a potential effect on classification accuracy, so the K-Means approach was used to detect outliers in order to further improve accuracy. The major goal of this study was to examine the classification performance for breast cancer when feature selection methods were used in combination with K-Means. For experimental purposes, the Coimbra breast cancer dataset, consisting of 116 instances and 10 attributes, was used. Multivariate techniques including Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), and Discriminant Analysis (DA) were applied to reduce data dimensions. Four data mining approaches, Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR), were then compared for classification. Performance was evaluated using accuracy, precision, specificity, and sensitivity. The results revealed that five combined approaches (PCA-DT, PCA-RF, KPCA-DT, KPCA-RF, DA-RF) performed better across all four criteria after being combined with the K-Means technique. Among these five, KPCA with DT outperformed the others, achieving the highest precision (76.47 percent) and specificity (71.43 percent). This study suggests that combining a feature selection method with an outlier detection method is more efficient and beneficial for breast cancer classification.
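The four evaluation criteria named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the helper names (`ConfusionCounts`, `confusion_counts`, `classification_metrics`) and the toy labels are assumptions made for illustration, and the choice of label `1` as the positive (patient) class is likewise assumed. The formulas are the standard confusion-matrix definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary classifier (hypothetical helper, not from the paper)."""
    tp: int  # actual positive, predicted positive
    fp: int  # actual negative, predicted positive
    tn: int  # actual negative, predicted negative
    fn: int  # actual positive, predicted negative


def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells from paired labels."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a class or prediction is absent.
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c):
    """Accuracy, precision, sensitivity (recall), and specificity."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


# Toy labels for illustration only; these are NOT results from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting precision and specificity alongside sensitivity matters for a small dataset such as this one, since accuracy alone can hide whether errors fall mostly on patients or on healthy controls.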
Subjects
  • Breast cancer
  • Principal component a...
  • Kernel principal comp...
  • Random forest
  • Support vector machin...

File(s)
Comparison of Classification Models for Breast.pdf (426.83 KB)