  • Publication
    Content-defined chunking algorithms in data deduplication: performance, trade-offs and future-oriented techniques
    (Semarak Ilmu Publishing, 2025)
    Safa Ali Abo Hussein; ; ; Fathey Mohammed; Abdul Ghani Khan
    In the digital era, the exponential growth of data presents significant challenges for storage efficiency and processing speed. This paper reviews Content-Defined Chunking (CDC), a cornerstone of data deduplication technology that helps address these challenges. We systematically examine various CDC algorithms, categorising them into hashing-based and hash-less methodologies, and evaluate their performance in deduplication processes. Through a critical analysis of existing literature, the study identifies the balance between chunking speed and deduplication efficacy as a pivotal area for enhancement. Our findings reveal the need for innovative CDC algorithms that adapt to the evolving data landscape, and we propose future research directions for improving storage and processing solutions. This work contributes to the broader understanding of data deduplication techniques, offering a pathway towards more efficient data management systems.
  • Publication
    Optimizing SVM classification in medical datasets: a comparative study of swarm intelligence algorithms for feature selection and parameter tuning
    (AIP Publishing, 2024-05)
    Safa Ali Abdo Hussein; ; ; Abdul Ghani Khan
    The Support Vector Machine (SVM) modelling approach focuses on dimensionality reduction and offers a powerful generalization capability for problems involving non-linearity and local extrema. However, the ability of an SVM to learn and generalize depends on selecting appropriate parameters, which directly affect the model's output. Deploying a relevant feature set is a further obstacle to obtaining optimal classification. This work therefore deploys eight swarm intelligence algorithms to improve the SVM classifier's accuracy through two experiments: feature selection, and tuning of the SVM parameters using the obtained feature set. The deployed algorithms are Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Firefly Algorithm (FA), Salp Swarm Algorithm (SSA), Bat Algorithm (BA), Sine Cosine Algorithm (SCA), Whale Optimization Algorithm (WOA), and Multi-Verse Optimization (MVO). The experiments were conducted on six medical benchmark datasets of varying dimensionality, and evaluation was based on two metrics: classification accuracy and feature-set size. The significance of this work lies in comparing these algorithms to determine which swarm algorithm is best suited to large medical datasets. The experimental results support existing literature noting that swarm intelligence algorithms are useful for feature selection and for optimizing SVM parameters while maintaining acceptable accuracy.
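The CDC review divides chunkers into hashing-based and hash-less families. As a rough illustration of the hashing-based family only, here is a minimal sketch that assumes a Gear-style rolling hash of the kind used by FastCDC; the review is not claimed to endorse this particular scheme. A cut point is declared where the top 8 bits of the hash are zero, subject to a minimum and maximum chunk size. `MIN_SIZE`, `MAX_SIZE`, `MASK`, the table seed, and the helper names are illustrative assumptions, not values from the paper.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not taken from the paper).
MIN_SIZE = 64          # never cut before this many bytes
MAX_SIZE = 1024        # force a cut once a chunk reaches this many bytes
MASK = 0xFF << 56      # cut when the top 8 hash bits are zero (~1 in 256 positions)
MASK64 = (1 << 64) - 1

# Gear table: one pseudo-random 64-bit value per byte value, seeded for reproducibility.
_rng = random.Random(2025)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def gear_chunks(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a Gear rolling hash."""
    chunks = []
    start, n = 0, len(data)
    while start < n:
        end = min(start + MAX_SIZE, n)
        cut = end  # fallback: forced cut at MAX_SIZE (or at the end of the data)
        h = 0
        for i in range(start, end):
            # Shift-and-add: bytes more than 64 positions back drop out of the 64-bit state.
            h = ((h << 1) + GEAR[data[i]]) & MASK64
            if i + 1 - start >= MIN_SIZE and (h & MASK) == 0:
                cut = i + 1
                break
        chunks.append(data[start:cut])
        start = cut
    return chunks


def fingerprint(chunk: bytes) -> str:
    """Chunk identity used to detect duplicates."""
    return hashlib.sha256(chunk).hexdigest()


def shared_chunks(old: list[bytes], new: list[bytes]) -> int:
    """Count chunks in `new` whose fingerprint already appears in `old`."""
    seen = {fingerprint(c) for c in old}
    return sum(1 for c in new if fingerprint(c) in seen)


if __name__ == "__main__":
    data = bytes(random.Random(7).getrandbits(8) for _ in range(20_000))
    original = gear_chunks(data)
    edited = gear_chunks(b"\x00" + data)  # one-byte insertion at the front
    print(len(original), shared_chunks(original, edited))
```

Because boundaries depend on the bytes inside the hash window rather than on absolute offsets, a one-byte insertion typically changes only the first chunk or two; later chunks keep their fingerprints and deduplicate against the original. The per-byte hash loop is where the speed side of the speed/efficacy trade-off discussed in the review shows up.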
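The SVM study tunes SVM parameters with eight swarm algorithms. A minimal sketch of one of them, a global-best PSO searching over log2(C) and log2(gamma), is shown here under stated assumptions. To keep it dependency-free, `mock_cv_error` is a hypothetical stand-in for the real fitness, which would be 1 minus the cross-validated SVM accuracy on the selected feature set. The inertia and acceleration coefficients, search ranges, and function names are illustrative assumptions, not the paper's settings, and the feature-selection experiment is not covered.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    n_particles: int = 20,
    n_iters: int = 100,
    w: float = 0.7,    # inertia weight (assumed value)
    c1: float = 1.5,   # pull towards each particle's own best (assumed value)
    c2: float = 1.5,   # pull towards the swarm's best (assumed value)
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO that minimises `objective` inside the box `bounds`."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # each particle's best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position so far

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # clamp to the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def mock_cv_error(params: Sequence[float]) -> float:
    """Hypothetical stand-in for SVM cross-validation error.

    params = (log2 C, log2 gamma). A real fitness would train an SVM and
    return 1 - cross-validated accuracy; this smooth bowl has its minimum
    at log2 C = 3, log2 gamma = -5 purely for demonstration.
    """
    log2_c, log2_gamma = params
    return (log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2


if __name__ == "__main__":
    bounds = [(-5.0, 15.0), (-15.0, 3.0)]  # log2 C, log2 gamma (assumed ranges)
    best, err = pso_minimize(mock_cv_error, bounds)
    print(f"C = 2^{best[0]:.2f}, gamma = 2^{best[1]:.2f}, error = {err:.4f}")
```

Replacing `mock_cv_error` with a genuine cross-validation routine would turn this into a parameter-tuning run of the kind the abstract describes. Searching in log2 space lets the swarm cover several orders of magnitude of C and gamma evenly.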