  • Publication
    Enhancing ecosystem biodiversity through air pollution concentrations prediction using support vector regression approaches
    (Universitatea Gheorghe Asachi din Iasi, 2023)
    Syaidatul Umairah Solehah; Aida Wati Zainan Abidin; Saiful Nizam Warris; Wan Nur Shaziayani; Balkish Mohd Osman; Nurain Ibrahim; Ahmad Zia Ul-Saufie
    Air is the most crucial element for the survival of life on Earth. The air we breathe has a profound effect on ecosystem biodiversity. Consequently, it is always prudent to monitor the air quality in our environment. There are several ways to predict the air pollution index (API), including data mining. Therefore, this study aimed to evaluate three types of support vector regression (linear, SVR, libSVR) in predicting air pollutant concentrations and to identify the best model. The study also sought to calculate the API using the proposed model. Secondary daily data from 2002 to 2020, obtained from the Department of Environment (DoE) Malaysia for the Petaling Jaya monitoring station, were used. Six major pollutants were the focus of this work: PM10, PM2.5, SO2, NO2, CO, and O3. The root mean square error (RMSE), mean absolute error (MAE) and relative error (RE) were used to evaluate the performance of the regression models. Experimental results showed that the best model is linear SVR, with average RMSE = 5.548, MAE = 3.490, and RE = 27.98%, because it had the lowest total rank value of RMSE, MAE, and RE for five air pollutants (PM10, PM2.5, SO2, CO, O3) in this study. For NO2, by contrast, the best model is support vector regression (SVR), with RMSE = 0.007, MAE = 0.006, and RE = 20.75%. This work also illustrates that combining data mining with air pollutant prediction is an efficient and convenient way to solve some related environmental problems. The best model has the potential to be applied as an early warning system to inform local authorities about the air quality and can reliably predict daily air pollution events over three consecutive days. In addition, good air quality plays a significant role in supporting biodiversity and maintaining healthy ecosystems. © 2023 Universitatea "Alexandru Ioan Cuza" din Iasi. All rights reserved.
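    The three error measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not define RE, so the mean absolute relative error in percent is assumed here, and the sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted values."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def relative_error(observed, predicted):
    """Mean absolute relative error in percent (assumed definition of RE).

    Requires every observed value to be non-zero.
    """
    n = len(observed)
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n


# Hypothetical daily concentrations, for illustration only.
observed = [40.0, 50.0, 60.0, 80.0]
predicted = [44.0, 45.0, 63.0, 80.0]

print(rmse(observed, predicted))
print(mae(observed, predicted))
print(relative_error(observed, predicted))
```

    Lower values are better for all three measures, which is what allows models to be ranked per pollutant and the ranks summed.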
  • Publication
    Performance of Bayesian Model Averaging (BMA) for short-term prediction of PM10 concentration in the Peninsular Malaysia
    (MDPI, 2023)
    Hazrul Abdul Hamid; Ahmad Shukri Yahaya; Ahmad Zia Ul-Saufie; Ain Nihla Kamarudzaman; György Deák
    In preparation for the Fourth Industrial Revolution (IR 4.0) in Malaysia, the government envisions a path to environmental sustainability and an improvement in air quality. Air quality measurements were initiated in different backgrounds, including urban, suburban, industrial and rural, to detect any significant changes in air quality parameters. Due to the dynamic nature of the weather, geographical location and anthropogenic sources, many uncertainties must be considered when dealing with air pollution data. In recent years, the Bayesian approach to fitting statistical models has gained popularity because it offers an alternative modelling strategy that accounts for uncertainties in all air quality parameters. Therefore, this study aims to evaluate the performance of Bayesian Model Averaging (BMA) in predicting the next-day PM10 concentration in Peninsular Malaysia. A case study utilized seventeen years’ worth of air quality monitoring data from nine (9) monitoring stations located in Peninsular Malaysia, using eight air quality parameters, i.e., PM10, NO2, SO2, CO, O3, temperature, relative humidity and wind speed. The performance of the next-day PM10 prediction was calculated using six model performance evaluators, namely Coefficient of Determination (R2), Index of Agreement (IA), Kling-Gupta efficiency (KGE), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Mean Absolute Percentage Error (MAPE). The BMA models indicate that relative humidity, wind speed and PM10 contributed the most to the prediction model for the majority of stations, with R2 = 0.752 at the Pasir Gudang monitoring station, R2 = 0.749 at Larkin, R2 = 0.703 at Kota Bharu, R2 = 0.696 at Kangar and R2 = 0.692 at Jerantut. Furthermore, the BMA models demonstrated good prediction performance, with IA ranging from 0.84 to 0.91, R2 ranging from 0.64 to 0.75 and KGE ranging from 0.61 to 0.74 for all monitoring stations. According to the results of the investigation, BMA should be utilised in research and forecasting operations pertaining to environmental issues such as air pollution. From this study, BMA is recommended as one of the prediction tools for forecasting air pollution concentration, especially particulate matter level.
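    For readers unfamiliar with two of the evaluators listed in this abstract, their commonly used definitions are given below. The abstract does not state which variants were applied, so the Willmott form of IA and the original Kling-Gupta form of KGE are assumed; O_i and P_i denote observed and predicted values, and the remaining symbols are defined in the block.

```latex
% Index of Agreement (Willmott form, assumed); \bar{O} is the mean of the observations
\mathrm{IA} = 1 - \frac{\sum_{i=1}^{n} (P_i - O_i)^2}
                       {\sum_{i=1}^{n} \left( |P_i - \bar{O}| + |O_i - \bar{O}| \right)^2}

% Kling-Gupta efficiency (original form, assumed)
% r: Pearson correlation between P and O
% \alpha = \sigma_P / \sigma_O (variability ratio), \beta = \mu_P / \mu_O (bias ratio)
\mathrm{KGE} = 1 - \sqrt{(r - 1)^2 + (\alpha - 1)^2 + (\beta - 1)^2}
```

    Both measures equal 1 for a perfect prediction, which is why the reported ranges (IA from 0.84 to 0.91, KGE from 0.61 to 0.74) are read as good agreement.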
  • Publication
    Comparative analysis of machine learning techniques for SO₂ prediction modelling
    (IOP Publishing, 2023)
    Wan Nur Shaziayani; Ahmad Zia Ul-Saufie
    Sulphur dioxide (SO₂) is produced both naturally and by human activity. The primary natural source is volcanoes. The burning of fossil fuels, especially coal and diesel, is the primary anthropogenic source. A reliable and accurate prediction method is therefore essential for an early warning system for SO₂ atmospheric concentration. There are still limited studies in Malaysia that use machine learning methods to predict SO₂ concentrations. With the aid of machine learning, this study seeks to develop a model that predicts next-day SO₂ concentrations using maximum daily data from Klang, Selangor. RapidMiner Studio is the data mining tool used for this research work. The results showed that the SVM model outperformed the other five models (GLM, DL, DT, GBT, and RF). The performance indicators showed that the SVM model was adequate for the next day's prediction (R2 = 0.77, SE = 8.26, REL = 18.69%, AE = 1.46, and RMSE = 2.82). The model developed in this research can be used by Malaysian authorities as a public health protection measure to give Malaysians an early warning about air pollution. The goal of predictive modelling is to make a reasonable prediction of the variable of interest and, frequently, to determine how much each independent variable contributes to the dependent variable. The results also showed that previous SO₂ concentrations were among the most influential parameters for predicting future SO₂ concentrations.
  • Publication
    Modified linear regression for predicting ambient particulate pollutants (PM₁₀) during High Particulate Event
    (IOP Publishing, 2023)
    Izzati Amani Mohd Jafri; Nur Alis Addiena A. Rahim; Syaza Ezzati Baidrulhisham; Ahmad Zia Ul-Saufie; György Deák Habil
    Particulate Matter (PM₁₀) is one of the most significant contributors to haze, or high particulate events (HPE), in Malaysia. HPE can severely affect human health, the environment and the economy, so it is important to create a reliable model for predicting future PM₁₀ concentration, especially during HPE. Therefore, the aim of this study is to investigate the performance of modified linear regression models in predicting the next-day Particulate Matter (PM₁₀+24) concentration at two sites in Peninsular Malaysia, namely Bukit Rambai and Nilai. Hourly air quality datasets from the historic HPE of 1997, 2005, 2013 and 2015 were used for analysis. Pearson correlation was used to select the inputs of the PM₁₀ prediction model: only parameters with moderate (0.6 > r > 0.3) or strong (r > 0.6) correlation with PM₁₀ concentration were selected as independent variables for the multiple linear regression (MLR) model. The performance of the modified linear regression models was evaluated using several performance indicators, namely Prediction Accuracy (PA), Index of Agreement (d₂), Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). The results show that the modified MLR (with parameters of r > 0.6 included as input) gave the best prediction model for the next-day PM₁₀ concentration in both Bukit Rambai and Nilai.
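    The correlation-based input selection described in this abstract can be sketched as below. This is an illustration under assumptions, not the authors' procedure: the thresholds 0.3 and 0.6 come from the abstract, but applying them to the absolute value of r is an assumption, and the function names and all data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def select_predictors(candidates, target, threshold):
    """Return names of candidates whose |r| with the target exceeds threshold.

    Using |r| rather than r is an assumption; the abstract states only r > 0.3
    and r > 0.6.
    """
    return [
        name
        for name, values in candidates.items()
        if abs(pearson_r(values, target)) > threshold
    ]


# Hypothetical observations, for illustration only.
pm10_next_day = [60.0, 80.0, 100.0, 120.0, 140.0]
candidates = {
    "co": [1.0, 1.5, 2.1, 2.4, 3.0],
    "temperature": [30.0, 28.0, 31.0, 29.0, 32.0],
    "wind_speed": [2.0, 3.0, 1.0, 3.0, 2.0],
}

moderate_or_strong = select_predictors(candidates, pm10_next_day, 0.3)
strong_only = select_predictors(candidates, pm10_next_day, 0.6)
print(moderate_or_strong, strong_only)
```

    The selected names would then be the independent variables passed to the MLR fit; raising the threshold from 0.3 to 0.6 trades a larger input set for one restricted to the strongest linear associations.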
  • Publication
    Assessment of time series model for predicting long-interval consecutive missing values in air quality dataset
    (Penerbit Akademia Baru, 2025)
    Daniel Kim Boon Bong; Ahmad Zia Ul-Saufie; Faizal Ab Jalil; György Deák
    Air pollutant concentration in Malaysia is continuously monitored using the Continuous Ambient Air Quality Machine (CAAQM). During the observation phase by CAAQM, some air pollutant data were found to be missing due to machine failure, maintenance, position changes and human error. Incomplete datasets, especially those with longer gaps of consecutive missing observations, may lead to several significant problems, including loss of efficiency, difficulties in using some computational software and biased estimation due to differences between the observed and predicted datasets. This study aims to evaluate the performance of a time series method, the Autoregressive Integrated Moving Average (ARIMA), for filling long hours of missing data in an air pollution dataset. The dataset of PM10, SO2, NO2, O3, CO, wind speed, relative humidity and ambient temperature for Pegoh and Kota Kinabalu in 2018 was used for analysis. Markov Chain Monte Carlo (MCMC) and Expectation-Maximization (EM) were employed for comparison with ARIMA's effectiveness in filling the simulated missing gaps in the air quality dataset. Existing missing values in the raw data were pre-treated, and gaps amounting to 5%, 10% and 15% of missing data were then simulated, with intervals ranging from 24 to 120 hours. The performance of the imputation approaches was assessed using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Prediction Accuracy (PA) and Index of Agreement (IA). Overall, the Expectation-Maximization technique was the most effective at filling simulated long gaps of missing data in the air pollutant dataset, with IA ranging from 0.74 to 0.77. In contrast, the ARIMA approach performed poorly in this research, with IA ranging from 0.44 to 0.48. This is because ARIMA requires past time-series data to generate a forecast or impute missing data; hence, the forecast becomes a straight line and performs poorly on series with long hours of missing observations.
  • Publication
    Analysis of air pollution in Malaysia: implications for environmental conservation using Granger causality and Pearson correlation
    (Universitatea Gheorghe Asachi din Iasi, 2025)
    Zulkifli Abd Rais; Hazrul Abdul Hamid; Ahmad Zia Ul-Saufie; Mohd Khairul Nizam MAHMAD
    This study investigates the relationships between air pollutants (PM₁₀, SO₂, NO₂, O₃, CO) and meteorological factors (temperature, relative humidity, wind speed) at five locations in Malaysia: Seberang Perai, Shah Alam, Nilai, Larkin and Pasir Gudang. Using time-series data from 2017 to 2021, we applied Granger causality and Pearson correlation to explore the predictive relationships and linear associations between these variables. Granger causality provided insights into temporal precedence, revealing significant predictive relationships such as temperature Granger-causing PM₁₀ and O₃ in Nilai and Shah Alam. Meanwhile, Pearson correlation highlighted strong linear relationships, such as the positive correlation between PM₁₀ and wind speed in Shah Alam and the negative correlation between humidity and O₃ across several stations. By comparing both methods, we show how combining Granger causality with Pearson correlation can enhance environmental modelling, offering a comprehensive approach to air pollution prediction. This integration provides robust insights into the dynamics of air quality, which are critical for developing effective pollution control strategies.
  • Publication
    Prediction of missing data in rainfall dataset by using simple statistical method
    (IOP Publishing, 2020)
    Izzati Amani Mohd Jafri; Ahmad Zia Ul-Saufie; Annas Suwardi
    Almost all data obtained from hydrological stations contain missing values. Usually, this problem occurs due to equipment failures, maintenance work and human error. An incomplete dataset reduces the power of a statistical analysis and can cause biased estimation due to systematic differences between observed and unobserved data. In this study, four simple statistical methods, namely Series Mean, Average Mean Top Bottom, Linear Interpolation and Nearest Neighbour, were applied to predict the missing values in a rainfall dataset. Daily rainfall data from nine selected monitoring stations (from 2009 until 2018) were described using descriptive statistics. Then, missing values were randomly simulated in the dataset at four percentages (5%, 10%, 15% and 20%) using the Statistical Package for the Social Sciences software. The performance of these imputation methods was evaluated using four performance indicators, namely Mean Absolute Error, Root Mean Squared Error, Prediction Accuracy, and Index of Agreement. Overall, Linear Interpolation was selected as the best imputation method for predicting the missing data in the rainfall dataset.
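    Linear interpolation, the method this abstract reports as performing best, can be sketched as follows. This is a generic illustration rather than the study's implementation (which used a statistical package); the function name, the choice to leave leading and trailing gaps unfilled, and the sample series are assumptions.

```python
def linear_interpolate(series):
    """Fill None gaps by interpolating linearly between the nearest observed values.

    Gaps before the first or after the last observation are left as None,
    because there is no second neighbour to interpolate towards.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        # Constant increment per time step between two observed neighbours.
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


# Hypothetical daily rainfall (mm) with simulated gaps, for illustration only.
rainfall = [0.0, None, 4.0, None, None, 10.0, None]
imputed = linear_interpolate(rainfall)
print(imputed)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, None]
```

    In an evaluation like the one described, the imputed values at the simulated gaps would be compared against the withheld true values using the listed performance indicators.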
  • Publication
    Characteristics of PM10 Level during haze events in Malaysia based on quantile regression method
    (MDPI, 2023)
    Siti Nadhirah Redzuan; Nur Alis Addiena A. Rahim; Izzati Amani Mohd Jafri; Syaza Ezzati Baidrulhisham; Ahmad Zia Ul-Saufie; Andrei Victor Sandu; Petrica Vizureanu; Mohd Remy Rozainy Mohd Arif Zainol; György Deák
    Malaysia has repeatedly faced transboundary haze events, in which the air contains extremely high particulate matter, particularly PM10, which affects human health and the environment. Therefore, it is crucial to understand the characteristics of PM10 concentration and develop a reliable PM10 forecasting model that gives early information and warning alerts to the responsible parties so that they can mitigate and plan precautionary measures during such events. This study aims to analyze PM10 variation and investigate the performance of quantile regression in predicting the next-day, next-two-day, and next-three-day PM10 levels during a high particulate event. Hourly secondary data of trace gases and weather parameters at Pasir Gudang, Melaka, and Petaling Jaya during historical haze events in 1997, 2005, 2013, and 2015 were used. The Pearson correlation was calculated to find the correlation between PM10 level and other parameters. Moderately correlated parameters (r > 0.3) with PM10 concentration were used to develop a Pearson–QR model with percentiles of 0.25, 0.50, and 0.75, and the model was compared with quantile regression (QR) and multiple linear regression (MLR). Several performance indicators, namely mean absolute error (MAE), root mean squared error (RMSE), coefficient of determination (R2), and index of agreement (IA), were calculated to evaluate and compare the performances of the predictive models. The highest daily average PM10 concentration was monitored in Melaka, within the range of 69.7 to 83.3 µg/m3. CO and temperature were the most significant parameters associated with PM10 level during haze conditions. Quantile regression at p = 0.75 shows high efficiency in predicting PM10 level during haze events, especially for the short-term prediction in Melaka and Petaling Jaya, with an R2 value of >0.85. Thus, the QR model has high potential to be developed as an effective method for forecasting air pollutant levels, especially during unusual atmospheric conditions when the overall mean of the air pollutant level is not suitable for use as a model.
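    The idea behind quantile regression in this abstract, modelling a chosen percentile instead of the mean, rests on the standard pinball (check) loss. The sketch below shows that loss and checks which constant prediction minimises it; it illustrates the general technique only, not the authors' Pearson–QR model, and the sample concentrations are hypothetical.

```python
def pinball_loss(observed, predicted, tau):
    """Average pinball (check) loss for quantile level tau in (0, 1).

    Under-predictions are weighted by tau and over-predictions by (1 - tau),
    so a high tau penalises under-predicting peaks more heavily.
    """
    total = 0.0
    for o, p in zip(observed, predicted):
        diff = o - p
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(observed)


def best_constant(observed, tau):
    """Observed value that minimises the pinball loss when used as a constant forecast."""
    n = len(observed)
    return min(observed, key=lambda c: pinball_loss(observed, [c] * n, tau))


# Hypothetical PM10 concentrations (µg/m3) during a haze episode, for illustration only.
pm10 = [50.0, 70.0, 90.0, 150.0, 300.0]

print(best_constant(pm10, 0.50))  # 90.0, the median
print(best_constant(pm10, 0.75))  # 150.0, an upper quantile
```

    With skewed haze data the 0.75 level tracks the upper part of the distribution, which is consistent with the abstract's point that the overall mean is a poor model under unusual atmospheric conditions.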
  • Publication
    A review of PM₁₀ concentrations modelling in Malaysia
    (IOP Publishing Ltd., 2020)
    Wan Nur Shaziayani; Ahmad Zia Ul-Saufie; Zuraira Libasin; Fuziatul Norsyiha Ahmad Shukri; Sharifah Sarimah Syed Abdullah
    The purpose of predictive modelling is to predict the variable of interest with reasonable precision, and often to assess the contribution of the independent variables to the dependent variable. In this paper, all of the works examined are aimed at predicting outdoor PM₁₀ concentrations. The vast majority of the works reported used almost exclusively meteorological and source-emission predictors. However, hybrid models are still not widely used for predicting PM₁₀ concentrations in Malaysia.
  • Publication
    Variability of PM10 level with gaseous pollutants and meteorological parameters during episodic haze event in Malaysia: domestic or solely transboundary factor?
    (Elsevier, 2023)
    Nur Alis Addiena A Rahim; Izzati Amani Mohd Jafri; Ahmad Zia Ul-Saufie; Ain Nihla Kamarudzaman; Mohd Remy Rozainy Mohd Arif Zainol; Sandu Andrei Victor; Gyorgy Deak
    Haze has become a seasonal phenomenon affecting Southeast Asia, including Malaysia, and has occurred almost every year within the last few decades. Air pollutants, specifically particulate matter, have drawn a lot of attention due to their adverse impact on human health. In this study, the spatial and temporal variability of the PM10 concentration at Kelang, Melaka, Pasir Gudang, and Petaling Jaya during historic haze events was analysed. An hourly dataset consisting of PM10, gaseous pollutants and weather parameters was obtained from the Department of Environment Malaysia. The mean PM10 concentrations exceeded the stipulated Recommended Malaysia Ambient Air Quality Guideline for the yearly average of 150 μg/m3, except for Pasir Gudang in 1997 and 2005, and Petaling Jaya in 2013. The PM10 concentrations exhibit greater variability in the southwest monsoon and inter-monsoon periods of the studied years. The air masses were found to originate from the region of Sumatra during the haze episodes. A strong to moderate correlation was found between PM10 concentrations and CO during the years that recorded episodic haze, while the relationship between PM10 level and SO2 was significant in 2013, together with a significant negative correlation with relative humidity. A weak PM10-NOx correlation was measured in all study areas, probably due to the smaller contribution of domestic anthropogenic sources to haze events in Malaysia.