Assessment of time series model for predicting long-interval consecutive missing values in air quality dataset

Journal
Journal of Advanced Research Design
ISSN
2289-7984
Date Issued
2025
Author(s)
Daniel Kim Boon Bong
Universiti Malaysia Perlis
Norazian Mohamed Noor
Universiti Malaysia Perlis
Ahmad Zia Ul-Saufie
Universiti Teknologi MARA
Faizal Ab Jalil
Department of Environment, Kedah
György Deák
National Institute for Research and Development in Environmental Protection INCDPM
DOI
10.37934/ard.127.1.1631
Handle (URI)
https://akademiabaru.com/
https://hdl.handle.net/20.500.14170/15738
Abstract
Air pollutant concentrations in Malaysia are continuously monitored using the Continuous Ambient Air Quality Machine (CAAQM). During CAAQM observation, some air pollutant data were found to be missing because of machine failure, maintenance, position changes and human error. Incomplete datasets, especially those with long gaps of consecutive missing observations, can cause several significant problems, including loss of efficiency, difficulties in using some computational software and biased estimation due to differences between the observed and predicted datasets. This study evaluates the performance of a time series method, the Autoregressive Integrated Moving Average (ARIMA) model, for filling long gaps (many consecutive hours) of missing data in an air pollution dataset. Data on PM10, SO2, NO2, O3, CO, wind speed, relative humidity and ambient temperature for Pegoh and Kota Kinabalu in 2018 were used for the analysis. Markov Chain Monte Carlo (MCMC) and Expectation-Maximization (EM) were used as comparison methods to gauge ARIMA's effectiveness in filling the simulated gaps in the air quality dataset. Existing missing values in the raw data were pre-treated, and missing data were then simulated at 5%, 10% and 15% of the dataset, in consecutive gaps ranging from 24 to 120 hours. The performance of each imputation approach was assessed using Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Prediction Accuracy (PA) and Index of Agreement (IA). Overall, Expectation-Maximization was the most effective technique for filling the simulated long gaps in the air pollutant dataset, with IA ranging from 0.74 to 0.77. In contrast, ARIMA performed poorly in this study, with IA ranging from 0.44 to 0.48. This is because ARIMA requires past time-series data to forecast or impute missing values; over a long gap the forecast flattens into a straight line, so it predicts series with many consecutive hours of missing observations poorly.
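The evaluation procedure described in the abstract (mask a consecutive block, fill it, score the fill against the held-out values) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the hourly series is synthetic, the helper names (`simulate_gap`, `index_of_agreement`) are hypothetical, IA is assumed to be Willmott's index of agreement, and a constant fill stands in for the straight-line behaviour the abstract attributes to long-horizon forecasts (it is not an ARIMA, MCMC or EM implementation). PA is omitted because the abstract does not define it.

```python
import numpy as np

def mae(obs, pred):
    """Mean Absolute Error between observed and imputed values."""
    return float(np.mean(np.abs(pred - obs)))

def rmse(obs, pred):
    """Root Mean Square Error between observed and imputed values."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def index_of_agreement(obs, pred):
    """Index of Agreement, assumed here to follow Willmott's definition."""
    obs_mean = obs.mean()
    denom = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return float(1.0 - np.sum((pred - obs) ** 2) / denom)

def simulate_gap(series, start, length):
    """Copy `series` and blank out `length` consecutive values from `start`."""
    masked = series.astype(float).copy()
    mask = np.zeros(series.shape, dtype=bool)
    mask[start:start + length] = True
    masked[mask] = np.nan
    return masked, mask

# Synthetic hourly series with a daily cycle (30 days); purely illustrative.
hours = np.arange(24 * 30)
series = 50 + 10 * np.sin(2 * np.pi * hours / 24)

# Simulate one 48-hour consecutive gap, then fill it with a flat line.
masked, mask = simulate_gap(series, start=100, length=48)
filled = masked.copy()
filled[mask] = np.nanmean(masked)  # constant fill: a stand-in, not ARIMA

# Score only the simulated gap against the held-out true values.
obs, pred = series[mask], filled[mask]
gap_mae = mae(obs, pred)
gap_rmse = rmse(obs, pred)
gap_ia = index_of_agreement(obs, pred)
print(f"MAE={gap_mae:.2f} RMSE={gap_rmse:.2f} IA={gap_ia:.2f}")
```

A flat fill cannot follow the daily cycle inside the gap, so its agreement with the held-out values is low on this synthetic series; that is the kind of effect the IA comparison in the abstract is designed to expose.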
Subjects
  • Air pollutant dataset...
  • ARIMA
  • Imputation method
  • Missing data
  • Time series analysis

File(s)
Assessment of time series model for predicting long-interval consecutive missing values in air quality dataset.pdf (1.35 MB)