Imputation methods for filling the long interval of missing observations in air pollution data and meteorological dataset (Restricted)

Nur Afiqah Zakaria

Date Issued

2018

Author(s)

Nur Afiqah Zakaria

Handle (URI)

https://hdl.handle.net/20.500.14170/16067

Abstract

Missing data are common in real-time processing applications that depend heavily on data, and air pollution records are a typical example of time series data in the environmental field. These data are obtained from automated monitoring stations and usually contain missing observations caused by routine maintenance, human error, machine failure, changes in monitor siting and other factors. Problems that can arise from missing data include measurement error, insufficient sampling, bias due to systematic differences between observed and unobserved data, and faults in data acquisition. In this study, hourly monitoring records of four air pollutants (PM10, CO, SO2, NO2) and three meteorological parameters (ambient temperature, wind speed and humidity) for Gombak (2000 to 2008) and Klang (2000 to 2009) were used. From these datasets, the most complete records, those with the fewest missing values, were chosen as the reference data: Klang and Gombak in 2003. The simulation of missing data was designed around the real trends and patterns of missing data in Malaysia. Missing data were simulated in the reference datasets at four percentages: 5%, 10%, 15% and 20%. Six single imputation methods (series mean, mean nearest neighbor, expectation maximization, linear interpolation, exponential smoothing (0.2 and 0.3) and moving average (3 and 5)) and Markov chain Monte Carlo were applied in this simulation study. The goodness of fit of each imputation method was then assessed with four performance indicators (Mean Absolute Error, Root Mean Squared Error, Index of Agreement and Prediction Accuracy). Overall, expectation maximization (EM) was found to be the best imputation method for replacing long gaps of missing data, while series mean was the worst.
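The evaluation loop described in the abstract (remove a block of values from a complete reference series, impute it, then score the imputed values against the truth) can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It covers only two of the listed methods (series mean and linear interpolation) and three of the four indicators. The Index of Agreement is assumed to follow Willmott's common form, and Prediction Accuracy is left out because the abstract does not define it. All function names and sample values are hypothetical.

```python
import math

def series_mean_impute(values):
    """Replace every None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def linear_interpolate(values):
    """Fill each run of None on a straight line between its observed
    neighbours; leading or trailing gaps take the nearest observed value."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    for i in range(known[0]):
        filled[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        filled[i] = values[known[-1]]
    return filled

def mae(pred, obs):
    """Mean Absolute Error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def rmse(pred, obs):
    """Root Mean Squared Error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def index_of_agreement(pred, obs):
    """Index of Agreement, assuming Willmott's form:
    1 - sum((P - O)^2) / sum((|P - mean(O)| + |O - mean(O)|)^2)."""
    obs_mean = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - obs_mean) + abs(o - obs_mean)) ** 2
              for p, o in zip(pred, obs))
    return 1 - num / den

if __name__ == "__main__":
    # Hypothetical hourly readings standing in for a complete reference series.
    reference = [31.0, 35.0, 42.0, 50.0, 47.0, 40.0, 36.0, 33.0]
    gap = range(2, 6)  # one long gap of four consecutive hours
    gapped = [None if i in gap else v for i, v in enumerate(reference)]
    truth = [reference[i] for i in gap]

    for name, method in [("series mean", series_mean_impute),
                         ("linear interpolation", linear_interpolate)]:
        imputed = method(gapped)
        pred = [imputed[i] for i in gap]
        print(name, mae(pred, truth), rmse(pred, truth),
              index_of_agreement(pred, truth))
```

Scoring only the positions that were deliberately removed keeps the indicators focused on imputation quality, since the untouched observations would otherwise pull every method toward a perfect score.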
