dc.description.abstract | Air pollution data such as PM10, sulphur dioxide, ozone and carbon monoxide are usually obtained from automated monitors located at different sites. Missing values in these records are usually caused by mechanical failure, routine maintenance, changes in monitor siting and human error. Missing values require special attention when the data are analyzed, because incomplete datasets can introduce bias through systematic differences between observed and unobserved data. Finding the best way to estimate missing values is therefore important to ensure that the analyzed data are of high quality. In this study, four imputation techniques, namely linear, quadratic, cubic and nearest neighbour interpolation, were used to replace the missing values. Annual hourly PM10 monitoring data were used to simulate missing values at five randomly generated levels (5%, 10%, 15%, 25% and 40%) in order to test the efficiency of the methods. Four performance indicators, namely mean absolute error (MAE), root mean square error (RMSE), coefficient of determination (R2) and prediction accuracy (PA), were calculated to describe the goodness of fit of each method. Of the methods applied, linear interpolation was found to be the best for estimating missing values at all simulated percentages of missing data. | en_US |
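The evaluation procedure summarized in the abstract (mask a random share of a complete hourly series, fill the gaps by interpolation, then score the filled values against the hidden originals) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code. The function names `simulate_missing`, `interpolate` and `indicators` are hypothetical. The PM10 series is synthetic, not the study's data. Only linear and nearest-neighbour interpolation are shown; quadratic and cubic variants could be built with SciPy's `scipy.interpolate.interp1d(..., kind="quadratic")` or `kind="cubic"`. R2 is assumed to be the squared Pearson correlation, and PA is assumed to be sum((P - mean(O))^2) / sum((O - mean(O))^2). Both are common choices in related air-quality imputation studies, but the paper's exact formulas may differ.

```python
import numpy as np

def simulate_missing(series, fraction, rng):
    """Hypothetical helper: set `fraction` of the points to NaN, chosen uniformly
    at random (assumed missing-completely-at-random scheme)."""
    data = np.asarray(series, dtype=float).copy()
    n_missing = int(round(fraction * data.size))
    idx = np.sort(rng.choice(data.size, size=n_missing, replace=False))
    data[idx] = np.nan
    return data, idx

def interpolate(data, kind="linear"):
    """Hypothetical helper: fill NaNs over the hourly time index."""
    t = np.arange(data.size)
    known = ~np.isnan(data)
    t_known, y_known = t[known], data[known]
    t_miss = t[~known]
    filled = data.copy()
    if kind == "linear":
        # np.interp holds the first/last observed value outside the known range
        filled[~known] = np.interp(t_miss, t_known, y_known)
    elif kind == "nearest":
        # neighbours on each side of every gap; ties go to the earlier observation
        pos = np.searchsorted(t_known, t_miss)
        left = np.clip(pos - 1, 0, t_known.size - 1)
        right = np.clip(pos, 0, t_known.size - 1)
        use_right = np.abs(t_known[right] - t_miss) < np.abs(t_miss - t_known[left])
        filled[~known] = y_known[np.where(use_right, right, left)]
    else:
        raise ValueError(f"unsupported kind: {kind}")
    return filled

def indicators(observed, predicted):
    """MAE, RMSE, R2 and PA; the R2 and PA formulas are assumptions (see lead-in)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    mae = np.mean(np.abs(p - o))
    rmse = np.sqrt(np.mean((p - o) ** 2))
    r2 = np.corrcoef(p, o)[0, 1] ** 2          # assumed: squared Pearson correlation
    pa = np.sum((p - o.mean()) ** 2) / np.sum((o - o.mean()) ** 2)  # assumed PA form
    return {"MAE": mae, "RMSE": rmse, "R2": r2, "PA": pa}

rng = np.random.default_rng(0)
hours = np.arange(24 * 365)
# Synthetic stand-in for one year of hourly PM10: a daily cycle plus noise
pm10 = 50 + 15 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)
for frac in (0.05, 0.10, 0.15, 0.25, 0.40):
    gappy, idx = simulate_missing(pm10, frac, rng)
    for kind in ("linear", "nearest"):
        # score only the points that were artificially removed
        scores = indicators(pm10[idx], interpolate(gappy, kind)[idx])
        print(f"{frac:.0%} {kind:8s}", {k: round(float(v), 3) for k, v in scores.items()})
```

Scoring only at the masked indices mirrors the idea of comparing imputed values against the true values that were hidden. Scoring the whole series would dilute the errors with the untouched observations.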