Comparison of Linear Interpolation Method and Mean Method to Replace the Missing Values in Environmental Data Set
Norazian, Mohamed Noor
Mohd Mustafa, Al Bakri Abdullah
Ahmad Shukri, Yahaya
Nor Azam, Ramli
Missing data are a frequent problem in many scientific fields, including environmental research. They are usually caused by machine failure, routine maintenance, changes in the siting of monitors and human error. Incomplete datasets can introduce bias because of systematic differences between observed and unobserved data. Finding the best way to estimate missing values is therefore important for ensuring that the analysed data are of high quality. In this study, two methods were used to estimate missing values in an environmental data set, and their performances were compared. The two methods are the linear interpolation method and the mean method. Annual hourly monitoring data for PM10 were used to generate simulated missing values. Four randomly simulated missing data patterns, with 10%, 15%, 25% and 40% of the values missing, were generated to evaluate the accuracy of the imputation techniques under different missing data conditions. Three performance indicators, namely mean absolute error (MAE), root mean squared error (RMSE) and coefficient of determination (R²), were calculated to describe the goodness of fit of the two methods. The linear interpolation method gave better results than the mean method in substituting data for all percentages of missing data considered.
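The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the series is synthetic (a daily cycle plus noise, not PM10 measurements), the "mean method" is taken to be replacement by the overall mean of the observed values, gaps at the ends of the series are filled with the nearest observed value, and R² is computed as 1 - SS_res/SS_tot because the abstract does not state which formula was used. All function names are hypothetical.

```python
import bisect
import math
import random


def mean_impute(values):
    """Replace each None with the mean of the observed values (assumed 'mean method')."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def linear_impute(values):
    """Replace each None by linear interpolation between its nearest observed neighbours.

    Assumption: gaps at the start or end take the nearest observed value.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        pos = bisect.bisect_left(known, i)
        left = known[pos - 1] if pos > 0 else None
        right = known[pos] if pos < len(known) else None
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = values[left] + weight * (values[right] - values[left])
    return filled


def scores(actual, predicted):
    """Return MAE, RMSE and R² (assumed here to be 1 - SS_res / SS_tot)."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return {"mae": mae, "rmse": rmse, "r2": 1 - ss_res / ss_tot}


def knock_out(values, fraction, rng):
    """Set a random fraction of the values to None; return the gappy series and the indices."""
    n_missing = round(fraction * len(values))
    missing = sorted(rng.sample(range(len(values)), n_missing))
    removed = set(missing)
    gappy = [None if i in removed else v for i, v in enumerate(values)]
    return gappy, missing


def synthetic_series(n=24 * 60, seed=1):
    """Synthetic hourly series (daily cycle plus noise); a stand-in, not real PM10 data."""
    rng = random.Random(seed)
    return [50 + 20 * math.sin(2 * math.pi * t / 24) + rng.gauss(0, 3) for t in range(n)]


def compare(series, fractions=(0.10, 0.15, 0.25, 0.40), seed=0):
    """Score both methods on the simulated gaps for each missing-data fraction."""
    rng = random.Random(seed)
    results = {}
    for fraction in fractions:
        gappy, missing = knock_out(series, fraction, rng)
        actual = [series[i] for i in missing]
        row = {}
        for name, method in (("linear", linear_impute), ("mean", mean_impute)):
            filled = method(gappy)
            row[name] = scores(actual, [filled[i] for i in missing])
        results[fraction] = row
    return results


if __name__ == "__main__":
    for fraction, row in compare(synthetic_series()).items():
        print(f"{fraction:.0%} missing:", row)
```

The scores are computed only at the positions that were removed, so they measure how well each method recovers the known true values. Results on this synthetic series say nothing about the study's PM10 data; the sketch only shows the mechanics of the evaluation.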