Adaptive Calibrations of Spatially Misaligned IoT Data
We consider the challenging problem of calibrating geo-referenced data that suffer from spatial misalignment because multiple instruments are used to measure the same variables. Some instruments are precise but expensive, whereas low-cost instruments are less accurate but more widely deployed. Data fusion techniques are often employed to combine such sources and extract more information, but spatial misalignment prevents the direct application of standard fusion methods. The untrustworthy observations must therefore be calibrated carefully before fusion; without reliable calibration models, adding more data can introduce bias and noise.
To address this, we propose a strategy to calibrate fine particulate matter (PM2.5) data in Taiwan. PM2.5 concentrations are measured by two sources: traditional monitoring stations and low-cost IoT devices called AirBoxes. AirBoxes are unreliable but easy to deploy, and they form a large network. A one-size-fits-all calibration procedure for all AirBoxes does not work well, because the relationship between AirBox measurements and those from traditional monitoring stations is not homogeneous in space and many outliers exist. We develop a fast, robust method to model the PM2.5 processes and use a spatially varying coefficient regression framework to calibrate AirBox measurements. The calibration significantly improves PM2.5 prediction performance, reducing the root mean-squared prediction error by 37% to 67% relative to predictions made without calibration.
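As a rough illustration of why a spatially varying calibration can outperform a single global one, the following minimal sketch fits a kernel-weighted linear regression around each prediction location and compares root mean-squared prediction errors on held-out points. It is not the method developed in this paper: the data are synthetic, each low-cost reading is assumed to be paired with a co-located reference value (so spatial misalignment is ignored), no robustness to outliers is included, and the function names, Gaussian kernel, and bandwidth are hypothetical choices made only for the example.

```python
import numpy as np


def fit_local_coefficients(coords, airbox, reference, target, bandwidth):
    """Weighted least squares for reference ~ a + b * airbox near `target`.

    Gaussian kernel weights (an assumed choice) let the intercept and
    slope vary smoothly over space.
    """
    d2 = np.sum((coords - target) ** 2, axis=1)
    w = np.exp(-0.5 * d2 / bandwidth**2)
    X = np.column_stack([np.ones_like(airbox), airbox])
    XtW = X.T * w  # apply one weight per observation
    return np.linalg.solve(XtW @ X, XtW @ reference)


def rmspe(pred, truth):
    """Root mean-squared prediction error."""
    diff = np.asarray(pred) - np.asarray(truth)
    return float(np.sqrt(np.mean(diff**2)))


# Synthetic data (assumption): calibration coefficients drift from west to east.
rng = np.random.default_rng(0)
n = 400
coords = rng.uniform(0.0, 1.0, size=(n, 2))
airbox = rng.uniform(5.0, 45.0, size=n)
true_intercept = 2.0 + 3.0 * coords[:, 0]
true_slope = 0.6 + 0.4 * coords[:, 0]
reference = true_intercept + true_slope * airbox + rng.normal(0.0, 1.0, size=n)

train = np.arange(300)
test = np.arange(300, n)

# Global calibration: a single intercept and slope for every device.
X_train = np.column_stack([np.ones(train.size), airbox[train]])
beta_global = np.linalg.lstsq(X_train, reference[train], rcond=None)[0]
pred_global = beta_global[0] + beta_global[1] * airbox[test]

# Local calibration: coefficients re-estimated around each held-out location.
pred_local = np.empty(test.size)
for k, i in enumerate(test):
    a, b = fit_local_coefficients(
        coords[train], airbox[train], reference[train], coords[i], bandwidth=0.15
    )
    pred_local[k] = a + b * airbox[i]

rmspe_global = rmspe(pred_global, reference[test])
rmspe_local = rmspe(pred_local, reference[test])
print(f"global RMSPE: {rmspe_global:.2f}, local RMSPE: {rmspe_local:.2f}")
```

In this toy setting the local fit tracks the drifting slope and intercept, so its held-out error is lower than that of the global fit. The size of the gap depends entirely on the simulated drift and the chosen bandwidth, and it says nothing about the 37% to 67% reductions reported for the Taiwan data.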
Keywords: Geostatistics; Robust estimation; Spatially varying coefficient model; Heterogeneous variance; Misalignment.