A Verifiable Imputation Analysis for Univariate Time Series and Enabling Package

Nianjun Zhou; Dhaval Patel; Arun Iyengar; Shrey Shrivastava; Anuradha Bhamidipaty

doi:10.1109/BigData50022.2020.9377909

Publication

Big Data 2020

Conference paper


Abstract

This paper proposes a verifiable imputation process and an enabling tool for univariate time series. Common ad hoc and case-specific imputation approaches are not enough to ensure high-quality, effective imputation. We adopt verification logic similar to that of supervised learning: we use artificially introduced missing samples as the test set to estimate the performance of a set of imputers, and use the estimated performance to select the best imputer. To ensure the correctness of the selection, we analyze the impact of various factors on estimation accuracy. Those factors are the missing rate, the size and patterns of the artificial missing data, the selected imputers, and the noise level. We propose a two-step verifiable imputation process to integrate all of the steps. With this process, we can always leverage the most suitable imputer to achieve high-quality imputation without tedious and error-prone data cleaning efforts. We implement the tool as a Python package that provides many imputers, each with its own capabilities, and an API. We automate the imputation through a standard process, which returns the imputed results and a detailed rationale for the selection along with quality metrics.
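The selection idea described in the abstract can be illustrated with a minimal sketch: hide some observed values, score each candidate imputer on the hidden values, and apply the best-scoring one to the original series. This is not the package's API. The names `IMPUTERS` and `select_imputer`, the three candidate imputers, the random masking pattern, and the RMSE metric are all assumptions made for illustration, and the abstract does not spell out the details of the paper's two-step process.

```python
import numpy as np
import pandas as pd

# Hypothetical candidate imputers built from standard pandas calls.
# These are illustrative stand-ins, not the imputers shipped with the package.
IMPUTERS = {
    "mean": lambda s: s.fillna(s.mean()),
    "ffill": lambda s: s.ffill().bfill(),
    "linear": lambda s: s.interpolate(method="linear", limit_direction="both"),
}


def select_imputer(series, n_artificial=20, seed=0):
    """Score each imputer on artificially hidden values and apply the best one."""
    rng = np.random.default_rng(seed)
    observed = np.flatnonzero(series.notna().to_numpy())

    # Hide a random sample of observed values; they act as the test set.
    # (Random point masking is an assumption; other patterns are possible.)
    size = min(n_artificial, len(observed) // 2)
    held_out = rng.choice(observed, size=size, replace=False)
    masked = series.copy()
    masked.iloc[held_out] = np.nan
    truth = series.iloc[held_out].to_numpy()

    # Estimate each imputer's error (RMSE) on the hidden values.
    scores = {}
    for name, impute in IMPUTERS.items():
        estimate = impute(masked).iloc[held_out].to_numpy()
        scores[name] = float(np.sqrt(np.mean((estimate - truth) ** 2)))

    # Apply the lowest-error imputer to the original series.
    best = min(scores, key=scores.get)
    return best, scores, IMPUTERS[best](series)


# Usage on a synthetic sine wave with 20 genuinely missing points.
t = np.arange(200)
data = pd.Series(np.sin(2 * np.pi * t / 50))
gap_rng = np.random.default_rng(1)
data.iloc[gap_rng.choice(200, size=20, replace=False)] = np.nan

best, scores, imputed = select_imputer(data)
print(best, scores)
```

Returning the per-imputer scores alongside the imputed series mirrors, in a simplified way, the abstract's point that the process reports quality metrics and the reason for the selection rather than only the filled-in values.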

Date

10 Dec 2020


Authors

IBM-affiliated at time of publication
