Data augmentation as a service for single view creation
Abstract
Businesses are increasingly realizing the value of creating a single view of their customers and partners by integrating information that resides in 'siloed' datasets within and outside the enterprise. However, augmenting the data available within the enterprise with data purchased from third-party providers, or with data residing in the public domain such as the Web, often yields warehouses whose databases contain incomplete and/or inconsistent data. Hence, before the data can become useful, the inconsistencies in the values appended to the enterprise data must be eliminated. In this paper, we present Data Augmentation as a Service (DAaS), which helps businesses create a consistent and usable single view of the entities of interest. Specifically, our service enables business rule writers to quickly create data augmentation rules using a rule generation scheme driven by approximate functional dependencies. An accompanying challenge is managing a large number of rules and ensuring that new rules do not negate existing ones. To mitigate this problem, our service provides a rule-management and evaluation system based on the Ripple Down Rules (RDR) framework. Using several large real-world datasets, we show that our approach learns rules for imputing attribute values with the high accuracy and scalability that enterprise users require, illustrate how conflicts can arise among rules, and demonstrate that our approach handles those conflicts effectively and with high accuracy. © 2011 IEEE.
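The abstract names two mechanisms without detailing them: imputation rules generated from approximate functional dependencies (AFDs), and rule management with Ripple Down Rules. The Python sketch below illustrates both under stated assumptions; it is not the DAaS implementation. It assumes an AFD X → Y is scored with the g3 error measure (the fraction of rows that would have to be removed for the dependency to hold exactly), and that a rule is emitted for an X value when its majority Y value covers at least a hypothetical `min_conf` fraction of that group. The names `afd_rules`, `RDRNode`, `add_exception`, and the zip/city/source records are all hypothetical.

```python
from collections import Counter, defaultdict


def afd_rules(rows, lhs, rhs, min_conf=0.9):
    """Hypothetical sketch: mine imputation rules lhs_value -> rhs_value from an AFD lhs -> rhs.

    Also returns the g3 error of the dependency: the fraction of rows that would
    have to be removed for lhs -> rhs to hold exactly.
    """
    groups = defaultdict(Counter)
    for row in rows:
        if row.get(rhs) is not None:  # only rows with a known rhs value count as evidence
            groups[row[lhs]][row[rhs]] += 1
    rules, kept, total = {}, 0, 0
    for key, counts in groups.items():
        value, hits = counts.most_common(1)[0]  # majority rhs value for this lhs value
        n = sum(counts.values())
        kept += hits
        total += n
        if hits / n >= min_conf:  # emit a rule only for sufficiently consistent groups
            rules[key] = value
    g3 = 1 - kept / total if total else 0.0
    return rules, g3


class RDRNode:
    """One node of a single-classification Ripple Down Rules tree (assumed structure)."""

    def __init__(self, condition, conclusion):
        self.condition = condition
        self.conclusion = conclusion
        self.if_true = None   # exception branch: tried only when this rule fires
        self.if_false = None  # alternative branch: tried only when this rule does not fire

    def evaluate(self, case, last=None):
        # The result is the conclusion of the last rule that fired along the path.
        if self.condition(case):
            if self.if_true is None:
                return self.conclusion
            return self.if_true.evaluate(case, self.conclusion)
        if self.if_false is None:
            return last
        return self.if_false.evaluate(case, last)

    def add_exception(self, condition, conclusion):
        """Attach a refinement that can only change cases this rule already covers."""
        new = RDRNode(condition, conclusion)
        if self.if_true is None:
            self.if_true = new
        else:
            node = self.if_true
            while node.if_false is not None:  # append to the end of the exception chain
                node = node.if_false
            node.if_false = new
        return new


# Hypothetical records: zip -> city holds only approximately.
rows = [
    {"zip": "10001", "city": "New York", "source": "vendor_a"},
    {"zip": "10001", "city": "New York", "source": "vendor_a"},
    {"zip": "10001", "city": "New York", "source": "vendor_a"},
    {"zip": "10001", "city": "NYC", "source": "vendor_b"},
    {"zip": "94105", "city": "San Francisco", "source": "vendor_a"},
    {"zip": "94105", "city": "San Francisco", "source": "vendor_a"},
    {"zip": "60601", "city": "Chicago", "source": "vendor_a"},
    {"zip": "60601", "city": "Evanston", "source": "vendor_b"},
]
rules, g3 = afd_rules(rows, "zip", "city", min_conf=0.7)
# rules == {"10001": "New York", "94105": "San Francisco"}, g3 == 0.25

root = RDRNode(lambda c: True, None)  # default rule: impute nothing
zip_nodes = {
    z: root.add_exception(lambda c, z=z: c.get("zip") == z, city)
    for z, city in rules.items()
}
# A conflict surfaces: vendor_b records spell the city differently, so refine locally.
zip_nodes["10001"].add_exception(lambda c: c.get("source") == "vendor_b", "NYC")

print(root.evaluate({"zip": "10001", "source": "vendor_a"}))  # New York
print(root.evaluate({"zip": "10001", "source": "vendor_b"}))  # NYC
print(root.evaluate({"zip": "60601"}))  # None
```

In this design a new rule is attached only as an exception under the rule it corrects, so it can change only the cases that rule already covers: the vendor_b correction leaves the other 10001 records imputed as "New York". This is the property the abstract relies on when it requires that new rules not negate existing ones.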