Publication
DSMM 2019
Conference paper

Learning Explainable Entity Resolution Algorithms for Small Business Data using SystemER

Abstract

The 2019 FEIII CALI data challenge aims to link different representations of the same real-world entities across multiple public datasets that collect identification and activity data about small to medium enterprises (SMEs) in California. We formalize this challenge as a learning-based entity resolution (ER) task, the goal of which is to learn a high-precision and high-recall pair-wise ER model that classifies small business entity pairs into matches and non-matches. Realistic ER tasks usually involve a pipeline of labor-intensive and error-prone steps, such as data preprocessing, gathering of training data, feature engineering, and model tuning. In this task, we apply an advanced human-in-the-loop system, named SystemER, to learn ER algorithms for SME entities. Powered by active learning and a carefully designed user interface, SystemER can learn high-quality explainable ER algorithms with low human effort, while achieving high accuracy on the datasets provided by the FEIII CALI data challenge.
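
To make the notion of an explainable pair-wise ER model concrete, the following is a minimal sketch of a single human-readable matching rule. It is not SystemER's learned output or its rule language: the record fields (`name`, `zip`), the token-based Jaccard similarity, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import re


def tokens(name: str) -> set[str]:
    """Lowercase a business name and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_match(rec_a: dict, rec_b: dict, name_threshold: float = 0.5) -> bool:
    """Hypothetical explainable rule: same ZIP code AND similar names.

    The field names and the threshold are illustrative assumptions,
    not values taken from the paper.
    """
    same_zip = rec_a["zip"] == rec_b["zip"]
    name_sim = jaccard(tokens(rec_a["name"]), tokens(rec_b["name"]))
    return same_zip and name_sim >= name_threshold


# Made-up example records, not drawn from the challenge datasets.
a = {"name": "Acme Plumbing, Inc.", "zip": "95110"}
b = {"name": "ACME PLUMBING INC", "zip": "95110"}
c = {"name": "Acme Bakery", "zip": "95110"}

print(is_match(a, b))  # same tokens and same ZIP
print(is_match(a, c))  # only one shared token out of four
```

A rule of this shape is explainable in the sense that each decision can be traced to a named condition; in a learning-based setting, the conditions and thresholds would be learned from labeled pairs rather than written by hand.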
