Publication
WWW 2019
Conference paper

Identifying high value opportunities for human in the loop lexicon expansion

Abstract

Many real-world analytics problems examine multiple entities or classes that may appear in a corpus. For example, in a customer satisfaction survey analysis there are over 60 categories of (somewhat overlapping) concerns. Each of these is backed by a lexicon of terminology associated with the concern (e.g., "Easy, user friendly process" or "Process confusing, too many handoffs"). These categories need to be expanded by a subject matter expert, as the terminology is not always straightforward (e.g., "handoffs" may also include "ping-pong" and "hot potato" as relevant terms). But given that subject matter expert time is costly, which of the 60+ lexicons should we expand first? We propose a metric for evaluating an existing set of lexicons and providing guidance on which are likely to benefit most from human-in-the-loop expansion. Using our ranking results, we achieved a 4x improvement in impact when expanding the first few lexicons off our suggested list, as compared to a random selection.