Interactive methods for taxonomy editing and validation
Abstract
Taxonomies are meaningful hierarchical categorizations of documents into topics reflecting the natural relationships between the documents and their business objectives. Improving the quality of these taxonomies and reducing the overall cost required to create them is an important area of research. Supervised and unsupervised text clustering are important technologies that comprise only a part of a complete solution. However, there exists a great need for the ability for a human to efficiently interact with a taxonomy during the editing and validation phase. We have developed a comprehensive approach to solving this problem, and implemented this approach in a software tool called eClassifier. eClassifier provides features to help the taxonomy editor understand and evaluate each category of a taxonomy and visualize the relationships between the categories. Multiple techniques allow the user to make changes at both the category and document level. Metrics then establish how well the resultant taxonomy can be modeled for future document classification. In this paper, we present a comprehensive set of viewing, editing and validation techniques we have implemented in the Lotus Discovery Server resulting in a significant reduction in the time required to create a quality taxonomy.