Machines in the conversation: Detecting themes and trends in informal communication streams
Abstract
Data-mining techniques that detect trends and patterns in structured data are often ill-suited for analysis of unstructured text. Information critical to business, generated by groups such as employees, customers, and the public, appears in such forms as chats, electronic discussion forums, and blogs. This paper describes techniques developed to detect themes and trends in such informal communication streams. Our approach begins with unsupervised text clustering to create initial categories. A human analyst then refines the categories into easily understandable themes. To facilitate this process, we developed an interactive approach to text category creation and validation that aids the analyst in evaluating each category of a taxonomy and makes it possible to visualize relationships among categories. The resulting analysis can then be communicated to participants in real time. We report on the results of using these techniques in IBM companywide "Jam" events, during which tens of thousands of employees worldwide participated in electronic discussions of key business issues.
© Copyright 2006 by International Business Machines Corporation.