SympGraph: A framework for mining clinical notes through symptom relation graphs
Abstract
As an integral part of Electronic Health Records (EHRs), clinical notes pose special challenges for analyzing EHRs due to their unstructured nature. In this paper, we present a general mining framework SympGraph for modeling and analyzing symptom relationships in clinical notes. A SympGraph has symptoms as nodes and co-occurrence relations between symptoms as edges, and can be constructed automatically through extracting symptoms over sequences of clinical notes for a large number of patients. We present an important clinical application of SympGraph: symptom expansion, which can expand a given set of symptoms to other related symptoms by analyzing the underlying SympGraph structure. We further propose a matrix update algorithm which provides a significant computational saving for dynamic updates to the graph. Comprehensive evaluation on 1 million longitudinal clinical notes over 13K patients shows that static symptom expansion can successfully expand a set of known symptoms to a disease with high agreement rate with physician input (average precision 0.46), a 31% improvement over baseline co-occurrence based methods. The experimental results also show that the expanded symptoms can serve as useful features for improving AUC measure for disease diagnosis prediction, thus confirming the potential clinical value of our work. © 2012 ACM.