Iterative cohort analysis and exploration
Abstract
Cohort analysis is a widely used technique for the investigation of risk factors for groups of people. It is commonly employed to gain insights about interesting subsets of a population in fields such as medicine, bioinformatics, and social science. The nature of these analyses is evolving as larger collections of data about individuals become available. Examples of emerging large-scale data sources include electronic medical record systems and social network datasets. When domain experts perform cohort analyses using such massive datasets, they typically rely on a team of technologists to help manage and process the data. This results in a slow and cumbersome analysis process in which iterative exploration is difficult. To address this challenge, we are exploring technologies designed to help domain experts work more independently and more quickly. This article describes CAVA, a platform for Cohort Analysis via Visual Analytics. We introduce three primary types of artifacts (cohorts, views, and analytics) and an architecture that connects these elements together to provide an interactive exploratory analysis environment designed for domain experts. In addition to the CAVA design, this article presents two use cases from the health-care domain and a domain-expert evaluation to demonstrate the power of our approach.