Large-scale multimodal mining for healthcare with MapReduce
Abstract
Recent advances in healthcare and bioscience technologies and the proliferation of portable medical devices have produced massive amounts of multimodal data. Mining these data sets, which can range from tens of gigabytes to terabytes or even petabytes, clearly calls for parallel processing. AALIM (Advanced Analytics for Information Management) is a new multimodal mining-based clinical decision support system that brings together patient data captured in many modalities to provide a holistic presentation of a patient's exam data, diseases, and medications. In addition, it offers disease-specific similarity search based on the various data modalities. The currently deployed AALIM system can process only a limited amount of patient data per day. In this paper, we attempt to address this challenge by building a healthcare multimodal mining system on top of the MapReduce framework, specifically its popular open-source implementation, Hadoop. We present a scalable and generic framework that enables automatic parallelization of healthcare multimodal mining algorithms and distribution of large-scale computation, achieving high performance on clusters of commodity servers. Initial testing with a single AALIM module (EKG period estimation) imported into Hadoop and run on a cluster of servers shows very promising results. © 2010 ACM.
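To make the map and reduce roles concrete, the following is a minimal sketch of how a per-recording task such as EKG period estimation could be phrased as a MapReduce job. It is an illustration under stated assumptions, not the AALIM implementation: the autocorrelation-based estimator, the record layout `(patient_id, sampling_hz, samples)`, and the names `estimate_period`, `map_ekg`, `reduce_periods`, and `run_local` are all hypothetical, and the shuffle step is simulated in-process rather than executed by Hadoop.

```python
from collections import defaultdict
from statistics import mean


def estimate_period(samples, min_lag=1):
    """Return the lag (in samples) with the highest autocorrelation.

    Assumption: a plain autocorrelation peak search stands in for the
    period estimation algorithm, which the abstract does not specify.
    """
    n = len(samples)
    mu = sum(samples) / n
    centered = [x - mu for x in samples]
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, n // 2 + 1):
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def map_ekg(record):
    """Map step: one recording in, one (patient_id, period_seconds) pair out."""
    patient_id, sampling_hz, samples = record
    yield patient_id, estimate_period(samples) / sampling_hz


def reduce_periods(patient_id, periods):
    """Reduce step: average all period estimates for one patient."""
    return patient_id, mean(periods)


def run_local(records):
    """Simulate the shuffle: group mapper output by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_ekg(record):
            groups[key].append(value)
    return dict(reduce_periods(k, v) for k, v in groups.items())


def spike_train(period, length):
    """Synthetic test signal: a unit spike every `period` samples."""
    return [1.0 if i % period == 0 else 0.0 for i in range(length)]


if __name__ == "__main__":
    records = [
        ("p1", 10, spike_train(5, 40)),
        ("p1", 10, spike_train(5, 40)),
        ("p2", 10, spike_train(8, 64)),
    ]
    print(run_local(records))
```

Because each recording is processed independently in the map step, the work can be spread across cluster nodes without coordination; only the lightweight per-patient aggregation requires grouping by key. On a real cluster the same two functions would be wired into the framework's own job interface instead of `run_local`.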