Publication
EMNLP 2024

HealthAlignSumm: Utilizing Alignment for Multimodal Summarization of Code-Mixed Healthcare Dialogues

Abstract

As generative AI progresses, collaboration between doctors and AI scientists is producing personalized models that streamline healthcare tasks and improve productivity. Summarizing doctor-patient dialogues has become important, helping doctors grasp conversations faster and improving patient care. While previous research has focused mostly on text data, incorporating visual cues from patient interactions allows doctors to gain deeper insight into medical conditions. Moreover, most prior work has centered on English datasets, whereas real-world conversations often mix languages for better communication. To address the lack of resources for multimodal summarization of code-mixed dialogues in healthcare, we developed the MCDH dataset. We also created HealthAlignSumm, a new model that integrates visual components with the BART architecture, performing multimodal fusion within both the encoder and the decoder. Our work is the first to apply alignment techniques, including state-of-the-art algorithms such as Direct Preference Optimization, to encoder-decoder models with synthetic datasets for multimodal summarization. Through extensive experiments, we demonstrate the superior performance of HealthAlignSumm across several metrics, validated by both automated assessments and human evaluations.
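For readers unfamiliar with the alignment objective the abstract names, the following is a minimal sketch of the pairwise Direct Preference Optimization loss for a single preference pair. All names and the beta value are illustrative; the paper's actual training setup (models, data, hyperparameters) is not reproduced here.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Pairwise DPO loss for one preference pair (illustrative sketch).

    Inputs are sequence log-probabilities of the preferred (chosen) and
    dispreferred (rejected) summaries under the policy being trained and
    under a frozen reference model.
    """
    # Implicit reward for each summary: log-ratio of policy to reference.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    # -log sigmoid(beta * margin difference); minimized when the policy
    # prefers the chosen summary more strongly than the reference does.
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

When the policy and reference assign identical log-probabilities, the loss reduces to -log(sigmoid(0)) = log 2; pushing probability toward the chosen summary drives it lower.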