Impact of Skin Tone Diversity on Out-of-Distribution Detection Methods in Dermatology
Abstract
Skin conditions manifest differently across skin tones, so addressing representation issues in dermatological settings is crucial to providing comparable quality of care across population segments. Although bias and fairness assessment in skin lesion classification is an active research area, far less attention has been paid to how skin tone representation affects the performance of Out-of-Distribution (OOD) detectors. Most skin datasets are reported to have a biased skin tone distribution, which can lead to skewed model performance across skin tones. This paper explores how varying the representation rates of skin tones during the training of OOD detectors affects their downstream performance. We review and compare state-of-the-art OOD detectors across two categories of Fitzpatrick skin type (FST), FST I-IV (lighter tones) and FST V-VI (brown and darker tones), over samples collected under different clinical protocols. Our experiments on multiple skin image datasets reveal that increasing the representation of FST V-VI during training reduces the representation gap by $\approx 5\%$. We also observe an increase in the overall performance metrics for FST V-VI when these skin tones are better represented during training. Furthermore, evaluation with group fairness metrics shows that increasing the FST V-VI representation leads to improved group fairness.