Generalizing Adversarial Training to Composite Semantic Perturbations
Abstract
Model robustness against adversarial examples has been widely studied, yet its generalization to more realistic scenarios remains challenging. Specifically, recent works using adversarial training can successfully improve model robustness, but they primarily consider adversarial threat models limited to $\ell_p$-norm-bounded perturbations and may overlook semantic perturbations and their composition. In this paper, we first propose a novel method for generating composite adversarial examples. By utilizing component-wise projected gradient descent (PGD) updates and automatic attack-order scheduling, our method can find the optimal attack composition. We then propose generalized adversarial training (GAT) to extend model robustness from $\ell_p$-norm-bounded perturbations to composite semantic perturbations, such as Hue, Saturation, Brightness, Contrast, and Rotation. The results show that GAT is robust not only to any single attack but also to combinations of multiple attacks. GAT also outperforms baseline adversarial training approaches by a significant margin.
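To make the component-wise PGD idea concrete, below is a minimal PyTorch sketch of a composite attack over two differentiable semantic perturbations (brightness and contrast). This is an illustration under simplifying assumptions, not the paper's implementation: the bounds, step size, and step count are hypothetical, and the automatic attack-order scheduling described above is replaced here by a user-supplied fixed order.

import torch
import torch.nn as nn

def apply_composite(x, params, order):
    # Apply each semantic component to x in the scheduled order.
    out = x
    for name in order:
        p = params[name]
        if name == "brightness":
            out = torch.clamp(out + p, 0.0, 1.0)          # additive shift
        elif name == "contrast":
            out = torch.clamp(out * (1.0 + p), 0.0, 1.0)  # multiplicative scale
    return out

def composite_pgd(model, x, y, bounds, order, steps=10, lr=0.05):
    # Component-wise PGD: ascend the classification loss w.r.t. each
    # component's per-image scalar parameter, projecting the parameter back
    # into its interval bound after every step (illustrative bounds/step size).
    params = {k: torch.zeros(x.size(0), 1, 1, 1, requires_grad=True)
              for k in bounds}
    for _ in range(steps):
        logits = model(apply_composite(x, params, order))
        loss = nn.functional.cross_entropy(logits, y)
        grads = torch.autograd.grad(loss, list(params.values()))
        with torch.no_grad():
            for (name, p), g in zip(params.items(), grads):
                p += lr * g.sign()                         # gradient-ascent step
                p.clamp_(-bounds[name], bounds[name])      # projection
    return apply_composite(x, {k: p.detach() for k, p in params.items()}, order)

# Toy usage with a hypothetical model and random data:
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(4, 3, 32, 32), torch.randint(0, 10, (4,))
x_adv = composite_pgd(model, x, y,
                      bounds={"brightness": 0.2, "contrast": 0.2},
                      order=["brightness", "contrast"])

In generalized adversarial training, such composite adversarial examples would replace or augment the clean minibatch in the standard adversarial-training loop; which components to compose and in what order is what the paper's attack-order scheduler optimizes.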