Jointly Attacking Graph Neural Network and its Explanations
Abstract
Graph Neural Networks (GNNs) have boosted performance on many graph-related tasks. Despite this success, recent studies have shown that GNNs remain vulnerable to adversarial attacks, in which adversaries mislead a GNN's predictions by modifying the graph. On the other hand, GNN explanation methods (GnnExplainer for short) provide a better understanding of a trained GNN model by identifying a small subgraph and a subset of features that are most influential for its prediction. In this paper, we first perform empirical studies to validate that GnnExplainer can act as an inspection tool and has the potential to detect adversarial perturbations on graphs. This finding motivates us to investigate a new problem: can a graph neural network and its explanations be jointly attacked by maliciously modifying graphs? Answering this question is challenging, since the goals of attacking the GNN and bypassing the GnnExplainer essentially contradict each other. In this work, we give an affirmative answer to this question by proposing a novel attack framework (GEAttack) for graphs, which attacks both a GNN model and its explanations by exploiting their vulnerabilities simultaneously. To the best of our knowledge, this is the first effort to jointly attack GNNs and their explanations on graph-structured data toward the trustworthiness of GNNs. Comprehensive experiments on various real-world datasets demonstrate the effectiveness of the proposed method.