Signature-based anomaly detection in networks
Abstract
The problem of outlier detection has been studied extensively in spatial and multi-dimensional databases. In the multi-dimensional case, the problem is much simpler because outliers have a natural interpretation in terms of distances. For example, distances between multi-dimensional data points satisfy the triangle inequality, which can be considered a relaxed version of transitivity for closeness: two points that are both close to a third point cannot be far from each other. It is therefore much easier to find data points that lie far away from the majority of other points. This is, however, not the case in general networks, in which closeness does not exhibit such transitivity. In fact, a node can be defined as an outlier when it is either close to an excessively large number of nodes or far away from a large number of nodes. Therefore, traditional measures based on distance or on density (sparsity) cannot accurately model the concept of outliers in massive networks. We define two kinds of signatures in massive networks: distance set signatures and distance frequency signatures. We use these signatures to model the outlier detection problem effectively in massive networks. We present experimental results illustrating the effectiveness of our approach over a structural distance-based approach.
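
The abstract does not define the two signatures, so the following is only an illustrative sketch under a stated assumption: a distance frequency signature is taken here to be the count of other nodes at each hop distance from a given node, computed by breadth-first search on an unweighted, undirected graph. The function names, the adjacency-list representation, and the toy graph are all hypothetical and are not taken from the paper.

```python
from collections import Counter, deque


def hop_distances(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def distance_frequency_signature(adj, node):
    """ASSUMED definition (not from the paper): number of other nodes
    found at each hop distance from `node`."""
    dist = hop_distances(adj, node)
    return Counter(d for v, d in dist.items() if v != node)


# Hypothetical toy graph: node 0 has neighbours 1-4, and a chain 4-5-6 hangs off it.
adj = {
    0: [1, 2, 3, 4],
    1: [0],
    2: [0],
    3: [0],
    4: [0, 5],
    5: [4, 6],
    6: [5],
}

signatures = {v: distance_frequency_signature(adj, v) for v in adj}
for v in sorted(signatures):
    print(v, dict(sorted(signatures[v].items())))
```

Under this assumed definition, node 0 has most of its mass at distance 1 (close to many nodes), while node 6 has most of its mass at distance 4 (far from many nodes). These are the two situations the abstract describes as candidate outliers, which a single distance or density score would not distinguish in the same way.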