- Mathematics; Computer Science
book chapter description
Anomaly detection is an important problem in many applications, ranging from medical informatics to network security. Various distribution-based techniques have been proposed to tackle it: they learn the probability distribution of conventional behavior and flag observations with low density as anomalies. For categorical observations, multinomial or Dirichlet compound multinomial distributions have been adopted as effective statistical models of conventional samples. However, on small-scale data sets of multivariate categorical samples, these models suffer from the curse of dimensionality and fail to capture the statistical properties of conventional behavior, because only a small proportion of the possible categorical configurations appears in the training data. The categorical latent Gaussian process, an effective Bayesian nonparametric technique, can model small-scale categorical data by using Gaussian processes to learn a continuous latent space for multivariate categorical samples. Building on the categorical latent Gaussian process, we therefore propose an anomaly detection technique for multivariate categorical observations, in which the categorical latent Gaussian process captures the probability distribution of conventional categorical samples. Experimental results on categorical data show that our method effectively detects anomalous categorical observations and achieves better detection performance than other anomaly detection techniques.
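The sparsity problem described above can be illustrated with a minimal Python sketch. It assumes a baseline that places a multinomial over whole joint configurations with a symmetric Dirichlet prior, so that each configuration's predictive probability is (count + alpha) / (N + alpha * K). This is a simple stand-in for the distribution-based models mentioned, not the chapter's method. The data generator, the function names (`fit_joint_multinomial`, `sample_normal`, `anomaly_score`), the pseudo-count `alpha`, and the sizes are all hypothetical. With 8 variables of 4 categories each there are 4^8 = 65,536 joint configurations, so 200 training samples can cover at most 200 of them. Every unseen configuration, whether it is a plausible conventional sample or a genuine anomaly, then receives exactly the same score.

```python
import math
import random
from collections import Counter


def fit_joint_multinomial(train, cardinalities, alpha=1.0):
    """Multinomial over joint configurations with a symmetric Dirichlet(alpha) prior.

    Returns the log posterior-predictive density, the observed counts, and the
    size of the joint state space. alpha is a hypothetical pseudo-count.
    """
    counts = Counter(train)
    n_configs = math.prod(cardinalities)      # size of the joint state space
    total = len(train) + alpha * n_configs    # normaliser after smoothing

    def log_density(x):
        return math.log((counts[x] + alpha) / total)

    return log_density, counts, n_configs


def anomaly_score(log_density, x):
    # Low density under the model of conventional data -> high anomaly score.
    return -log_density(x)


def sample_normal(rng, n_vars, n_cats, p_keep=0.8):
    """Hypothetical 'conventional' generator: a hidden mode z drives all variables."""
    z = rng.randrange(n_cats)
    return tuple(z if rng.random() < p_keep else rng.randrange(n_cats)
                 for _ in range(n_vars))


if __name__ == "__main__":
    rng = random.Random(0)
    n_vars, n_cats = 8, 4                     # 4**8 = 65,536 joint configurations
    train = [sample_normal(rng, n_vars, n_cats) for _ in range(200)]
    log_density, counts, n_configs = fit_joint_multinomial(train, [n_cats] * n_vars)

    # Fresh conventional samples that land in configurations never seen in training
    # get the same (minimal) density as any anomaly would.
    held_out = [sample_normal(rng, n_vars, n_cats) for _ in range(200)]
    unseen = sum(counts[x] == 0 for x in held_out) / len(held_out)
    print(f"configurations seen in training: {len(counts)} of {n_configs}")
    print(f"held-out conventional samples in unseen configurations: {unseen:.0%}")
```

The latent-space approach described in the chapter is meant to avoid this failure mode: by mapping samples to a continuous latent space, similar configurations can share statistical strength instead of each needing its own count. That model is not reproduced in this sketch.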