Algorithm Optimization of Anomaly Detection Based on Data Mining
This paper first introduces two improved algorithms based on LOF, namely INFLOF and COF, and describes the motivation, definition, and specific steps of each. Summarizing LOF, INFLOF, and COF reveals the intrinsic link between them: INFLOF addresses the misjudgment of edge points that occurs when clusters of different densities lie close to each other in a data set, while COF improves the detection of the outliers themselves, and the two algorithms modify different steps of the outlier factor computation. Finally, the advantages of the two algorithms are combined to derive the algorithm proposed in this paper. The definition and specific steps of the proposed algorithm are given, and its time complexity is analyzed.
At present, data mining plays an irreplaceable role in all aspects of social life. Traditional data mining focuses on patterns that most of the data conform to, such as frequent pattern and association rule discovery, classification and description, and cluster analysis. Outlier detection, in contrast, looks for the relatively sparse and isolated abnormal patterns in massive data. Since LOF was proposed, many scholars have put forward improved algorithms, which fall into two groups: one improves the efficiency of outlier detection, and the other improves its accuracy under complex data distributions. The former mainly removes classes or regions that cannot contain outliers by clustering or partitioning, so as to reduce the amount of data to be examined.
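For reference, the LOF baseline that these improved algorithms build on can be sketched as follows. This is a minimal illustration following the standard definition of LOF (k-distance, reachability distance, local reachability density), not the implementation used in this paper. The function names `knn` and `lof_scores` are hypothetical, ties at the k-distance are ignored so each point has exactly k neighbors, and the points are assumed to be distinct enough that no k-distance is zero.

```python
import math


def knn(points, i, k):
    """Return (indices of the k nearest neighbors of point i, k-distance of i)."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    nearest = dists[:k]
    return [j for _, j in nearest], nearest[-1][0]


def lof_scores(points, k):
    """Compute a LOF score for every point; values well above 1 suggest outliers."""
    n = len(points)
    neighbors, kdist = [], []
    for i in range(n):
        nb, kd = knn(points, i, k)
        neighbors.append(nb)
        kdist.append(kd)

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [
            max(kdist[o], math.dist(points[i], points[o]))
            for o in neighbors[i]
        ]
        lrd.append(len(reach) / sum(reach))

    # LOF(p): mean density of p's neighbors divided by p's own density.
    return [
        sum(lrd[o] for o in neighbors[i]) / (len(neighbors[i]) * lrd[i])
        for i in range(n)
    ]
```

This brute-force version computes all pairwise distances, so it needs O(n^2) distance evaluations; the efficiency-oriented improvements mentioned above aim to reduce the number of points that go through this computation.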
This paper first introduces the two main algorithms based on LOF, namely INFLOF and COF, and then focuses on the improved algorithm proposed in response to their shortcomings. It also analyzes the time complexity of the proposed algorithm; the next chapter evaluates its effectiveness experimentally. The paper addresses the second aspect of the problem: how to improve the accuracy of outlier detection through an improved definition of the outlier factor, so that outlying data points receive a higher outlier factor. Wen et al. proposed an outlier factor based on the symmetric neighborhood. INFLOF (Influenced Local Outlier Factor) defines the outlier factor through the symmetric neighbor relationship: the higher the INFLOF value of a data point, the more likely the point is an outlier.
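The symmetric neighbor relationship can be illustrated with a short sketch. The formula is not given in this section, so the code assumes the commonly cited definition: the influence space of a point is the union of its k nearest neighbors and its reverse nearest neighbors, the density of a point is the inverse of its k-distance, and the score is the mean density of the influence space divided by the point's own density. The function name `inflo_scores` is hypothetical, ties at the k-distance are ignored, and the points are assumed to be distinct enough that no k-distance is zero.

```python
import math


def inflo_scores(points, k):
    """Influence-space outlier score per point (assumed INFLO-style definition)."""
    n = len(points)
    knn_sets, density = [], []
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j)
            for j in range(n)
            if j != i
        )
        knn_sets.append({j for _, j in dists[:k]})
        # Density is the inverse of the k-distance (distance to the k-th neighbor).
        density.append(1.0 / dists[k - 1][0])

    scores = []
    for i in range(n):
        # Reverse nearest neighbors: points that count i among their own k neighbors.
        rnn = {j for j in range(n) if i in knn_sets[j]}
        # Influence space: symmetric neighborhood built from both directions.
        influence = knn_sets[i] | rnn
        avg_density = sum(density[o] for o in influence) / len(influence)
        scores.append(avg_density / density[i])
    return scores
```

Because the reverse neighbors are included, a point on the edge of a dense cluster is compared against the points that actually regard it as a neighbor, which is the mechanism that reduces edge misjudgment when clusters of different densities lie close together.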
This paper proposes an improved density-based local outlier detection algorithm. An in-depth analysis of two improved density-based outlier detection algorithms, INFLOF and COF, identifies their shortcomings, and an improved algorithm is proposed by integrating the advantages of both. The algorithm and its specific steps are given, and its time complexity is analyzed.
 Sun Huanliang, Bao Yubin, Yu Ge, et al. An algorithm based on partition for outlier detection [J]. Journal of Software, 2006, 17(5): 1009-1016.
 Breunig M, Kriegel H, Ng R, et al. LOF: Identifying density-based local outliers [C]// Proc. SIGMOD Conf. IEEE, 2000: 93-104.
 Knorr M E, Ng R T, Tucakov V. Distance-based outliers: Algorithms and applications [J]. The VLDB Journal, 2000, 8(3-4): 237-253.
 Ester M, Kriegel H, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise [C]// Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining. 1996: 226-231.