Data Stream Mining in Fog Computing Environment with Feature Selection Using Ensemble of Swarm Search Algorithms
Fog computing has emerged as a contemporary strategy for processing big streaming data efficiently. It is a distributed computing platform that supports data analytics for Internet of Things (IoT) applications by pushing the analytics from the Cloud server to the far edge of a sensor network. As the name suggests, the ubiquitous data collected from sensors are processed locally rather than on central servers. Fog computing thus helps avoid a performance bottleneck at the center and keeps raw data from overwhelming the core of the network. To support Fog computing, however, the data analysis algorithms, such as data stream mining algorithms that learn and recognize patterns from incoming data streams, must be fast and accurate enough. This paper reports a computer simulation of data stream mining algorithms running in a Fog environment. In addition, feature selection powered by swarm search is used as a preprocessing method to improve the accuracy and speed of local Fog data analytics. The experimental results reveal which algorithms are the best choice for delivering edge intelligence in a Fog computing environment.
In Fog computing, the data analytics workload is delegated to nodes at the network edge instead of the central Cloud server or its core data analytics server. Fog computing aims to improve efficiency and to reduce the amount of data transferred to the Cloud for processing, analysis and storage; it therefore minimizes data analytics latency and increases the efficiency of Internet of Things (IoT) operation. The volume of data acquired from sensing devices is growing rapidly. These streaming data are usually stored in the databases of scalable distributed data processing systems. Examples of such systems are Apache Hadoop and Apache Spark, which are used to process large amounts of data such as those in the systems of Google, Yandex and social networks. Using MapReduce, for example, these big data systems are strong and mature at processing data and providing analytics at the servers. However, given the sheer volume of sensing data, performing the analytics, and perhaps the processing, at the edge is a challenging computing problem. In Fog computing, the edge nodes are mainly responsible for preprocessing data and analyzing patterns in the incoming data streams. Data mining algorithms for big streaming data, which are potentially unbounded, must be fast, efficient and accurate. Finding suitable data mining algorithms is therefore essential for supporting edge intelligence in Fog computing.
This paper focuses on analyzing the feasibility of traditional data mining and data stream mining algorithms and on comparing them in a Fog computing scenario. The data mining experiment addresses a classification problem in which air/gas samples are collected from sensors, and the model built by the algorithm(s), in the form of a decision tree, decides what type of gas is present. A decision tree is a non-black-box machine learning model with a flow-chart-like structure for decision making. The tree branches can be extracted into useful predicate-type decision rules. These rules are simple for both humans and machines to understand and interpret, and they can be coded as logic into embedded devices. Moreover, a correlation-based feature selection algorithm, combined with traditional search methods and an ensemble of swarm search methods, is integrated into the data mining algorithms as a pre-processing mechanism. The simulation concerns an IoT environment in which emergency services take priority, and it takes into account real-time constraints and capability requirements.
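As a rough illustration of how correlation-based feature selection scores a candidate feature subset, the sketch below computes the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), and pairs it with a simple greedy forward search. It is only a sketch under stated assumptions: Pearson correlation on numeric values stands in for whatever correlation measure the experiment used, the greedy search is a stand-in for the traditional and swarm search methods discussed here, and the names `pearson`, `cfs_merit`, `greedy_forward_search` and `min_gain` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(vx * vy)


def cfs_merit(features, target, subset):
    """CFS merit of a subset: rewards correlation with the target,
    penalizes correlation among the selected features themselves."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-to-class correlation.
    r_cf = sum(abs(pearson(features[i], target)) for i in subset) / k
    # Mean absolute feature-to-feature correlation (0 for a single feature).
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(features[i], features[j])) for i, j in pairs)
            / len(pairs)) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_search(features, target, min_gain=1e-9):
    """Add one feature at a time while the merit improves by more than min_gain.
    Assumption: a simple stand-in for the search methods used in the paper."""
    remaining = set(range(len(features)))
    selected, best = [], 0.0
    while remaining:
        cand, cand_merit = None, best + min_gain
        for i in sorted(remaining):
            m = cfs_merit(features, target, selected + [i])
            if m > cand_merit:
                cand, cand_merit = i, m
        if cand is None:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_merit
    return selected, best
```

Here `features` is a list of columns, one per sensor attribute, and `target` is a numerically encoded class column. A swarm search method would explore the same subset space with the same merit function as its fitness, instead of adding features one at a time.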
Fog computing offers the advantage of bringing analytics, and hence intelligence, to the edge. This paper presents a simulation experiment comparing two classification algorithms, C4.5 and HT. C4.5 produces classification rules with high accuracy and has typically been applied on Cloud platforms. HT is a popular data stream mining algorithm that is well suited to Fog computing. The simulation experiment is dedicated to IoT emergency services: a large amount of gas sensor data is collected to identify the types of gas and thereby measure air quality. The experiment shows that C4.5 can achieve high accuracy if it is trained on the whole data set. In a Fog computing environment, however, large amounts of data stream nonstop into the data stream mining model. The model must therefore learn incrementally, seeing only a portion of the data stream at a time, and update itself quickly each time fresh data arrive. Real-time latency and accuracy are required in an IoT environment, and especially in a Fog environment. The experiment concludes that feature selection (FS) has a slightly greater impact on C4.5; nevertheless, FS does help improve the performance of HT in a Fog environment. Moreover, Harmony search is an effective search method for improving the accuracy, time requirement and time cost of the HT model in the data stream mining environment. Fog computing using HT coupled with FS based on Harmony search could achieve good accuracy, low latency and reasonable data scalability.
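The incremental learning requirement described above can be sketched with a test-then-train (prequential) loop, in which each arriving sample is first used to test the model and then to update it. The sketch is illustrative only: the running-centroid classifier is a deliberately simple stand-in and is not the HT algorithm, and the names `IncrementalCentroidClassifier`, `learn_one` and `prequential_accuracy` are hypothetical.

```python
from collections import defaultdict


class IncrementalCentroidClassifier:
    """Keeps one running mean vector per class and updates it one sample
    at a time, so no past samples need to be stored on the edge node."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.means = {}

    def predict(self, x):
        if not self.means:
            return None  # nothing learned yet

        def dist2(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.means[label]))

        # Nearest class centroid by squared Euclidean distance.
        return min(self.means, key=dist2)

    def learn_one(self, x, label):
        self.counts[label] += 1
        n = self.counts[label]
        if label not in self.means:
            self.means[label] = list(x)
        else:
            mean = self.means[label]
            for i, v in enumerate(x):
                mean[i] += (v - mean[i]) / n  # running-mean update


def prequential_accuracy(stream, model):
    """Test-then-train: predict on each sample before learning from it."""
    correct = seen = 0
    for x, label in stream:
        if model.predict(x) == label:
            correct += 1
        model.learn_one(x, label)
        seen += 1
    return correct / seen if seen else 0.0
```

Because the model keeps only per-class counts and means, its memory use does not grow with the length of the stream, which is the property that makes incremental learners attractive for edge nodes.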