Comparative Analysis of Energy-Efficient Scheduling Algorithms for Big Data Applications
Big data analytics is now widely applied to counter the growing threat of cybercrime. However, energy consumption is increasing explosively with the rapid growth of big data processing in anti-cybercrime applications. In this paper, an energy-efficient framework for big data applications is proposed to reduce energy consumption while satisfying deadline constraints. First, the problem of energy-efficient task scheduling for a single Spark job is modeled as an integer program. We design an energy-efficient task scheduling algorithm to minimize the energy consumption of big data applications in Spark. To avoid SLA violations on execution time, we propose an optimal task scheduling algorithm with deadline constraints that trades off execution time against energy consumption. Experiments on a Spark cluster measure the energy consumption and execution time of several workloads from the HiBench benchmark suite. Our algorithms consume less energy on average than FIFO and FAIR under deadlines. The optimal algorithm is able to find near-optimal task schedules that trade off energy consumed against response time benefit for small shuffle partitions.
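For illustration only, a deadline-constrained, energy-minimizing task assignment of this kind can be written as a generic integer program. The symbols below are assumptions introduced for this sketch and are not taken from the paper's own model: x_{ij} is a binary variable indicating that task i runs on executor j, t_{ij} is the estimated run time of task i on executor j, P_j is the average power draw of executor j while busy, and D is the deadline.

```latex
\begin{align}
\min_{x} \quad & \sum_{i} \sum_{j} P_j \, t_{ij} \, x_{ij} \\
\text{s.t.} \quad & \sum_{j} x_{ij} = 1 && \forall i \quad \text{(each task runs on exactly one executor)} \\
& \sum_{i} t_{ij} \, x_{ij} \le D && \forall j \quad \text{(each executor finishes its tasks by the deadline)} \\
& x_{ij} \in \{0, 1\} && \forall i, j
\end{align}
```

This simplified form assumes that tasks on the same executor run one after another and ignores idle power and shuffle costs, which a full model would likely need to include.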
Existing studies on Spark scheduling have focused on minimizing the time between the arrival and the completion of a big data application. To increase performance and reduce costs, Muhammed et al. proposed a resource allocation framework for Spark that provides fine-grained resource allocation for any type of big data application. Resource waste occurs when a big data application runs on all the nodes in a Spark cluster.
We design task scheduling algorithms that optimize the energy efficiency of running big data applications in Spark clusters while satisfying deadline constraints. In these algorithms, based on an energy consumption model for Spark, a strategy table describing the relationship between tasks and executors is designed to reduce the energy consumption of Spark applications. Compared with Spark's original scheduling algorithms, FIFO and FAIR, our algorithm can satisfy the SLA by trading off execution time against energy consumption. It also effectively reduces the total energy consumption of Spark applications under deadline constraints.
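The idea of a task-to-executor table combined with a deadline can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm evaluated in this paper: the function name schedule, the table times (estimated run time of each task on each executor, in seconds), and the list power (assumed average power of each busy executor, in watts) are all hypothetical, and the search is exhaustive, so it is only practical for very small instances.

```python
from itertools import product


def schedule(times, power, deadline):
    """Exhaustively pick the lowest-energy task-to-executor assignment.

    times[t][e] -- assumed run time (s) of task t on executor e
    power[e]    -- assumed average power (W) of executor e while busy
    deadline    -- latest allowed finish time (s) for every executor

    Returns (assignment, energy_joules, makespan_seconds), or None when
    no assignment meets the deadline. Tasks on one executor are assumed
    to run sequentially; idle power is ignored.
    """
    n_tasks = len(times)
    n_exec = len(power)
    best = None
    # Enumerate every possible mapping of tasks to executors.
    for assignment in product(range(n_exec), repeat=n_tasks):
        load = [0.0] * n_exec
        energy = 0.0
        for task, executor in enumerate(assignment):
            load[executor] += times[task][executor]
            energy += power[executor] * times[task][executor]
        makespan = max(load)
        if makespan > deadline:
            continue  # violates the deadline, skip
        if best is None or energy < best[1]:
            best = (assignment, energy, makespan)
    return best


if __name__ == "__main__":
    # Executor 0 is fast but power-hungry; executor 1 is slow but frugal.
    times = [[4, 8], [4, 8], [4, 8]]
    power = [100, 40]
    for deadline in (24, 16, 8, 7):
        print(deadline, schedule(times, power, deadline))
```

In this toy instance, loosening the deadline lets more tasks move to the slower, lower-power executor, which illustrates the trade-off between execution time and energy consumption described above.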
Spark, a unified engine for big data processing, has been widely used in anti-cybercrime applications. When big data applications are executed on large Spark clusters, energy consumption is a critical concern in data centers. Energy-efficient task scheduling under deadline constraints has become a key problem for Spark. In this paper, we design two energy-efficient task scheduling algorithms in Spark for big data applications. Our objective is to understand the tradeoff between energy consumption and execution time for several different kinds of benchmark workloads, such as Sort, TeraSort, and PageRank. This work provides some insight into the relationship between energy consumption and SLA guarantees.
 P. Dhaka and R. Johari, “CRIB: Cyber crime investigation, data archival and analysis using big data tool,” in Computing, Communication and Automation (ICCCA), 2016 International Conference on, Noida, India, 2016, pp. 117–121.
 O. M. Adedayo, “Big data and digital forensics,” in Cybercrime and Computer Forensic (ICCCF), IEEE International Conference on, Vancouver, BC, Canada, 2016, pp. 1–7.
 A. Shalaginov, J. W. Johnsen, and K. Franke, “Cyber crime investigations in the era of big data,” in Big Data (Big Data), 2017 IEEE International Conference on, Boston, MA, USA, 2017, pp. 3672–3676.
 M. Zaharia, R. S. Xin, and P. Wendell et al., “Apache Spark: a unified engine for big data processing,” Communications of the ACM, vol. 59, no. 11, pp. 56–65, Nov. 2016, DOI. 10.1145/2934664.
 M. Zaharia, M. Chowdhury, and T. Das et al., “Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing,” in Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation, San Jose, CA, USA, 2012, pp. 2–2.
 M. T. Islam, S. Karunasekera, and R. Buyya, “dSpark: Deadline-based Resource Allocation for Big Data Applications in Apache Spark,” in eScience (e-Science), 2017 IEEE 13th International Conference on, Auckland, New Zealand, 2017, pp. 89–98.
 A. Z. Zhang, “Scheduler module explain,” in Spark internals: design and implement principle of Spark core, 1st ed. Beijing, China: China Machine Press, 2015, ch. 4, sec. 3, pp. 57–72.
 J. Chen, K. Li, and Z. Tang et al., “A parallel random forest algorithm for big data in a Spark cloud computing environment,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 4, pp. 919–933, Apr. 2017, DOI. 10.1109/TPDS.2016.2603511.
 H. Yang, X. Liu, and S. Chen et al., “Improving Spark performance with MPTE in heterogeneous environments,” in Audio, Language and Image Processing (ICALIP), 2016 International Conference on, Shanghai, China, 2016, pp. 28–33.
 A. Gounaris, G. Kougka, and R. Tous et al., “Dynamic configuration of partitioning in spark applications,” IEEE Transactions on Parallel and Distributed Systems, vol. 28, no. 7, pp. 1891–1904, Jul. 2017, DOI. 10.1109/TPDS.2017.2647939.
 J. Huang, Y. Zhou, and Q. Duan et al., “Semantic Web Service Composition in Big Data Environment,” in GLOBECOM 2017 – 2017 IEEE Global Communications Conference, Singapore, Singapore, 2017, pp. 1–7.
 J. Huang, J. Zou, and C. Xing, “Competitions among Service Providers in Cloud Computing: A New Economic Model,” IEEE Transactions on Network and Service Management, pp. 1–12, Apr. 2018, DOI. 10.1109/TNSM.2018.2825358.
 J. Huang, Q. Duan, and S. Guo et al., “Converged Network-Cloud Service Composition with End-to-End Performance Guarantee,” IEEE Transactions on Cloud Computing, pp. 1–15, Oct. 2015, DOI. 10.1109/TCC.2015.2491939.
 R. Buyya, C. S. Yeo, and S. Venugopal et al., “Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility,” Future Generation Computer Systems, vol. 25, no. 6, pp. 599–616, Jun. 2009, DOI. 10.1016/j.future.2008.12.001.
 S. Srikantaiah, A. Kansal, and F. Zhao, “Energy Aware Consolidation for Cloud Computing,” in Proceedings of the 2008 Conference on Power Aware Computing and Systems, San Diego, CA, USA, 2008, pp. 10–10.