APPLYING COMBINATORIAL TESTING TO DATA MINING ALGORITHMS
Data mining algorithms are used to analyze anddiscover useful information from data. This paper presents anexperiment that applies Combinatorial Testing (CT) to five datamining algorithms implemented in an open-source data miningsoftware called WEKA. For each algorithm, we first run thealgorithm with 51 datasets to study the impact different datasetshave on the test coverage. We select one dataset that achieves thehighest branch coverage. Next we construct positive and negativecombinatorial test sets of configuration options and execute eachtest set with the selected dataset. Test effectiveness is measuredusing branch and mutation coverage. Our results suggest thatwhen testing data mining algorithms: (1) larger datasets do notnecessarily achieve higher coverage than smaller datasets; (2) testcoverage increases progressively slower as test strengthincreases; and (3) branch coverage correlates well with mutationcoverage.
The major results of our experiment are summarized asfollows: Larger datasets do not necessarily achieve higher testcoverage than smaller datasets. The sizes of the datasetsthat are applicable to each algorithm range from as fewas 14 instances to as many as 20,000 instances.However, almost all of these datasets achieved similarbranch coverage. In some cases, very small datasetsachieved higher coverage than very large datasets. Forexample, for algorithm Apriori, the weather.nominaldataset has only 14 instances, but it achieved highercoverage than the mushroom dataset, which has 8124instances. This suggests that the size of a dataset is not adominating factor in deciding test coverage. Otherfactors, e.g., structure of a dataset, and relationshipbetween different instances, might play a moresignificant role. Test coverage of CT test set increases progressivelyslower with respect to increase of test strength. In ourexperiment, test coverage increases more significantlywhen test strength increases from 1-way to 3-way. After4-way testing, higher strength test sets no longer providesignificant coverage improvement. This result isconsistent with the results of other empirical studies thatapply CT to general software applications. Branch coverage correlates well with mutationcoverage. The results of our experiment suggest that ingeneral, branch coverage correlates with mutationcoverage. In particular, higher branch coverage oftenimplies higher mutation coverage. This suggests thatbranch coverage could be used as a good indicator offault detection effectiveness for data mining algorithms,since mutation coverage is expensive to measure.
RELATED WORKWe first review previous work on applying CT to differenttypes of software. Lei et al. developed a t-way testingstrategy for testing concurrent programs. Simos et al. andBozic et al. applied CT to perform security testing of webapplications . Li et al. applied CT to test three real-lifeindustrial software systems that include an embedded system,a graphical operating system and a database managementsystem. Dhadyalla et al. applied CT to test automotivecontrol software embedded in a hybrid electric vehicle . Liet al. applied CT to ETL applications. Note that ETL is aspecial type of big data applications. However, the work in focuses on data transformation and management aspects,whereas our work focuses on algorithmic aspects. Theseexisting works show that CT can be effectively applied todifferent domains. However, to our knowledge, our work isthe first one that applies CT to data mining algorithms.Second, we review existing work related to evaluating theeffectiveness of CT. Khun et al. investigated the faultdetection effectiveness of t-way testing. Kuhn et al.  reporta study that applies CT and random testing to detect deadlocksin a network simulator. Bell and Vouk discussed theeffectiveness of pairwise testing and random testing to anetwork-centric software . A number of studies have beenreported that compares the effectiveness of CT and randomtesting There are also studies thatinvestigate the code coverage effectiveness of t-way testing. Our work presented in this paper is the first effort toevaluate the effectiveness of CT to data mining algorithms.Third, we review previous work related to testing datamining applications. Jeske et al.  developed a platform togenerate realistic, synthetic data to test data mining tools. Datamining tools were evaluated in terms of their false positiveand false negative error rates when executed with the syntheticdata. Murphy et al. discussed how to identify metamorphicproperties for performing metamorphic testing of data miningalgorithms. Metamorphic testing is one approach toaddressing the test oracle problem. Murphy et al. discussed approaches to test machine-learning applicationsthat implement ranking algorithms. Our work is different inthat we apply CT to test data mining algorithms.
CONCLUSION AND FUTURE WORK
In this paper, we reported an experiment that applied CT tofive data mining algorithms implemented in the WEKA tool.This is part of a larger effort that is aimed to develop effectiveCT-based methods for testing big data applications. Theexperiment allows us to obtain some initial understandingsabout the effectiveness of CT on data mining algorithms. Inparticular, the results of our experiment indicate that datamining algorithms behave in a way that is similar to generalsoftware. This suggests that CT has the potential to beeffectively applied to data mining algorithms.We plan to continue our work in the following threedirections. First, we will perform detailed code analysis tobetter understand the results of our experiment. In particular,we want to investigate why some branches were executed bynone of our test sets, and whether these branches could beexecuted by using different configuration options and/ordatasets. Second, in our experiment, we only applied CT toconfiguration options. We plan to investigate how to apply CTto create representative datasets. The key challenge is toidentify the characteristics of a dataset that could significantlyimpact the execution of the underlying algorithm. We canmodel these characteristics as abstract parameters, and thenapply CT to these parameters to create representative datasets.Third, negative testing alone has shown great importance inachieving good coverage, we will perform furtherinvestigation and experiments on how we can better usenegative testing to improve the coverage of CT.
 W. A. Ballance , S. Vilkomir, and W. Jenkins. Effectiveness of pairwisetesting for software with boolean inputs. Proceedings of the IEEEFifth International Conference in Software Testing, Verification andValidation (ICST), 580-586, 2012.
 K.Z. Bell, and M.A. Vouk. On effectiveness of pairwise methodologyfor testing network-centric software. Proceedings of the 3rdInternational Conference in Information and CommunicationsTechnology for Enabling Technologies for the New Knowledge Society,221-235, 2005.
 C. Binnig, D. Kossmann, and E. Lo. Testing database applications.Proceedings of the 2006 ACM SIGMOD international conference onManagement of data, 739-741, 2006.
 M.N. Borazjany, L.S. Ghandehari, Y. Lei, R. Kacker and R. Kuhn. Aninput space modeling methodology for combinatorial testing.Proceedings of the Sixth International Conference on Software Testing,Verification and Validation Workshops (ICSTW), 372-381, 2013.
 M.N. Borazjany, L. Yu, Y. Lei, R. Kacker and R. Kuhn. Combinatorialtesting of ACTS: A case study. Proceedings of the Software Testing,Verification and Validation (ICST), 591-600, 2012.
 J. Bozic, B. Garn, I. Kapsalis, D. Simos, S. Winkler, and F. Wotawa.Attack pattern-based combinatorial testing with constraints for websecurity testing. Proceedings of the IEEE International Conference InSoftware Quality, Reliability and Security (QRS), 207-212, 2015.
 R.C. Bryce, A. Rajan, and M.P. Heimdahl. Interaction testing in modelbaseddevelopment: Effect on model-coverage. Proceedings of the 13thAsia Pacific Software Engineering Conference (APSEC), 259-268,2008.
 D. Chays, S. Dan, P.G. Frankl, F.I. Vokolos and E.J. Weyuker. Aframework for testing database applications. Proceedings of the ACMSIGSOFT Software Engineering Notes, 25(5), 147-157, 2000.
 D. Chays, Y. Deng, P.G. Frankl, S. Dan, F.I. Vokolos and E.J. Weyuker.An AGENDA for testing relational database applications. Proceedingsof the Software Testing, verification and reliability, 17-44, 2004.
 M.Y. Chan, and S.C. Cheung. Testing Database Applications with SQLSemantics. In CODAS, 363-374, 1999.