Evaluation of Predictive Data Mining Algorithms in Soil Data Classification for Optimized Crop Recommendation
Agricultural research has strengthened economic returns internationally, and it remains a vast and important field with further benefits to offer. These benefits can be enhanced through the use of different technological resources, tools, and procedures. Data mining is an interdisciplinary process of analyzing, processing, and evaluating real-world datasets and making predictions on the basis of the findings. Our case-based analysis provides empirical evidence that different data mining classification algorithms can be used to classify a dataset of agricultural regions on the basis of soil properties. Additionally, we have investigated which algorithm performs best, in terms of prediction accuracy, in order to recommend the best crop for better yield.
In this research, we set out to understand the related domain, analyze the behavior of different data mining classification algorithms on the soil dataset, and identify the most predictive and accurate algorithm. The dataset has been accumulated from different soil surveys that were conducted at numerous agricultural areas located in Kasur District, Punjab, Pakistan. The aim is to maintain a system that can classify the soil in adequate quantities for best practices. The primary objectives of our study are: i) to classify the soil under different agroecological zones in Kasur District, Punjab, Pakistan, using the different classification algorithms available in data mining; ii) to recommend the relevant crops depending on that classification; iii) to evaluate the performance of the predictive algorithms for better knowledge extraction.
In this study, we have presented the research possibilities for the classification of soil using well-known data mining classification algorithms: J48, BF Tree, OneR, and Naïve Bayes. The experiment was conducted on data instances from Kasur District, Pakistan. Our comparative analysis shows that these algorithms achieve different levels of accuracy, which determine the effectiveness and efficiency of their predictions. A better understanding of soil classes can improve productivity in farming, reduce dependence on fertilizers, and create better predictive rules for recommendations that increase yield. In the future, we plan to create a Soil Management and Recommendation System that can be used effectively by agriculturists and soil testing laboratories. This system will help to recommend a suitable fertilizer and predict better yield.
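To make the idea of comparing classifiers by accuracy concrete, the following is a minimal sketch of OneR, the simplest of the algorithms named above, written in plain Python. It is not the implementation used in the study. The feature names, the class labels, and every record are hypothetical values invented for illustration, not data from the Kasur surveys. OneR builds one rule per feature (each feature value maps to its most frequent class) and keeps the feature whose rule makes the fewest training errors.

```python
from collections import Counter, defaultdict

# Hypothetical feature names and toy records, invented for illustration only.
# They are NOT taken from the Kasur District soil surveys.
FEATURES = ["ph", "organic_matter", "texture"]
TRAIN = [
    (("low", "poor", "sandy"), "unsuitable"),
    (("low", "rich", "loam"), "suitable"),
    (("low", "poor", "clay"), "unsuitable"),
    (("neutral", "rich", "loam"), "suitable"),
    (("neutral", "poor", "clay"), "suitable"),
    (("neutral", "rich", "sandy"), "suitable"),
    (("high", "poor", "sandy"), "unsuitable"),
    (("high", "rich", "clay"), "unsuitable"),
    (("high", "poor", "clay"), "unsuitable"),
]
TEST = [
    (("neutral", "poor", "loam"), "suitable"),
    (("high", "rich", "sandy"), "unsuitable"),
    (("low", "rich", "sandy"), "suitable"),
]


def train_one_r(rows):
    """Return (feature_index, rule, default_label) for the best single feature."""
    # Fallback label for feature values never seen during training.
    default = Counter(label for _, label in rows).most_common(1)[0][0]
    best = None
    for idx in range(len(rows[0][0])):
        # Count class labels for each value of this feature.
        counts = defaultdict(Counter)
        for values, label in rows:
            counts[values[idx]][label] += 1
        # One rule per feature: each value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(1 for values, label in rows if rule[values[idx]] != label)
        if best is None or errors < best[0]:
            best = (errors, idx, rule)
    _, idx, rule = best
    return idx, rule, default


def predict(model, values):
    """Classify one record using the single selected feature."""
    idx, rule, default = model
    return rule.get(values[idx], default)


def accuracy(model, rows):
    """Fraction of records whose predicted class matches the true class."""
    hits = sum(1 for values, label in rows if predict(model, values) == label)
    return hits / len(rows)


model = train_one_r(TRAIN)
print("selected feature:", FEATURES[model[0]])
print("rule:", model[1])
print("hold-out accuracy:", accuracy(model, TEST))
```

The same `accuracy` function could be applied to any other classifier that exposes a comparable `predict` step, which is the basis on which algorithms such as J48 or Naïve Bayes would be compared against OneR on a held-out portion of the data.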