DATA-DRIVEN ANSWER SELECTION IN COMMUNITYQA SYSTEMS

 

ABSTRACT

Finding similar questions from historical archives has been applied to question answering, with solid theoretical underpinnings and great practical success. Nevertheless, each question in the returned candidate pool is often associated with multiple answers, and hence users have to painstakingly browse many of them before finding the correct one. To alleviate this problem, we present a novel scheme to rank answer candidates via pairwise comparisons. In particular, it consists of one offline learning component and one online search component. In the offline learning component, we first automatically establish the positive, negative, and neutral training samples in terms of preference pairs, guided by our data-driven observations. We then present a novel model to jointly incorporate these three types of training samples, and derive its closed-form solution. In the online search component, we first collect a pool of answer candidates for the given question by finding its similar questions. We then sort the answer candidates by leveraging the offline-trained model to judge the preference orders. Extensive experiments on real-world vertical and general community-based question answering datasets have comparatively demonstrated its robustness and promising performance. We have also released the code and data to facilitate other researchers.

PROPOSED SYSTEM:

• Inspired by our user studies and observations, we present a novel approach to constructing the positive, neutral, and negative training samples in terms of preference pairs. This greatly reduces the time-consuming and labor-intensive labeling process.

• We propose a pairwise learning-to-rank model for answer selection in cQA systems. It seamlessly integrates hinge loss, regularization, and an additive term within a unified framework. Different from traditional pairwise learning-to-rank models, ours incorporates the neutral training samples and learns the discriminative features. In addition, we have derived its closed-form solution by equivalently reformulating the objective function into a smoothed and differentiable one.

• We have released the code and datasets to facilitate other researchers in repeating our work and verifying their ideas.
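To make the second bullet concrete, the following is a minimal sketch of a pairwise hinge-loss objective over preference pairs. It is not the paper's PLANE model or its closed-form solution; the subgradient loop, the squared penalty used for neutral pairs, and the l1 regularizer are illustrative assumptions standing in for the "hinge loss, regularization, and additive term" described above.

```python
import numpy as np

def train_pairwise(pos_pairs, neu_pairs, lam=0.1, eta=0.01, epochs=100, margin=1.0):
    """Learn weights w so that, for a preference pair (xa, xb) with xa
    preferred, the score gap w.(xa - xb) exceeds a margin (hinge loss),
    while neutral pairs are pulled toward a zero score gap.
    Illustrative subgradient descent, not the paper's closed-form solver."""
    d = pos_pairs[0][0].shape[0]
    w = np.zeros(d)
    for _ in range(epochs):
        grad = lam * np.sign(w)                 # l1 term: encourages sparse, discriminative features
        for xa, xb in pos_pairs:                # preferred vs. less-preferred answer features
            diff = xa - xb
            if margin - w @ diff > 0:           # hinge: only margin-violating pairs contribute
                grad -= diff
        for xa, xb in neu_pairs:                # neutral pairs: squared penalty on the score gap
            grad += 2.0 * (w @ (xa - xb)) * (xa - xb)
        w -= eta * grad
    return w
```

The key departure from a plain pairwise hinge model is the third loop: neutral pairs do not assert a preference direction, so they penalize any nonzero score gap rather than a margin violation.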

EXISTING SYSTEM:

Conventional techniques for filtering answers primarily focus on generating complementary features relying on the highly structured cQA sites. Jeon et al. extracted a set of non-textual features covering the contextual information of QA pairs, and proposed a language model for processing these features in order to predict the quality of answers collected from a specific cQA service. Two years later, Liu et al. found powerful features, including structural, textual, and community features, and leveraged traditional shallow learning methods to combine these heterogeneous features. Later work developed a hierarchical framework to identify the predictive factors for obtaining a high-quality answer based on textual and non-textual features. Beyond textual features, Nie et al. explored a set of features extracted from media entities, such as color, shape, and bag-of-visual-words. Following them, Ding et al. introduced a general classification framework to combine the evidence from different views, including graph-based relationships, content, and usage-based features. More recently, systems have been described for SemEval-2015 Task 3: Answer Selection in cQA.
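The feature-combination line of work above can be sketched as follows. This is an assumption-laden illustration, not the cited systems' code: the feature groups and the logistic-regression combiner are hypothetical stand-ins for the "shallow learning methods" used to fuse heterogeneous (structural, textual, community) features into an answer-quality score.

```python
import numpy as np

def train_quality_classifier(X, y, eta=0.1, epochs=500):
    """Fit a logistic-regression answer-quality classifier over a
    concatenation of heterogeneous features (one row per answer).
    Plain gradient descent on the log-loss; illustrative only."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))         # predicted quality probabilities
        w -= eta * Xb.T @ (p - y) / len(y)          # log-loss gradient step
    return w

def quality_score(w, x):
    """Probability that an answer with feature vector x is high quality."""
    x = np.append(x, 1.0)
    return 1.0 / (1.0 + np.exp(-(x @ w)))
```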

CONCLUSION AND FUTURE WORK

In this work, we present a novel scheme for answer selection in cQA settings. It comprises an offline learning component and an online search component. In the offline learning component, instead of time-consuming and labor-intensive annotation, we automatically construct the positive, neutral, and negative training samples in the form of preference pairs, guided by our data-driven observations. We then propose a robust pairwise learning-to-rank model to incorporate these three types of training samples. In the online search component, for a given question, we first collect a pool of answer candidates by finding its similar questions. We then employ the offline-learned model to rank the answer candidates via pairwise comparison. We have conducted extensive experiments to justify the effectiveness of our model on one general cQA dataset and one vertical cQA dataset. We can conclude the following points: 1) our model achieves better performance than several state-of-the-art answer selection baselines; 2) our model is insensitive to its parameters; 3) our model is robust to the noise caused by enlarging the number of returned similar questions; and 4) pairwise learning-to-rank models, including our proposed PLANE, are very sensitive to erroneous training samples. Beyond the traditional pairwise learning-to-rank models, our model is able to incorporate the neutral training samples and select the discriminative features. It does, however, inherit the disadvantages of the pairwise learning-to-rank family, such as noise sensitivity, the large number of preference pairs, and the loss of finer-grained relevance judgments. In the future, we plan to address these disadvantages in the field of cQA.
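The online search stage described above can be sketched as a small ranking routine. The names below are assumptions for illustration: given an offline-learned weight vector `w` and a pooled list of candidate answer feature vectors, each candidate is compared pairwise against every other, and candidates are ordered by their number of pairwise wins.

```python
import numpy as np

def rank_candidates(w, candidates):
    """Order answer candidates by pairwise comparison under the
    offline-learned weights w. Since each comparison only checks the
    sign of the score gap w.(xi - xj), counting wins here reduces to
    sorting candidates by their linear score."""
    wins = []
    for i, xi in enumerate(candidates):
        n = sum(1 for j, xj in enumerate(candidates)
                if i != j and w @ (xi - xj) > 0)    # xi preferred over xj
        wins.append((n, i))
    return [i for n, i in sorted(wins, reverse=True)]  # indices, best first
```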

REFERENCES

[1] W. Ding, H. Jiang, et al., Modern Advances in Intelligent Systems and Tools, ser. SCI, vol. 431. Springer, 2012.

[2] L. Nie, M. Wang, L. Zhang, S. Yan, B. Zhang, and T.-S. Chua, "Disease inference from health-related questions via sparse deep learning," TKDE, vol. 27, no. 8, pp. 2107–2119, 2015.

[3] A. Shtok, G. Dror, Y. Maarek, and I. Szpektor, "Learning from the past: Answering new questions with past answers," in Proceedings of WWW'12. ACM, 2012, pp. 759–768.

[4] E. Agichtein, C. Castillo, D. Donato, A. Gionis, and G. Mishne, "Finding high-quality content in social media," in Proceedings of WSDM'08. ACM, 2008, pp. 183–194.

[5] J. Jeon, W. B. Croft, J. H. Lee, and S. Park, "A framework to predict the quality of answers with non-textual features," in Proceedings of SIGIR'06. ACM, 2006, pp. 228–235.

[6] Z. Ji and B. Wang, "Learning to rank for question routing in community question answering," in Proceedings of CIKM'13. ACM, 2013, pp. 2363–2368.

[7] T. C. Zhou, M. R. Lyu, and I. King, "A classification-based approach to question routing in community question answering," in Proceedings of WWW'12. ACM, 2012, pp. 783–790.

[8] L. Yang, M. Qiu, S. Gottipati, F. Zhu, J. Jiang, H. Sun, and Z. Chen, "CQARank: Jointly model topics and expertise in community question answering," in Proceedings of CIKM'13. ACM, 2013, pp. 99–108.

[9] B. Li and I. King, "Routing questions to appropriate answerers in community question answering services," in Proceedings of CIKM'10. ACM, 2010, pp. 1585–1588.

[10] K. Wang, Z. Ming, and T.-S. Chua, "A syntactic tree matching approach to finding similar questions in community-based QA services," in Proceedings of SIGIR'09. ACM, 2009, pp. 187–194.