DIFFERENTIALLY PRIVATE DATA PUBLISHING AND ANALYSIS: A SURVEY

 

ABSTRACT

Differential privacy is an essential and prevalent privacy model that has been widely explored in recent decades. This survey provides a comprehensive and structured overview of two research directions: differentially private data publishing and differentially private data analysis. We compare the diverse release mechanisms of differentially private data publishing, given a variety of input data, in terms of query type, the maximum number of queries, efficiency, and accuracy. We identify two basic frameworks for differentially private data analysis and list the typical algorithms used within each framework. The results are compared and discussed based on output accuracy and efficiency. Further, we propose several directions for future research and possible applications.

EXISTING SYSTEM:

The initial work on differential privacy was pioneered by Dwork et al. [3] in 2006. Over the last decade, several surveys on differential privacy have been completed:

1) The first survey by Dwork et al. recalled the definition of differential privacy and two of its principal mechanisms, with the aim of showing how to apply these techniques in data publishing (a minimal sketch of the Laplace mechanism, one of these mechanisms, follows this list).

2) A later report explored the difficulties that arise in data publishing, together with prospective solutions in the context of statistical analysis. It identified several research issues in data analysis that had not been adequately investigated at that time.

3) In a review, Dwork et al. provided an overview of the principal motivating scenarios, together with a summary of future research directions.

4) Sarwate et al. focused on privacy preservation for continuous data to solve problems in signal processing.

5) A book by Dwork et al. presented an accessible starting place for anyone looking to learn about the theory of differential privacy.
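For concreteness, the following minimal Python sketch illustrates the Laplace mechanism mentioned above. The toy data, query, and epsilon value are our own illustrative assumptions and are not drawn from the surveyed papers.

import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    # Standard Laplace mechanism: add noise with scale sensitivity/epsilon,
    # so a smaller epsilon (stronger privacy) yields a noisier answer.
    rng = rng or np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Toy count query over a small dataset of ages (the sensitivity of a count is 1).
ages = [23, 35, 41, 58, 62, 29]
true_count = sum(1 for a in ages if a >= 40)
private_count = laplace_mechanism(true_count, sensitivity=1.0, epsilon=0.5)
print(round(private_count, 2))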

 

 

PROPOSED SYSTEM:

Here we attempt to find a clearer way to present the concepts and practical aspects of differential privacy for the data mining research community.

• We avoid detailed theoretical analysis of the related differentially private algorithms and instead place more focus on their practical aspects, which may benefit applications in the real world.

• We try to avoid repeating the many references that have already been analyzed extensively in the above well-cited surveys.

• Even though differential privacy covers multiple research directions, we restrict our observations to the data publishing and data analysis scenarios, which are the most popular in the research community.

Table 1 defines the scope of these two research directions within the survey. Mechanism design for data publishing is normally independent of its publishing targets, as the goal of publishing is to release query answers or a dataset for further usage that is, hence, unknown to the curator. Mechanism design for data analysis, in contrast, aims to preserve privacy during the analysis process: the curator already knows the details of the analysis algorithm, so the mechanism is associated with that algorithm (the sketch below contrasts the two settings).
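To make this distinction concrete, the short Python sketch below contrasts the two settings under our own illustrative assumptions (the clipping bounds, epsilon values, and toy data are hypothetical): in data publishing the curator answers arbitrary queries with calibrated noise without knowing how the answers will be used, whereas in data analysis the noise is injected inside a known algorithm, here a simple private mean.

import numpy as np

rng = np.random.default_rng(0)

def lap(scale):
    return rng.laplace(0.0, scale)

# Data publishing: answer an arbitrary query submitted by an (unknown) user,
# adding Laplace noise calibrated to the query's sensitivity.
def publish_query(data, query, sensitivity, epsilon):
    return query(data) + lap(sensitivity / epsilon)

# Data analysis: the curator knows the analysis algorithm (a mean estimate here)
# and perturbs exactly the quantity that algorithm needs.
def private_mean(data, lower, upper, epsilon):
    clipped = np.clip(data, lower, upper)
    sensitivity = (upper - lower) / len(clipped)  # sensitivity of the bounded mean
    return clipped.mean() + lap(sensitivity / epsilon)

incomes = np.array([30_000.0, 45_000.0, 52_000.0, 61_000.0, 75_000.0])
print(publish_query(incomes, lambda d: float((d > 50_000).sum()), sensitivity=1.0, epsilon=0.5))
print(private_mean(incomes, lower=0.0, upper=100_000.0, epsilon=0.5))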

CONCLUSIONS

This paper presents a multi-disciplinary survey of work on differential privacy, including an overview of the large body of literature in two major differential privacy research streams: data publishing and data analysis. We identified different publishing mechanisms for data publishing and compared various types of input and output data. In addition, we presented two basic dataset publishing methods: anonymized and learning-based. We discussed two basic frameworks for data analysis and illustrated their respective analysis scenarios. The basic techniques in differential privacy are simple and intuitively appealing, and when combined with specific problems, differential privacy proves to be a powerful and useful tool for a diverse range of applications.

Differential privacy still has much untapped potential, and the literature summarized in this paper is intended as a starting point for exploring new challenges in the future. Our goal is to provide an overview of existing work on differential privacy and to show its use to newcomers, as well as experienced practitioners, in various fields. We also hope this review will help prevent redundant or ad hoc efforts by both researchers and industry.

 

REFERENCES

[1] C. C. Aggarwal and P. S. Yu, Eds., Privacy-Preserving Data Mining - Models and Algorithms, ser. Advances in Database Systems. Springer, 2008, vol. 34.

[2] B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu, “Privacy-preserving data publishing: A survey of recent developments,” ACM Comput. Surv., vol. 42, no. 4, 2010.

[3] C. Dwork, “Differential privacy,” in ICALP, 2006, pp. 1–12.

[4] ——, “Differential privacy: a survey of results,” in TAMC’08, 2008, pp. 1–19.

[5] ——, “Differential privacy in new settings,” in SODA ’10. Philadelphia, PA, USA: Society for Industrial and Applied Mathematics, 2010, pp. 174–183.

[6] ——, “A firm foundation for private data analysis,” Commun. ACM, vol. 54, no. 1, pp. 86–95, 2011.

[7] A. D. Sarwate and K. Chaudhuri, “Signal processing and machine learning with differential privacy: Algorithms and challenges for continuous data,” IEEE Signal Processing Magazine, vol. 30, no. 5, pp. 86–94, 2013.

[8] C. Dwork and A. Roth, “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., vol. 9, pp. 211–407, Aug. 2014.

[9] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor, “Our data, ourselves: Privacy via distributed noise generation,” in EUROCRYPT, 2006, pp. 486–503.

[10] A. Beimel, K. Nissim, and U. Stemmer, “Private learning and sanitization: Pure vs. approximate differential privacy,” CoRR, vol. abs/1407.2674, 2014.