Privacy-Preserving Utility Verification of the Data Published by Non-interactive Differentially Private Mechanisms

ABSTRACT:

In the problem of privacy-preserving collaborative data publishing (PPCDP), a central data publisher is responsible for aggregating sensitive data from multiple parties and then anonymizing it before publishing it for data mining. In such scenarios, the data users may have a strong demand to measure the utility of the published data, since most anonymization techniques have side effects on data utility. Nevertheless, this task is non-trivial because measuring utility usually requires the aggregated raw data, which is not revealed to the data users due to privacy concerns. Worse, the data publisher may even cheat about the raw data, since no one, including the individual providers, knows the full dataset.

In this paper, we first propose a privacy-preserving utility verification mechanism based upon cryptographic techniques for DiffPart, a differentially private scheme designed for set-valued data. This proposal can measure the data utility based upon the encrypted frequencies of the aggregated raw data instead of the plain values, which thus prevents privacy breaches. Moreover, it can privately check the correctness of the encrypted frequencies provided by the publisher, which helps detect dishonest publishers. We also extend this mechanism to DiffGen, another differentially private publishing scheme designed for relational data. Our theoretical and experimental evaluations demonstrate the security and efficiency of the proposed mechanism.

EXISTING SYSTEM:

Many privacy models and corresponding anonymization mechanisms have been proposed in the literature, such as k-anonymity and differential privacy.

k-anonymity and its variants (e.g., l-diversity and t-closeness) protect privacy by generalizing the records such that they cannot be distinguished from some other records. Differential privacy is a much more rigorous privacy model. It requires that the released data be insensitive to the addition or removal of a single record.
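As a rough illustration of the k-anonymity idea described above, the following sketch checks whether every combination of (already generalized) quasi-identifier values is shared by at least k records. The class name, method name, and sample values are hypothetical and only serve the example; real anonymizers also have to choose the generalization itself, which is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not taken from the paper: tests k-anonymity on
// records that have already been generalized.
class KAnonymityCheck {

    // Returns true if every distinct quasi-identifier tuple occurs at least k times.
    static boolean isKAnonymous(List<List<String>> quasiIdentifiers, int k) {
        Map<List<String>, Integer> groupSizes = new HashMap<>();
        for (List<String> row : quasiIdentifiers) {
            groupSizes.merge(row, 1, Integer::sum); // count records per tuple
        }
        for (int size : groupSizes.values()) {
            if (size < k) {
                return false; // some record is distinguishable from all but fewer than k-1 others
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Each row is (age range, ZIP prefix) after generalization; values are made up.
        List<List<String>> rows = List.of(
            List.of("30-39", "021**"),
            List.of("30-39", "021**"),
            List.of("40-49", "100**"));
        System.out.println(isKAnonymous(rows, 2)); // the third row forms a group of size 1
        System.out.println(isKAnonymous(rows, 1));
    }
}
```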

DISADVANTAGES OF EXISTING SYSTEM:

All these data anonymization mechanisms have serious side effects on data utility. As a result, the users of the published data usually have a strong demand to verify the real utility of the anonymized data.

This task is extremely challenging because computing utility usually requires knowledge of the raw data, which, however, should be concealed from the verifier due to privacy concerns.

In some cases, the data publishers may even cheat in this process for various reasons.

PROPOSED SYSTEM:

We first propose a privacy-preserving utility verification mechanism for DiffPart, a differentially private anonymization algorithm designed for set-valued data.

DiffPart perturbs the frequencies of the records based on a context-free taxonomy tree, and no items in the original data are generalized.
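To make "perturbing a frequency" concrete, the sketch below adds random noise to a single record count. It assumes Laplace noise with scale 1/epsilon, which is the usual choice for counts under differential privacy; the text above does not state the noise distribution, and the taxonomy-tree partitioning performed by DiffPart is not shown. All class, method, and parameter names are hypothetical.

```java
import java.util.Random;

// Hypothetical sketch of count perturbation; not the DiffPart algorithm itself.
class LaplacePerturbation {

    // Draws one sample from Laplace(0, scale) as the scaled difference of two
    // independent Exp(1) variables.
    static double sampleLaplace(double scale, Random rng) {
        double e1 = -Math.log(1.0 - rng.nextDouble()); // argument lies in (0, 1]
        double e2 = -Math.log(1.0 - rng.nextDouble());
        return scale * (e1 - e2);
    }

    // Returns a noisy version of a record frequency.
    static double perturbCount(long trueCount, double epsilon, Random rng) {
        if (epsilon <= 0) {
            throw new IllegalArgumentException("epsilon must be positive");
        }
        double sensitivity = 1.0; // assumption: one record changes a count by at most 1
        return trueCount + sampleLaplace(sensitivity / epsilon, rng);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Smaller epsilon means more noise, so stronger privacy and lower utility.
        System.out.println(perturbCount(120, 1.0, rng));
        System.out.println(perturbCount(120, 0.1, rng));
    }
}
```

The gap between the true count and the noisy count is the kind of information loss that a utility verifier would want to measure.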

Our proposal addresses the challenge of verifying the utility of the published data based on the encrypted frequencies of the original data records instead of their plain values. As a result, it protects the original data from the verifying parties (i.e., the data users), because without the real frequency of a record they cannot learn whether or how many times it appears in the raw dataset. In addition, since the encrypted frequencies are provided by the publisher, we also present a scheme for the verifying parties to incrementally verify their correctness.
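The idea of working on encrypted frequencies can be illustrated with an additively homomorphic cryptosystem. The sketch below assumes Paillier encryption, which is a guess made for the example: the text above does not name the cryptosystem, and the class and its methods are hypothetical. It only shows that encrypted frequencies can be summed without decrypting the individual values; it is not the verification protocol of the paper, and it keeps both keys in one object purely for brevity.

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Minimal textbook Paillier sketch (assumed scheme, illustrative only).
class PaillierSketch {
    final BigInteger n, nSquared;          // public key
    private final BigInteger lambda, mu;   // private key
    private final SecureRandom rng = new SecureRandom();

    PaillierSketch(int bits) {
        BigInteger p = BigInteger.probablePrime(bits / 2, rng);
        BigInteger q;
        do {
            q = BigInteger.probablePrime(bits / 2, rng);
        } while (q.equals(p));
        n = p.multiply(q);
        nSquared = n.multiply(n);
        BigInteger p1 = p.subtract(BigInteger.ONE);
        BigInteger q1 = q.subtract(BigInteger.ONE);
        lambda = p1.multiply(q1).divide(p1.gcd(q1)); // lcm(p-1, q-1)
        mu = lambda.modInverse(n);                   // valid for generator g = n + 1
    }

    // Encrypts a non-negative frequency m < n as (1 + m*n) * r^n mod n^2.
    BigInteger encrypt(long m) {
        BigInteger r;
        do {
            r = new BigInteger(n.bitLength(), rng);
        } while (r.signum() == 0 || r.compareTo(n) >= 0 || !r.gcd(n).equals(BigInteger.ONE));
        BigInteger gm = BigInteger.ONE.add(BigInteger.valueOf(m).multiply(n)).mod(nSquared);
        return gm.multiply(r.modPow(n, nSquared)).mod(nSquared);
    }

    // Multiplying ciphertexts adds the underlying plaintexts.
    BigInteger add(BigInteger c1, BigInteger c2) {
        return c1.multiply(c2).mod(nSquared);
    }

    // Decrypts with L(c^lambda mod n^2) * mu mod n, where L(x) = (x - 1) / n.
    long decrypt(BigInteger c) {
        BigInteger u = c.modPow(lambda, nSquared).subtract(BigInteger.ONE).divide(n);
        return u.multiply(mu).mod(n).longValueExact();
    }

    public static void main(String[] args) {
        PaillierSketch ph = new PaillierSketch(512); // small key, for demonstration only
        long[] frequencies = {3, 5, 0};              // made-up record frequencies
        BigInteger total = ph.encrypt(0);
        for (long f : frequencies) {
            total = ph.add(total, ph.encrypt(f));    // aggregation on ciphertexts only
        }
        System.out.println(ph.decrypt(total));       // sum of the three frequencies
    }
}
```

Because encryption is randomized, two ciphertexts of the same frequency look different, so a party holding only ciphertexts cannot tell whether a record occurs zero times or many times.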

We then extend the above mechanism to DiffGen, a differentially private anonymization algorithm designed for relational data. Unlike DiffPart, DiffGen may generalize the attribute values before perturbing the frequency of each record. Information loss is caused by both the generalization and the perturbation. These two kinds of information loss are measured separately by distinct utility metrics. We take both into consideration.

Our analysis shows that the utility verification for generalization operations can be carried out with only the published data. As a result, this verification does not need any protection. The utility metric for the perturbation is similar to that for DiffPart. We thus adapt the proposed privacy-preserving mechanism to this verification.

We conduct a series of experiments on real-world set-valued data and relational data to evaluate the efficiency of the proposed mechanisms. The results show that these mechanisms are efficient enough, provided that both the data publishing and the utility verification are performed offline.

ADVANTAGES OF PROPOSED SYSTEM:

Our theoretical analysis demonstrates the correctness and the security of the proposed mechanism.

We consider the problem of directly computing the utility of the final data published via differential privacy in a horizontally distributed context.

SYSTEM ARCHITECTURE:

SYSTEM REQUIREMENTS:

HARDWARE REQUIREMENTS:

System: Pentium Dual Core.

Hard Disk: 120 GB.

Monitor: 15" LED.

Input Devices: Keyboard, Mouse.

RAM: 1 GB.

SOFTWARE REQUIREMENTS:

Operating System: Windows 7.

Coding Language: Java/J2EE.

Tool: NetBeans 7.2.1.

Database: MySQL.

REFERENCE:

Jingyu Hua, An Tang, Yixin Fang, Zhenyu Shen, and Sheng Zhong, “Privacy-Preserving Utility Verification of the Data Published by Non-interactive Differentially Private Mechanisms,” IEEE Transactions on Information Forensics and Security, 2016.