DARM: A Privacy-preserving Approach for Distributed Association - - PowerPoint PPT Presentation


DARM: A Privacy-preserving Approach for Distributed Association Rules Mining on Horizontally-partitioned Data Presenter: Gaby Dagher Omar Abdel Wahab, Concordia University Moulay Omar Hachami, Concordia University Arslan Zaffari, Concordia


slide-1
SLIDE 1

DARM: A Privacy-preserving Approach for Distributed Association Rules Mining on Horizontally-partitioned Data

Omar Abdel Wahab, Concordia University
Moulay Omar Hachami, Concordia University
Arslan Zaffari, Concordia University
Mery Vivas, Concordia University
Gaby G. Dagher, Concordia University

Presenter: Gaby Dagher

slide-2
SLIDE 2

Outline

1. Introduction
2. Literature Review
3. Problem Definition
4. Proposed Solution
5. Performance Evaluation
6. Conclusions

slide-4
SLIDE 4

Introduction

Motivation:

  • Rapid evolution of data collection and storage technologies.
  • Extracting knowledge and hidden patterns from stored data has become a major necessity for individuals, companies, and government agencies.
  • Applying data mining techniques to extract information is challenging when the data is distributed over multiple owners.
      • Each data owner is concerned about the privacy of the individuals in their data.

slide-5
SLIDE 5

Introduction

Motivating Scenario

slide-6
SLIDE 6

Introduction

Challenges:

  • Data Privacy
      • One data provider should not learn sensitive information about the data of other providers.
  • Data Utility
      • The generated rules should satisfy the data consumer’s request and needs.
  • Protection against Inference Attacks
      • Prevent the data consumer from inferring sensitive information about the individuals involved in the database.

slide-7
SLIDE 7

Introduction

Contributions

  • Contribution #1: Propose a comprehensive privacy-preserving approach for answering association rules queries in a distributed environment.
  • Contribution #2: Protect all providers against inference attacks from data consumers by guaranteeing that the returned association rules satisfy ε-differential privacy.
  • Contribution #3: Preserve the privacy of the mined data by preventing each data provider from learning sensitive information about other data providers during the mining process.
  • Contribution #4: Protect the confidentiality of the data consumer’s query against the data providers.
  • Contribution #5: Conduct a performance evaluation on real-life data, showing that the approach is both scalable and efficient.
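For reference, the ε-differential privacy guarantee named in Contribution #2 is usually stated as below. This is the standard textbook definition, not a formula taken from the slides; the symbols for the randomized algorithm, the neighbouring data sets D and D′, and the output set S are introduced here only for illustration.

```latex
% A randomized algorithm \mathcal{A} satisfies \varepsilon-differential privacy if,
% for every pair of data sets D and D' that differ in a single record,
% and for every set S of possible outputs:
\[
  \Pr[\mathcal{A}(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[\mathcal{A}(D') \in S]
\]
```

A smaller ε forces the two output distributions closer together, so less can be inferred about any single individual.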

slide-9
SLIDE 9

Literature Review

Association Rules Mining [1], [2], [3], [4], [5], [6], [7]:

  • Summary: Study the problem of mining association rules in a distributed and parallel manner, where the data is partitioned across several nodes.
  • Limitations: These approaches focus mainly on increasing the efficiency of the mining process, while ignoring the privacy concerns that may arise from building a global mining model.

slide-10
SLIDE 10

Literature Review

Privacy in Distributed Mining Models [8], [9], [10], [11], [12]:

  • Summary: Consider the privacy concerns that may arise from mining the data globally.
  • Limitations: Rely on encryption to achieve privacy between data providers. However, a recent study shows that most encryption schemes are insufficient to guarantee data privacy and confidentiality, as the protocol on which they are based, namely the precise query protocol (PQP), is vulnerable to attribute values inference.

slide-11
SLIDE 11

Literature Review

Privacy-preserving Data Mashup [13], [14], [15], [16]:

  • Summary: Preserve the privacy of the data in a data mashup scenario.
  • Limitations: In contrast to our model, which considers privacy-preserving data mining (PPDM), these approaches are designed to support privacy-preserving data publishing (PPDP), since they assume that the data itself will be shared among the different parties.

slide-13
SLIDE 13

Problem Definition

System Inputs:

(1) Association Rules Queries: To obtain the set of strong association rules R from the distributed data, the data consumer submits a query request q to the master miner, specifying the minimum support threshold γ, the minimum confidence threshold α, and a set of predicates P.

(2) ε-differentially Private Data: We assume that the data is horizontally partitioned into sub-tables, each of which is hosted by one data provider.

  • Each data provider owns the same type of attribute information on a different set of individuals.
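The thresholds γ and α can be tied to the horizontally partitioned setting with the usual definitions of support and confidence. The formulas below are the standard ones, not copied from the slides; D_i (the sub-table of provider i), c_i(X) (the number of records in D_i containing itemset X), and n (the number of providers) are notation assumed here for illustration.

```latex
% Global support of an itemset X, aggregated over the n providers' sub-tables:
\[
  \mathrm{supp}(X) \;=\; \frac{\sum_{i=1}^{n} c_i(X)}{\sum_{i=1}^{n} |D_i|}
\]
% Confidence of a rule X => Y:
\[
  \mathrm{conf}(X \Rightarrow Y) \;=\; \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
\]
% A rule is "strong" when both thresholds from the query q are met:
\[
  \mathrm{supp}(X \cup Y) \ge \gamma
  \quad\text{and}\quad
  \mathrm{conf}(X \Rightarrow Y) \ge \alpha
\]
```

Because the partitioning is horizontal, each provider can compute c_i(X) locally, and only these counts need to be combined.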

slide-14
SLIDE 14

Problem Definition

  • Adversary Model

Semi-honest, where each party is expected to follow the protocol correctly; however, it is curious and might try to infer sensitive information about the other parties.

  • Problem Statement

Given relational data D that is horizontally partitioned into n partitions, the objective is to design a privacy-preserving model for answering association rules queries in a distributed environment. The model must achieve three objectives: (1) to prevent each data provider from learning sensitive information about other data providers during the mining process, (2) to protect all providers against inference attacks from the data consumers, and (3) to preserve the confidentiality of each data consumer’s query against the data providers.

slide-16
SLIDE 16

Proposed Solution

  • Step 1 - Data Anonymization
  • Step 2 - Frequent Itemsets Generation
  • Step 3 - Association Rules Generation

slide-17
SLIDE 17

Proposed Solution

Step 1 - Data Anonymization:

  • In this step, the data providers use an ε-differentially private algorithm, called DiffGen, to anonymize their data and provide protection against linkage and inference attacks.
  • Using DiffGen, the data owner makes sure that the regenerated data table provides a privacy guarantee while being insensitive to any specific record.
  • The data anonymization process can be divided into three main parts:
      (1) Selecting a candidate attribute for specialization
      (2) Determining the split value parameter
      (3) Publishing the noisy counts
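Part (3), publishing the noisy counts, can be illustrated with a minimal sketch. The slides do not say which noise distribution is used, so this assumes the Laplace mechanism with sensitivity 1, a common choice for count queries. The function names `laplace_noise` and `noisy_counts` are hypothetical and are not part of DiffGen.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential samples with rate
    1/scale follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(true_counts, epsilon, seed=None):
    """Return a copy of true_counts with Laplace noise added to each count.

    Assumption: each count has sensitivity 1 (adding or removing one
    record changes a count by at most 1), so the noise scale is 1/epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random(seed)
    scale = 1.0 / epsilon
    return {group: count + laplace_noise(scale, rng)
            for group, count in true_counts.items()}


if __name__ == "__main__":
    # Hypothetical group counts from one provider's generalized table.
    counts = {"age 20-40, engineer": 12, "age 40-60, lawyer": 7}
    print(noisy_counts(counts, epsilon=1.0, seed=42))
```

A smaller `epsilon` gives a larger noise scale, which means stronger privacy at the cost of less accurate counts.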

slide-18
SLIDE 18

Proposed Solution

slide-19
SLIDE 19

Proposed Solution

Step 2 - Frequent Itemsets Generation:

  • The master miner receives the data consumer’s query.
  • The master miner requests the support counts of all the attributes the data consumer is interested in from the different data providers.
  • The master miner generates all the possible frequent itemsets of different lengths subject to the minimum support threshold γ specified in the query.
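The aggregation in Step 2 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the slides do not name the itemset-generation algorithm, so a level-wise (Apriori-style) search is assumed; γ is treated as a fraction of all records; and each provider's anonymized sub-table is modelled as an in-memory list of item sets. All function names are hypothetical.

```python
from itertools import combinations


def local_support(partition, itemset):
    """Count the records in one provider's partition containing the itemset."""
    return sum(1 for record in partition if itemset <= record)


def global_support(partitions, itemset):
    """Master-miner side: add up the counts reported by every provider."""
    return sum(local_support(p, itemset) for p in partitions)


def frequent_itemsets(partitions, items, min_support):
    """Level-wise search for itemsets whose global support is >= min_support.

    Returns a dict mapping each frequent itemset (frozenset) to its
    global support count.
    """
    total = sum(len(p) for p in partitions)
    frequent = {}
    level = [frozenset([item]) for item in sorted(items)]
    while level:
        counts = {c: global_support(partitions, c) for c in level}
        kept = {c: n for c, n in counts.items() if n / total >= min_support}
        frequent.update(kept)
        k = len(level[0]) + 1
        # Join step: merge pairs of frequent itemsets into size-k candidates.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return frequent


if __name__ == "__main__":
    provider_1 = [{"bread", "milk"}, {"bread", "butter"}, {"milk"}]
    provider_2 = [{"bread", "milk", "butter"}, {"bread", "milk"}]
    result = frequent_itemsets([provider_1, provider_2],
                               {"bread", "milk", "butter"}, min_support=0.5)
    for itemset, count in result.items():
        print(sorted(itemset), count)
```

Only counts cross the provider boundary in this sketch, which matches the horizontally partitioned setting where each provider can count its own records locally.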

slide-20
SLIDE 20

Proposed Solution

slide-21
SLIDE 21

Proposed Solution

Step 3 - Association Rules Generation:

  • Now that the frequent itemsets are known, the master miner generates all the possible combinations of the k-length (k > 1) frequent itemsets that may constitute association rules.
  • The master miner then sends these combinations to the data providers, which separately calculate and send back the support counts of these combinations.
  • The master miner computes the confidence of each association rule based on the feedback from the data providers.
      • For each association rule, if its confidence exceeds the minimum confidence threshold α specified by the data consumer, then the rule is considered a useful rule.
  • Finally, the master miner returns to the data consumer the set of all useful association rules.
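The confidence check in Step 3 can be sketched as below, assuming the aggregated support counts of all frequent itemsets are already available to the master miner. The function name `generate_rules` is hypothetical. The sketch keeps a rule when its confidence is greater than or equal to α, which is the conventional reading of a minimum threshold; the slide's wording ("exceeds") could also be read as a strict inequality.

```python
from itertools import combinations


def generate_rules(support, min_confidence):
    """Derive rules X -> Y from frequent itemsets and their support counts.

    support maps each frequent itemset (frozenset) to its global count.
    Every non-empty proper subset of a frequent itemset is itself
    frequent, so its count is expected to be present in the mapping.
    """
    rules = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                lhs = frozenset(antecedent)
                # confidence(X -> Y) = supp(X u Y) / supp(X)
                confidence = count / support[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical aggregated counts, as the master miner would hold them.
    support = {
        frozenset({"bread"}): 4,
        frozenset({"milk"}): 4,
        frozenset({"bread", "milk"}): 3,
    }
    for lhs, rhs, conf in generate_rules(support, min_confidence=0.7):
        print(sorted(lhs), "->", sorted(rhs), conf)
```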

slide-22
SLIDE 22

Proposed Solution

slide-24
SLIDE 24

Performance Evaluation

Efficiency

slide-25
SLIDE 25

Performance Evaluation

Scalability

slide-26
SLIDE 26

Performance Evaluation

Efficiency w.r.t. nSpecializations

slide-28
SLIDE 28

Conclusions

  • In this paper, we propose a comprehensive privacy-preserving approach for answering association rules queries in a distributed environment, with the goal of preserving both data privacy and query confidentiality.
  • The proposed approach:
      (1) protects all providers against inference attacks from data consumers by guaranteeing that the association rules returned to the data consumer satisfy ε-differential privacy,
      (2) preserves the privacy of the mined data by preventing each data provider from learning sensitive information about other data providers during the mining process, and
      (3) protects the confidentiality of the data consumer’s query against the data providers, such that the master miner is able to mine the association rules without revealing the query to the data providers.

slide-29
SLIDE 29

References (1)

[1] R. Agrawal and J. Shafer, “Parallel mining of association rules,” IEEE Trans. Knowl. Data Eng., vol. 8, no. 6, pp. 962-969, 1996.

[2] D. W.-L. Cheung, J. Han, V. T. Y. Ng, A. W.-C. Fu, and Y. Fu, “A fast distributed algorithm for mining association rules,” in Proceedings of the fourth international conference on Parallel and distributed information systems, 1996, pp. 31-43.

[3] D. W. Cheung, V. T. Ng, and A. W. Fu, “Efficient mining of association rules in distributed databases,” IEEE Trans. Knowl. Data Eng., vol. 8, no. 6, pp. 911-922, 1996.

[4] J. Park, M. Chen, and P. Yu, “Efficient parallel data mining for association rules,” in Proceedings of the fourth international conference on Information and knowledge management, 1995, pp. 31-36.

[5] M. Z. Ashrafi, D. Taniar, and K. Smith, “ODAM: an optimized distributed association rule mining algorithm,” IEEE Distrib. Syst. Online, vol. 5, no. 3, pp. 1-18, Mar. 2004.

[6] A. Anitha and G. R. Suhanantham, “An Efficient Association Rule Mining Model for Distributed Databases,” Int. J. Comput. Sci. Technol., vol. 3, no. 1, pp. 794-797, 2012.

[7] J. A. Renjit, “Mining the Data from Distributed Database using an Improved Mining Algorithm,” Int. J. Comput. Sci. Inf. Secur., vol. 7, no. 3, pp. 116-121, 2010.

[8] N. Zhang, M. Li, and W. Lou, “Distributed Data Mining with Differential Privacy,” in 2011 IEEE International Conference on Communications (ICC), 2011, pp. 1-5.

slide-30
SLIDE 30

References (2)

[9] C. Clifton and M. Kantarcioglu, “Privacy-preserving Distributed Mining of Association Rules on Horizontally Partitioned Data,” IEEE Trans. Knowl. Data Eng., vol. 16, no. 9, pp. 1026-1037, 2004.

[10] J. Vaidya and C. Clifton, “Privacy preserving association rule mining in vertically partitioned data,” in The Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002, pp. 639-644.

[11] W. Wong, D. Cheung, and E. Hung, “Security in outsourcing of association rule mining,” in Proceedings of the 33rd international conference on Very large data bases, 2007, pp. 111-122.

[12] F. Giannotti, L. V. S. Lakshmanan, A. Monreale, D. Pedreschi, and H. Wang, “Privacy-Preserving Mining of Association Rules From Outsourced Transaction Databases,” IEEE Syst. J., vol. 7, no. 3, pp. 385-395, Sep. 2013.

[13] T. Trojer, B. C. M. Fung, and P. C. K. Hung, “Service-Oriented Architecture for Privacy-Preserving Data Mashup,” in IEEE International Conference on Web Services, 2009, pp. 767-774.

[14] P. Gurunathan, N. Ishwarya, V. Sridevi, C. Nandhini, and S. Deepalakshmi, “High-Dimensional Confidential Data Mash up using Service-Oriented Architecture,” Int. J. Emerg. Sci. Eng., vol. 1, no. 6, pp. 48-51, 2013.

[15] S. A. Chun, J. Warner, and A. D. Keromytis, “Privacy policy-driven mashups,” Int. J. Bus. Contin. Risk Manag., vol. 4, no. 4, pp. 344-370, 2013.

[16] N. Mohammed, B. C. M. Fung, K. Wang, and P. C. K. Hung, “Privacy-preserving data mashup,” in Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology - EDBT 09, 2009, pp. 228-239.

slide-31
SLIDE 31

Thank You…