

SLIDE 1

Privacy-Aware Machine Learning Systems

Borja Balle

SLIDE 2

Data is the New Oil

The Economist, May 2017

SLIDE 3

The Importance of (Data) Privacy

Universal Declaration of Human Rights, Article 12. No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks.

#DeleteFacebook

[Image: first page of the Official Journal of the European Union (4.5.2016, L 119/1): Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).]

SLIDE 4

Anonymization Fiascos

  • “Only You, Your Doctor, and Many Others May Know”. L. Sweeney. Technology Science, 2015
  • “Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset)”. A. Narayanan & V. Shmatikov. Security and Privacy, 2008
  • Vijay Pandurangan. tech.vijayp.ca, 2014

SLIDE 5

Privacy Risks in Machine Learning

Abstract—We quantitatively investigate how machine learning models leak information about the individual data records on which they were trained. We focus on the basic membership inference attack: given a data record and black-box access to a model, determine if the record was in the model's training dataset. To perform membership inference against a target model, we make adversarial use of machine learning and train our own inference model to recognize differences in the target model's predictions on the inputs that it trained on versus the inputs that it did not train on. We empirically evaluate our inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon. Using realistic datasets and classification tasks, including a hospital discharge dataset whose membership is sensitive from the privacy perspective, we show that these models can be vulnerable to membership inference attacks. We then investigate the factors that influence this leakage and evaluate mitigation strategies.

Membership Inference Attacks Against Machine Learning Models

Reza Shokri (Cornell Tech), Marco Stronati* (INRIA), Congzheng Song (Cornell), Vitaly Shmatikov (Cornell Tech)

Security and Privacy, 2017

This paper presents exposure, a simple-to-compute metric that can be applied to any deep learning model for measuring the memorization of secrets. Using this metric, we show how to extract those secrets efficiently using black-box API access. Further, we show that unintended memorization occurs early, is not due to overfitting, and is a persistent issue across different types of models, hyperparameters, and training strategies. We experiment with both real-world models (e.g., a state-of-the-art translation model) and datasets (e.g., the Enron email dataset, which contains users' credit card numbers) to demonstrate both the utility of measuring exposure and the ability to extract secrets. Finally, we consider many defenses, finding some ineffective (like regularization), and others to lack guarantees. However, by instantiating our own differentially-private recurrent model, we validate that by appropriately investing in the use of state-of-the-art techniques, the problem can be resolved, with high utility.

The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets

Nicholas Carlini (University of California, Berkeley), Chang Liu (University of California, Berkeley), Jernej Kos (National University of Singapore), Úlfar Erlingsson (Google Brain), Dawn Song (University of California, Berkeley)

ArXiv, 2018

SLIDE 6

What Makes Privacy Difficult?

  • High-dimensional data
  • Side information

SLIDE 7

Privacy Enhancing Technologies (PETS)

  • Initially a sub-field of applied cryptography

– Now percolating into databases, machine learning, statistics, etc.

  • Privacy-preserving release (e.g. differential privacy)

– Release statistics/models/datasets while preventing reverse-engineering of the original data

  • Privacy-preserving computation (e.g. secure multi-party computation)

– Perform computations on multi-party data without ever exchanging the inputs in plaintext

SLIDE 8

Privacy-Preserving Release

[Diagram: individuals' data is collected by a Trusted Curator, which releases results across a Privacy Barrier.]

SLIDE 9

Differential Privacy: Informal Definition

[Diagram: a data analysis algorithm is made randomized; from its output alone, can an adversary tell whether the dataset contained Bart or Milhouse?]

SLIDE 10

Differential Privacy

[DMNS'06; Gödel Prize 2017]

A randomized algorithm A : Xⁿ → Y satisfies differential privacy with parameter ε if for any pair of datasets D and D' differing in a single row and for any possible output o, the following inequality is satisfied:

ℙ[A(D) = o] ≤ e^ε · ℙ[A(D') = o]

Approximate differential privacy, with parameters (ε, δ), relaxes this to hold for any set of outputs E:

ℙ[A(D) ∈ E] ≤ e^ε · ℙ[A(D') ∈ E] + δ
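To make the definition concrete, here is a minimal sketch (not from the talk) of the Laplace mechanism for a counting query, the textbook way to satisfy ε-differential privacy; the dataset, predicate, and parameter values below are purely illustrative.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Release a counting query with epsilon-differential privacy.

    Changing a single row changes the true count by at most 1 (sensitivity 1),
    so Laplace(1/epsilon) noise satisfies the inequality above.
    """
    true_count = sum(1 for row in data if predicate(row))
    return true_count + np.random.laplace(scale=1.0 / epsilon)

# Neighbouring datasets D and D' differ in a single row.
D = [{"age": 34}, {"age": 51}, {"age": 29}]
D_prime = D[:-1] + [{"age": 62}]

eps = 0.5
print(laplace_count(D, lambda r: r["age"] > 30, eps))
print(laplace_count(D_prime, lambda r: r["age"] > 30, eps))
```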

SLIDE 11

Fundamental Properties of Differential Privacy

  • Compositionality

– Enables rigorous engineering through modularity (see the composition sketch after this list)

  • Quantifiable

– Amenable to mathematical analysis, continuous instead of black-or-white

  • Robust to side knowledge

– Protects even in the event of collusions and side information
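As an illustration of compositionality (a hedged sketch, not code from the talk): splitting a total budget ε across two Laplace-noised sub-queries keeps the overall analysis ε-DP by sequential composition. The helper names and the clipping bound are assumptions made for the example.

```python
import numpy as np

def noisy_count(values, epsilon):
    # Sensitivity-1 count released with the Laplace mechanism (epsilon-DP).
    return len(values) + np.random.laplace(scale=1.0 / epsilon)

def noisy_clipped_sum(values, clip, epsilon):
    # Clipping each value to [0, clip] bounds the sensitivity by clip.
    return sum(min(max(v, 0.0), clip) for v in values) + np.random.laplace(scale=clip / epsilon)

def private_mean(values, clip=100.0, eps_total=1.0):
    # Sequential composition: eps/2 for the count plus eps/2 for the sum
    # gives an eps_total-DP estimate of the mean overall.
    n = noisy_count(values, eps_total / 2)
    s = noisy_clipped_sum(values, clip, eps_total / 2)
    return s / max(n, 1.0)

print(private_mean([34.0, 51.0, 29.0, 62.0]))
```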

SLIDE 12

Multi-Party Data Analysis

[Table: the same individuals' records vertically partitioned across parties: medical data (treatment, outcome), census data, and financial data, each party holding a different subset of attributes (Attr. 1-2, 4-5, 7-8, ...).]

SLIDE 13

The Trusted Party “Solution”

Trusted Party: receives plaintext data from each party over a secure channel, runs the algorithm, and returns the result to the parties.

The Trusted Party assumption:

  • Introduces a single point of failure (with disastrous consequences)
  • Relies on weak incentives (especially when private data is valuable)
  • Requires agreement between all data providers

=> Useful but unrealistic. Maybe it can be simulated?

SLIDE 14

Secure Multi-Party Computation (MPC)

Public: the function f(x₁, x₂, …, x_p) = y

Private: x_i (party i's input)

Goal: compute f in a way that each party learns y (and nothing else!)

Tools: Oblivious Transfer (OT), Garbled Circuits (GC), Homomorphic Encryption (HE), etc.

Guarantees: honest-but-curious adversaries, malicious adversaries, computationally bounded adversaries, collusions

SLIDE 15

Challenges and Trade-offs

  • Protocols: out of the box vs. tailored
  • Threat models: semi-honest vs. malicious
  • Interaction: off-line vs. on-line
  • Trusted external parties: speed vs. privacy
  • Scalability: amount of data, dimensions, # parties
SLIDE 16

In This Talk…

Part I: Privacy-Preserving Distributed Linear Regression on High-Dimensional Data (PETS 2017, with Adria Gascon, Phillipp Schoppmann, Mariana Raykova, Jack Doerner, Samee Zahur, and David Evans)

Part II: Private Nearest Neighbors Classification in Federated Databases (Preprint, with Adria Gascon and Phillipp Schoppmann)

SLIDE 17

Linear Regression - Overview

Features:

  • Vertically partitioned data
  • Scalable to millions of records and hundreds of dimensions
  • Open source implementation

https://github.com/schoppmp/linreg-mpc

Tools:

  • Several standard MPC constructions (GC, OT, SS, …)
  • Efficient private inner product protocols
  • Conjugate gradient descent robust to fixed-point encodings
SLIDE 18

Functionality: Multi-Party Linear Regression

Training data: Y ∈ ℝⁿ, X = [X₁ X₂] ∈ ℝⁿˣᵈ

Private inputs: Party 1: X₂; Party 2: X₁, Y

Linear regression:

  min_{θ ∈ ℝᵈ} ‖Y − Xθ‖² + λ‖θ‖²     (optimization)

  (XᵀX + λI) θ = XᵀY                  (closed-form solution)
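For reference, a plaintext sketch of the functionality the parties want to evaluate jointly (ordinary ridge regression on the vertically partitioned data); this is not the MPC protocol itself, and all names and data below are illustrative.

```python
import numpy as np

def ridge_functionality(X1, X2, Y, lam):
    """Plaintext reference for the target functionality.

    X1, X2: vertically partitioned feature blocks held by different parties.
    Returns theta solving (X^T X + lam * I) theta = X^T Y.
    """
    X = np.hstack([X1, X2])                 # n x d joint design matrix
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ Y
    return np.linalg.solve(A, b)

# Toy example: n=100 records split into d1=3 and d2=2 features.
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(100, 3)), rng.normal(size=(100, 2))
Y = rng.normal(size=100)
print(ridge_functionality(X1, X2, Y, lam=0.1))
```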

SLIDE 19

Aggregation and Solving Phases

Aggregation: A = XᵀX + λI, b = XᵀY, at cost O(nd²), where

  XᵀX = [ X₁ᵀX₁   X₁ᵀX₂ ]
        [ X₂ᵀX₁   X₂ᵀX₂ ]     (cross-party products)

Solving: θ = A⁻¹b

  • Exact solver (e.g. Cholesky): O(d³)
  • Approximate iterative solver (e.g. k-CGD): O(kd²)
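A plaintext sketch of this decomposition (for intuition only): the diagonal blocks of XᵀX are computed locally, only the cross-party block needs a secure inner-product protocol, and a k-step CGD solver like the one later run inside the garbled circuit finishes the job. Function names are illustrative, not the paper's implementation.

```python
import numpy as np

def aggregate(X1, X2, Y, lam):
    # Local blocks: each party computes these on its own data.
    A11 = X1.T @ X1
    A22 = X2.T @ X2
    # Cross-party block: the only part that needs a secure inner-product protocol.
    A12 = X1.T @ X2
    d = A11.shape[0] + A22.shape[0]
    A = np.block([[A11, A12], [A12.T, A22]]) + lam * np.eye(d)
    b = np.hstack([X1, X2]).T @ Y   # X^T Y; in the protocol Y is held by one party
    return A, b

def cgd(A, b, k):
    # k iterations of conjugate gradient descent for the PSD system A theta = b.
    theta = np.zeros_like(b)
    r = b - A @ theta
    p = r.copy()
    for _ in range(k):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        theta += alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return theta

rng = np.random.default_rng(0)
X1, X2, Y = rng.normal(size=(50, 3)), rng.normal(size=(50, 2)), rng.normal(size=50)
A, b = aggregate(X1, X2, Y, lam=0.1)
print(np.linalg.norm(A @ cgd(A, b, k=20) - b))   # small residual
```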

SLIDE 20

Protocol Overview

Parties: Data Providers (DPs), a Crypto Provider (CrP), and a Computing Provider (CoP).

  • 1. CrP distributes correlated randomness
  • 2. DPs run multiple inner product protocols to get additive shares of (A, b)
  • 3. CoP gets the GC for solving the linear system from CrP
  • 4. DPs send garbled shares of (A, b) to CoP
  • 5. CoP executes the GC and returns the solution to the DPs

Steps 1-2 form the Aggregation Phase; steps 3-5 form the Solving Phase. Alternative: CrP and CoP can be simulated by non-colluding parties.

SLIDE 21

Aggregation Phase – Arithmetic Secret Sharing

The cross-party block X₁ᵀX₂ (matrix product) reduces to inner products between columns: f(x₁, x₂) = ⟨x₁, x₂⟩.

Correlated randomness (obtained via oblivious transfer or from a 3rd party): values a, c for Party 1 and b, d for Party 2 with a · b = c + d.

Party 1 (input x₁, holds a, c): receives x₂ − b, sends x₁ + a, outputs s₁ = x₁ · (x₂ − b) − c
Party 2 (input x₂, holds b, d): receives x₁ + a, sends x₂ − b, outputs s₂ = (x₁ + a) · b − d

Correctness: s₁ + s₂ = x₁ · x₂.
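A small simulation of this correlated-randomness trick for a single product (a sketch, not the paper's implementation); the same recipe is applied entry-wise and summed to obtain inner products. The modulus and helper names are assumptions made for the example.

```python
import numpy as np

Q = 2**31 - 1        # illustrative modulus; the protocol works over a ring/field Z_q

def correlated_randomness(rng):
    # A dealer (oblivious transfer or a 3rd party) samples a, b, c, d with a*b = c + d (mod Q)
    # and hands (a, c) to Party 1 and (b, d) to Party 2.
    a, b, c = rng.integers(0, Q, size=3)
    d = (a * b - c) % Q
    return (a, c), (b, d)

def shared_product(x1, x2, rng):
    (a, c), (b, d) = correlated_randomness(rng)
    msg_to_p1 = (x2 - b) % Q        # Party 2 sends its input masked by b
    msg_to_p2 = (x1 + a) % Q        # Party 1 sends its input masked by a
    s1 = (x1 * msg_to_p1 - c) % Q   # Party 1's additive share of x1*x2
    s2 = (msg_to_p2 * b - d) % Q    # Party 2's additive share of x1*x2
    return s1, s2

rng = np.random.default_rng(1)
x1, x2 = 7, 12
s1, s2 = shared_product(x1, x2, rng)
assert (s1 + s2) % Q == (x1 * x2) % Q
```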
SLIDE 22

Solving Phase – Garbled Circuits

PSD linear system: Aθ = b, with A = Σᵢ Aᵢ and b = Σᵢ bᵢ, where (Aᵢ, bᵢ) is party i's input (arithmetic share). Solved with Conjugate Gradient Descent (CGD).

Garbled circuits: each Boolean gate's truth table (e.g. an AND gate mapping b₁ = 0, b₂ = 1 to b_out = 0) is encrypted into a garbled gate that maps input wire keys (Key₁, Key₂) to the output key Key_out (or fails).

Year | Device / Paper | 32-bit floating-point multiplication (ms)
1961 | IBM 1620E | 17.7
1980 | Intel 8086 CPU (software) | 1.6
1980 | Intel 8087 FPU | 0.019
2015 | Pullonen et al. @ FC&DS | 38.2
2015 | Demmler et al. @ CCS | 9.2

SLIDE 23

Fixed-point + Conjugate Gradient Descent

Fixed-point representation: total number of bits = b_i + b_f + 1, where b_i = number of integer bits and b_f = number of fractional bits.

[Plots: residual ‖Aθₜ − b‖ (from 10⁻¹⁸ to 10⁻³) vs. CGD iteration t, for b_i = 8, 9, 10, 11 and floating point; left panel: textbook CGD, right panel: normalized CGD.]
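A sketch of the fixed-point arithmetic assumed here (helper names and the choice of b_f are illustrative), showing the encode/decode step and the truncation after multiplication that CGD has to tolerate inside the garbled circuit.

```python
def to_fixed(x, bf):
    # Encode a real number with bf fractional bits; inside the garbled circuit
    # this integer is what the Boolean wires represent.
    return int(round(x * (1 << bf)))

def from_fixed(v, bf):
    return v / (1 << bf)

def fixed_mul(u, v, bf):
    # The product of two fixed-point numbers has 2*bf fractional bits,
    # so it is truncated back to bf bits; CGD must tolerate this rounding.
    return (u * v) >> bf

bf = 16
x, y = 3.25, -1.5
print(from_fixed(fixed_mul(to_fixed(x, bf), to_fixed(y, bf), bf), bf))  # ~ -4.875
```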

SLIDE 24

Experimental Results

Solving phase (RMSE and running time):

Name | d | n | Optimal RMSE | FP-CGD (32 bits) time | FP-CGD RMSE | Cholesky (32 bits) time | Cholesky RMSE
Student Performance | 30 | 395 | 4.65 | 19s | 4.65 (-0.0%) | 5s | 4.65 (-0.0%)
Auto MPG | 7 | 398 | 3.45 | 2s | 3.45 (-0.0%) | 0s | 3.45 (-0.0%)
Communities and Crime | 122 | 1994 | 0.14 | 4m27s | 0.14 (0.3%) | 4m35s | 0.14 (-0.0%)
Wine Quality | 11 | 4898 | 0.76 | 3s | 0.76 (-0.0%) | 0s | 0.80 (4.2%)
Bike Sharing Dataset | 12 | 17379 | 145.06 | 4s | 145.07 (0.0%) | 1s | 145.07 (0.0%)
Blog Feedback | 280 | 52397 | 31.89 | 24m5s | 31.90 (0.0%) | 53m24s | 32.19 (0.9%)
CT slices | 384 | 53500 | 8.31 | 44m46s | 8.34 (0.4%) | 2h13m31s | 8.87 (6.7%)
Year Prediction MSD | 90 | 515345 | 9.56 | 4m16s | 9.56 (0.0%) | 3m50s | 9.56 (0.0%)
Gas sensor array | 16 | 4208261 | 90.33 | 48s | 95.05 (5.2%) | 42s | 95.06 (5.2%)

Aggregation phase (running time, OT-based vs. trusted initializer (TI), by number of parties):

n | d | 2 parties OT | 2 parties TI | 3 parties OT | 3 parties TI | 5 parties OT | 5 parties TI
5·10^4 | 20 | 1m50s | 1s | 1m32s | 2s | 1m7s | 2s
5·10^4 | 100 | 42m12s | 25s | 34m39s | 32s | 24m58s | 37s
5·10^5 | 20 | 18m18s | 15s | 14m29s | 18s | 12m10s | 21s
5·10^5 | 100 | 7h3m56s | 4m47s | 5h20m52s | 6m1s | 4h17m8s | 6m58s
1·10^6 | 100 | - | 10m1s | - | 12m42s | - | 14m48s
1·10^6 | 200 | - | 39m16s | - | 49m56s | - | 59m22s

SLIDE 25

Related Work

Ref | Crypto | Solver | n (max) | d (max) | Iterative | Bottleneck
[1] | HE | Newton | 50K | 22 | Local (40) | Computation
[2] | HE+GC | Cholesky | 10M | 14 | No | Both
[3] | SS | CGD | 10K | 10 | Network (10) | Network
* (this work) | SS+GC | CGD | 1M | 500 | Local (20) | Computation
[4] | HE | GD-VWT | 97 | 8 | Local (4) | Computation
[5] | SS | SGD | 1M | 784 | Network (100-1000) | Network

[1] Hall et al. (2011). Secure multiple linear regression based on homomorphic encryption. Journal of Official Statistics.
[2] Nikolaenko et al. (2013). Privacy-preserving ridge regression on hundreds of millions of records. In Security and Privacy (SP).
[3] Bogdanov et al. (2016). Rmind: a tool for cryptographically secure statistical analysis. IEEE Transactions on Dependable and Secure Computing.
[4] Esperanca et al. (2017). Encrypted Accelerated Least Squares Regression. In AISTATS.
[5] Mohassel et al. (2017). SecureML: A System for Scalable Privacy-Preserving Machine Learning. In Security and Privacy (SP).

SLIDE 26

Linear Regression - Conclusion

  • Full system is accurate and fast, available as open source
  • Scalability requires hybrid MPC protocols and non-trivial engineering
  • Robust fixed-point CGD inside GC has many other applications
  • Security against malicious adversaries
  • Classification with quadratic loss
  • Kernel ridge regression
  • Differential privacy on the covariance / at the output
  • Models without a closed-form solution (eg. logistic regression, DNN)
  • Library of re-usable ML components, complete data science pipeline

Summary / Extensions / Future Work

SLIDE 27

In This Talk…

Part I: Privacy-Preserving Distributed Linear Regression on High-Dimensional Data (PETS 2017, with Adria Gascon, Phillipp Schoppmann, Mariana Raykova, Jack Doerner, Samee Zahur, and David Evans)

Part II: Private Nearest Neighbors Classification in Federated Databases (Preprint, with Adria Gascon and Phillipp Schoppmann)

SLIDE 28

Document Classification - Overview

[Diagram: a client with a query document "?" interacts through a private computation with a federated database held by Parties 1-3.]

Setup:

  • Federated database held by multiple (untrusting) parties
  • Database and the client's document should be kept private
  • k-NN classification with TF-IDF features and cosine similarity

Contributions:

  • Multi-party computational DP protocol
  • DP computation of IDFs
  • MPC protocol for sparse inner products
  • Privacy against arbitrary collusions
SLIDE 29

Document Classification with Nearest Neighbors

Each document d is represented as ψ_d ∈ ℝ^{|V|}, with ψ_d(v) = tf_d(v) · idf_Z(v) and idf_Z(v) ≈ log(|Z| / |Z_v|), where Z is the document dataset, Z_v the documents in Z containing term v, and V the vocabulary.

score(d, x) = ⟨ψ_d, ψ_x⟩ / (‖ψ_d‖ ‖ψ_x‖)

  • 1. For each x in Z compute the score
  • 2. Label d by majority vote over the top k scores
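A plaintext sketch of this classification functionality, i.e. what the secure protocol is meant to compute; the tokenizer, labels, and helper names are illustrative and not the paper's code.

```python
import math
from collections import Counter

def tf(doc):
    # Term frequencies of a whitespace-tokenized document.
    return Counter(doc.lower().split())

def idf(corpus, vocab):
    # idf_Z(v) ~ log(|Z| / |Z_v|), as defined on the slide.
    return {v: math.log(len(corpus) / max(1, sum(v in tf(x) for x in corpus)))
            for v in vocab}

def score(d, x, idfs):
    # Cosine similarity between the TF-IDF vectors of documents d and x.
    pd = {v: c * idfs.get(v, 0.0) for v, c in tf(d).items()}
    px = {v: c * idfs.get(v, 0.0) for v, c in tf(x).items()}
    dot = sum(w * px.get(v, 0.0) for v, w in pd.items())
    norm = math.sqrt(sum(w * w for w in pd.values())) * math.sqrt(sum(w * w for w in px.values()))
    return dot / norm if norm else 0.0

def knn_label(d, corpus, labels, k, idfs):
    # 1. Score d against every document in the corpus; 2. majority vote over the top k.
    top = sorted(range(len(corpus)), key=lambda i: score(d, corpus[i], idfs), reverse=True)[:k]
    return Counter(labels[i] for i in top).most_common(1)[0][0]

corpus = ["private machine learning", "secure multiparty computation", "machine learning models leak"]
labels = ["ml", "crypto", "ml"]
vocab = {w for doc in corpus for w in doc.split()}
print(knn_label("do machine learning models leak data", corpus, labels, k=2, idfs=idf(corpus, vocab)))
```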
SLIDE 30

Secret Sharing Baseline

[Diagram: the plaintext TF-IDF matrix for Z is split into additive shares held by Party 1 and Party 2 (the owners of Z); the client holds the plaintext TF vector for its document d; vector aggregation and top-k selection run in standard MPC (e.g. SPDZ).]

Pros: shares can be pre-computed; scores reduce to an inner product protocol
Cons: additive shares destroy sparsity (see the illustration below)
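A short illustration of the "Cons" above (a sketch with an illustrative modulus): additively sharing a sparse vector produces uniformly random, hence dense, shares.

```python
import numpy as np

rng = np.random.default_rng(3)
Q = 2**16                                   # illustrative modulus for the shares
v = np.zeros(10, dtype=np.int64)
v[3] = 7                                    # a sparse TF-IDF-style vector
share1 = rng.integers(0, Q, size=10)        # uniformly random mask held by Party 1
share2 = (v - share1) % Q                   # Party 2's share
assert np.array_equal((share1 + share2) % Q, v)
print(np.count_nonzero(share1), np.count_nonzero(share2))  # both dense with high probability
```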

SLIDE 31

Sparse Protocol

1. Compute IDFs on dataset Z using differential privacy

  • Implement the Laplace and Exponential mechanisms inside an MPC protocol (e.g. SPDZ). Yields Computational Differential Privacy guarantees.

2. Use a custom sparse matrix-vector multiplication protocol

  • Run between the client and each data provider
  • Produces arithmetic shares as output

3. Aggregate shares to get scores and select the top k

  • Same as in the baseline protocol
SLIDE 32

Computing IDFs with Differential Privacy

Algorithm 1: DP IDFs
Input (public): n, V, c₀, L, ε₀
Input (private): counts {|Zᵢ|_v}_{v∈V} for i ∈ [n]
Output: privatized values {c̃_v}_{v∈V}

  foreach v ∈ V: compute c_v = Σ_{i=1}^{n} |Zᵢ|_v
  for ℓ = 1, …, L:
    sample v ∈ V with probability ∝ exp(ε₀ c_v)
    sample η from Lap(1/ε₀)
    release c̃_v = c_v + η
    remove v from V
  for each remaining v ∈ V: release c̃_v = c₀

Theorem 2. For any ε₀ ∈ (0, 0.9] and δ ∈ [0, 1], Algorithm 1 is (ε, δ)-DP with
  ε = min{ 2Lε₀, 2Lε₀² + √(4Lε₀² log(1/δ)) }.

Theorem 3. Let c₀ = Θ(√m). If m is large enough, then with high probability
  ‖idf − ĩdf‖₁ / ‖idf‖₁ ≤ Õ( (L/|V|) · 1/(ε₀ m) + (1 − L/|V|) · log(m) ).

(Diagram: a table of terms t with counts c_t; in each of the L rounds: 1. sample a term with probability ∝ exp(ε c_t / 2L); 2. reveal c̃_t = c_t + Lap(2L/ε); terms never selected default to c₀, i.e. IDF ≈ IDF_max.)
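A plaintext sketch of Algorithm 1 (in the paper this logic runs inside the MPC protocol, yielding computational DP); the RNG handling, function names, and example inputs are assumptions made for illustration.

```python
import numpy as np

def dp_idfs(counts, L, eps0, c0, rng):
    """Sketch of Algorithm 1: counts maps each term v to its aggregated count c_v."""
    counts = dict(counts)      # work on a copy; terms are removed once selected
    released = {}
    for _ in range(L):
        terms = list(counts)
        c = np.array([counts[t] for t in terms], dtype=float)
        # Exponential mechanism: pick a high-count term with probability proportional to exp(eps0 * c_v).
        p = np.exp(eps0 * (c - c.max()))
        p /= p.sum()
        v = terms[rng.choice(len(terms), p=p)]
        # Laplace mechanism: release the selected count with Lap(1/eps0) noise.
        released[v] = counts[v] + rng.laplace(scale=1.0 / eps0)
        del counts[v]
    for v in counts:           # every term never selected gets the default value c0
        released[v] = c0
    return released

rng = np.random.default_rng(0)
print(dp_idfs({"the": 120, "privacy": 30, "mpc": 12, "zebra": 1}, L=2, eps0=0.1, c0=1, rng=rng))
```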

SLIDE 33

Private Sparse Multiplication

Party 1's input: A ∈ Z_q^{l×m}. Party 2's input: B ∈ Z_q^{m×n}.

  • 1. Party 1 computes I^Col_A = {i | Col_i(A) ≠ 0} and broadcasts l_A = |I^Col_A|; Party 2 computes I^Row_B = {j | Row_j(B) ≠ 0} and broadcasts l_B = |I^Row_B|.
  • 2. Inside MPC (F_PERM), on inputs I^Col_A and I^Row_B: compute I^Col_A ∩ I^Row_B, then choose a random pair of permutations π1, π2 of {1, …, l_A + l_B} such that for all k1 ∈ {1, …, l_A}, k2 ∈ {1, …, l_B}: I^Col_A[k1] = I^Row_B[k2] ⟺ π1(k1) = π2(k2). Output π1 to Party 1 and π2 to Party 2.
  • 3. Party 1 sets Â ← 0^{l×(l_A+l_B)}; for i = 1 to l_A: i′ ← I^Col_A[i], Col_i(Â) ← Col_{i′}(A); then Ã ← permuteCols(Â, π1). Party 2 sets B̂ ← 0^{(l_A+l_B)×n}; for j = 1 to l_B: j′ ← I^Row_B[j], Row_j(B̂) ← Row_{j′}(B); then B̃ ← permuteRows(B̂, π2).
  • 4. Inside MPC (F_MULT), on inputs Ã and B̃: choose random C1, C2 ∈ Z_q^{l×n} such that C1 + C2 = Ã · B̃, and output C1 to Party 1 and C2 to Party 2.

  • Idea: Reduce sparse multiplication to non-sparse multiplication
  • How: Find common non-zero coefficients and restrict to these coordinates
  • In MPC: Private set intersection
  • Leakage: Upper bound on number of non-zeros
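A plaintext sketch of the underlying algebraic reduction (what the protocol computes; the private set intersection, padding, and permutations that hide the index sets are omitted here, and the toy matrices are illustrative):

```python
import numpy as np

def sparse_matmul_via_support(A, B):
    """A @ B computed only over the common non-zero support.

    A @ B = sum_k Col_k(A) * Row_k(B), and a term contributes only if both
    Col_k(A) and Row_k(B) are non-zero, so restricting to the intersection
    of the two support sets preserves the result.  The MPC protocol computes
    this intersection privately and hides it behind padding to size l_A + l_B
    and random permutations; here we only show the algebraic reduction.
    """
    cols_A = {k for k in range(A.shape[1]) if np.any(A[:, k])}
    rows_B = {k for k in range(B.shape[0]) if np.any(B[k, :])}
    common = sorted(cols_A & rows_B)
    return A[:, common] @ B[common, :]

rng = np.random.default_rng(2)
A = rng.integers(0, 3, size=(4, 10)) * (rng.random((4, 10)) < 0.2)
B = rng.integers(0, 3, size=(10, 5)) * (rng.random((10, 5)) < 0.2)
assert np.array_equal(sparse_matmul_via_support(A, B), A @ B)
```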

SLIDE 34

Illustrative Experiments

Speed (vs. sparsity) Accuracy (vs. privacy)

SLIDE 35

Document Classification - Conclusion

  • Non-parametric models are challenging from the privacy point of view
  • Changes in privacy assumptions enable different solutions
  • Protocols with different speed/privacy/accuracy trade-offs
  • Sparse matrix-vector multiplication is an important primitive for PMPML
  • Better DP algorithms for feature extraction
  • Other features instead of TF-IDF
  • Full open source implementation

Conclusions / Future Work

SLIDE 36

Take Home Points

  • Re-visiting basic ML algorithms from an MPC+DP perspective yields important insights for tackling more complex problems
  • ML can motivate the development of new MPC primitives (e.g. linear algebra)
  • Rich toolbox, plenty of unexplored combinations
  • Trade-offs: privacy/speed/accuracy
  • Genuine interdisciplinary effort