Patterns and Packages in PostgreSQL for Privacy Preservation

Atif Rahman, mantaq10. 15 November 2019, Sydney. www.2019.pgdu.org


SLIDE 1

Patterns and Packages in PostgreSQL for Privacy Preservation

mantaq10

www.2019.pgdu.org

Atif Rahman

15 November 2019, Sydney

PostgreSQL, Planning, PostGIS, Partitioning, PaaS, Permissions and now….

SLIDE 2

I was like her
According to Pearson-R
We were both outliers

  • Data Engineering
  • ML Pipelines
  • Herding Cats

SLIDE 3

NDB Breach Notifications, April 2018 – March 2019: 964

Cause of breach: Attack 60%, Error 35%, Others (remainder)

Human Error by sector: Healthcare 55%, Financial 41%

OAIC Report 2019

SLIDE 4

OAIC Report 2019

SLIDE 5

You can have security but not necessarily privacy

SLIDE 6

Security: Protection (Binary)
Privacy: Usage (Contextual)

ISO/IEC 29100:2011: Privacy Framework

SLIDE 7

Privacy Guarantees

1. De-Identification (Record Keys (PK, FK, SK))
2. Re-Identification (Brute Force & Decryption)
3. Re-Identification (Record Linkage & Math)
4. Ethical Computing (Permissibility & Compliance)

“Homomorphic encryption schemes are often repackaging vulnerabilities (practical chosen-ciphertext attacks) as features.” – The Internet

[Diagram: x → f(x), with G and G⁻¹] Loss-less Functions vs Lossy Functions

PII and Attribute Augmentation

SLIDE 8

Record Linkage

“87% of the U.S. population is uniquely identified by date of birth, gender, postal code.”
Latanya Sweeney (k-anonymity)

“Decreasing the precision of the data, or perturbing it statistically, makes re-identification gradually harder at a substantial cost to utility.”
Chris Culnane, Benjamin Rubinstein, Vanessa Teague @UniMelb

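To make the record-linkage risk concrete, here is a minimal Python sketch that measures how many records are unique on the three quasi-identifiers named in the Sweeney quote (date of birth, gender, postal code). The records and function names are invented for illustration and are not from the talk.

```python
from collections import Counter

# Hypothetical records: (date_of_birth, gender, postal_code). Illustrative only.
records = [
    ("1962-06-12", "F", "63456"),
    ("1971-10-18", "F", "54367"),
    ("1971-10-18", "F", "54367"),
    ("1980-01-01", "M", "63456"),
]

def unique_fraction(rows):
    """Fraction of rows whose quasi-identifier combination occurs exactly once."""
    counts = Counter(rows)
    unique_rows = sum(1 for row in rows if counts[row] == 1)
    return unique_rows / len(rows)

def smallest_group(rows):
    """The k in k-anonymity: size of the smallest group sharing the same values."""
    return min(Counter(rows).values())

print(unique_fraction(records))  # 0.5
print(smallest_group(records))   # 1
```

A record that is alone in its group can be linked to any outside dataset carrying the same three attributes, even when names and keys have been removed.
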

SLIDE 9

Privacy vs Utility Trade-off

  • AN: Anonymisation
  • DP: Differential Privacy
  • HE: Homomorphic Encryption
  • SM: Secure Multiparty Computing

[Chart: AN, DP, HE and SM positioned by Privacy Guarantee against Better Utility, across maturity bands labelled Bleeding Edge, Cutting Edge and Established]

SLIDE 10

  • 1. AN: (Pseudo)Anonymisation

REPLACEMENT | SUPPRESSION | PERTURBATION | GENERALISATION

Original

ID   NAME          DOB         EMPLOYER  ZIPCODE  FK_SHOP
101  SARAH CONNOR  12-06-1962  JB Vet    63456    12
112  PAMELA LANDY  18-10-1971  FBI       54367    45

REPLACEMENT (reversible or random)

ID   NAME        DOB         EMPLOYER  ZIPCODE  FK_SHOP
101  MIKE OBAMA  13-07-1982  JB Vet    63456    12
112  BRUCE LEE   19-11-1991  FBI       54367    45

(PG String Functions) (PGAnonymizer)
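
A minimal Python sketch of replacement, assuming a keyed hash is used to pick the substitute value. The replacement pool, the secret key and the function name are all hypothetical; the slide does not say how the replacement values are chosen.

```python
import hashlib
import hmac

# Hypothetical replacement pool and secret key; both are assumptions for this sketch.
FAKE_NAMES = ["MIKE OBAMA", "BRUCE LEE", "ELLEN RIPLEY", "JOHN MCCLANE"]
SECRET_KEY = b"rotate-me-and-keep-me-out-of-the-database"

def pseudonym(real_name: str) -> str:
    """Map a real name to a fake one, deterministically for a given key."""
    digest = hmac.new(SECRET_KEY, real_name.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FAKE_NAMES)
    return FAKE_NAMES[index]

row = {"id": 101, "name": "SARAH CONNOR", "employer": "JB Vet", "fk_shop": 12}
masked = {**row, "name": pseudonym(row["name"])}
print(masked)
```

Because the mapping is deterministic, the same person receives the same pseudonym in every table, so joins keep working. With a pool this small, different people will collide on the same fake name; reversing the mapping would need a separately protected lookup table.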

SLIDE 11

  • 1. AN: (Pseudo)Anonymisation

REPLACEMENT | SUPPRESSION | PERTURBATION | GENERALISATION

Original

ID   NAME          DOB         EMPLOYER  ZIPCODE  FK_SHOP
101  SARAH CONNOR  12-06-1962  JB Vet    63456    12
112  PAMELA LANDY  18-10-1971  FBI       54367    45

SUPPRESSION (Wildcard or Removal)

ID   NAME        EMPLOYER  ZIPCODE  FK_SHOP
101  M*** ****A  JB Vet    63456    12
112  B**** **E   FBI       54367    45

  • 18 PII Attributes

(PG String Functions) (PGAnonymizer)
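
A minimal Python sketch of both suppression styles on this slide: wildcarding a value (as in "M*** ****A") and removing a column outright (as with DOB). The function names and the row layout are assumptions made for illustration.

```python
def mask_name(name: str) -> str:
    """Keep the first and last characters, star out the rest, preserve spaces."""
    if len(name) <= 2:
        return "*" * len(name)
    middle = "".join(" " if ch == " " else "*" for ch in name[1:-1])
    return name[0] + middle + name[-1]

def suppress(row: dict, drop: tuple = ("dob",)) -> dict:
    """Remove the listed columns entirely and wildcard the name column."""
    kept = {k: v for k, v in row.items() if k not in drop}
    kept["name"] = mask_name(kept["name"])
    return kept

row = {"id": 101, "name": "MIKE OBAMA", "dob": "13-07-1982", "zipcode": "63456"}
print(suppress(row))
# {'id': 101, 'name': 'M*** ****A', 'zipcode': '63456'}
```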

SLIDE 12

  • 1. AN: (Pseudo)Anonymisation

REPLACEMENT | SUPPRESSION | PERTURBATION | GENERALISATION

Original

ID   NAME          DOB         EMPLOYER  ZIPCODE  FK_SHOP
101  SARAH CONNOR  12-06-1962  JB Vet    63456    12
112  PAMELA LANDY  18-10-1971  FBI       54367    45

PERTURBATION

ID   NAME          DOB         EMPLOYER  ZIPCODE  FK_SHOP
101  SARAH CONNOR  12-07-1958  JB Vet    64532    12
112  PAMELA LANDY  18-11-1973  FBI       57843    45

(Additive Noise) (PDF) (Data Imputation) (PGAnonymizer) (Google DP) (Uber DP)
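
A minimal Python sketch of perturbation by additive noise, assuming a bounded random shift for dates and Gaussian noise for numeric values. The noise bounds, the seed and the `spend` attribute are hypothetical; the slide does not state which distributions were used.

```python
import random
from datetime import date, timedelta

def perturb_date(d: date, max_days: int, rng: random.Random) -> date:
    """Shift a date by a uniformly random number of days in [-max_days, max_days]."""
    return d + timedelta(days=rng.randint(-max_days, max_days))

def perturb_number(value: float, scale: float, rng: random.Random) -> float:
    """Add zero-mean Gaussian noise with standard deviation `scale`."""
    return value + rng.gauss(0.0, scale)

rng = random.Random(2019)  # fixed seed only so the sketch is repeatable
dob = date(1962, 6, 12)
noisy_dob = perturb_date(dob, max_days=5 * 365, rng=rng)
noisy_spend = perturb_number(120.0, scale=10.0, rng=rng)  # hypothetical numeric column
print(noisy_dob, round(noisy_spend, 2))
```

Wider noise makes a perturbed value harder to link back to a person and less useful for analysis, which is the trade-off the talk keeps returning to.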

SLIDE 13

  • 1. AN: (Pseudo)Anonymisation

REPLACEMENT | SUPPRESSION | PERTURBATION | GENERALISATION

Original

ID   NAME          DOB         EMPLOYER  ZIPCODE  FK_SHOP
101  SARAH CONNOR  12-06-1962  JB Vet    63456    12
112  PAMELA LANDY  18-10-1971  FBI       54367    45

GENERALISATION

ID   NAME          DOB    EMPLOYER  𝝉_ZIPCODE  FK_SHOP
101  SARAH CONNOR  1960s  JB Vet    0.37       12
112  PAMELA LANDY  1970s  FBI       0.99       45

(K-Anonymity or Masking) (PGAnonymizer) (PG Aggregate Functions)
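
A minimal Python sketch of generalisation: dates of birth are coarsened to decades as in the slide's DOB column ("1960s"), then the smallest resulting group is measured. The postcode prefix truncation is a swapped-in technique chosen for simplicity, not the 𝝉_ZIPCODE transform shown on the slide, and the third row is invented.

```python
from collections import Counter

def to_decade(dob: str) -> str:
    """Generalise a DD-MM-YYYY date of birth to its decade, e.g. '1960s'."""
    year = int(dob.split("-")[2])
    return f"{year // 10 * 10}s"

def zip_prefix(zipcode: str, keep: int = 3) -> str:
    """Generalise a postcode by keeping a prefix and starring the rest."""
    return zipcode[:keep] + "*" * (len(zipcode) - keep)

# The first two rows come from the slide's table; the third is hypothetical.
rows = [("12-06-1962", "63456"), ("18-10-1971", "54367"), ("02-03-1968", "63499")]
generalised = [(to_decade(dob), zip_prefix(z)) for dob, z in rows]
group_sizes = Counter(generalised)

print(generalised)
# [('1960s', '634**'), ('1970s', '543**'), ('1960s', '634**')]
print(min(group_sizes.values()))  # 1
```

The smallest group size is the k that this release achieves. Here one record is still alone in its group, so more coarsening or suppression would be needed before k exceeds 1.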

SLIDE 14

Privacy vs Utility Trade-off

  • AN: Anonymisation
  • DP: Differential Privacy
  • HE: Homomorphic Encryption
  • SM: Secure Multiparty Computing

[Chart: AN, DP, HE and SM positioned by Privacy Guarantee against Better Utility, across maturity bands labelled Bleeding Edge, Cutting Edge and Established]

SLIDE 15

Differential Privacy

[Diagram: a database with Ned in it passes through perturbations (noise) to become a private database, in which it is no longer certain whether Ned is there. The Oracle can still query its statistical properties.]

  • Works on the data itself, not on the management environment
  • Considerably faster than encryption techniques
  • Quantum Safe (ish)
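
A minimal Python sketch of the idea, using the Laplace mechanism on a counting query. The data, the epsilon value and the function names are assumptions for illustration. Production libraries also handle floating-point and sampling subtleties that this sketch ignores.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample, drawn as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(values, predicate, epsilon: float, rng: random.Random) -> float:
    """Counting query: adding or removing one person changes the count by at
    most 1 (sensitivity 1), so the Laplace noise scale is 1 / epsilon."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(7)             # seeded only so the sketch is repeatable
ages = [34, 41, 29, 57, 63, 38]    # hypothetical database, with or without Ned
print(private_count(ages, lambda a: a > 40, epsilon=0.5, rng=rng))
```

The answer stays useful in aggregate, yet its distribution barely changes if any single row is added or removed, which is why the Oracle cannot tell whether Ned is in the database.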

SLIDE 16

Differential Privacy on PostgreSQL

https://github.com/google/differential-privacy

  • Count
  • Sum
  • Mean
  • Variance
  • Standard deviation
  • Order statistics (including min, max, and median)
  • Laplace Functions for UDFs

Privacy Loss

  • Epsilon & Delta
  • Risk Score for every attribute used for a particular person
  • Risk Score for total number of records with similar values
  • (rule of thumb) k = 11
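
A minimal Python sketch of tracking privacy loss, assuming basic sequential composition, under which the epsilon and delta values of successive queries add up. The class name and the limits are hypothetical and are not part of the library linked on the slide.

```python
class PrivacyBudget:
    """Tracks cumulative (epsilon, delta) under basic sequential composition."""

    def __init__(self, epsilon_limit: float, delta_limit: float = 0.0):
        self.epsilon_limit = epsilon_limit
        self.delta_limit = delta_limit
        self.epsilon_spent = 0.0
        self.delta_spent = 0.0

    def charge(self, epsilon: float, delta: float = 0.0) -> bool:
        """Record a query's cost; refuse it if either limit would be exceeded."""
        if (self.epsilon_spent + epsilon > self.epsilon_limit
                or self.delta_spent + delta > self.delta_limit):
            return False
        self.epsilon_spent += epsilon
        self.delta_spent += delta
        return True

budget = PrivacyBudget(epsilon_limit=1.0)
print(budget.charge(0.5))  # True
print(budget.charge(0.5))  # True
print(budget.charge(0.5))  # False
```

Once the budget is spent, further queries against the same data have to be refused, since each answer leaks a little more about the individuals in it.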

SLIDE 17

HE: Homomorphic Encryption

Ability to apply computations on encrypted data!

Trade-Offs: Malleable, Performance, Operators

Schemes: BFV, BGV, CKKS

Categories: Full HE, Partial HE

Libraries: Microsoft SEAL, PALISADE, HELib, HEAAN, TFHE
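
A minimal Python sketch of computing on encrypted data, using a toy Paillier cryptosystem. Paillier is used here only because it fits in a few lines; it is a partially (additively) homomorphic scheme and is not one of the schemes or libraries listed on the slide. The primes are far too small to be secure and the helper names are hypothetical.

```python
import math
import random

def keygen(p: int, q: int):
    """Toy Paillier key pair from two primes. Returns (n, (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int, rng: random.Random) -> int:
    """c = (n + 1)^m * r^n mod n^2, with r random and coprime to n."""
    n_sq = n * n
    while True:
        r = rng.randrange(1, n)  # use a cryptographic RNG in real code
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(n: int, private_key, c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private_key
    return ((pow(c, lam, n * n) - 1) // n * mu) % n

def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

n, private_key = keygen(293, 433)
rng = random.Random(0)
c1 = encrypt(n, 1200, rng)
c2 = encrypt(n, 345, rng)
print(decrypt(n, private_key, add_encrypted(n, c1, c2)))  # 1545
```

The party performing `add_encrypted` never sees 1200 or 345. The same property means anyone can alter a ciphertext in a meaningful way, which is the malleability trade-off named on the slide.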

SLIDE 18

Privacy vs Utility Trade-off

  • AN: Anonymisation
  • DP: Differential Privacy
  • HE: Homomorphic Encryption
  • SM: Secure Multiparty Computing

[Chart: AN, DP, HE and SM positioned by Privacy Guarantee against Better Utility, across maturity bands labelled Bleeding Edge, Cutting Edge and Established]

SLIDE 19

Secure Multi-party Computing

Parties: A, B, C, D

X1 = A_pay + 876532
X2 = B_pay + X1
X3 = C_pay + X2
X4 = D_pay + X3
(X4 - 876532) / 4 = Avg_pay

K-Anonymity
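
A minimal Python sketch of the ring-style secure average from this slide: the first party starts from a random mask, each party adds its own pay to the running total, and the first party removes the mask before dividing. The pay figures and the function name are invented for illustration.

```python
import random

def secure_average(pays, rng: random.Random) -> float:
    """Ring-based secure sum: each party only ever sees a masked running total."""
    mask = rng.randrange(1, 10**9)       # known only to the first party
    running = mask
    for pay in pays:                     # each party adds its pay and passes it on
        running += pay
    return (running - mask) / len(pays)  # first party removes the mask

# Hypothetical salaries for parties A, B, C and D.
pays = {"A": 82000, "B": 91000, "C": 76000, "D": 103000}
print(secure_average(list(pays.values()), random.Random(876532)))  # 88000.0
```

No party learns another's pay from the total it receives. The scheme does rely on the parties not colluding: the two neighbours of a party can subtract what one sent from what the other received and recover that party's value.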

SLIDE 20

Privacy Guarantees

1. De-Identification (Record Keys (PK, FK, SK))
2. Re-Identification (Brute Force & Decryption)
3. Re-Identification (Record Linkage & Math)
4. Ethical Computing (Permissibility & Compliance)

“Homomorphic encryption schemes are often repackaging vulnerabilities (practical chosen-ciphertext attacks) as features.” – The Internet

[Diagram: x → f(x), with G and G⁻¹] Loss-less Functions vs Lossy Functions

PII and Attribute Augmentation

SLIDE 21

Typical Data Pipelines

Sources → Landing → Processing → Serving, with a Unified Key Management System

Privacy Gates:

1. De-Identification (Record Keys (PK, FK, SK))
2. Re-Identification (Brute Force & Decryption)
3. Re-Identification (Record Linkage)
4. Ethical Computing (Permissibility & Compliance)

[Diagram: privacy gates 1 to 4 marked along the pipeline stages]

SLIDE 22

Emerging Data Architecture (Data Fabrics) [HTAP = OLTP + OLAP]

Sources, Processing & Serving, Persistence, with a Unified Key Management System

  • De-Identification (Record Keys (PK, FK, SK))
  • Re-Identification (Brute Force & Decryption)
  • Re-Identification (Record Linkage)
  • Ethical Computing (Permissibility & Compliance)

*Gaps to Close:

  • Encryption Performance
  • Developer UX
  • Admin Tooling
  • Extensions!

SLIDE 23

Key Takeaways

  • Securing your database doesn’t guarantee data privacy.
  • There are trade-offs between privacy and utility.
  • You can provision privacy controls within PostgreSQL.
  • PostgreSQL fits emerging (data) architecture patterns.
  • Atif is pledging to build an extension, he needs my help!

SLIDE 24

Questions
