SLIDE 1 Privacy Preserved Data Augmentation using Enterprise Data Fabric
Final blow before Tea!
I was like her according to her; We were both outliers
Atif Rahman
Zetaris, www.zetaris.com
Twitter: @mantaq10
SLIDE 2 Data Exchanged (without consent)
- GPS
- HIV Status
- Email addresses
Weapon: Contract
Response: Excuse
Exposure: (Potential) exposure of marginalized people.
SLIDE 3 Data Breach:
- Email Addresses
- Username & Passwords
Exposure:
Response:
- No clear Apologies
- (Delayed) Corrective Actions
Weapon: Contract
SLIDE 4 Data Breach:
- Names
- Loyalty data
- Email addresses
- Physical addresses
- DOB
- Credit Card last 4 digits
Exposure:
Response:
- Denial
- Fake Solutions
- 8 months before first action
SLIDE 5 Paper contracts are still the most common weapon organizations use to get away with it. As regulations mature, the onus to be more effective at privacy preservation will fall on service providers.
SLIDE 6 From the exhibition: "M. Hulot, the protagonist in Jacques Tati's 1967 film Playtime, is continually frustrated by the endless repetition of office cubicles."
Enterprises have a different data landscape than consumer-facing (typically tech) organisations. Enterprises have silos and legacy systems, have to learn to be data driven the hard way, and have divergent forces giving a unique focus on
SLIDE 7 Agenda
- Data Augmentation
- First Principles
- Enterprise Data Fabric
SLIDE 8 Data Augmentation
ORG A Class 1 Class 2 Class 3
SLIDE 9 Data Augmentation
[Figure] Typical Modeling Exercise: ORG A (Class 1, Class 2, Class 3)
[Figure] Modeling after data augmentation: ORG A (Class 1, Class 2, Class 3) plus ORG B and ORG C. Potentially Better.
SLIDE 10 ORG A Class 1 Class 2 Class 3 ORG B ORG C
Content Shared
- Aggregated Data / Insights
- Open Data
- Stratified Sampling
- Synthetic Data
- De-identified / Anonymized
Channels:
- Public Portals
- Private Marketplaces
- In Person Walkthroughs/Handovers
Data Augmentation
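As an illustration of the "Stratified Sampling" option above, here is a minimal sketch of how an organisation might share a small sample that keeps the class proportions of the full dataset. The function name `stratified_sample`, the `label` field and the toy rows are assumptions made for this example, not part of the talk. Sampling alone does not de-identify records, so this would normally be combined with the other options listed.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Return roughly `fraction` of the rows from each stratum (at least one per stratum)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    strata = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))  # sample without replacement
    return sample

# Hypothetical data: 60 / 30 / 10 rows across three classes.
rows = (
    [{"id": i, "label": "Class 1"} for i in range(60)]
    + [{"id": i, "label": "Class 2"} for i in range(60, 90)]
    + [{"id": i, "label": "Class 3"} for i in range(90, 100)]
)

shared = stratified_sample(rows, key="label", fraction=0.1)

counts = {}
for row in shared:
    counts[row["label"]] = counts.get(row["label"], 0) + 1
print(counts)
```

The shared sample keeps the 6:3:1 class ratio of the original rows, which is the property that makes it useful for a partner's modeling exercise.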
SLIDE 11 Data as an asset
- Easy to copy and spawn
- Does not depreciate or deplete
- Really hard to valuate
- Needs processing to yield value
- Various forms and derivatives
Resolve to First Principles
Data has properties that make it intrinsically hard to ensure privacy preservation. Therefore, we must adhere to first principles to better understand the problem statement first.
SLIDE 12 The Five Safes
- Safe Data
- Safe People
- Safe Setting
- Safe Project
- Safe Output
Great Resources
- ACS Data Sharing Frameworks
- The De-Identification Decision Making Framework
SLIDE 13
First Principles
- Safe Data: Encryption
- Safe People: Authentication & Authorisation
- Safe Setting: Environment for Data Controllers & Processors
- Safe Project: Audit Trail, Lineage and Access & Query Logs
- Safe Output: Linkage Problem
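The linkage problem listed against Safe Output arises when released records can be joined with other data on shared attributes (quasi-identifiers). A minimal sketch of one common check, assuming hypothetical column names and toy records: count how many released records share each quasi-identifier combination. Any combination that occurs only once points at a single person and is therefore linkable.

```python
from collections import Counter

def smallest_group(records, quasi_identifiers):
    """Return the smallest group size (the k in k-anonymity) and the unique combinations."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    unique = [key for key, size in group_sizes.items() if size == 1]
    return min(group_sizes.values()), unique

# Hypothetical release: names removed, quasi-identifiers still present.
released = [
    {"postcode": "3000", "birth_year": 1980, "sex": "F", "diagnosis": "A"},
    {"postcode": "3000", "birth_year": 1980, "sex": "F", "diagnosis": "B"},
    {"postcode": "3056", "birth_year": 1975, "sex": "M", "diagnosis": "C"},
]

k, unique = smallest_group(released, ["postcode", "birth_year", "sex"])
print(k, unique)
```

Here the third record is the only one with its postcode, birth year and sex, so anyone holding another dataset with those three fields could attach a name to it.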
SLIDE 14
First Principles
SLIDE 15
Safe Data – (Encryption)
- Data at Rest: Standard Encryption
- Data in Transit: Secure the Pipe
- Data for Compute: Homomorphic Encryption
SLIDE 16 Homomorphic Encryption
- Partial Homomorphic Encryption (PHE): Addition/Multiplication
- Somewhat Homomorphic Encryption (SWHE): Low Order Polynomials
- Full Homomorphic Encryption (FHE): Eval of Arbitrary Functions
More General towards FHE, Less Costly towards PHE
See: Data Analytics without seeing the data (Max Ott, YOW Data 2016)
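To make the partial homomorphic case concrete, here is a minimal sketch of the Paillier scheme, which is additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The talk does not name a specific scheme, so Paillier is a swapped-in example. The primes are toy-sized and the function names are assumptions for illustration; this is not secure or production code. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Homomorphic addition: multiply ciphertexts modulo n^2."""
    return (c1 * c2) % (n * n)

# Two parties encrypt their values; a third adds them without seeing either.
n, priv = keygen(293, 433)  # toy primes, far too small for real use
c_a = encrypt(n, 20)
c_b = encrypt(n, 22)
total = decrypt(n, priv, add_encrypted(n, c_a, c_b))
print(total)
```

Only addition (and multiplication by a known constant) is available here, which is what makes the scheme partial. It is also why it is much cheaper than full homomorphic encryption.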
SLIDE 17
First Principles
SLIDE 18
Safe Setting - Confidential Computing
Trusted Execution Environments (Safe Data in Safe Setting)
- Microsoft Azure Confidential Computing
- Google Cloud Platform: Asylo Open Source Framework
Confidential Computing at the Software layer?
SLIDE 19
First Principles
SLIDE 20
SLIDE 21
Alice Bob
SLIDE 22
Safe People – (System Span)
SLIDE 23
Safe People – (System Span)
SLIDE 24
First Principles
SLIDE 25
Safe People – (System Span)
SLIDE 26 Safe People – (System Span)
Expanding the Span of control
SLIDE 27
First Principles
SLIDE 28
Safe Project – Audit Trails & Lineage
SLIDE 29 Safe Project – Audit Trails & Lineage
?
Data in the wild
It's still very hard within enterprises to have point-to-point tracking of data lineage and processing. The problem is compounded when data leaves the span of vision.
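One building block for an audit trail is a tamper-evident access log. A minimal sketch, assuming a hypothetical entry layout (`actor`, `action`, `dataset`): each entry stores the hash of the previous one, so any later edit to an earlier entry breaks the chain. This only covers data that stays within the logged environment. It does not solve lineage once data is "in the wild".

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry

def _digest(body):
    """Hash an entry body deterministically."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def append_entry(log, actor, action, dataset):
    """Append an access record that is chained to the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"actor": actor, "action": action, "dataset": dataset, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})

def verify(log):
    """Return True if no entry has been altered, removed or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {k: entry[k] for k in ("actor", "action", "dataset", "prev")}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, "alice", "SELECT", "org_a.customers")
append_entry(log, "bob", "EXPORT", "org_a.customers")
ok_before = verify(log)

log[0]["actor"] = "mallory"  # simulate tampering with an old record
ok_after = verify(log)
print(ok_before, ok_after)
```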
SLIDE 30 One Ring to Rule them All?
- Encryption
- Authentication & Authorisation
- Environment for Data Controllers & Processors
- Audit Trail, Lineage and Access & Query Logs
- Linkage Problem
A data landscape must cover all principles of data privacy.
SLIDE 31
Monoliths in the era of Microservices
SLIDE 33 App DB Server DB Server DB Server
SLIDE 34 App DB Server DB Server DB Server DB Caching DB In-Memory DB Streams DB Messaging App App
SLIDE 35 The Enterprise Data Fabric
[Diagram: App, Server and DB nodes]
A unified data layer that is used by both user-facing applications and downstream analytics; a potential holistic Five Safes environment.
SLIDE 36 The Zetaris Enterprise Data Fabric – Location Aware, Usage Aware, People Aware, Privacy Preserved data in a secure environment. Also check out Apache Ignite, Red Hat OpenShift + JBoss Virtualization.
SLIDE 37
SLIDE 38 GDPR Highlights
Data Portability: Right to transfer personal data from one electronic processing system to another.
- Data Fabric: By Design
- Monoliths (e.g. Lakes): Only through Serialization
Erasure: Right to withdraw consent and ask for personal data to be deleted.
- Data Fabric: By Design
- Monoliths (e.g. Lakes): Random writes are not typical
Access: Right to know what's been collected and how it's being processed.
- Data Fabric: By Design
- Monoliths (e.g. Lakes): Limited Purview
Consent: Consumer is informed in 'clear' and plain language. Consent to collect can be withdrawn at any time.
- Data Fabric: By Design
- Monoliths (e.g. Lakes): Hard
SLIDE 39 As data scientists, we are at the forefront of disruption and hold the potential to change things. We are automating decisions in all aspects of society. Yet our work has serious negative implications; we need to educate ourselves on questions around regulations, ethics and impact.
Enjoy the Tribe!