Page 1
Encryption and Anonymization in Hadoop
Sept-28-2015 ApacheCon, Budapest
Current and Future needs
Page 2
Agenda
• Need for data protection
• Encryption and Anonymization
• Current State of Encryption in Hadoop
• Demo
• Future focus
Page 3
Chief Security Architect, Hortonworks; Committer, Apache Ranger and Apache HAWQ
Sr. Director, Enterprise Security, Hortonworks; Committer, Apache Ranger
bosco@apache.org bganesan@apache.org
Page 4
Hadoop Ecosystem: Centralized Security Administration w/ Ranger
• Authentication: Who am I/prove it? (Apache Knox)
• Authorization: What can I do? (access control with Apache Ranger)
• Audit: What did I do? (audit reporting w/ Apache Ranger)
• Data Protection: Can data be encrypted at rest and over the wire? (encryption in Hadoop)
Page 5
Encryption and Anonymization
Page 6
(Retail, Consumer) or HIPAA (Healthcare)
Page 7
Granularity
Ease of implementation
Page 8
[Diagram: Hadoop on top of disk partitions 1..n (/root, /grid0, /grid2 … /gridn), encrypted with dm-crypt]
Why it helps?
• … all data is encrypted
• … and vendor solutions available
Cons
• … data
Page 9
[Diagram: HDFS client, NameNode (NN), Ranger KMS and three DataNodes (DN), each holding blocks A, B, C, D]
Why it helps?
• … levels
• … application, few changes needed
Page 10
[Diagram: HBase, Hive, Oozie, Sqoop and Spark running on HDFS (NN and three DNs holding blocks A, B, C, D)]
Guidelines
• … stored in HDFS
• … to ensure scratch dir is encrypted
• … HDFS, YARN, Oozie
• … should be in EZ
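The guidelines above boil down to checking that every directory a job writes to (for example a Hive scratch directory) sits inside an encryption zone. A minimal sketch, assuming the zone roots have already been collected, for instance from the output of `hdfs crypto -listZones`; the function names and sample paths are hypothetical:

```python
from pathlib import PurePosixPath


def in_encryption_zone(path: str, zone_roots: list[str]) -> bool:
    """Return True if `path` is a zone root or lies underneath one.

    Comparing whole path components (not string prefixes) keeps
    "/secure2" from matching the zone "/secure". Paths are assumed to be
    absolute and already normalized (no ".." segments).
    """
    p = PurePosixPath(path)
    for root in zone_roots:
        r = PurePosixPath(root)
        if p == r or r in p.parents:
            return True
    return False


def unencrypted_dirs(dirs: list[str], zone_roots: list[str]) -> list[str]:
    """List the directories that would be written outside every encryption zone."""
    return [d for d in dirs if not in_encryption_zone(d, zone_roots)]


# Hypothetical zone roots and job directories, for illustration only.
zones = ["/secure", "/apps/hive/warehouse"]
job_dirs = ["/apps/hive/warehouse/sales", "/secure/tmp/hive", "/tmp/hive"]
print(unencrypted_dirs(job_dirs, zones))  # ['/tmp/hive']
```

Any directory the check reports (here a scratch directory under /tmp) would hold intermediate data in clear text even though the source tables are encrypted.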
Page 11
Page 12
Create Encryption Zone (EZ)
Participants: Client; NN, DN; Ranger KMS
1. Create EZ keys.
2. Ranger KMS provides EZ keys.
3. NN marks folder as EZ.
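The zone-creation steps can be modeled in a few lines. This is a toy sketch, not the HDFS or Ranger KMS API: `ToyKMS` and `ToyNameNode` are hypothetical stand-ins (on a real cluster the equivalent steps are `hadoop key create` followed by `hdfs crypto -createZone`). It shows the design point that the NameNode records only the key name for the zone, while the key material stays in the KMS.

```python
import os


class ToyKMS:
    """Hypothetical stand-in for Ranger KMS: stores named EZ keys."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}

    def create_key(self, name: str) -> None:
        # Step 1: create the EZ key; the key material never leaves the KMS.
        if name in self._keys:
            raise ValueError(f"key {name!r} already exists")
        self._keys[name] = os.urandom(32)

    def key_exists(self, name: str) -> bool:
        # Step 2: the KMS gives the NameNode key metadata (here, only existence).
        return name in self._keys


class ToyNameNode:
    """Hypothetical NameNode that remembers which folders are encryption zones."""

    def __init__(self, kms: ToyKMS) -> None:
        self.kms = kms
        self.zones: dict[str, str] = {}  # folder path -> EZ key name

    def create_zone(self, path: str, key_name: str) -> None:
        if not self.kms.key_exists(key_name):
            raise KeyError(f"EZ key {key_name!r} not found in KMS")
        # Step 3: mark the folder as an EZ by recording the key name only.
        self.zones[path] = key_name


kms = ToyKMS()
kms.create_key("ezkey1")
nn = ToyNameNode(kms)
nn.create_zone("/secure", "ezkey1")
print(nn.zones)  # {'/secure': 'ezkey1'}
```

Real HDFS adds checks this sketch omits, such as requiring the target directory to be empty.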
Page 13
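Participants: Client; NN, DN; Ranger KMS
1. Client requests to write to EZ.
2. NN does access check.
3. NN requests a DEK; Ranger KMS creates the DEK (data encryption key) and encrypts it with the EZ key, producing the EDEK.
4. NN receives the EDEK, which is stored with the file.
5. Client receives the EDEK and requests the DEK; Ranger KMS decrypts the EDEK and provides the DEK.
6. Client encrypts data and writes to DN.
7. Send block information to …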
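The write flow is a form of envelope encryption: file data is encrypted with a per-file DEK, and only the EDEK (the DEK encrypted under the EZ key) is stored with the file. The sketch below walks through those steps with hypothetical `ToyKMS` and `write_file` helpers. To stay dependency-free it uses a toy SHA-256 counter-mode keystream in place of a real cipher (HDFS itself uses AES-CTR), so it illustrates the key flow only and is not secure.

```python
import hashlib
import os


def toy_xor_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy SHA-256 counter-mode keystream (stand-in for AES-CTR; NOT secure).

    XOR-ing with the same keystream twice returns the input, so this one
    function both encrypts and decrypts.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class ToyKMS:
    """Hypothetical stand-in for Ranger KMS holding EZ keys."""

    def __init__(self) -> None:
        self._ez_keys: dict[str, bytes] = {}

    def create_ez_key(self, name: str) -> None:
        self._ez_keys[name] = os.urandom(32)

    def generate_edek(self, name: str) -> tuple[bytes, bytes]:
        # Create a fresh DEK and encrypt it with the EZ key; only the EDEK leaves.
        dek = os.urandom(32)
        iv = os.urandom(16)
        return iv, toy_xor_cipher(self._ez_keys[name], iv, dek)

    def decrypt_edek(self, name: str, iv: bytes, edek: bytes) -> bytes:
        # Decrypt the EDEK with the EZ key and hand the DEK back to the client.
        return toy_xor_cipher(self._ez_keys[name], iv, edek)


def write_file(kms: ToyKMS, zone_key: str, plaintext: bytes) -> dict:
    # (NN access check omitted.) NN requests a DEK for the zone key and
    # receives the EDEK, which it stores with the file.
    edek_iv, edek = kms.generate_edek(zone_key)
    # Client sends the EDEK to the KMS and gets the plaintext DEK back.
    dek = kms.decrypt_edek(zone_key, edek_iv, edek)
    # Client encrypts the data with the DEK; only ciphertext goes to the DNs.
    data_iv = os.urandom(16)
    ciphertext = toy_xor_cipher(dek, data_iv, plaintext)
    # Persisted: EDEK and IVs with the file metadata, ciphertext in blocks.
    return {"zone_key": zone_key, "edek_iv": edek_iv, "edek": edek,
            "data_iv": data_iv, "ciphertext": ciphertext}


kms = ToyKMS()
kms.create_ez_key("ezkey1")
stored = write_file(kms, "ezkey1", b"card 4111 1111 1111 1234")
```

Note that the returned record never contains the DEK itself; in this model, as in HDFS transparent encryption, the NameNode and DataNodes only ever hold the EDEK and ciphertext.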
Page 14
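Participants: Client; NN, DN; Ranger KMS
1. Client requests to read from EZ.
2. NN does access check.
3. NN, DN provide data and EDEK.
4. Client receives the EDEK and requests the DEK; Ranger KMS decrypts the EDEK and provides the DEK.
5. Client uses the DEK to read file data.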
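The read path reverses the envelope: after the access check, the client trades the EDEK for the DEK and decrypts locally. A toy sketch under the same assumptions as a dependency-free model: `ToyKMS`, `read_file` and the per-file `readers` set (standing in for HDFS permissions) are hypothetical, and the SHA-256 keystream replaces AES-CTR and is not secure.

```python
import hashlib
import os


def toy_xor_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy SHA-256 counter-mode keystream (stand-in for AES-CTR; NOT secure)."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class ToyKMS:
    """Hypothetical stand-in for Ranger KMS."""

    def __init__(self) -> None:
        self._ez_keys: dict[str, bytes] = {}

    def create_ez_key(self, name: str) -> None:
        self._ez_keys[name] = os.urandom(32)

    def generate_edek(self, name: str) -> tuple[bytes, bytes]:
        iv = os.urandom(16)
        return iv, toy_xor_cipher(self._ez_keys[name], iv, os.urandom(32))

    def decrypt_edek(self, name: str, iv: bytes, edek: bytes) -> bytes:
        return toy_xor_cipher(self._ez_keys[name], iv, edek)


def read_file(kms: ToyKMS, user: str, stored: dict) -> bytes:
    # NN access check (hypothetical reader set instead of HDFS permissions).
    if user not in stored["readers"]:
        raise PermissionError(f"{user} may not read this file")
    # NN/DN return the ciphertext plus the EDEK kept with the file;
    # the client asks the KMS to turn the EDEK into the DEK.
    dek = kms.decrypt_edek(stored["zone_key"], stored["edek_iv"], stored["edek"])
    # The client uses the DEK to decrypt the file data locally.
    return toy_xor_cipher(dek, stored["data_iv"], stored["ciphertext"])


# Set up an already-written file so the read path can be exercised.
kms = ToyKMS()
kms.create_ez_key("ezkey1")
edek_iv, edek = kms.generate_edek("ezkey1")
data_iv = os.urandom(16)
file_dek = kms.decrypt_edek("ezkey1", edek_iv, edek)
stored = {
    "zone_key": "ezkey1",
    "readers": {"alice"},
    "edek_iv": edek_iv,
    "edek": edek,
    "data_iv": data_iv,
    "ciphertext": toy_xor_cipher(file_dek, data_iv, b"patient records"),
}
print(read_file(kms, "alice", stored))  # b'patient records'
```

A user who fails the NameNode check never reaches the KMS step, so the DEK is only released for reads that passed access control.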
Page 15
Why it helps?
• … encrypted and stored on disk
• … configuration
• … in Java keystore
Page 16
Don Bosco Durai
Page 17
Focus areas for the community
Page 18
• Hive Column Encryption
• Solidifying HBase Encryption
• Kafka and Solr Encryption
• Need for Tokenization/Masking
Page 19
ORC-14
encrypted with different key
Ranger KMS
How it will help?
• … fields instead of file
• … protected in HDFS as well as OS layer
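The idea of encrypting individual fields (columns) with different keys, rather than a whole file, can be sketched as follows. This is a conceptual model only, not the ORC-14 design or the ORC API: the column names, the in-memory `COLUMN_KEYS` table and the toy keystream cipher are all hypothetical. A reader granted a column's key sees its values; other encrypted columns stay unreadable.

```python
import hashlib
import os


def toy_xor_cipher(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy SHA-256 counter-mode keystream; a stand-in for a real cipher, NOT secure."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + iv + counter).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


# Hypothetical per-column keys; in the design above they would live in Ranger KMS.
COLUMN_KEYS = {"ssn": os.urandom(32), "salary": os.urandom(32)}


def encrypt_row(row: dict[str, str]) -> dict[str, object]:
    """Encrypt only the columns that have a key; other columns stay in clear."""
    out: dict[str, object] = {}
    for col, value in row.items():
        key = COLUMN_KEYS.get(col)
        if key is None:
            out[col] = value
        else:
            iv = os.urandom(16)
            out[col] = (iv, toy_xor_cipher(key, iv, value.encode()))
    return out


def read_row(enc_row: dict[str, object], granted: set[str]) -> dict[str, object]:
    """Decrypt the columns whose keys the reader was granted; hide the rest."""
    out: dict[str, object] = {}
    for col, value in enc_row.items():
        if col not in COLUMN_KEYS:
            out[col] = value
        elif col in granted:
            iv, ciphertext = value
            out[col] = toy_xor_cipher(COLUMN_KEYS[col], iv, ciphertext).decode()
        else:
            out[col] = None  # no key for this column: value stays unreadable
    return out


row = {"name": "John Doe", "ssn": "123-45-6789", "salary": "100000"}
enc = encrypt_row(row)
print(read_row(enc, granted={"ssn"}))
# {'name': 'John Doe', 'ssn': '123-45-6789', 'salary': None}
```

Because each column has its own key, access can be granted per field rather than per file, which is the granularity gain the slide describes.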
Page 20
How it will help?
• … local data stored on disks
• … encrypted
Page 21
ORC
Ranger)
How it will help?
• … could be stored in indexes, may need to be encrypted
• … granularity than OS or HDFS encryption
Page 22
… number) with some other value. The replacement could be a format-preserving or a random unique value.
… can be changed to xxxx xxxx xxxx 1234)
How it helps?
• … sensitive data beyond access control
• … control
• … compliance with privacy laws
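The masking example (showing only the last four digits of a card number) can be written as a small function. A minimal sketch; the function name and defaults are hypothetical. It keeps spaces and dashes in place so the output preserves the original layout.

```python
def mask_card_number(card: str, visible: int = 4, mask_char: str = "x") -> str:
    """Replace every digit except the last `visible` ones, keeping separators."""
    total_digits = sum(ch.isdigit() for ch in card)
    seen = 0
    masked = []
    for ch in card:
        if ch.isdigit():
            seen += 1
            # Only the trailing `visible` digits survive.
            masked.append(ch if seen > total_digits - visible else mask_char)
        else:
            masked.append(ch)  # spaces/dashes are kept to preserve the format
    return "".join(masked)


print(mask_card_number("4111 1111 1111 1234"))  # xxxx xxxx xxxx 1234
```

Masking like this is one-way: the hidden digits cannot be recovered from the output.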
Page 23
Page 24
Based on policy, if the user is a Data Scientist, then tokenize/mask data before returning it:
Name | Returned (Format Preserved) | Actual
John Doe | 415-123-4567 | 415-682-5638
Jane Smith | 408-123-4567 | 408-802-4027
Mary Pick | 650-123-4567 | 650-865-6921
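A policy like this (tokenize phone numbers for the Data Scientist role while keeping the format and area code) might look like the sketch below. Assumptions: the role check is a plain string comparison standing in for a Ranger-style policy, and the token digits come from a keyed HMAC-SHA256 of the number, so the same input always maps to the same token (useful for joins). This is a simple keyed pseudo-random substitution, not a standardized format-preserving encryption scheme such as NIST FF1, and it is not reversible. The key and sample number are hypothetical placeholders.

```python
import hashlib
import hmac

# Hypothetical tokenization key; a real deployment would fetch it from a KMS.
TOKEN_KEY = b"demo-only-tokenization-key"


def tokenize_phone(phone: str, keep_digits: int = 3, key: bytes = TOKEN_KEY) -> str:
    """Replace digits after the area code with keyed pseudo-random digits.

    Separators and the first `keep_digits` digits are kept, so the result
    still looks like a phone number. Deterministic: same input, same token.
    Supports up to 32 digits (one SHA-256 digest byte per digit).
    """
    digest = hmac.new(key, phone.encode(), hashlib.sha256).digest()
    out = []
    index = 0
    for ch in phone:
        if ch.isdigit():
            out.append(ch if index < keep_digits else str(digest[index] % 10))
            index += 1
        else:
            out.append(ch)
    return "".join(out)


def phone_for_user(phone: str, role: str) -> str:
    """Apply the policy: Data Scientists get tokenized values, others the actual value."""
    return tokenize_phone(phone) if role == "Data Scientist" else phone


print(phone_for_user("415-555-0100", "Data Scientist"))
print(phone_for_user("415-555-0100", "Analyst"))  # 415-555-0100
```

Keeping the token deterministic lets analysts still count, group and join on the column without ever seeing the real numbers.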
Page 25