parquet modular encryption

Parquet Modular Encryption Gidon Gershinsky IBM Research Haifa Lab



  1. Parquet Modular Encryption Gidon Gershinsky IBM Research – Haifa Lab

  2. Speaker
     Senior Architect at IBM Research – Haifa Lab
     gidon@il.ibm.com
     Leading role in Apache Parquet work on the definition of the encryption format and its implementation
       • community work; people from many companies are involved
     A number of projects on secure analytics on encrypted data
       • connected car and healthcare use cases
       • Apache Spark with Parquet encryption
       • Spark + AI Summit talk, 2018

  3. Overview
       • Goals of this technology
       • Parquet encryption: features
       • Sample use cases
       • How to use the Parquet encryption API
       • Basic integration with Apache Spark
       • Performance implications
       • Roadmap

  4. Apache Parquet
     Popular columnar storage format
     Encoding, compression
     Advanced data filtering
       • columnar projection: skip columns
       • predicate pushdown: skip files, row groups, or data pages
     Performance benefits
       • less data to fetch from storage: I/O, latency
       • less data to process: CPU, latency
     How to protect sensitive Parquet data?
       • in any storage, while keeping projection/predicates and supporting column access control, data tamper-proofing, etc.

  5. Parquet Encryption: Goals
     Protect sensitive data at rest (in storage)
       • data privacy/confidentiality: encryption hides sensitive information
       • data integrity: tamper-proofing sensitive information
       • in any storage: untrusted, cloud or private, file system, object store, archives
     Preserve performance of analytic engines
       • full Parquet capabilities (columnar projection, predicate pushdown, etc.) with encrypted data
     Leverage encryption for fine-grained access control
       • per-column encryption keys
       • key-based access in any storage: private -> cloud -> archive

  6. Parquet Encryption: Features
     Privacy: hiding sensitive information
       • Full encryption: all data and metadata modules
           • min/max values, schema, encryption key IDs, list of sensitive columns, etc.
       • Separate keys for sensitive columns
           • column data and metadata
           • column access control
       • Separate key for file-wide metadata
           • Parquet file footer: encrypted with the footer key
       • Storage server / admin never sees encryption keys or unencrypted data
           • "client-side" encryption

  7. Parquet Encryption: Features
     Privacy: hiding sensitive information (continued)
       • Multiple encryption algorithms
           • different security and performance trade-offs
           • currently two algorithms are defined and implemented
               • AES_GCM: encrypts and tamper-proofs everything (data and metadata)
               • AES_GCM_CTR: encrypts everything, tamper-proofs metadata only; could be useful on platforms without AES hardware acceleration, such as Java 8
           • if you need a new one, talk to us
       • Optional plaintext footer mode for legacy readers
           • any (old) Parquet reader can access unencrypted columns
           • footer is unencrypted, but tamper-proofed
           • signed with the footer key
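The difference between an authenticated mode (GCM) and an unauthenticated one (CTR) can be illustrated with the JDK's own `javax.crypto` classes. This is a minimal sketch of the general cryptographic behaviour only; it does not use the Parquet library or its file format, and the class and method names (`GcmVersusCtr`, `gcmDetectsTampering`, `ctrDecryptTampered`) are made up for this illustration.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical demo class, not part of Parquet.
class GcmVersusCtr {

    /** Encrypts with AES-GCM, flips one ciphertext bit, and reports whether decryption rejected it. */
    static boolean gcmDetectsTampering(byte[] key, byte[] plaintext) {
        try {
            SecretKeySpec aesKey = new SecretKeySpec(key, "AES");
            byte[] nonce = new byte[12];
            new SecureRandom().nextBytes(nonce);

            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, aesKey, new GCMParameterSpec(128, nonce));
            byte[] ciphertext = enc.doFinal(plaintext);
            ciphertext[0] ^= 1; // simulate an attacker modifying stored bytes

            Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, aesKey, new GCMParameterSpec(128, nonce));
            try {
                dec.doFinal(ciphertext);
                return false; // tampering went unnoticed
            } catch (AEADBadTagException e) {
                return true;  // authentication tag did not match
            }
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Same experiment with AES-CTR: decryption succeeds and returns silently altered bytes. */
    static byte[] ctrDecryptTampered(byte[] key, byte[] plaintext) {
        try {
            SecretKeySpec aesKey = new SecretKeySpec(key, "AES");
            byte[] iv = new byte[16];
            new SecureRandom().nextBytes(iv);

            Cipher enc = Cipher.getInstance("AES/CTR/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, aesKey, new IvParameterSpec(iv));
            byte[] ciphertext = enc.doFinal(plaintext);
            ciphertext[0] ^= 1; // same one-bit modification as above

            Cipher dec = Cipher.getInstance("AES/CTR/NoPadding");
            dec.init(Cipher.DECRYPT_MODE, aesKey, new IvParameterSpec(iv));
            return dec.doFinal(ciphertext); // no exception: CTR has no integrity check
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];
        new SecureRandom().nextBytes(key);
        byte[] data = "sensor reading".getBytes(StandardCharsets.UTF_8);

        System.out.println("GCM detected tampering: " + gcmDetectsTampering(key, data));
        System.out.println("CTR returned: "
                + new String(ctrDecryptTampered(key, data), StandardCharsets.UTF_8));
    }
}
```

This is the trade-off behind the second algorithm: modules protected only by CTR are confidential but not tamper-proof, which is why the slide notes that it tamper-proofs metadata only.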

  8. Parquet Encryption: Features
     Data integrity verification
       • File data and metadata are not tampered with
           • modifying data page contents
           • replacing one data page with another
       • File not replaced with wrong file
           • unmodified, but e.g. outdated
           • sign file contents and file ID
           • (figure labels: customers-sept-2019.part0.parquet, customers-jan-2014.part0.parquet)
       • Example: altering customer / billing data
       • Example: altering healthcare data (!): patient record or medical sensor readings
       • AES GCM: "authenticated encryption"
           • implemented in hardware

  9. Current Status
       • Apache Parquet community work
       • Encryption specification approved in January 2019
           • signed off by PMC
       • Specification and Thrift format merged
           • in apache/parquet-format master
           • part of parquet-format-2.7.0 release pull request (merged too)
       • Implementation
           • C++ and Java code
           • pull requests being reviewed, some already merged
           • implementation and API that closely follow the encryption specification

  10. Parquet Encryption Use Cases
      Same as "Parquet use cases", with sensitive column data
        • Data queries, analytic applications in any industry
            • Spark/Hive/Presto with Parquet: horizontal platform, not a vertical solution
        • Protect data privacy / confidentiality
            • personal data privacy
            • sensitive business data
            • regulations
        • Protect data integrity
            • business processes
                • wrong billing due to tampering with e.g. customer data
            • personal health
                • wrong treatment due to tampering with patient records or sensor readings

  11. Connected Car Use Case
      "RestAssured": EU Horizon 2020 research project (N 731678)
      Project partners
        • IBM, Adaptant, OCC, Thales, UDE, IT Innovation
      Project use cases
        • usage-based car insurance, social services
        • encryption: protect personal data
        • integrity: prevent billing tampering
      Spark + AI Summit EU 2018: demo shots with Spark/Parquet encryption

  12. Healthcare Use Case
      "ProTego": EU Horizon 2020 research project (N 826284)
      Project partners
        • St Raffaele hospital, Marina Salud hospital, IBM, GFI, ITI, UAH, IMEC, KUL, ICE
      Project use cases
        • Queries / analytics on sensitive healthcare data
        • HL7 FHIR standard: maps nicely to Parquet
        • encryption: protect personal data
        • integrity: prevent tampering with diagnosis and treatment

  13. Encryption API
        • Parquet API, without encryption

          ParquetFileWriter fileWriter = new ParquetFileWriter(file_path, schema, …);
          // then write data
          ParquetFileReader fileReader = ParquetFileReader.open(file_path, options);
          // then read data

        • Parquet API, with encryption

          ParquetFileWriter fileWriter = new ParquetFileWriter(file_path, schema, …, fileEncryptionProperties);
          // then write data (just like before)
          ParquetFileReader fileReader = ParquetFileReader.open(file_path, options, fileDecryptionProperties);
          // then read data (just like before)

  14. File Encryption Properties
      Trivial
        • encrypt all columns (and footer) with key0
        • tamper-proof encrypted content
        • enable columnar projection, predicate pushdown, etc.

          byte[] key0 = … // e.g. 128-bit key (16 bytes)
          FileEncryptionProperties fileEncryptionProps = FileEncryptionProperties.builder(key0).build();
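The slide leaves the origin of `key0` open. One common way to obtain raw AES key bytes in plain Java is to draw them from `SecureRandom`, as in the minimal sketch below. The class and method names (`KeyGenerationSketch`, `newAesKey`) are hypothetical, and in a real deployment the key would more likely come from a key management system than be generated ad hoc.

```java
import java.security.SecureRandom;

// Hypothetical helper, not part of Parquet.
class KeyGenerationSketch {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns fresh random key bytes for one of the standard AES key sizes (128, 192 or 256 bits). */
    static byte[] newAesKey(int bits) {
        if (bits != 128 && bits != 192 && bits != 256) {
            throw new IllegalArgumentException("AES key size must be 128, 192 or 256 bits: " + bits);
        }
        byte[] key = new byte[bits / 8];
        RANDOM.nextBytes(key);
        return key;
    }

    public static void main(String[] args) {
        byte[] key0 = newAesKey(128); // 16 bytes, as in the slide's example
        System.out.println("key0 length in bytes: " + key0.length);
    }
}
```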

  15. File Encryption Properties
      Basic
        • encrypt columnA with key1, columnB with key2 (and footer with key0)
        • differential column access control
        • assign key IDs (key metadata) for simplified key retrieval
        • tamper-proof encrypted content
        • enable columnar projection, predicate pushdown, etc.

  16. File Encryption Properties
      Basic
        • encrypt columnA with key1, columnB with key2 (and footer with key0)

          byte[] key1 = … // e.g. 128-bit key (16 bytes)
          ColumnEncryptionProperties encrColumnA = ColumnEncryptionProperties
              .builder("columnA")
              .withKey(key1)
              .withKeyID("key1")
              .build();

      Same for columnB. Then the file properties:

          FileEncryptionProperties fileEncryptionProps = FileEncryptionProperties.builder(key0)
              .withFooterKeyID("key0")
              .withEncryptedColumns(encryptedColumns) // list (map) of column encryption properties
              .build();

  17. File Encryption Properties
      Advanced
        • Protect against file replacement attacks
            • replacement with an untampered but e.g. outdated file (table partition)

          String fileID = "customers-sept-2019.part0";
          byte[] aadPrefix = fileID.getBytes(StandardCharsets.UTF_8);
          FileEncryptionProperties fileEncryptionProps = FileEncryptionProperties.builder(key0)
              .withFooterKeyID("key0")
              .withAADPrefix(aadPrefix)
              .withEncryptedColumns(encryptedColumns)
              .build();
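Why an AAD prefix defeats file replacement can be shown with plain JDK AES-GCM: data sealed under one file ID fails authentication when the reader expects a different file ID, even though the key is correct and the bytes are unmodified. The sketch below is a simplification under stated assumptions. It uses the file ID alone as the AAD, whereas the slides say only that Parquet signs the file contents and file ID. The class `AadPrefixSketch`, its methods, and the nonce-plus-ciphertext layout are hypothetical and are not the Parquet format.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Optional;

// Hypothetical demo class, not part of Parquet.
class AadPrefixSketch {

    private static final int NONCE_LEN = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Encrypts data bound to a file ID; output layout is nonce followed by ciphertext and tag. */
    static byte[] encrypt(byte[] key, String fileId, byte[] plaintext) {
        try {
            byte[] nonce = new byte[NONCE_LEN];
            RANDOM.nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, nonce));
            cipher.updateAAD(fileId.getBytes(StandardCharsets.UTF_8)); // authenticated, not stored
            byte[] ciphertext = cipher.doFinal(plaintext);

            byte[] out = new byte[NONCE_LEN + ciphertext.length];
            System.arraycopy(nonce, 0, out, 0, NONCE_LEN);
            System.arraycopy(ciphertext, 0, out, NONCE_LEN, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts only if the data was sealed under the expected file ID; otherwise returns empty. */
    static Optional<byte[]> decrypt(byte[] key, String expectedFileId, byte[] blob) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, blob, 0, NONCE_LEN));
            cipher.updateAAD(expectedFileId.getBytes(StandardCharsets.UTF_8));
            return Optional.of(cipher.doFinal(blob, NONCE_LEN, blob.length - NONCE_LEN));
        } catch (AEADBadTagException e) {
            return Optional.empty(); // wrong file ID (or modified bytes)
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] key = new byte[16];
        RANDOM.nextBytes(key);
        byte[] oldFile = encrypt(key, "customers-jan-2014.part0",
                "outdated rows".getBytes(StandardCharsets.UTF_8));

        // An attacker swaps the old file in where the September partition is expected.
        boolean accepted = decrypt(key, "customers-sept-2019.part0", oldFile).isPresent();
        System.out.println("Swapped file accepted: " + accepted);
    }
}
```

The reader supplies the file ID it expects, so the check does not depend on anything an attacker with storage access can rewrite.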

  18. File Encryption Properties
      Advanced
        • Allow legacy clients to read unencrypted columns in encrypted files
            • plaintext (unencrypted) footer mode
            • visible file metadata (schema, names of secret columns and of their keys, etc.)
            • tamper-proof (sign) file metadata with the footer key

          FileEncryptionProperties fileEncryptionProps = FileEncryptionProperties.builder(key0)
              .withFooterKeyID("key0")
              .withPlaintextFooter()
              .withEncryptedColumns(encryptedColumns)
              .build();

  19. File Encryption Properties
      Advanced
        • Use the alternative encryption algorithm
            • better performance in old Java versions
            • tamper-proofing metadata only (not data)

          FileEncryptionProperties fileEncryptionProps = FileEncryptionProperties.builder(key0)
              .withFooterKeyID("key0")
              .withAlgorithm(ParquetCipher.AES_GCM_CTR_V1)
              .withEncryptedColumns(encryptedColumns)
              .build();

  20. File Decryption Properties
      Simpler than encryption properties
        • most of the details are specified in file metadata

          StringKeyIdRetriever keyRetriever = new StringKeyIdRetriever();
          keyRetriever.putKey("key0", key0);
          keyRetriever.putKey("key1", key1);
          keyRetriever.putKey("key2", key2);
          FileDecryptionProperties fileDecryptionProps = FileDecryptionProperties.builder()
              .withKeyRetriever(keyRetriever)
              .build();
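The key retriever idea, mapping the key ID stored in file metadata back to key bytes at read time, can be sketched as a small in-memory lookup. The class below is a stand-alone, hypothetical stand-in written for illustration. It is not the Parquet `StringKeyIdRetriever`, and it assumes the key metadata is simply the UTF-8 bytes of the key ID.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical stand-in for a key retriever, not the Parquet class.
class InMemoryKeyRetriever {

    private final Map<String, byte[]> keysById = new HashMap<>();

    /** Registers a key under a string ID; the bytes are copied so later caller changes have no effect. */
    void putKey(String keyId, byte[] key) {
        keysById.put(keyId, key.clone());
    }

    /** Looks up a key from key metadata, assumed here to be the UTF-8 encoded key ID. */
    byte[] getKey(byte[] keyMetadata) {
        String keyId = new String(keyMetadata, StandardCharsets.UTF_8);
        byte[] key = keysById.get(keyId);
        if (key == null) {
            // A reader without this key cannot decrypt the corresponding column.
            throw new NoSuchElementException("No key available for ID: " + keyId);
        }
        return key.clone();
    }

    public static void main(String[] args) {
        InMemoryKeyRetriever retriever = new InMemoryKeyRetriever();
        retriever.putKey("key0", new byte[16]);
        retriever.putKey("key1", new byte[16]);

        byte[] key = retriever.getKey("key1".getBytes(StandardCharsets.UTF_8));
        System.out.println("Retrieved key length: " + key.length);
    }
}
```

Giving a reader a retriever that holds only some of the keys is one way to picture the per-column access control described earlier: columns whose keys are missing stay unreadable.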

  21. File Decryption Properties
      Advanced
        • Protect against file replacement attacks

          String fileID = "customers-sept-2019.part0";
          byte[] aadPrefix = fileID.getBytes(StandardCharsets.UTF_8);
          FileDecryptionProperties fileDecryptionProps = FileDecryptionProperties.builder()
              .withKeyRetriever(keyRetriever)
              .withAADPrefix(aadPrefix)
              .build();

  22. Beyond Low Level API
      Low level API: full power of Parquet encryption
        • directly implements the approved specification features
        • enables any key management scheme
            • work with KMS instead of explicit keys
        • you need to build one, choosing from many options for KMS, Auth, envelope encryption (data key wrapping)
            • if you know how, the Parquet low level encryption API is all you need
            • no one-size-fits-all solution for KMS/Auth/Wrapping
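Envelope encryption (data key wrapping) means encrypting a per-file or per-column data key with a master key, so that only the wrapped form needs to be stored alongside the data. The sketch below shows one possible way to do this with JDK AES-GCM. It assumes the master key is available locally as raw bytes, whereas a real KMS would typically perform the wrap and unwrap remotely. The class `EnvelopeEncryptionSketch` and its methods are hypothetical and not part of Parquet.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical demo class, not part of Parquet.
class EnvelopeEncryptionSketch {

    private static final int NONCE_LEN = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Wraps (encrypts) a data key with the master key; output is nonce followed by ciphertext and tag. */
    static byte[] wrapKey(byte[] masterKey, byte[] dataKey) {
        try {
            byte[] nonce = new byte[NONCE_LEN];
            RANDOM.nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                    new GCMParameterSpec(TAG_BITS, nonce));
            byte[] ciphertext = cipher.doFinal(dataKey);

            byte[] wrapped = new byte[NONCE_LEN + ciphertext.length];
            System.arraycopy(nonce, 0, wrapped, 0, NONCE_LEN);
            System.arraycopy(ciphertext, 0, wrapped, NONCE_LEN, ciphertext.length);
            return wrapped;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Unwraps a data key; fails if the master key is wrong or the wrapped bytes were altered. */
    static byte[] unwrapKey(byte[] masterKey, byte[] wrapped) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(masterKey, "AES"),
                    new GCMParameterSpec(TAG_BITS, wrapped, 0, NONCE_LEN));
            return cipher.doFinal(wrapped, NONCE_LEN, wrapped.length - NONCE_LEN);
        } catch (AEADBadTagException e) {
            throw new IllegalArgumentException("Wrong master key or corrupted wrapped key", e);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] masterKey = new byte[16];
        byte[] dataKey = new byte[16];
        RANDOM.nextBytes(masterKey);
        RANDOM.nextBytes(dataKey);

        byte[] wrapped = wrapKey(masterKey, dataKey); // could be stored as key metadata
        byte[] recovered = unwrapKey(masterKey, wrapped);
        System.out.println("Data key recovered: " + Arrays.equals(dataKey, recovered));
    }
}
```

With this layout, the wrapped key could serve as the key metadata handed to a key retriever, which would unwrap it at read time before returning the data key to the low level API.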
