SLIDE 1

HADOOP UNDER ATTACK: SECURING DATA IN A BANKING DOMAIN

SLIDE 2

WHOAMI

Federico Leven @ ReactoData

  • Big Data + Open Source since 2012
  • Big Data Meetup coordinator (http://www.iaar.site), speaker ...
  • E-mail: federico@reactodata.net
  • Web: http://www.reactodata.net
  • Twitter: @reactodata
  • LinkedIn: https://www.linkedin.com/in/federicoleven/

SLIDE 3

We are a startup based in Buenos Aires and Poland, providing Big Data + Cloud solutions built on open-source and proprietary software, as well as Hadoop consultancy:

  • Big Data and Hadoop application development
  • Machine Learning
  • Cloud
  • UX/UI and mobile apps for Big Data platforms
  • Hadoop consultancy

SLIDE 4

Agenda

  • The Challenge: Best Practices + Regulations
  • How to do it in Hadoop
  • End-to-End Secured Architecture
  • What can go wrong?
  • References
  • Conclusion & Questions

SLIDE 5 Buzzconf 2018

The Challenge: Best Practices and Regulations

SLIDE 6

The Challenge: Data Security

The set of preventive, detective and corrective measures that protect the integrity, confidentiality and availability of data. CAAIN:

  • CONFIDENTIALITY
  • AVAILABILITY
  • AUTHENTICITY
  • INTEGRITY
    ❑ ACCOUNTABILITY / AUDITING
    ❑ TRACEABILITY
  • NON-REPUDIATION

Cain and Abel

SLIDE 7

C(A)AIN

  • CONFIDENTIALITY: Data is not made available or disclosed to unauthorized parties.
  • AVAILABILITY: Data is available when it is needed.
  • AUTHENTICITY: The identity of the data source is verifiable.
  • INTEGRITY: Data is accurate and complete over its entire lifecycle.
  • NON-REPUDIATION: Parties to a data transaction cannot deny having sent or received the data.
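A minimal sketch, assuming sender and receiver share a secret key, of how integrity and authenticity can be checked with Python's standard `hmac` and `hashlib` modules: the sender attaches an HMAC-SHA256 tag to the record, and the receiver recomputes it. A modified record, or a tag produced with a different key, fails verification. The key and record values are illustrative assumptions.

```python
import hashlib
import hmac


def sign(key: bytes, message: bytes) -> str:
    """Return a hex HMAC-SHA256 tag binding the message to the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign(key, message), tag)


if __name__ == "__main__":
    # Illustrative values only; a real key would live in a key manager, not in code.
    shared_key = b"example-shared-secret"
    record = b"account=12345;amount=100.00"

    tag = sign(shared_key, record)
    print(verify(shared_key, record, tag))                          # True: intact
    print(verify(shared_key, b"account=12345;amount=900.00", tag))  # False: tampered
    print(verify(b"other-key", record, tag))                        # False: wrong source
```

An HMAC does not provide non-repudiation: both parties hold the same key, so either could have produced the tag. Non-repudiation requires asymmetric digital signatures, where only the sender holds the private key.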

SLIDE 8

The Challenge: Threats in the financial and banking domain

Emerging Technology Challenges

  • Botnets
  • Unsecured IoT devices
  • DDoS (Distributed Denial of Service) attacks

Targets

  • Sensitive data
  • Access credentials

Regulation Challenges

  • New and/or stricter regulations issued periodically
  • US data protection rules
  • EU: GDPR

Insider Challenges

  • Unintentional actions
  • Malicious users

SLIDE 9

The Challenge: Best Practices in banking

Human

  • Employee Awareness
  • Training

Organization

  • Security Officer
  • End User Guidelines
  • Access Policies
  • Governance

Technological

  • Networking
  • Software Updates
  • Data Protection
  • Auditing

SLIDE 10

How to do it in Hadoop

SLIDE 11

From concepts to technology

  • AUTHENTICATION: Identify the user.
  • AUTHORIZATION: Grant the user access to the data.
  • PROTECTION: Prevent data from being used by anyone other than authorized users.
  • AVAILABILITY: Make data accessible when needed.
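To make the split between authentication and authorization concrete, here is a simplified, hypothetical model loosely following the role-based approach used by Sentry: privileges are granted to roles, roles are granted to groups, and a user's groups are resolved from the directory (LDAP/AD). The class and method names are illustrative and are not Sentry's API; the sketch assumes the user has already been authenticated (for example with Kerberos).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Privilege:
    """Hypothetical grant of an action on a resource path, e.g. ("server1", "sales")."""
    resource: tuple[str, ...]
    action: str  # "SELECT", "INSERT" or "ALL" (illustrative action names)

    def covers(self, resource: tuple[str, ...], action: str) -> bool:
        # A grant on a parent (e.g. a database) also applies to its children (tables).
        inherited = resource[: len(self.resource)] == self.resource
        return inherited and self.action in ("ALL", action)


@dataclass
class Authorizer:
    """Simplified role-based model: privileges -> roles -> groups -> users."""
    role_privileges: dict[str, set[Privilege]] = field(default_factory=dict)
    group_roles: dict[str, set[str]] = field(default_factory=dict)
    # Assumed to be resolved from the directory (LDAP/AD) after authentication.
    user_groups: dict[str, set[str]] = field(default_factory=dict)

    def grant_privilege(self, role: str, privilege: Privilege) -> None:
        self.role_privileges.setdefault(role, set()).add(privilege)

    def grant_role(self, group: str, role: str) -> None:
        self.group_roles.setdefault(group, set()).add(role)

    def is_allowed(self, user: str, resource: tuple[str, ...], action: str) -> bool:
        # `user` is assumed to be already authenticated (e.g. via Kerberos).
        for group in self.user_groups.get(user, set()):
            for role in self.group_roles.get(group, set()):
                for privilege in self.role_privileges.get(role, set()):
                    if privilege.covers(resource, action):
                        return True
        return False  # deny by default


if __name__ == "__main__":
    auth = Authorizer(user_groups={"ana": {"analysts"}})
    auth.grant_role("analysts", "sales_reader")
    auth.grant_privilege("sales_reader", Privilege(("server1", "sales"), "SELECT"))

    print(auth.is_allowed("ana", ("server1", "sales", "accounts"), "SELECT"))  # True
    print(auth.is_allowed("ana", ("server1", "sales", "accounts"), "INSERT"))  # False
    print(auth.is_allowed("bob", ("server1", "sales", "accounts"), "SELECT"))  # False
```

Denying by default means a missing grant never silently exposes data, and letting a database-level grant apply to its tables keeps the number of grants small.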

SLIDE 12

From concepts to technology

CAAIN property → concept

  • C: Authentication, Authorization
  • A: Authentication
  • I: Protection, Availability
  • A: Availability
  • N: Auditing, Traceability

Concept → technology

  • Authentication: Kerberos, LDAP
  • Authorization: Sentry, HBase ACLs
  • Protection: Encryption (in motion & at rest), Redaction
  • Availability: Hadoop HA (HDFS, HBase …)
  • Auditing & Traceability: Metadata, Lineage, Log Audit (Cloudera Navigator)
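Auditing and traceability depend on logs that cannot be altered unnoticed. The slides do not describe how Cloudera Navigator stores its audit events, so the following is only a generic, hypothetical sketch: each audit entry is chained to the previous one with SHA-256, so editing or deleting an earlier entry breaks verification. The event fields (`user`, `action`, `resource`) are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys) so the same event always hashes the same way.
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": entry_hash(prev, event)})


def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any edited, reordered or removed entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


if __name__ == "__main__":
    # Hypothetical audit events.
    log: list[dict] = []
    append(log, {"user": "ana", "action": "SELECT", "resource": "sales.accounts"})
    append(log, {"user": "bob", "action": "INSERT", "resource": "sales.accounts"})
    print(verify_chain(log))  # True

    log[0]["event"]["user"] = "mallory"  # tamper with an earlier entry
    print(verify_chain(log))  # False
```

The chain is tamper-evident, not tamper-proof: whoever can rewrite the entire log can recompute every hash, so the latest hash would need to be stored somewhere the log writer cannot modify.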

SLIDE 13

From concepts to technology

https://www.cloudera.com/documentation/enterprise/5-14-x/topics/sg_edh_overview.html

  • BCRA A6375
  • BCRA A6495
  • ISO 17799/27001

SLIDE 14

What we needed in a banking infrastructure for Hadoop

SLIDE 15

End-to-End Secured Architecture

SLIDE 16

Example Production deployment (CDH 5.13)

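Components:

  • Cloudera Manager
  • HDFS, YARN, ZooKeeper, HBase
  • Hive Metastore, HiveServer, Sentry
  • Kerberos with Active Directory (AD)
  • Oracle DB
  • SSL/TLS
  • KTS (Key Trustee Server) and KMS (Key Management Server)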
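A deployment like this is only as secure as its configuration. As a hedged sketch, the script below parses Hadoop-style `*-site.xml` configuration with the standard library and flags settings that would leave the cluster weaker than intended. The property names are standard Hadoop settings (`hadoop.security.authentication`, `hadoop.security.authorization`, `hadoop.rpc.protection`, `dfs.encrypt.data.transfer`); the expected values describe a strict posture assumed for illustration. In a Cloudera Manager deployment these settings are normally managed through CM rather than edited by hand.

```python
import xml.etree.ElementTree as ET

# Assumed strict posture: Kerberized cluster with wire encryption enabled.
EXPECTED = {
    "hadoop.security.authentication": "kerberos",  # default "simple" means no real auth
    "hadoop.security.authorization": "true",       # enable service-level authorization
    "hadoop.rpc.protection": "privacy",            # SASL QOP: authentication + integrity + encryption
    "dfs.encrypt.data.transfer": "true",           # encrypt the DataNode data-transfer protocol
}


def read_properties(xml_text: str) -> dict[str, str]:
    """Parse a Hadoop *-site.xml document into a name -> value mapping."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def find_issues(props: dict[str, str]) -> list[str]:
    """Return one message per expected setting that is missing or different.

    Exact match: a QOP list such as "authentication,privacy" is reported as weaker.
    """
    issues = []
    for name, wanted in EXPECTED.items():
        actual = props.get(name)
        if actual is None or actual.lower() != wanted:
            issues.append(f"{name}: expected {wanted!r}, found {actual!r}")
    return issues


if __name__ == "__main__":
    # Hypothetical core-site.xml fragment with RPC encryption not enabled.
    sample = """<configuration>
      <property><name>hadoop.security.authentication</name><value>kerberos</value></property>
      <property><name>hadoop.security.authorization</name><value>true</value></property>
      <property><name>hadoop.rpc.protection</name><value>authentication</value></property>
    </configuration>"""
    for issue in find_issues(read_properties(sample)):
        print(issue)
```

Because `core-site.xml` and `hdfs-site.xml` hold different properties, a fuller check would merge the mappings from both files before calling `find_issues`.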

SLIDE 17

Secure data pipeline example

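Sources → Ingest → HDFS Landing Area → HDFS Business Area

  • Sources: Oracle, RabbitMQ
  • Ingest: Sqoop (SSL ON), Flume Agent
  • HDFS storage: Landing Area and Business Area, with an Encrypted Zone
  • Processing: Spark ETL and a custom data redaction component
  • Query and access: Impala, SparkSQL, Hive, HDFS Web UI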
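The deck does not describe its custom data redaction component. As a purely hypothetical sketch of what such a step might do before records move from the landing area to the business area, the functions below mask card-like numbers (keeping the last four digits) and replace e-mail addresses using regular expressions. The patterns, field names and masking rules are assumptions; a production component would follow the bank's own data classification and would more likely run inside the Spark ETL job than as standalone Python.

```python
import re

# Hypothetical patterns: 13-19 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def mask_card(match: re.Match) -> str:
    """Replace every digit except the last four with '*', dropping separators."""
    digits = re.sub(r"\D", "", match.group())
    return "*" * (len(digits) - 4) + digits[-4:]


def redact(text: str) -> str:
    """Mask card numbers first, then replace e-mail addresses."""
    text = CARD_RE.sub(mask_card, text)
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def redact_record(record: dict[str, str], sensitive_fields: set[str]) -> dict[str, str]:
    """Redact only the fields classified as sensitive; leave the others untouched."""
    return {k: redact(v) if k in sensitive_fields else v for k, v in record.items()}


if __name__ == "__main__":
    # Hypothetical landing-area record.
    row = {"id": "42", "note": "card 4111 1111 1111 1111, contact ana@example.com"}
    print(redact_record(row, {"note"}))
```

Redacting only the fields classified as sensitive keeps the remaining columns usable for analytics in the business area.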

SLIDE 18

What can go wrong?

SLIDE 19

What can go wrong? Some good news and some bad news

INSECURE APPLICATIONS WILL NOT WORK IN SECURE ENVIRONMENTS

  • Sentry HDFS synchronization does not support Hive Metastore HA (CDH 5.9)
  • Sentry HA is not supported (CDH 5.9)
  • To use the Cloudera Manager (CM) Kerberos wizard, you need a user with high-level privileges
  • SparkSQL does not respect Sentry permissions (latest)
  • Enabling Sentry turns off Hive impersonation (CDH 5.9)
  • Spark Streaming cannot consume from secure Kafka (CDH 5.9)

SLIDE 20

References

SLIDE 21

References

  ✓ http://www.bcra.gob.ar/Pdfs/Texord/t-rmsist.pdf
  ✓ http://www.bcra.gov.ar/pdfs/texord/t-seguef.pdf
  ✓ https://en.wikipedia.org/wiki/ISO/IEC_27002
  ✓ http://web.iram.org.ar/index.php?vernorma&id=2439
  ✓ https://www.cloudera.com/documentation/enterprise/latest/PDF/cloudera-security.pdf
  ✓ https://www.cloudera.com/documentation/enterprise/5-9-x/topics/security.html
  ✓ https://www.forbes.com/sites/gregorymcneal/2014/05/26/banks-challenged-by-cybersecurity-threats-state-regulators-acting/#228d745597f7

SLIDE 22

Thank you! Questions, suggestions or complaints?

“No Hadoop was harmed in the making of this presentation”