Secure Your Hadoop Cluster With Apache Sentry (Incubating)



SLIDE 1

Secure Your Hadoop Cluster With Apache Sentry (Incubating)

Xuefu Zhang | Software Engineer, Cloudera
April 07, 2014

SLIDE 2

Outline

  • Introduction
  • Hadoop security primer
  • Authentication
  • Authorization
  • Data Protection
  • Governance and Auditing
  • Introducing Apache Sentry
  • What's Sentry
  • Sentry Architecture
  • Sentry Internal
  • Future work
  • Q&A

SLIDE 3

Introduction

  • Hadoop gets bigger ...
  • Hadoop has been enjoying an increasing adoption rate
  • More and more data on Hadoop clusters
  • More and more access to the data
  • Data warehouse offload is the most common use case
  • Apache Hive, Apache Drill, Cloudera Impala
  • SQL on Hadoop is a phenomenon

SLIDE 4

Introduction (cont'd)

  • But more encumbrance ...
  • Enterprises want to protect sensitive data
  • Government regulations and compliance, like HIPAA, PII, FISMA
  • Existing security problems with Hadoop have hindered adoption

  • Security has become the top priority

SLIDE 5

Introduction (cont'd)

  • Reality is
  • Different components, different security mechanisms
  • Multiple components may access the same data set
  • Hadoop was born out of trust, not security
  • Think of Windows

SLIDE 6

Outline

  • Introduction
  • Hadoop security primer
  • Authentication
  • Authorization
  • Data Protection
  • Governance and Auditing
  • Introducing Apache Sentry
  • What's Sentry
  • Sentry Architecture
  • Sentry Internal
  • Future work
  • Q&A

SLIDE 7

Hadoop Security Primer

  • Authentication
  • Identify who you are
  • Untrusted users have no access to the cluster network
  • Trusted network: everyone is a good citizen
  • Who you are is determined by the client host

SLIDE 8

Hadoop Security Primer

  • Strong Authentication
  • Kerberos
  • LDAP, Active Directory
  • LDAP, AD integrated with Kerberos, establishing a single point of truth

  • Single point of truth

SLIDE 9

Hadoop Security Primer (cont'd)

  • Kerberos
  • Strong authentication
  • Provides mutual authentication
  • Protects against eavesdropping and replay attacks

  • Every user and service has a Kerberos “principal”
  • Credentials: keytabs (service), password (user)

SLIDE 10

Hadoop Security Primer (cont'd)

  • Authorization
  • HDFS POSIX-style permissions (read/write/execute for owner/group/other), coarse-grained
  • Other components have authorization
  • MR job queue
  • HBase ACLs on table and column family
  • Accumulo provides cell-level access control
  • Impersonation
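
The owner/group/other model mentioned for HDFS can be illustrated with a short sketch. This is not HDFS code: it only decodes a POSIX-style mode and applies the usual rule that the first matching class (owner, then group, then other) decides access. The function names and the sample file owner, group, and users are made up for illustration.

```python
def describe_mode(mode: int) -> dict:
    """Render a POSIX-style mode (e.g. 0o750) as an rwx string per class."""
    shifts = {"owner": 6, "group": 3, "other": 0}
    flags = (("r", 4), ("w", 2), ("x", 1))
    return {
        name: "".join(ch if (mode >> shift) & bit else "-" for ch, bit in flags)
        for name, shift in shifts.items()
    }


def is_allowed(mode, file_owner, file_group, user, user_groups, want):
    """Check one access type ('r', 'w' or 'x') for a user.

    Only the first matching class is consulted: owner, then group, then other.
    """
    perms = describe_mode(mode)
    if user == file_owner:
        return want in perms["owner"]
    if file_group in user_groups:
        return want in perms["group"]
    return want in perms["other"]


# Hypothetical file: owner "etl", group "analysts", mode 750.
print(describe_mode(0o750))
print(is_allowed(0o750, "etl", "analysts", "alice", {"analysts"}, "r"))
print(is_allowed(0o750, "etl", "analysts", "bob", {"interns"}, "r"))
```

The decision stops at the file or directory and at three classes of users, which is why the slide calls this model coarse-grained.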

SLIDE 11

Hadoop Security Primer (cont'd)

  • Data Protection
  • Data at rest and in transit
  • Hadoop provides encryption on data in transit: DTP, HTTP, RPC, JDBC/ODBC

  • Hadoop has no native encryption on data at rest
  • Relying on OS-level encryption

SLIDE 12

Hadoop Security Primer (cont'd)

  • Governance and auditing
  • Again, component to component
  • DFS and MapReduce provide base audit support
  • Apache Hive metastore records audit (who/when) information for Hive interactions

  • Apache Oozie provides audit trail for services

SLIDE 13

Outline

  • Introduction
  • Hadoop security primer
  • Authentication
  • Authorization
  • Data Protection
  • Governance and Auditing
  • Introducing Apache Sentry
  • What's Sentry
  • Sentry Architecture
  • Sentry Internal
  • Future work
  • Q&A

SLIDE 14

Introducing Apache Sentry

  • Hadoop Authorization
  • Existing authorization is fragmented, coarse-grained, and manual
  • Data is often left unprotected for simplicity
  • Enterprises need a centralized authorization component that works across components and is easy to use, fine-grained, and role-based

SLIDE 15

Introducing Apache Sentry (cont'd)

  • What's Sentry
  • Sentry is an authorization module for Hive, Search, Impala, and beyond
  • It unlocks key RBAC requirements: secure, fine-grained, role-based authorization and multi-tenant administration
  • Open source, Apache Incubator project
  • Ecosystem support: Apache Solr, HiveServer2, and Impala 1.1+

SLIDE 16

Introducing Apache Sentry (cont'd)

  • Key Benefits
  • Store Sensitive Data in Hadoop
  • Extend Hadoop to More Users
  • Comply with Regulations

SLIDE 17

Introducing Apache Sentry (cont'd)

  • Key Capabilities
  • Fine-grained: SERVERS, DATABASES, TABLES & VIEWS; INDEXES, COLLECTIONS
  • Role-based: a role includes privileges such as SELECT, INSERT, ALL; UPDATE, QUERY
  • Multi-tenant administration

  • Separate policies for each database/schema
  • Can be maintained by separate admins

SLIDE 18

Introducing Apache Sentry (cont'd)

Sentry Architecture

[Figure: Sentry architecture. Labels shown: Impala, HiveServer2, Solr; Binding Layer (Impala, Hive, Search, Pig, …); Authorization Provider with Policy Engine and Policy Provider; policy storage as File (Local FS/HDFS) or Database]

SLIDE 19

Introducing Apache Sentry (cont'd)

[Figure: query flow. Query → SQL (Parse → Build → Check → Plan) → MR, with Sentry attached to the Check step]

  • Parse: validate SQL grammar
  • Build: construct statement tree
  • Check: validate statement objects; first check: authorization
  • Plan: forward to execution planner

SLIDE 20

Introducing Apache Sentry (cont'd)

  • Actors
  • User
  • User group membership
  • Resources
  • Privilege
  • Role

SLIDE 21

Introducing Apache Sentry (cont'd)

  • User
  • User authenticated
  • User identity obtained from session context

SLIDE 22

Introducing Apache Sentry (cont'd)

  • User group membership
  • Defined outside the Sentry policy
  • Obtained from user directory (LDAP, AD, HDFS)
  • May be available from session context

SLIDE 23

Introducing Apache Sentry (cont'd)

  • Resources
  • Data to be protected
  • File or directory on HDFS
  • Table or view in Hive
  • URI
  • Resources can be hierarchical
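
One way to read "hierarchical" is that a grant on a parent resource also covers everything beneath it. The sketch below illustrates that idea with HDFS-style paths. The `covers` helper and the sample paths are hypothetical, and this is only an illustration of the concept, not Sentry's implementation.

```python
from pathlib import PurePosixPath


def covers(granted: str, requested: str) -> bool:
    """Return True if `requested` is `granted` itself or lies beneath it."""
    parent = PurePosixPath(granted)
    child = PurePosixPath(requested)
    return parent == child or parent in child.parents


print(covers("/data/sales", "/data/sales/2014/q1.csv"))     # True
print(covers("/data/sales", "/data/salesforce/leads.csv"))  # False
```

Comparing path components rather than raw string prefixes keeps `/data/salesforce` from being mistaken for a child of `/data/sales`.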

SLIDE 24

Introducing Apache Sentry (cont'd)

  • Privilege
  • Action or operation on a resource
  • Exists in a role only
  • SELECT on a given TABLE or VIEW
  • CREATE a TABLE or VIEW
  • QUERY on a search COLLECTION
  • DELETE a FILE or DIRECTORY
  • Example

collection=customerCol->action=query
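
The example privilege reads as a chain of `key=value` pairs joined by `->`. A minimal sketch of splitting such a string is shown here; `parse_privilege` is a hypothetical helper written for illustration, not part of Sentry's API.

```python
def parse_privilege(text: str) -> dict:
    """Split 'key=value->key=value' into a dict, keeping the original order."""
    parts = {}
    for segment in text.split("->"):
        key, sep, value = segment.strip().partition("=")
        if not (sep and key and value):
            raise ValueError(f"malformed privilege segment: {segment!r}")
        parts[key.strip().lower()] = value.strip()
    return parts


print(parse_privilege("collection=customerCol->action=query"))
# {'collection': 'customerCol', 'action': 'query'}
```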

SLIDE 25

Introducing Apache Sentry (cont'd)

  • Roles
  • A collection of privileges
  • Defined in Sentry policy
  • Example

[roles]
ana_query_role = collection=sentryColl->action=query
ana_update_role = collection=sentryColl->action=update
test_role = collection=testColl->action=update
full_admin_role = collection=*

SLIDE 26

Introducing Apache Sentry (cont'd)

  • (Group, Role) mapping
  • Defined in policy
  • One-to-Many
  • Example

[groups]
analysts = ana_query_role, ana_update_role
admins = full_admin_role
testgroup = test_role
hbase = full_admin_role
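
Because the policy examples use an INI-like layout with `[groups]` and `[roles]` sections, they can be read with Python's standard `configparser` for illustration. This is a sketch only: the policy text is modeled on the slides' examples, `load_policy` is a hypothetical helper, and it assumes that multiple entries on one line are comma-separated. Sentry has its own policy provider, which this does not reproduce.

```python
import configparser

# Policy text modeled on the examples in the slides.
POLICY = """
[groups]
analysts = ana_query_role, ana_update_role
admins = full_admin_role

[roles]
ana_query_role = collection=sentryColl->action=query
ana_update_role = collection=sentryColl->action=update
full_admin_role = collection=*
"""


def load_policy(text: str):
    """Return (group -> roles, role -> privileges) from INI-style policy text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep group and role names case-sensitive
    parser.read_string(text)

    def split(value):
        # Assumption: entries on one line are separated by commas.
        return [item.strip() for item in value.split(",") if item.strip()]

    groups = {name: split(value) for name, value in parser["groups"].items()}
    roles = {name: split(value) for name, value in parser["roles"].items()}
    return groups, roles


groups, roles = load_policy(POLICY)
print(groups["analysts"])
print(roles["ana_query_role"])
```

The one-to-many mapping shows up directly: a single group name resolves to a list of roles, and each role resolves to its privileges.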

SLIDE 27

Introducing Apache Sentry (cont'd)

  • Rule evaluation
  • Who's the user?
  • Which group(s) does the user belong to?
  • What resource is to be accessed?
  • How is the resource accessed (READ, SELECT, etc.)?
  • Does any of the user's groups have a role that has the right privilege?
  • Yes – great! Go ahead!
  • No – sorry! Insufficient privilege!
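
The questions above can be walked through in a few lines of code. The sketch below uses hypothetical in-memory dictionaries in place of a real user directory and policy store, and it treats `*` as "any collection or action", which is an assumption made for this example. It illustrates the evaluation order and is not Sentry's actual engine.

```python
# Hypothetical stand-ins for the user directory and the policy.
USER_GROUPS = {"alice": {"analysts"}, "bob": {"testgroup"}, "carol": {"admins"}}
GROUP_ROLES = {
    "analysts": {"ana_query_role", "ana_update_role"},
    "testgroup": {"test_role"},
    "admins": {"full_admin_role"},
}
ROLE_PRIVILEGES = {
    "ana_query_role": {("sentryColl", "query")},
    "ana_update_role": {("sentryColl", "update")},
    "test_role": {("testColl", "update")},
    "full_admin_role": {("*", "*")},  # assumption: wildcard means everything
}


def is_authorized(user: str, resource: str, action: str) -> bool:
    """Answer the slide's questions in order and return the final yes/no."""
    groups = USER_GROUPS.get(user, set())  # which groups is the user in?
    for group in groups:
        for role in GROUP_ROLES.get(group, set()):  # which roles per group?
            for res, act in ROLE_PRIVILEGES.get(role, set()):
                if res in ("*", resource) and act in ("*", action):
                    return True  # yes: go ahead
    return False  # no: insufficient privilege


print(is_authorized("alice", "sentryColl", "query"))
print(is_authorized("alice", "testColl", "update"))
```

Access is denied by default: a user with no groups, or groups with no matching role, falls through to `False`.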

SLIDE 28

Outline

  • Introduction
  • Hadoop security primer
  • Authentication
  • Authorization
  • Data Protection
  • Governance and Auditing
  • Introducing Apache Sentry
  • What's Sentry
  • Sentry Architecture
  • Sentry Internal
  • Future work
  • Q&A

SLIDE 29

Future Work

  • Introduce Sentry to more Hadoop components for their authorization needs
  • Centralized policy store aiming for the whole enterprise
  • Grant/Revoke
  • Centralized authorization service for all protected resources including metadata

  • We need your contribution or support