Audi's journey to an enterprise big data platform - Strata Data 2018




SLIDE 1

Strata Data 2018 - London

Audi's journey to an enterprise big data platform

Matthias Graunitz (AUDI AG, Germany) Carsten Herbe (Audi Business Innovation GmbH, Germany)

SLIDE 2

AUDI AG Strata Data London 2018 - Audi's journey to an enterprise big data platform

WHO ARE WE?

SLIDE 3

Audi Group: Audi, Lamborghini, Ducati and Italdesign

SLIDE 4

Vorsprung is our promise Strategy 2025

SLIDE 5

Audi Business Innovation GmbH

...is the development, establishment, sales and operation of innovative concepts, products and services, as well as the holding of shares in the field of future mobility.

Audi mobility innovations

Audi on demand

Audi balanced technologies

Audi e-gas

Audi customer IT solutions

SLIDE 6

About us

Matthias Graunitz, AUDI AG
» Center of Competence Big Data & BI
» Big Data Architect
» 10+ years Data Warehousing & BI

Carsten Herbe, Audi Business Innovation GmbH
» Data Platform & Solution Architecture
» Hadoop since 2013
» 10+ years Data Warehousing & BI

SLIDE 7

2 YEARS AGO…

STARTING BIG DATA AT AUDI

SLIDE 8

Analytical Capabilities by 2015

Data Domains: Finance, Purchase, Production, Quality, Sales, Car Data
Programs, Projects, Data Scientists

Capability areas: Embed Analytics; Analyze Data; Store, Distribute and Process Data; Deliver Information; Secure Data; Infrastructure & Services; Provision Data; Deliver Service; Manage Information; Design & Maintain Solutions

Capabilities: Authentication; Data Encryption; Auditing; Complex Event Processing; Analytical APIs; Dashboarding; Planning & Simulation; Visual Analytics; BI Report & OLAP; Statistical Methods; Analytical Script; Data Warehouse; Analytical Databases; ETL Framework; Batch Processing; Data Access / APIs; On-Prem Platform; Application Deployment; Hardware, Network, OS; Monitoring; Lifecycle Mgmt; Development Process & Methods; Master Data Mgmt; Data Lineage

AAP – AUDI ANALYTIC PLATFORM

SLIDE 9

Analytical Capabilities by 2015

Data Domains: Finance, Purchase, Production, Quality, Sales, Car Data
Programs, Projects, Data Scientists

Capability areas: Embed Analytics; Analyze Data; Store, Distribute and Process Data; Deliver Information; Secure Data; Infrastructure & Services; Provision Data; Deliver Service; Manage Information; Design & Maintain Solutions

Capabilities: Authentication; Data Encryption; Auditing; Complex Event Processing; Analytical APIs; Dashboarding; Planning & Simulation; Visual Analytics; BI Report & OLAP; Statistical Methods; Analytical Script; Data Warehouse; Analytical Databases; ETL Framework; Batch Processing; Data Access / APIs; On-Prem Platform; Application Deployment; Hardware, Network, OS; Monitoring; Lifecycle Mgmt; Development Process & Methods; Master Data Mgmt; Data Lineage

AAP – AUDI ANALYTIC PLATFORM

SLIDE 10

Analytical Capabilities by 2015

Data Domains: Finance, Purchase, Production, Quality, Sales, Car Data
Programs, Projects, Data Scientists

Capability areas: Embed Analytics; Analyze Data; Store, Distribute and Process Data; Deliver Information; Secure Data; Infrastructure & Services; Provision Data; Deliver Service; Manage Information; Design & Maintain Solutions

Capabilities: Authentication; Data Encryption; Auditing; Complex Event Processing; Analytical APIs; Dashboarding; Planning & Simulation; Visual Analytics; BI Report & OLAP; Statistical Methods; Analytical Script; Data Warehouse; Analytical Databases; ETL Framework; Batch Processing; Data Access / APIs; On-Prem Platform; Cloud Platform; Application Deployment; Hardware, Network, OS; Monitoring; Lifecycle Mgmt; Development Process & Methods; Master Data Mgmt; Data Lineage

AAP – AUDI ANALYTIC PLATFORM

File Systems (HDFS); Stream Processing; Machine Learning

SLIDE 11

Our first Hadoop Cluster 2015

Hadoop (DEV)      per node    Sum
# data nodes      1           4
RAM               128 GB      0.5 TB
Cores             24          96
HDD*              40 TB       160 TB

* Raw capacity without replication and FS overhead!
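
The footnote matters when reading the table: raw disk is not usable HDFS space. The following is a minimal sketch of that arithmetic, not a figure from the talk. It assumes the HDFS default replication factor of 3 and a hypothetical 25% reserve for file system overhead and temporary data; neither value is stated on the slide, and the names `NodeSpec`, `cluster_totals` and `usable_hdfs_tb` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node figures as listed on the slide."""
    ram_gb: int
    cores: int
    raw_disk_tb: float


def cluster_totals(node: NodeSpec, data_nodes: int) -> dict:
    """Multiply the per-node figures by the number of data nodes."""
    return {
        "ram_gb": node.ram_gb * data_nodes,
        "cores": node.cores * data_nodes,
        "raw_disk_tb": node.raw_disk_tb * data_nodes,
    }


def usable_hdfs_tb(raw_disk_tb: float,
                   replication: int = 3,
                   reserve_fraction: float = 0.25) -> float:
    """Estimate usable space: subtract the reserve, then divide by replication.

    replication=3 is the HDFS default; reserve_fraction is an assumption.
    """
    return raw_disk_tb * (1 - reserve_fraction) / replication


# DEV cluster from the slide: 4 data nodes with 128 GB RAM, 24 cores, 40 TB each.
dev_node = NodeSpec(ram_gb=128, cores=24, raw_disk_tb=40)
totals = cluster_totals(dev_node, data_nodes=4)
usable = usable_hdfs_tb(totals["raw_disk_tb"])

print(totals)
print(f"estimated usable HDFS capacity: {usable:.0f} TB")
```

Under these assumptions the 160 TB of raw disk shrinks to roughly a quarter of that as usable capacity, which is why the slide flags the raw figure explicitly.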

SLIDE 12

Our first attempt to walk with Big Data Technologies

SCREWDRIVER ANALYSIS

COMPANY CAR ANALYSIS

SLIDE 13

ENTERPRISE INTEGRATION VS SPEED OF DELIVERY

SLIDE 14

Securing the cluster as a multi-tenant environment

Step by step by step towards our target architecture …

» Access Control: ACLs
» User Management: Local OS users
» Basic Security: iptables + ssh tunneling
» Authentication: LDAP for Hive
» Protection from outside: Knox
» Protection from inside: Kerberos
» Access Control: File Attributes
» Dedicated network: BI Zone
» Access Control & Audit: Ranger
» User Management: LDAP

SLIDE 15

Legend: password required no password required next step

Password Hell

Hive WebHDFS SparkUI HDFS/YARN Knox

Audi Active Directory: [ AD User ] Named User Technical Hive User

DATA NODE 1 - X NAME NODE 1 - 2 EDGE NODE 1 - 2

OS Level [ Local User ] OS Named User Technical Hive User Technical Project User Hadoop User

SSH 2 EdgeNode kinit

Hadoop KDC: [ Kerberos Principal ] Named User Technical Hive User Technical Project User Hadoop User

SLIDE 16

DATA INGESTION

SLIDE 17

Data ingestion: technical requirements from projects, security and ops

INGESTION
» Streaming data
» Batch data
» Easy writing to HDFS/DWH

DECOUPLING
» Data sources should not be coupled directly to analytical backend jobs
» This allows adding new analytical jobs without changing the source

HA & BUFFERING
» Data ingestion must be available 24x7
» Data must be buffered (persisted) in case the backend or a backend job is not available

SECURITY
» Source systems must not connect directly to the data zone (Hadoop, DWH), as required by IT Security
» Authentication and encryption of data in motion (multi-tenancy)
» Protocol must be auditable
» Some data sources run in the cloud

SCALABILITY
» The amount of data will increase over time for most projects
» The number of projects will increase
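
The decoupling and buffering requirements can be illustrated with a toy append-only log. This is a sketch under stated assumptions, not the platform's implementation: `BufferedLog` is a hypothetical in-memory stand-in for a persisted message log, and the consumer names are invented. It shows that sources only append, each backend job tracks its own read position, and a job added later can still read everything from the start without any change to the source.

```python
from collections import defaultdict


class BufferedLog:
    """Toy append-only log with one independent read offset per consumer."""

    def __init__(self):
        self._records = []                 # buffered records, never removed here
        self._offsets = defaultdict(int)   # next index to read, per consumer

    def append(self, record):
        """Called by a data source; the source knows nothing about consumers."""
        self._records.append(record)

    def poll(self, consumer, max_records=100):
        """Return the next unread records for this consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


log = BufferedLog()
for value in (1, 2, 3):
    log.append({"source": "system-a", "value": value})

# An existing job reads what has arrived so far.
first = log.poll("hdfs-writer")

# The source keeps writing while the job is busy or down.
log.append({"source": "system-a", "value": 4})
second = log.poll("hdfs-writer")

# A job added later starts at offset 0 and sees the full history.
late = log.poll("new-analytics-job")

print(len(first), len(second), len(late))
```

A real deployment would need durable storage and replication to meet the 24x7 requirement; the sketch only shows the offset-per-consumer idea.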

SLIDE 18

Solution: Kerberized Confluent Kafka Platform

[Architecture diagram. Zones: Data Source network #1 to #n, AAP Messaging Zone, BI Data Zone, separated by firewalls (FW SRC #1 to #n, FW MSG, FW BI). Components: Kafka Clients in the data source networks (Kerberos, BIN / push), Schema Registry (HTTP, no authentication), Kafka Broker (Kerberos, BIN), Zookeeper (Kerberos), DataProxy KDC, Hadoop KDC (Kerberos), HDFS Connector (BIN / pull), Spark Streaming (BIN / pull, Kerberos), HDFS (Kerberos). Legend: authentication, encrypted (SSL), not encrypted, protocol / direction, firewall pain point.]
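
As an illustration of what a Kerberized, SSL-encrypted client connection involves, the sketch below builds a client configuration. The talk does not say which client library the projects use; the property names here are those of librdkafka (used by the confluent-kafka Python client), and the broker address, principal and file paths are hypothetical placeholders.

```python
def kerberos_client_config(bootstrap_servers: str,
                           principal: str,
                           keytab_path: str,
                           ca_path: str) -> dict:
    """Build a librdkafka-style config for Kerberos (GSSAPI) over SSL."""
    return {
        "bootstrap.servers": bootstrap_servers,
        # SASL authentication on top of an encrypted SSL channel.
        "security.protocol": "SASL_SSL",
        # GSSAPI is the SASL mechanism that carries Kerberos.
        "sasl.mechanism": "GSSAPI",
        # Service name of the broker principal; "kafka" is the usual default.
        "sasl.kerberos.service.name": "kafka",
        "sasl.kerberos.principal": principal,
        "sasl.kerberos.keytab": keytab_path,
        # CA certificate used to verify the broker certificate.
        "ssl.ca.location": ca_path,
    }


# All values below are placeholders, not taken from the talk.
config = kerberos_client_config(
    bootstrap_servers="broker1.example:9093",
    principal="project-user@EXAMPLE.REALM",
    keytab_path="/etc/security/keytabs/project-user.keytab",
    ca_path="/etc/ssl/certs/ca.pem",
)

for key in sorted(config):
    print(f"{key} = {config[key]}")
```

With the confluent-kafka package installed, a dictionary like this could be passed to `confluent_kafka.Producer(config)`; the sketch stops short of that so it runs without a broker.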

SLIDE 19

Kafka Distributed Connector: unsecured REST API

[Diagram, on the edge node: one Connector Java process holding Bob's Kafka keytab and Bob's HDFS keytab. User Bob sends his sink config over HTTP; the HDFS Sink reads Bob's topic and writes Bob's data. User Eve sends her own source config and sink config over the same HTTP API; her HDFS Source and File Sink reach Bob's data. Legend: evil connection, good connection.]
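
The problem on this slide can be sketched as follows. Kafka Connect in distributed mode accepts new connectors as a JSON document posted to `/connectors`; on an unsecured REST API that request carries no credentials, so the connector runs with whatever keytabs the worker process holds. The host name, topic and file path below are hypothetical, and the request is only constructed, never sent.

```python
import json
import urllib.request


def connector_request(connect_url: str, name: str, config: dict) -> urllib.request.Request:
    """Build (but do not send) the POST that registers a connector."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Eve registers a file sink on Bob's topic. Nothing in the request identifies
# or authenticates her; 8083 is the default Connect REST port.
eve_request = connector_request(
    "http://edge-node.example:8083",
    "eve-file-sink",
    {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "tasks.max": "1",
        "topics": "bob-topic",
        "file": "/tmp/bobs-data.txt",
    },
)

print(eve_request.get_method(), eve_request.full_url)
print("has Authorization header:", eve_request.has_header("Authorization"))
```

The point of the sketch is the missing `Authorization` header: any user who can reach the port can submit such a request.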

SLIDE 20

TODAY

CURRENT STATE

SLIDE 21

Architecture & Network Zones – Data Ingestion

Data Proxy BI Data Zone Messaging Zone Data Warehouse System A System A HDFS Connector Spark Streaming Cloud App System System

Legend: encrypted (SSL) not encrypted

S3 Backup

SLIDE 22

Architecture & Network Zones – User & Developer Access

PIPE BI Data Zone Deployment Zone BI Application Zone AAP Data Warehouse Audi Office LAN Audi Laptop Data Mining Dashboarding

AAP Remote Desktop Legend: encrypted (SSL) not encrypted

SLIDE 23

Hadoop Cluster Sizing Production 2017

Hadoop (PROD)     per node    Sum
# data nodes      1           12
RAM               512 GB      6 TB
Cores             24          288
HDD*              96 TB       9.216 TB

Kafka (PROD)      per node    Sum
# broker nodes    1           4
RAM               32 GB       128 GB
Cores             6           24
HDD*              4 TB        16 TB

* Raw capacity without replication and FS overhead!

SLIDE 24

Current state

Organisational Tasks

SLIDE 25

Organisational Tasks

  • Data Ownership & Data Governance (Data Domain Model with clear responsibility in each domain)
  • Lifecycle Management for each Shared Service in strong collaboration with the projects and programs
  • Defined SLAs for each Shared Service based on general availability, data loss, confidentiality and verifiability
  • Different Development Lifecycle between car and backend systems
  • Use of Open Source Software and Support requirements from IT continuity
  • Balance between multi-tenant environment and flexibility
  • Very long lifecycle of cars (> 10 years) with various built-in software versions

SLIDE 26

TOMORROW

WHAT’S UP NEXT

SLIDE 27

Hybrid Approach for the AAP

Public Cloud On Premise / private Cloud Entry Zone Application Zone Data Zone Web GATEWAY Full Client (Tableau, BO, etc.) Web Client (Tableau, BO, etc.) HDP Data Warehouse Messaging Zone Kafka Internet RDP GATEWAY Business User Ingestor 1* Repositories Knox Direct Cloud Connect Swarm VPC Kafka Data Inventory Analytical VPC Ingestor HDP Knox

SLIDE 28

WE ARE HIRING

https://www.audi.com/corporate/de/karriere/einstieg-bei-audi.html
https://karriere.audibusinessinnovation.com/