Strata Data 2018 - London
Audi's journey to an enterprise big data platform
Matthias Graunitz (AUDI AG, Germany) Carsten Herbe (Audi Business Innovation GmbH, Germany)
AUDI AG Strata Data London 2018 - Audi's journey to an enterprise big data platform
Audi Group: Audi, Lamborghini, Ducati and Italdesign
Vorsprung is our promise: Strategy 2025
...is the development, establishment, sales and operation of innovative concepts, products and services, as well the holding
Audi mobility innovations
Audi on demand
Audi balanced technologies
Audi e-gas
Audi customer IT solutions
About us
Matthias Graunitz, AUDI AG
» Center of Competence Big Data & BI
» Big Data Architect
» 10+ years Data Warehousing & BI
Carsten Herbe, Audi Business Innovation GmbH
» Data Platform & Solution Architecture
» Hadoop since 2013
» 10+ years Data Warehousing & BI
2 YEARS AGO…
Analytical Capabilities by 2015
Data Domains
Finance, Purchase, Production, Quality, Sales, Car Data, Programs, Projects, Data Scientists
Embed Analytics; Analyze Data; Store, Distribute and Process Data; Deliver Information; Secure Data; Infrastructure & Services; Provision Data; Deliver Service; Manage Information; Design & Maintain Solutions
Authentication; Data Encryption; Auditing; Complex Event Processing; Analytical APIs; Dashboarding; Planning & Simulation; Visual Analytics; BI Report & OLAP; Statistical Methods; Analytical Script; Data Warehouse; Analytical Databases; ETL Framework; Batch Processing; Data Access / APIs; On-Prem Platform; Application Deployment; Hardware, Network, OS; Monitoring; Lifecycle Mgmt; Development Process & Methods; Master Data Mgmt; Data Lineage
AAP – AUDI ANALYTIC PLATFORM
Analytical Capabilities by 2015
Data Domains
Finance, Purchase, Production, Quality, Sales, Car Data, Programs, Projects, Data Scientists
Embed Analytics; Analyze Data; Store, Distribute and Process Data; Deliver Information; Secure Data; Infrastructure & Services; Provision Data; Deliver Service; Manage Information; Design & Maintain Solutions
Authentication; Data Encryption; Auditing; Complex Event Processing; Analytical APIs; Dashboarding; Planning & Simulation; Visual Analytics; BI Report & OLAP; Statistical Methods; Analytical Script; Data Warehouse; Analytical Databases; ETL Framework; Batch Processing; Data Access / APIs; On-Prem Platform; Cloud Platform; Application Deployment; Hardware, Network, OS; Monitoring; Lifecycle Mgmt; Development Process & Methods; Master Data Mgmt; Data Lineage
AAP – AUDI ANALYTIC PLATFORM
File Systems (HDFS); Stream Processing; Machine Learning
Our first Hadoop Cluster 2015
Hadoop          per node   Sum
# data nodes    1          4
RAM             128 GB     0.5 TB
Cores           24         96
HDD*            40 TB      160 TB
* Raw capacity without replication and FS overhead
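The footnote matters when reading the disk figures: raw capacity is not what a project can actually store. The sketch below recomputes the "Sum" column from the per-node values and derives an upper bound on usable HDFS space. It assumes the HDFS default replication factor of 3, which the slide does not state, and it does not subtract file system overhead. The names NodeSpec, cluster_totals and usable_disk_tb are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware figures as listed on the slide."""
    ram_gb: int
    cores: int
    raw_disk_tb: float


def cluster_totals(node: NodeSpec, data_nodes: int) -> dict:
    """Multiply the per-node figures by the number of data nodes."""
    return {
        "ram_gb": node.ram_gb * data_nodes,
        "cores": node.cores * data_nodes,
        "raw_disk_tb": node.raw_disk_tb * data_nodes,
    }


def usable_disk_tb(raw_disk_tb: float, replication: int = 3) -> float:
    """Upper bound on usable space: raw capacity divided by the
    replication factor (ASSUMED to be the HDFS default of 3).
    File system overhead is not subtracted."""
    return raw_disk_tb / replication


first_cluster = NodeSpec(ram_gb=128, cores=24, raw_disk_tb=40)
totals = cluster_totals(first_cluster, data_nodes=4)
usable = round(usable_disk_tb(totals["raw_disk_tb"]), 1)

print(totals)   # RAM 512 GB (0.5 TB), 96 cores, 160 TB raw disk
print(usable)   # 53.3 TB at most under the assumed replication factor
```

Under that assumption, roughly a third of the 160 TB raw capacity is available for data.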
Our first attempt to walk with Big Data Technologies
Securing the Cluster as a multi-tenant environment
Step by step by step towards our target architecture …
» Access Control: ACLs
» User Management: Local OS users
» Basic Security: iptables + ssh tunneling
» Authentication: LDAP for Hive
» Protection from outside: Knox
» Protection from inside: Kerberos
» Access Control: File Attributes
» Dedicated network: BI Zone
» Access Control & Audit: Ranger
» User Management: LDAP
Password Hell
[Diagram: user identities and logins across the cluster. Legend: password required; no password required; next step.]
Services: Hive, WebHDFS, SparkUI, HDFS/YARN, Knox
Nodes: Data Node 1 - X, Name Node 1 - 2, Edge Node 1 - 2
Login steps: SSH to edge node, kinit
Audi Active Directory [AD User]: Named User, Technical Hive User
OS Level [Local User]: OS Named User, Technical Hive User, Technical Project User, Hadoop User
Hadoop KDC [Kerberos Principal]: Named User, Technical Hive User, Technical Project User, Hadoop User
Data ingestion: technical requirements from projects, security and ops
INGESTION
» Streaming data
» Batch data
» Easy writing to HDFS/DWH
DECOUPLING
» Data sources should not be coupled directly to analytical backend jobs
» This allows adding new analytical jobs without changing the source
HA & BUFFERING
» Data ingestion must be available 24x7
» Data must be buffered (persisted) in case the backend or a backend job is not available
SECURITY
» Source systems must not connect directly to the data zone (Hadoop, DWH), as required by IT Security
» Authentication and encryption of data in motion (multi-tenancy)
» Protocol must be auditable
» Some data sources run in the cloud
SCALABILITY
» Amount of data will increase over time for most projects
» Number of projects will increase
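The decoupling and buffering requirements can be illustrated with a toy model: a source appends to a log, and each backend job tracks its own read position. This is a minimal in-memory sketch, not a Kafka client and not the platform's implementation. The class BufferedTopic, the consumer names and the sample records are all invented for illustration.

```python
from collections import defaultdict


class BufferedTopic:
    """Toy stand-in for a persisted message log (in memory only)."""

    def __init__(self):
        self._log = []                     # append-only record buffer
        self._offsets = defaultdict(int)   # next unread offset per consumer

    def publish(self, record):
        """Called by the data source; it knows nothing about consumers."""
        self._log.append(record)

    def poll(self, consumer, max_records=100):
        """Return unread records for one consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._log[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch


topic = BufferedTopic()
topic.publish({"source": "system-a", "seq": 1})
topic.publish({"source": "system-a", "seq": 2})

# The existing backend job reads what has been buffered so far.
hdfs_job_batch = topic.poll("hdfs-writer")

# The source keeps publishing while no backend job is polling.
topic.publish({"source": "system-a", "seq": 3})

# A job added later reads from the start; the source code did not change.
new_job_batch = topic.poll("new-analytics-job")

# The first job catches up on the record it missed.
hdfs_catch_up = topic.poll("hdfs-writer")
```

Because read positions live with the consumers rather than with the source, adding "new-analytics-job" required no change on the publishing side, and records published while a job was idle were still delivered afterwards.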
Solution: Kerberized Confluent Kafka Platform
[Diagram: Kafka-based ingestion across network zones. Legend: authentication; encrypted (SSL); not encrypted; protocol / direction; firewall; pain point.]
Network zones: Data Source network #1 to #n; AAP Messaging Zone; BI Data Zone
Firewalls: FW SRC #1 to FW SRC #n; FW MSG; FW BI
Components and connection labels: Kafka Client (Kerberos, BIN / push); DataProxy; KDC; Kafka Broker (Kerberos, BIN); Zookeeper (Kerberos); Schema Registry (HTTP, no authentication); HDFS Connector (BIN / pull); Spark Streaming (BIN / pull, Kerberos); Hadoop KDC (Kerberos); HDFS (Kerberos)
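For a Kafka client to authenticate with Kerberos over an encrypted connection, it needs a handful of standard client settings. The sketch below assembles them in the property format used by the Apache Kafka Java client. The broker address, keytab path and principal are placeholders, and the service name "kafka" is the common convention rather than something the slides confirm.

```python
def kerberos_client_properties(bootstrap_servers, keytab_path, principal):
    """Build Java-client style properties for SASL/GSSAPI (Kerberos) over TLS."""
    jaas = (
        "com.sun.security.auth.module.Krb5LoginModule required "
        "useKeyTab=true storeKey=true "
        f'keyTab="{keytab_path}" principal="{principal}";'
    )
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",         # Kerberos auth + encrypted transport
        "sasl.mechanism": "GSSAPI",              # GSSAPI is the Kerberos mechanism
        "sasl.kerberos.service.name": "kafka",   # ASSUMED broker service name
        "sasl.jaas.config": jaas,
    }


def render_properties(props):
    """Render the settings as lines of a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


# All three arguments are placeholders, not values from the talk.
props = kerberos_client_properties(
    bootstrap_servers="broker1.example.com:9093",
    keytab_path="/etc/security/keytabs/svc-ingest.keytab",
    principal="svc-ingest@EXAMPLE.COM",
)
print(render_properties(props))
```

A keytab-based login lets a technical user authenticate without an interactive password prompt, which suits unattended ingestion clients.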
Kafka Distributed Connector: unsecured REST API
[Diagram: on the edge node, a single Connector Java process holds Bob's Kafka keytab and Bob's HDFS keytab. Legend: good connection; evil connection.]
User Bob: sends "sink config Bob" over HTTP; HDFS Sink Bob moves data from topic Bob into Bob's data on HDFS
User Eve: sends "sink config Eve" and "source config Eve" over the same API; File Sink Eve and HDFS Source Eve reach Bob's data
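The weakness shown here comes from the shape of the request itself. The sketch below builds, without sending it, the JSON body that Kafka Connect's POST /connectors endpoint accepts, using the FileStreamSinkConnector that ships with Apache Kafka. The connector name, topic name and file path are invented for illustration. The point is that the body carries no caller identity, so a worker process running with Bob's keytabs cannot tell Eve's request from Bob's.

```python
import json


def connector_request(name, connector_class, settings):
    """Build the JSON body for Kafka Connect's POST /connectors endpoint."""
    config = {"connector.class": connector_class, "tasks.max": "1"}
    config.update(settings)
    return json.dumps({"name": name, "config": config}, indent=2)


# Eve's request simply names Bob's topic (all values are hypothetical).
eve_request = connector_request(
    name="file-sink-eve",
    connector_class="org.apache.kafka.connect.file.FileStreamSinkConnector",
    settings={"topics": "bob-topic", "file": "/tmp/eve-copy.txt"},
)
print(eve_request)

# The body contains only a connector name and its config.
fields = sorted(json.loads(eve_request).keys())
```

Any access control therefore has to be enforced in front of the REST API or by isolating workers per tenant, since the request format offers nothing to authorize against.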
Architecture & Network Zones – Data Ingestion
[Diagram: data ingestion paths across network zones. Legend: encrypted (SSL); not encrypted.]
Zones: Messaging Zone; BI Data Zone
Components: Data Proxy; HDFS Connector; Spark Streaming; Data Warehouse; S3 Backup
Sources: System A; System; Cloud App
Architecture & Network Zones – User & Developer Access
[Diagram: user and developer access paths across network zones. Legend: encrypted (SSL); not encrypted.]
Zones: Audi Office LAN; Deployment Zone; BI Application Zone; BI Data Zone
Components: Audi Laptop; AAP Remote Desktop; PIPE; Data Mining; Dashboarding; AAP; Data Warehouse
Hadoop Cluster Sizing Production 2017
Hadoop          per node   Sum
# data nodes    1          12
RAM             512 GB     6 TB
Cores           24         288
HDD*            96 TB      9.216 TB
Kafka           per node   Sum
# broker nodes  1          4
RAM             32 GB      128 GB
Cores           6          24
HDD*            4 TB       16 TB
* Raw capacity without replication and FS overhead
Current state
Organisational Tasks
(Data Domain Model with clear responsibility in each domain)
with the projects and programs
loss, confidentiality and verifiability
continuity
versions
Hybrid Approach for the AAP
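[Diagram: AAP components split between public cloud and on-premise / private cloud.]
Environments: Public Cloud; On Premise / private Cloud
Zones: Entry Zone; Application Zone; Data Zone; Messaging Zone; Swarm VPC; Analytical VPC
Components: Web Gateway; RDP Gateway; Full Client (Tableau, BO, etc.); Web Client (Tableau, BO, etc.); HDP (shown twice); Data Warehouse; Kafka (shown twice); Knox (shown twice); Ingestor; Ingestor 1*; Repositories; Data Inventory
Connectivity: Internet; Direct Cloud Connect; Business User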
https://www.audi.com/corporate/de/karriere/einstieg-bei-audi.html https://karriere.audibusinessinnovation.com/