 
Evolution of a modern cloud-based data lake
Viacheslav Inozemtsev, viacheslav.inozemtsev@zalando.de
O’Reilly Software Architecture Conference, Berlin, 07.11.2019
OUTLINE
● Introduction of Zalando
● Why data lake?
● What is a data lake?
● Evolution of the data lake
● Future
Who am I?
Viacheslav Inozemtsev
● 2010 - Specialist in Applied Mathematics and Computer Science at the Tomsk State University, Russia
● 2014 - Master of Computer Science at the University of Bonn, Germany
● Total of 8 years of experience as Data and Software Engineer
● Last 3 years at Zalando
Introduction of Zalando
● 29M active customers
● 14000 employees
● 2000 engineers and data scientists
● 200 engineering teams
● Variety of major data systems
  ○ Messaging Bus
  ○ BI Data Warehouse
  ○ Google Analytics platform
  ○ Custom datasets
Why data lake?
● To enable sharing of the data so that it is:
  ○ easy to publish and consume
  ○ compliant
  ○ secure
  ○ cost-efficient
What is a data lake?
● Central exchange system
  ○ connects producers and consumers of the data
  ○ defends data from malicious misuse or accidental leaks
  ○ open vs hide!
● Central big data system
  ○ provides as wide a variety of data as possible
  ○ has to be fast and easy to use
  ○ scale vs performance!
Evolution of the data lake
Evolution of the data lake - Stage 1: Inception [diagram]
Evolution of the data lake - Stage 2: Optimization [diagram]