Evolution of a modern cloud-based data lake, Viacheslav Inozemtsev

SLIDE 1

Evolution of a modern cloud-based data lake

Viacheslav Inozemtsev

O’Reilly Software Architecture Conference Berlin, 07.11.2019
viacheslav.inozemtsev@zalando.de

SLIDE 2

OUTLINE

  • Introduction of Zalando
  • Why data lake?
  • What is a data lake?
  • Evolution of the data lake
  • Future

SLIDE 3

Who am I?

Viacheslav Inozemtsev

  • 2010 - Specialist in Applied Mathematics and Computer Science at the Tomsk State University, Russia
  • 2014 - Master of Computer Science at the University of Bonn, Germany
  • Total of 8 years of experience as Data and Software Engineer
  • Last 3 years at Zalando

SLIDE 4

Introduction of Zalando

SLIDES 5-9

Introduction of Zalando

  • 29M active customers
  • 14000 employees
  • 2000 engineers and data scientists
  • 200 engineering teams
  • Variety of major data systems
    ○ Messaging Bus
    ○ BI Data Warehouse
    ○ Google Analytics platform
    ○ Custom datasets

SLIDE 10

Why data lake?

SLIDES 11-19

Why data lake?

SLIDES 20-24

Why data lake?

  • To enable sharing of the data so that it is:
    ○ easy to publish and consume
    ○ compliant
    ○ secure
    ○ cost-efficient

SLIDE 25

What is a data lake?

SLIDES 26-29

What is a data lake?

  • Central exchange system
    ○ connects producers and consumers of the data
    ○ defends data from malicious misuse or accidental leaking
    ○ open vs hide!
  • Central big data system
    ○ provides as much different data as possible
    ○ has to be fast and easy to use
    ○ scale vs performance!

SLIDE 30

Evolution of the data lake

SLIDES 31-38

Evolution of the data lake - Stage 1: Inception

SLIDES 39-48

Evolution of the data lake - Stage 2: Optimization

SLIDE 49

Evolution of the data lake - Stage 3: Revolution

https://martinfowler.com/articles/data-monolith-to-mesh.html

SLIDE 50

Evolution of the data lake - centralized

SLIDE 51

Evolution of the data lake - federated

SLIDE 52

What is a data lake?

  • Central exchange system
    ○ connects producers and consumers of the data
    ○ defends data from malicious misuse or accidental leaking
    ○ open vs hide!
  • Central big data system
    ○ provides as much different data as possible
    ○ has to be fast and easy to use
    ○ scale vs performance!

SLIDES 53-55

What is a data lake?

  • Federated exchange system
    ○ connects producers and consumers of the data
    ○ defends data from malicious misuse or accidental leaking
    ○ open vs hide!
    ○ resolved by centralized governance
  • Federated big data system
    ○ provides as much different data as possible
    ○ has to be fast and easy to use
    ○ scale vs performance!
    ○ resolved by decentralized ownership
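
The combination of centralized governance and decentralized ownership named on this slide could be sketched roughly as below. This is only an illustration, not the architecture presented in the talk: the class names (Dataset, Governance, FederatedLake), the "public"/"restricted" labels, and the team names are all assumptions made for the example. The idea shown is that each team publishes and grants access to its own datasets, while one shared rule set decides every read.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Dataset:
    name: str
    owner_team: str       # decentralized ownership: each team owns its data
    classification: str   # hypothetical labels: "public" or "restricted"
    granted_teams: Set[str] = field(default_factory=set)


class Governance:
    """One central rule set, applied uniformly to every team's datasets."""

    def can_read(self, dataset: Dataset, team: str) -> bool:
        if team == dataset.owner_team:
            return True
        if dataset.classification == "public":
            return True
        return team in dataset.granted_teams


class FederatedLake:
    """Hypothetical registry: teams publish, the central policy gates reads."""

    def __init__(self, governance: Governance) -> None:
        self._governance = governance
        self._datasets: Dict[str, Dataset] = {}

    def publish(self, dataset: Dataset) -> None:
        self._datasets[dataset.name] = dataset

    def grant(self, dataset_name: str, acting_team: str, to_team: str) -> None:
        dataset = self._datasets[dataset_name]
        # Only the owning team decides who else may use its data.
        if acting_team != dataset.owner_team:
            raise PermissionError("only the owning team may grant access")
        dataset.granted_teams.add(to_team)

    def read(self, dataset_name: str, team: str) -> Dataset:
        dataset = self._datasets[dataset_name]
        # Every read goes through the same central governance check.
        if not self._governance.can_read(dataset, team):
            raise PermissionError(f"{team} may not read {dataset_name}")
        return dataset


lake = FederatedLake(Governance())
lake.publish(Dataset(name="orders", owner_team="team-orders", classification="restricted"))
lake.grant("orders", acting_team="team-orders", to_team="team-analytics")
print(lake.read("orders", team="team-analytics").name)
```

In this sketch a team without a grant gets a PermissionError on read, and a non-owner cannot grant access, which is one possible way to balance "open vs hide".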

SLIDE 56

Future

SLIDES 57-58

Future

  • Make the data lake fully federated
  • Abstract physical storage into a logical layer of datasets
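
One way to picture a logical layer of datasets over physical storage is a catalog that resolves a stable dataset name to wherever the data currently lives. The sketch below is an assumption-laden illustration, not something described in the talk: DatasetCatalog, PhysicalLocation, the dataset name, and the storage URIs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PhysicalLocation:
    # Hypothetical description of where the bytes actually live.
    owner_team: str
    uri: str
    file_format: str


class DatasetCatalog:
    """Maps logical dataset names to physical locations (illustrative only)."""

    def __init__(self) -> None:
        self._entries: Dict[str, PhysicalLocation] = {}

    def register(self, logical_name: str, location: PhysicalLocation) -> None:
        # Re-registering a name repoints it without changing the name itself.
        self._entries[logical_name] = location

    def resolve(self, logical_name: str) -> PhysicalLocation:
        try:
            return self._entries[logical_name]
        except KeyError:
            raise LookupError(f"unknown dataset: {logical_name}") from None


catalog = DatasetCatalog()
catalog.register(
    "sales.orders",
    PhysicalLocation("team-orders", "s3://example-bucket-a/orders/", "parquet"),
)
# The owning team moves the data; consumers keep using the same logical name.
catalog.register(
    "sales.orders",
    PhysicalLocation("team-orders", "s3://example-bucket-b/orders/v2/", "parquet"),
)
print(catalog.resolve("sales.orders").uri)
```

Consumers in this sketch depend only on the logical name, so a storage migration by the owning team does not require changes on the consuming side.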
SLIDE 59

Evolution of a modern cloud-based data lake

Viacheslav Inozemtsev O’Reilly Software Architecture Conference Berlin, 07.11.2019

viacheslav.inozemtsev@zalando.de

www.zalando.com jobs.zalando.com/tech