Putting Big Data in its Place, Mike Amundsen, API Academy at CA



slide-1
SLIDE 1

Putting Big Data in its Place

Mike Amundsen, API Academy at CA @mamund

HH Camp – Strasbourg, March 2015

slide-2
SLIDE 2

Introduction

slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5

Big Data Challenges

slide-6
SLIDE 6
slide-7
SLIDE 7
slide-8
SLIDE 8

“Those who cannot remember the past are condemned to repeat it.”

George Santayana, 1905

slide-9
SLIDE 9

“Those who ignore the mistakes of the future are bound to make them.”

Joseph D. Miller, 2006

slide-10
SLIDE 10
slide-11
SLIDE 11
slide-12
SLIDE 12
slide-13
SLIDE 13

Data and Storage

slide-14
SLIDE 14
slide-15
SLIDE 15
slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18

It's called a database

slide-19
SLIDE 19

It's called a database not an informationbase

slide-20
SLIDE 20
slide-21
SLIDE 21
slide-22
SLIDE 22
slide-23
SLIDE 23
slide-24
SLIDE 24
slide-25
SLIDE 25
slide-26
SLIDE 26
slide-27
SLIDE 27
slide-28
SLIDE 28
slide-29
SLIDE 29
slide-30
SLIDE 30
slide-31
SLIDE 31
slide-32
SLIDE 32
slide-33
SLIDE 33

1 Gigabyte per day

slide-34
SLIDE 34
slide-35
SLIDE 35

365 truck loads per person per year

slide-36
SLIDE 36
slide-37
SLIDE 37
slide-38
SLIDE 38

1 Yottabyte of Storage

slide-39
SLIDE 39
slide-40
SLIDE 40

100 Terabytes

slide-41
SLIDE 41

100 Terabytes = 100,000 Gigabytes

slide-42
SLIDE 42

100 Terabytes = 100,000 Gigabytes = 250+ years of storage per person (at 1 Gigabyte per day)
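
The "250+ years" figure follows from the earlier "1 Gigabyte per day" slide. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 1,000 GB) and a 365-day year; the function name `years_of_storage` is only illustrative:

```python
GB_PER_TB = 1_000  # decimal units, matching the slide's 100 TB = 100,000 GB


def years_of_storage(capacity_tb: float, gb_per_day: float = 1.0) -> float:
    """Return how many 365-day years a capacity lasts at a daily data rate."""
    capacity_gb = capacity_tb * GB_PER_TB
    days = capacity_gb / gb_per_day
    return days / 365


# 100,000 GB at 1 GB per day is 100,000 days
print(round(years_of_storage(100)))  # prints 274
```

At that rate 100 Terabytes lasts about 274 years, which is where "250+" comes from.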

slide-43
SLIDE 43
slide-44
SLIDE 44

NO

slide-45
SLIDE 45
slide-46
SLIDE 46

Pruning data into long-term memory

slide-47
SLIDE 47
slide-48
SLIDE 48
slide-49
SLIDE 49

“Forgetting makes our brains more efficient.”

slide-50
SLIDE 50
slide-51
SLIDE 51

Learning to choose is hard.

slide-52
SLIDE 52

Learning to choose is hard. Learning to choose well is harder.

slide-53
SLIDE 53

“Learning to choose well in a world of unlimited possibilities is, perhaps, too hard.”

Barry Schwartz, 2004

slide-54
SLIDE 54
slide-55
SLIDE 55
slide-56
SLIDE 56
slide-57
SLIDE 57
slide-58
SLIDE 58
slide-59
SLIDE 59

Data and Storage Challenges

  • Support Pruning Strategies
  • Implement Data Lakes
  • Reduce Data Overload
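
One possible reading of "Support Pruning Strategies" is to keep recent data in full and fold older data into summaries. The sketch below is only an illustration under that assumption; the `prune` function, the 30-day window, and the monthly-average summary are hypothetical choices, not something the talk prescribes.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean


def prune(readings, today, keep_days=30):
    """Keep recent (date, value) pairs; replace older ones with monthly averages."""
    cutoff = today - timedelta(days=keep_days)
    recent = [(day, value) for day, value in readings if day >= cutoff]

    # Group everything older than the cutoff by calendar month.
    buckets = defaultdict(list)
    for day, value in readings:
        if day < cutoff:
            buckets[(day.year, day.month)].append(value)

    # One summary row per month, dated the first of that month.
    summaries = [
        (date(year, month, 1), mean(values))
        for (year, month), values in sorted(buckets.items())
    ]
    return summaries + recent


readings = [
    (date(2015, 1, 1), 10.0),
    (date(2015, 1, 2), 20.0),
    (date(2015, 3, 10), 5.0),
]
print(prune(readings, today=date(2015, 3, 14)))
```

The two January readings collapse into one averaged entry, while the March reading is kept as recorded.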
slide-60
SLIDE 60

Modeling Information

slide-61
SLIDE 61
slide-62
SLIDE 62
slide-63
SLIDE 63
slide-64
SLIDE 64
slide-65
SLIDE 65

Models allow us to add meaning to data

slide-66
SLIDE 66
slide-67
SLIDE 67
slide-68
SLIDE 68
slide-69
SLIDE 69
slide-70
SLIDE 70

data + model = information
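
A small sketch of "data + model = information": the raw string below carries no meaning until a model names and types its parts. The `Reading` model, its field names, and the sample values are assumptions made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Reading:
    """Hypothetical model: field names, types, and units are assumptions."""
    station: str
    taken_on: date
    temperature_c: float


def interpret(raw: str) -> Reading:
    """Apply the model to one comma-separated line of raw data."""
    station, day, temperature = raw.split(",")
    return Reading(
        station=station.strip(),
        taken_on=date.fromisoformat(day.strip()),
        temperature_c=float(temperature),
    )


raw = "STR-01,2015-03-14,11.5"   # data: just characters
info = interpret(raw)            # information: named, typed values
print(info)
```

The same characters could be read through a different model (say, a product code, a ship date, and a price), which is why the model has to travel with, or be agreed alongside, the data.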

slide-71
SLIDE 71
slide-72
SLIDE 72
slide-73
SLIDE 73
slide-74
SLIDE 74
slide-75
SLIDE 75
slide-76
SLIDE 76

We can improve

slide-77
SLIDE 77

We can improve the usability of messages

slide-78
SLIDE 78

There are three ways to do that...

slide-79
SLIDE 79
1. Format
slide-80
SLIDE 80
slide-81
SLIDE 81

application/json adds very little affordance

slide-82
SLIDE 82
slide-83
SLIDE 83

collection+json adds quite a bit of affordance
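
To make the contrast concrete, here is a hedged sketch comparing a plain `application/json` body with a Collection+JSON body (registered as `application/vnd.collection+json`). The URLs and field values are invented for illustration, and the `affordances` helper is a hypothetical function, not part of any library.

```python
from typing import List

# Plain JSON: data only, nothing tells a client what it can do next.
plain = {"name": "Mike", "email": "mike@example.org"}

# Collection+JSON: the same data plus links, queries, and a write template.
collection = {
    "collection": {
        "version": "1.0",
        "href": "http://example.org/friends/",
        "links": [{"rel": "feed", "href": "http://example.org/friends/rss"}],
        "items": [
            {
                "href": "http://example.org/friends/mike",
                "data": [
                    {"name": "name", "value": "Mike", "prompt": "Name"},
                    {"name": "email", "value": "mike@example.org", "prompt": "Email"},
                ],
            }
        ],
        "queries": [
            {
                "rel": "search",
                "href": "http://example.org/friends/search",
                "prompt": "Search",
                "data": [{"name": "q", "value": ""}],
            }
        ],
        "template": {
            "data": [
                {"name": "name", "value": "", "prompt": "Name"},
                {"name": "email", "value": "", "prompt": "Email"},
            ]
        },
    }
}


def affordances(message: dict) -> List[str]:
    """List the next actions a client can discover from the message alone."""
    body = message.get("collection", {})
    found = [f"follow {link['rel']}" for link in body.get("links", [])]
    found += [f"query {query['rel']}" for query in body.get("queries", [])]
    if "template" in body:
        found.append("write via template")
    return found


print(affordances(plain))       # prints []
print(affordances(collection))  # prints ['follow feed', 'query search', 'write via template']
```

With the plain body, a client needs out-of-band documentation to know what to do next; with the richer format, those options arrive in the message itself.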
slide-84
SLIDE 84
slide-85
SLIDE 85
2. Protocol
slide-86
SLIDE 86
slide-87
SLIDE 87
slide-88
SLIDE 88
slide-89
SLIDE 89
slide-90
SLIDE 90
slide-91
SLIDE 91

So far, we're still in "Shannon-land"

slide-92
SLIDE 92
3. Semantics
slide-93
SLIDE 93

On the web, the "internal model" is represented by Semantics

slide-94
SLIDE 94
slide-95
SLIDE 95
slide-96
SLIDE 96
slide-97
SLIDE 97
slide-98
SLIDE 98
slide-99
SLIDE 99
slide-100
SLIDE 100

Modeling Information

  • Represent Data in Rich Formats
  • Support Multiple Protocols
  • Separate Semantics from Format & Protocol
slide-101
SLIDE 101

Ravages of Time

slide-102
SLIDE 102

“Everything changes and nothing stands still.”

Heraclitus, as quoted in Plato's Cratylus (402a)

slide-103
SLIDE 103
slide-104
SLIDE 104
slide-105
SLIDE 105
slide-106
SLIDE 106
slide-107
SLIDE 107
slide-108
SLIDE 108
slide-109
SLIDE 109

Storage Format

slide-110
SLIDE 110

Storage Format is not

slide-111
SLIDE 111

Storage Format is not Transfer Format

slide-112
SLIDE 112

CSV

slide-113
SLIDE 113

XML

slide-114
SLIDE 114

JSON

slide-115
SLIDE 115

RDF (n3)

slide-116
SLIDE 116

Select a Storage Format

slide-117
SLIDE 117

Select a Storage Format

  • CSV has no strong schema modeling
  • XML and JSON both have schema tooling
  • RDF-family offers built-in semantics
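
The three bullets can be illustrated by writing one record in each format. This is a rough sketch: the record, the choice of the FOAF vocabulary for the RDF version, and the helper names are all assumptions, and the N3 text is assembled by hand rather than with an RDF library.

```python
import csv
import io
import json

record = {"name": "Mike", "homepage": "http://example.org/mike"}


def to_csv(rec: dict) -> str:
    """CSV: a header row and values; no types, no schema, no meaning."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buffer.getvalue()


def to_json(rec: dict) -> str:
    """JSON: structure is explicit; a separate schema could validate it."""
    return json.dumps(rec, indent=2)


def to_n3(rec: dict) -> str:
    """RDF (N3): each property is a URI from a shared vocabulary (FOAF here)."""
    return (
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
        f"<{rec['homepage']}#me> foaf:name \"{rec['name']}\" ;\n"
        f"    foaf:homepage <{rec['homepage']}> .\n"
    )


for render in (to_csv, to_json, to_n3):
    print(render(record))
```

Only the RDF version says what "name" and "homepage" mean, by pointing at a published vocabulary; the CSV and JSON versions rely on the reader already knowing.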
slide-118
SLIDE 118

slide-119
SLIDE 119

Storage Media

slide-120
SLIDE 120

Storage Media is

slide-121
SLIDE 121

Storage Media is Volatile

slide-122
SLIDE 122
slide-123
SLIDE 123
slide-124
SLIDE 124
slide-125
SLIDE 125

Million-Year Data Storage via DNA

ETH Zurich, 2015

slide-126
SLIDE 126


slide-127
SLIDE 127
slide-128
SLIDE 128
slide-129
SLIDE 129
slide-130
SLIDE 130
slide-131
SLIDE 131

Ravages of Time

  • Prepare to hold the data for 100+ years
  • Be ready to migrate the data to new media
  • Archive a functional app with the data
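
For "Be ready to migrate the data to new media", one common safeguard is to verify a checksum after every copy. The sketch below assumes files on a local filesystem stand in for the old and new media; the `migrate` helper and file names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def migrate(source: Path, target: Path) -> str:
    """Copy source to target and return the checksum; raise if the copy differs."""
    before = sha256_of(source)
    shutil.copyfile(source, target)
    after = sha256_of(target)
    if before != after:
        raise IOError(f"checksum mismatch after copying {source} to {target}")
    return after


with tempfile.TemporaryDirectory() as workdir:
    old = Path(workdir) / "old-media.dat"
    new = Path(workdir) / "new-media.dat"
    old.write_bytes(b"data worth keeping")
    print(migrate(old, new))
```

Storing the returned checksum alongside the archive lets each later migration confirm that the bits are still the ones originally saved.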
slide-132
SLIDE 132

And so…

slide-133
SLIDE 133
slide-134
SLIDE 134

“If we don’t want our digital lives to fade away, we need to make sure that the objects we create today can still be rendered far into the future.”

Vint Cerf, 2015

slide-135
SLIDE 135

“Those who ignore the mistakes of the future are bound to make them.”

Joseph D. Miller, 2006

slide-136
SLIDE 136
slide-137
SLIDE 137

Putting Big Data in its Place

Mike Amundsen, API Academy at CA @mamund

HH Camp – Strasbourg, March 2015

http://g.mamund.com/2015-hhcamp