Big Data: Rebecca Parsons (CTO) and Martin Fowler

SLIDE 1

MARTIN FOWLER

http://martinfowler.com http://thoughtworks.com @martinfowler

REBECCA PARSONS

rparsons@thoughtworks.com http://thoughtworks.com CTO

Big Data

Thursday, March 8, 2012
SLIDE 3

The Evolving Panorama of Data

SLIDE 4

Changing Nature of Data
Response
How we use data now

SLIDE 9

Data is: Growing

Walmart: 1 million transactions per hour
Facebook: 40 billion photos

Source: The Economist, Feb 25th 2010

SLIDE 10

640K ought to be enough for anybody

SLIDE 11

Lots of Traffic

SLIDE 16

Data is: Distributed

Monthly Contributors to Wikipedia (source: Wikipedia)

Year  Contributors
2002  442
2003  1,990
2004  8,640
2005  40,223
2006  127,942
2007  356,191
2008  616,308
2009  853,698
2010  1,080,872
2011  1,287,537
2012  1,482,824

SLIDE 19

Data is: Distributed

98% of internet access points in Africa are mobile
30 million networked sensor nodes, growing 30% per year

Source: McKinsey Global Institute, "Big data: The next frontier for innovation, competition, and productivity"

SLIDE 22

Data is: Valuable

$300 billion / year for US health care
60% increase in retail margins

Source: McKinsey Global Institute, "Big data: The next frontier for innovation, competition, and productivity"

SLIDE 24

Data is: Urgent

SLIDE 26

Data is: Connected

SLIDE 27

Data Silos

SLIDE 28

Enterprise Data Model

App App App App

SLIDE 30

Changing Nature of Data
Response
How we use data now

SLIDE 33

SQL SQL

SLIDE 35

Bigtable Dynamo

SLIDE 37

"NoSQL"

SLIDE 38

Definition of NoSQL

SLIDE 40

Characteristics of NoSQL

non-relational
open-source
cluster-friendly
21st Century Web
schema-less

SLIDE 41

Document, Graph, Key-value, Column-family

SLIDE 43

Eric Evans

Aggregate

SLIDE 44

ID: 1001
customer: Ann

line items:
item        qty  price  total
0321293533  2    $48    $96
0321601912  1    $39    $39
0131495054  1    $51    $51

payment details:
Card: Amex
CC Number: 12345
expiry: 04/2001

orders
customers
order lines
credit cards

SLIDE 46

Order 1001 (ID, customer, line items, payment details) as a single aggregate

document, key-value, column family
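
As a rough illustration of the aggregate idea, the order from the slide can be written as one nested structure and stored as a single document or key-value entry. This is only a sketch: the field names (`line_items`, `payment_details`, and so on), the key format `order:1001`, and the plain dict standing in for a database are assumptions made for the example, not something the slides specify.

```python
import json

# Order 1001 from the slide, modeled as a single aggregate.
# Field names are illustrative assumptions; the slide only shows the values.
order_1001 = {
    "id": 1001,
    "customer": "Ann",
    "line_items": [
        {"product": "0321293533", "quantity": 2, "price": 48},
        {"product": "0321601912", "quantity": 1, "price": 39},
        {"product": "0131495054", "quantity": 1, "price": 51},
    ],
    "payment_details": {"card": "Amex", "cc_number": "12345", "expiry": "04/2001"},
}


def line_total(item):
    """Total for one line item: quantity times unit price."""
    return item["quantity"] * item["price"]


def order_total(order):
    """Sum of all line totals in the aggregate."""
    return sum(line_total(item) for item in order["line_items"])


# A plain dict stands in for a key-value or document store (hypothetical).
store = {}
store["order:1001"] = json.dumps(order_1001)

# Reading the aggregate back takes one lookup and no joins.
loaded = json.loads(store["order:1001"])
print([line_total(i) for i in loaded["line_items"]])  # [96, 39, 51]
print(order_total(loaded))  # 186
```

The line totals match the $96, $39, and $51 shown on the slide. In a relational layout the same data would be spread over the orders, customers, order lines, and credit cards tables.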

SLIDE 47

Order 1001 Order 1004 Order 1026 Order 2843 Order 1001 Order 1004 Order 1026 Order 3754 Order 1001 Order 1004

SLIDE 48

Document, Graph, Key-value, Column-family

SLIDE 50

Graph

SLIDE 53

Polyglot Persistence

SLIDE 57

Integration Database: Billing and Inventory share one database

Application Database: Billing and Inventory each have their own database

SLIDE 58

Event Sourcing

SLIDE 59

John Smith : Person
:Address city = Portland

SLIDE 61

"change my address" applied as a direct update:

John Smith : Person
:Address city = Boston

SLIDE 63

"change my address" captured as an event:

John Smith : Person
:Address city = Portland
:Event change address

SLIDE 64

Applying the event updates the state:

John Smith : Person
:Address city = Boston
:Event change address

SLIDE 65

Log Application State
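
One way to read the log and application-state picture: every change is appended to a log as an event, and the current state can be rebuilt by replaying that log. The sketch below illustrates this with the John Smith address change. The names `AddressChanged`, `Person`, `EventLog`, and `replay` are assumptions made for this example and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class AddressChanged:
    """Event recording that a person's city changed (hypothetical event type)."""
    person: str
    new_city: str


@dataclass
class Person:
    name: str
    city: str

    def apply(self, event: AddressChanged) -> None:
        # State changes only by applying events.
        if event.person == self.name:
            self.city = event.new_city


@dataclass
class EventLog:
    events: List[AddressChanged] = field(default_factory=list)

    def append(self, event: AddressChanged) -> None:
        self.events.append(event)


def replay(initial: Person, log: EventLog) -> Person:
    """Rebuild application state from the initial state plus the log."""
    person = Person(initial.name, initial.city)
    for event in log.events:
        person.apply(event)
    return person


log = EventLog()
john = Person("John Smith", "Portland")

# "change my address": record the event, then apply it to the live state.
event = AddressChanged("John Smith", "Boston")
log.append(event)
john.apply(event)

rebuilt = replay(Person("John Smith", "Portland"), log)
print(john.city)     # Boston
print(rebuilt.city)  # Boston
```

Because the log keeps every change, the state at an earlier point can also be recovered by replaying only part of the log.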

SLIDE 68
on-demand self-service
broad network access
resource pooling
rapid elasticity
measured service

NIST Special Publication 800-145

SLIDE 71

Data Sources

were
will be: text, image, video, connections

SLIDE 74

Analytics

were: roll-ups, trends, variance
will be: pattern recognition, data mining, chasing connections

SLIDE 80

map map map map reduce
map map map map reduce
map map map map reduce
reduce

per order (map), per month (reduce)
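
The diagram can be sketched in code: a map step runs once per order and emits a (month, value) pair, a reduce step combines the values for each month, and a final reduce combines the monthly results. The order data, the field names, and the choice of revenue as the quantity being summed are assumptions made for illustration; the slides do not say what is being computed.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical orders; only the per-order / per-month shape comes from the slide.
orders = [
    {"id": 1001, "month": "2012-01", "lines": [(2, 48), (1, 39), (1, 51)]},
    {"id": 1004, "month": "2012-01", "lines": [(1, 20)]},
    {"id": 1026, "month": "2012-02", "lines": [(3, 10)]},
]


def map_order(order):
    """Map step, run per order: emit (month, order revenue)."""
    revenue = sum(qty * price for qty, price in order["lines"])
    return order["month"], revenue


def reduce_month(values):
    """Reduce step: combine a collection of revenues into one sum."""
    return reduce(lambda a, b: a + b, values, 0)


# Map phase: independent per order, so it could run in parallel on a cluster.
mapped = [map_order(o) for o in orders]

# Shuffle: group the mapped values by key (month).
grouped = defaultdict(list)
for month, revenue in mapped:
    grouped[month].append(revenue)

# Reduce phase: one reduction per month.
per_month = {month: reduce_month(vals) for month, vals in grouped.items()}

# Final reduce: combine the monthly results into one total.
total = reduce_month(per_month.values())

print(per_month)  # {'2012-01': 206, '2012-02': 30}
print(total)      # 236
```

Here everything runs in one process; the point of the shape is that each map call touches a single order, so the work can be spread across machines that each hold some of the orders.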

SLIDE 82

10,000 ft view (literally)

CodeCity by Richard Wettel

http://www.inf.unisi.ch/phd/wettel/codecity.html

SLIDE 84

Changing Nature of Data
Response
How we use data now

SLIDE 86

Data Scientist

SLIDE 87

Data

SLIDE 88

Data Journalist

SLIDE 92

Data Warehousing

SLIDE 96

http://ureport.ug/

SLIDE 97

http://ushahidi.com/

SLIDE 98

http://libyacrisismap.net/

SLIDE 99

http://opendata.go.ke/

SLIDE 100

http://datawithoutborders.cc/

SLIDE 102

Deep Analytical Talent in 2018 (chart: Supply vs. Demand, axis 100 to 500)

50% of supply

Source: McKinsey Global Institute, "Big data: The next frontier for innovation, competition, and productivity"

SLIDE 104

Changing Nature of Data
Response
How we use data now

SLIDE 106

What about us?

SLIDE 111

MARTIN FOWLER

http://martinfowler.com http://thoughtworks.com @martinfowler

REBECCA PARSONS

rparsons@thoughtworks.com http://thoughtworks.com CTO

clip art from http://openclipart.org
