Welcome It used to be easy they all looked pretty much alike NoSQL - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Welcome

slide-2
SLIDE 2

It used to be easy…

slide-3
SLIDE 3

they all looked pretty much alike

slide-4
SLIDE 4

NoSQL BigData MapReduce Graph Document BigTable Shared Nothing Column Oriented CAP Eventual Consistency ACID BASE Mongo Cloudera Hadoop Voldemort Cassandra Dynamo MarkLogic Redis Velocity HBase Hypertable Riak BDB

slide-5
SLIDE 5

Now it’s downright c0nfuZ1nG!

slide-6
SLIDE 6

What Happened?

slide-7
SLIDE 7

we changed scale

slide-8
SLIDE 8

we changed tack

slide-9
SLIDE 9

So where does big data meet big database?

slide-10
SLIDE 10

The world’s largest NoSQL database?

slide-11
SLIDE 11

The Internet

slide-12
SLIDE 12

So how Big is Big?

Sizes in petabytes (the chart is annotated 0.01%):

  • Words: 0.6
  • Web pages: 40
  • Everything: 5000
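
A little arithmetic puts the three sizes (words 0.6 PB, web pages 40 PB, everything 5000 PB) in proportion. The sketch below takes those figures as given and expresses each one as a share of "Everything". The function name is illustrative only. Words come out at about 0.012%, which rounds to the 0.01% annotated on the chart, although the slide does not say explicitly which bar that annotation belongs to.

```python
# Sizes from the slide, in petabytes.
SIZES_PB = {"Words": 0.6, "Web Pages": 40, "Everything": 5000}


def share_of_everything(sizes: dict[str, float]) -> dict[str, float]:
    """Return each source's size as a percentage of the 'Everything' total."""
    total = sizes["Everything"]
    return {name: 100 * size / total for name, size in sizes.items()}


for name, pct in share_of_everything(SIZES_PB).items():
    print(f"{name:>10}: {pct:.3f}%")
```

Even web pages, at 40 PB, come to under 1% of the total on these figures.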

slide-13
SLIDE 13

Many more Big Sources

Mobile, sensors, logs, video, audio, social data, weather

slide-14
SLIDE 14

But it is pretty useful

Marketing, fraud detection, tax evasion, intelligence, advertising, scientific research

slide-15
SLIDE 15

Gartner

80% of business is conducted on unstructured information

slide-16
SLIDE 16

Big Data is now a new class of economic asset*

*World Economic Forum, 2012

slide-17
SLIDE 17

Yet 80% of enterprise databases are < 1TB

slide-18
SLIDE 18

Along came the Big Data Movement

slide-19
SLIDE 19

MapReduce (2004)

  • Large, distributed, ordered map
  • Fault-tolerant file system
  • Petabyte scaling

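
The programming model behind MapReduce fits in a few lines: a map function emits key/value pairs, the framework groups them by key (the shuffle), and a reduce function combines each group. The sketch below is a single-process word count with hypothetical function names. It shows only the data flow. A real framework runs many map and reduce tasks across machines and reads from a distributed, fault-tolerant file system, none of which is modelled here.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical single-process sketch of the MapReduce data flow (no distribution).


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Emit (word, 1) for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(intermediate)
    # Reducing keys in sorted order loosely mirrors the ordered output of MapReduce.
    return dict(reduce_phase(key, grouped[key]) for key in sorted(grouped))


print(word_count(["big data meets big database", "data data everywhere"]))
# {'big': 2, 'data': 3, 'database': 1, 'everywhere': 1, 'meets': 1}
```

Because map and reduce only see their own inputs, each phase can be split across as many machines as the data requires. That independence is what lets the real system scale.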
slide-20
SLIDE 20

Disruptive

  • Simple
  • Pragmatic
  • Solved an insoluble problem
  • Unencumbered by tradition (good & bad)
  • Hacker rather than enterprise culture

slide-21
SLIDE 21

A Different Focus

Traditional

  • Global consistency
  • Schema driven
  • Reliable network
  • Highly structured

The new wave

  • Local consistency
  • Schemaless / Last
  • Unreliable network
  • Semi-structured / Unstructured

slide-22
SLIDE 22

Novel?

Possibly better put as: a timely and elegant combination of existing ideas, brought together to solve a previously unsolved problem.

slide-23
SLIDE 23

Backlash (2009)

  • Not novel (dates back to the ’80s)
  • Physical level, not the logical level (messy?)
  • Incompatible with tooling
  • Lack of (referential) integrity & ACID
  • MR is brute force, ignoring indexing and skew
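
The "brute force, ignoring indexing" criticism comes down to how many records a lookup has to touch. The sketch below compares a full scan with a binary search over a sorted key index. The sorted Python list stands in for a real index structure such as a B-tree, and the data and function names are hypothetical. On 100,000 records the scan examines every record to find the last key, while the binary search needs at most 17 probes.

```python
from typing import Optional


def full_scan(records: list[tuple[int, str]], key: int) -> tuple[Optional[str], int]:
    """Brute force: examine records one at a time. Returns (value, records_examined)."""
    examined = 0
    for record_key, value in records:
        examined += 1
        if record_key == key:
            return value, examined
    return None, examined


def index_lookup(sorted_keys: list[int], values: list[str], key: int) -> tuple[Optional[str], int]:
    """Binary search over a sorted key index. Returns (value, probes)."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(sorted_keys) and sorted_keys[lo] == key:
        return values[lo], probes
    return None, probes


# Hypothetical data: records already in key order, so they double as the index.
records = [(k, f"row-{k}") for k in range(100_000)]
keys = [k for k, _ in records]
values = [v for _, v in records]

print(full_scan(records, 99_999))          # ('row-99999', 100000)
print(index_lookup(keys, values, 99_999))  # finds 'row-99999' in at most 17 probes
```

For queries that read most of the data anyway, the gap shrinks. This is one reason the debate depends heavily on workload.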

slide-24
SLIDE 24

All points are reasonable

slide-25
SLIDE 25

And they proved it too!

“A Comparison of Approaches to Large-Scale Data Analysis” – SIGMOD 2009

  • Vertica vs. DBMS-X vs. Hadoop
  • Vertica up to 7x faster than Hadoop over benchmarks
  • Databases faster than Hadoop

slide-26
SLIDE 26

But possibly missed the point?

slide-27
SLIDE 27

Was MapReduce ever supposed to be a data warehousing tool?

slide-28
SLIDE 28

If you need more, layer it on top

For example, Tenzing & Megastore @ Google

slide-29
SLIDE 29

So MapReduce represents a bottom-up approach to accessing very large data sets that is unencumbered by the past.

slide-30
SLIDE 30

…and the Database Field knew it had Problems

slide-31
SLIDE 31

We Lose: Joe Hellerstein (Berkeley) 2001

“Databases are commoditised and cornered to slow-moving, evolving, structure intensive, applications that require schema evolution.” … “The internet companies are lost and we will remain in the doldrums of the enterprise space.” … “As databases are black boxes which require a lot of coaxing to get maximum performance”

slide-32
SLIDE 32

Yet they do some very cool stuff

Statistically based optimisers, Compression, indexing structures, distributed optimisers, their own declarative language

slide-33
SLIDE 33
slide-34
SLIDE 34

They are an Awesome Tool

slide-35
SLIDE 35

They Don’t talk our Language

slide-36
SLIDE 36

They Default to Constraint

slide-37
SLIDE 37

So NoSurprise with NoSQL then

  • Simpler contract
  • Shared nothing
  • No joins / ACID
  • No impedance mismatch
  • No slow schema evolution
  • Simple code paths
  • Just works

slide-38
SLIDE 38

The NoSQL Approach

Simple, flexible storage over a diverse range of data structures that will scale almost indefinitely.

slide-39
SLIDE 39

Different Flavours

slide-40
SLIDE 40

Two Ways In: Key Based Access

Client

slide-41
SLIDE 41

Two Ways In: Broadcast to Every Node

Client
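
The difference between the two ways in is easiest to see on a hash-partitioned store. A lookup by key can be routed to the one node that owns that key. A query on any other attribute has no such routing information, so it must be broadcast to every node and the results merged. The sketch below is a hypothetical in-memory model: `ShardedStore` and its methods are illustrative names, each "node" is a plain dict, and simple modulo hashing is assumed where many real stores use consistent hashing.

```python
import zlib
from typing import Any, Callable, Optional


class ShardedStore:
    """Hypothetical store that hash-partitions records across in-memory 'nodes'."""

    def __init__(self, node_count: int) -> None:
        # Each node is a plain dict here; in a real deployment it is a separate machine.
        self.nodes: list[dict[str, dict[str, Any]]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> int:
        # Modulo hashing is an assumption; many stores use consistent hashing instead.
        # zlib.crc32 keeps placement stable across runs, unlike the built-in str hash().
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def put(self, key: str, record: dict[str, Any]) -> None:
        self.nodes[self._node_for(key)][key] = record

    def get(self, key: str) -> tuple[Optional[dict[str, Any]], int]:
        """Key-based access: go straight to the owning node. Returns (record, nodes_touched)."""
        return self.nodes[self._node_for(key)].get(key), 1

    def find(self, predicate: Callable[[dict[str, Any]], bool]) -> tuple[list[dict[str, Any]], int]:
        """Non-key access: broadcast to every node, then merge. Returns (matches, nodes_touched)."""
        matches = [rec for node in self.nodes for rec in node.values() if predicate(rec)]
        return matches, len(self.nodes)


store = ShardedStore(node_count=4)
store.put("user:1", {"name": "alice", "city": "London"})
store.put("user:2", {"name": "bob", "city": "Berkeley"})
store.put("user:3", {"name": "carol", "city": "London"})

print(store.get("user:2"))                          # one node touched
print(store.find(lambda r: r["city"] == "London"))  # all four nodes touched
```

Key lookups cost the same however many nodes are added, while broadcast queries touch more nodes as the cluster grows. This is why key design matters so much in these stores.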

slide-42
SLIDE 42

So…

A simple bottom up approach to data storage that scales almost indefinitely.

  • No relations
  • No joins
  • No SQL
  • No Transactions
  • No sluggish schema evolution

slide-43
SLIDE 43

The Relational Database

slide-44
SLIDE 44

The ‘Relational Camp’ had been busy too

Realisation that the traditional architecture was insufficient for various modern workloads

slide-45
SLIDE 45

“The End of an Architectural Era” paper (2007)

“Because RDBMSs can be beaten by more than an order of magnitude on the standard OLTP benchmark, then there is no market where they are competitive. As such, they should be considered as legacy technology more than a quarter of a century in age, for which a complete redesign and re-architecting is the appropriate next step.” – Michael Stonebraker

slide-46
SLIDE 46

No Longer a One-Size-Fits-All

slide-47
SLIDE 47

Architecting for Different Non-Functionals

  • In-memory
  • Shared nothing / shared disk
  • Fast network / SSD
  • Column orientation

slide-48
SLIDE 48

In-Memory

slide-49
SLIDE 49

Distributed In-Memory

slide-50
SLIDE 50

Shared Disk Architecture

  • All machines see all data
  • Cache sits above the whole dataset
  • A single node can handle any query

slide-51
SLIDE 51

Shared Nothing Architecture

  • Autonomy over a shard
  • Divide and conquer
  • Cache over just the shard
  • Non-key queries hit every node
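
In a shared-nothing design, each node has autonomy over its shard, so an aggregate is computed by divide and conquer. Every node produces a partial result over just its own shard, and a coordinator merges those partials. The sketch below computes a global average this way. The shard contents and function names are hypothetical, and a thread pool stands in for separate machines. An average has to be merged as (sum, count) pairs, because averaging the per-shard averages gives the wrong answer when shards differ in size.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each list stands in for the rows one node owns.
SHARDS = [
    [120.0, 80.0, 100.0],
    [40.0, 60.0],
    [300.0],
]


def partial_aggregate(shard: list[float]) -> tuple[float, int]:
    """Runs 'on' one node: sees only its own shard and returns (sum, count)."""
    return sum(shard), len(shard)


def global_average(shards: list[list[float]]) -> float:
    # Scatter: every node aggregates its own shard in parallel.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    # Gather: the coordinator merges the partial (sum, count) pairs.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


print(global_average(SHARDS))
```

Here the merged result is 700 / 6 ≈ 116.67. Averaging the three per-shard averages (100, 50 and 300) would give 150 instead.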

slide-52
SLIDE 52

Vendors polarise over this issue

Shared Nothing

  • Teradata (Aster Data)
  • Netezza (IBM)
  • ParAccel
  • Vertica
  • Greenplum

Shared Everything

  • Oracle RAC/Exadata
  • IBM pureScale
  • Sybase IQ
  • Microsoft SQL Server

(there is some blurring)

slide-53
SLIDE 53

Column Oriented Storage

  • Columns laid out contiguously
  • 2-10x compression typical
  • Indexing becomes less important
  • Pinpoint I/O slow (tuple construction)
  • Bulk read/write faster
  • Compression >> row-based alternatives
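
The sketch below shows why laying columns out contiguously helps compression and why it makes pinpoint access slower. The hypothetical table is pivoted into one list per column. Run-length encoding then collapses repeated neighbouring values, and it is only one of several compression schemes columnar engines use. Reading a single row afterwards means visiting every column to rebuild the tuple. The table contents and function names are illustrative.

```python
from itertools import groupby
from typing import Any

# Hypothetical table: (order_id, country, status)
COLUMN_NAMES = ("order_id", "country", "status")
ROWS = [
    (1, "UK", "shipped"),
    (2, "UK", "shipped"),
    (3, "UK", "pending"),
    (4, "US", "shipped"),
    (5, "US", "shipped"),
    (6, "US", "shipped"),
]


def to_columns(rows: list[tuple]) -> dict[str, list[Any]]:
    """Column orientation: store each column's values contiguously."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMN_NAMES)}


def run_length_encode(values: list[Any]) -> list[tuple[Any, int]]:
    """Collapse runs of repeated adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def reconstruct_row(columns: dict[str, list[Any]], index: int) -> tuple:
    """Pinpoint access: one row needs a read from every column (tuple construction)."""
    return tuple(columns[name][index] for name in COLUMN_NAMES)


columns = to_columns(ROWS)
print(run_length_encode(columns["country"]))  # [('UK', 3), ('US', 3)]
print(run_length_encode(columns["status"]))   # [('shipped', 2), ('pending', 1), ('shipped', 3)]
print(reconstruct_row(columns, 2))            # (3, 'UK', 'pending')
```

In a row layout the country values are interleaved with the other fields, so runs like these never form next to each other on disk. A bulk operation such as summing one column, by contrast, reads only that column's list.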

slide-54
SLIDE 54

Solid State Drives

[Chart: SSD drive vs. HDD seek time, on an axis from 1µs to 1ms]

  • Traditional databases are designed for sequential access over magnetic drives, not random access over SSD.
  • Weakens the columnar/row argument

slide-55
SLIDE 55

Faster Networking

[Chart: Gigabit Ethernet, 10 Gigabit Ethernet, RDMA, RAM, SSD drive and HDD seek time, on a time axis from 1ns to 1ms]

slide-56
SLIDE 56

The best technologies of the moment are leveraging many of these factors

slide-57
SLIDE 57

There is a new and impressive breed

  • Products < 5 years old
  • Shared nothing with SSDs over shards
  • Large address spaces (256GB+)
  • No indexes (column oriented)
  • No referential integrity
  • Surprisingly quick for big queries when compared with incumbent technologies.

slide-58
SLIDE 58

TPC-H Benchmarks

Several new contenders with good scores:

  • Exasol
  • ParAccel
  • Vectorwise

slide-59
SLIDE 59

TPC-H Benchmarks

  • Exasol has benchmarks from 100GB to 10TB
  • Up to 20x faster than nearest rivals

(But take benchmarks with a pinch of salt)

slide-60
SLIDE 60

Relational Approach

Solid data from every angle, bounded in terms of scale, but with a boundary that is rapidly expanding.

slide-61
SLIDE 61

Comparisons

slide-62
SLIDE 62

At the extreme MapReduce has it

[Chart: data scale in TB, on an axis of 1, 10, 100, 1000 and 10,000]

slide-63
SLIDE 63

But there is massive overlap

[Chart: data scale in TB, on an axis of 1, 10, 100, 1000 and 10,000]

slide-64
SLIDE 64

It’s not just data volume/velocity

slide-65
SLIDE 65

The Dimensions of Data

  • Volume (pure physical size)
  • Velocity (rate of change)
  • Variety (number of different types of data, formats and sources)
  • Static & dynamic complexity

slide-66
SLIDE 66

Consider the characteristics of data to be integrated, and how that equates to cost

slide-67
SLIDE 67

Ability to model data is much more of a gating factor than raw size, particularly when considering new forms of data

Dave Campbell (Microsoft – VLDB Keynote)

slide-68
SLIDE 68

It becomes about your data and what you want to do with it

  • Do you need to do more than just SQL to process your data?
  • Does your data change rapidly?
  • Are you OK with some degree of eventual consistency?
  • Do isolation and consistency matter?
  • Do you need to answer questions absolutely or within a tolerance?
  • Do you want to keep your data in its natural form?
  • Do you prefer to work bottom up or top down?
  • How risk averse are you?
  • Are you willing to pay big vendor prices?

slide-69
SLIDE 69

Composite Offerings

  • Hadoop has Pig & HBase
  • Mongo offers a query language, atomicity & MR
  • Oracle have a Big Data Appliance with Cloudera
  • IBM have a MapReduce offering
  • Sybase (now part of SAP) provides MR natively
  • EMC acquired Greenplum, which has MR support

slide-70
SLIDE 70

Complementary Solutions

slide-71
SLIDE 71

Relational world has focused on keeping data consistent and well structured so it can be sliced and diced at will

slide-72
SLIDE 72

Big data technologies focus on executing code next to data, where that data is held in a more natural form.

slide-73
SLIDE 73

So

  • NoSQL has disrupted the database market, questioning the need for constraint and highlighting the power of simple solutions.
  • DB startups are providing some surprisingly fast solutions that drop some traditional database tenets and cleverly leverage new hardware advances.
  • Your problem (and budget) is likely a better guide than the size of the data.
  • The market is converging on both sides towards a middle ground and integrated suites of complementary tools.

slide-74
SLIDE 74

The right tool for the job

“Attempting to force one technology or tool to satisfy a particular need for which another tool is more effective and efficient is like attempting to drive a screw into a wall with a hammer when a screwdriver is at hand: the screw may eventually enter the wall but at what cost?” E.F. Codd, 1993

slide-75
SLIDE 75

Thanks

http://www.benstopford.com