Time Series Databases and Streaming Algorithms



slide-1
SLIDE 1

Time Series Databases and Streaming algorithms

slide-2
SLIDE 2

Introduction and motivation for Time Series

slide-3
SLIDE 3

Financial

slide-4
SLIDE 4

Internet of things

slide-5
SLIDE 5

Domotics

slide-6
SLIDE 6

Predictive Maintenance

slide-7
SLIDE 7

Environmental tracking

slide-8
SLIDE 8

A time series is a sequence of data points, typically consisting of successive measurements made over a time interval.

slide-9
SLIDE 9

Why Time Series Databases?

  • High Volume of Data
  • Large Quantities of Immutable Data
  • Is Primarily Sorted Temporally
  • Needs to Be Rolled Up To Gain Majority of Insights
  • Needs to Be Normalized Across Multiple Time Zones

https://blog.tempoiq.com/blog/2013/01/25/characteristics-of-a-time-series-dataset-time-series-database-overview-part-2

slide-10
SLIDE 10

Problems using Relational DBs

1. It’s Difficult to Change the Sample Rate
2. It’s Difficult To Use SQL Queries For Analysis
3. Time Zones Add Extra Complexity To Your Data Analysis

https://blog.tempoiq.com/blog/2013/04/22/optimizing-relational-databases-for-time-series-data-time-series-database-overview-part-3
slide-11
SLIDE 11

Advantages of NoSQL

1. Greater simplicity in the DB engine
2. Ability to handle semi-structured and denormalized data
3. Potentially much higher scalability

slide-12
SLIDE 12

Disadvantages of NoSQL

1. Higher complexity in the application
2. Loss of abstraction provided by the query optimizer

slide-13
SLIDE 13

Basic Operations on Time Series Data

slide-14
SLIDE 14

What do we need to do with TS

  • Acquire
    – Measurement, transmission, reception
  • Store
  • Retrieve
  • Analyze and visualize

slide-15
SLIDE 15

Rescaling

  • Transform the range of variation to a given scale
  • Useful for algorithms sensitive to the magnitude of the signal
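
One common way to rescale a series is min-max scaling into a target interval. The sketch below is only an illustration of that idea; the function name `rescale` and its parameters are assumptions made for this example, not something the slides prescribe.

```python
def rescale(values, new_min=0.0, new_max=1.0):
    """Min-max rescale a sequence of numbers into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no range; map every point to the lower bound.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


readings = [10.0, 15.0, 20.0, 30.0]
scaled = rescale(readings)
print(scaled)
```

Other schemes (for example standardising to zero mean and unit variance) fit the same slot; min-max is used here only because it is the shortest to show.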

slide-16
SLIDE 16

Resampling

  • Differences in sampling resolution
  • Bring both series to the same sample frequency
  • Requires a function for collapsing points together
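
A minimal sketch of downsampling, assuming points are `(timestamp, value)` pairs with integer timestamps in seconds. The name `resample` and the `collapse` parameter are hypothetical; `collapse` plays the role of the function that merges the points falling in one bucket.

```python
from collections import defaultdict
from statistics import mean


def resample(points, bucket_seconds, collapse=mean):
    """Group (timestamp, value) pairs into fixed-width buckets and collapse each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        # Align every timestamp to the start of its bucket.
        bucket_start = ts - (ts % bucket_seconds)
        buckets[bucket_start].append(value)
    return [(start, collapse(vals)) for start, vals in sorted(buckets.items())]


per_second = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0), (4, 9.0)]
per_two_seconds = resample(per_second, 2)
print(per_two_seconds)
```

Passing `max`, `min` or `sum` as `collapse` changes how the points in a bucket are merged without touching the bucketing logic.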

slide-17
SLIDE 17

Shifting

  • Align series we know are misaligned
  • Bad reference time, drifting clock, ...

slide-18
SLIDE 18

Slicing

  • Retrieve a time series based on a given time range
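
When timestamps are kept sorted, slicing by time range reduces to two binary searches. This is a small in-memory sketch under that assumption; `slice_series` is a hypothetical helper name.

```python
from bisect import bisect_left, bisect_right


def slice_series(timestamps, values, start, end):
    """Return the points whose timestamp lies in [start, end]; timestamps must be sorted."""
    lo = bisect_left(timestamps, start)   # first index with timestamp >= start
    hi = bisect_right(timestamps, end)    # first index with timestamp > end
    return list(zip(timestamps[lo:hi], values[lo:hi]))


ts = [10, 20, 30, 40, 50]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
window = slice_series(ts, vals, 20, 40)
print(window)
```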

slide-19
SLIDE 19

Dynamic Time Warping

  • Used for measuring similarity between series that vary in time or speed
  • Dynamic time warping is a sequence alignment technique used in speech recognition
  • It is an algorithm that has O(n²) complexity
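
The quadratic cost comes from filling a table with one cell per pair of points. The sketch below is the textbook dynamic-programming formulation with an absolute-difference cost; the function name and the choice of cost are assumptions for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences, using |x - y| as cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([1, 2, 3], [2, 3, 4])
print(same_shape, shifted)
```

The first pair differs only in speed (one point is repeated), so the warping absorbs the difference; the second pair differs in level, which warping cannot remove.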

slide-20
SLIDE 20

Subsequence Matching

  • A sequence query is matched against a longer TS
  • Also related to Chunking, where we look for repeating patterns

slide-21
SLIDE 21

Statistical measures

  • Mean
  • Median
  • Standard Deviation
  • Variance
  • Quantiles

slide-22
SLIDE 22

Statistical fitting

  • Interpolation
  • Linear models
  • Nonlinear models

slide-23
SLIDE 23

Data Storage for Time Series Data

slide-24
SLIDE 24

Log Files

  • Simplest solution
  • Right solution when low number of time series or data fits in memory

1950  1  0.92000E+00
1950  2  0.40000E+00
1950  3 -0.36000E+00
1950  4  0.73000E+00
1950  5 -0.59000E+00
1950  6 -0.60000E-01
1950  7 -0.12600E+01
1950  8 -0.50000E-01
1950  9  0.25000E+00
1950 10  0.85000E+00
1950 11 -0.12600E+01
1950 12 -0.10200E+01
1951  1  0.80000E-01
1951  2  0.70000E+00
1951  3 -0.10200E+01
1951  4 -0.22000E+00
1951  5 -0.59000E+00
1951  6 -0.16400E+01
1951  7  0.13700E+01
1951  8 -0.22000E+00
1951  9 -0.13600E+01
1951 10  0.18700E+01

slide-25
SLIDE 25

Advanced Log Files

  • Same concept about storing TS in files
  • Use a smart binary encoding format
  • Allows less processing (no parsing)
  • Stores data more efficiently for scan reads
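
As a rough illustration of what a fixed-width binary record buys, the sketch below packs each point as a 64-bit integer timestamp plus a 64-bit float. The record layout is an assumption chosen for this example; it is not the layout of Thrift, Avro, Parquet or any other real format.

```python
import struct

# Assumed layout: little-endian int64 timestamp followed by a float64 value.
RECORD = struct.Struct("<qd")


def encode(points):
    """Serialise (timestamp, value) pairs into one contiguous bytes object."""
    return b"".join(RECORD.pack(ts, value) for ts, value in points)


def decode(blob):
    """Read the records back; no text parsing, just fixed-size unpacking."""
    return [RECORD.unpack_from(blob, offset)
            for offset in range(0, len(blob), RECORD.size)]


points = [(1434055562, 0.64), (1434055563, 0.65)]
blob = encode(points)
print(len(blob), decode(blob))
```

Because every record has the same size, a reader can seek directly to the n-th point or scan sequentially without tokenising text.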

slide-26
SLIDE 26

Advanced Log Files

  • Lots of binary formats lately
    – Thrift
    – Avro
    – Parquet

We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem.

slide-27
SLIDE 27

Relational Databases

  • Tried and tested technology, validated in a multitude of scenarios
  • Allows indexing out of the box
  • Allows data replication and sharding (to some extent)

slide-28
SLIDE 28

Relational Databases

  • Use the Star Schema
  • The fact table contains the measurements
  • The dimension tables contain info about the series
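
A minimal sketch of such a star schema using SQLite from the Python standard library. The table and column names (`series`, `measurement`, `series_id`, `ts`, `value`) are hypothetical and chosen only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE series (            -- dimension table: one row per series
    series_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    host      TEXT NOT NULL
);
CREATE TABLE measurement (       -- fact table: one row per measurement
    series_id INTEGER NOT NULL REFERENCES series(series_id),
    ts        INTEGER NOT NULL,
    value     REAL NOT NULL,
    PRIMARY KEY (series_id, ts)
);
""")
conn.execute("INSERT INTO series VALUES (1, 'cpu_load', 'server01')")
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?)",
    [(1, 100, 0.45), (1, 160, 0.50), (1, 220, 1.56)],
)

# Retrieve one series over a time range by joining fact and dimension tables.
rows = conn.execute(
    "SELECT m.ts, m.value FROM measurement m JOIN series s USING (series_id) "
    "WHERE s.name = ? AND m.ts BETWEEN ? AND ? ORDER BY m.ts",
    ("cpu_load", 100, 200),
).fetchall()
print(rows)
```

The composite primary key on `(series_id, ts)` is one reasonable indexing choice for range queries on a single series.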

slide-29
SLIDE 29

Relational Databases

  • The Star Schema can work reasonably well up to the hundreds of millions
  • We can even implement the Star Schema in a NoSQL database
  • When data grows to this size, several problems arise, mostly related to the Star Schema itself.

slide-30
SLIDE 30

Limitations of the Star Schema

  • It uses one row per measurement
  • Limiting factors for retrieval speed:
    – number of rows scanned
    – total number of values retrieved
    – total volume of data retrieved

slide-31
SLIDE 31

NoSQL databases

  • Most TSDBs use a NoSQL engine
    – OpenTSDB → HBase
    – InfluxDB → BoltDB
    – Prometheus → LevelDB
    – Newts → Cassandra

slide-32
SLIDE 32

NoSQL databases

  • Tall and narrow vs Short and wide table designs
  • Short and wide denormalizes data
  • Short and wide provides several advantages over the columnar data model

slide-33
SLIDE 33

NoSQL databases

  • Indexing by TS and timestamp serves the most common access pattern
  • Retrieving data is an almost sequential read from disk

slide-34
SLIDE 34

Improvements over the Wide Table Design

  • Collapse all the data into a blob
  • Compress the blob so less data has to be read
  • Allow coexistence of wide table columns and the blob
slide-35
SLIDE 35

Improvements over the Wide Table Design

  • Avoid the reads in order to overcome insert bottlenecks
  • Create a fallback system in order to prevent failures
  • Allow access to the in-memory data

slide-36
SLIDE 36

Why not with RDBMS?

  • Why use an RDBMS when you're not using any of its strong points?
  • Also, some features (e.g. transactions) get in your way when scaling

slide-37
SLIDE 37

Time Series Databases

slide-38
SLIDE 38

InfluxDB

slide-39
SLIDE 39
Features

  • Written in Go
  • Uses BoltDB as its internal storage engine
  • SQL-like language
  • HTTP(S) API for querying data
  • Stores metrics and event data
  • Horizontally scalable

slide-40
SLIDE 40

Key concepts

A series is a collection of data points along a timeline that share a common key, expressed as a measurement and tag set pairing, grouped under a retention policy

https://influxdb.com/docs/v0.9/concepts/key_concepts.html

slide-41
SLIDE 41
Key concepts

  • Measurement
    – It is the value being recorded
    – Can be shared amongst many series
    – All series under a given measurement have the same field keys and differ only in their tag set

https://influxdb.com/docs/v0.9/concepts/key_concepts.html

slide-42
SLIDE 42
Key concepts

  • Tag
    – It is a key-value pair
    – A measurement could have several tags
    – Tags are indexed
    – Both the key and value are strings

https://influxdb.com/docs/v0.9/concepts/key_concepts.html

slide-43
SLIDE 43
Key concepts

  • Point
    – A point is a single collection of fields in a series
    – It is uniquely identified by its series and timestamp

https://influxdb.com/docs/v0.9/concepts/key_concepts.html

slide-44
SLIDE 44
Key concepts

  • Field
    – A field is a key-value pair
    – It records an actual metric for a given point
    – Fields are not indexed
    – At least one field is required on each point

https://influxdb.com/docs/v0.9/concepts/key_concepts.html

slide-45
SLIDE 45
Key concepts

  • Database
    – Similar in concept to an RDBMS database; it groups series
  • Retention policy
    – Defines what to do with data that is older than the prescribed retention policy

https://influxdb.com/docs/v0.9/concepts/key_concepts.html

slide-46
SLIDE 46

Logging points into InfluxDB

{
  "database": "mydb",
  "points": [
    {
      "measurement": "cpu_load",
      "tags": { "host": "server01", "core": "0" },
      "time": "2009-11-10T23:00:00Z",
      "fields": { "value": 0.45 }
    },
    {
      "measurement": "cpu_load",
      "tags": { "host": "server01", "core": "1" },
      "time": "2009-11-10T23:00:00Z",
      "fields": { "value": 1.56 }
    }
  ]
}

slide-47
SLIDE 47

HTTP endpoint

/query                              GET
/write                              OPTIONS
/write                              POST
/ping                               GET
/ping                               HEAD
/data/process_continuous_queries    POST

slide-48
SLIDE 48

Query exploration

Queries like in RDBMs
Querying by time

slide-49
SLIDE 49

Dealing with Time

  • Querying using time strings
  • Relative time
  • Absolute time

slide-50
SLIDE 50

Dealing with missing values

  • Use null, previous, none for missing values
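
The three options can be read as three policies for empty time buckets: report the gap as null, carry the previous value forward, or drop the bucket. The sketch below shows that reading in plain Python; it is an illustration of the semantics, not InfluxDB's implementation, and `fill_missing` is a hypothetical helper.

```python
def fill_missing(buckets, mode="null"):
    """buckets is a list of (timestamp, value or None); mode is 'null', 'previous' or 'none'."""
    if mode == "null":
        # Keep the empty buckets and report them as None.
        return list(buckets)
    if mode == "none":
        # Drop the empty buckets entirely.
        return [(ts, v) for ts, v in buckets if v is not None]
    if mode == "previous":
        # Carry the last seen value forward into empty buckets.
        filled, last = [], None
        for ts, v in buckets:
            if v is None:
                v = last
            filled.append((ts, v))
            last = v
        return filled
    raise ValueError(f"unknown fill mode: {mode}")


grouped = [(0, 1.0), (10, None), (20, 3.0)]
print(fill_missing(grouped, "previous"))
print(fill_missing(grouped, "none"))
```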

slide-51
SLIDE 51

Write data

  • Ingest data into InfluxDB using the HTTP API
  • Create the Database

curl -G http://localhost:8086/query --data-urlencode "q=CREATE DATABASE mydb"

  • Write data into the database

curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000'
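
The body of the write request is a single text line: measurement, comma-separated tags, a space, the fields, a space, and a nanosecond timestamp. The helper below rebuilds that line from its parts. `to_line` is a hypothetical name, and the sketch deliberately skips escaping and string-field quoting, so it only covers simple numeric cases like the one above.

```python
def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point in the text layout used by the write example (no escaping)."""
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={value}" for key, value in fields.items())
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {timestamp_ns}"


line = to_line(
    "cpu_load_short",
    {"host": "server01", "region": "us-west"},
    {"value": 0.64},
    1434055562000000000,
)
print(line)
```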

slide-52
SLIDE 52

Hands on

  • Import data from Standard & Poor
  • Explore the performance of different encodings:
    – Several fields for a single point
    – Each column as a separate TS
  • Create the following queries:
    – Select maximum opening price on a given period for each quote
    – Select the monthly average

slide-53
SLIDE 53

Hands on (Advanced)

  • Import extra dataset
  • Compare loading and querying data between MySQL and InfluxDB

slide-54
SLIDE 54

Streaming data

slide-55
SLIDE 55

Algorithms for processing data streams in which the input is presented as a sequence of items and can be examined in only a few passes

slide-56
SLIDE 56

Examples

slide-57
SLIDE 57

Examples: Anomaly Detection
slide-58
SLIDE 58

Real Time Telemetry

slide-59
SLIDE 59

Trends in Social Networks

slide-60
SLIDE 60

Streaming algorithms

slide-61
SLIDE 61

Characteristics of streaming algorithms

  • Operates on a continuous stream of data
  • Unknown or infinite size
  • Only one pass, which allows the following options:
    – Store it
    – Lose it
    – Store an approximation of it
  • Limited processing time per item
  • Limited total memory

slide-62
SLIDE 62

These algorithms produce an approximate answer based on a summary or "sketch" of the data stream in memory

slide-63
SLIDE 63

They have limited memory available to them (much less than the input size) and also limited processing time per item.

slide-64
SLIDE 64

Questions to answer

  • Frequency moments
  • Counting distinct elements
  • Heavy Hitters
  • Anomaly detection / Membership query
  • Online learning

slide-65
SLIDE 65

https://highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/

slide-66
SLIDE 66

Cardinality estimation: Linear Counting

Load Factor is the ratio of distinct elements over the size m
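
A minimal sketch of linear counting, assuming a bitmap of m bits and SHA-1 as a stand-in hash (both choices are assumptions for this example). Each item sets one bit; the number of distinct items is then estimated as -m · ln(V), where V is the fraction of bits still empty.

```python
import hashlib
import math


def linear_count(items, m=1024):
    """Estimate the number of distinct items in one pass using an m-bit bitmap."""
    bitmap = [False] * m
    for item in items:
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        bitmap[int.from_bytes(digest[:8], "big") % m] = True
    empty = bitmap.count(False)
    if empty == 0:
        # The estimator is undefined once every bit is set.
        raise ValueError("bitmap is full; increase m")
    return -m * math.log(empty / m)


# 300 events from 100 distinct users.
stream = [f"user{i % 100}" for i in range(300)]
estimate = linear_count(stream)
print(round(estimate))
```

The estimate degrades as the load factor grows, so m has to be sized with the expected number of distinct elements in mind.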

slide-67
SLIDE 67

Cardinality estimation: Linear Counting

slide-68
SLIDE 68

Cardinality estimation: LogLog Counting

slide-69
SLIDE 69

Cardinality estimation: LogLog Counting

slide-70
SLIDE 70

Frequency Estimation: Count-Min Sketch
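
A Count-Min Sketch keeps a small grid of counters; each item increments one counter per row, and its frequency is estimated as the minimum over those counters. The sketch below is a minimal version with assumed defaults (`width=256`, `depth=4`) and SHA-1 as a stand-in for a family of hash functions.

```python
import hashlib


class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived by salting the hash with the row index.
        for row in range(self.depth):
            digest = hashlib.sha1(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.rows[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate counters, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in self._columns(item))


sketch = CountMinSketch()
for word in ["a", "a", "a", "a", "a", "b", "b"]:
    sketch.add(word)
print(sketch.estimate("a"), sketch.estimate("b"))
```

The estimate never undercounts; it can overcount when other items collide with the queried one in every row.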

slide-71
SLIDE 71

Frequency Estimation: Count-Min Sketch

slide-72
SLIDE 72

Frequency Estimation: Count-Mean-Min Sketch

slide-73
SLIDE 73

Heavy Hitters: Count-Min Sketch

slide-74
SLIDE 74

Heavy Hitters: Stream-Summary

slide-75
SLIDE 75

Membership Query: Bloom Filter
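
A Bloom filter answers "have I seen this item?" with a bit array and a few hash functions: added items are always reported as present, while unseen items are occasionally reported as present too. The sizes and the SHA-1 based hashing below are assumptions for this sketch.

```python
import hashlib


class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, item):
        # Derive several bit positions by salting the hash with a seed.
        for seed in range(self.hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for user in ["john", "mary"]:
    seen.add(user)
print("john" in seen)   # True: added items are always reported as present
print("alice" in seen)  # usually False, but a false positive is possible
```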

slide-76
SLIDE 76

Online Learning

Data Prediction Parameters

slide-77
SLIDE 77

Feature Hashing

  • John likes to watch movies.
  • Mary likes movies too.
  • John also likes football.
slide-78
SLIDE 78

Feature Hashing

Pros:

  • Extremely fast
  • No memory footprint

Cons:

  • There is no way to reverse features

Can be extended to use signed hashing functions
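
A minimal sketch of the hashing trick applied to one of the example sentences: each token is hashed straight to a vector index, so no vocabulary has to be stored. The bucket count, the MD5-based hash and the way the sign is derived are all assumptions for this illustration.

```python
import hashlib


def hash_features(text, n_buckets=8, signed=False):
    """Map the tokens of a sentence into a fixed-size count vector."""
    vector = [0] * n_buckets
    for token in text.lower().replace(".", "").split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % n_buckets
        # With signed hashing, another byte of the digest picks +1 or -1,
        # so colliding tokens tend to cancel instead of piling up.
        sign = -1 if signed and digest[4] % 2 else 1
        vector[index] += sign
    return vector


features = hash_features("John likes to watch movies.")
print(features)
```

The vector cannot be mapped back to tokens, which is the "no way to reverse features" drawback listed above.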

slide-79
SLIDE 79

Stochastic Gradient Descent

slide-80
SLIDE 80

Apache Spark

slide-81
SLIDE 81

[Diagram labels: Storage System, Program Model (each shown twice)]

slide-82
SLIDE 82

Word Count

Stages: Raw, Map, Reduce, Result
Raw input: Hello cruel world Say hello! Hello!
Intermediate counts: hello 1, cruel 1, world 1, say 1, hello 2
Result: hello 3, cruel 1, world 1, say 1
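
The word-count example can be mimicked in plain Python to show the two phases. This is a single-process sketch of the idea, not Hadoop or Spark code; `map_line` and `reduce_pairs` are hypothetical names, and splitting the raw text into three lines is an assumption.

```python
import re
from collections import Counter

lines = ["Hello cruel world", "Say hello!", "Hello!"]


def map_line(line):
    """Map phase: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in re.findall(r"[a-z]+", line.lower())]


def reduce_pairs(pairs):
    """Reduce phase: sum the counts of identical words."""
    counts = Counter()
    for word, count in pairs:
        counts[word] += count
    return dict(counts)


mapped = [pair for line in lines for pair in map_line(line)]
result = reduce_pairs(mapped)
print(result)
```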

slide-83
SLIDE 83

Problem with Iterative Algos

Disk I/O is very expensive

slide-84
SLIDE 84

Opportunity for a new approach

  • Keep data in memory
  • Use a new distribution model

slide-85
SLIDE 85

Spark Streaming

slide-86
SLIDE 86

Resilient Distributed Dataset (RDDs)

  • A distributed and immutable collection of objects
  • Each RDD can be split into multiple partitions
  • RDDs allow two types of operations:
    – Transformations (lazy)
    – Actions (non-lazy)

slide-87
SLIDE 87

DStream

A sequence of RDDs representing a stream of data

slide-88
SLIDE 88

http://www.slideshare.net/spark-project/deep-divewithsparkstreaming-tathagatadassparkmeetup20130617

DStreams

slide-89
SLIDE 89

Windows

slide-90
SLIDE 90

Windowing computations

slide-91
SLIDE 91

Stateful computations

slide-92
SLIDE 92

DStream API

http://spark.apache.org/docs/1.3.1/api/scala/index.html#org.apache.spark.streaming.dstream.DStream

slide-93
SLIDE 93

Hands on Streaming

  • Start the Spark server
  • Create a Job
  • Run netcat
  • Send

slide-94
SLIDE 94

Hands on Streaming (Advanced)

  • Implement Count-Log on basic Spark

slide-95
SLIDE 95

Setup environment

  • Prerequisite:
    – Install the latest version of Vagrant https://www.vagrantup.com/
    – Install the latest version of VirtualBox https://www.virtualbox.org/
  • Create the Virtual Machine:

vagrant init codezomb/trusty64-docker
vagrant up

http://blog.scottlowe.org/2015/02/10/using-docker-with-vagrant/

slide-96
SLIDE 96

Setup environment

  • Log in to the VM

vagrant ssh

  • Install some Ubuntu packages

sudo apt-get update
sudo apt-get -y install docker openjdk-7-jdk

  • Pull docker images

docker pull tutum/influxdb
docker pull sequenceiq/spark:1.3.0

http://old.blog.phusion.nl/2013/11/08/docker-friendly-vagrant-boxes/

slide-97
SLIDE 97

@tonicebrian, @Enerbyte
toni.cebrian@gmail.com
https://es.linkedin.com/in/tonicebrian
http://www.tonicebrian.com