Practical Linked Data Access via SPARQL: The Case of Wikidata


slide-1
SLIDE 1

Practical Linked Data Access via SPARQL: The Case of Wikidata

Adrian Bielefeldt, Julius Gonsior, Markus Krötzsch

Knowledge-Based Systems, TU Dresden

Also reporting on joint work with Stas Malyshev (Wikimedia) and Larry Gonzalez (TU Dresden)

Research supported by the Wikimedia Foundation

For the eponymous LDOW 2018 paper, see https://iccl.inf.tu-dresden.de/web/Inproceedings3196/en

Slideset published under CC-By-SA 3.0 – slides without the title slide also published as CC-By 3.0

Background image by Phillip Maiwald, CC-By-SA 3.0

slide-2
SLIDE 2

Wikidata, the knowledge graph of Wikipedia, uses SPARQL as its main query API

  • Who is using this?
  • What are those SPARQL queries like?
  • What can we learn from them?

slide-4
SLIDE 4

Wait! – Wikidata uses RDF?!

Louis Néel (Q155781), award received (P166): Nobel Prize in Physics (Q38104)

  • point in time (P585): 1970
  • together with (P1706): H. Alfvén (Q54945)
  • prize money (P2121): 200000 SEK (Q122922)

How does Wikidata’s rich graph model relate to RDF?

slide-6
SLIDE 6

Wait! – Wikidata uses RDF?!

Official RDF version follows Erxleben et al. [ISWC 2014]:

Louis Néel (Q155781)  wdt:P166  Nobel Prize in Physics (Q38104)
Louis Néel (Q155781)  p:P166  wds:Q155781-...
wds:Q155781-...  ps:P166  Nobel Prize in Physics (Q38104)
wds:Q155781-...  pq:P585  “1970”^^xsd:gYear
wds:Q155781-...  pq:P1706  Q54945
wds:Q155781-...  pq:P2121  “200000”^^xsd:decimal

slide-7
SLIDE 7

RDF for Wikidata

Wikidata offers all of its content in RDF

  • Linked data live exports (Example: https://www.wikidata.org/wiki/Special:EntityData/Q42.nt)
  • Weekly dumps (See https://dumps.wikimedia.org/wikidatawiki/entities/)

Currently 4.9B triples (as of April 2018)

  • >415M Wikidata Statements
  • 4.5K Wikidata properties → >48K RDF properties
  • >1.5B labels/descriptions/aliases
  • >63M links to Wikipedia and friends
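The live export URL follows a simple pattern, which the sketch below reproduces for the Q42 example from the slide. The function name `entity_data_url` is hypothetical, and only the `nt` suffix is taken from the slide; any other suffix passed in would be an assumption.

```python
from urllib.parse import quote

# Base of the linked-data export URL shown on the slide.
ENTITY_DATA_BASE = "https://www.wikidata.org/wiki/Special:EntityData/"


def entity_data_url(entity_id, fmt="nt"):
    """Build the export URL for one entity, e.g. Q42 as N-Triples."""
    # quote() guards against stray characters; plain ids such as "Q42"
    # pass through unchanged.
    return f"{ENTITY_DATA_BASE}{quote(entity_id)}.{fmt}"


if __name__ == "__main__":
    print(entity_data_url("Q42"))
```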

slide-8
SLIDE 8

Wikidata SPARQL Query Service

Official query service since mid 2015

  • User interface at https://query.wikidata.org/
  • All the data (4.9B triples), live (latency <60 s)

No limits (well, almost):

  • 60 sec timeout
  • No limit on result size (!)
  • No limit on query numbers per IP
  • Clients might be paused after too many parallel requests
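A client-side request to the query service could look roughly as follows. This is a sketch under assumptions not stated on the slide: the endpoint path `/sparql`, the `format=json` parameter, and the example User-Agent string are illustrative, and the request is only prepared, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the SPARQL endpoint sits under /sparql next to the user interface.
ENDPOINT = "https://query.wikidata.org/sparql"
TIMEOUT_SECONDS = 60  # mirrors the server-side timeout named on the slide

# Example query: awards received (P166) by Louis Néel (Q155781).
# Assumption: the service predeclares the wd: and wdt: prefixes.
QUERY = """
SELECT ?award WHERE {
  wd:Q155781 wdt:P166 ?award .
}
LIMIT 10
"""


def build_request(query, user_agent="example-client/0.1 (hypothetical)"):
    """Prepare (but do not send) a GET request for a SPARQL query."""
    url = ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
    headers = {
        "User-Agent": user_agent,
        "Accept": "application/sparql-results+json",
    }
    return Request(url, headers=headers)


if __name__ == "__main__":
    req = build_request(QUERY)
    print(req.full_url)
    # To actually run it:
    #   urllib.request.urlopen(req, timeout=TIMEOUT_SECONDS)
```

Sending a descriptive User-Agent matters here, because the log analysis later in the talk relies on that header to separate kinds of traffic.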

slide-9
SLIDE 9

A simple SPARQL query

slide-11
SLIDE 11

A not-so simple SPARQL query

slide-14
SLIDE 14

Some metrics

Running on BlazeGraph database engine

  • 3 servers (+3 as backup)
  • Intel Xeon E5-2620 8 core/128G mem/800G SSD
  • Standard caching (Varnish) and load balancing (LVS)
  • Some custom tools, extensions and tunings
  • All available online: https://github.com/wikimedia/wikidata-query-rdf

Serving >100M requests/month (3.8M/day)

  • 50% of queries answered in <40 ms (95% in <440 ms; 99% in <40 s)
  • Less than 0.5% of queries time out
  • Service has never been down so far

slide-19
SLIDE 19

Analysing SPARQL logs: The Bot Problem

Query traffic is ruled by a few bots

Fig.: Wikidata SPARQL traffic Jun-Sep 2017

41% of all Wikidata query traffic from June–September 2017 caused by one super-power user (Magnus Manske)

The effect does not average out, and it affects other sites too

Fig.: Usage of DISTINCT on DBpedia [Bonifati et al. 2017]: 2012: 18%, 2013: 8%, 2014: 11%, 2015: 38%, 2016: 8%

No trends! No predictability! No insights!

slide-21
SLIDE 21

Are SPARQL queries interesting after all?

Observation: Robotic traffic dominates

  • May not represent any real interest
  • Governed by very few sources
  • Random changes – not uniform on any observed scale

Hypothesis: Organic traffic also exists

  • Representing human information need during some interaction
  • Composed of many diverse sources
  • Continuous change over months

Note: “Organic” ≠ “hand-written SPARQL” (user apps might use SPARQL to get user-requested data without users actually writing queries)

slide-23
SLIDE 23

Extracting organic traffic

Main signal: User Agents

  • Assumption: organic traffic generally from browser-like agents

2nd signal: query comments

  • Some browser-based tools mark queries using comments

3rd signal: activity spikes

  • Group queries by query pattern (following [Raghuveer, USEWOD’12])
  • Find agent-pattern pairs that spike (>2K requests/month)
  • Manually inspect these queries to decide if organic or robotic

→ About 3 further browser-based sources classified “robotic”
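The three signals can be sketched in code. The snippet below is an illustration under stated assumptions, not the authors' pipeline: the browser-like pattern, the query normalisation in `query_pattern`, the log entry layout (dicts with `agent` and `query` keys), and the default threshold are all hypothetical. The final manual inspection step is deliberately left out.

```python
import re
from collections import Counter

# Assumption: a crude test for "browser-like" User Agents; the slides do not
# spell out the actual rule.
BROWSER_LIKE = re.compile(r"Mozilla|Chrome|Safari|Firefox|Opera", re.IGNORECASE)
SPIKE_THRESHOLD = 2000  # placeholder value; tune to the log slice at hand


def query_pattern(query):
    """Hypothetical normalisation: blank out literals, entity ids and numbers."""
    pattern = re.sub(r'"[^"]*"', '"_"', query)
    pattern = re.sub(r"\b[QP]\d+\b", "_", pattern)
    pattern = re.sub(r"\b\d+\b", "_", pattern)
    return " ".join(pattern.split())


def candidate_organic(log):
    """Signal 1: keep entries whose agent looks like a browser."""
    return [e for e in log if BROWSER_LIKE.search(e["agent"])]


def has_comment(query):
    """Signal 2: does the query carry a '#' comment line, as some tools add?"""
    return any(line.lstrip().startswith("#") for line in query.splitlines())


def spiking_pairs(log, threshold=SPIKE_THRESHOLD):
    """Signal 3: (agent, pattern) pairs requested more often than the threshold."""
    counts = Counter((e["agent"], query_pattern(e["query"])) for e in log)
    return {pair: n for pair, n in counts.items() if n > threshold}
```

In this sketch, `spiking_pairs` would be run on one month of browser-like entries at a time, and the pairs it returns are the ones to inspect by hand.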

slide-24
SLIDE 24

Results: Organic component

Jun–Sep 2017: over 658,000 queries (<0.5%)

  • More triples per query: organic 17% with 1 triple, 97% with ≤11, vs. robotic 57% with 1 triple, 96% with ≤7
  • More varied (vocabulary, SPARQL features)

Temporal distribution of organic queries (12 weeks/time of day)

slide-25
SLIDE 25

Insights on SPARQL Usage

General: more features than reported elsewhere

  • Typically organic: LIMIT, DISTINCT, OPTIONAL, ORDER BY, subqueries, aggregates, services
  • Typically robotic: BIND, UNION, VALUES

Conjunctive regular path queries with converse (C2RPQs)

  • Main query fragment for robotic queries (75% when allowing VALUES)

OPTIONAL:

  • Important mostly for organic queries
  • Recent data (2018) also shows shift to C2RPQ+OPTIONAL (up to 82%)
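A feature tally of this kind can be approximated with keyword matching. The sketch below is a rough heuristic for illustration only, not the analysis behind the slide: it matches keywords on raw query text, so keywords inside literals or comments are counted too, and a real study would parse the queries. The function names and the choice of keywords (those named on the slide that are single SPARQL keywords) are assumptions.

```python
import re
from collections import Counter

# Keywords named on the slide. Matching on raw text is a rough heuristic
# that can be fooled by keywords inside literals or comments.
FEATURES = ["LIMIT", "DISTINCT", "OPTIONAL", "ORDER BY", "BIND", "UNION", "VALUES"]


def features_used(query):
    """Return the set of listed keywords that occur in the query text."""
    found = set()
    for feature in FEATURES:
        # Allow any whitespace between the words of multi-word keywords.
        regex = r"\b" + r"\s+".join(map(re.escape, feature.split())) + r"\b"
        if re.search(regex, query, flags=re.IGNORECASE):
            found.add(feature)
    return found


def feature_shares(queries):
    """Fraction of queries that use each listed feature."""
    queries = list(queries)
    if not queries:
        return {}
    counts = Counter(f for q in queries for f in features_used(q))
    return {f: counts[f] / len(queries) for f in FEATURES}
```

Running `feature_shares` separately on the organic and the robotic part of a log gives two profiles that can be compared feature by feature.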

slide-26
SLIDE 26

Insights on Wikidata Usage

Robotic traffic:

  • Mainly information integration bots (comparing database contents)
  • Potentially also selective data download (spider-like)
  • Most queries from a few dominant bots (>60% from top-three bots)

Organic traffic:

  • Data browsers (often general-purpose)
  • Mobile apps (often topical)
  • Most queries from unidentified “small” sources

Reified statements in 4%–10% of queries

slide-27
SLIDE 27

Conclusion and Outlook

Wikidata relies on RDF and SPARQL for some of its core features – a fascinating use case!

Conclusions

  • SPARQL log analysis is methodologically difficult
  • Organic traffic can be extracted based on User Agent and timestamps
  • SPARQL queries are more varied and more complex than reported elsewhere
  • After Joins, path queries are the second most important feature

Outlook

  • Publishing anonymised datasets: under review; stay tuned
  • Documenting Wikidata’s SPARQL deployment insights
  • Wikidata will expand further … (Dictionary content! Media meta-data!)

slide-28
SLIDE 28

SPARQL Feature Distribution (2017/2018)

slide-29
SLIDE 29

Triples per query: organic (blue) / robotic (yellow)

slide-30
SLIDE 30

Languages of labels in organic queries

slide-31
SLIDE 31

SPARQL feature co-occurrence