Accessing the Deep Web with Keywords: A Foundational Approach



slide-1
SLIDE 1

Accessing the Deep Web with Keywords: A Foundational Approach

Andrea Calí and Martín Ugarte

IKC 2017

slide-2
SLIDE 2
slide-3
SLIDE 3
slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6
slide-7
SLIDE 7

slide-8
SLIDE 8

Dish Pages

country

slide-9
SLIDE 9

Dish Pages

country

If you search for a country, you get the typical dishes from that country, and the chefs who prepare them

slide-10
SLIDE 10

Dish Pages

country

If you search for a country, you get the typical dishes from that country, and the chefs who prepare them

If you search for a chef, you get his nationality and the number of Michelin stars he has earned

slide-11
SLIDE 11

Dish Pages

country

slide-12
SLIDE 12

Dish Pages

Italy

country

slide-13
SLIDE 13

Dish Pages

Italy

Dish     Nation  Chef
risotto  Italy   Beck

country

slide-14
SLIDE 14

Dish Pages

Beck

country

slide-15
SLIDE 15

Dish Pages

Beck

Chef  Stars  Nation
Beck  3      Germany

country

slide-16
SLIDE 16

Dish Pages

Germany

country

slide-17
SLIDE 17

Dish Pages

Germany

Dish     Nation   Chef
spätzle  Germany  Passard

country

slide-18
SLIDE 18

Dish Pages

Passard

country

slide-19
SLIDE 19

Dish Pages

Passard

Chef     Stars  Nation
Passard  2      France

country

slide-20
SLIDE 20

Dish Pages

France

country

slide-21
SLIDE 21

Dish Pages

France

Dish       Nation  Chef
foie gras  France  Bottura
raclette   France  Elverfield

country

slide-22
SLIDE 22

Dish Pages

Bottura

country

slide-23
SLIDE 23

Dish Pages

Bottura

Chef     Stars  Nation
Bottura  3      Italy

country

slide-24
SLIDE 24

Dish Pages

Bottura Elverfield

country

slide-25
SLIDE 25

Dish Pages

Elverfield

country

slide-26
SLIDE 26

Schema

Chef  Stars   Nation
Dish  Nation  Chef

slide-27
SLIDE 27

Schema

Chef (input)   Stars (output)  Nation (output)

Dish (output)  Nation (input)  Chef (output)

slide-28
SLIDE 28

Schema

Chef (input)   Stars (output)  Nation (output)

Dish (output)  Nation (input)  Chef (output)

Italy

slide-29
SLIDE 29

Schema

Chef (input)   Stars (output)  Nation (output)

Dish (output)  Nation (input)  Chef (output)
risotto        Italy           Beck

Italy

slide-30
SLIDE 30

Schema

Chef (input)   Stars (output)  Nation (output)
Beck           3               Germany

Dish (output)  Nation (input)  Chef (output)
risotto        Italy           Beck

Italy

slide-31
SLIDE 31

Schema

Chef (input)   Stars (output)  Nation (output)
Beck           3               Germany

Dish (output)  Nation (input)  Chef (output)
risotto        Italy           Beck
spätzle        Germany         Passard

Italy

slide-32
SLIDE 32

Schema

Chef (input)   Stars (output)  Nation (output)
Beck           3               Germany
Passard        2               France

Dish (output)  Nation (input)  Chef (output)
risotto        Italy           Beck
spätzle        Germany         Passard

Italy

slide-33
SLIDE 33

Schema

Chef (input)   Stars (output)  Nation (output)
Beck           3               Germany
Passard        2               France

Dish (output)  Nation (input)  Chef (output)
risotto        Italy           Beck
spätzle        Germany         Passard
foie gras      France          Bottura
raclette       France          Elverfield

Italy

slide-34
SLIDE 34

Schema

Chef (input)   Stars (output)  Nation (output)
Beck           3               Germany
Passard        2               France
Bottura        3               Italy

Dish (output)  Nation (input)  Chef (output)
risotto        Italy           Beck
spätzle        Germany         Passard
foie gras      France          Bottura
raclette       France          Elverfield

Italy

slide-36
SLIDE 36

Schema

Chef (input)   Stars (output)  Nation (output)
Beck           3               Germany
Passard        2               France
Bottura        3               Italy

Dish (output)  Nation (input)  Chef (output)
risotto        Italy           Beck
spätzle        Germany         Passard
foie gras      France          Bottura
raclette       France          Elverfield

Same Abstract Domain

Italy
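
The walkthrough on the preceding slides can be sketched as a small crawler. This is only an illustration under stated assumptions: the dictionary `SOURCES`, the helper `lookup`, and the function `crawl` are hypothetical names, and the two "web forms" are simulated with in-memory lists rather than real pages. Each source accepts one input attribute, and every value returned is fed back as a possible input wherever the abstract domain matches.

```python
from collections import deque

# Hypothetical in-memory stand-ins for the two forms shown on the slides.
DISHES = [
    ("risotto", "Italy", "Beck"),
    ("spätzle", "Germany", "Passard"),
    ("foie gras", "France", "Bottura"),
    ("raclette", "France", "Elverfield"),
]
CHEFS = [
    ("Beck", 3, "Germany"),
    ("Passard", 2, "France"),
    ("Bottura", 3, "Italy"),
]

# "input" is the position of the attribute that must be filled in the form;
# "domains" names the abstract domain of each attribute.
SOURCES = {
    "dish": {"rows": DISHES, "input": 1, "domains": ("dish", "nation", "chef")},
    "chef": {"rows": CHEFS, "input": 0, "domains": ("chef", "stars", "nation")},
}


def lookup(source, value):
    """Simulate submitting `value` to the form of `source`."""
    spec = SOURCES[source]
    return [row for row in spec["rows"] if row[spec["input"]] == value]


def crawl(seeds):
    """Extract everything reachable from `seeds`, a list of (domain, value)."""
    known = set(seeds)
    queue = deque(known)
    extracted = {name: set() for name in SOURCES}
    n_queries = 0
    while queue:
        domain, value = queue.popleft()
        for name, spec in SOURCES.items():
            # A value can only be used where the input domain matches.
            if spec["domains"][spec["input"]] != domain:
                continue
            n_queries += 1
            for row in lookup(name, value):
                extracted[name].add(row)
                # Every returned value becomes a new candidate keyword.
                for dom, val in zip(spec["domains"], row):
                    if (dom, val) not in known:
                        known.add((dom, val))
                        queue.append((dom, val))
    return extracted, n_queries


extracted, n_queries = crawl([("nation", "Italy")])
```

Starting from the single keyword Italy, the sketch issues one lookup per keyword used on the slides (Italy, Beck, Germany, Passard, France, Bottura, Elverfield) and recovers both tables in full.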

slide-37
SLIDE 37

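$\rho_1:\ q_a(C) \leftarrow \hat{r}_2(C, 3, \mathit{italy}).$
$\rho_2:\ \hat{r}_1(D, N, C) \leftarrow \mathit{dom}_N(N),\ r_1(D, N, C).$
$\rho_3:\ \hat{r}_2(C, S, N) \leftarrow \mathit{dom}_C(C),\ r_2(C, S, N).$
$\rho_4:\ \mathit{dom}_C(C) \leftarrow \hat{r}_1(D, N, C).$
$\rho_5:\ \mathit{dom}_N(N) \leftarrow \hat{r}_2(C, S, N).$
$\rho_6:\ \mathit{dom}_N(\mathit{italy}).$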

slide-38
SLIDE 38

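$\rho_1:\ q_a(C) \leftarrow \hat{r}_2(C, 3, \mathit{italy}).$
$\rho_2:\ \hat{r}_1(D, N, C) \leftarrow \mathit{dom}_N(N),\ r_1(D, N, C).$
$\rho_3:\ \hat{r}_2(C, S, N) \leftarrow \mathit{dom}_C(C),\ r_2(C, S, N).$
$\rho_4:\ \mathit{dom}_C(C) \leftarrow \hat{r}_1(D, N, C).$
$\rho_5:\ \mathit{dom}_N(N) \leftarrow \hat{r}_2(C, S, N).$
$\rho_6:\ \mathit{dom}_N(\mathit{italy}).$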

CQ answering under access limitations

slide-39
SLIDE 39

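$\rho_1:\ q_a(C) \leftarrow \hat{r}_2(C, 3, \mathit{italy}).$
$\rho_2:\ \hat{r}_1(D, N, C) \leftarrow \mathit{dom}_N(N),\ r_1(D, N, C).$
$\rho_3:\ \hat{r}_2(C, S, N) \leftarrow \mathit{dom}_C(C),\ r_2(C, S, N).$
$\rho_4:\ \mathit{dom}_C(C) \leftarrow \hat{r}_1(D, N, C).$
$\rho_5:\ \mathit{dom}_N(N) \leftarrow \hat{r}_2(C, S, N).$
$\rho_6:\ \mathit{dom}_N(\mathit{italy}).$

CQ answering under access limitations

Input: tuple t, initial constants I, CQ Q, DB D, access limitations
Question: is t in the answers to Q starting with constants I?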

slide-40
SLIDE 40

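$\rho_1:\ q_a(C) \leftarrow \hat{r}_2(C, 3, \mathit{italy}).$
$\rho_2:\ \hat{r}_1(D, N, C) \leftarrow \mathit{dom}_N(N),\ r_1(D, N, C).$
$\rho_3:\ \hat{r}_2(C, S, N) \leftarrow \mathit{dom}_C(C),\ r_2(C, S, N).$
$\rho_4:\ \mathit{dom}_C(C) \leftarrow \hat{r}_1(D, N, C).$
$\rho_5:\ \mathit{dom}_N(N) \leftarrow \hat{r}_2(C, S, N).$
$\rho_6:\ \mathit{dom}_N(\mathit{italy}).$

CQ answering under access limitations

Input: tuple t, initial constants I, CQ Q, DB D, access limitations
Question: is t in the answers to Q starting with constants I?

t ∈ ans(Q1, I, D)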

slide-41
SLIDE 41

Theorem: CQ answering under access limitations is NP-complete

CQ answering under access limitations

Input: tuple t, initial constants I, CQ Q, DB D, access limitations
Question: is t in the answers to Q starting with constants I?

t ∈ ans(Q1, I, D)
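
For the running example, the membership question can be sketched by evaluating the rules ρ1 to ρ6 to a fixpoint. This is a minimal illustration, not a general procedure: the function name `answers` and the sets `R1` and `R2` are assumptions made here, the query is hard-coded to the one in ρ1, and the database is given explicitly. Constants are lower-cased to match the rules.

```python
# Hidden relations from the slides (hypothetical in-memory encoding).
R1 = {  # r1(Dish, Nation, Chef); Nation is the input attribute
    ("risotto", "italy", "beck"),
    ("spätzle", "germany", "passard"),
    ("foie gras", "france", "bottura"),
    ("raclette", "france", "elverfield"),
}
R2 = {  # r2(Chef, Stars, Nation); Chef is the input attribute
    ("beck", 3, "germany"),
    ("passard", 2, "france"),
    ("bottura", 3, "italy"),
}


def answers(r1, r2, initial_nations):
    """Chefs with 3 stars and nationality italy that are reachable
    from the initial nation constants."""
    dom_n = set(initial_nations)  # rho6 (generalised to any initial set)
    dom_c = set()
    r1_hat, r2_hat = set(), set()
    while True:
        # rho2: a tuple of r1 is extracted once its nation is known.
        new_r1 = {t for t in r1 if t[1] in dom_n}
        # rho3: a tuple of r2 is extracted once its chef is known.
        new_r2 = {t for t in r2 if t[0] in dom_c}
        # rho4 and rho5: extracted tuples contribute new constants.
        new_dom_c = dom_c | {c for (_, _, c) in new_r1}
        new_dom_n = dom_n | {n for (_, _, n) in new_r2}
        state = (new_r1, new_r2, new_dom_c, new_dom_n)
        if state == (r1_hat, r2_hat, dom_c, dom_n):
            break  # fixpoint reached: nothing new can be extracted
        r1_hat, r2_hat, dom_c, dom_n = state
    # rho1: evaluate the query over the extracted part of r2 only.
    return {c for (c, s, n) in r2_hat if s == 3 and n == "italy"}


result = answers(R1, R2, {"italy"})
```

With the initial constant italy, the only answer is bottura, and it is found only after the extraction has passed through germany and france. An initial constant that matches no dish, such as a nation absent from the data, yields no answers even though the tuple exists in the database.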

slide-44
SLIDE 44

Star Pages

restaurant

slide-45
SLIDE 45

Star Pages

restaurant

If you input a chef and a restaurant, it will tell you how many stars that restaurant earned with that chef.

slide-46
SLIDE 46

Star Pages

restaurant

slide-47
SLIDE 47

Star Pages

La Pergola Beck

restaurant

slide-48
SLIDE 48

Star Pages

Chef  Restaurant  Stars
Beck  La Pergola  3

La Pergola Beck

restaurant

slide-49
SLIDE 49

Assume the initial set of constants is 100 chefs and 100 restaurants.

slide-50
SLIDE 50

Assume the initial set of constants is 100 chefs and 100 restaurants.

We need to try all pairs <chef, restaurant> to obtain the accessible data (10,000 queries).
slide-51
SLIDE 51

Assume the initial set of constants is 100 chefs and 100 restaurants.

We need to try all pairs <chef, restaurant> to obtain the accessible data (10,000 queries).

Even on this database!

Chef  Restaurant  Stars
Beck  La Pergola  3
slide-53
SLIDE 53

Assume the initial set of constants is 100 chefs and 100 restaurants.

We need to try all pairs <chef, restaurant> to obtain the accessible data (10,000 queries).

Even on this database!

Chef  Restaurant  Stars
Beck  La Pergola  3

In reality, the database is not part of the input
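
The count of 10,000 can be checked with a short sketch. The names `STAR_PAGES` and `star_lookup`, and the placeholder constants such as `chef_1`, are hypothetical; the form is simulated with a dictionary holding the single tuple from the slide. Because the form needs both a chef and a restaurant, and the crawler cannot see the database, it has to submit every pair.

```python
from itertools import product

# 100 chefs and 100 restaurants as initial constants (placeholder names,
# except for the one pair that appears on the slide).
chefs = ["Beck"] + [f"chef_{i}" for i in range(1, 100)]
restaurants = ["La Pergola"] + [f"restaurant_{i}" for i in range(1, 100)]

# Hidden database behind the form: a single tuple.
STAR_PAGES = {("Beck", "La Pergola"): 3}


def star_lookup(chef, restaurant):
    """Simulate submitting a <chef, restaurant> pair to the form."""
    return STAR_PAGES.get((chef, restaurant))


n_queries = 0
found = []
for chef, restaurant in product(chefs, restaurants):
    n_queries += 1
    stars = star_lookup(chef, restaurant)
    if stars is not None:
        found.append((chef, restaurant, stars))
```

All 100 × 100 pairs are submitted even though only one of them returns a result.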

slide-54
SLIDE 54

Restricted case (Web Scraping): the database is not part of the input

slide-55
SLIDE 55

Restricted case (Web Scraping): the database is not part of the input

Unrestricted case (Discoverability): the database is part of the input

slide-56
SLIDE 56
slide-57
SLIDE 57

I want to search this website starting from this set of keywords

Restricted case

slide-58
SLIDE 58

I want to search this website starting from this set of keywords

Restricted case

What can a user retrieve from my database if he starts from this set of keywords?

Unrestricted case

slide-59
SLIDE 59

Proposition: There are settings for which the restricted case requires an exponential number of queries, while the unrestricted case requires only a constant number.

slide-60
SLIDE 60

Proposition: There are settings for which the restricted case requires an exponential number of queries, while the unrestricted case requires only a constant number.

But they are equivalent in the worst case…

slide-61
SLIDE 61

Conclusions

slide-62
SLIDE 62

Conclusions

Querying the Deep Web with keywords

slide-63
SLIDE 63

Conclusions

Querying the Deep Web with keywords

Recursive extraction needed

slide-64
SLIDE 64

Conclusions

Querying the Deep Web with keywords

Recursive extraction needed

Two scenarios:

  • restricted access (e.g. web forms)
  • unrestricted access
slide-65
SLIDE 65

Conclusions

Querying the Deep Web with keywords

Recursive extraction needed

Two scenarios:

  • restricted access (e.g. web forms)
  • unrestricted access

First results on computational complexity

slide-66
SLIDE 66

Future work

slide-67
SLIDE 67

Future work

Model the restricted case through oracles

slide-68
SLIDE 68

Future work

Model the restricted case through oracles

Theoretical lower bounds

slide-69
SLIDE 69

Future work

Model the restricted case through oracles

Theoretical lower bounds

etc…