

slide-1
SLIDE 1

PatManQL: A language to manipulate patterns and data in hierarchical catalogs

Panagiotis Bouros, Theodore Dalamagas, Timos Sellis, Manolis Terrovitis

Knowledge and Database Systems Lab
School of Electrical and Computer Engineering
National Technical University of Athens
{pbour,dalamag,timos,mter}@dblab.ece.ntua.gr

slide-2
SLIDE 2

PatManQL 2

Outline

  • Introduction
  • Contribution
  • Structures
  • Operators
  • Prototype
  • Related work
  • Conclusion
slide-3
SLIDE 3

PatManQL 3

Introduction

  • Huge volumes of data on the Web
  • Hierarchical structures and catalogs
  • Paths → knowledge artifacts
      ◦ Represent groups of data
          ▪ Conceptual clustering of raw data based on common properties
      ◦ Semantic guides
  • Example: Portal catalogs
slide-4
SLIDE 4

PatManQL 4

Introduction

[Figure: catalog schemas of two photo portals, adorama and B&H, drawn as trees in panels (a) and (b). Node labels include cameras & lenses, lenses, point & shoot, 35mm SLR, APS, film, filters and printers on one tree, and photo, 35mm systems, SLR cameras, lenses, other formats, digital photography and scanners on the other. The leaves are resource items with sample records, e.g. printers (brand, model, ppm), cameras (brand, model, price) and lenses (brand, focald, cam_model, price).]

  • Paths → alternative pattern versions for the same group of data
  • Example: searching for lenses
      ◦ /cameras & lenses/lenses (adorama)
      ◦ /photo/35mm systems/lenses (B&H)

slide-5
SLIDE 5

PatManQL 5

Introduction

[Figure: the adorama and B&H catalog schemas, panels (a) and (b), repeated from the previous slide.]

  • Paths → complex pattern
  • Example: searching for integrated photo systems
      ◦ /cameras & lenses/35mm SLR (adorama)
      ◦ /photo/35mm systems/lenses (B&H)

slide-6
SLIDE 6

PatManQL 6

Contribution

  • A model to represent paths as knowledge artifacts
  • The PatManQL language:
      ◦ Operators to manipulate path-like patterns
      ◦ Relational operators for data
  • A prototype
slide-7
SLIDE 7

PatManQL 7

Catalog Schema

  • A tree with:
      ◦ a root (⊗)
      ◦ a set of non-leaf nodes (š)
      ◦ a set of resource items as leaves (□)
  • Data: instances (records) of resource items
      ◦ Resource item: a relation R(a1, a2, …, an), where a1, a2, …, an are attributes
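The catalog schema described above can be pictured as a small tree data structure. The following is a minimal sketch, not the authors' implementation: the class names `Node` and `ResourceItem` and the helper `paths_to_items` are hypothetical, and the exact position of the "SLR cameras" item in the tree is an assumption based on the adorama path used in the introduction.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceItem:
    """Leaf of the catalog tree: a relation R(a1, ..., an) plus its records."""
    name: str
    attributes: tuple
    records: list = field(default_factory=list)


@dataclass
class Node:
    """Root or non-leaf node; children are other nodes or resource items."""
    label: str
    children: list = field(default_factory=list)


def paths_to_items(node, prefix=""):
    """Yield (path, resource item) for every resource item below `node`."""
    for child in node.children:
        if isinstance(child, ResourceItem):
            yield prefix or "/", child
        else:
            yield from paths_to_items(child, f"{prefix}/{child.label}")


# Records taken from the example catalog; the tree layout is assumed.
cameras = ResourceItem(
    "SLR cameras",
    ("brand", "model", "price"),
    [("Canon", "EOS-3", 990), ("Nikon", "N65", 205), ("Pentax", "ZX-M", 148.50)],
)
root = Node("root", [Node("cameras & lenses", [Node("35mm SLR", [cameras])])])

for path, item in paths_to_items(root):
    print(path, "->", item.name, item.attributes)
```

Each path from the root to a resource item is what the rest of the deck treats as a pattern.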

slide-8
SLIDE 8

PatManQL 8

Catalog Schema

[Figure: an example catalog schema with its three parts labelled: the hierarchy (numbered nodes such as cameras & lenses, lenses, point & shoot, 35mm SLR, APS, film, filters, printers), the resource items ("SLR cameras" and "Digital printers"), and the data. "SLR cameras" (brand, model, price) holds Canon EOS-3 990, Nikon N65 205, Pentax ZX-M 148.50; "Digital printers" (brand, model, ppm) holds hp 3820 12, hp 7350 17, hp 6122 20.]

slide-9
SLIDE 9

PatManQL 9

Tree-Structure Relations (TSRs)

  • Combining catalog schemas with a common resource item
  • Tree-Structure Relation (AND/OR-like graph):
      ◦ One resource item
      ◦ Paths organized in OR components
          ▪ OR component: group of one or more paths (AND group)
          ▪ OR components are alternative ways to access the common resource item
      ◦ Paths = patterns
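A TSR as described above can be held in a very small structure: one resource item plus a list of OR components, each of which is an AND group of paths. This is only a sketch under assumptions: the class name `TSR` and its field names are hypothetical, the paths follow the "SLR systems" example shown in the storage slides, and the lens_id values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class TSR:
    """One resource item reachable through alternative OR components."""
    name: str
    attributes: tuple
    or_components: list  # each entry is an AND group: a list of path strings
    tuples: list = field(default_factory=list)


slr_systems = TSR(
    name="SLR systems",
    attributes=("brand", "model", "price", "lens_id"),
    or_components=[
        ["/photo/35mm SLR/bodies", "/photo/lenses"],  # OR #1: two paths taken together
        ["/photo/35mm systems"],                      # OR #2: a single path
    ],
    tuples=[
        ("Canon", "EOS-3", 990, 1),    # lens_id values are illustrative
        ("Nikon", "N65", 205, 2),
        ("Pentax", "ZX-M", 148.5, 3),
    ],
)

# Each OR component is one alternative way to reach the resource item.
for number, group in enumerate(slr_systems.or_components, start=1):
    print(f"OR #{number}:", " AND ".join(group))
```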

slide-10
SLIDE 10

PatManQL 10

Tree-Structure Relations (TSRs)

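[Figure: two example TSRs. (a) "SLR cameras" with attributes brand, model, price; (b) "SLR systems" with attributes brand, model, price, lens_id. In each, the paths from the root (X) through nodes such as photo, 35mm SLR, 35mm systems, camera & lenses, cameras, lenses and bodies are grouped into OR components labelled OR #1 and OR #2.]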

slide-11
SLIDE 11

PatManQL 11

Operators

  • Select (σ)
      ◦ σ<attribute condition><path condition> (TSR)
          ▪ attribute condition: {=, ≠, <}
          ▪ path condition: {=, ≠, ⊂, ∠}
      ◦ Filters instances of resource items and OR components

slide-12
SLIDE 12

PatManQL 12

Select example

'Select all non-Pentax cameras with price greater than 200 euros, having "/photo/35mm systems" in their paths':

σ<brand != "Pentax", price > 200><"/photo/35mm systems" ⊂ $_>(SLR systems)

[Figure: (a) the input "SLR systems" TSR with tuples (Canon, EOS-3, 990), (Nikon, N65, 205), (Pentax, ZX-M, 148.5), ..., lens_id values 1, 2, 3, ..., and both OR components; (b) the result, which keeps the Canon and Nikon tuples and only the OR component with the path /photo/35mm systems.]
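The select example can be reproduced with a short sketch. Several things here are assumptions rather than the language's definition: a TSR is modelled as a plain dict, `⊂` is read as "appears as a contiguous run of steps, possibly the whole path", `$_` is read as "any path of the OR component", and the function names `contains` and `select` are hypothetical.

```python
def contains(sub, path):
    """True if the steps of `sub` appear as a contiguous run inside `path`."""
    s = sub.strip("/").split("/")
    p = path.strip("/").split("/")
    return any(p[i:i + len(s)] == s for i in range(len(p) - len(s) + 1))


def select(tsr, attr_pred, path_pred):
    """Keep tuples passing attr_pred and OR components with a path passing path_pred."""
    return {
        "name": tsr["name"],
        "attributes": tsr["attributes"],
        "or_components": [g for g in tsr["or_components"] if any(path_pred(p) for p in g)],
        "tuples": [t for t in tsr["tuples"] if attr_pred(dict(zip(tsr["attributes"], t)))],
    }


slr_systems = {
    "name": "SLR systems",
    "attributes": ("brand", "model", "price", "lens_id"),
    "or_components": [["/photo/35mm SLR/bodies", "/photo/lenses"], ["/photo/35mm systems"]],
    "tuples": [("Canon", "EOS-3", 990, 1), ("Nikon", "N65", 205, 2), ("Pentax", "ZX-M", 148.5, 3)],
}

result = select(
    slr_systems,
    lambda r: r["brand"] != "Pentax" and r["price"] > 200,
    lambda p: contains("/photo/35mm systems", p),
)
print(result["tuples"])         # the Canon and Nikon tuples
print(result["or_components"])  # only the /photo/35mm systems component
```

Under these assumptions the output agrees with panel (b) of the example: two tuples and a single OR component.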

slide-13
SLIDE 13

PatManQL 13

Operators

  • Project (π)
      ◦ π<attribute list><variable list> (TSR)
          ▪ attribute list: {attribute}
          ▪ variable list: {$i (path variable), #i (OR variable)}
      ◦ Keeps attributes of the resource item and either paths of each OR component or whole OR components

slide-14
SLIDE 14

PatManQL 14

Project example

'Cameras with only the model and lens_id attributes and the rightmost component':

π<model, lens_id><#2>(SLR systems)

[Figure: (a) the input "SLR systems" TSR with tuples (Canon, EOS-3, 990), (Nikon, N65, 205), (Pentax, ZX-M, 148.5), ..., lens_id values 1, 2, 2, ..., and both OR components; (b) the result, which keeps only the model and lens_id columns (EOS-3/1, N65/2, ZX-M/2, ...) and only the OR component with the path /photo/35mm systems.]
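The project example can be sketched in the same spirit. This is a simplified illustration, not the operator's full definition: it handles only OR variables (#i) and ignores path variables ($i), it does not remove duplicate tuples, and the dict layout and the function name `project` are hypothetical.

```python
def project(tsr, attrs, or_numbers):
    """Keep the listed attributes and the OR components with the given 1-based numbers."""
    idx = [tsr["attributes"].index(a) for a in attrs]
    return {
        "name": tsr["name"],
        "attributes": tuple(attrs),
        "or_components": [tsr["or_components"][n - 1] for n in or_numbers],
        "tuples": [tuple(t[i] for i in idx) for t in tsr["tuples"]],
    }


slr_systems = {
    "name": "SLR systems",
    "attributes": ("brand", "model", "price", "lens_id"),
    "or_components": [["/photo/35mm SLR/bodies", "/photo/lenses"], ["/photo/35mm systems"]],
    "tuples": [("Canon", "EOS-3", 990, 1), ("Nikon", "N65", 205, 2), ("Pentax", "ZX-M", 148.5, 2)],
}

# pi<model, lens_id><#2>(SLR systems)
result = project(slr_systems, ["model", "lens_id"], [2])
print(result["attributes"], result["tuples"], result["or_components"])
```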

slide-15
SLIDE 15

PatManQL 15

Operators

  • Cartesian product (X)
      ◦ (TSR1) X (TSR2)
      ◦ Combines instances of resource items and OR components

slide-16
SLIDE 16

PatManQL 16

Cartesian product example

(SLR systems) X (Lenses)

[Figure: (a) "SLR systems" with attributes cbrand, cmodel, cprice, clensid, tuples (Canon, EOS-3, 990, 1), (Nikon, N65, 205, 1), (Pentax, ZX-M, 148.5, 2), ..., and two OR components ({/photo/35mm SLR/bodies, /photo/lenses} and {/photo/35mm systems}); (b) "Lenses" with attributes lensid, lprice, lbrand, tuples (1, 200, Sigma), (2, 100, Tamron), ..., and the path /camera & lenses/lenses; (c) the product (a) X (b), named "SLR systems", whose tuples pair camera tuples with lens tuples and whose OR components each also include the /camera & lenses/lenses path.]
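One plausible reading of the product example is sketched below: tuples are paired as in the relational Cartesian product, and every OR component of the first TSR is merged with every OR component of the second into a larger AND group. That pairing rule for OR components is an interpretation of the figure, not a stated definition, and the dict layout and the function name `cartesian` are hypothetical.

```python
from itertools import product


def cartesian(t1, t2):
    """Pair every tuple of t1 with every tuple of t2, and every OR component likewise."""
    return {
        "name": t1["name"],  # assumption: the result keeps the first operand's name
        "attributes": t1["attributes"] + t2["attributes"],
        "or_components": [g1 + g2 for g1, g2 in product(t1["or_components"], t2["or_components"])],
        "tuples": [a + b for a, b in product(t1["tuples"], t2["tuples"])],
    }


slr_systems = {
    "name": "SLR systems",
    "attributes": ("cbrand", "cmodel", "cprice", "clensid"),
    "or_components": [["/photo/35mm SLR/bodies", "/photo/lenses"], ["/photo/35mm systems"]],
    "tuples": [("Canon", "EOS-3", 990, 1), ("Nikon", "N65", 205, 1), ("Pentax", "ZX-M", 148.5, 2)],
}
lenses = {
    "name": "Lenses",
    "attributes": ("lensid", "lprice", "lbrand"),
    "or_components": [["/camera & lenses/lenses"]],
    "tuples": [(1, 200, "Sigma"), (2, 100, "Tamron")],
}

result = cartesian(slr_systems, lenses)
print(len(result["tuples"]), "tuples,", len(result["or_components"]), "OR components")
```

A select with the condition clensid = lensid on top of this product would then play the role of a join, which the deck lists as future work.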

slide-17
SLIDE 17

PatManQL 17

Operators

  • Union (U)
      ◦ (TSR) U (TSR)
      ◦ Union of instances and all OR components
  • Intersection (∩)
      ◦ (TSR) ∩ (TSR)
      ◦ Intersection of instances and all OR components
  • Difference (–)
      ◦ (TSR) – (TSR)
      ◦ Instances of the first TSR not present in the second one and all OR components of the first TSR

slide-18
SLIDE 18

PatManQL 18

Union example

(SLR systems) U (SLR systems)

[Figure: both operands have attributes cbrand, cmodel, cprice, clensid. (a) "SLR systems" with the OR component {/photo/35mm SLR/bodies, /photo/lenses} and tuples (Canon, EOS-3, 990, 1), (Nikon, N65, 205, 1), (Pentax, ZX-M, 148.5, 2), ...; (b) "SLR systems" with the OR component {/photo/35mm systems} and tuples (Canon, EOS-3, 990, 1), (Nikon, FM2, 800, 1), (Pentax, ZX-M, 148.5, 2), ...; (c) the union (a) U (b), which holds both OR components and the tuples of (a) plus (Nikon, FM2, 800, 1).]
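The three set operators can be sketched together, using the data of the union example above. This is a hedged illustration: "all OR components" for intersection is read here as the OR components of both operands, which is an assumption, and the dict layout and function names are hypothetical.

```python
def _merge_ors(a, b):
    """OR components of a, followed by those of b that are not already present."""
    return list(a) + [g for g in b if g not in a]


def union(t1, t2):
    tuples = list(t1["tuples"]) + [t for t in t2["tuples"] if t not in t1["tuples"]]
    return {**t1, "or_components": _merge_ors(t1["or_components"], t2["or_components"]), "tuples": tuples}


def intersection(t1, t2):
    tuples = [t for t in t1["tuples"] if t in t2["tuples"]]
    return {**t1, "or_components": _merge_ors(t1["or_components"], t2["or_components"]), "tuples": tuples}


def difference(t1, t2):
    # Only the OR components of the first TSR are kept.
    return {**t1, "tuples": [t for t in t1["tuples"] if t not in t2["tuples"]]}


attrs = ("cbrand", "cmodel", "cprice", "clensid")
a = {"name": "SLR systems", "attributes": attrs,
     "or_components": [["/photo/35mm SLR/bodies", "/photo/lenses"]],
     "tuples": [("Canon", "EOS-3", 990, 1), ("Nikon", "N65", 205, 1), ("Pentax", "ZX-M", 148.5, 2)]}
b = {"name": "SLR systems", "attributes": attrs,
     "or_components": [["/photo/35mm systems"]],
     "tuples": [("Canon", "EOS-3", 990, 1), ("Nikon", "FM2", 800, 1), ("Pentax", "ZX-M", 148.5, 2)]}

u = union(a, b)
print(len(u["tuples"]), "tuples,", len(u["or_components"]), "OR components")
print(intersection(a, b)["tuples"])
print(difference(a, b)["tuples"])
```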

slide-19
SLIDE 19

PatManQL 19

Prototype

  • Interpreter
  • Query Execution Engine
  • Storage mechanism
      ◦ XML files
      ◦ MySQL RDBMS
          ▪ All-edges-in-one-table storage approach
  • Graphical Interface
slide-20
SLIDE 20

PatManQL 20

Related work

  • Pattern management (PANDA project) (S. Rizzi et al.)
  • Inductive databases framework (Tomasz Imielinski et al.)
      ◦ DMQL (Jiawei Han et al.), MINE RULE (R. Meo et al.)
          ▪ Descriptive rules
  • Tree algebras
      ◦ TAX (H. V. Jagadish et al.)
          ▪ Selecting – reconstructing bulk XML data
      ◦ YAT (V. Christophides et al.)
          ▪ Tuple-based, not tree-based

slide-21
SLIDE 21

PatManQL 21

Conclusion

  • A model to represent paths as knowledge artifacts (patterns)
      ◦ Catalog schema
      ◦ Tree-Structure Relations (TSRs)
  • The PatManQL language:
      ◦ Operators to manipulate paths as patterns and data
  • A prototype system
slide-22
SLIDE 22

PatManQL 22

Future Work

  • Properties of the Operators
  • Restructure operators
  • Join operator
slide-23
SLIDE 23

PatManQL 23

Questions (?)

slide-24
SLIDE 24

PatManQL 24

Tree-Structure Relations (TSRs)

[Figure: the two example TSRs, (a) "SLR cameras" (brand, model, price) and (b) "SLR systems" (brand, model, price, lens_id), with their paths labelled by the path variables $1 and $2.]

slide-25
SLIDE 25

PatManQL 25

Storage mechanism

[Figure: the "SLR systems" TSR with attributes brand, model, price, lens_id and the paths /photo/35mm SLR/bodies, /photo/lenses and /photo/35mm systems.]

  • XML file

<tsr name="SLR systems">
  <or>
    <and>/photo/35mm SLR/bodies</and>
    <and>/photo/lenses</and>
  </or>
  <or>
    <and>/photo/35mm systems</and>
  </or>
  <item>
    <attribute name="brand" type="…"/>
    <attribute name="model" type="…"/>
    …
    <tuple>…</tuple>
    …
  </item>
</tsr>
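Reading such a file back into OR components and attribute names takes only a few lines with Python's standard `xml.etree.ElementTree`. This is a sketch rather than the prototype's XML File Manager: the function name `load_tsr` is hypothetical, the attribute types stay elided as on the slide, and tuples are skipped because their encoding is not shown.

```python
import xml.etree.ElementTree as ET

XML = """<tsr name="SLR systems">
  <or>
    <and>/photo/35mm SLR/bodies</and>
    <and>/photo/lenses</and>
  </or>
  <or>
    <and>/photo/35mm systems</and>
  </or>
  <item>
    <attribute name="brand" type="…"/>
    <attribute name="model" type="…"/>
  </item>
</tsr>"""


def load_tsr(xml_text):
    """Return the TSR name, its OR components (lists of paths) and attribute names."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "or_components": [[a.text for a in o.findall("and")] for o in root.findall("or")],
        "attributes": [a.get("name") for a in root.find("item").findall("attribute")],
    }


tsr = load_tsr(XML)
print(tsr["name"], tsr["or_components"], tsr["attributes"])
```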

slide-26
SLIDE 26

PatManQL 26

Storage mechanism

[Figure: the "SLR systems" TSR with attributes brand, model, price, lens_id and the paths /photo/35mm SLR/bodies, /photo/lenses and /photo/35mm systems.]

  • Database

    Resource item table:

    brand   model   price   lens_id
    …       …       …       …

    Paths table:

    tid   orid   andid   path
    1     1      1       /photo/35mm SLR/bodies
    1     1      2       /photo/lenses
    1     2      1       /photo/35mm systems

    TSR table:

    tid   name          file
    1     SLR systems   portal.xml
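The relational layout can be tried out with an in-memory SQLite database. Note the substitutions: the prototype uses MySQL, whereas this sketch uses Python's built-in `sqlite3` purely for convenience, and the table names `tsr` and `tsr_path` and the function `or_components` are hypothetical; only the column names and rows come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tsr (tid INTEGER PRIMARY KEY, name TEXT, file TEXT);
CREATE TABLE tsr_path (tid INTEGER, orid INTEGER, andid INTEGER, path TEXT);
""")
conn.execute("INSERT INTO tsr VALUES (1, 'SLR systems', 'portal.xml')")
conn.executemany(
    "INSERT INTO tsr_path VALUES (?, ?, ?, ?)",
    [
        (1, 1, 1, "/photo/35mm SLR/bodies"),
        (1, 1, 2, "/photo/lenses"),
        (1, 2, 1, "/photo/35mm systems"),
    ],
)


def or_components(conn, name):
    """Rebuild the OR components (lists of paths) of the TSR called `name`."""
    rows = conn.execute(
        "SELECT p.orid, p.path FROM tsr_path p JOIN tsr t ON t.tid = p.tid "
        "WHERE t.name = ? ORDER BY p.orid, p.andid",
        (name,),
    )
    groups = {}
    for orid, path in rows:
        groups.setdefault(orid, []).append(path)
    return list(groups.values())


print(or_components(conn, "SLR systems"))
```

Grouping rows by `orid` and ordering by `andid` recovers the AND groups in the order they were stored.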

slide-27
SLIDE 27

PatManQL 27

Catalog Schemas examples

[Figure: the adorama and B&H catalog schemas from the introduction, panels (a) and (b), with their resource items: printers (brand, model, ppm), cameras (brand, model, price) and lenses (brand, focald, cam_model, price).]

slide-28
SLIDE 28

PatManQL 28

Catalog Schema Manipulation

  • SLR integrated systems from X – fig. (a)
  • SLR cameras from Adorama – fig. (b)
  • Lenses from B&H – fig. (c)
  • Scenario for X:
      ◦ New lenses out in the market
      ◦ Lenses provided by B&H that fit Canon bodies provided by Adorama
      ◦ The above SLR systems are not present in her stock

slide-29
SLIDE 29

PatManQL 29

Catalog Schema Manipulation

(b)

[Figure: the three input TSRs. (a) "SLR systems" from X, under the path /photo/35mm systems, with attributes cbrand, cmodel, cprice, lmodel (tuples include Canon EOS-3 990 and Nikon N65 205; lmodel values 100, 340, ...). (b) "SLR cameras" from Adorama, under /camera & lenses/35mm SLR, with attributes cbrand, cmodel, cprice (Canon EOS-3 990, Nikon N65 300, Pentax ZX-M 350, ...). (c) "Lenses" from B&H, under /photo/35mm systems/lenses, with attributes lbrand, focald, cam_model, lmodel, lprice (Canon and Sigma lenses for the EOS-3 and N65; lmodel values 100, 110, 340).]

slide-30
SLIDE 30

PatManQL 30

Catalog Schema Manipulation

  • Systems with Canon bodies from Adorama and lenses from B&H – fig. (d):
      ◦ q1 = π<cbrand,cmodel,lmodel><>(σ<cmodel=cam_model, cbrand="Canon"><>((SLR cameras) X (Lenses)))
  • Systems with Canon bodies from Adorama and lenses from B&H which are not in X's catalog (the SLR systems TSR of fig. (a)) – fig. (e):
      ◦ q2 = (q1) – π<cbrand,cmodel,lmodel><>(SLR systems)
  • Lenses only, without the appropriate camera bodies – fig. (f):
      ◦ π<lmodel><$2>(q2)
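At the tuple level, the three queries amount to a filtered product, a difference against X's own catalog, and a projection. The sketch below traces only that relational side and ignores paths and OR components entirely. The rows are illustrative: the pairing of lmodel values with camera models and the content of X's catalog (the SLR systems TSR of fig. (a)) are assumptions chosen to be consistent with the figures.

```python
from itertools import product

# (cbrand, cmodel) from Adorama's "SLR cameras"
slr_cameras = [("Canon", "EOS-3"), ("Nikon", "N65"), ("Pentax", "ZX-M")]
# (cam_model, lmodel) from B&H's "Lenses"; the pairing is illustrative
lenses = [("EOS-3", 100), ("EOS-3", 110), ("N65", 340)]
# (cbrand, cmodel, lmodel) already in X's catalog; assumed content
slr_systems = [("Canon", "EOS-3", 100), ("Nikon", "N65", 340)]

# q1: Canon bodies paired with the lenses that fit them
q1 = [
    (cbrand, cmodel, lmodel)
    for (cbrand, cmodel), (cam_model, lmodel) in product(slr_cameras, lenses)
    if cmodel == cam_model and cbrand == "Canon"
]

# q2: the systems of q1 that X does not stock yet
q2 = [t for t in q1 if t not in slr_systems]

# Final step: keep only the lens model
new_lenses = [lmodel for _, _, lmodel in q2]

print(q1)
print(q2)
print(new_lenses)
```

With these rows, q1 holds the EOS-3 with lenses 100 and 110, q2 keeps only the pair with lens 110, and the final projection leaves lens 110.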

slide-31
SLIDE 31

PatManQL 31

Catalog Schema Manipulation

(d)

[Figure: query results. (d) "SLR systems" with cbrand Canon and the cmodel/lmodel pairs EOS-3/100 and EOS-3/110, reachable through the paths /camera & lenses/35mm SLR and /photo/35mm systems/lenses. (e) "SLR systems" with the same paths but only the pair EOS-3/110. (f) "Lenses" under the path /photo/35mm systems/lenses with lmodel 110.]

slide-32
SLIDE 32

PatManQL 32

Prototype Architecture

[Diagram: prototype architecture. Components: Interpreter, Query Execution Engine (QE), Database Manager (DM), XML File Manager (XFM) and Graphic Result Interface (GRI). Storage: a database and an XML file holding <tsr> elements under a <root> element.]

slide-33
SLIDE 33

PatManQL 33

XML File Manager (XFM)

[Diagram labels: XML file; OR components, paths and resource items retrieval; OR components, paths and resource items storage; TSR; Interpreter; Graphic Result Interface.]

slide-34
SLIDE 34

PatManQL 34

Database Manager (DM)

[Diagram labels: Database; OR components, paths and resource items retrieval; OR components, paths and resource items storage; TSR; Interpreter; Graphic Result Interface.]

slide-35
SLIDE 35

PatManQL 35

Query Execution Engine (QE)

[Diagram labels: Interpreter; parameters list; TSR; paths construction and OR components creation; resource item construction; attributes and records.]

slide-36
SLIDE 36

PatManQL 36

Interpreter

[Diagram labels: query; Parser; print error message; error message; parameter collection; query parameters; TSR; XML File Manager; Database Manager; Query Execution Engine; Graphic Result Interface.]

slide-37
SLIDE 37

PatManQL 37

Graphic Result Interface (GRI)

[Diagram labels: TSR; XML representation form construction; XML representation form; graphical interface; XML File Manager; Database Manager; Interpreter.]