slide-1
SLIDE 1

Structural and Computational Properties of Possibilistic Armstrong Databases

Seyeong Jeong, Haoming Ma, Ziheng Wei, Sebastian Link The University of Auckland, New Zealand ER 2020 Vienna, Austria

slide-6
SLIDE 6

Context of the Work

Apply possibility theory to schema design for uncertain data
Develop computational support for acquiring possibilistic functional dependencies that serve as meaningful input to schema design

Mining constraints from data
Iterative example-based acquisition (Armstrong relations)

Validation of integrity requirements for uncertain data

slide-7
SLIDE 7

The Big Picture

slide-26
SLIDE 26

Possibilistic Relations and Functional Dependencies

β1-cut Σ1: MT → R
α4-cut r4: tuples with α ≥ α4
BCNF for Σ1: R1 = MTR with key MT, R2 = MTP with key MTP
α4-lossless, no α4-redundancy, β1-preserving

slide-28
SLIDE 28

Possibilistic Relations and Functional Dependencies

β2-cut Σ2: MT → R, RT → P
α3-cut r3: tuples with α ≥ α3
BCNF for Σ2: R1 = RTP with key RT, R2 = RTM with key MT
α3-lossless, no α3-redundancy, β2-preserving

slide-30
SLIDE 30

Possibilistic Relations and Functional Dependencies

β3-cut Σ3: MT → R, RT → P, PT → M
α2-cut r2: tuples with α ≥ α2
BCNF for Σ3: PTMR with keys MT, RT, PT
α2-lossless, no α2-redundancy, β3-preserving

slide-32
SLIDE 32

Possibilistic Relations and Functional Dependencies

β4-cut Σ4: MT → R, RT → P, P → M
α1-cut r1: tuples with α = α1
BCNF for Σ4: R1 = PM with key P, R2 = MTR with key MT, R3 = RTP with key RT
α1-lossless, no α1-redundancy, β4-preserving
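The keys stated for each β-cut can be sanity-checked with a standard attribute-closure computation. The sketch below is only an illustration: the single-letter attribute names follow the slides (P = Project, T = Time, M = Manager, R = Room), while the function names `closure`, `is_superkey`, and `check_all` are assumptions introduced here. It checks the superkey property only, not minimality.

```python
def closure(attrs, fds):
    """Return the attribute closure of `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(candidate, schema, fds):
    """True if `candidate` determines every attribute of `schema`."""
    return set(schema) <= closure(candidate, fds)


# FD sets of the four beta-cuts, as listed on the slides.
SIGMA = {
    1: [("MT", "R")],
    2: [("MT", "R"), ("RT", "P")],
    3: [("MT", "R"), ("RT", "P"), ("PT", "M")],
    4: [("MT", "R"), ("RT", "P"), ("P", "M")],
}

# (schema, stated key) pairs, as listed on the slides for each cut.
DECOMPOSITIONS = {
    1: [("MTR", "MT"), ("MTP", "MTP")],
    2: [("RTP", "RT"), ("RTM", "MT")],
    3: [("PTMR", "MT"), ("PTMR", "RT"), ("PTMR", "PT")],
    4: [("PM", "P"), ("MTR", "MT"), ("RTP", "RT")],
}


def check_all():
    """Map each cut index to whether all stated keys are superkeys."""
    return {
        i: all(is_superkey(key, schema, SIGMA[i])
               for schema, key in DECOMPOSITIONS[i])
        for i in SIGMA
    }


if __name__ == "__main__":
    print(check_all())
```

Under these assumptions, every stated key passes the check for its cut.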

slide-37
SLIDE 37

Computational Support for Finding Meaningful pFDs

Definition (Possibilistic Armstrong Relation) A p-relation r over a p-schema (R, α1 > · · · > αk+1) is Armstrong for a given set Σ of pFDs over the p-schema if and only if for all pFDs ϕ over the p-schema, r satisfies ϕ if and only if Σ implies ϕ.

R = {Project, Time, Manager, Room}
β1 > β2 > β3 > β4 > β5
Σ consists of:
(Manager, Time → Room, β1)
(Room, Time → Project, β2)
(Project, Time → Manager, β3)
(Project → Manager, β4)

Perhaps, (Project → Manager, β3) should hold?

Possibilistic Armstrong Relation for Σ

Project  Time       Manager  Room  αi
Eagle    Mon, 9am   Ann      Aqua  α1
Hippo    Mon, 1pm   Ann      Aqua  α1
Kiwi     Mon, 1pm   Pete     Buff  α1
Kiwi     Tue, 2pm   Pete     Buff  α1
Lion     Tue, 4pm   Gill     Buff  α1
Lion     Wed, 9am   Gill     Cyan  α1
Lion     Wed, 11am  Bob      Cyan  α2
Lion     Wed, 11am  Jack     Cyan  α3
Lion     Wed, 11am  Pam      Lava  α3
Tiger    Wed, 11am  Pam      Lava  α4
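To see why the example relation is a "perfect sample", one can compute, for any FD, the highest certainty degree at which the relation satisfies it. The sketch below assumes the reading suggested by the pairing of cuts on the earlier slides: a pFD (X → Y, βi) is satisfied when X → Y holds in the cut r(k+1−i), that is, among the tuples of degree α1, ..., α(k+1−i). The names `cut`, `fd_holds`, and `certainty_index` are illustrative, and degrees are encoded by their index.

```python
ATTRS = ("Project", "Time", "Manager", "Room")
K = 4  # number of non-bottom degrees in the example

# The example relation; the last field is the index i of the degree alpha_i.
P_RELATION = [
    ("Eagle", "Mon, 9am", "Ann", "Aqua", 1),
    ("Hippo", "Mon, 1pm", "Ann", "Aqua", 1),
    ("Kiwi", "Mon, 1pm", "Pete", "Buff", 1),
    ("Kiwi", "Tue, 2pm", "Pete", "Buff", 1),
    ("Lion", "Tue, 4pm", "Gill", "Buff", 1),
    ("Lion", "Wed, 9am", "Gill", "Cyan", 1),
    ("Lion", "Wed, 11am", "Bob", "Cyan", 2),
    ("Lion", "Wed, 11am", "Jack", "Cyan", 3),
    ("Lion", "Wed, 11am", "Pam", "Lava", 3),
    ("Tiger", "Wed, 11am", "Pam", "Lava", 4),
]


def cut(relation, c):
    """Tuples whose degree is alpha_c or higher (degree index <= c)."""
    return [row[:-1] for row in relation if row[-1] <= c]


def fd_holds(rows, lhs, rhs):
    """Check the classical FD lhs -> rhs on a list of plain tuples."""
    seen = {}
    for row in rows:
        rec = dict(zip(ATTRS, row))
        key = tuple(rec[a] for a in lhs)
        val = tuple(rec[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def certainty_index(relation, lhs, rhs, k=K):
    """Smallest i such that (lhs -> rhs, beta_i) is satisfied.

    Returns k + 1 (the bottom degree) if the FD already fails in r1.
    """
    c = 0  # largest cut index in which the FD still holds
    while c < k and fd_holds(cut(relation, c + 1), lhs, rhs):
        c += 1
    return k + 1 - c


if __name__ == "__main__":
    for lhs, rhs in [(("Manager", "Time"), ("Room",)),
                     (("Project",), ("Manager",))]:
        print(lhs, "->", rhs, "beta index:", certainty_index(P_RELATION, lhs, rhs))
```

In this encoding Project → Manager gets index 4, because the α2 tuple (Lion, Bob) conflicts with the α1 tuples (Lion, Gill). A domain expert who believes β3 should hold would have to reject that tuple.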

slide-42
SLIDE 42

Computing Possibilistic Armstrong Relations

The maximal sets maxΣi(R) of R for the FD set Σi are the subsets X ⊆ R such that, for some attribute A ∈ R, X is maximal with the property that X → A is not implied by Σi.
As in the classical case: introduce tuples that realize the maximal sets.

Strategy:

For i = 1, . . . , k, compute maxi(R) for Σi − Σi−1 with Σ0 = ∅
For i = k, . . . , 1, realize the sets in maxi(R) with tuples of degree αk+1−i

The algorithm returns a p-Armstrong relation for the input pFD set Σ, since every maximal set of Σi is an agree set in rk+1−i.
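The definition of maximal sets can be turned into code directly. The following is a brute-force sketch that enumerates all subsets, so it is exponential in the schema size and is not the computation used by the authors. The names `closure` and `maximal_sets` and the single-letter attributes (P, T, M, R) are assumptions made for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return the attribute closure of `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def maximal_sets(schema, fds, target):
    """Maximal X within `schema` such that X -> target is not implied by `fds`."""
    candidates = [
        frozenset(c)
        for n in range(len(schema) + 1)
        for c in combinations(schema, n)
        if target not in closure(c, fds)
    ]
    # Keep only candidates that are not strictly contained in another one.
    return {x for x in candidates if not any(x < y for y in candidates)}


if __name__ == "__main__":
    sigma2 = [("MT", "R"), ("RT", "P")]
    for attr in "PTMR":
        sets = sorted("".join(sorted(x)) for x in maximal_sets("PTMR", sigma2, attr))
        print(attr, sets)
```

For each attribute A, the function returns the family written maxΣi(A) in the example slides that follow.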

slide-72
SLIDE 72

Example for Computing the Maximal Set Families

R = {Project, Time, Manager, Room} where Σ is:

(Manager, Time → Room, β1)
(Room, Time → Project, β2)
(Project, Time → Manager, β3)
(Project → Manager, β4)

Maximal set computation:

A         maxΣ1(A)   maxΣ2(A)   maxΣ3(A)   maxΣ4(A)
Project   {MTR}      {MR, T}    {MR, T}    {MR, T}
Time      {MRP}      {MRP}      {MRP}      {MRP}
Manager   {PTR}      {PTR}      {PR, T}    {R, T}
Room      {MP, PT}   {MP, PT}   {MP, T}    {MP, T}
Degree    α4         α3         α2         α1
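Reading the table from right to left gives the order in which sets are realized: every set in maxΣ4 is realized at degree α1, and each earlier column contributes only the sets not yet seen, at the next lower degree. A small sketch of that bookkeeping follows. The table is copied in as data, and `stratify` is a hypothetical helper name.

```python
# Maximal set families per attribute, copied from the table (k = 4).
MAX_FAMILIES = {
    1: {"Project": ["MTR"], "Time": ["MRP"], "Manager": ["PTR"], "Room": ["MP", "PT"]},
    2: {"Project": ["MR", "T"], "Time": ["MRP"], "Manager": ["PTR"], "Room": ["MP", "PT"]},
    3: {"Project": ["MR", "T"], "Time": ["MRP"], "Manager": ["PR", "T"], "Room": ["MP", "T"]},
    4: {"Project": ["MR", "T"], "Time": ["MRP"], "Manager": ["R", "T"], "Room": ["MP", "T"]},
}


def stratify(families):
    """Map each degree index j (for alpha_j) to the sets first realized there.

    Column i of the table is paired with degree alpha_(k+1-i); sets that
    already appeared in a later column (higher i) are not repeated.
    """
    k = len(families)
    seen = set()
    plan = {}
    for i in range(k, 0, -1):
        level = {frozenset(x) for sets in families[i].values() for x in sets}
        plan[k + 1 - i] = level - seen
        seen |= level
    return plan


if __name__ == "__main__":
    for degree, sets in stratify(MAX_FAMILIES).items():
        names = sorted("".join(sorted(s)) for s in sets)
        print(f"alpha{degree}: {names}")
```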

slide-85
SLIDE 85

Example for Computing a Possibilistic Armstrong Relation

R = {Project, Time, Manager, Room} with Σ
(Manager, Time → Room, β1)
(Room, Time → Project, β2)
(Project, Time → Manager, β3)
(Project → Manager, β4)

Computation of p-Armstrong relation

max4(R): MR, T, MRP, R, MP
max3(R) − max4(R): PR
max2(R) − ∪j=3,4 maxj(R): PTR, PT
max1(R) − ∪j=2,3,4 maxj(R): MTR

Project  Time  Manager  Room  αi
0        0     0        0     α1
1        1     0        0     α1
2        1     2        2     α1
2        3     2        2     α1
4        4     4        2     α1
4        5     4        5     α1
4        6     6        5     α2
4        6     7        5     α3
4        6     8        8     α3
9        6     8        8     α4

Project  Time       Manager  Room  αi
Eagle    Mon, 9am   Ann      Aqua  α1
Hippo    Mon, 1pm   Ann      Aqua  α1
Kiwi     Mon, 1pm   Pete     Buff  α1
Kiwi     Tue, 2pm   Pete     Buff  α1
Lion     Tue, 4pm   Gill     Buff  α1
Lion     Wed, 9am   Gill     Cyan  α1
Lion     Wed, 11am  Bob      Cyan  α2
Lion     Wed, 11am  Jack     Cyan  α3
Lion     Wed, 11am  Pam      Lava  α3
Tiger    Wed, 11am  Pam      Lava  α4
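The numeric relation above follows a simple chaining pattern: each new tuple copies the previous tuple on the attributes of the set to be realized and takes a fresh value (its own row number) everywhere else. The sketch below reproduces that pattern. It assumes that the first tuple is all zeros and that sets are processed in the order listed on the slide. The name `realize` is illustrative, and this is a reading of the example rather than the authors' implementation.

```python
ATTRS = "PTMR"  # column order: Project, Time, Manager, Room

# (degree index, sets realized at that degree), in the order listed above.
PLAN = [
    (1, ["MR", "T", "MRP", "R", "MP"]),
    (2, ["PR"]),
    (3, ["PTR", "PT"]),
    (4, ["MTR"]),
]


def realize(plan, attrs=ATTRS):
    """Build (tuple, degree index) rows so adjacent rows agree exactly on each set."""
    rows = [((0,) * len(attrs), 1)]  # assumed all-zero start tuple of degree alpha_1
    for degree, sets in plan:
        for agree in sets:
            prev = rows[-1][0]
            fresh = len(rows)  # row number doubles as a value not used before
            new = tuple(prev[j] if a in agree else fresh
                        for j, a in enumerate(attrs))
            rows.append((new, degree))
    return rows


def agree_set(t1, t2, attrs=ATTRS):
    """Attributes on which two tuples carry the same value."""
    return {a for a, x, y in zip(attrs, t1, t2) if x == y}


if __name__ == "__main__":
    for values, degree in realize(PLAN):
        print(*values, f"alpha{degree}")
```

Replacing the integers column by column with domain values (for example 4 with Lion in the Project column) gives the readable relation shown last on the slide.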

slide-86
SLIDE 86

Experiments for fixed number k of p-degrees in schema size n

For fixed k, the output size and times are both low-degree polynomial in n (same average behavior as exhibited for certain data)

slide-87
SLIDE 87

Experiments for fixed schema size n in number k of p-degrees

For fixed n, the output size is logarithmic in k and the computation time is constant in k
Size growth results from the significant number of maximal sets realized by small k
Computation time is agnostic to k as each FD is visited only once

slide-88
SLIDE 88

Tool

slide-89
SLIDE 89

Summary

Perfect sample: pFDs exhibit the highest certainty degree by which they are implied
Computational support for identifying meaningful possibilistic FDs
The latter serve as input to schema design algorithms for uncertain data
Size of samples grows logarithmically in the number of available degrees
Computation time for samples is constant in the number of available degrees

Future work

How to combine sampling with discovery to identify dirty data and meaningful rules?
How do human experts best benefit from the computational support?

slide-90
SLIDE 90

Some Literature

Sebastian Link, Henri Prade: Relational database schema design for uncertain data. Inf. Syst. 84: 88-110 (2019)
Henning Koehler, Sebastian Link: Qualitative Cleaning of Uncertain Data. CIKM 2016: 2269-2274
Sebastian Link, Henri Prade: Possibilistic Functional Dependencies and Their Relationship to Possibility Theory. IEEE Trans. Fuzzy Syst. 24(3): 757-763 (2016)
Neil Hall, Henning Koehler, Sebastian Link, Henri Prade, Xiaofang Zhou: Cardinality constraints on qualitatively uncertain data. Data Knowl. Eng. 99: 126-150 (2015)
Nishita Balamuralikrishna, Yingnan Jiang, Henning Koehler, Uwe Leck, Sebastian Link, Henri Prade: Possibilistic keys. Fuzzy Sets Syst. 376: 1-36 (2019)
Ziheng Wei, Sebastian Link: A Fourth Normal Form for Uncertain Data. CAiSE 2019: 295-311
Ziheng Wei, Sebastian Link: DataProf: Semantic Profiling for Iterative Data Cleansing and Business Rule Acquisition. SIGMOD Conference 2018: 1793-1796

slide-91
SLIDE 91

Contact details

Professor Sebastian Link
School of Computer Science
The University of Auckland
s.link@auckland.ac.nz
http://www.science.auckland.ac.nz/people/profile/s-link

Associate Dean International Science
auckland.ac.nz/en/science.html

Director of Data Science in the Home of R
auckland.ac.nz/en/study/study-options/find-a-study-option/data-science.html
r-project.org