Week 5 Video 4: Relationship Mining, Sequential Pattern Mining



slide-1
SLIDE 1

Relationship Mining Sequential Pattern Mining

Week 5 Video 4

slide-2
SLIDE 2

Association Rule Mining

¨ Try to automatically find if-then rules within the data set

slide-3
SLIDE 3

Sequential Pattern Mining

¨ Try to automatically find temporal patterns within the data set

slide-4
SLIDE 4

ARM Example

¨ If person X buys diapers,
¨ Person X buys beer
¨ Purchases occur at the same time

slide-5
SLIDE 5

SPM Example

¨ If person X takes Intro Stats now,
¨ Person X takes Advanced Data Mining in a later semester
¨ Conclusion: recommend Advanced Data Mining to students who have previously taken Intro Stats
¨ Doesn’t matter if they take other courses in between

slide-6
SLIDE 6

SPM Example

¨ Learners in virtual environments have different sequences of behavior depending on their degree of self-regulated learning
¨ High self-regulated learning: Tend to gather information and then immediately record it carefully
¨ Low self-regulated learning: Tend to gather more information without pausing to record it (Sabourin, Mott, & Lester, 2011)

slide-7
SLIDE 7

Different Constraints than ARM

¨ If-then elements do not need to occur in the same data point
¨ Instead
  ¤ If-then elements should involve the same student (or other organizing variable, like teacher or school)
  ¤ If elements can be within a certain time window of each other
  ¤ Then element time should be within a certain window after if times
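One way to read these constraints is sketched below, for a single student's timestamped events. It is a minimal illustration, not a definitive implementation: the function name `rule_matches`, the parameters `if_window` and `then_window`, the use of seconds as the time unit, and the GATHER/RECORD labels are all assumptions made for this example. The then-window is measured here from the latest if-element, which is only one possible interpretation.

```python
from itertools import product


def rule_matches(events, if_items, then_item, if_window, then_window):
    """Check one student's events against an if-then rule with time windows.

    events      -- list of (time_in_seconds, label) pairs for ONE student
    if_items    -- non-empty list of labels that must all occur
    then_item   -- label that must follow the if-elements
    if_window   -- max spread (seconds) between the chosen if-elements
    then_window -- max delay (seconds) from the latest if-element
    """
    def times(label):
        return [t for t, lab in events if lab == label]

    then_times = times(then_item)
    # Brute force: try every way of picking one occurrence per if-element.
    for combo in product(*(times(item) for item in if_items)):
        if max(combo) - min(combo) > if_window:
            continue  # if-elements too far apart
        last_if = max(combo)
        if any(0 < t - last_if <= then_window for t in then_times):
            return True
    return False


# Hypothetical event log for one student (times in seconds).
events = [(0, "GATHER"), (20, "GATHER"), (30, "RECORD")]
print(rule_matches(events, ["GATHER"], "RECORD", if_window=0, then_window=15))
```

The brute-force search over occurrences is exponential in the number of if-elements, which is acceptable for an illustration but would need a smarter scan on real logs.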

slide-8
SLIDE 8

Sequential Pattern Mining

¨ Find all subsequences in data with high support
¨ Support calculated as number of sequences that contain subsequence, divided by total number of sequences
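The support definition above can be sketched in a few lines. This is a minimal illustration assuming each sequence is a plain list of single items and that gaps between matched items are allowed; the function names and the course histories are hypothetical, chosen to echo the Intro Stats example.

```python
def contains_subsequence(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so each pattern element must be found after the previous one.
    return all(item in remaining for item in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain `pattern` as a subsequence."""
    if not sequences:
        return 0.0
    hits = sum(contains_subsequence(seq, pattern) for seq in sequences)
    return hits / len(sequences)


# Hypothetical course histories, one sequence per student.
course_histories = [
    ["Intro Stats", "Calculus", "Advanced Data Mining"],
    ["Intro Stats", "Advanced Data Mining"],
    ["Advanced Data Mining", "Intro Stats"],
    ["Calculus", "Intro Stats"],
]
print(support(course_histories, ["Intro Stats", "Advanced Data Mining"]))  # 0.5
```

Only the first two students took Intro Stats before Advanced Data Mining, so the pattern's support is 2 out of 4 sequences.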

slide-9
SLIDE 9

GSP (Generalized Sequential Pattern)

¨ Classic Algorithm for SPM
¨ (Srikant & Agrawal, 1996)

slide-10
SLIDE 10

Data pre-processing

¨ Data transformed from individual actions to sequences by user
¨ Bob: {GAMING and BORED, OFF-TASK and BORED, ON-TASK and BORED, GAMING and BORED, GAMING and FRUSTRATED, ON-TASK and BORED}

slide-11
SLIDE 11

Data pre-processing

¨ In some cases, time also included
¨ Bob: {GAMING and BORED 5:05:20, OFF-TASK and BORED 5:05:40, ON-TASK and BORED 5:06:00, GAMING and BORED 5:06:20, GAMING and FRUSTRATED 5:06:40, ON-TASK and BORED 5:07:00}
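The pre-processing step might look roughly like the sketch below, which groups a flat action log by user and orders each user's actions by time. The log layout (user, clock time, behavior, affect) and the names `to_seconds` and `build_sequences` are assumptions for illustration; the rows reuse Bob's first three observations in shuffled order.

```python
from collections import defaultdict

# Hypothetical flat action log: (user, clock time, behavior, affect).
log = [
    ("Bob", "5:05:40", "OFF-TASK", "BORED"),
    ("Bob", "5:05:20", "GAMING", "BORED"),
    ("Bob", "5:06:00", "ON-TASK", "BORED"),
]


def to_seconds(clock):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def build_sequences(rows):
    """Group rows by user and return each user's labels in time order."""
    by_user = defaultdict(list)
    for user, clock, behavior, affect in rows:
        by_user[user].append((to_seconds(clock), f"{behavior} and {affect}"))
    # Sorting the (time, label) pairs orders each user's events by time.
    return {
        user: [label for _, label in sorted(events)]
        for user, events in by_user.items()
    }


print(build_sequences(log))
```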

slide-12
SLIDE 12

Algorithm

¨ Take the whole set of sequences of length 1
  ¤ May include “ANDed” combinations at same time
¨ Find which sequences of length 1 have support over pre-chosen threshold
¨ Compose potential sequences out of pairs of sequences of length 1 with acceptable support
¨ Find which sequences of length 2 have support over pre-chosen threshold
¨ Compose potential sequences out of triplets of sequences of length 1 and 2 with acceptable support
¨ Continue until no new sequences found
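The level-wise idea in the steps above can be sketched as follows. This is a simplified illustration in the spirit of GSP, not the full algorithm of Srikant & Agrawal: it assumes every element is a single item (no “ANDed” combinations), ignores time windows, and grows candidates by appending one frequent item at a time. The function name `mine_sequences` and the toy data are hypothetical.

```python
def contains_subsequence(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain `pattern`."""
    hits = sum(contains_subsequence(seq, pattern) for seq in sequences)
    return hits / len(sequences)


def mine_sequences(sequences, min_support):
    """Return {pattern tuple: support} for all patterns meeting min_support."""
    frequent = {}
    level = []
    # Level 1: keep single items whose support meets the threshold.
    for item in sorted({item for seq in sequences for item in seq}):
        s = support(sequences, (item,))
        if s >= min_support:
            frequent[(item,)] = s
            level.append((item,))
    frequent_items = [pattern[0] for pattern in level]

    # Level k+1: extend each frequent pattern by one frequent item.
    while level:
        next_level = []
        for pattern in level:
            for item in frequent_items:
                candidate = pattern + (item,)
                # Pruning: a candidate cannot be frequent unless the
                # pattern left after dropping its first item is frequent.
                if candidate[1:] not in frequent:
                    continue
                s = support(sequences, candidate)
                if s >= min_support:
                    frequent[candidate] = s
                    next_level.append(candidate)
        level = next_level  # stop when no new sequences are found
    return frequent


# Hypothetical per-student behavior sequences.
toy = [
    ["GAMING", "OFF-TASK", "ON-TASK"],
    ["GAMING", "ON-TASK"],
    ["OFF-TASK", "GAMING", "ON-TASK"],
    ["ON-TASK", "OFF-TASK"],
]
for pattern, s in mine_sequences(toy, min_support=0.75).items():
    print(pattern, s)
```

With this toy data the only frequent two-step pattern is GAMING followed by ON-TASK, which appears in three of the four sequences.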

slide-13
SLIDE 13

Let’s execute the GSP algorithm

¨ With min support = 20%
¨ Chuck: a, abc, ac, de, cef
¨ Darlene: af, ab, acd, dabc, ef
¨ Egoberto: aef, ab, aceh, d, ae
¨ Francine: a, bc, acf, d, abeg

slide-14
SLIDE 14

Let’s execute the GSP algorithm

¨ With min support = 20%
¨ Chuck: a, abc, ac, de, cef
¨ Darlene: af, ab, acd, dabc, ef
¨ Egoberto: aef, ab, aceh, d, ae
¨ Francine: a, bc, acf, d, abeg

a, b, c, d, e, f

slide-15
SLIDE 15

Let’s execute the GSP algorithm

¨ With min support = 20%
¨ Chuck: a, abc, ac, de, cef
¨ Darlene: af, ab, acd, dabc, ef
¨ Egoberto: aef, ab, aceh, d, ae
¨ Francine: a, bc, acf, d, abeg

a, b, c, d, e, f, ac

slide-30
SLIDE 30

Let’s execute the GSP algorithm

¨ With min support = 20%
¨ Chuck: a, abc, ac, de, cef
¨ Darlene: af, ab, acd, dabc, ef
¨ Egoberto: aef, ab, aceh, d, ae
¨ Francine: a, bc, acf, d, abeg

a, b, c, d, e, f, ac (14/40 = 35%)

slide-31
SLIDE 31

Let’s execute the GSP algorithm

¨ With min support = 20%
¨ Chuck: a, abc, ac, de, cef
¨ Darlene: af, ab, acd, dabc, ef
¨ Egoberto: aef, ab, aceh, d, ae
¨ Francine: a, bc, acf, d, abeg

a, b, c, d, e, f, ac, ad, ae

slide-32
SLIDE 32

Let’s execute the GSP algorithm

¨ With min support = 20%
¨ Chuck: a, abc, ac, de, cef
¨ Darlene: af, ab, acd, dabc, ef
¨ Egoberto: aef, ab, aceh, d, ae
¨ Francine: a, bc, acf, d, abeg

a, b, c, d, e, f, ac, ad, ae, aad

slide-40
SLIDE 40

Let’s execute the GSP algorithm

¨ With min support = 20%
¨ Chuck: a, abc, ac, de, cef
¨ Darlene: af, ab, acd, dabc, ef
¨ Egoberto: aef, ab, aceh, d, ae
¨ Francine: a, bc, acf, d, abeg

a, b, c, d, e, f, ac, ad, ae, aad, aae, ade

slide-41
SLIDE 41

Let’s execute the GSP algorithm

¨ From
¨ ac, ad, ae, aad, aae, ade
¨ To
¨ a → c, a → d, a → e, a → ad, a → ae, ad → e

slide-42
SLIDE 42

Other algorithms

¨ FreeSpan
¨ PrefixSpan
¨ Select sub-sets of data to search within
¨ Faster, but same basic idea as in GSP

slide-43
SLIDE 43

Differential Sequence Mining (Kinnebrew et al., 2013)

¨ Compares the support for sequential patterns between two groups
¨ Such as high-performing and low-performing students
¨ To find the patterns that are much more common in one group than the other
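The core comparison can be sketched as a difference in support between two groups. This is only an illustration of the idea and does not reproduce the specific metrics or statistical tests used by Kinnebrew et al.; the function name `differential_support`, the GATHER/RECORD labels, and both groups of sequences are hypothetical.

```python
def contains_subsequence(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(sequences, pattern):
    """Fraction of sequences that contain `pattern`."""
    if not sequences:
        return 0.0
    hits = sum(contains_subsequence(seq, pattern) for seq in sequences)
    return hits / len(sequences)


def differential_support(group_a, group_b, patterns):
    """Map each pattern to support in group_a minus support in group_b."""
    return {
        tuple(p): support(group_a, p) - support(group_b, p)
        for p in patterns
    }


# Hypothetical action sequences for two groups of students.
high_performing = [
    ["GATHER", "RECORD", "GATHER", "RECORD"],
    ["GATHER", "RECORD"],
]
low_performing = [
    ["GATHER", "GATHER", "GATHER"],
    ["GATHER", "GATHER", "RECORD"],
]
patterns = [("GATHER", "RECORD"), ("GATHER", "GATHER")]
print(differential_support(high_performing, low_performing, patterns))
```

A positive value means the pattern is more common in the first group, a negative value that it is more common in the second.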
slide-44
SLIDE 44

Process Mining

¨ Related algorithm
¨ Rather than just finding small, local patterns
¨ Tries to find overarching processes that occur over the course of a set of events, or tries to find discrepancies in approved processes
  ¤ For example, do students’ self-regulatory processes over time match theoretical models? (Bannert et al., 2014)

slide-45
SLIDE 45

Next lecture

¨ Network Analysis