Administrative notes Proposal resubmissions are graded, and - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Computational Thinking ct.cs.ubc.ca

Administrative notes

  • Proposal resubmissions are graded, and feedback has been sent. If you resubmitted your proposal and didn’t receive an email, please contact your TA.

  • The Connect grade centre has a column called “Project Rubric”. This tells you which rubric we will use to grade your project. Find your rubric at http://www.ugrad.cs.ubc.ca/~cs100/2016W2/project-grading.html#projectMarkingScheme. If you have any questions about the rubric, please email your project TA (also listed on Connect).

slide-2
SLIDE 2

Administrative notes

  • Sometime within the next two weeks, we will email you the projects you should review. Please make sure you have a working CS ID and that email forwarding for your CS email (CS_ID@ugrad.cs.ubc.ca) works (you should have set this up in Lab 0).

slide-3
SLIDE 3

Administrative notes

  • March 14: Midterm 2. This will cover all lectures, labs and readings between Tue Jan 31 and Thu Mar 9, inclusive.
  • March 17: In the News call #3
  • March 30: Project deliverables and individual report due

slide-4
SLIDE 4

Data Mining 3

Mining by Association: The Apriori algorithm

slide-5
SLIDE 5

Learning Goals

  • [CT Building Block] Students will be able to demonstrate that they understand the Apriori algorithm by describing what the output would be for a small input.
  • [CT Building Block] Students will be able to create English language descriptions of algorithms to analyze data and show how their algorithms would work on an input data set.

slide-6
SLIDE 6

A quote from the NY Times article

“We have the capacity to send every customer an ad booklet, specifically designed for them, that says, ‘Here’s everything you bought last week and a coupon for it,’ ” one Target executive told me. “We do that for grocery products all the time.” But for pregnant women, Target’s goal was selling them baby items they didn’t even know they needed yet.

slide-7
SLIDE 7

Target can identify pregnant women and send them individual mailings

In a group of 3-4, discuss whether, and why, you think this is cool, creepy, or both.

slide-8
SLIDE 8

Target can identify pregnant women and send them individual mailings

In a group of 3-4, discuss whether, and why, you think this is cool, creepy, or both.

  • A. Cool
  • B. Creepy
  • C. Both
slide-9
SLIDE 9

Loyalty cards pros and cons Group discussion | Student responses

In a group of 3-4, list pros and cons of loyalty cards.

Pros:

  • Discounts
  • Being able to return stuff, reprint receipts

Cons:

  • The degree to which information is tracked: how will it be used?

slide-10
SLIDE 10

Loyalty cards and credit cards Clicker question

After reading these articles, are you more or less likely to use a credit card/loyalty card for purchases:

  • A. More likely
  • B. Less likely
  • C. The same
slide-11
SLIDE 11

How to predict the future? One way: Association rules

  • An association rule X → Y links two sets of items, X and Y, if the people who buy the items in X (cause) also tend to buy the items in Y (effect)
  • Example: Diapers → Beer
slide-12
SLIDE 12

How to predict the future? One way: Association rules

  • Association rules are useful for stores because they can help improve stocking decisions
  • They’ve also been used in many other areas, including medical diagnosis, protein sequence composition, health insurance claim analysis and census data

slide-13
SLIDE 13

Suggest other uses for association rules Group exercise | Student responses

  • When buying a computer, you get suggestions as to what else to buy; this can be helpful if you are a rational buyer, but not if you are a compulsive buyer!
  • Amazon can display what else people with tastes similar to yours bought
  • Association rules enable sellers to provide a bundled deal. That can be a win for both the seller and the consumer.

slide-14
SLIDE 14

How are association rules derived?

  • Stores keep track of all the items that each person buys together in a single purchase
  • By looking at all of the different purchases, we can figure out which items were bought at the same time
  • Then we can figure out which one was the “cause” and which one was the “effect”
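The second step above (finding which items were bought at the same time) can be sketched in Python by counting, for every pair of items, how many transactions contain both. This is a minimal illustration, assuming each transaction is stored as a set of item names; `TRANSACTIONS` (an encoding of the seven sample grocery orders used in this lecture) and `count_pairs` are hypothetical names, not part of the course material.

```python
from collections import Counter
from itertools import combinations

# Hypothetical encoding of the lecture's seven sample transactions:
# each grocery order is a set of item names.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def count_pairs(transactions):
    """Return a Counter mapping each item pair to the number of
    transactions that contain both items."""
    counts = Counter()
    for items in transactions:
        # Sorting gives each pair one canonical order, e.g. ("Bread", "Sushi").
        for pair in combinations(sorted(items), 2):
            counts[pair] += 1
    return counts


pair_counts = count_pairs(TRANSACTIONS)
print(pair_counts[("Chicken", "Milk")])  # 4
```

Counting co-occurrences only says which items go together; it does not say which side is the “cause”. That is what the confidence measure later in this lecture is for.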

slide-15
SLIDE 15

Let’s look at some sample data

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen

Each row is a transaction: one person’s grocery order. In T2, the person bought Sushi and Bread.

slide-16
SLIDE 16

What association rules can you find? Why? Group discussion

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen

slide-17
SLIDE 17

Where we’re headed

  • We’ll identify two key properties of items in transaction data that will enable us to identify valid association rules
  • These properties are called support and confidence
  • We’ll first look at support
slide-18
SLIDE 18

Support: The degree to which items appear together

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen

The support of a set of items is the fraction of transactions that contain all items in the set. Here, the set {Chicken, Ramen, Milk} has support 3/7
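The support definition translates almost directly into code. Below is a minimal Python sketch, assuming each transaction is stored as a set of item names; `TRANSACTIONS` and `support` are hypothetical names, and `Fraction` is used so results stay exact (3/7 rather than a rounded decimal).

```python
from fractions import Fraction

# Hypothetical encoding of the seven sample transactions T1..T7.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},
    {"Sushi", "Bread"},
    {"Bread", "Vegetables"},
    {"Sushi", "Chicken", "Bread"},
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},
    {"Chicken", "Ramen", "Milk"},
    {"Chicken", "Milk", "Ramen"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    wanted = set(itemset)
    # `wanted <= items` is Python's subset test.
    hits = sum(1 for items in transactions if wanted <= items)
    return Fraction(hits, len(transactions))


print(support({"Chicken", "Ramen", "Milk"}, TRANSACTIONS))  # 3/7
```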

slide-19
SLIDE 19

Support Clicker question

What is the support of {Sushi, Bread}? (Reminder: The support of a set is the fraction of transactions that contain all items in the set.)

  • A. 3/7
  • B. 3/4
  • C. 4/7
  • D. None of the above

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen

slide-20
SLIDE 20

A frequent itemset

A frequent itemset is a set of items whose support is at least some specified minimum threshold. Example: If the minimum threshold is 3/7, then {Chicken, Milk, Ramen} is a frequent itemset.

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen
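Checking whether an itemset is frequent is just a support calculation followed by a comparison with the threshold. A minimal Python sketch, assuming transactions are sets of item names; `is_frequent` and `TRANSACTIONS` are hypothetical names.

```python
from fractions import Fraction

# Hypothetical encoding of the seven sample transactions T1..T7.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},
    {"Sushi", "Bread"},
    {"Bread", "Vegetables"},
    {"Sushi", "Chicken", "Bread"},
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},
    {"Chicken", "Ramen", "Milk"},
    {"Chicken", "Milk", "Ramen"},
]


def is_frequent(itemset, transactions, min_support):
    """True if the itemset's support is at least min_support."""
    wanted = set(itemset)
    hits = sum(1 for items in transactions if wanted <= items)
    return Fraction(hits, len(transactions)) >= min_support


threshold = Fraction(3, 7)
print(is_frequent({"Chicken", "Milk", "Ramen"}, TRANSACTIONS, threshold))  # True
print(is_frequent({"Sushi", "Milk"}, TRANSACTIONS, threshold))             # False
```

The threshold is passed as a `Fraction` rather than a float because a float such as 3/7 is not exactly three sevenths, which can flip the result for an itemset sitting right at the boundary.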

slide-21
SLIDE 21

Back to support Group exercise

What is the support of {Apple, Corn}? (Reminder: The support of a set is the fraction of transactions that contain all items in the set.)

  • A. 1/4
  • B. 2/4
  • C. 3/4
  • D. 4/4

Transaction  Items
T1  apple, dates, rice, corn
T2  corn, dates, tuna
T3  apple, corn, dates, tuna
T4  corn, tuna

slide-22
SLIDE 22

Group exercise written down:

Create an algorithm that, given as input a list of t transactions, finds all itemsets with a minimum support of s/t.

Sample data to check your algorithm, with the threshold set to s/t = 3/7:

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen
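One possible answer to the exercise, sketched in Python: enumerate every combination of items and keep those that meet the threshold. This brute-force approach is only an illustration (it is not the Apriori algorithm), it assumes transactions are sets of item names, and `frequent_itemsets_brute_force` is a hypothetical name. With n distinct items it checks 2^n - 1 candidate itemsets, which matters for the scaling question on the following slides.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical encoding of the seven sample transactions T1..T7.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},
    {"Sushi", "Bread"},
    {"Bread", "Vegetables"},
    {"Sushi", "Chicken", "Bread"},
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},
    {"Chicken", "Ramen", "Milk"},
    {"Chicken", "Milk", "Ramen"},
]


def frequent_itemsets_brute_force(transactions, min_support):
    """Check every non-empty combination of items and keep those whose
    support is at least min_support. Returns {frozenset: support}."""
    all_items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for size in range(1, len(all_items) + 1):
        for candidate in combinations(all_items, size):
            wanted = set(candidate)
            hits = sum(1 for items in transactions if wanted <= items)
            if Fraction(hits, n) >= min_support:
                frequent[frozenset(candidate)] = Fraction(hits, n)
    return frequent


result = frequent_itemsets_brute_force(TRANSACTIONS, Fraction(3, 7))
# Print smaller itemsets first, each in alphabetical order.
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```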

slide-23
SLIDE 23

Swap algorithms with a group near you

Use the new algorithm to find all the frequent itemsets with a minimum support of 2/4 in the following data. Which are the frequent itemsets?

  • A. apple, corn
  • B. apple, dates
  • C. corn, dates
  • D. apple, corn, dates
  • E. all of the above

Transaction  Items
T1  apple, dates, rice, corn
T2  corn, dates, tuna
T3  apple, corn, dates, tuna
T4  corn, tuna

slide-24
SLIDE 24

Did you get the other team’s algorithm to work?

  • A. Yes
  • B. No
slide-25
SLIDE 25

Comparing algorithms

Get together with the group that you swapped algorithms with. Which algorithm would scale better as you add more items/transactions/items per transaction? Why?

slide-26
SLIDE 26

As a whole class: what are some things that could help algorithms to scale well?
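One idea that helps here, and the one the Apriori algorithm (named in this lecture's title) is built on, is that every subset of a frequent itemset must itself be frequent. Itemsets can therefore be grown one size at a time, and any candidate with an infrequent subset can be discarded without scanning the transactions. The Python below is a minimal sketch of that level-wise join-and-prune idea, assuming transactions are sets of item names; it is not the course's reference version, and `apriori_frequent_itemsets` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical encoding of the seven sample transactions T1..T7.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},
    {"Sushi", "Bread"},
    {"Bread", "Vegetables"},
    {"Sushi", "Chicken", "Bread"},
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},
    {"Chicken", "Ramen", "Milk"},
    {"Chicken", "Milk", "Ramen"},
]


def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise search for all itemsets with support >= min_support.
    Returns {frozenset: support}."""
    if not transactions:
        return {}
    n = len(transactions)

    def support(candidate):
        return Fraction(sum(1 for items in transactions if candidate <= items), n)

    # Level 1: frequent single items.
    current = {}
    for item in set().union(*transactions):
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join: combine frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Only the surviving candidates are counted against the transactions.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


result = apriori_frequent_itemsets(TRANSACTIONS, Fraction(3, 7))
```

On the sample data with a threshold of 3/7 this finds the same 11 frequent itemsets as checking every combination, but the only three-item candidate it ever counts is {Chicken, Milk, Ramen}; the others are pruned because they contain an infrequent pair.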

slide-27
SLIDE 27

Recall: Where we’re headed

  • We’ll identify two key properties of items in transaction data that will enable us to identify valid association rules
  • These properties are called support and confidence
  • We’ll next look at confidence
slide-28
SLIDE 28

Confidence: Which items suggest that others will be there too (cause → effect)

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen

Formally: The confidence of rule X → Y is the fraction of transactions containing all items in X that also contain all items in Y. Both Ramen → {Milk, Chicken} and {Ramen, Chicken} → Milk have confidence 3/3 = 1.
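The formal definition can be sketched in Python by first keeping the transactions that contain X and then counting how many of those also contain Y. This assumes transactions are sets of item names; `confidence` and `TRANSACTIONS` are hypothetical names.

```python
from fractions import Fraction

# Hypothetical encoding of the seven sample transactions T1..T7.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},
    {"Sushi", "Bread"},
    {"Bread", "Vegetables"},
    {"Sushi", "Chicken", "Bread"},
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},
    {"Chicken", "Ramen", "Milk"},
    {"Chicken", "Milk", "Ramen"},
]


def confidence(x, y, transactions):
    """Fraction of transactions containing all of x that also contain all of y.
    Raises ZeroDivisionError if no transaction contains x."""
    x, y = set(x), set(y)
    with_x = [items for items in transactions if x <= items]
    with_x_and_y = [items for items in with_x if y <= items]
    return Fraction(len(with_x_and_y), len(with_x))


print(confidence({"Ramen"}, {"Milk", "Chicken"}, TRANSACTIONS))  # 1
print(confidence({"Ramen", "Chicken"}, {"Milk"}, TRANSACTIONS))  # 1
```

Unlike support, confidence is not symmetric: on this data Chicken → Milk has confidence 4/5 (five transactions contain Chicken, four of them also Milk), while Milk → Chicken has confidence 4/4 = 1.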

slide-29
SLIDE 29

Confidence Clicker question

What is the confidence of Sushi → Chicken? (Reminder: The confidence of rule X → Y is the fraction of transactions containing X that also contain Y.)

  • A. 3/7
  • B. 3/4
  • C. 3/5
  • D. None of the above

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen

slide-30
SLIDE 30

When is a rule valid?

A rule X → Y is valid if X ∪ Y is a frequent itemset and the confidence of X → Y is at least another given threshold (minimum confidence).
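The validity test combines both properties. A minimal Python sketch, assuming transactions are sets of item names and both thresholds are given as `Fraction`s; `is_valid_rule` is a hypothetical name, and the minimum confidence of 4/5 used in the example is an arbitrary illustrative choice, since the slide does not fix a value. Note that the transactions containing X ∪ Y are exactly the transactions containing X that also contain Y, so one count serves both the support and the confidence.

```python
from fractions import Fraction

# Hypothetical encoding of the seven sample transactions T1..T7.
TRANSACTIONS = [
    {"Sushi", "Chicken", "Milk"},
    {"Sushi", "Bread"},
    {"Bread", "Vegetables"},
    {"Sushi", "Chicken", "Bread"},
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},
    {"Chicken", "Ramen", "Milk"},
    {"Chicken", "Milk", "Ramen"},
]


def is_valid_rule(x, y, transactions, min_support, min_confidence):
    """A rule x -> y is valid if x | y is frequent and the rule's confidence
    is at least min_confidence (both thresholds given as Fractions)."""
    x, y = set(x), set(y)
    with_x = sum(1 for items in transactions if x <= items)
    with_both = sum(1 for items in transactions if (x | y) <= items)
    if with_x == 0:
        return False  # no transaction contains x, so confidence is undefined
    support_xy = Fraction(with_both, len(transactions))
    confidence_xy = Fraction(with_both, with_x)
    return support_xy >= min_support and confidence_xy >= min_confidence


MIN_SUPPORT = Fraction(3, 7)     # threshold used in this lecture's examples
MIN_CONFIDENCE = Fraction(4, 5)  # hypothetical; the slide leaves it unspecified
print(is_valid_rule({"Ramen"}, {"Milk", "Chicken"}, TRANSACTIONS, MIN_SUPPORT, MIN_CONFIDENCE))  # True
print(is_valid_rule({"Chicken"}, {"Sushi"}, TRANSACTIONS, MIN_SUPPORT, MIN_CONFIDENCE))          # False
```

Chicken → Sushi fails only on confidence (its support is 3/7, but its confidence is 3/5), whereas a rule such as Sushi → Milk fails on support, since {Sushi, Milk} appears in only 2 of the 7 transactions.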