SLIDE 1

Computational Thinking ct.cs.ubc.ca

Administrative Notes October 20, 2016

  • If you’ve been using a clicker and don’t see your scores on Connect, send me mail with your clicker (hex!) ID; there are some unclaimed/unregistered clickers
  • Project proposal grades should be back by Friday
  • Optional project proposal resubmission due next Friday

SLIDE 2

Data Mining

SLIDE 3

Learning Goals

  • CT Building Block: Students will be able to demonstrate that they understand the Apriori algorithm by describing what the output would be for a small input.
  • CT Building Block: Students will be able to create English language descriptions of algorithms to analyze data and show how their algorithms would work on an input data set.
  • CT Application: Students will be able to use computing to examine datasets and facilitate exploration in order to gain insight and knowledge (data and information).
  • CT Impact: Students will be able to give examples of privacy and security issues that arise as a result of data mining.

SLIDE 4

A quote from the NY Times article

“We have the capacity to send every customer an ad booklet, specifically designed for them, that says, ‘Here’s everything you bought last week and a coupon for it,’” one Target executive told me. “We do that for grocery products all the time.” But for pregnant women, Target’s goal was selling them baby items they didn’t even know they needed yet.

SLIDE 5

Target can identify pregnant women and send them individual mailings

In a group of 3-4, discuss whether you think this is cool, creepy, or both.

SLIDE 6

Target: Cool, creepy or both

Cool:
  • They're making my life easier

Creepy:
  • They're watching us (unsettling) - predicting our needs

SLIDE 7

Target can identify pregnant women and send them individual mailings

In a group of 3-4, discuss whether you think this is cool, creepy, or both.

  • A. Cool
  • B. Creepy
  • C. Both

SLIDE 8

Things for Target to figure out

Cool Creepy

  • If I'm going to die - sure, why not
  • What movies and video games you like
  • If you have a certain medical condition, based on what drugs you buy
  • Your one-month anniversary is coming up - you didn't say you had an SO
  • Lots of demographic information

SLIDE 9

Group Discussion: Loyalty Card Pros and Cons

In a group of 3-4, list pros and cons of loyalty cards.

Pros:
  • Most loyalty cards give you discounts/points
  • Free stuff is good!
  • Once you get one, they stop bugging you
  • You can build a relationship with the store - sometimes you can avoid needing receipts

Cons:
  • Sometimes you have to remember them
  • Stupid e-mails
  • They can make it really hard for people to shop without loyalty cards
  • Share information across different stores

SLIDE 10

Group Discussion: Loyalty cards and credit cards

After reading these articles, are you more or less likely to use a credit card/loyalty card for purchases? Why?

SLIDE 11

Clicker question: Loyalty cards and credit cards

After reading these articles, are you more or less likely to use a credit card/loyalty card for purchases:

  • A. More likely
  • B. Less likely
  • C. The same

SLIDE 12

As we discussed, cookies tell information about you. But how do pages that you’ve visited predict the future?

SLIDE 13

Data Mining

  • Data mining is the process of looking for patterns in large data sets
  • There are many different kinds for many different purposes
  • We’ll do an in-depth exploration of one of them

SLIDE 14

Association Rules

  • One type of data mining looks for association rules
  • An example rule is “people who buy diapers tend to buy beer”
  • This is useful for stores because they can improve stock
  • They’ve also been used in many areas, including medical diagnoses, protein sequence composition, health insurance claim analysis and census data

SLIDE 15

Group exercise: list examples of what association rules could be used for

  • Stores could use them to prey on addictions - chips around weed locations
  • People with spinal cord injuries tend to get pneumonia
  • Insurance companies: sports cars --> accidents

SLIDE 16

Here’s the plan

  • Stores keep track of all the items that people bought at a time
  • By looking at all of the different purchases, we can figure out which items were bought at the same time
  • Then we can figure out which one was the “cause” and which one was the “effect”

SLIDE 17

Let’s look at some sample data

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen

Each row is a transaction: one person’s grocery order. So in T2 the person bought Sushi and Bread. Now we need to decide whether there are any items that people tend to buy when they buy other items. We refer to this as a rule (e.g., diapers → beer is one rule (not supported by this data!)).
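One way to picture "which items were bought at the same time" is to store each transaction as a set and count how often each pair of items shares a transaction. The following is only a minimal sketch; the names `transactions` and `count_pairs` are made up for illustration and do not come from the slides.

```python
from collections import Counter
from itertools import combinations

# The seven sample transactions, one set per grocery order.
transactions = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def count_pairs(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sort so that every pair has one canonical (alphabetical) key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


pair_counts = count_pairs(transactions)
print(pair_counts[("Chicken", "Milk")])  # prints 4 (T1, T5, T6, T7)
```

Counting pairs only says which items appear together; it does not yet say which item suggests the other.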

SLIDE 18

Group discussion

Looking at this example, intuitively (not algorithmically) are there any rules that you think hold? If so, what? Why?

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen

  • If sushi -> milk or bread - seems to hold in all cases
  • If ramen -> milk (everyone who bought ramen also bought milk)
  • chicken -> sushi - grocery store layout

SLIDE 19

Support

Informally: support measures whether items appear together many times.
Formally: a rule X → Y holds with support sup if sup% of transactions contain X AND Y.
For example, {Chicken, Ramen, Milk} occurs with 3/7 ≈ 43% support.

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen
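The formal definition of support can be checked with a few lines of Python. This is a sketch under simple assumptions: transactions are stored as sets, and the helper name `support` is illustrative and not something the slides define.

```python
from fractions import Fraction

transactions = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def support(itemset, transactions):
    """Fraction of all transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return Fraction(hits, len(transactions))


s = support({"Chicken", "Ramen", "Milk"}, transactions)
print(s, f"{float(s):.0%}")  # prints: 3/7 43%
```

For a rule X → Y, the itemset passed in would be the union of X and Y, since support only asks whether both sides appear in the same transaction.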

SLIDE 20

Support question

What is the support of Sushi → Bread (express as a fraction; no need for math)? (Reminder: a rule X → Y holds with support sup if sup% of transactions contain X AND Y.)

  • A. 3/7
  • B. 3/4
  • C. 4/7
  • D. None of the above

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen

SLIDE 21

Confidence

Informally: confidence measures which items suggest the others will be there, too.
Formally: a rule X → Y holds with confidence conf% if conf% of transactions that contain X also contain Y.

Ramen → Milk, Chicken [conf = 3/3 = 100%]
Ramen, Chicken → Milk [conf = 3/3 = 100%]

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen
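Confidence differs from support only in its denominator: it divides by the transactions that contain X, not by all transactions. A minimal sketch follows; the function name `confidence` and the choice to return `None` when X never occurs are assumptions made for illustration.

```python
from fractions import Fraction

transactions = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [basket for basket in transactions if lhs <= basket]
    if not with_lhs:
        return None  # assumption: confidence is undefined if lhs never occurs
    both = sum(1 for basket in with_lhs if rhs <= basket)
    return Fraction(both, len(with_lhs))


# The two rules worked on the slide; each comes out to 3/3.
print(confidence({"Ramen"}, {"Milk", "Chicken"}, transactions))
print(confidence({"Ramen", "Chicken"}, {"Milk"}, transactions))
```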

SLIDE 22

Confidence question

What is the confidence of Sushi → Chicken (express as a fraction; no need for math)? (Reminder: a rule X → Y holds with confidence conf% if conf% of transactions that contain X also contain Y.)

  • A. 3/7
  • B. 3/4
  • C. 3/5
  • D. None of the above

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen

SLIDE 23

So when is a rule valid?

A rule is valid if its support is at least a given threshold (minimum support) and its confidence is at least another given threshold (minimum confidence).

A frequent itemset is a set of items that has at least minimum support. In this example, {Chicken, Milk, Ramen} is a frequent itemset if the minimum support is at most 3/7.

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen
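Putting the two measures together, a validity check might look like the sketch below. It treats both thresholds as inclusive ("at least"), which is an assumption, and the threshold values 3/7 and 3/4 are example numbers chosen for illustration only. The helper names are hypothetical.

```python
from fractions import Fraction

transactions = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def count_containing(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for basket in transactions if itemset <= basket)


def is_valid(lhs, rhs, transactions, min_support, min_confidence):
    """True if the rule lhs -> rhs meets both thresholds (inclusive)."""
    lhs, rhs = set(lhs), set(rhs)
    n_lhs = count_containing(lhs, transactions)
    if n_lhs == 0:
        return False  # assumption: a rule whose left side never occurs is not valid
    n_both = count_containing(lhs | rhs, transactions)
    sup = Fraction(n_both, len(transactions))
    conf = Fraction(n_both, n_lhs)
    return sup >= min_support and conf >= min_confidence


# Example thresholds only; real values depend on the application.
MIN_SUPPORT = Fraction(3, 7)
MIN_CONFIDENCE = Fraction(3, 4)

print(is_valid({"Ramen"}, {"Milk"}, transactions, MIN_SUPPORT, MIN_CONFIDENCE))  # True
print(is_valid({"Sushi"}, {"Milk"}, transactions, MIN_SUPPORT, MIN_CONFIDENCE))  # False
```

Ramen → Milk passes with support 3/7 and confidence 3/3, while Sushi → Milk fails because Sushi and Milk appear together in only 2 of the 7 transactions.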

SLIDE 24

Group exercise on a piece of paper:

Create an algorithm to find itemsets with a minimum support of 3/7.

Sample data to check your algorithm:

T1  Sushi, Chicken, Milk
T2  Sushi, Bread
T3  Bread, Vegetables
T4  Sushi, Chicken, Bread
T5  Sushi, Chicken, Ramen, Bread, Milk
T6  Chicken, Ramen, Milk
T7  Chicken, Milk, Ramen
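One possible answer to this exercise, written as Python rather than English, is a brute-force search: try every combination of items, smallest first, and keep those that meet the minimum support. This is a sketch of one straightforward approach, not necessarily the algorithm the course goes on to teach. The name `frequent_itemsets` is made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

transactions = [
    {"Sushi", "Chicken", "Milk"},                    # T1
    {"Sushi", "Bread"},                              # T2
    {"Bread", "Vegetables"},                         # T3
    {"Sushi", "Chicken", "Bread"},                   # T4
    {"Sushi", "Chicken", "Ramen", "Bread", "Milk"},  # T5
    {"Chicken", "Ramen", "Milk"},                    # T6
    {"Chicken", "Milk", "Ramen"},                    # T7
]


def frequent_itemsets(transactions, min_support):
    """Map each frequent itemset (a sorted tuple) to its support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            hits = sum(1 for basket in transactions if set(candidate) <= basket)
            if Fraction(hits, n) >= min_support:
                result[candidate] = Fraction(hits, n)
                found_any = True
        # A larger itemset can only be frequent if its subsets are,
        # so stop once a whole size yields nothing.
        if not found_any:
            break
    return result


for itemset, sup in frequent_itemsets(transactions, Fraction(3, 7)).items():
    print(itemset, sup)
```

The search tests every combination of a given size, so it is only practical for small item catalogues like this one.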

SLIDE 25

Swap algorithms with a group near you

Use the new algorithm to find all the frequent itemsets with support of 2/4 = 50% in the following data. Which are frequent itemsets?

  • A. apple, corn
  • B. apple, dates
  • C. corn, dates
  • D. apple, corn, dates
  • E. All of the above

Transaction  Items
T1           apple, dates, rice, corn
T2           corn, dates, tuna
T3           apple, corn, dates, tuna
T4           corn, tuna
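A different algorithm a group might hand over is to walk through each transaction, list every non-empty subset of its items, and tally how often each subset shows up. The sketch below assumes that approach; `frequent_by_subset_counting` is a hypothetical name. It then checks the four candidate itemsets from the question against the 2/4 threshold.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

transactions = [
    {"apple", "dates", "rice", "corn"},  # T1
    {"corn", "dates", "tuna"},           # T2
    {"apple", "corn", "dates", "tuna"},  # T3
    {"corn", "tuna"},                    # T4
]


def frequent_by_subset_counting(transactions, min_support):
    """Tally every non-empty subset of every transaction, keep the frequent ones."""
    counts = Counter()
    for basket in transactions:
        items = sorted(basket)  # canonical order so equal subsets share a key
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                counts[subset] += 1
    n = len(transactions)
    return {s for s, c in counts.items() if Fraction(c, n) >= min_support}


frequent = frequent_by_subset_counting(transactions, Fraction(2, 4))
candidates = [
    ("apple", "corn"),
    ("apple", "dates"),
    ("corn", "dates"),
    ("apple", "corn", "dates"),
]
for candidate in candidates:
    print(candidate, candidate in frequent)
```

The number of subsets doubles with every extra item in a transaction, so this approach only suits short transactions like the ones here.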

SLIDE 26

Did you get the other team’s algorithm to work?

  • A. Yes
  • B. No