
Association Rules

Charles Sutton Data Mining and Exploration Spring 2012

Based on slides by Chris Williams and Amos Storkey

Thursday, 8 March 12

The Goal

  • Find “patterns”: local regularities that occur more often than you would expect. Examples:
  • If a person buys wine at a supermarket, they also buy cheese. (confidence: 20%)
  • If a person likes Lord of the Rings and Star Wars, they like Star Trek. (confidence: 90%)
  • Look like they could be used for classification, but there is no single class label in mind. They can predict any attribute or a set of attributes. They are unsupervised.

  • Not intended to be used together as a set
  • Often mined from very large data sets


Example Data

Market basket analysis, e.g., a supermarket. These are databases that companies already have.

[Figure: a binary transactions x items matrix. Rows are transactions (one per trip to the market), columns are items (Chicken, Onion, Rocket, Caviar, Haggis, . . .); a 1 marks an item bought on that trip.]
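Such a basket table can be encoded directly as a binary matrix; a minimal sketch in Python (the three example trips are made up for illustration, not the slide's data):

```python
# Encode market-basket transactions as a binary transactions-by-items matrix.
items = ["Chicken", "Onion", "Rocket", "Caviar", "Haggis"]

transactions = [
    {"Chicken", "Onion"},           # trip 1
    {"Onion", "Rocket", "Haggis"},  # trip 2
    {"Caviar", "Haggis"},           # trip 3
]

# One row per trip, one column per item: 1 if bought, else 0.
matrix = [[1 if item in trip else 0 for item in items] for trip in transactions]

for row in matrix:
    print(row)
```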


Other Examples

  • Collaborative-filtering type data: e.g., films a person has watched

  • Rows: patients, columns: medical tests (Cabena et al, 1998)
  • Survey data (Impact Resources, Inc., Columbus OH, 1987)

Feature  Demographic            # Values  Type
1        Sex                    2         Categorical
2        Marital status         5         Categorical
3        Age                    7         Ordinal
4        Education              6         Ordinal
5        Occupation             9         Categorical
6        Income                 9         Ordinal
7        Years in Bay Area      5         Ordinal
8        Dual incomes           3         Categorical
9        Number in household    9         Ordinal
10       Number of children     9         Ordinal
11       Householder status     3         Categorical
12       Type of home           5         Categorical
13       Ethnic classification  8         Categorical
14       Language in home       3         Categorical


Toy Example

Day  Outlook   Temperature  Humidity  Wind   PlayTennis
D1   Sunny     Hot          High      False  No
D2   Sunny     Hot          High      True   No
D3   Overcast  Hot          High      False  Yes
D4   Rain      Mild         High      False  Yes
D5   Rain      Cool         Normal    False  Yes
D6   Rain      Cool         Normal    True   No
D7   Overcast  Cool         Normal    True   Yes
D8   Sunny     Mild         High      False  No
D9   Sunny     Cool         Normal    False  Yes
D10  Rain      Mild         Normal    False  Yes
D11  Sunny     Mild         Normal    True   Yes
D12  Overcast  Mild         High      True   Yes
D13  Overcast  Hot          Normal    False  Yes
D14  Rain      Mild         High      True   No


Itemsets, Coverage, etc

  • Call each column an attribute: A1, A2, . . . , Am
  • An item set is a set of attribute-value pairs:

(Ai1 = aj1) ∧ (Ai2 = aj2) ∧ . . . ∧ (Aik = ajk)

  • Example: In the Play Tennis data,

(Humidity = Normal ∧ Play = Yes ∧ Windy = False)

  • The support of an item set is its frequency in the data set
  • Example: support(Humidity = Normal ∧ Play = Yes ∧ Windy = False) = 4
  • The confidence of an association rule “if Y = y then Z = z” is P(Z = z | Y = y)
  • Example:

P(Windy = False ∧ Play = Yes | Humidity = Normal) = 4/7
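These definitions can be checked directly on the table; a minimal sketch in Python, with the Play Tennis rows hard-coded (the helper names `support` and `confidence` follow the slide's terms but are otherwise made up):

```python
# Support and confidence on the Play Tennis data (rows D1..D14).
rows = [
    # (Outlook, Temperature, Humidity, Windy, Play)
    ("Sunny", "Hot", "High", False, "No"),        # D1
    ("Sunny", "Hot", "High", True, "No"),         # D2
    ("Overcast", "Hot", "High", False, "Yes"),    # D3
    ("Rain", "Mild", "High", False, "Yes"),       # D4
    ("Rain", "Cool", "Normal", False, "Yes"),     # D5
    ("Rain", "Cool", "Normal", True, "No"),       # D6
    ("Overcast", "Cool", "Normal", True, "Yes"),  # D7
    ("Sunny", "Mild", "High", False, "No"),       # D8
    ("Sunny", "Cool", "Normal", False, "Yes"),    # D9
    ("Rain", "Mild", "Normal", False, "Yes"),     # D10
    ("Sunny", "Mild", "Normal", True, "Yes"),     # D11
    ("Overcast", "Mild", "High", True, "Yes"),    # D12
    ("Overcast", "Hot", "Normal", False, "Yes"),  # D13
    ("Rain", "Mild", "High", True, "No"),         # D14
]
ATTR = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Windy": 3, "Play": 4}

def support(itemset):
    """Count rows that match every (attribute, value) pair in the itemset."""
    return sum(all(r[ATTR[a]] == v for a, v in itemset.items()) for r in rows)

def confidence(antecedent, consequent):
    """P(consequent | antecedent), estimated as a ratio of supports."""
    return support({**antecedent, **consequent}) / support(antecedent)

print(support({"Humidity": "Normal", "Play": "Yes", "Windy": False}))  # 4
print(confidence({"Humidity": "Normal"},
                 {"Windy": False, "Play": "Yes"}))  # 4/7 ≈ 0.571
```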


Item sets to rules

  • First: We will find frequent item sets
  • Then: We convert them to rules
  • An itemset of size k can give rise to 2^k − 1 rules
  • Example: the itemset Windy=False, Play=Yes, Humidity=Normal results in 7 rules, including:

IF Windy=False and Humidity=Normal THEN Play=Yes (4/4)
IF Play=Yes THEN Humidity=Normal and Windy=False (4/9)
IF True THEN Windy=False and Play=Yes and Humidity=Normal (4/14)

  • We keep only rules whose confidence is greater than a threshold
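Enumerating the rules from an itemset can be sketched as follows (the helper `rules_from_itemset` is illustrative, not part of the slides): every proper subset of the itemset, including the empty set, becomes an antecedent, and the remaining items become the consequent.

```python
from itertools import combinations

def rules_from_itemset(itemset):
    """All 2^k - 1 rules from a size-k itemset: each non-full subset is an
    antecedent, the rest the consequent (the empty antecedent gives the
    "IF True THEN ..." rule)."""
    items = sorted(itemset)
    k = len(items)
    out = []
    for r in range(k):  # antecedent sizes 0 .. k-1
        for ante in combinations(items, r):
            cons = tuple(i for i in items if i not in ante)
            out.append((ante, cons))
    return out

rules = rules_from_itemset({"Windy=False", "Play=Yes", "Humidity=Normal"})
print(len(rules))  # 7 = 2**3 - 1
for ante, cons in rules:
    print(f"IF {' and '.join(ante) or 'True'} THEN {' and '.join(cons)}")
```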


Finding Frequent Itemsets

  • Task: Find all item sets with support above a threshold
  • Insight: A large set can be no more frequent than its subsets, e.g.,

support(Wind = False) ≥ support(Wind = False, Outlook = Sunny)

  • So search through itemsets in order of number of items
  • An efficient algorithm for this is APRIORI (Agrawal and Srikant, 1994; Mannila et al, 1994)



APRIORI Algorithm

(for binary variables)

i = 1
Ci = { {A} | A is a variable }
while Ci is not empty:
    database pass:
        for each set in Ci, test if it is frequent
        let Li be the collection of frequent sets from Ci
    candidate formation:
        let Ci+1 be those sets of size i + 1 all of whose subsets are frequent
end while


A single database pass is linear in |Ci| · n; make a pass for each i until Ci is empty.

Candidate formation: find all pairs of sets {U, V} from Li such that U ∪ V has size i + 1, and test whether this union is really a potential candidate. O(|Li|³)

Example: 5 three-item sets

(ABC), (ABD), (ACD), (ACE), (BCD)

Candidate four-item sets:
(ABCD) ok
(ACDE) not ok, because (CDE) is not present above
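The level-wise loop with subset pruning can be sketched in Python (a minimal version for binary basket data; the example baskets are made up for illustration):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all itemsets (as frozensets) with support >= min_support.

    Level-wise search: candidates holds the size-i candidate sets C_i,
    level the frequent ones L_i with their support counts.
    """
    freq = {}
    # C_1: all single items seen in the data.
    candidates = {frozenset([a]) for t in transactions for a in t}
    i = 1
    while candidates:
        # Database pass: count the support of each candidate in C_i.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}  # L_i
        freq.update(level)
        # Candidate formation: unions of pairs from L_i of size i + 1,
        # kept only if every size-i subset is itself frequent.
        candidates = set()
        for u in level:
            for v in level:
                c = u | v
                if len(c) == i + 1 and all(
                    frozenset(s) in level for s in combinations(c, i)
                ):
                    candidates.add(c)
        i += 1
    return freq

baskets = [frozenset(t) for t in
           [{"wine", "cheese"}, {"wine", "cheese", "bread"},
            {"cheese", "bread"}, {"wine", "cheese"}]]
print(apriori(baskets, min_support=3))
# {wine}: 3, {cheese}: 4, {wine, cheese}: 3 (bread appears only twice)
```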


Comments

  • Some association rules will be trivial, some
  • interesting. Need to sort through them
  • Example: pregnant => female (confidence: 1)
  • Also can miss “interesting but rare” rules
  • Example: vodka --> caviar (low support)
  • Really this is a type of exploratory data analysis
  • For rule A -->B, can be useful to compare P(B|A)

to P(B)

  • APRIORI can be generalised to structures like

subsequences and subtrees
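Comparing P(B|A) to P(B) is usually summarised as the rule's lift; a minimal sketch with made-up counts (A and B could be the wine and cheese of the earlier example):

```python
# Lift of a rule A => B: P(B|A) / P(B). Lift > 1 means A raises B above
# its base rate; lift ≈ 1 means the rule is uninformative.
# All counts below are illustrative, not from the slides.
n = 1000    # total transactions
n_a = 100   # transactions containing A
n_b = 400   # transactions containing B
n_ab = 60   # transactions containing both A and B

conf = n_ab / n_a        # P(B|A) = 0.6
base_rate = n_b / n      # P(B)   = 0.4
lift = conf / base_rate
print(round(lift, 2))  # 1.5: A raises the chance of B by 50%
```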
