

SLIDE 1

Background Preliminaries Crowd complexity Computational complexity Conclusion Bonus

Taxonomy-Based Crowd Mining

Antoine Amarilli¹,² Yael Amsterdamer¹ Tova Milo¹

¹Tel Aviv University, Tel Aviv, Israel ²École normale supérieure, Paris, France

SLIDE 2

Data mining

Data mining – discovering interesting patterns in large databases.
Database – a (multi)set of transactions.
Transaction – a set of items (a.k.a. an itemset).
A simple kind of pattern to identify are frequent itemsets.

D = { {beer, diapers}, {beer, bread, butter}, {beer, bread, diapers}, {salad, tomato} }

An itemset is frequent if it occurs in at least Θ = 50% of the transactions.
{salad} is not frequent. {beer, diapers} is frequent. Thus, {beer} is also frequent.
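The definition above can be checked mechanically. A minimal sketch (not part of the talk) with the slide's database and a hypothetical is_frequent helper:

```python
# Example database from the slide: four transactions.
D = [
    {"beer", "diapers"},
    {"beer", "bread", "butter"},
    {"beer", "bread", "diapers"},
    {"salad", "tomato"},
]
THETA = 0.5  # support threshold: 50% of the transactions

def is_frequent(itemset, db=D, theta=THETA):
    """An itemset is frequent if it occurs in >= theta of the transactions."""
    support = sum(1 for t in db if itemset <= t) / len(db)
    return support >= theta

# {beer, diapers} appears in 2 of 4 transactions: frequent.
# {salad} appears in only 1 of 4: not frequent.
# Any subset of a frequent itemset is frequent too ({beer} occurs in 3 of 4).
```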

SLIDE 3

Human knowledge mining

Standard data mining assumption: the data is materialized in a database. Sometimes, no such database exists!

Leisure activities: D = { {chess, saturday, garden}, {cinema, friday, evening}, ... }
Traditional medicine: D = { {hangover, coffee}, {cough, honey}, ... }

This data only exists in the minds of people!

SLIDE 4

Harvesting this data

We cannot collect such data in a centralized database and use classical data mining, because:

1. It is impractical to ask all users to surrender their data. ("Let's ask everyone to give the details of all their activities in the last three months.")
2. People do not remember the information. ("What were you doing on July 16th, 2013?")

However, people remember summaries that we could access. ("Do you often play tennis on weekends?") To find out whether an itemset is frequent, we can just ask people directly.

SLIDE 5

Crowdsourcing

Crowdsourcing – solving hard problems through elementary queries to a crowd of users.
Find out if an itemset is frequent with the crowd:

1. Draw a sample of users from the crowd. (black box)
2. Ask each user: is this itemset frequent? ("Do you often play tennis on weekends?")
3. Corroborate the answers to eliminate bad answers. (black box, see existing research)
4. Reward the users. (usually a monetary incentive, depending on the platform)

⇒ An oracle that takes an itemset and finds out whether it is frequent or not by asking crowd queries.
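The four steps can be sketched as follows. This is an illustrative simulation, not the talk's implementation: the crowd of answer functions is hypothetical, and corroboration is reduced to a simple majority vote.

```python
import random

def crowd_oracle(itemset, crowd, sample_size=5):
    """Sample users, ask each whether the itemset is frequent for them,
    and corroborate the answers by a simple majority vote."""
    sample = random.sample(crowd, min(sample_size, len(crowd)))  # step 1
    answers = [user(itemset) for user in sample]                 # step 2
    return sum(answers) > len(answers) / 2                       # step 3
    # Step 4 (rewarding the users) is left to the crowdsourcing platform.

# Simulated crowd: every user finds itemsets containing "tennis" frequent.
crowd = [lambda s: "tennis" in s for _ in range(20)]
```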

SLIDE 6

Taxonomies

Having a taxonomy over the items can save us work!

[Taxonomy diagram: item → sickness (cough, fever, back pain), sport (tennis, running, biking).]

If {sickness, sport} is infrequent, then all itemsets such as {cough, biking} are infrequent too. Without the taxonomy, we would need to test all the combinations! The taxonomy also lets us avoid redundant itemsets like {sport, tennis}.
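The pruning argument can be sketched in code. The PARENT map and helper names are hypothetical; the check uses the contrapositive of monotonicity (an itemset is infrequent whenever a known-infrequent itemset is more general than it).

```python
# Hypothetical child -> parent map for the slide's taxonomy.
PARENT = {
    "cough": "sickness", "fever": "sickness", "back pain": "sickness",
    "tennis": "sport", "running": "sport", "biking": "sport",
    "sickness": "item", "sport": "item",
}

def ancestors(item):
    """All items at least as general as `item`, including itself."""
    out = {item}
    while item in PARENT:
        item = PARENT[item]
        out.add(item)
    return out

def pruned_infrequent(itemset, known_infrequent):
    """True if some known-infrequent itemset is more general than `itemset`
    (every one of its items generalizes some item of `itemset`), so
    `itemset` is infrequent too and no crowd query is needed."""
    return any(
        all(any(g in ancestors(i) for i in itemset) for g in infreq)
        for infreq in known_infrequent
    )

# {sickness, sport} infrequent ⇒ {cough, biking} is pruned for free.
```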

SLIDE 7

Cost

How do we evaluate the performance of a strategy to identify the frequent itemsets?

Crowd complexity – the number of itemsets we ask about (monetary cost, latency, ...).
Computational complexity – the complexity of computing the next question to ask.

There is a tradeoff between the two: asking random questions is computationally inexpensive, but the crowd complexity is bad; asking clever questions to obtain optimal crowd complexity is computationally expensive.

SLIDE 8

The problem

We can now describe the problem. We have:

A known item domain I (a set of items).
A known taxonomy Ψ on I (an is-A relation, i.e., a partial order).
A crowd oracle freq to decide whether an itemset is frequent or not.

We want to find out, for all itemsets, whether they are frequent or infrequent, i.e., learn freq exactly, while achieving a good balance between crowd complexity and computational complexity. What is a good interactive algorithm to solve this problem?

SLIDE 9

Table of contents

1. Background
2. Preliminaries
3. Crowd complexity
4. Computational complexity
5. Conclusion

SLIDE 10

Itemset taxonomy

Itemsets I(Ψ) – the sets of pairwise incomparable items (e.g., {coffee, tennis}, but not {coffee, drink}).
If an itemset is frequent, then its subsets are also frequent.
If an itemset is frequent, then itemsets with more general items are also frequent.
We define an order relation on itemsets: A ⪯ B for "A is more general than B". Formally, ∀i ∈ A, ∃j ∈ B s.t. i is more general than j.
freq is monotone: if A ⪯ B and B is frequent, then A is frequent too.
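The ⪯ relation can be sketched directly from its definition (hypothetical helper names; the taxonomy is given as a child-to-parent map):

```python
def more_general_item(i, j, parent):
    """True iff item i is at least as general as item j in the taxonomy,
    given a hypothetical child -> parent map."""
    while True:
        if i == j:
            return True
        if j not in parent:
            return False
        j = parent[j]

def more_general(A, B, parent):
    """A ⪯ B: for every i in A there is some j in B with i more general
    than j. freq is monotone along ⪯: A ⪯ B and freq(B) imply freq(A)."""
    return all(any(more_general_item(i, j, parent) for j in B) for i in A)

# Running example: coffee and tea are drinks; drink and chess are items.
parent = {"coffee": "drink", "tea": "drink", "drink": "item", "chess": "item"}
# {drink} ⪯ {coffee, chess}, and any subset of B satisfies A ⪯ B as well.
```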

SLIDE 11

Itemset taxonomy example

Taxonomy Ψ

[Diagram: item → chess, drink; drink → coffee, tea.]

Itemset taxonomy I(Ψ)

[Diagram: all itemsets of pairwise incomparable items, from nil up to {chess, coffee, tea}.]

Solution taxonomy S(Ψ)

[Diagram: all possible sets of maximal frequent itemsets, from {nil} up to {{chess, coffee, tea}}.]

SLIDE 12

Maximal frequent itemsets

Maximal frequent itemset (MFI): a frequent itemset with no frequent descendants. Dually, minimal infrequent itemset (MII): an infrequent itemset with no infrequent ancestors. The MFIs (or the MIIs) concisely represent freq. ⇒ We can study complexity as a function of the size of the output.

[Diagram: the itemset taxonomy I(Ψ) of the running example, with the MFIs and MIIs marked.]

SLIDE 13

Solution taxonomy

Conversely, (we can show) any set of pairwise incomparable itemsets is a possible MFI representation. Hence, the set of all possible solutions has a similar structure to the “itemsets” of the itemset taxonomy I(Ψ). ⇒ We call this the solution taxonomy S(Ψ) = I(I(Ψ)). Identifying the freq predicate amounts to finding the correct node in S(Ψ) through itemset frequency queries.

SLIDE 14

Solution taxonomy example

Taxonomy Ψ

[Diagram: item → chess, drink; drink → coffee, tea.]

Itemset taxonomy I(Ψ)

[Diagram: all itemsets of pairwise incomparable items, from nil up to {chess, coffee, tea}.]

Solution taxonomy S(Ψ) = I(I(Ψ))

[Diagram: each node is a possible set of MFIs, from {nil} up to {{chess, coffee, tea}}.]

SLIDE 15

Table of contents

1. Background
2. Preliminaries
3. Crowd complexity
4. Computational complexity
5. Conclusion

SLIDE 16

Lower bound

Each query yields one bit of information. Information-theoretic lower bound: we need at least Ω(log |S(Ψ)|) queries. This is bad in general, because |S(Ψ)| can be doubly exponential in Ψ. As a function of the original taxonomy Ψ, the bound can be written as Ω(2^width[Ψ] / √width[Ψ]).

SLIDE 17

Upper bound

We can achieve the information-theoretic bound if there is always an unknown itemset that is frequent in about half of the possible solutions. A result from order theory shows that there is a constant δ0 ≈ 1/5 such that some element always achieves a split of at least δ0. Hence, the previous bound is tight: Θ(log |S(Ψ)|) queries suffice.

[Diagram: a chain nil, a1, ..., a5, with split fractions 6/7, 5/7, 4/7, 3/7, 2/7, 1/7.]
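The best-split idea can be illustrated over an explicitly materialized solution set. This toy sketch (not the talk's algorithm) models each possible solution as a map from itemsets to their frequency status:

```python
def best_split_query(candidates, solutions):
    """Pick the candidate itemset whose answer splits the remaining set of
    possible solutions most evenly. Each solution is a map from itemset
    to bool ("is it frequent in this solution?")."""
    def balance(q):
        yes = sum(1 for s in solutions if s[q])
        return min(yes, len(solutions) - yes) / len(solutions)
    return max(candidates, key=balance)

# Toy chain a1 ⪯ a2 ⪯ a3: the 4 possible solutions are the prefixes of
# frequent elements; a2 splits them 2-2, the best possible.
sols = [
    {"a1": False, "a2": False, "a3": False},
    {"a1": True,  "a2": False, "a3": False},
    {"a1": True,  "a2": True,  "a3": False},
    {"a1": True,  "a2": True,  "a3": True},
]
```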

SLIDE 18

Lower bound, MFI/MII

To describe the solution, we need the MFIs or the MIIs. However, we need to query both the MFIs and the MIIs to identify the result uniquely: Ω(|MFI| + |MII|) queries. We can have |MFI| = Ω(2^|MII|) and vice versa. This bound is not tight (e.g., for a chain).

[Diagram: a chain nil, a1, ..., a5.]

SLIDE 19

Upper bound, MFI/MII

There is an explicit algorithm to find a new MFI or MII in |I| queries. Intuition: starting with any frequent itemset, add items until you cannot add any more without becoming infrequent. The number of queries is thus O(|I| · (|MFI| + |MII|)).

[Diagram: the itemset taxonomy I(Ψ) of the running example.]
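The intuition above can be sketched as follows (hypothetical names; freq stands in for the crowd oracle, and the taxonomy-aware bookkeeping is omitted):

```python
def find_mfi(items, freq, start=frozenset()):
    """Grow a frequent itemset by trying to add each item in turn, keeping
    an addition only if the result stays frequent. This uses at most |I|
    oracle calls and returns a maximal frequent itemset."""
    current = set(start)
    for item in items:
        if item not in current and freq(current | {item}):
            current.add(item)
    return frozenset(current)

# Toy oracle: an itemset is frequent iff it only uses these three items.
freq = lambda s: s <= {"beer", "bread", "diapers"}
```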

SLIDE 20

Table of contents

1. Background
2. Preliminaries
3. Crowd complexity
4. Computational complexity
5. Conclusion

SLIDE 21

Hardness for standard (input) complexity

We want an unknown itemset of I(Ψ) that is frequent for about half of the possible solutions of S(Ψ). This is related to counting the antichains of I(Ψ), which is FP#P-complete. Hence, we argue that finding the best-split element in I(Ψ) is FP#P-hard (as a function of I(Ψ), which can be exponential in Ψ – of course it is easy if S(Ψ) is materialized). Intuition: determine the number of antichains of a poset by comparing it with a known poset, and use an oracle for the best split to decide the comparison. Our proof works for restricted itemsets (see later); the obstacle for the general case is that I(Ψ) has a constrained structure (it is a distributive lattice).

SLIDE 22

Hardness for output complexity

When running the incremental algorithm, we can materialize I(Ψ), but it may be exponential in Ψ. Do we need to? Problem EQ from Boolean function learning: decide whether our current MFIs and MIIs cover all possible itemsets. Reduction – a polynomial algorithm to learn freq entails a polynomial algorithm for EQ, which is not known to be in PTIME. (The exact complexity is open.)

[Diagram: the itemset taxonomy I(Ψ) of the running example.]

SLIDE 23

Table of contents

1. Background
2. Preliminaries
3. Crowd complexity
4. Computational complexity
5. Conclusion

SLIDE 24

Summary and further work

We have studied the crowd and computational complexity of crowd mining under a taxonomy. Further work: improve the bounds and close the gaps. More specifically:

A tractable way to find reasonably good-split elements in arbitrary posets (or distributive lattices)?
Experimental comparison of various heuristics to choose a question (chain partitioning, random, best split, etc.); unformalized intuition: most itemsets are infrequent.
Integrating uncertainty (a black box for now).

SLIDE 25

Summary and further work

Thanks for your attention!

SLIDE 26

Greedy algorithms

Querying an element of the chain may remove fewer than 1/2 of the possible solutions. Querying the isolated element b removes exactly 1/2 of the solutions. However, querying b classifies far fewer itemsets. ⇒ Classifying many itemsets is not the same as eliminating many solutions. Finding the greedy-best-split item is FP#P-hard.

[Diagram: a chain nil, a1, ..., a5, plus an isolated element b.]

SLIDE 27

Restricted itemsets

Asking about large itemsets is irrelevant in practice. ("Do you often go cycling and running while drinking coffee and having lunch with orange juice on alternate Wednesdays?") If the itemset size is bounded by a constant, I(Ψ) has tractable (polynomial) size. ⇒ The crowd complexity Θ(log |S(Ψ)|) becomes tractable too.

SLIDE 28

Chain partitioning

Optimal strategy for chain taxonomies: binary search. We can compute a chain decomposition of the itemset taxonomy and perform binary searches on the chains. This achieves optimal crowd complexity for a chain; its performance in general is unclear. The computational complexity is polynomial in the size of I(Ψ) (which is still exponential in Ψ).

[Diagram: a chain nil, a1, ..., a5.]
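A sketch of the binary search on one chain (freq stands in for the crowd oracle; the chain is ordered from most to least general, so by monotonicity the frequent itemsets form a prefix):

```python
def classify_chain(chain, freq):
    """Binary search on a chain a1 ⪯ a2 ⪯ ... of itemsets: since freq is
    monotone, some prefix is frequent and the rest is infrequent. Finds
    the boundary in O(log n) oracle calls instead of n."""
    lo, hi = 0, len(chain)  # invariant: chain[:lo] frequent, chain[hi:] infrequent
    while lo < hi:
        mid = (lo + hi) // 2
        if freq(chain[mid]):
            lo = mid + 1
        else:
            hi = mid
    return chain[:lo], chain[lo:]  # (frequent prefix, infrequent suffix)
```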
