SUPERSET LEARNING AND DATA IMPRECISIATION - Eyke Hüllermeier - PowerPoint PPT Presentation


SLIDE 1

Eyke Hüllermeier

Intelligent Systems Group Department of Computer Science University of Paderborn, Germany

eyke@upb.de

SUPERSET LEARNING AND DATA IMPRECISIATION

TFML 2017, Krakow, 15-FEB-2017

SLIDE 2

OUTLINE

PART 1: Superset learning - what it is about ...
PART 2: Optimistic loss minimization - a general approach to superset learning ...
PART 3: Data imprecisiation - using superset learning for weighted learning ...

SLIDE 3

SUPERSET LEARNING

... is a specific type of weakly supervised learning, studied under different names in machine learning:

  • learning from partial labels
  • multiple label learning
  • learning from ambiguously labeled examples
  • ...

... also connected to learning from coarse data in statistics (Rubin, 1976; Heitjan and Rubin, 1991), missing values, and data augmentation (Tanner and Wong, 2012).

SLIDE 4

SUPERSET LEARNING

  • Consider a standard setting of supervised learning with instance space X, output space Y, and hypothesis space H.
  • Output values yn ∈ Y associated with training instances xn, n = 1, ..., N, are not necessarily observed precisely but only characterised in terms of supersets Yn ∋ yn.
  • The set of imprecise/ambiguous/coarse observations is denoted O = {(x1, Y1), ..., (xN, YN)}.
  • An instantiation of O, denoted D, is obtained by replacing each Yn with a candidate yn ∈ Yn.
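
To make the notation concrete, here is a minimal sketch (illustrative only; the variable names and data are not from the slides) of superset-labelled data O and the enumeration of its instantiations D:

    from itertools import product

    # Superset-labelled observations O = [(x_n, Y_n)]: each instance x_n comes with
    # a *set* Y_n of candidate labels that is guaranteed to contain the
    # (unobserved) true label y_n.
    O = [
        ([1.0, 0.3], {"cat"}),           # precisely labelled example
        ([0.2, 1.7], {"cat", "dog"}),    # ambiguous: true label is cat OR dog
        ([2.1, 0.9], {"dog", "fox"}),    # ambiguous: true label is dog OR fox
    ]

    def instantiations(observations):
        """Enumerate all instantiations D of O, i.e. all precise data sets obtained
        by picking one candidate y_n from each superset Y_n."""
        xs = [x for x, _ in observations]
        for choice in product(*(sorted(Y) for _, Y in observations)):
            yield list(zip(xs, choice))

    for D in instantiations(O):
        print(D)   # 1 * 2 * 2 = 4 candidate precise data sets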

SLIDE 5

EXAMPLE: CLASSIFICATION

Classes

SLIDE 6

EXAMPLE: CLASSIFICATION

Classes

One of many instantiations

SLIDE 7

EXAMPLE: REGRESSION

One of infinitely many instantiations

SLIDE 8

DATA DISAMBIGUATION

How to learn from (super)set-valued data?

SLIDE 9

DATA DISAMBIGUATION

Classes

SLIDE 10

DATA DISAMBIGUATION

Classes

SLIDE 11

DATA DISAMBIGUATION

Classes

SLIDE 12

DATA DISAMBIGUATION

SLIDE 13

DATA DISAMBIGUATION

SLIDE 14

DATA DISAMBIGUATION

MORE PLAUSIBLE: A plausible instantiation that can be fitted reasonably well with a LINEAR model!
LESS PLAUSIBLE: A less plausible instantiation, because there is no LINEAR model with a good fit!
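
The following sketch illustrates this plausibility argument with made-up interval-valued regression data: each candidate instantiation (coarsely discretised to interval endpoints) is scored by the residual error of the best-fitting linear model, so instantiations admitting a good linear fit count as more plausible.

    import numpy as np
    from itertools import product

    # Hypothetical interval-valued regression data: y_n is only known to lie in [lo_n, hi_n].
    X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    Y = [(0.8, 1.2), (1.5, 3.5), (2.8, 3.2), (3.0, 6.0), (4.7, 5.3)]

    def linear_fit_error(x, y):
        """Sum of squared residuals of the least-squares line through (x, y)."""
        A = np.column_stack([x, np.ones_like(x)])
        _, res, _, _ = np.linalg.lstsq(A, y, rcond=None)
        return res[0] if res.size else 0.0

    # Score candidate instantiations; using only interval endpoints is a coarse simplification.
    candidates = product(*[(lo, hi) for lo, hi in Y])
    best = min(candidates, key=lambda ys: linear_fit_error(X, np.array(ys)))
    print("most plausible (w.r.t. a linear model) instantiation:", best)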

SLIDE 15

DATA DISAMBIGUATION

PLAUSIBLE / PLAUSIBLE: Both instantiations can be fitted quite well with a QUADRATIC model!

It all depends on how you look at the data!

SLIDE 16

DATA DISAMBIGUATION


assume both class distributions to be Gaussian

SLIDE 17

DATA DISAMBIGUATION

plausible instantiation


assume both class distributions to be Gaussian

quadratic discriminant
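
A sketch of the same idea for this Gaussian example (the data points and helper below are purely hypothetical): each candidate instantiation of the ambiguous labels is scored by the log-likelihood of class-wise Gaussian fits; the instantiation under which two Gaussian classes explain the data best is the most plausible one, and its decision boundary is the quadratic discriminant.

    import numpy as np
    from scipy.stats import multivariate_normal

    def gaussian_loglik(X, labels):
        """Log-likelihood of the data under one Gaussian per class (QDA-style model)."""
        total = 0.0
        for c in np.unique(labels):
            Xc = X[labels == c]
            mu = Xc.mean(axis=0)
            cov = np.cov(Xc.T) + 1e-3 * np.eye(X.shape[1])   # small ridge for stability
            total += multivariate_normal(mu, cov).logpdf(Xc).sum()
        return total

    # Hypothetical 2-D data: six precisely labelled points and two ambiguous ones
    # whose label superset is {0, 1}.
    X = np.array([[0.0, 0.1], [0.5, -0.2], [0.3, 0.4],
                  [4.8, 5.1], [5.2, 4.9], [5.0, 5.4],
                  [2.5, 2.4], [2.6, 2.7]])
    fixed = [0, 0, 0, 1, 1, 1]

    candidates = [(a, b) for a in (0, 1) for b in (0, 1)]
    best = max(candidates, key=lambda ab: gaussian_loglik(X, np.array(fixed + list(ab))))
    print("most plausible labels for the two ambiguous points:", best)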

SLIDE 18

DATA DISAMBIGUATION

implausible instantiation


assume both class distributions to be Gaussian

SLIDE 19

DATA DISAMBIGUATION

Model identification and data disambiguation should be performed simultaneously:

(Figure: DATA <-> MODEL, with identification in one direction and disambiguation in the other)

... quite natural from a Bayesian perspective: P(h, D) = P(h) P(D | h) = P(D) P(h | D)

SLIDE 20

OUTLINE

PART 1: Superset learning
PART 2: Optimistic loss minimization
PART 3: Data imprecisiation

SLIDE 21

MAXIMUM LIKELIHOOD ESTIMATION

The imprecise observation depends only on the true data, not on the model.

(Figure: MODEL --generation--> precise DATA --imprecisiation/coarsening--> imprecise DATA)

Likelihood of a model h ∈ H: ℓ(h) = P(O, D | h) = P(D | h) P(O | D, h) = P(D | h) P(O | D)

SLIDE 22

SUPERSET ASSUMPTION

The imprecise data is a superset of the precise data, but no other assumption is made.

(Figure: MODEL --generation--> precise DATA --imprecisiation/ambiguation--> imprecise DATA)

SLIDE 23

GENERALIZED ERM

how well the (precise) model fits the imprecise data

We derive a principle of generalized empirical risk minimization with the empirical risk

Remp(h) = (1/N) Σ_{n=1}^{N} L*(Yn, h(xn))

and the optimistic superset loss (OSL) function

L*(Y, ŷ) = min { L(y, ŷ) | y ∈ Y }.
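
The OSL is straightforward to implement. The following sketch (illustrative, not the authors' code; data and classifier are made up) computes it for an arbitrary base loss and plugs it into the generalized empirical risk, here with a 0/1 base loss on set-valued class labels:

    def osl(loss, Y, y_hat):
        """Optimistic superset loss: the most favourable base loss over all candidates in Y."""
        return min(loss(y, y_hat) for y in Y)

    def generalized_empirical_risk(loss, data, h):
        """R_emp(h) = (1/N) * sum_n L*(Y_n, h(x_n)) over superset-labelled data."""
        return sum(osl(loss, Y, h(x)) for x, Y in data) / len(data)

    zero_one = lambda y, y_hat: 0.0 if y == y_hat else 1.0

    # Hypothetical superset-labelled data and a trivial constant classifier.
    data = [((0.1,), {"a"}), ((0.9,), {"a", "b"}), ((1.7,), {"b", "c"})]
    h = lambda x: "b"
    print(generalized_empirical_risk(zero_one, data, h))   # 1/3: zero loss on the two sets containing "b"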

SLIDE 24

SPECIAL CASES

SLIDE 25

GENERALIZATION TO FUZZY DATA

(Figure: membership functions of an interval vs. a fuzzy interval, both with height 1)

SLIDE 26

GENERALIZATION TO FUZZY DATA

(Figure: the α-cut of a fuzzy observation at membership level α, and the associated loss)

SLIDE 27

GENERALIZATION TO FUZZY DATA

LOSS: L**(Y, ŷ) = ∫_0^1 L*([Y]_α, ŷ) dα

RISK: Remp(h) = (1/N) Σ_{n=1}^{N} L**(Yn, h(xn))
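
A minimal sketch of how L** can be approximated numerically (illustrative only; the trapezoidal membership function and the grid approximation are assumptions, not taken from the slides): discretise the α-integral, extract the α-cut of a fuzzy interval, and apply the OSL at each level.

    import numpy as np

    def alpha_cut(core, support, alpha):
        """α-cut of a trapezoidal fuzzy interval with the given core and support."""
        (c_lo, c_hi), (s_lo, s_hi) = core, support
        return (s_lo + alpha * (c_lo - s_lo), s_hi - alpha * (s_hi - c_hi))

    def osl_interval(loss, interval, y_hat, grid=101):
        """Optimistic superset loss of a real interval: minimise the base loss over the interval."""
        lo, hi = interval
        return min(loss(y, y_hat) for y in np.linspace(lo, hi, grid))

    def fuzzy_osl(loss, core, support, y_hat, levels=50):
        """L**(Y, y_hat) = integral over alpha of L*([Y]_alpha, y_hat), approximated on a grid."""
        alphas = np.linspace(0.0, 1.0, levels)
        return np.mean([osl_interval(loss, alpha_cut(core, support, a), y_hat) for a in alphas])

    squared = lambda y, y_hat: (y - y_hat) ** 2
    # Fuzzy observation with core [2, 3] and support [1, 4]; prediction 5.0.
    print(fuzzy_osl(squared, core=(2.0, 3.0), support=(1.0, 4.0), y_hat=5.0))

For a fuzzy interval centred at a precise observation, this construction smooths the base loss; that smoothed loss is what the next two slides refer to as a (generalized) Huber loss.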

SLIDE 28

GENERALIZATION TO FUZZY DATA

→ Huber loss!

(Figure: the resulting loss L**(Y, ŷ))

SLIDE 29

GENERALIZATION TO FUZZY DATA

→ (generalized) Huber loss!

SLIDE 30

STRUCTURED OUTPUT PREDICTION

Superset learning naturally applies to learning problems with structured outputs, which are often only partially specified and can then be associated with the set of all consistent completions.

SLIDE 31

LABEL RANKING

... is the problem of learning a model that maps instances to TOTAL ORDERS over a fixed set of alternatives/labels:

D ≻ A ≻ C ≻ B

SLIDE 32

LABEL RANKING

Example: an instance described by the feature vector (0, 37, 46, 325, 1, 0) is mapped to the total order A ≻ D ≻ C ≻ B (... likes more ... reads more ... recommends more ...).
SLIDE 33

LABEL RANKING

Training data is typically incomplete!

Example: for the instance (0, 37, 46, 325, 1, 0), only the partial order A ≻ C is observed.
SLIDE 34

LABEL RANKING

Training data is typically incomplete!

The incomplete observation is associated with its set of linear extensions, i.e. all total orders over the label set that are consistent with it (see the sketch below).
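
A small sketch (illustrative, not from the slides) of how the superset of an incomplete ranking can be materialised: enumerate all permutations of the label set and keep those consistent with the observed pairwise preferences.

    from itertools import permutations

    def linear_extensions(labels, preferences):
        """All total orders of `labels` consistent with the observed pairwise
        preferences, given as (a, b) pairs meaning 'a is preferred to b'."""
        exts = []
        for order in permutations(labels):
            pos = {lab: i for i, lab in enumerate(order)}
            if all(pos[a] < pos[b] for a, b in preferences):
                exts.append(order)
        return exts

    # Observed incomplete ranking: A preferred to C; label set {A, B, C, D}.
    supersets = linear_extensions("ABCD", [("A", "C")])
    print(len(supersets), "consistent total orders")   # 12 of the 24 permutations
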
SLIDE 35

LABEL RANKING LOSSES

(Figure: Kendall and Spearman losses for label ranking; see the sketch below)
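
Building on the previous sketch, the OSL of a ranking loss simply takes the best (smallest) loss over all linear extensions; here with the Kendall distance (number of discordant label pairs) as base loss. This is an illustrative implementation, not the authors' code.

    from itertools import combinations, permutations

    def kendall_distance(order1, order2):
        """Number of label pairs ranked in opposite directions by two total orders."""
        pos1 = {lab: i for i, lab in enumerate(order1)}
        pos2 = {lab: i for i, lab in enumerate(order2)}
        return sum((pos1[a] < pos1[b]) != (pos2[a] < pos2[b])
                   for a, b in combinations(order1, 2))

    # Superset of an incomplete observation: all total orders of {A, B, C, D} with A before C.
    extensions = [p for p in permutations("ABCD") if p.index("A") < p.index("C")]

    def osl_ranking(predicted_order):
        """Optimistic superset loss of a prediction: best Kendall distance over the extensions."""
        return min(kendall_distance(ext, predicted_order) for ext in extensions)

    print(osl_ranking("ADCB"))   # 0: the prediction is itself consistent with A before C
    print(osl_ranking("CBAD"))   # 2: no consistent order can agree with C before A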

SLIDE 36

EXPERIMENTAL STUDIES

(Figure: learning curves on the data sets authorship, glass, iris, pendigits, segment, vowel, and wine)

  • Cheng and Hüllermeier (2015) compare an approach to label ranking based on superset learning with a state-of-the-art label ranker based on the Plackett-Luce (PL) model.
  • Two missing-label scenarios: missing at random, top-rank.
  • General conclusion: the superset-learning approach is more robust toward incompleteness.
SLIDE 37

OUTLINE

PART 1: Superset learning
PART 2: Optimistic loss minimization
PART 3: Data imprecisiation

SLIDE 38

DATA IMPRECISIATION

So far: Observations are imprecise/incomplete, and we have to deal with that! Now: Deliberately turn precise into imprecise data, so as to modulate the influence of an observation on the learning process!

SLIDE 39

EXAMPLE WEIGHING

SLIDE 40

EXAMPLE WEIGHING

SLIDE 41

EXAMPLE WEIGHING

We suggest an alternative way of weighing examples, namely, via „data imprecisiation“ ...

(Figure: full support for the precise observation)

SLIDE 42

EXAMPLE WEIGHING

SLIDE 43

EXAMPLE WEIGHING

weighing through „imprecisiation“

SLIDE 44

EXAMPLE WEIGHING

Different ways of (individually) discounting the loss function: weighted loss vs. OSL (see the sketch below).

In (Lu and Hüllermeier, 2015), we empirically compared standard locally weighted linear regression with this approach and essentially found no difference.
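
A minimal sketch of the two discounting schemes for regression, under the assumption that "imprecisiating" a precise value y with weight w means replacing it by the interval [y − ε, y + ε], with ε growing as w shrinks; the mapping ε = (1 − w) · ε_max used here is purely illustrative.

    def weighted_loss(w, y, y_hat):
        """Classical example weighting: scale the squared loss by the weight w."""
        return w * (y - y_hat) ** 2

    def osl_discounted_loss(w, y, y_hat, eps_max=2.0):
        """Discounting via data imprecisiation: replace y by the interval
        [y - eps, y + eps] with eps = (1 - w) * eps_max, and take the OSL of the
        squared loss, i.e. the loss at the closest point of the interval."""
        eps = (1.0 - w) * eps_max
        return max(0.0, abs(y - y_hat) - eps) ** 2

    for w in (1.0, 0.5, 0.0):
        print(w, weighted_loss(w, y=3.0, y_hat=5.0), osl_discounted_loss(w, y=3.0, y_hat=5.0))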

SLIDE 45

EXAMPLE WEIGHING

We suggest an alternative way of weighing examples, namely, via „data imprecisiation“ ...

(Figure: fuzzy label sets for a certainly positive vs. a less certainly positive example)

SLIDE 46

FUZZY MARGIN LOSSES

(Figure: GENERALIZED HINGE LOSS for weights w = 1, 3/4, 1/2, 1/4, 0; see the sketch below)
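
The following is a sketch of one way such a fuzzy margin loss can arise, under the assumption that a positive example with weight w is imprecisiated into a fuzzy label set in which the observed class has membership 1 and the opposite class membership 1 − w; integrating the OSL over the α-cuts then yields a convex combination of the hinge loss (w = 1) and the "hat" loss of slide 48 (w = 0). This construction is an assumption on my part, not a verbatim formula from the slides.

    def hinge(y, score):
        """Standard hinge loss for label y in {-1, +1} and real-valued score."""
        return max(0.0, 1.0 - y * score)

    def hat(score):
        """'Hat' loss: the OSL of the hinge loss for the full label set {-1, +1}."""
        return min(hinge(+1, score), hinge(-1, score))

    def fuzzy_hinge(w, y, score):
        """Fuzzy margin loss for an example observed as class y with weight w:
        membership 1 for y and 1 - w for -y, integrated over the alpha-cuts."""
        return (1.0 - w) * hat(score) + w * hinge(y, score)

    for w in (1.0, 0.75, 0.5, 0.25, 0.0):
        print(w, [round(fuzzy_hinge(w, +1, s), 2) for s in (-2.0, 0.0, 2.0)])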

SLIDE 47

FUZZY MARGIN LOSSES

Different ways of (individually) discounting the loss function.

(Figure: weighted hinge loss vs. OSL-based fuzzy loss, each for weights w = 1, 3/4, 1/2, 1/4, 0)

SLIDE 48

THE HAT LOSS

SLIDE 49

DATA DISAMBIGUATION

SLIDE 50

DATA DISAMBIGUATION

SLIDE 51

DATA DISAMBIGUATION

SLIDE 52

EXPERIMENTS

Robust loss minimization techniques:

§ Robust truncated-hinge-loss support vector machine (RSVM) trains SVMs with a truncated version of the hinge loss in order to be more robust toward outliers and noisy data (Wu and Liu, 2007).
§ One-step weighted SVM (OWSVM) first trains a standard SVM; then it weighs each training example based on its distance to the decision boundary and retrains using the weighted hinge loss (Wu and Liu, 2013).
§ Our approach (FLSVM) is the same as OWSVM, except for the weighted loss: instead of using a simple weighting of the hinge loss, we use the optimistic fuzzy loss.

The non-convex optimization problem is solved by the concave-convex procedure (Yuille and Rangarajan, 2002).

SLIDE 53

EXPERIMENTAL RESULTS

SLIDE 54

THEORETICAL FOUNDATIONS

Under what conditions is (successful) learning in the superset setting actually possible?

SLIDE 55

THEORETICAL FOUNDATIONS

SLIDE 56

THEORETICAL FOUNDATIONS

systematic imprecisiation

SLIDE 57

THEORETICAL FOUNDATIONS

non-systematic imprecisiation

SLIDE 58

THEORETICAL FOUNDATIONS

Liu and Dietterich (2014) consider the ambiguity degree, which is defined as the largest probability that a particular distractor label co-occurs with the true label in multi-class classification:

γ = sup { P_{Y ∼ D_s(x, y)}(ℓ ∈ Y) | (x, y) ∈ X × Y, ℓ ∈ Y, p(x, y) > 0, ℓ ≠ y }
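
As a concrete reading of this definition, the sketch below estimates the ambiguity degree from superset-labelled data for which the true labels are known (a hypothetical diagnostic, since the true labels are hidden in practice). It is a simplified empirical proxy that aggregates per true class rather than taking the pointwise supremum of the definition: for each (true label, distractor) pair, it computes how often the distractor appears in the candidate set and returns the largest such frequency.

    from collections import defaultdict

    def empirical_ambiguity_degree(samples):
        """samples: (true_label, candidate_set) pairs with true_label in candidate_set.
        Returns the largest relative frequency, over pairs (true label y, distractor l != y),
        with which l appears in the candidate sets of examples whose true label is y."""
        labels = {y for y, _ in samples} | {l for _, Y in samples for l in Y}
        counts, totals = defaultdict(int), defaultdict(int)
        for y, Y in samples:
            totals[y] += 1
            for l in labels:
                if l != y and l in Y:
                    counts[(y, l)] += 1
        return max((c / totals[y] for (y, l), c in counts.items()), default=0.0)

    samples = [("cat", {"cat", "dog"}), ("cat", {"cat", "dog"}), ("cat", {"cat"}),
               ("dog", {"dog"})]
    print(empirical_ambiguity_degree(samples))   # 2/3: "dog" co-occurs with true label "cat" in 2 of 3 cases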

SLIDE 59

THEORETICAL FOUNDATIONS

Let θ = log(2 / (1 + γ)) and dH the Natarajan dimension of H. Define

n0(H, ε, δ) = (4 / (θ ε)) ( dH ( log(4 dH) + 2 log L + log(1 / (θ ε)) ) + log(1 / δ) + 1 ).

Then, in the realizable case, with probability at least 1 − δ, the model with the smallest empirical superset loss on a set of training data of size n > n0(H, ε, δ) has a generalisation error of at most ε.
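
To get a feeling for the bound, the sketch below evaluates the sample-size expression as reconstructed above for some made-up values of the Natarajan dimension dH, number of labels L, ambiguity degree γ, and accuracy/confidence parameters; note how n0 blows up as γ approaches 1 (θ → 0).

    import math

    def n0(d_H, L, gamma, eps, delta):
        """Sample size sufficient for (eps, delta)-learning in the realizable superset
        setting, following the formula stated on the previous slide."""
        theta = math.log(2.0 / (1.0 + gamma))
        return (4.0 / (theta * eps)) * (
            d_H * (math.log(4.0 * d_H) + 2.0 * math.log(L) + math.log(1.0 / (theta * eps)))
            + math.log(1.0 / delta) + 1.0
        )

    for gamma in (0.1, 0.5, 0.9, 0.99):
        print(gamma, round(n0(d_H=10, L=5, gamma=gamma, eps=0.05, delta=0.05)))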

SLIDE 60

THEORETICAL FOUNDATIONS

The balanced benefit condition:

0 ≤ η1 ≤ inf_{h ∈ H} RS(h) / R(h) ≤ sup_{h ∈ H} RS(h) / R(h) ≤ η2 ≤ 1,

where RS(h) is the expected superset loss of h. For a sufficiently large sample size,

R(ĥ) ≤ R(h*) + ∆(dH, ε, δ, η1, η2)

with probability 1 − δ, where h* is the Bayes predictor and ĥ the empirical (superset) risk minimizer; in general, ∆ cannot be made arbitrarily small.

SLIDE 61

SUMMARY AND OUTLOOK

§ Method for superset learning based on optimistic loss minimization, performing simultaneous model identification and data disambiguation.
§ Our framework covers several existing methods as special cases but also supports the systematic development of new methods.
§ Completely generic principle (classification, regression, structured output prediction, ...).
§ Example weighing via data imprecisiation (→ „modeling data“).
§ Works for regression and classification, but seems to be even more interesting for other problems, including ranking, transfer learning, ...
§ More future work: algorithmic solutions for specific instantiations of our framework, theoretical foundations.

SLIDE 62

REFERENCES

  • E. Hüllermeier and W. Cheng (2015). Superset Learning Based on Generalized Loss Minimization. Proc. ECML/PKDD 2015.
  • E. Hüllermeier (2014). Learning from Imprecise and Fuzzy Observations: Data Disambiguation through Generalized Loss Minimization. International Journal of Approximate Reasoning, 55(7):1519-1534.
  • S. Lu and E. Hüllermeier (2015). Locally Weighted Regression through Data Imprecisiation. Workshop Computational Intelligence, Dortmund.
  • D.B. Rubin (1976). Inference and missing data. Biometrika, 63(3):581-592.
  • D.F. Heitjan and D.B. Rubin (1991). Ignorability and coarse data. The Annals of Statistics, 19(4):2244-2253.