SLIDE 1

Learning Context-dependent Label Permutations for Multi-label Classification

Jinseok Nam

Amazon Alexa AI. Joint work with Young-Bum Kim, Eneldo Loza Mencía, Sunghyun Park, Ruhi Sarikaya and Johannes Fürnkranz.

SLIDE 2

Multi-label Classification (MLC)

  • Goal: learn a function f that maps instances to a subset of labels
  • It is important to take into account label dependencies.
  • Joint probability of labels

(Figure: f maps an input image to the relevant subset of the labels Sea, Desert, Building, Sky, Cloud, Mountain)

P(y_1, y_2, ..., y_L | x) = ∏_{i=1}^{L} P(y_i | y_{<i}, x)
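This chain-rule factorization can be illustrated with a tiny sketch (a toy example of my own, not the paper's model; `joint_probability` and the conditional `cond` are hypothetical stand-ins, with the input x omitted for brevity):

```python
# Toy illustration of the chain-rule factorization:
# P(y_1, ..., y_L | x) = prod_i P(y_i | y_<i, x).
def joint_probability(y, conditional):
    """Multiply the conditionals P(y_i | y_<i) over a full label vector y."""
    p = 1.0
    for i in range(len(y)):
        p *= conditional(y[i], y[:i])
    return p

# Hypothetical conditional: a label is relevant with probability 0.7 if the
# previous label was relevant, else with probability 0.2.
def cond(y_i, y_prev):
    p_one = 0.7 if (y_prev and y_prev[-1] == 1) else 0.2
    return p_one if y_i == 1 else 1.0 - p_one

print(joint_probability([1, 1, 0], cond))  # 0.2 * 0.7 * 0.3 = 0.042
```

Because each factor conditions on all earlier labels, the product captures label dependencies that independent per-label classifiers would miss.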

SLIDE 5

Maximization of the joint probability

  • Traditional approaches for minimizing subset 0/1 loss:
  • (Probabilistic) classifier chain

Y = {Sea, Desert, Building, Sky, Cloud, Mountain}

  • 1. Create a chain of L labels

Chain order: Desert → Sea → Cloud → Mountain → Sky → Building

Predictions accumulate along the chain: Desert = 0 → Sea = 1 → Cloud = 0 → Mountain = 1 → Sky = 1

  • 2. Train L independent classifiers f_1, ..., f_L (f_1, ..., f_6 here), each given the input and the partial label vector of earlier chain predictions as additional input features

(Dembczyński et al., ICML 2010; Read et al., MLJ 2011)

Limitations:
  • Error propagation at test time: one wrong early prediction (e.g., Sea = 0) is fed to every subsequent classifier in the chain
  • The effect of the label order in the chain
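The classifier-chain recipe can be sketched as follows (a minimal illustration of mine, not the cited implementations; `fit_linear` is a toy ridge least-squares base learner, and the data is synthetic):

```python
# Minimal classifier-chain sketch: classifier i is trained with the true
# earlier labels as extra features, but at test time it consumes the
# *predicted* earlier labels, which is where error propagation comes from.
import numpy as np

def fit_linear(F, t):
    """Toy base learner: ridge least squares plus a bias, thresholded at 0.5."""
    F1 = np.hstack([F, np.ones((len(F), 1))])
    w = np.linalg.solve(F1.T @ F1 + 1e-3 * np.eye(F1.shape[1]), F1.T @ t)
    return lambda f: float(np.append(f, 1.0) @ w > 0.5)

def train_chain(X, Y, order):
    """Train one classifier per label; position i sees [X, earlier true labels]."""
    models = []
    for pos in range(len(order)):
        feats = np.hstack([X, Y[:, order[:pos]]])
        models.append(fit_linear(feats, Y[:, order[pos]]))
    return models

def predict_chain(models, x, order):
    """Predict labels one by one, feeding earlier predictions forward."""
    y, prev = np.zeros(len(order)), []
    for pos, model in enumerate(models):
        y[order[pos]] = model(np.concatenate([x, prev]))
        prev.append(y[order[pos]])
    return y

# Toy data: label 0 = sign of x0, label 1 = sign of x1, label 2 = their AND,
# so the third classifier benefits from seeing the first two labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
Y = np.stack([X[:, 0] > 0, X[:, 1] > 0,
              (X[:, 0] > 0) & (X[:, 1] > 0)], axis=1).astype(float)
chain = train_chain(X, Y, order=[0, 1, 2])
print(predict_chain(chain, np.array([3.0, 3.0]), order=[0, 1, 2]))
```

Changing `order` changes which partial label vectors each classifier sees, which is exactly the label-order sensitivity the slide lists as a limitation.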

SLIDE 7

Recurrent Neural Networks for MLC

  • Learning from a set of relevant labels in a sequential manner
  • Number of relevant labels is much smaller than the total number of labels

(Figure: starting from hidden state h_0, the RNN emits the relevant labels one per step, Sea → Building → Sky → Mountain → END, with hidden states h_1, ..., h_5 and each predicted label fed back as the next input)

(Nam et al., NIPS 2017)

  • Question: the effect of the label permutation remains!

How to determine the target label permutation?
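A minimal sketch of such an RNN label-sequence decoder (an assumed toy architecture with untrained random weights, not Nam et al.'s actual model): the decoder emits one label per step, feeds it back in, and stops at a special END symbol.

```python
# Toy RNN decoder for MLC: emit relevant labels sequentially until END.
import numpy as np

LABELS = ["Sea", "Desert", "Building", "Sky", "Cloud", "Mountain", "END"]
H, V = 8, len(LABELS)

rng = np.random.default_rng(1)
Wh = rng.normal(scale=0.1, size=(H, H))  # hidden-to-hidden weights
We = rng.normal(scale=0.1, size=(H, V))  # label-embedding-to-hidden weights
Wo = rng.normal(scale=0.1, size=(V, H))  # hidden-to-output weights

def decode(h0, max_steps=10):
    """Greedy decoding: feed the previously emitted label back in."""
    h, prev, out = h0, np.zeros(V), []
    for _ in range(max_steps):
        h = np.tanh(Wh @ h + We @ prev)
        scores = Wo @ h
        scores[[LABELS.index(l) for l in out]] = -np.inf  # emit each label once
        label = int(np.argmax(scores))
        if LABELS[label] == "END":
            break
        out.append(LABELS[label])
        prev = np.eye(V)[label]
    return out

print(decode(rng.normal(size=H)))  # some label subset, in a model-chosen order
```

Note that the sequence is much shorter than the full label vocabulary, which is the efficiency argument on this slide; the order in which training targets are presented is exactly the open question that follows.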

SLIDE 8
Target label permutations for RNN training

  • Static label permutation for all instances
  • Arbitrary label sequence randomly chosen at the beginning
  • Label frequency distribution: freq2rare, rare2freq
  • Label structures (e.g., pairwise label dependencies)

➜ Suboptimal choice; the model learns from only one permutation

  • Different label permutations for individual instances
  • Choosing randomly every time
  • Learning from all possible label permutations

➜ More robust to the effect of label permutation, but computationally expensive

We need MLC algorithms that learn context-dependent label permutations efficiently!
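The two frequency-based static orderings can be computed directly from training-set label counts (my illustration; `label_orders` is a hypothetical helper, not the paper's code):

```python
# freq2rare / rare2freq: static label orderings sorted by training frequency.
from collections import Counter

def label_orders(label_sets):
    counts = Counter(l for ys in label_sets for l in ys)
    # Most frequent first; ties broken alphabetically for determinism.
    freq2rare = sorted(counts, key=lambda l: (-counts[l], l))
    return freq2rare, freq2rare[::-1]  # rare2freq is simply the reverse

train = [{"Sky", "Sea"}, {"Sky", "Cloud"}, {"Sky"}, {"Sea"}]
f2r, r2f = label_orders(train)
print(f2r)  # ['Sky', 'Sea', 'Cloud']
```

Both orderings are fixed once before training, which is why the slide calls them suboptimal: every instance is forced to use the same permutation regardless of context.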

SLIDE 11

Model-based label permutation

(Figure: for the true target label set {1, 2, 3, 4, 5}, the model (1) samples a target label permutation, e.g. 2 1 4 3 5, and (2) computes errors against that permutation and updates its parameters; true-positive, false-positive and false-negative predictions are marked)
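One plausible way to sample a label permutation from model scores is Plackett-Luce-style sequential sampling without replacement (my assumption of the mechanism, not necessarily the paper's exact sampler):

```python
# Sample a permutation of the true label set by repeatedly drawing the next
# label from a softmax over the not-yet-emitted relevant labels.
import math
import random

def sample_permutation(scores, rng):
    """scores: {label: model score}; returns one sampled label ordering."""
    remaining, order = dict(scores), []
    while remaining:
        z = sum(math.exp(s) for s in remaining.values())
        r, acc, chosen = rng.random() * z, 0.0, None
        for label, s in remaining.items():
            acc += math.exp(s)
            if r <= acc:
                chosen = label
                break
        order.append(chosen)
        del remaining[chosen]
    return order

rng = random.Random(0)
print(sample_permutation({1: 2.0, 2: 1.0, 3: 0.5, 4: 0.1, 5: 0.0}, rng))
```

Higher-scoring labels tend to appear earlier, so the sampled target permutation depends on the current model and the current input, i.e. it is context-dependent rather than fixed in advance.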

SLIDE 12

Policy gradient

(Figure: the model generates a label permutation for the true target label set, and the prediction is evaluated)

∇_θ J(θ) = E_{P_θ(τ)} [ ∑_{i=0}^{T−1} ∇_θ log P_θ(a_i | s_i) (R_i − b(s_i)) ]

Here P_θ(a_i | s_i) is the label policy distribution, and the reward-minus-baseline term R_i − b(s_i) weights the model parameter updates.
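The policy-gradient (REINFORCE) estimator on this slide can be exercised on a one-step toy problem (my illustration with a bandit reward, not the paper's MLC evaluation reward): for a categorical policy softmax(θ), the gradient of the log-probability of action a is onehot(a) − softmax(θ).

```python
# One-step bandit REINFORCE: only action 2 is rewarded, so the policy
# softmax(theta) should concentrate on it.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_step(theta, a, R, baseline, lr=0.1):
    """grad_theta log softmax(theta)[a] = onehot(a) - softmax(theta)."""
    grad = (np.eye(len(theta))[a] - softmax(theta)) * (R - baseline)
    return theta + lr * grad  # gradient *ascent* on expected reward

rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(300):
    a = int(rng.choice(3, p=softmax(theta)))
    R = 1.0 if a == 2 else 0.0
    theta = reinforce_step(theta, a, R, baseline=0.5)
print(softmax(theta))  # probability mass shifts toward action 2
```

The baseline b(s) does not change the expected gradient but reduces its variance, which is why the slide's formula subtracts it from the reward.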

SLIDE 13

Experiments

  • We combined the two approaches! Context-dependent label permutation learning clearly outperforms static label permutation approaches.

(Figure: nDCG@1, nDCG@5, Example F1 and Macro F1 over the course of training for freq2rare, rare2freq, fixed-rnd, always-rnd, CLP-RNN α=0 and CLP-RNN α=0.9)

Methods           Example F1    Macro F1      Prec@1        Prec@3        Prec@5

Mediamill
SLEEC             —             —             87.82         73.45         59.17
FastXML           —             —             84.22         67.33         53.04
Parabel           —             —             83.91         67.12         52.99
freq2rare         66.63±0.33    39.68±0.69    90.05±0.31    74.20±0.18    58.39±0.29
rare2freq         66.95±0.26    43.33±0.62    53.67±1.31    59.57±0.78    52.49±0.37
fixed-rnd         67.21±0.25    41.85±0.90    73.95±5.20    65.58±2.31    55.55±0.83
always-rnd        66.25±0.25    34.03±0.58    89.08±0.18    73.90±0.24    59.45±0.31
CLP-RNN (α=0)     67.22±0.15    38.75±0.88    89.40±0.42    73.84±0.30    59.29±0.17
CLP-RNN (α=0.6)   67.27±0.30    36.49±0.74    91.27±0.28    75.25±0.32    59.75±0.30

Delicious
SLEEC             —             —             67.59         61.38         56.56
FastXML           —             —             69.61         64.12         59.27
Parabel           —             —             67.44         61.83         56.75
freq2rare         31.36±0.17    13.94±0.29    57.21±0.38    54.28±0.31    51.16±0.36
rare2freq         31.60±0.15    18.00±0.31    17.46±0.38    18.49±0.51    20.31±0.72
fixed-rnd         32.74±0.27    16.48±0.31    40.59±1.31    37.21±3.06    35.74±2.60
always-rnd        32.45±0.05    13.00±0.25    66.58±0.90    60.46±0.54    54.95±0.55
CLP-RNN (α=0)     34.43±0.54    17.33±0.17    69.57±0.43    61.57±0.69    55.73±0.56
CLP-RNN (α=0.9)   35.80±0.35    18.00±0.51    70.54±0.77    63.39±0.65    57.72±0.58

Poster #233