Task Understanding From Confusing Multi-task Data (PowerPoint PPT Presentation)



SLIDE 1

Task Understanding From Confusing Multi-task Data

Xin SU, Yizhou JIANG, Shangqi GUO, Feng CHEN (Tsinghua University)

SLIDE 2

Motivation: From Narrow AI to AGI

Manual task definitions do not exist in natural raw data.

Multi-Task Learning: comprehensive problems in different semantic spaces.

[Figure: the same fruit images carry both a task annotation and a label annotation. Task 1 (Color): "Red", "Green", "Yellow"; Task 2 (Name): "Lemon", "Apple", "Banana"; Task 3 (Taste): "Sweet", "Sour"]

AGI Problem: How can we learn task concepts from original raw data?

• Narrow AI: a specific task in a determined environment.

SLIDE 3

Confusing Supervised Learning (CSL)

[Diagram: Confusing Data ("Red", "Green", "Yellow", "Apple", "Apple", "Banana", "Lemon", "Sweet", "Sour", "Sour") → Data De-confuse → Task Understanding (Deconfusing Function) and Multi-Task Learning (Mapping Function); e.g. one apple image is labeled "Apple", "Red", and "Sweet" across tasks]

• Without task annotation: mapping conflicts arise between multiple tasks.
• CSL: learning task concepts by reducing mapping conflicts.

SLIDE 4

Method: CSL-Net

Alternating two-stage training of the Mapping-Net and the Deconfusing-Net:

Mapping-Net training (Deconfusing-Net fixed). The Deconfusing-Net assigns each sample (x_i, y_i) to one task; each mapping function g_l is then trained on the samples assigned to task l:

$$\min_{g_l} L_{\mathrm{map}}(g_l) = \sum_{i=1}^{m_l} \left( y_i^l - g_l(x_i^l) \right)^2, \qquad l = 1, \ldots, n$$

Deconfusing-Net training (Mapping-Net fixed). The Mapping-Net produces the multi-task outputs g_1(x_i), ..., g_n(x_i); the best-fitting task index, argmin_l (y_i - g_l(x_i))^2, gives the temporary ground truth \hat{h}(x_i, y_i), and the Deconfusing-Net h is trained to match it:

$$\min_{h} L_{\mathrm{dec}}(h) = \sum_{i=1}^{m} \left( h(x_i, y_i) - \hat{h}(x_i, y_i) \right)^2$$
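The temporary ground-truth step can be sketched in code. This is an illustrative NumPy fragment (the function name and array layout are my own, not from the slides): each mapping's prediction is compared against the label, and the best-fitting task index becomes a one-hot target for the Deconfusing-Net.

```python
import numpy as np

def temporary_ground_truth(preds, y):
    """preds: (n_tasks, n_samples) array of mapping outputs g_l(x_i);
    y: (n_samples,) labels. Returns (n_samples, n_tasks) one-hot targets."""
    errors = (preds - y[None, :]) ** 2   # squared error of each task on each sample
    best = np.argmin(errors, axis=0)     # argmin_l (y_i - g_l(x_i))^2
    return np.eye(preds.shape[0])[best]  # one-hot vector over tasks

# Two toy mappings g1(x) = x and g2(x) = -x; labels generated by g2.
x = np.array([1.0, 2.0, 3.0])
preds = np.stack([x, -x])
print(temporary_ground_truth(preds, -x))  # every sample assigned to the second task
```

In CSL-Net the assignment is produced by a network h rather than read off directly, but this one-hot argmin vector is exactly the target that network is trained towards.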

SLIDE 5

Motivation: From Narrow AI to AGI

• Narrow AI: a specific task in a determined environment.
• AI Success: exceeded human-level performance on various problems.

SLIDE 6

Motivation: From Narrow AI to AGI

Manual task definitions do not exist in natural raw data.

Multi-Task Learning: comprehensive problems in different semantic spaces.

[Figure: the same fruit images carry both a task annotation and a label annotation. Task 1 (Color): "Red", "Green", "Yellow"; Task 2 (Fruit): "Lemon", "Apple", "Banana"; Task 3 (Taste): "Sweet", "Sour"]

AGI Problem: How can we learn task concepts from original raw data?

SLIDE 7

Confusing Data

• Multiple tasks cannot be represented by a single mapping function.
• Task understanding is vital for multi-task learning.

Confusing Data: multi-task data without task annotation.

[Figure: labels "Red", "Green", "Yellow", "Apple", "Apple", "Banana", "Lemon", "Sweet", "Sour", "Sour" mixed together; one apple image carries "Apple", "Red", and "Sweet" at once. The mixed task is confusing!]

SLIDE 8

Comparison of Existing Methods

• Supervised Learning & Latent Variable Learning: mapping confusion.
• Multi-Task Learning: task annotation is needed.
• Multi-Label Learning: multiple labels are allocated to each sample.
• Confusing Supervised Learning: no task annotation or sample allocation.

A novel learning problem!

SLIDE 9

Confusing Supervised Learning (CSL)

[Diagram: Confusing Data ("Red", "Green", "Yellow", "Apple", "Apple", "Banana", "Lemon", "Sweet", "Sour", "Sour") → Data De-confuse → Task Understanding (Deconfusing Function) and Multi-Task Learning (Mapping Function); e.g. one apple image is labeled "Red" (Task 1, Color), "Apple" (Task 2, Fruit), and "Sweet" (Task 3, Taste)]

• Without task annotation: mapping conflicts arise between multiple tasks.

SLIDE 10

Learning Objective: Risk Functional of CSL

Model: Traditional Supervised Learning, with mapping g(x): on confusing data the minimal risk stays positive, min R(g*) > 0.

Model: Confusing Supervised Learning, with deconfusing h(x, y; g): the minimal risk reaches zero, min R(g*, h*) = 0.

[Figure: X-Y plots contrasting the two models: a single mapping cannot fit the confusing data, while the deconfusing function separates the samples so that each mapping fits its own task.]
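Written out side by side (my reconstruction, assuming the same squared loss as the empirical objectives on the training slides), the two risk functionals can be contrasted as:

```latex
% Traditional supervised learning: a single mapping g must absorb all labels,
% so on confusing data the optimal risk stays positive:
R(g) = \int \big( y - g(x) \big)^2 \, dP(x, y), \qquad \min_g R(g^*) > 0 .

% Confusing supervised learning: the deconfusing function h routes each sample
% to one of the n mappings, so the optimal joint risk can reach zero:
R(g, h) = \int \sum_{l=1}^{n} h_l(x, y) \, \big( y - g_l(x) \big)^2 \, dP(x, y),
\qquad \min_{g, h} R(g^*, h^*) = 0 .
```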

SLIDE 11

Feasibility: Loss → 0

• Wrong allocation of confusing samples leads to unavoidable loss.

[Figure: X-Y regression plots. Left, Loss > 0 (×): confusing samples are forced onto function 1 and function 2. Right, Loss ≈ 0 (√): samples are separated onto function 1, function 2, and function 3.]

• Task concept driven by global loss: the empirical risk should go towards 0!

SLIDE 12

Training Target & CSL-Net

• Optimization target, expected result, and constraint: the output of the Deconfusing-Net must be one-hot!
• Difficulty: approximating the one-hot output with a Softmax leads to a trivial solution, and joint back-propagation through both networks is not available.
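The one-hot requirement above can be stated formally (my formalization of the slide's bullet, using the notation of the training slides):

```latex
% Deconfusing-Net output: a hard assignment of each sample to exactly one task.
h(x, y) \in \{0, 1\}^{n}, \qquad \sum_{l=1}^{n} h_l(x, y) = 1 .
```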

SLIDE 13

Training Algorithm of CSL-Net

Alternating two-stage training: training of the Mapping-Net, then training of the Deconfusing-Net.

Mapping-Net training (Deconfusing-Net fixed). The Deconfusing-Net assigns each sample (x_i, y_i) to one task; each mapping function g_l is then trained on the samples assigned to task l:

$$\min_{g_l} L_{\mathrm{map}}(g_l) = \sum_{i=1}^{m_l} \left( y_i^l - g_l(x_i^l) \right)^2, \qquad l = 1, \ldots, n$$

Deconfusing-Net training (Mapping-Net fixed). The Mapping-Net produces the multi-task outputs g_1(x_i), ..., g_n(x_i); the best-fitting task index, argmin_l (y_i - g_l(x_i))^2, gives the temporary ground truth \hat{h}(x_i, y_i), and the Deconfusing-Net h is trained to match it:

$$\min_{h} L_{\mathrm{dec}}(h) = \sum_{i=1}^{m} \left( h(x_i, y_i) - \hat{h}(x_i, y_i) \right)^2$$
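The alternating loop on this slide can be condensed into a toy that is easy to run. The sketch below is my own simplification, not the paper's code: the mapping functions are plain linear least-squares fits instead of a Mapping-Net, and the Deconfusing-Net is replaced by the hard argmin assignment it is trained to imitate.

```python
import numpy as np

# Confusing data: each sample is labeled by ONE of two unknown tasks,
# y = x (task A) or y = -x (task B), with no task annotation.
xs = np.array([-2.0, -1.0, 1.0, 2.0, -2.0, -1.0, 1.0, 2.0])
ys = np.array([-2.0, -1.0, 1.0, 2.0, 2.0, 1.0, -1.0, -2.0])

X = np.stack([xs, np.ones_like(xs)], axis=1)  # design matrix [x, 1]
params = np.array([[0.5, 0.0], [-0.5, 0.0]])  # init mappings g_l(x) = a_l x + b_l

for _ in range(5):
    # Deconfusing step: temporary ground truth = best-fitting task per sample
    preds = params @ X.T                          # (n_tasks, n_samples)
    assign = np.argmin((preds - ys) ** 2, axis=0) # argmin_l (y_i - g_l(x_i))^2
    # Mapping step: least squares for each task on its assigned samples
    for l in range(2):
        mask = assign == l
        if mask.any():
            params[l], *_ = np.linalg.lstsq(X[mask], ys[mask], rcond=None)

print(np.round(params, 3))  # recovers y = x and y = -x
```

The loop separates the mixed samples and recovers both underlying functions, illustrating why the empirical risk can be driven towards zero once the task concept is learned.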

SLIDE 14

Experiment: Function Regression

• Supervised learning fails to fit multiple functions.
• An incorrect task number leads to confusing fitting results.
• CSL-Net learns reasonable task concepts and completes the multi-task mapping.

[Figure: fitting results during the training process]

SLIDE 15

Experiment: Pattern Recognition

• Each sample represents the classification result of only one task.
• Two learning goals:

  • Task understanding
  • Multi-task classification

• Two evaluation metrics:

  • Task understanding
  • Multi-task classification

[Figure: three tasks over fruit images. Color: "Red", "Green", "Yellow"; Name: "Lemon", "Apple", "Banana"; Taste: "Sweet", "Sour", "Spicy"]

SLIDE 16

Experiment: Pattern Recognition

• Results on two confusing supervised datasets.

SLIDE 17

Experiment: Pattern Recognition

• Feature visualization of the Deconfusing-Net.
• The Deconfusing-Net separates confusing samples into reasonable task groups.

[Figure: feature distributions before and after training]

SLIDE 18

Conclusion

• A novel learning problem for general raw data:

  • Task annotation is unknown in natural raw data.
  • Task concepts must be understood from raw data (confusing data).

• A novel learning paradigm: Confusing Supervised Learning.

  • Deconfusing function: sample allocation to tasks.
  • Mapping function: multi-task mappings.
  • Global risk functional: overall risk of representing the raw data.

• A novel network: CSL-Net.

  • An alternating two-stage training algorithm realizes the one-hot task constraint.

• A novel application: a learning system towards general intelligence.

  • The agent autonomously defines task concepts and learns multi-task mappings without manual task annotation.

SLIDE 19

Thanks!

Xin Su, Tsinghua University suxin16@mails.tsinghua.edu.cn
