SLIDE 1

EXPLOITING STRUCTURE FOR META-LEARNING

NeurIPS Metalearning Workshop | December 8, 2018

Lise Getoor | UC Santa Cruz | @lgetoor
SLIDE 2

STRUCTURE STRUCTURE IN INPUTS STRUCTURE IN OUTPUTS STRUCTURE IN META-LEARNING MODEL

SLIDE 3

THIS TALK

Structure & Meta-learning

SLIDE 4

STATISTICAL RELATIONAL LEARNING

Make use of logical structure Handle uncertainty Perform collective inference

[GETOOR & TASKAR ’07]

1 2 3

SLIDE 5

PROBABILISTIC SOFT LOGIC (PSL)

A probabilistic programming language for collective inference problems

  • Predicate = relationship or property
  • Ground Atom = (continuous) random variable
  • Weighted Rules = capture dependency or constraint

PSL Program = Rules + Input DB

psl.linqs.org

KEY REFERENCE: Hinge-Loss Markov Random Fields and Probabilistic Soft Logic, Stephen Bach, Matthias Broecheler, Bert Huang, Lise Getoor, JMLR 2017
SLIDE 6

COLLECTIVE

Reasoning

  • Outputs depend on each other
SLIDE 7

COLLECTIVE

Classification Pattern

local-predictor(x,l) → label(x,l)
label(x,l) & link(x,y) → label(y,l)

SLIDE 8

COLLECTIVE

Classification Pattern

local-predictor(x,l) → label(x,l)
label(x,l) & link(x,y) → label(y,l)
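
The pattern can be read as a simple propagation scheme: trust a local predictor, then let linked neighbors pull each other's labels together. A minimal illustrative sketch in Python (the names, data, and blending rule are toy assumptions, not PSL's actual inference engine):

```python
# Toy collective classification: mix each node's local prediction
# with the average label score of its linked neighbors, iterated
# to a fixed point. Illustrative only -- not the PSL solver.

def collective_classify(local_scores, links, alpha=0.5, iters=50):
    """local_scores: {node: {label: score in [0, 1]}};
    links: undirected edges as (x, y) pairs."""
    neighbors = {n: [] for n in local_scores}
    for x, y in links:
        neighbors[x].append(y)
        neighbors[y].append(x)
    labels = {n: dict(s) for n, s in local_scores.items()}
    for _ in range(iters):
        new = {}
        for n in labels:
            new[n] = {}
            for l in labels[n]:
                nbrs = neighbors[n]
                if nbrs:
                    nb_avg = sum(labels[m][l] for m in nbrs) / len(nbrs)
                    # local-predictor rule blended with relational rule
                    new[n][l] = alpha * local_scores[n][l] + (1 - alpha) * nb_avg
                else:
                    new[n][l] = local_scores[n][l]  # no links: keep local score
        labels = new
    return labels

local = {"ann": {"dem": 0.9, "rep": 0.1},
         "bob": {"dem": 0.5, "rep": 0.5}}
out = collective_classify(local, [("ann", "bob")])
# "bob" is pulled toward "ann"'s confident "dem" prediction
```

The fixed point balances local evidence against neighborhood agreement, which is the qualitative behavior the two rules above encode.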

SLIDE 9

COLLECTIVE CLASSIFICATION

[Figure: social network with SPOUSE, COLLEAGUE, and FRIEND links; QUESTION: one node’s label is unknown]
SLIDE 10

COLLECTIVE CLASSIFICATION

[Figure: social network with SPOUSE, COLLEAGUE, and FRIEND links; QUESTION: one node’s label is unknown]
SLIDE 11

COLLECTIVE CLASSIFICATION

[Figure: social network with SPOUSE, COLLEAGUE, and FRIEND links; QUESTION: several nodes’ labels are unknown]
SLIDE 12

COLLECTIVE CLASSIFICATION

[Figure: social network with SPOUSE, COLLEAGUE, and FRIEND links]

Local rules:
  • “If X donates to party P, X votes for P”
  • “If X tweets party P slogans, X votes for P”
Relational rules:
  • “If X is linked to Y, and X votes for P, Y votes for P”

SLIDE 13

COLLECTIVE CLASSIFICATION

[Figure: social network with SPOUSE, COLLEAGUE, and FRIEND links]

Donates(X,P) → Votes(X,P)

Local rules:
  • “If X donates to party P, X votes for P”
  • “If X tweets party P slogans, X votes for P”
Relational rules:
  • “If X is linked to Y, and X votes for P, Y votes for P”

SLIDE 14

Local rules:
  • “If X donates to party P, X votes for P”
  • “If X tweets party P slogans, X votes for P”
Relational rules:
  • “If X is linked to Y, and X votes for P, Y votes for P”

COLLECTIVE CLASSIFICATION

[Figure: social network with SPOUSE, COLLEAGUE, and FRIEND links]

Tweets(X,“Affordable Health”) → Votes(X,“Democrat”)

SLIDE 15

COLLECTIVE CLASSIFICATION

[Figure: social network with SPOUSE, COLLEAGUE, and FRIEND links]

Votes(X,P) & Friends(X,Y) → Votes(Y,P)
Votes(X,P) & Spouse(X,Y) → Votes(Y,P)

Local rules:
  • “If X donates to party P, X votes for P”
  • “If X tweets party P slogans, X votes for P”
Relational rules:
  • “If X is linked to Y, and X votes for P, Y votes for P”
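In PSL these rules are soft: ground atoms take truth values in [0, 1], and each ground rule contributes a hinge penalty measuring how far it is from being satisfied under the Łukasiewicz relaxation. A small sketch of that penalty (the specific truth values below are illustrative, not from the talk):

```python
# Lukasiewicz relaxation used by hinge-loss MRFs: the conjunction
# A1 & ... & Ak relaxes to max(0, sum(a_i) - (k - 1)), and a rule
# Body -> Head is penalized by its "distance to satisfaction".

def distance_to_satisfaction(body_atoms, head):
    """Penalty max(0, body - head) for one ground rule."""
    k = len(body_atoms)
    body = max(0.0, sum(body_atoms) - (k - 1))
    return max(0.0, body - head)

# Ground instance of  w : Votes(X,P) & Friends(X,Y) -> Votes(Y,P)
violated = distance_to_satisfaction([0.9, 1.0], head=0.3)    # 0.6
satisfied = distance_to_satisfaction([0.9, 1.0], head=0.95)  # 0.0
```

Inference then minimizes the weighted sum of these penalties over all ground rules at once, which is what makes the reasoning collective.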

SLIDE 16

COLLECTIVE

Activity Recognition

Inferring activities in video sequences

SLIDE 17

ACTIVITY RECOGNITION

crossing, waiting, queueing, walking, talking, dancing, jogging
SLIDE 18

COLLECTIVE

Pattern

local-predictor(x,l,f) → activity(x,l,f)
activity(x,l,f) & same-frame(x,y,f) → activity(y,l,f)
activity(x,l,f) & next-frame(f,f’) → activity(x,l,f’)

SLIDE 19

EMPIRICAL HIGHLIGHTS

Improved activity recognition in video:

                5 Activities        6 Activities
  HOG           47.4%  (.481 F1)    59.6%  (.582 F1)
  HOG + PSL     59.8%  (.603 F1)    79.3%  (.789 F1)
  ACD           67.5%  (.678 F1)    83.5%  (.835 F1)
  ACD + PSL     69.2%  (.693 F1)    86.0%  (.860 F1)

London et al., Collective Activity Detection using Hinge-loss Markov Random Fields, CVPR Workshops 2013
SLIDE 20

COLLECTIVE

Stance Prediction

Inferring users’ stance in online debates
SLIDE 21

DEBATE STANCE CLASSIFICATION

TASK:

Jointly infer users’ attitude on topics and interaction polarity

[Figure: debate network for TOPIC: Climate Change, with Pro/Anti users joined by Agree/Disagree links]

Sridhar, Foulds, Huang, Getoor & Walker, Joint Models of Disagreement and Stance, ACL 2015
SLIDE 22

PSL FOR STANCE CLASSIFICATION

// local text classifiers
w1: LocalPro(U,T) -> Pro(U,T)
w1: LocalDisagree(U1,U2) -> Disagrees(U1,U2)
// Rules for stance
w2: Pro(U1,T) & Disagrees(U1,U2) -> !Pro(U2,T)
w2: Pro(U1,T) & !Disagrees(U1,U2) -> Pro(U2,T)
// Rules for disagreement
w3: Pro(U1,T) & Pro(U2,T) -> !Disagrees(U1,U2)
w3: !Pro(U1,T) & Pro(U2,T) -> Disagrees(U1,U2)

bitbucket.org/linqs/psl-joint-stance
SLIDE 23

PREDICTING STANCE IN ONLINE FORUMS

Task: Predict post and user stance from two online debate forums
  • 4Forums.com: ~300 users, ~6000 posts
  • CreateDebate.org: ~300 users, ~1200 posts

  4FORUMS.COM           ACCURACY
  Text-only Baseline    69.0
  PSL                   80.3

  CREATEDEBATE.ORG      ACCURACY
  Text-only Baseline    62.7
  PSL                   72.7

Sridhar, Foulds, Huang, Getoor & Walker, Joint Models of Disagreement and Stance, ACL 2015
SLIDE 24

LINK

Prediction Pattern

link(x,y) & similar(y,z) → link(x,z)

SLIDE 25

CLUSTERING

Pattern

link(x,y) & link(y,z) → link(x,z)

SLIDE 26

MATCHING

Pattern

link(x,y) & !same(y,z) → !link(x,z)

SLIDE 27

THIS TALK

Structure & Meta-learning

SLIDE 28

SRL Concepts

Templated Models Weight Learning Structure Learning Latent Variables Logical rules

Meta-learning Concepts

Tied Hyperparameters Hyperparameter Optimization Feature & Algorithm Selection Landmarks Few/Zero-shot learning

SRL <-> META-LEARN

SLIDE 29

Probabilistic programming language for defining distributions

TEMPLATING

[Figure: rule template + input data = ground model]

/* Local rules */
wd: Donates(A, P) -> Votes(A, P)
wt: Mentions(A, “Affordable Health”) -> Votes(A, “Democrat”)
wt: Mentions(A, “Tax Cuts”) -> Votes(A, “Republican”)
/* Relational rules */
ws: Votes(A,P) & Spouse(B,A) -> Votes(B,P)
wf: Votes(A,P) & Friend(B,A) -> Votes(B,P)
wc: Votes(A,P) & Colleague(B,A) -> Votes(B,P)
/* Range constraint */
Votes(A, “Republican”) + Votes(A, “Democrat”) = 1.0 .
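Templating means each weighted rule is instantiated once per matching tuple in the input DB, so a handful of rules can define a very large ground model. A toy grounding sketch of the spouse rule (function and data names are hypothetical):

```python
# Hypothetical grounding of  ws: Votes(A,P) & Spouse(B,A) -> Votes(B,P)
# one ground rule per (spouse link, party) combination in the DB.
from itertools import product

def ground_spouse_rule(parties, spouse_links):
    """Return (body atom 1, body atom 2, head atom) per grounding."""
    ground = []
    for (b, a), p in product(spouse_links, parties):
        ground.append((f"Votes({a},{p})", f"Spouse({b},{a})",
                       f"Votes({b},{p})"))
    return ground

rules = ground_spouse_rule(["Dem", "Rep"], [("bob", "ann")])
# 1 spouse link x 2 parties -> 2 ground rules
```

The same tied weight ws is shared by every grounding, which is the structural pattern the next slide generalizes.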
SLIDE 30

LEARN

when structural patterns hold across many instantiations

SLIDE 31

STRUCTURE LEARNING

  • Large subfield of statistical relational learning
  • Friedman et al. IJCAI 99, Getoor et al. JMLR 02, Kok & Domingos ICML 05, Mihalkova & Mooney ICML 07, De Raedt et al. MLJ 08, Khosravi et al. AAAI 10, Khot et al. ICDM 11, Van Haaren et al. MLJ 15, among others

  • NIPS Relational Representation Learning Workshop
  • Basic Idea
  • Search model space
  • Model space is very rich
  • Optimize parameters
  • Information theoretic criteria, likelihood-based, and Bayesian approaches
SLIDE 32

META LEARN

when structural patterns hold across many learning tasks

SLIDE 33

META LEARNING

[Figure: graph linking Configurations to Tasks via Works edges]
SLIDE 34

META LEARNING

[Figure: Works edges between configurations and tasks, plus Similar edges; some Works values unknown]

Rules express:
  • “If configuration C works well for task T1, and task T2 is similar to T1, C will work well for T2”
  • “If configuration C1 works well for task T, and configuration C2 is similar to C1, C2 will work well for T”

SLIDE 35

META LEARNING

[Figure: Works edges between configurations and tasks, plus Similar edges; some Works values unknown]

Rules express:
  • “If configuration C works well for task T1, and task T2 is similar to T1, C will work well for T2”
  • “If configuration C1 works well for task T, and configuration C2 is similar to C1, C2 will work well for T”

Works(C,T1) & SimilarTask(T1,T2) → Works(C,T2)

SLIDE 36

META LEARNING

[Figure: Works edges between configurations and tasks, plus Similar edges; some Works values unknown]

Rules express:
  • “If configuration C works well for task T1, and task T2 is similar to T1, C will work well for T2”
  • “If configuration C1 works well for task T, and configuration C2 is similar to C1, C2 will work well for T”

Works(C1,T) & SimilarConfig(C1,C2) → Works(C2,T)
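
These rules behave like collaborative filtering over a Works(Config, Task) matrix: known performance scores propagate along similarity edges. A toy sketch of that propagation (all names, scores, and similarities are illustrative assumptions):

```python
# Estimate Works(config, task) from configs already tried on
# similar tasks, following:
#   Works(C,T1) & SimilarTask(T1,T2) -> Works(C,T2)

def predict_works(works, task_sim, config, task):
    """Similarity-weighted average over observed Works scores."""
    num = den = 0.0
    for (c, t1), score in works.items():
        sim = task_sim.get((t1, task), 0.0)
        if c == config and sim > 0:
            num += sim * score
            den += sim
    return num / den if den else 0.0

works = {("cfgA", "task1"): 0.9, ("cfgA", "task3"): 0.2}
task_sim = {("task1", "task2"): 0.8, ("task3", "task2"): 0.1}
score = predict_works(works, task_sim, "cfgA", "task2")
# dominated by the very similar task1, so the estimate is high
```

In the PSL formulation the two rules are solved jointly rather than in one pass, so task similarity and configuration similarity can reinforce each other collectively.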

SLIDE 37

META-LEARNING

  • Challenge: defining similarity
  • Advantages:
  • can make use of multiple similarity measures
  • can use domain knowledge for defining task and

configuration similarity

  • Research questions:
  • Are there benefits from using this approach?
  • What are opportunities for collective reasoning?
SLIDE 38

LANDMARKING

  • Can be described using latent variables
  • E.g., Task-Area and Learner-Expertise as latent variables
  • Research questions:
  • Are there benefits from using SRL approach?
  • What are opportunities for collective reasoning?
SLIDE 39

ALGORITHM & MODEL SELECTION

  • Can be described using (probabilistic/soft) logical rules
  • Research questions:
  • Are there benefits from using SRL approach?
  • What are opportunities for collective reasoning?
SLIDE 40

PIPELINE CONSTRUCTION

  • Can be described using logical rules and constraints
  • Research questions:
  • Are there benefits from using SRL approach?
  • What are opportunities for collective reasoning?
SLIDE 41

CLOSING

SLIDE 42

STRUCTURE AND META-LEARNING CLOSING THE LOOP

SLIDE 43

CLOSING COMMENTS

Provided some examples of structure and collective reasoning

Opportunity for Meta-Learning methods that can mix:
  • probabilistic & logical inference
  • data-driven & knowledge-driven modeling
  • Meta-modeling for meta-modeling
Compelling applications abound!

OPPORTUNITY!

SLIDE 44

PROBABILISTIC SOFT LOGIC

THANK YOU!

Contact information: getoor@ucsc.edu

psl.linqs.org

@lgetoor
SLIDE 45

PSL SUMMARY IN A SLIDE

  • MAP inference in PSL translates into a convex optimization problem → inference is really fast
  • Inference further enhanced with state-of-the-art optimization and distributed graph processing paradigms → inference even faster
  • Learning methods for rule weights & latent variables
  • PSL is open-source; code, data, and tutorials available online

psl.linqs.org
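
To see why the convexity claim matters, here is a hedged sketch: MAP inference minimizes a weighted sum of hinge penalties over [0, 1] variables, so even plain projected gradient descent reaches the optimum. The two-rule model, its evidence values, and the squared-hinge choice below are toy assumptions, not the PSL solver:

```python
# Toy hinge-loss MRF with squared potentials (smooth and convex):
#   rule 1: Donates(ann) -> Votes(ann), with Donates(ann) = 0.9
#   rule 2: Votes(ann) & Spouse(bob,ann) -> Votes(bob), Spouse = 1.0
# Energy = max(0, 0.9 - ann)^2 + max(0, ann - bob)^2, over [0, 1].

def map_inference(steps=5000, lr=0.05):
    ann, bob = 0.5, 0.5  # soft truth values to optimize
    for _ in range(steps):
        # gradients of the two squared hinge penalties
        g_ann = -2 * max(0.0, 0.9 - ann) + 2 * max(0.0, ann - bob)
        g_bob = -2 * max(0.0, ann - bob)
        ann = min(1.0, max(0.0, ann - lr * g_ann))  # project to [0, 1]
        bob = min(1.0, max(0.0, bob - lr * g_bob))
    return ann, bob

ann, bob = map_inference()
# local evidence drives ann up, and the spouse rule drags bob along
```

Because every ground rule contributes a convex term, the same argument scales to the full ground model, which is why PSL inference stays fast even when grounding produces millions of rules.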