

SLIDE 1

Logic and machine learning review

CS 540 Yingyu Liang

SLIDE 2

Propositional logic

SLIDE 3

Logic

  • If the rules of the world are presented formally, then a decision maker can use logical reasoning to make rational decisions.

  • Several types of logic:

▪ propositional logic (Boolean logic)
▪ first-order logic (first-order predicate calculus)

  • A logic includes:

▪ syntax: what is a correctly formed sentence
▪ semantics: what is the meaning of a sentence
▪ inference procedure (reasoning, entailment): what sentences logically follow from the given knowledge

SLIDE 4

Propositional logic syntax

Sentence ฀AtomicSentence | ComplexSentence AtomicSentence ฀True | False | Symbol Symbol ฀P | Q | R | . . . ComplexSentence ฀Sentence | ( Sentence  Sentence ) | ( Sentence  Sentence ) | ( Sentence  Sentence ) | ( Sentence  Sentence )

BNF (Backus-Naur Form) grammar in propositional logic

P  ((True  R)Q))  S) well formed P  Q)   S) not well formed

SLIDE 5

Summary

  • Interpretation, semantics, knowledge base
  • Entailment
  • Model checking
  • Inference, soundness, completeness
  • Inference methods
  • Sound inference, proof
  • Resolution, CNF
  • Chaining with Horn clauses, forward/backward chaining
SLIDE 6

Example

SLIDE 7

First order logic

SLIDE 8

FOL syntax summary

  • Short summary so far:

▪ Constants: Bob, 2, Madison, …
▪ Variables: x, y, a, b, c, …
▪ Functions: Income, Address, Sqrt, …
▪ Predicates: Teacher, Sisters, Even, Prime, …
▪ Connectives: ¬ ∧ ∨ ⇒ ⇔
▪ Equality: =
▪ Quantifiers: ∀ ∃

SLIDE 9

More summary

  • Term: constant, variable, or function. Denotes an object. (A ground term has no variables.)
  • Atom: the smallest expression assigned a truth value. Predicates and =.
  • Sentence: an atom, a sentence with connectives, or a sentence with quantifiers. Assigned a truth value.
  • Well-formed formula (wff): a sentence in which all variables are quantified.

SLIDE 10

Example

SLIDE 11

Example

SLIDE 12

Machine learning basics

SLIDE 13

What is machine learning?

  • “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

  • ------ Machine Learning, Tom Mitchell, 1997
SLIDE 14

Example 1: image classification

Task: determine if the image is indoor or outdoor
Performance measure: probability of misclassification

SLIDE 15

Example 1: image classification

[Figure: example images, labeled indoor and outdoor]

Experience/Data: images with labels

SLIDE 16

Example 1: image classification

  • Some terminology:
  • Training data: the images given for learning
  • Test data: the images to be classified
  • Binary classification: classify into two classes
SLIDE 17

Example 2: clustering images

Task: partition the images into 2 groups
Performance: similarities within groups
Data: a set of images

SLIDE 18

Example 2: clustering images

  • Some terminology:
  • Unlabeled data vs labeled data
  • Supervised learning vs unsupervised learning
SLIDE 19

Unsupervised learning

SLIDE 20

Unsupervised learning

  • Training sample 𝑥1, 𝑥2, … , 𝑥𝑛
  • No teacher providing supervision as to how individual instances should be handled
  • Common tasks:
  • clustering: separate the n instances into groups
  • novelty detection: find instances that are very different from the rest
  • dimensionality reduction: represent each instance with a lower-dimensional feature vector while maintaining key characteristics of the training samples

SLIDE 21

Clustering

  • Group training sample into k clusters
  • How many clusters do you see?
  • Many clustering algorithms
  • HAC (Hierarchical Agglomerative Clustering)

  • k-means
SLIDE 22

Hierarchical Agglomerative Clustering

  • Euclidean (L2) distance
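The slide lists only the distance metric. As a rough sketch of running HAC with Euclidean (L2) distance, here is SciPy's standard hierarchical-clustering API on made-up 2-D points; the data, linkage method, and cluster count are assumptions:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    # Two well-separated blobs of toy 2-D points (assumed data).
    X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])

    # Agglomerative clustering: start with singleton clusters and
    # repeatedly merge the two closest ones under Euclidean distance.
    Z = linkage(X, method="single", metric="euclidean")

    # Cut the merge tree into 2 flat clusters.
    labels = fcluster(Z, t=2, criterion="maxclust")
    print(labels)  # e.g. [1 1 1 2 2 2]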
SLIDE 23

K-means algorithm

  • Input: x1…xn, k
  • Step 1: select k cluster centers c1 … ck
  • Step 2: for each point x, determine its cluster by finding the closest center in Euclidean distance
  • Step 3: update each cluster center to the centroid of its cluster:

ci = ( Σ_{x in cluster i} x ) / SizeOf(cluster i)

  • Repeat steps 2 and 3 until the cluster centers no longer change (see the sketch below)
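A minimal NumPy sketch of these three steps. The toy data and the choice to initialize the centers from random data points are my own assumptions; the slides do not say how Step 1 selects the centers.

    import numpy as np

    def kmeans(X, k, seed=0):
        rng = np.random.default_rng(seed)
        # Step 1: select k cluster centers (here: k random data points).
        centers = X[rng.choice(len(X), size=k, replace=False)]
        while True:
            # Step 2: assign each point to its closest center (Euclidean distance).
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            # Step 3: move each center to the centroid of its assigned points.
            # (A robust version would also handle empty clusters.)
            new_centers = np.array([X[assign == i].mean(axis=0) for i in range(k)])
            # Repeat steps 2 and 3 until the centers no longer change.
            if np.allclose(new_centers, centers):
                return centers, assign
            centers = new_centers

    # Toy usage (data assumed): two obvious clusters.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
    centers, assign = kmeans(X, k=2)
    print(centers, assign)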
SLIDE 24

Example

SLIDE 25

Example

SLIDE 26

Supervised learning

SLIDE 27

Math formulation

  • Given training data (𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛 i.i.d. from some unknown distribution 𝐷
  • Find 𝑦 = 𝑓(𝑥) ∈ 𝓗 using the training data
  • s.t. 𝑓 is correct on test data i.i.d. from distribution 𝐷
  • If the label 𝑦 is discrete: classification
  • If the label 𝑦 is continuous: regression
SLIDE 28

k-nearest-neighbor (kNN)

  • 1NN for little green men:

[Figure: 1NN decision boundary for the little green men example]
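As a minimal sketch of the 1NN rule that produces such a decision boundary (predict the label of the single closest training point); the toy features and labels are assumptions:

    import numpy as np

    def predict_1nn(X_train, y_train, x):
        """Return the label of the training point closest to x (L2 distance)."""
        i = np.argmin(np.linalg.norm(X_train - x, axis=1))
        return y_train[i]

    # Toy usage (data assumed): two points of class 0, one of class 1.
    X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.0]])
    y_train = np.array([0, 0, 1])
    print(predict_1nn(X_train, y_train, np.array([1.1, 1.0])))  # -> 0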

SLIDE 29

Math formulation

  • Given training data (𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛 i.i.d. from distribution 𝐷
  • Find 𝑦 = 𝑓(𝑥) ∈ 𝓗 that minimizes the empirical loss

L̂(𝑓) = (1/𝑛) Σ𝑖=1..𝑛 ℓ(𝑓, 𝑥𝑖, 𝑦𝑖)

  • s.t. the expected loss is small

𝐿(𝑓) = 𝔼(𝑥,𝑦)∼𝐷[ℓ(𝑓, 𝑥, 𝑦)]

  • Examples of loss functions (illustrated in the sketch below):
  • 0-1 loss for classification: ℓ(𝑓, 𝑥, 𝑦) = 𝕀[𝑓(𝑥) ≠ 𝑦] and 𝐿(𝑓) = Pr[𝑓(𝑥) ≠ 𝑦]
  • ℓ2 loss for regression: ℓ(𝑓, 𝑥, 𝑦) = [𝑓(𝑥) − 𝑦]² and 𝐿(𝑓) = 𝔼[𝑓(𝑥) − 𝑦]²
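A small NumPy illustration of the two empirical losses above, on made-up predictions and labels:

    import numpy as np

    # 0-1 loss: the empirical risk is simply the misclassification rate.
    y_true = np.array([1, 0, 1, 1])   # toy labels (assumed)
    y_pred = np.array([1, 1, 1, 0])   # toy classifier outputs (assumed)
    print(np.mean(y_pred != y_true))  # 0.5

    # l2 loss: the empirical risk is the mean squared error.
    z_true = np.array([1.0, 2.0, 3.0])  # toy regression targets (assumed)
    z_pred = np.array([1.5, 2.0, 2.0])  # toy predictions (assumed)
    print(np.mean((z_pred - z_true) ** 2))  # (0.25 + 0 + 1) / 3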
SLIDE 30

Maximum Likelihood Estimation (MLE)

  • Given training data (𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛 i.i.d. from distribution 𝐷
  • Let {𝑃𝜃(𝑥, 𝑦) : 𝜃 ∈ Θ} be a family of distributions indexed by 𝜃
  • MLE: negative log-likelihood loss

𝜃ML = argmax𝜃∈Θ Σ𝑖 log 𝑃𝜃(𝑥𝑖, 𝑦𝑖)
ℓ(𝑃𝜃, 𝑥𝑖, 𝑦𝑖) = − log 𝑃𝜃(𝑥𝑖, 𝑦𝑖)
L̂(𝑃𝜃) = − Σ𝑖 log 𝑃𝜃(𝑥𝑖, 𝑦𝑖)
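As a concrete instance, a minimal sketch of MLE for a Bernoulli family on toy coin-flip data. The data and the grid search are assumptions; for Bernoulli the maximizer is known in closed form (the sample mean), which the grid search recovers:

    import numpy as np

    x = np.array([1, 0, 1, 1, 0, 1])  # toy coin flips (assumed data)

    def nll(theta, x):
        """Negative log-likelihood of Bernoulli(theta) on the sample x."""
        return -np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

    # Minimize the negative log-likelihood over a grid of candidate parameters.
    thetas = np.linspace(0.01, 0.99, 99)
    theta_ml = thetas[np.argmin([nll(t, x) for t in thetas])]
    print(theta_ml, x.mean())  # both close to 4/6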

SLIDE 31

MLE: conditional log-likelihood

  • Given training data (𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛 i.i.d. from distribution 𝐷
  • Let {𝑃𝜃(𝑦 | 𝑥) : 𝜃 ∈ Θ} be a family of conditional distributions indexed by 𝜃
  • MLE: negative conditional log-likelihood loss

𝜃ML = argmax𝜃∈Θ Σ𝑖 log 𝑃𝜃(𝑦𝑖 | 𝑥𝑖)
ℓ(𝑃𝜃, 𝑥𝑖, 𝑦𝑖) = − log 𝑃𝜃(𝑦𝑖 | 𝑥𝑖)
L̂(𝑃𝜃) = − Σ𝑖 log 𝑃𝜃(𝑦𝑖 | 𝑥𝑖)

We only care about predicting y from x; we do not need to model p(x)

SLIDE 32

Linear regression with regularization: Ridge regression

  • Given training data (𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛 i.i.d. from distribution 𝐷
  • Find 𝑓𝑤(𝑥) = 𝑤ᵀ𝑥 that minimizes

L̂(𝑓𝑤) = (1/𝑛) ‖𝑋𝑤 − 𝑦‖₂²

  • By setting the gradient to zero, we have

𝑤 = (𝑋ᵀ𝑋)⁻¹ 𝑋ᵀ𝑦

ℓ2 loss: Normal + MLE
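A quick NumPy check of this closed form on synthetic data. The data are assumed, and np.linalg.solve is used rather than an explicit matrix inverse for numerical stability:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 3
    X = rng.normal(size=(n, d))                 # toy design matrix (assumed)
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=n)   # targets with Gaussian noise

    # Least-squares solution from setting the gradient to zero:
    # w = (X^T X)^{-1} X^T y
    w = np.linalg.solve(X.T @ X, X.T @ y)
    print(w)  # close to [1.0, -2.0, 0.5]

    # The ridge version named in the slide title adds a regularizer lam >= 0:
    # w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)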

SLIDE 33

Linear classification: logistic regression

  • Given training data (𝑥𝑖, 𝑦𝑖): 1 ≤ 𝑖 ≤ 𝑛 i.i.d. from distribution 𝐷
  • Assume

𝑃𝑤(𝑦 = 1 | 𝑥) = 𝜎(𝑤ᵀ𝑥) = 1 / (1 + exp(−𝑤ᵀ𝑥))
𝑃𝑤(𝑦 = 0 | 𝑥) = 1 − 𝑃𝑤(𝑦 = 1 | 𝑥) = 1 − 𝜎(𝑤ᵀ𝑥)

  • Find 𝑤 that minimizes

L̂(𝑤) = −(1/𝑛) Σ𝑖=1..𝑛 log 𝑃𝑤(𝑦𝑖 | 𝑥𝑖)
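A minimal gradient-descent sketch for this objective. The learning rate, iteration count, and toy data are assumptions, and the gradient Xᵀ(σ(Xw) − y)/n follows from differentiating the negative log-likelihood, a step not shown on the slides:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit_logistic(X, y, lr=0.5, iters=2000):
        """Minimize the average negative log-likelihood by gradient descent."""
        w = np.zeros(X.shape[1])
        for _ in range(iters):
            p = sigmoid(X @ w)                  # P_w(y = 1 | x) for each row
            w -= lr * (X.T @ (p - y)) / len(y)  # gradient step on the NLL
        return w

    # Toy usage (data assumed): 1-D points with a bias column.
    X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
    y = np.array([0.0, 0.0, 1.0, 1.0])
    w = fit_logistic(X, y)
    print(sigmoid(X @ w))  # low probabilities for class 0, high for class 1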

SLIDE 34

Example

SLIDE 35

Example