  1. Logic and machine learning review CS 540 Yingyu Liang

  2. Propositional logic

  3. Logic
     • If the rules of the world are presented formally, then a decision maker can use logical reasoning to make rational decisions.
     • Several types of logic:
       ▪ propositional logic (Boolean logic)
       ▪ first-order logic (first-order predicate calculus)
     • A logic includes:
       ▪ syntax: what is a correctly formed sentence
       ▪ semantics: what is the meaning of a sentence
       ▪ inference procedure (reasoning, entailment): what sentences logically follow from the given knowledge

  4. Propositional logic syntax
     • BNF (Backus-Naur Form) grammar of propositional logic:
       Sentence        → AtomicSentence | ComplexSentence
       AtomicSentence  → True | False | Symbol
       Symbol          → P | Q | R | ...
       ComplexSentence → ¬ Sentence
                       | ( Sentence ∧ Sentence )
                       | ( Sentence ∨ Sentence )
                       | ( Sentence ⇒ Sentence )
                       | ( Sentence ⇔ Sentence )
     • Example: ((P ∧ ((True ∨ R) ⇔ Q)) ⇒ ¬S) is well formed; P ∧ Q) ∨ ⇒ ¬S) is not well formed.
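
The grammar above maps naturally onto a small recursive evaluator. A minimal sketch (not from the slides): sentences are nested tuples whose first element names the connective, and a model assigns truth values to the proposition symbols.

```python
# Minimal sketch (not from the slides): propositional sentences as nested tuples,
# evaluated under a model that maps symbols to truth values.

def pl_true(sentence, model):
    """Evaluate a propositional sentence under `model` (dict: symbol -> bool)."""
    if sentence is True or sentence is False:          # atomic sentence: True / False
        return sentence
    if isinstance(sentence, str):                      # atomic sentence: proposition symbol
        return model[sentence]
    op, *args = sentence                               # complex sentence: (connective, ...)
    if op == "not":
        return not pl_true(args[0], model)
    if op == "and":
        return pl_true(args[0], model) and pl_true(args[1], model)
    if op == "or":
        return pl_true(args[0], model) or pl_true(args[1], model)
    if op == "implies":
        return (not pl_true(args[0], model)) or pl_true(args[1], model)
    if op == "iff":
        return pl_true(args[0], model) == pl_true(args[1], model)
    raise ValueError(f"unknown connective: {op}")

# ((P ∧ ((True ∨ R) ⇔ Q)) ⇒ ¬S)
s = ("implies", ("and", "P", ("iff", ("or", True, "R"), "Q")), ("not", "S"))
print(pl_true(s, {"P": True, "Q": True, "R": False, "S": False}))   # True
```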

  5. Summary
     • Interpretation, semantics, knowledge base
     • Entailment
     • Model checking
     • Inference, soundness, completeness
     • Inference methods
     • Sound inference, proof
     • Resolution, CNF
     • Chaining with Horn clauses, forward/backward chaining (see the sketch below)
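
As a concrete illustration of the last item, here is a minimal sketch (not from the slides) of forward chaining over propositional Horn clauses: each rule is a (premises, conclusion) pair, and known facts are propagated until a fixed point is reached. The rule set and symbols are illustrative.

```python
# Minimal sketch (not from the slides) of forward chaining over propositional Horn clauses.
# Each rule is (premises, conclusion); facts are atomic symbols known to be true.

def forward_chain(rules, facts, query):
    """Return True if `query` is entailed by the Horn-clause knowledge base."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)      # all premises hold, so add the conclusion
                changed = True
                if conclusion == query:
                    return True
    return query in known

rules = [({"P", "Q"}, "R"), ({"R"}, "S")]
print(forward_chain(rules, facts={"P", "Q"}, query="S"))   # True
```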

  6. Example

  7. First order logic

  8. FOL syntax summary
     • Short summary so far:
       ▪ Constants: Bob, 2, Madison, …
       ▪ Variables: x, y, a, b, c, …
       ▪ Functions: Income, Address, Sqrt, …
       ▪ Predicates: Teacher, Sisters, Even, Prime, …
       ▪ Connectives: ∧ ∨ ¬ ⇒ ⇔
       ▪ Equality: =
       ▪ Quantifiers: ∀ ∃

  9. More summary
     • Term: a constant, variable, or function applied to terms; denotes an object. (A ground term has no variables.)
     • Atom: the smallest expression that is assigned a truth value; a predicate applied to terms, or = between terms.
     • Sentence: an atom, a sentence joined by connectives, or a sentence with quantifiers; assigned a truth value.
     • Well-formed formula (wff): a sentence in which all variables are quantified.

  10. Example

  11. Example

  12. Machine learning basics

  13. What is machine learning? • “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell, Machine Learning, 1997)

  14. Example 1: image classification
      • Task: determine whether an image is indoor or outdoor
      • Performance measure: probability of misclassification

  15. Example 1: image classification
      • Experience/data: images with labels (indoor or outdoor)

  16. Example 1: image classification
      • A few terms:
        ▪ Training data: the images given for learning
        ▪ Test data: the images to be classified
        ▪ Binary classification: classification into two classes

  17. Example 2: clustering images
      • Task: partition the images into 2 groups
      • Performance: similarity within groups
      • Data: a set of images

  18. Example 2: clustering images
      • A few terms:
        ▪ Unlabeled data vs. labeled data
        ▪ Supervised learning vs. unsupervised learning

  19. Unsupervised learning

  20. Unsupervised learning
      • Training sample: $x_1, x_2, \ldots, x_n$
      • No teacher providing supervision as to how individual instances should be handled
      • Common tasks:
        ▪ clustering: separate the $n$ instances into groups
        ▪ novelty detection: find instances that are very different from the rest
        ▪ dimensionality reduction: represent each instance with a lower-dimensional feature vector while maintaining key characteristics of the training samples

  21. Clustering
      • Group the training sample into $k$ clusters
      • How many clusters do you see?
      • Many clustering algorithms:
        ▪ HAC (hierarchical agglomerative clustering)
        ▪ k-means
        ▪ …

  22. Hierarchical Agglomerative Clustering • Euclidean (L2) distance
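
A minimal sketch (not from the slides) of agglomerative clustering with Euclidean distance, using SciPy's hierarchy module; the linkage method, toy data, and number of clusters are illustrative choices.

```python
# Minimal sketch (not from the slides): HAC with Euclidean (L2) distance via SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
Z = linkage(X, method="single", metric="euclidean")  # repeatedly merge the two closest clusters
labels = fcluster(Z, t=2, criterion="maxclust")      # cut the resulting tree into 2 clusters
print(labels)                                        # e.g. [1 1 2 2]
```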

  23. K-means algorithm
      • Input: $x_1, \ldots, x_n$ and the number of clusters $k$
      • Step 1: select $k$ cluster centers $c_1, \ldots, c_k$
      • Step 2: for each point $x$, determine its cluster by finding the closest center in Euclidean distance
      • Step 3: update every cluster center to the centroid of its cluster: $c_i = \frac{\sum_{x \in \text{cluster } i} x}{|\text{cluster } i|}$
      • Repeat steps 2 and 3 until the cluster centers no longer change
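
A minimal NumPy sketch of this loop (not from the slides; initialization by random sampling and the toy data are illustrative):

```python
# Minimal sketch (not from the slides) of the k-means loop described above.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # Step 1: pick k initial centers
    for _ in range(max_iter):
        # Step 2: assign each point to the closest center (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 3: move each center to the centroid of its cluster
        new_centers = np.array([X[assign == i].mean(axis=0) if np.any(assign == i)
                                else centers[i] for i in range(k)])
        if np.allclose(new_centers, centers):                # stop when centers no longer change
            break
        centers = new_centers
    return centers, assign

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.8]])
centers, assign = kmeans(X, k=2)
print(assign)   # e.g. [0 0 1 1]
```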

  24. Example

  25. Example

  26. Supervised learning

  27. Math formulation
      • Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ drawn i.i.d. from some unknown distribution $D$
      • Find $y = f(x) \in \mathcal{H}$ using the training data
      • s.t. $f$ is correct on test data drawn i.i.d. from the same distribution $D$
      • If the label $y$ is discrete: classification
      • If the label $y$ is continuous: regression

  28. k-nearest-neighbor (kNN)
      • 1NN for the little green men example (figure: decision boundary of the nearest-neighbor classifier)
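
A minimal sketch (not from the slides) of the 1NN rule; the toy points and the helper name `nn1_predict` are illustrative.

```python
# Minimal sketch (not from the slides) of 1-nearest-neighbor classification.
import numpy as np

def nn1_predict(X_train, y_train, x):
    """Label a query point with the label of its nearest training point (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[dists.argmin()]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
y_train = np.array([0, 0, 1])
print(nn1_predict(X_train, y_train, np.array([4.5, 4.0])))   # 1
```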

  29. Math formulation
      • Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ drawn i.i.d. from distribution $D$
      • Find $y = f(x) \in \mathcal{H}$ that minimizes the empirical loss $\hat{L}(f) = \frac{1}{n}\sum_{i=1}^{n} \ell(f, x_i, y_i)$
      • s.t. the expected loss $L(f) = \mathbb{E}_{(x,y)\sim D}[\ell(f, x, y)]$ is small
      • Examples of loss functions:
        ▪ 0-1 loss for classification: $\ell(f, x, y) = \mathbb{I}[f(x) \ne y]$, so $L(f) = \Pr[f(x) \ne y]$
        ▪ $\ell_2$ loss for regression: $\ell(f, x, y) = (f(x) - y)^2$, so $L(f) = \mathbb{E}[(f(x) - y)^2]$
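
As a quick illustration of these two losses (not from the slides; the arrays are made-up predictions), the empirical loss is just the average loss over the training sample:

```python
# Minimal sketch (not from the slides): empirical risk under the two losses above.
import numpy as np

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([1, 1, 1, 0])            # predictions f(x_i) from some classifier
zero_one = np.mean(y_pred != y_true)       # empirical 0-1 loss (misclassification rate)
print(zero_one)                            # 0.5

z_true = np.array([1.0, 2.0, 3.0])
z_pred = np.array([1.1, 1.8, 3.5])         # predictions from some regressor
squared = np.mean((z_pred - z_true) ** 2)  # empirical l2 (squared) loss
print(squared)                             # 0.1
```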

  30. Maximum likelihood estimation (MLE)
      • Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ drawn i.i.d. from distribution $D$
      • Let $\{P_\theta(x, y) : \theta \in \Theta\}$ be a family of distributions indexed by $\theta$
      • MLE: $\theta_{ML} = \arg\max_{\theta \in \Theta} \sum_i \log P_\theta(x_i, y_i)$
      • Equivalently, the negative log-likelihood loss: $\ell(P_\theta, x_i, y_i) = -\log P_\theta(x_i, y_i)$, so $\hat{L}(P_\theta) = -\sum_i \log P_\theta(x_i, y_i)$
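
For intuition (not from the slides), here is MLE for a one-dimensional Gaussian: minimizing the negative log-likelihood over the mean and variance recovers the sample mean and the 1/n sample variance. The data values are illustrative.

```python
# Minimal sketch (not from the slides): MLE for a 1-D Gaussian via the negative
# log-likelihood; the closed-form answer is the sample mean and (1/n) sample variance.
import numpy as np

data = np.array([2.1, 1.9, 2.4, 2.0, 2.6])

def neg_log_likelihood(mu, sigma2, x):
    """Negative log-likelihood of data x under N(mu, sigma2)."""
    return np.sum(0.5 * np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / (2 * sigma2))

mu_ml = data.mean()        # closed-form MLE for the mean
sigma2_ml = data.var()     # closed-form MLE for the variance (1/n, not 1/(n-1))
print(mu_ml, sigma2_ml)
print(neg_log_likelihood(mu_ml, sigma2_ml, data))          # the minimum achievable NLL
print(neg_log_likelihood(mu_ml + 0.5, sigma2_ml, data))    # any other mu gives a larger NLL
```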

  31. MLE: conditional log-likelihood
      • Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ drawn i.i.d. from distribution $D$
      • Let $\{P_\theta(y \mid x) : \theta \in \Theta\}$ be a family of conditional distributions indexed by $\theta$
      • We only care about predicting $y$ from $x$; we do not care about $p(x)$
      • MLE with the negative conditional log-likelihood loss: $\theta_{ML} = \arg\max_{\theta \in \Theta} \sum_i \log P_\theta(y_i \mid x_i)$, with $\ell(P_\theta, x_i, y_i) = -\log P_\theta(y_i \mid x_i)$ and $\hat{L}(P_\theta) = -\sum_i \log P_\theta(y_i \mid x_i)$

  32. Linear regression with regularization: ridge regression
      • Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ drawn i.i.d. from distribution $D$
      • Find $f_w(x) = w^T x$ that minimizes $\hat{L}(f_w) = \frac{1}{n}\|Xw - y\|_2^2$
      • Setting the gradient to zero gives $w = (X^T X)^{-1} X^T y$
      • $\ell_2$ loss: Normal + MLE
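
A minimal NumPy check of the closed form above (not from the slides; the synthetic data and `w_true` are illustrative):

```python
# Minimal sketch (not from the slides): least-squares weights via the normal equations.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                    # design matrix, one row per example
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.normal(size=50)     # targets with a little noise

w = np.linalg.solve(X.T @ X, X.T @ y)           # w = (X^T X)^{-1} X^T y (solve, don't invert)
print(w)                                        # close to w_true
```

In practice `np.linalg.lstsq(X, y, rcond=None)` is the numerically safer way to get the same solution.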

  33. Linear classification: logistic regression
      • Given training data $\{(x_i, y_i) : 1 \le i \le n\}$ drawn i.i.d. from distribution $D$
      • Assume $P_w(y = 1 \mid x) = \sigma(w^T x) = \frac{1}{1 + \exp(-w^T x)}$ and $P_w(y = 0 \mid x) = 1 - \sigma(w^T x)$
      • Find $w$ that minimizes $\hat{L}(w) = -\frac{1}{n}\sum_{i=1}^{n} \log P_w(y_i \mid x_i)$
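
This is the conditional MLE of slide 31 with a Bernoulli model for the label. A minimal sketch (not from the slides) that minimizes the loss above by plain gradient descent; the learning rate, toy data, and helper names are illustrative.

```python
# Minimal sketch (not from the slides): logistic regression trained by gradient
# descent on the negative conditional log-likelihood above.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Minimize -1/n * sum_i log P_w(y_i | x_i) with plain gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = sigmoid(X @ w)          # P_w(y=1 | x_i) for every example
        grad = X.T @ (p - y) / n    # gradient of the average negative log-likelihood
        w -= lr * grad
    return w

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, -0.5], [1.0, -1.5]])   # first column = bias feature
y = np.array([1, 1, 0, 0])
w = fit_logistic(X, y)
print(sigmoid(X @ w))   # close to [1, 1, 0, 0]
```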

  34. Example

  35. Example
