
Machine Learning Algorithms for Classification - PowerPoint PPT Presentation



  1. Machine Learning Algorithms for Classification
     Rob Schapire, Princeton University
     www.cs.princeton.edu/~schapire

  2. Machine Learning
     • studies how to automatically learn to make accurate predictions based on past observations
     • classification problems:
       • classify examples into given set of categories
     [diagram: labeled training examples → machine learning algorithm → prediction rule; new example → predicted classification]

  3. Examples of Classification Problems
     • text categorization
       • e.g.: spam filtering
       • e.g.: categorize news articles by topic
     • fraud detection
     • optical character recognition
     • natural-language processing
       • e.g.: part-of-speech tagging
       • e.g.: spoken language understanding
     • market segmentation
       • e.g.: predict if customer will respond to promotion
       • e.g.: predict if customer will switch to competitor
     • medical diagnosis
     . . .

  4. Why Use Machine Learning?
     • advantages:
       • often much more accurate than human-crafted rules (since data driven)
       • humans often incapable of expressing what they know (e.g., rules of English, or how to recognize letters), but can easily classify examples
       • don’t need a human expert or programmer
       • flexible — can apply to any learning task
       • cheap — can use in applications requiring many classifiers (e.g., one per customer, one per product, one per web page, ...)
     • disadvantages:
       • need a lot of labeled data
       • error prone — usually impossible to get perfect accuracy

  5. Machine Learning Algorithms
     • this talk:
       • decision trees
       • boosting
       • support-vector machines
       • neural networks
     • others not covered:
       • nearest neighbor algorithms
       • Naive Bayes
       • bagging
     . . .

  6. Decision Trees

  7. Example: Good versus Evil
     • problem: identify people as good or bad from their appearance

     training data:
                  sex     mask  cape  tie  ears  smokes  class
       batman     male    yes   yes   no   yes   no      Good
       robin      male    yes   yes   no   no    no      Good
       alfred     male    no    no    yes  no    no      Good
       penguin    male    no    no    yes  no    yes     Bad
       catwoman   female  yes   no    no   yes   no      Bad
       joker      male    no    no    no   no    no      Bad

     test data:
       batgirl    female  yes   yes   no   yes   no      ??
       riddler    male    yes   no    no   no    no      ??

  8. Example (cont.)
     • a decision tree fit to the training data:

       tie?
        ├─ yes → smokes?
        │         ├─ yes → Bad
        │         └─ no  → Good
        └─ no  → cape?
                  ├─ yes → Good
                  └─ no  → Bad
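The tree on this slide can be encoded and applied directly; a minimal Python sketch (the attribute names and tree structure come from the slides, while the nested-dict encoding is an illustrative choice):

```python
# Decision tree from slide 8, encoded as nested dicts: an internal node
# maps an attribute name to per-value subtrees; a leaf is the class label.
TREE = {
    "tie": {
        "yes": {"smokes": {"yes": "Bad", "no": "Good"}},
        "no": {"cape": {"yes": "Good", "no": "Bad"}},
    }
}

def classify(tree, example):
    """Walk the tree until a leaf (a plain string label) is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

# The two test examples from slide 7.
batgirl = {"sex": "female", "mask": "yes", "cape": "yes",
           "tie": "no", "ears": "yes", "smokes": "no"}
riddler = {"sex": "male", "mask": "yes", "cape": "no",
           "tie": "no", "ears": "no", "smokes": "no"}

print(classify(TREE, batgirl))  # Good
print(classify(TREE, riddler))  # Bad
```

Applied to the test rows, the tree labels batgirl Good (no tie, has a cape) and riddler Bad (no tie, no cape).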

  9. How to Build Decision Trees
     • choose rule to split on
     • divide data using splitting rule into disjoint subsets
     • repeat recursively for each subset
     • stop when leaves are (almost) “pure”
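The recursive procedure above can be sketched as a greedy builder; a minimal pure-Python version using Gini impurity as the splitting criterion (function and variable names are illustrative, and on six examples several splits are nearly tied, so the tree it builds may differ from the one drawn on slide 8):

```python
# Greedy decision-tree builder: split on the attribute whose partition
# has the lowest weighted Gini impurity, recurse, stop at pure leaves.

def gini(labels):
    """Gini impurity p+ * p- of a list of Good/Bad labels."""
    if not labels:
        return 0.0
    p = sum(1 for y in labels if y == "Good") / len(labels)
    return p * (1 - p)

def build(rows, labels, attributes):
    if len(set(labels)) <= 1 or not attributes:
        return max(set(labels), key=labels.count)  # leaf: majority class
    def weighted(attr):  # impurity of the partition induced by attr
        return sum(
            gini([y for r, y in zip(rows, labels) if r[attr] == v])
            * sum(1 for r in rows if r[attr] == v) / len(rows)
            for v in {r[attr] for r in rows}
        )
    best = min(attributes, key=weighted)
    rest = [a for a in attributes if a != best]
    return {best: {
        v: build([r for r in rows if r[best] == v],
                 [y for r, y in zip(rows, labels) if r[best] == v],
                 rest)
        for v in {r[best] for r in rows}
    }}

def classify(tree, example):
    while isinstance(tree, dict):
        attr, branches = next(iter(tree.items()))
        tree = branches[example[attr]]
    return tree

# Training data from slide 7 (batman, robin, alfred, penguin, catwoman, joker).
ATTRS = ["sex", "mask", "cape", "tie", "ears", "smokes"]
ROWS = [dict(zip(ATTRS, v)) for v in [
    ("male", "yes", "yes", "no", "yes", "no"),
    ("male", "yes", "yes", "no", "no", "no"),
    ("male", "no", "no", "yes", "no", "no"),
    ("male", "no", "no", "yes", "no", "yes"),
    ("female", "yes", "no", "no", "yes", "no"),
    ("male", "no", "no", "no", "no", "no"),
]]
LABELS = ["Good", "Good", "Good", "Bad", "Bad", "Bad"]

tree = build(ROWS, LABELS, ATTRS)
assert all(classify(tree, r) == y for r, y in zip(ROWS, LABELS))
```

The recursion bottoms out exactly as the slide says: once a subset is pure (or no attributes remain), it becomes a leaf.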

  10. Choosing the Splitting Rule
     • choose rule that leads to greatest increase in “purity”
     [figure: a candidate split dividing a mixed set of positive and negative examples into purer subsets]

  11. Choosing the Splitting Rule (cont.)
     • (im)purity measures:
       • entropy: −p₊ ln p₊ − p₋ ln p₋
       • Gini index: p₊ p₋
     where p₊ / p₋ = fraction of positive / negative examples (p₋ = 1 − p₊)
     [figure: impurity as a function of p₊ on [0, 1], zero at the endpoints and maximal at p₊ = 1/2]
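Both measures are one-liners as functions of p₊; a quick sketch (natural log, matching the slide's entropy formula):

```python
import math

def entropy(p_pos):
    """Entropy -p+ ln p+ - p- ln p- as a function of p+ (natural log)."""
    if p_pos in (0, 1):
        return 0.0  # a pure node has zero impurity
    p_neg = 1 - p_pos
    return -p_pos * math.log(p_pos) - p_neg * math.log(p_neg)

def gini(p_pos):
    """Gini index p+ * p-."""
    return p_pos * (1 - p_pos)

# Both are zero for pure nodes and maximal at p+ = 1/2.
print(gini(0.5))     # 0.25
print(entropy(0.5))  # ln 2 ≈ 0.6931
```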

  12. Kinds of Error Rates
     • training error = fraction of training examples misclassified
     • test error = fraction of test examples misclassified
     • generalization error = probability of misclassifying new random example
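Training and test error are the same computation applied to different example sets; generalization error, by contrast, can only be estimated, with test error as its usual proxy. A trivial sketch with made-up labels:

```python
def error_rate(predictions, labels):
    """Fraction of examples misclassified. Applied to the training set this
    is training error; applied to a held-out test set it is test error."""
    mistakes = sum(1 for p, y in zip(predictions, labels) if p != y)
    return mistakes / len(labels)

print(error_rate(["Good", "Bad", "Bad"], ["Good", "Good", "Bad"]))  # 1/3
```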

  13. Tree Size versus Accuracy
     [figure: training and test error (%) versus tree size; training error keeps falling as the tree grows, while test error falls and then rises again]
     • trees must be big enough to fit training data (so that “true” patterns are fully captured)
     • BUT: trees that are too big may overfit (capture noise or spurious patterns in the data)
     • significant problem: can’t tell best tree size from training error

  14. Overfitting Example
     • fitting points with a polynomial
     [figure: underfit (degree = 1), ideal fit (degree = 3), overfit (degree = 20)]
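The underfit/ideal-fit contrast can be reproduced numerically. A small pure-Python least-squares sketch (the sample data and helper names are illustrative; a real degree-20 fit would need a better-conditioned solver than plain normal equations):

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations
    (Vandermonde matrix, then Gaussian elimination with partial pivoting).
    Adequate for low degrees like the slide's degree 1 and 3 examples."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y, where A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            aty[r] -= f * aty[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(ata[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (aty[i] - s) / ata[i][i]
    return coeffs  # coeffs[k] multiplies x**k

# Points sampled exactly from y = x^3 - 2x + 1.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 3 - 2 * x + 1 for x in xs]
cubic = polyfit(xs, ys, 3)  # "ideal fit": recovers ≈ [1, -2, 0, 1]
line = polyfit(xs, ys, 1)   # "underfit": a line cannot follow the cubic
```

Fitting the right degree recovers the generating polynomial; a too-low degree leaves large residuals, and a far-too-high degree would chase noise instead of structure, which is exactly the overfitting the slide illustrates.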
