

  1. CSE 291D/234 Data Systems for Machine Learning Arun Kumar Topic 3: Feature Engineering and Model Selection Systems DL book; Chapters 8.2 and 8.3 of MLSys book 1

  2. Model Selection in the Lifecycle (lifecycle diagram stages: Data acquisition, Data preparation, Feature Engineering, Model Selection, Training & Inference, Serving, Monitoring) 2

  3. Model Selection in the Big Picture 3

  4. Outline ❖ Recap: Bias-Variance-Noise Decomposition ❖ The Model Selection Triple ❖ Feature Engineering ❖ Hyperparameter Tuning ❖ Algorithm/Architecture Selection ❖ Model Selection Systems ❖ Feature Engineering Systems ❖ Advanced Model Selection Systems Issues 4

  5. Bias-Variance-Noise Decomposition ❖ ML (Test) Error = Bias + Variance + Bayes Noise ❖ Figure annotations: complexity of model / discriminability of hypothesis space; Bayes noise example: x = (a,b,c); y = +1 vs. x = (a,b,c); y = -1 (identical features, conflicting labels) 5
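Written out for squared loss, this is the standard textbook form (assuming y = f(x) + ε with noise variance σ², and f̂ trained on a random sample D; the symbols are the usual ones, not from the slide):

```latex
\mathbb{E}_{D,\varepsilon}\!\left[(y - \hat{f}_D(x))^2\right]
  = \underbrace{\left(f(x) - \mathbb{E}_D[\hat{f}_D(x)]\right)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\left(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\right)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Bayes noise}}
```

The Bayes noise term σ² is irreducible: no hypothesis space can distinguish two examples with identical features but conflicting labels.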

  6. Hypothesis Space of Functions ❖ A trained ML model is a parametric prediction function: f : D_W × D_X → D_Y ❖ Hypothesis Space H: The set of all possible functions f that can be represented by a model ❖ Training: Picks one f from the hypo. space; needs an estimation procedure (e.g., optimization, greedy, etc.) ❖ Factors that determine the hypo. space: ❖ Feature representation ❖ Inductive bias of model ❖ Regularization 6

  7. Another View of Bias-Variance ❖ Bias arises because the hypo. space does not contain the “truth” ❖ Shrinking the hypo. space raises bias ❖ Variance arises due to the finite training sample ❖ Estimation only approximately reaches the truth ❖ Shrinking the hypo. space lowers variance 7

  8. 3 Ways to Control Learning/Accuracy ❖ Reduce Bayes Noise: ❖ Augment with new useful features from appl. ❖ Reduce Bias: ❖ Enhance hypo. space: derive different features; more complex model ❖ Reduce shrinkage (less regularization) ❖ Reduce Variance: ❖ Shrink hypo. space: derive different features; drop features; less complex model ❖ Enhance shrinkage (more regularization) 8
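The shrinkage knob on this slide can be made concrete with ridge regression: larger regularization shrinks the weight vector (a smaller effective hypothesis space), trading a little bias for lower variance. A minimal NumPy sketch on synthetic data (all names and values here are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
w_true = np.array([3.0, -2.0, 1.0, 0.0, 0.0])
y = X @ w_true + rng.normal(scale=0.5, size=n)

def ridge(X, y, lam):
    # Closed-form ridge regression: (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_low = ridge(X, y, lam=0.01)    # weak shrinkage: close to least squares
w_high = ridge(X, y, lam=100.0)  # strong shrinkage: smaller-norm weights

# More regularization shrinks the hypothesis space: the weight norm drops.
assert np.linalg.norm(w_high) < np.linalg.norm(w_low)
```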

  9. The Double Descent Phenomenon ❖ DL and some other ML families can get arbitrarily complex ❖ Can “memorize” entire training set ❖ Curiously, variance can drop after rising; bias goes to 0! ❖ “Interpolation regime” is open question in ML theory 9 https://arxiv.org/pdf/1812.11118.pdf

  10. Outline ❖ Recap: Bias-Variance-Noise Decomposition ❖ The Model Selection Triple ❖ Feature Engineering ❖ Hyperparameter Tuning ❖ Algorithm/Architecture Selection ❖ Model Selection Systems ❖ Feature Engineering Systems ❖ Advanced Model Selection Systems Issues 10

  11. Unpredictability of Model Selection ❖ Recall 3 ways to control ML accuracy: reduce bias, reduce variance, reduce Bayes noise ❖ Alas, the exact raises/drops in errors on given training task and sample are not predictable ❖ Need empirical comparisons of configurations on data ❖ Train-validation-test splits; cross-validation procedures 11
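The cross-validation procedure mentioned above can be sketched as a simple index-splitting loop (a minimal illustration; `kfold_indices` is my name, not an API from the course):

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

# Example: 5-fold CV over 100 examples; every example is used for
# validation exactly once and for training exactly k-1 times.
counts = np.zeros(100, dtype=int)
for train, val in kfold_indices(100, 5):
    assert len(train) + len(val) == 100
    counts[val] += 1
assert (counts == 1).all()
```

An empirical comparison of model configs would train each config on every `train` split and average its error over the `val` splits.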

  12. The Model Selection Triple ❖ The data scientist/AutoML procedure must steer 3 key activities to alter the Model Selection Triple (MST): 1. Feature Engineering (FE): What is/are the domain(s) of the hypo. space(s) to consider? 2. Algorithm/Architecture Selection (AS): What exact hypo. space to use (model type/ANN architecture)? 3. Hyper-parameter Tuning (HT): How to configure hypo. space shrinkage and estimation procedure approx.? 12 https://adalabucsd.github.io/papers/2015_MSMS_SIGMODRecord.pdf

  13. The Model Selection Triple ❖ The data scientist/AutoML procedure must steer 3 key activities to explore the Model Selection Triple (MST) FE1 AS1 HT1 Train and test model config(s) FE2 AS2 HT2 on ML system … … … Next iteration Post-process and consume results ❖ Stopping criterion is application-specific / user-specific on Pareto surface: time, cost, accuracy, tiredness (!), etc. 13 https://adalabucsd.github.io/papers/2015_MSMS_SIGMODRecord.pdf
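The iterate-over-MST-configs loop above can be caricatured in a few lines. Everything here is an illustrative stand-in: the axis values and the dummy `evaluate()` are mine; a real system would replace `evaluate()` with train-and-validate on an ML system.

```python
import itertools
import random

random.seed(0)

FE = ["raw", "raw+interactions"]   # feature engineering choices
AS = ["logreg", "tree"]            # algorithm/architecture choices
HT = [0.01, 0.1, 1.0]              # hyper-parameter choices (e.g., regularization)

def evaluate(fe, alg, hp):
    """Placeholder for training and testing one (FE, AS, HT) config."""
    return random.random()         # pretend validation error

# One sweep over the cross-product of MST triples.
results = {cfg: evaluate(*cfg) for cfg in itertools.product(FE, AS, HT)}
best_cfg = min(results, key=results.get)

# 2 x 2 x 3 = 12 model configs were trained and tested this iteration.
assert len(results) == 12
```

In practice the loop is iterative: the data scientist (or AutoML procedure) inspects the results, alters the three axes, and repeats until an application-specific stopping criterion is met.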

  14. Outline ❖ Recap: Bias-Variance-Noise Decomposition ❖ The Model Selection Triple ❖ Feature Engineering ❖ Hyperparameter Tuning ❖ Algorithm/Architecture Selection ❖ Model Selection Systems ❖ Feature Engineering Systems ❖ Advanced Model Selection Systems Issues 14

  15. Feature Engineering ❖ Process of converting prepared data into a feature vector representation for ML training/inference ❖ Aka feature extraction, representation extraction, etc. ❖ Activities vary based on data type; for tabular data: Joins and Group Bys, Temporal feature extraction, Feature interactions, Feature selection, Value recoding, Dimensionality reduction 15

  16. Feature Engineering ❖ Process of converting prepared data into a feature vector representation for ML training/inference ❖ Aka feature extraction, representation extraction, etc. ❖ Activities vary based on data type; for text, signals, and other unstructured data: Bag of words, N-grams, Parsing-based features, Signal processing-based features, Deep learning, Transfer learning 16
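The first two text featurizations listed above fit in a few lines of plain Python (a minimal sketch; `bag_of_words` is my name, and real pipelines add tokenization, stop-word handling, vocabularies, etc.):

```python
from collections import Counter

def bag_of_words(text, n=1):
    """Counts of word n-grams; n=1 gives bag of words, n=2 gives bigrams."""
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

doc = "data systems for machine learning systems"
assert bag_of_words(doc)["systems"] == 2           # unigram count
assert bag_of_words(doc, n=2)["machine learning"] == 1  # bigram count
```

Mapping these counts onto a fixed vocabulary index yields the feature vector consumed by ML training.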

  17. Outline ❖ Recap: Bias-Variance-Noise Decomposition ❖ The Model Selection Triple ❖ Feature Engineering ❖ Hyperparameter Tuning ❖ Algorithm/Architecture Selection ❖ Model Selection Systems ❖ Feature Engineering Systems ❖ Advanced Model Selection Systems Issues 17

  18. Hyperparameter Tuning ❖ Most ML models have hyper-parameter knobs, e.g.: learning rate, complexity, regularization, number of trees, max height/min split, dropout probability ❖ Most of them raise bias slightly but reduce variance more ❖ No hyp.par. settings are universally best for all tasks/data 18

  19. Hyperparameter Tuning ❖ Common methods to tune hyp.par. configs: Grid search vs. “Random” search https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf 19 http://gael-varoquaux.info/science/survey-of-machine-learning-experimental-methods-at-neurips2019-and-iclr2020.html
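Random search is a short loop: sample each knob independently (often log-uniformly) and keep the best config seen. A self-contained sketch where `loss()` is a toy stand-in for train-and-validate (the knob ranges and the loss are illustrative, not from the slides):

```python
import random

random.seed(0)

def loss(lr, reg):
    """Toy surrogate for validation error; minimum at lr=0.1, reg=1.0."""
    return (lr - 0.1) ** 2 + (reg - 1.0) ** 2

best, best_loss = None, float("inf")
for _ in range(100):
    cfg = (10 ** random.uniform(-4, 0),   # log-uniform learning rate in [1e-4, 1]
           10 ** random.uniform(-2, 2))   # log-uniform regularization in [1e-2, 1e2]
    l = loss(*cfg)
    if l < best_loss:
        best, best_loss = cfg, l

assert best_loss < 2.0
```

Because each knob is sampled independently, 100 trials cover 100 distinct values per knob, whereas a 10x10 grid covers only 10; this is the main argument of the Bergstra-Bengio paper linked above.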

  20. Hyperband ❖ An automated ML (AutoML) procedure for tuning hyp.par. ❖ Basic Idea: For iterative procedures (e.g., SGD), stop non-promising hyp.par. configs at earlier epochs ❖ Based on the multi-armed bandit idea from gambling/RL ❖ Benefits: ❖ Reapportioning resources with early stopping may help reach better overall accuracy sooner ❖ Total resource use may be lower vs. grid/random search ❖ 2 knobs as input: ❖ R: Max budget per config (e.g., # SGD epochs) ❖ η: Stop rate for configs 20 https://arxiv.org/pdf/1603.06560.pdf

  21. Hyperband ❖ Brackets: independent trials, akin to random search ❖ Within a bracket, configs are winnowed over epochs: survival of the fittest! 21 https://arxiv.org/pdf/1603.06560.pdf

  22. Hyperband ❖ Example: R = 81; η = 3 ❖ n_i: # hyp.par. configs run; r_i: # epochs per config ❖ Still not as popular as grid/random search; the latter are simpler and easier to use (e.g., how to set R and η?) 22 https://arxiv.org/pdf/1603.06560.pdf
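The (n_i, r_i) table for this slide's example can be reproduced directly from R and η using the bracket formulas in the Hyperband paper (a sketch assuming R is a power of η so epoch counts come out as integers):

```python
from math import ceil, floor, log

def hyperband_schedule(R=81, eta=3):
    """Per-bracket Hyperband schedules for max budget R and stop rate eta.

    Returns {s: [(n_i, r_i), ...]}: bracket s starts with n_0 configs, runs
    each for r_0 epochs, keeps roughly the best 1/eta, and repeats.
    """
    s_max = floor(log(R) / log(eta))
    B = (s_max + 1) * R                        # total budget per bracket
    brackets = {}
    for s in range(s_max, -1, -1):
        n = ceil((B / R) * eta ** s / (s + 1))  # initial # configs in bracket s
        brackets[s] = [(n // eta ** i, R * eta ** i // eta ** s)
                       for i in range(s + 1)]
    return brackets

sched = hyperband_schedule(R=81, eta=3)
# Most aggressive bracket: 81 configs x 1 epoch, keep the best third, repeat.
assert sched[4] == [(81, 1), (27, 3), (9, 9), (3, 27), (1, 81)]
# Most conservative bracket: just run 5 configs to the full budget.
assert sched[0] == [(5, 81)]
```

Running the brackets from s = s_max down to s = 0 hedges between aggressive early stopping and plain random search with full budgets.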

  23. Review Zoom Poll 23

  24. Outline ❖ Recap: Bias-Variance-Noise Decomposition ❖ The Model Selection Triple ❖ Feature Engineering ❖ Hyperparameter Tuning ❖ Algorithm/Architecture Selection ❖ Model Selection Systems ❖ Feature Engineering Systems ❖ Advanced Model Selection Systems Issues 24

  25. Algorithm Selection ❖ Basic Goal: AutoML procedure to pick among a set of interchangeable models (hyp.par. tuning included) ❖ Automate a data scientist’s intuition on feature preprocessing, missing values, hyp.par. tuning, etc. ❖ Many heuristics: AutoWeka, AutoSKLearn, DataRobot, etc. AutoWeka 25 https://www.cs.ubc.ca/labs/beta/Projects/autoweka/papers/autoweka.pdf

  26. Algorithm Selection ❖ AutoSKLearn uses a sequential Bayesian optimization approach 26 http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf
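The sequential idea can be illustrated with a minimal 1-D Bayesian optimization loop: fit a Gaussian-process surrogate to the configs tried so far, then pick the next config by a confidence-bound acquisition. This is a generic NumPy sketch of the technique, not AutoSKLearn's actual implementation (which builds on SMAC, a random-forest-based variant); the objective and all constants are illustrative.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """RBF kernel between two 1-D arrays of points."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-6):
    """GP posterior mean and variance at test points Xs given data (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    Kinv_y = np.linalg.solve(K, y)
    mu = Ks.T @ Kinv_y
    var = np.diag(rbf(Xs, Xs) - Ks.T @ np.linalg.solve(K, Ks))
    return mu, np.maximum(var, 1e-12)

def objective(x):
    """Pretend validation error; minimum near x = 0.7."""
    return (x - 0.7) ** 2

grid = np.linspace(0, 1, 201)
X = np.array([0.1, 0.9])            # two initial configs
y = objective(X)
for _ in range(10):
    mu, var = gp_posterior(X, y, grid)
    lcb = mu - 2.0 * np.sqrt(var)   # lower confidence bound (we minimize)
    x_next = grid[np.argmin(lcb)]   # most promising config to try next
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

# After a few sequential trials, the best config found is near the optimum.
assert y.min() < 0.01
```

The key contrast with grid/random search is that each new trial is chosen using all previous results, so promising regions of the hyperparameter space get sampled more densely.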

  27. NAS and AutoKeras ❖ DL NCG arch. akin to model family in classical ML ❖ Some AutoML tools aim to automate NCG design too ❖ Google’s NAS uses RL to construct and evaluate NCGs; AutoKeras uses Bayesian optimization and has an optimized impl. ❖ Not that popular in practice; compute-intensive; hard to debug https://arxiv.org/pdf/1611.01578.pdf 27 https://arxiv.org/pdf/1806.10282.pdf

  28. Outline ❖ Recap: Bias-Variance-Noise Decomposition ❖ The Model Selection Triple ❖ Feature Engineering ❖ Hyperparameter Tuning ❖ Algorithm/Architecture Selection ❖ Model Selection Systems ❖ Feature Engineering Systems ❖ Advanced Model Selection Systems Issues 28
