CSE 291D/234 Data Systems for Machine Learning
Arun Kumar
Topic 3: Feature Engineering and Model Selection Systems
DL book; Chapters 8.2 and 8.3 of MLSys book
Model Selection in the Lifecycle
[Figure labels: Feature Engineering, Data acquisition, Serving]
https://arxiv.org/pdf/1812.11118.pdf
https://adalabucsd.github.io/papers/2015_MSMS_SIGMODRecord.pdf
https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
http://gael-varoquaux.info/science/survey-of-machine-learning-experimental-methods-at-neurips2019-and-iclr2020.html
https://arxiv.org/pdf/1603.06560.pdf
https://www.cs.ubc.ca/labs/beta/Projects/autoweka/papers/autoweka.pdf
http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf
https://arxiv.org/pdf/1611.01578.pdf
https://arxiv.org/pdf/1806.10282.pdf
https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/
Data-parallel systems (RDBMS, Spark, PS, Horovod, etc.)
+ High data scalability via sharding
- BSP does not converge; mini-batch level has high communication costs
- Low throughput overall

Task-parallel systems (Dask, Hyperband, ASHA, Vizier, etc.)
+ High throughput model selection
+ Best accuracy from Sequential SGD
- Low data scalability; wastes space (copy) or network (remote read)
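The space side of this trade-off can be made concrete with a little arithmetic. The sketch below is only illustrative: the function names, the strategy labels "replicate" and "shard", and the 800 GB / 8-worker figures are hypothetical and do not come from the slides. It assumes that keeping a full copy on every worker costs the whole dataset per worker, while sharding splits the dataset evenly across workers.

```python
def per_worker_storage_gb(dataset_gb: float, num_workers: int, strategy: str) -> float:
    """Storage one worker needs for training data (hypothetical helper).

    "replicate": every worker holds a full copy of the dataset.
    "shard":     the dataset is split evenly across the workers.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    if strategy == "replicate":
        return dataset_gb
    if strategy == "shard":
        return dataset_gb / num_workers
    raise ValueError(f"unknown strategy: {strategy!r}")


def cluster_storage_gb(dataset_gb: float, num_workers: int, strategy: str) -> float:
    """Total storage across the cluster under the chosen strategy."""
    return per_worker_storage_gb(dataset_gb, num_workers, strategy) * num_workers


if __name__ == "__main__":
    # Illustrative numbers only: an 800 GB dataset on 8 workers.
    for strategy in ("replicate", "shard"):
        per_worker = per_worker_storage_gb(800.0, 8, strategy)
        total = cluster_storage_gb(800.0, 8, strategy)
        print(f"{strategy}: {per_worker} GB per worker, {total} GB in total")
```

Under these assumptions the replicated footprint grows linearly with the number of workers, while the sharded footprint stays at the dataset size. Reading the data remotely instead of copying it moves the same cost from storage to the network.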
[Figure: layered system architecture.
Layers: High-level Model Building APIs; Optimization and Scheduling Layer; Execution and Storage Layer.
Component labels: Transfer Learning, Ablation Analysis, Sequence Analysis, Feature Transfer, Hyperparameter Tuning, Architecture Search, Grouped Learning, Multi-task Batching, …, Model Hopper Parallelism (MOP), MOP Hybrids, Materialization and Memory Manager, AutoDiff and SGD Execution Scheduler, Direct Filesystem Access (EXT + NFS; HDFS), Cloud Native Dataflow Engines, EC2, EBS, Lambda, S3, Metadata Manager, Fault Tolerance and Elasticity Manager, CLIs, GUIs, Explanation Engine.]
https://determined.ai/
https://amplab.cs.berkeley.edu/wp-content/uploads/2017/01/ICDE_2017_CameraReady_475.pdf
Brand Tags Price
https://www.youtube.com/watch?v=r5aEkpEkDzI&feature=emb_title