Computer Systems Research Daniel A. Jiménez Department of Computer - PowerPoint PPT Presentation

Machine Learning in Computer Systems Research
Daniel A. Jiménez
Department of Computer Science & Engineering
Texas A&M University

What Is This?
- Improving computer systems with machine learning
- Computer systems:
  - Architecture / microarchitecture
  - Programming languages / compilers / run-time
  - Operating systems
- Machine learning:
  - Using data to build a model of some aspect of a system
  - Then using that model to improve the system
  - Could be online or offline

Many Areas Have Been Explored
- Cache partitioning
- Memory controllers
- Branch prediction
- Prefetchers
- Voltage scaling
- Predicting path profiles
- Improving GPU throughput
- Resource management
- Microprocessor design as a whole
- Code scheduling
- Code completion
- Malware detection
- Etc. etc. etc.

This talk
- Other work in static branch prediction
- Some of my and others’ work in dynamic branch prediction
- Some of my work in cache management
- Other work in cache management
- Other work in other areas
- Conclude

Branch Prediction (my favorite!)
- Branch prediction is a natural problem for machine learning
- Dynamic conditional branch prediction: based on binary inputs with a single output, billions of training pairs
- Static branch prediction: big training corpus of existing programs with profile information, many ways to analyze features
- Many examples in the literature

Static Branch Prediction
- Calder et al., Corpus-based Static Branch Prediction, PLDI 1995
- State-of-the-art heuristics (Ball and Larus, PLDI 1993) got ~25% misprediction rate
- Calder et al. improved to ~20% misprediction rate
- Used neural networks and a large corpus of programs
- Features included control-flow idioms, opcodes, etc.
- Their TOPLAS 1997 article also used decision trees
- They used a few features and simple FFNNs. What would be today’s approach?

Predicting Path Profiles
- Zekany et al., CrystalBall: Statically Analyzing Runtime Behavior via Deep Sequence Learning, MICRO 2016
- Uses deep learning to statically identify hot paths through a program
- Output is a probable sequence of basic blocks
- Problem maps well to a recurrent neural network
- They show improvement over state-of-the-art heuristics

Dynamic Branch Prediction
- Jiménez & Lin, Dynamic Branch Prediction with Perceptrons, HPCA 2001
- We propose using neural learning in the branch predictor
- Simple perceptrons (individual neurons) have good accuracy
- Latency was addressed in subsequent research:
  - Jiménez in MICRO 2003, ISCA 2005
  - Seznec’s O-GEHL in CBP, 2004
  - Tarjan and Skadron’s hashed perceptron in TACO 2005
  - Loh and Jiménez, WCED 2005
  - Etc.
- Now it’s in processors from AMD, SPARC, and Samsung

Branch-Predicting Neuron
- Inputs (x’s) are from branch outcome history: taken or not taken
- n + 1 small integer weights (w’s) learned by on-line training
- Output (y) is dot product of x’s and w’s; predict taken if y ≥ 0
- Training finds correlations between history and outcome
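The neuron described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the hardware design: it assumes history bits are encoded as +1 (taken) and -1 (not taken), that weight 0 acts as a bias with a constant input of 1, and that training happens on a misprediction or when |y| is at most a threshold. The names `PerceptronPredictor`, `theta`, `weight_limit`, and `run_trace` are hypothetical, and the parameter values are arbitrary.

```python
class PerceptronPredictor:
    """One branch-predicting neuron: n history weights plus a bias weight."""

    def __init__(self, history_length=8, theta=16, weight_limit=127):
        # n + 1 small integer weights; weights[0] is the bias (assumed).
        self.weights = [0] * (history_length + 1)
        # Branch outcome history, most recent first: +1 taken, -1 not taken.
        self.history = [-1] * history_length
        self.theta = theta                # training threshold (assumed value)
        self.weight_limit = weight_limit  # keeps the weights "small"

    def output(self):
        # y is the dot product of the inputs and weights, plus the bias.
        y = self.weights[0]
        for w, x in zip(self.weights[1:], self.history):
            y += w * x
        return y

    def predict(self):
        # Predict taken if y >= 0.
        return self.output() >= 0

    def update(self, taken):
        y = self.output()
        t = 1 if taken else -1
        # Train on a misprediction or when the output is not yet confident.
        if (y >= 0) != taken or abs(y) <= self.theta:
            self.weights[0] = self._clamp(self.weights[0] + t)
            for i, x in enumerate(self.history):
                self.weights[i + 1] = self._clamp(self.weights[i + 1] + t * x)
        # Shift the new outcome into the history.
        self.history = [t] + self.history[:-1]

    def _clamp(self, w):
        return max(-self.weight_limit, min(self.weight_limit, w))


def run_trace(predictor, outcomes):
    """Return the fraction of outcomes predicted correctly."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)
```

For example, `run_trace(PerceptronPredictor(), [i % 2 == 0 for i in range(1000)])` feeds an alternating taken/not-taken branch; the neuron learns a negative weight on the most recent outcome and predicts almost every instance correctly after a short warm-up.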

Accuracy Affected by Non-Linearity
- Perceptrons can’t compute non-linear functions
- Some branches have non-linear behavior
[Figure: AND and XOR examples]
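A small experiment makes the AND/XOR contrast concrete. The sketch below trains a single perceptron with a bias and two ±1 inputs using the classic perceptron update; the helper names (`train_perceptron`, `accuracy`) and the epoch count are illustrative assumptions, not part of the talk.

```python
def train_perceptron(examples, epochs=100):
    """Train one perceptron (bias + 2 weights) with the classic update rule."""
    w = [0, 0, 0]
    for _ in range(epochs):
        for (x1, x2), target in examples:
            y = w[0] + w[1] * x1 + w[2] * x2
            predicted = 1 if y >= 0 else -1
            if predicted != target:
                # Move the weights toward the correct answer.
                w[0] += target
                w[1] += target * x1
                w[2] += target * x2
    return w


def accuracy(w, examples):
    """Fraction of examples the weight vector classifies correctly."""
    correct = 0
    for (x1, x2), target in examples:
        y = w[0] + w[1] * x1 + w[2] * x2
        if (1 if y >= 0 else -1) == target:
            correct += 1
    return correct / len(examples)


# Inputs and targets use +1 for true/taken and -1 for false/not taken.
INPUTS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
AND = [((a, b), 1 if (a == 1 and b == 1) else -1) for a, b in INPUTS]
XOR = [((a, b), 1 if a != b else -1) for a, b in INPUTS]

and_accuracy = accuracy(train_perceptron(AND), AND)
xor_accuracy = accuracy(train_perceptron(XOR), XOR)
```

AND is linearly separable, so training converges and `and_accuracy` reaches 1.0. No weight vector separates XOR, so `xor_accuracy` stays below 1.0 no matter how long training runs.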

Accuracy Improves With Path-Based Piecewise Linear Prediction
- Maintains low latency, improves accuracy [ISCA 2005, TACO 2009]
- Current approaches with hashing similarly overcome non-linearity
[Figure: perceptron prediction vs. piecewise linear prediction]

Cache Management
- Cache placement/replacement/bypass
- Prefetching

Placement and Promotion in PseudoLRU Caches
- I thought this was a really nice result
- The idea: LRU is boring. Place in MRU, promote to MRU
- Can we promote based on current position, and have a better placement heuristic?
- Enormous search space. We applied genetic algorithms, leading to the first pub [Jiménez, MICRO 2013].
- Practical design for PseudoLRU (less hardware, less read/modify/write)
- Tried harder with multi-core workloads
- Genetic algorithm found a simple recursive algorithm for placement and promotion! [Terán & Jiménez, HPCA 2016]

Minimal Disturbance Promotion [HPCA 2016]
- To promote a block B:
  - Find smallest unprotected region containing B
  - Move the first block in that region to MRU (i.e. do normal PLRU promotion on that block)
  - The rest of the blocks move with that block and are now protected
- A minimal number of bits have been changed to protect B
[Figure: PLRU tree bits, with block B marked]
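For readers unfamiliar with the "normal PLRU promotion" that the slide builds on, here is a minimal sketch of conventional tree-based PseudoLRU. It is not the minimal disturbance algorithm itself. It assumes a power-of-two number of ways, tree bits stored in heap order, and the convention that a bit of 0 means the next victim is in the left subtree; the class name `TreePLRU` and its methods are hypothetical.

```python
class TreePLRU:
    """Conventional tree-based PseudoLRU state for one cache set."""

    def __init__(self, ways=16):
        assert ways >= 2 and ways & (ways - 1) == 0, "ways must be a power of 2"
        self.ways = ways
        # One bit per internal tree node, in heap order (node 0 is the root).
        # Assumed convention: 0 = victim is in the left subtree, 1 = right.
        self.bits = [0] * (ways - 1)

    def victim(self):
        """Follow the bits from the root to find the pseudo-LRU way."""
        node = 0
        while node < self.ways - 1:
            node = 2 * node + 1 + self.bits[node]
        return node - (self.ways - 1)

    def promote(self, way):
        """Normal PLRU promotion: point every bit on the path away from `way`."""
        node = way + self.ways - 1  # heap index of the leaf
        while node > 0:
            parent = (node - 1) // 2
            is_right_child = node == 2 * parent + 2
            self.bits[parent] = 0 if is_right_child else 1
            node = parent
```

With a 4-way set, promoting ways 0, 2, and 1 in that order leaves way 3 as the victim, since it is the only way that was never touched. Normal promotion rewrites every bit on the path from the leaf to the root; the slide's point is that protecting B can be done while changing fewer bits.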

Reuse Prediction
- Dead block prediction: predicting whether a block will be used again before it’s evicted
- Can be used for a variety of optimizations:
  - Placement/Replacement
  - Bypass
  - Prefetching
  - Etc.
- We use perceptron learning to do dead block prediction
- Reuse prediction sounds nicer, I’m trying to promote that term

Perceptron Learning for Reuse Prediction [Terán and Jiménez, MICRO 2016]
- Combine multiple features F_1..n
- Each feature indexes a different table T_1..n
- y_out = sum of counters from tables
- Predict dead if y_out > τ
- Sampler provides training data
- Perceptron Learning Rule:
    if mispredict or |y_out| < θ then
      for i ∈ 1..n
        h = hash(F_i)
        if block is dead
          T_i[h]++
        else
          T_i[h]--
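The learning rule on this slide can be sketched as follows. This is an illustration under stated assumptions rather than the published design: the hash function, the values of τ and θ, and the saturation range (chosen to match 6-bit signed counters) are placeholders, and the sampler that supplies training outcomes is not modeled. The class name `ReusePredictor` and its methods are hypothetical.

```python
class ReusePredictor:
    """Perceptron-style reuse (dead block) predictor over n feature tables."""

    def __init__(self, num_features=6, entries=256, tau=0, theta=8,
                 weight_min=-32, weight_max=31):
        self.entries = entries
        # One table of saturating counters per feature.
        self.tables = [[0] * entries for _ in range(num_features)]
        self.tau = tau        # prediction threshold (assumed value)
        self.theta = theta    # training threshold (assumed value)
        self.weight_min = weight_min
        self.weight_max = weight_max

    def _index(self, feature):
        # Hypothetical hash: fold the upper bits into the table index.
        return (feature ^ (feature >> 8)) % self.entries

    def y_out(self, features):
        # Sum of the counters selected by each feature.
        return sum(table[self._index(f)]
                   for table, f in zip(self.tables, features))

    def predict_dead(self, features):
        return self.y_out(features) > self.tau

    def train(self, features, is_dead):
        y = self.y_out(features)
        mispredict = (y > self.tau) != is_dead
        # Update only on a misprediction or a low-confidence output.
        if mispredict or abs(y) < self.theta:
            for table, f in zip(self.tables, features):
                h = self._index(f)
                if is_dead:
                    table[h] = min(self.weight_max, table[h] + 1)
                else:
                    table[h] = max(self.weight_min, table[h] - 1)
```

In use, a sampler would call `train(features, is_dead)` when it learns a sampled block's fate, and the cache would call `predict_dead(features)` on each access to guide placement, replacement, or bypass.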

Predictor Organization
- 6 tables
- 256 entries
- 6-bit weights

Features                        Per-core vectors
PC_0                            ✓
PC_1 >> 1                       ✓
PC_2 >> 2                       ✓
PC_3 >> 3                       ✓
Tag of current block >> 4       ✓
Tag of current block >> 7       ✓

Better Accuracy
- Coverage rate: SDBP 47.2%, SHiP 43.2%, Perceptron 52.4%
- False positive rate: SDBP 7.4%, SHiP 7.7%, Perceptron 3.2%
- Here, false positive rate is false positives / all predictions
[Figure: bar chart of % false positive / coverage for SDBP, SHiP, and Perceptron]
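The false positive rate on this slide is normalized by all predictions rather than by the number of actual negatives, which is the more common convention. The short sketch below contrasts the two on made-up data; the function names and the sample lists are purely illustrative and unrelated to the reported results.

```python
def false_positive_rate(predicted_dead, actually_dead):
    """False positives divided by ALL predictions (the slide's definition)."""
    assert len(predicted_dead) == len(actually_dead)
    false_positives = sum(1 for p, a in zip(predicted_dead, actually_dead)
                          if p and not a)
    return false_positives / len(predicted_dead)


def conventional_false_positive_rate(predicted_dead, actually_dead):
    """False positives divided by actual negatives, shown only for contrast."""
    assert len(predicted_dead) == len(actually_dead)
    false_positives = sum(1 for p, a in zip(predicted_dead, actually_dead)
                          if p and not a)
    negatives = sum(1 for a in actually_dead if not a)
    return false_positives / negatives


# Made-up example: 10 predictions, one of which wrongly calls a live block dead.
predicted = [True, True, False, False, True, False, False, False, False, False]
actual = [True, False, False, False, True, True, False, False, False, False]

slide_rate = false_positive_rate(predicted, actual)                      # 1 / 10
conventional_rate = conventional_false_positive_rate(predicted, actual)  # 1 / 7
```

The same single false positive yields 10% under the slide's definition and about 14% under the conventional one, so the reported rates should be compared only with figures computed the same way.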

Multiperspective Reuse Prediction [Jiménez and Terán, MICRO 2017]
- Take perceptron idea one step further
- Use many different features to adapt to workload behavior
- Huge search space; use genetic algorithm to select features
- Significantly improved performance over (then) state of the art
- One contribution was the set of parameterized features
- Another is the feature selection process

Configuring Hardware Prefetchers
- Liao et al., Machine Learning-Based Prefetch Optimization for Data Center Applications, SC 2009
- Authors evaluate several classifiers to predict the best configuration of the four Intel Core 2 hardware prefetchers:
  - Nearest neighbor
  - Naïve Bayes
  - C4.5 decision tree
  - Ripper classifier
  - Support vector machines
  - Neural (multi-layer perceptron and radial basis function)
- Performance within 1% of optimal configuration

Reinforcement Learning for Prefetching
- Peled et al., Semantic Locality and Context-Based Prefetching Using Reinforcement Learning, ISCA 2015
- Design a hardware prefetcher using reinforcement learning online
- Use “contextual bandits” model (generalization of “multi-armed bandits”)
- Online algorithm:
  - Collects history data for learning, does feature selection
  - Predicts using current context to generate prefetches
  - Updates predictors based on observing results
- Outperforms SMS

Many More Ideas!
- Ipek et al., Self-Optimizing Memory Controllers: A Reinforcement Learning Approach, ISCA 2008
- Wang & Ipek, Reducing Data Movement Energy via Online Data Clustering and Encoding, MICRO 2016
- Won et al., Online Learning in Artificial Neural Networks for CMP Uncore Power Management, HPCA 2014
- Rahman et al., Maximizing Hardware Prefetch Effectiveness with Machine Learning, HPCC 2015
- Dai et al., Block2Vec: A Deep Learning Strategy on Mining Block Correlations in Storage Systems, ICPPW 2016

Many More Ideas! (continued)
- AbouGhazaleh et al., Integrated CPU and L2 Cache Voltage Scaling using Machine Learning, LCTES 2007
- Qiu et al., Phase-Change Memory Optimization for Green Cloud with Genetic Algorithm, IEEE TOCS 2015
- Wu et al., GPGPU Performance and Power Estimation using Machine Learning, HPCA 2015
- Bitirgen, Ipek & Martínez, Coordinated Management of Multiple Interacting Resources in Chip Multiprocessors: A Machine Learning Approach, MICRO 2008

Many More Ideas! (continued)
- Stanley & Mudge, A Parallel Genetic Algorithm for Multiobjective Microprocessor Design, GA 1995
- Emer & Gloy, A Language for Describing Predictors and its Application to Automatic Synthesis, ISCA 1997
- Gomez, Burger, & Miikkulainen, A Neuroevolution Method for Dynamic Resource Allocation On A Chip Multiprocessor, IJCNN 2001
- I’m sure I’ve forgotten some people; feel free to shout out

Compiler etc. community too!
- Moss et al., Learning to Schedule Straight-Line Code, NIPS 1997
- Cavazos & Moss, Inducing Heuristics to Decide Whether to Schedule, PLDI 2004
- Agakov et al., Using Machine Learning to Focus Iterative Optimization, CGO 2006
- Raychev, Vechev & Yahav, Code Completion with Statistical Language Models, PLDI 2014
- Yuan et al., Droid-Sec: Deep Learning in Android Malware Detection, SIGCOMM 2014

Next Steps
- Problems in systems research often generate lots of data
- Great for applying machine learning
- Many students are interested in machine learning, esp. neural
- Good opportunity to convert them into architecture students!
- How will you apply machine learning to improving systems?
- Questions? Discussion?
