A Brief Introduction to Machine Learning (With Applications to Communications)
Osvaldo Simeone
King’s College London
11 June 2018
Osvaldo Simeone A Brief Intro to ML + Comm 1 / 126
Goals and Learning Outcomes
Goals:
◮ Provide an introduction to main areas in machine learning with a focus
◮ Offer some pointers to specific applications for telecom
◮ Recognize scenarios in which machine learning can and cannot be useful
◮ Identify specific classes of machine learning methods that apply to a
◮ Acquisition of domain knowledge...
◮ ... mathematical (physics-based) modelling...
◮ ... and optimized algorithm design with performance guarantees
◮ Selection of a general purpose model and a learning algorithm...
◮ ... learning based on data (examples) and use of the trained
◮ lower cost
◮ faster development
◮ reduced implementation complexity
◮ suboptimal performance
◮ lack of interpretability
◮ limited applicability
◮ traditional engineering flow too expensive or time-consuming
◮ the task involves a function that maps well-defined inputs to
◮ the task provides clear feedback with clearly definable goals and metrics
◮ large data sets exist or can be created containing input-output pairs
◮ the task does not involve long chains of logic or reasoning that depend
◮ the task does not require detailed explanations for how the
◮ the task has a tolerance for error and no need for provably correct or
◮ the phenomenon or function being learned should not change rapidly
◮ PHY: Baseband signals, (multi-RAT) channel quality
◮ MAC/Link: Throughput, FER, random access load and latency
◮ Network: Location, traffic loads across services, users’ device types,
◮ Application: Users’ preferences, content demands, computing loads,
◮ Network: Mobility patterns, network-wide traffic statistics, outage rates
◮ Application: User’s behavior patterns, subscription information, service
◮ traditional engineering flow too expensive or time-consuming (depends)
◮ the task involves a function that maps well-defined inputs to
◮ the task provides clear feedback with clearly definable goals and metrics
◮ the task does not involve long chains of logic or reasoning that depend
◮ the task does not require detailed explanations for how the
◮ the task has a tolerance for error and no need for provably correct or
◮ the phenomenon or function being learned should not change rapidly
◮ regression: continuous labels
◮ classification: discrete labels
$(x_n, t_n) \overset{\text{i.i.d.}}{\sim} p(x, t), \quad n = 1, \dots, N$
$\hat{t}^{*}(x) = \arg\min_{\hat{t}} \mathrm{E}_{t \sim p_{t|x}}[\ell(t, \hat{t})]$
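The optimal prediction for a given loss function ℓ can be checked numerically. The sketch below assumes a hypothetical discrete posterior p(t|x) over four label values (the numbers are illustrative and not taken from the slides) and evaluates the expected loss of each candidate prediction. Under the quadratic loss the minimizer should coincide with the posterior mean, and under the 0-1 loss with the posterior mode.

```python
import numpy as np

# Hypothetical posterior p(t|x) for one fixed input x (illustrative values).
t_values = np.array([0.0, 1.0, 2.0, 3.0])
p_t_given_x = np.array([0.1, 0.2, 0.3, 0.4])

def expected_loss(t_hat, loss):
    """E_{t ~ p(t|x)}[loss(t, t_hat)] for a discrete posterior."""
    return float(np.sum(p_t_given_x * loss(t_values, t_hat)))

def quadratic(t, t_hat):
    return (t - t_hat) ** 2

def zero_one(t, t_hat):
    return (t != t_hat).astype(float)

# Quadratic loss: search a fine grid of real-valued predictions.
grid = np.linspace(0.0, 3.0, 3001)
best_quadratic = grid[np.argmin([expected_loss(g, quadratic) for g in grid])]

# 0-1 loss: the prediction is restricted to the label values.
best_zero_one = t_values[np.argmin([expected_loss(v, zero_one) for v in t_values])]

posterior_mean = float(np.sum(p_t_given_x * t_values))
posterior_mode = t_values[np.argmax(p_t_given_x)]
```

Comparing `best_quadratic` with `posterior_mean` and `best_zero_one` with `posterior_mode` shows how the choice of loss changes the optimal predictor for the same posterior.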
[Figure: legend entries t = 1 and t = 0]
◮ Model order M: Number of features
$\hat{t}(x) = \sum_{m=0}^{M} w_m x^m$: polynomial of order M
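A polynomial model of order M can be trained by least squares on the features 1, x, ..., x^M. The sketch below is a minimal illustration under stated assumptions: the weight vector name `w`, the helper names, and the noisy-sinusoid data are all hypothetical choices, not taken from the slides.

```python
import numpy as np

def poly_features(x, M):
    """Map each scalar input to the feature vector [1, x, x^2, ..., x^M]."""
    return np.vander(x, M + 1, increasing=True)

def fit_polynomial(x, t, M):
    """Least-squares weights w for the polynomial model of order M."""
    w, *_ = np.linalg.lstsq(poly_features(x, M), t, rcond=None)
    return w

def predict(x, w):
    return poly_features(x, len(w) - 1) @ w

# Synthetic data (an assumption for illustration): noisy samples of a sinusoid.
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=10)
t_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.standard_normal(10)

def training_rmse(M):
    w = fit_polynomial(x_train, t_train, M)
    return float(np.sqrt(np.mean((predict(x_train, w) - t_train) ** 2)))

rmse_by_order = {M: training_rmse(M) for M in (1, 3, 9)}
```

Because the models are nested, the training loss in `rmse_by_order` cannot increase with M. A small training loss alone therefore says nothing about how well the model generalizes.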
◮ the model is not rich enough to capture the variations present in the
◮ large training loss
◮ the model is too rich and, in order to account for the observations in
◮ presumably we have a large generalization loss
[Figure: polynomial fits for M = 1, M = 3, and M = 9]
[Figure: root average squared loss for training and generalization (via validation); horizontal axis from 1 to 9]
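The comparison between training loss and generalization loss estimated via validation can be reproduced with a few lines of NumPy. This is a sketch under assumptions: the ground truth (a noisy sinusoid), the set sizes, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(n):
    """Draw n i.i.d. pairs from an assumed ground truth: a noisy sinusoid."""
    x = rng.uniform(0.0, 1.0, size=n)
    t = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(n)
    return x, t

def features(x, M):
    return np.vander(x, M + 1, increasing=True)

def rmse(x, t, w):
    return float(np.sqrt(np.mean((features(x, len(w) - 1) @ w - t) ** 2)))

x_tr, t_tr = sample(10)      # small training set
x_va, t_va = sample(1000)    # held-out validation set

train_loss, val_loss = [], []
for M in range(1, 10):
    w, *_ = np.linalg.lstsq(features(x_tr, M), t_tr, rcond=None)
    train_loss.append(rmse(x_tr, t_tr, w))
    val_loss.append(rmse(x_va, t_va, w))

best_M = 1 + int(np.argmin(val_loss))  # model order with the lowest validation loss
```

Plotting `train_loss` and `val_loss` against M gives the two curves. The training loss keeps decreasing with M, while the validation loss is what should guide the choice of model order.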
◮ PHY: Detection and decoding, precoding and power allocation,
◮ MAC/Link: Radio resource allocation, scheduling, multi-RAT
◮ Network: Proactive caching
◮ Application: Computing resource allocation, content request prediction
[Bouchired et al ’98]
[De Veciana and Zakhor ’92]
[Nachmani et al ’16]
(coordinates)
◮ Network: Routing (classification vs look-up tables), SDN flow table
◮ Application: Cloud/fog computing, Internet traffic classification
$x_n \overset{\text{i.i.d.}}{\sim} p(x), \quad n = 1, \dots, N$
◮ x is a document, and z is (interpreted as) topic
◮ p(z|θ) = distribution of topics
◮ p(x|z,θ) = distribution of words in document given topic
◮ Mixture of Gaussians
◮ Likelihood-free models
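A model of the form p(z|θ) p(x|z,θ) generates data in two stages: first the hidden variable z, then the observation x given z. The sketch below shows this ancestral sampling for a mixture of Gaussians. The two-component parameters and the function name are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters θ of a two-component 1-D mixture of Gaussians.
weights = np.array([0.3, 0.7])   # p(z|θ): probabilities of z = 0 and z = 1
means = np.array([-2.0, 3.0])    # mean of p(x|z,θ) for each z
stds = np.array([0.5, 1.0])      # standard deviation of p(x|z,θ) for each z

def sample_mixture(n):
    """Ancestral sampling: first z ~ p(z|θ), then x ~ p(x|z,θ)."""
    z = rng.choice(len(weights), size=n, p=weights)
    x = rng.normal(loc=means[z], scale=stds[z])
    return x, z

x, z = sample_mixture(20000)
empirical_weights = np.bincount(z, minlength=2) / len(z)
empirical_mean = float(x.mean())
mixture_mean = float(weights @ means)  # E[x] = sum over z of p(z|θ) E[x|z]
```

In unsupervised learning only `x` would be observed; `z` is returned here solely to check the sampler against the assumed parameters.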
◮ x is an image and z is (interpreted as) a compressed (e.g., sparse)
◮ p(z|x,θ) = compression of image to representation
◮ p(x|z,θ) = decompression of representation into an image
[Figure: log-likelihood (LL) and two ELBO curves]
◮ E step: For fixed parameter vector $\theta^{\text{old}}$, maximize the ELBO over the variational distribution: $q^{\text{new}}(z) = \arg\max_{q} \mathrm{ELBO}(q, \theta^{\text{old}})$
◮ M step: For fixed variational distribution $q^{\text{new}}(z)$, maximize the ELBO over the parameters: $\theta^{\text{new}} = \arg\max_{\theta} \mathrm{ELBO}(q^{\text{new}}, \theta) = \arg\max_{\theta} \mathrm{E}_{z \sim q^{\text{new}}(z)}[\ln p(x, z|\theta)]$
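The alternation between E and M steps can be sketched for a one-dimensional mixture of Gaussians, where both steps have closed forms. This is an illustrative sketch, not the slides' own example: the synthetic data, the initialization, and the variable names are assumptions. The E step sets q(z) to the posterior p(z|x,θ old), and the M step maximizes the expected complete-data log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from an assumed two-component mixture (illustrative only).
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])

def log_joint(x, weights, mu, sigma):
    """ln p(x_n, z=k | θ) for every sample n and component k."""
    return (np.log(weights) - 0.5 * np.log(2 * np.pi * sigma**2)
            - 0.5 * ((x[:, None] - mu) / sigma) ** 2)

def log_likelihood(x, weights, mu, sigma):
    return float(np.sum(np.logaddexp.reduce(log_joint(x, weights, mu, sigma), axis=1)))

# Initial parameter vector θ = (weights, mu, sigma).
weights = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])
ll_history = [log_likelihood(x, weights, mu, sigma)]

for _ in range(50):
    # E step: q_new(z) = p(z | x, θ_old), the posterior responsibilities.
    lj = log_joint(x, weights, mu, sigma)
    q = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))
    # M step: maximize E_{z ~ q_new}[ln p(x, z | θ)] in closed form.
    nk = q.sum(axis=0)
    weights = nk / len(x)
    mu = (q * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((q * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    ll_history.append(log_likelihood(x, weights, mu, sigma))
```

The values in `ll_history` should never decrease from one iteration to the next, which is the defining property of EM. The algorithm only guarantees convergence to a local optimum, so the result depends on the initialization.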
◮ E step: Parametrize the variational distribution q(z|ϕ) or q(z|x,ϕ) and
◮ M step: Approximate $\mathrm{E}_{z \sim q^{\text{new}}(z)}[\ln p(x, z|\theta)]$ via Monte Carlo
◮ Use gradient descent for E and/or M steps
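A Monte Carlo approximation of an expectation over a parametrized variational distribution can be illustrated on a toy model where the exact answer is known. The sketch below assumes a hypothetical model z ~ N(0, 1), x|z ~ N(z, 1), so that ln p(x|θ) is the log-density of N(0, 2) and the posterior is N(x/2, 1/2). The variational distribution q(z|ϕ) is taken to be Gaussian with ϕ = (m, s2). None of these choices come from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(v, mean, var):
    """Log-density of a Gaussian with the given mean and variance."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (v - mean) ** 2 / var

def elbo_estimate(x, m, s2, num_samples=200_000):
    """Monte Carlo estimate of E_{z~q}[ln p(x,z|θ) - ln q(z|ϕ)], with q = N(m, s2)."""
    z = m + np.sqrt(s2) * rng.standard_normal(num_samples)
    log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
    return float(np.mean(log_joint - log_normal(z, m, s2)))

x_obs = 1.0
exact_ll = float(log_normal(x_obs, 0.0, 2.0))      # ln p(x|θ), since x ~ N(0, 2)
elbo_loose = elbo_estimate(x_obs, m=0.0, s2=1.0)   # q differs from the posterior
elbo_tight = elbo_estimate(x_obs, m=0.5, s2=0.5)   # q equals p(z|x,θ) = N(x/2, 1/2)
```

The estimate `elbo_loose` falls below `exact_ll` by the KL divergence between q and the posterior, while `elbo_tight` matches `exact_ll`. In practice ϕ would be updated by gradient steps on such an estimate rather than set by hand.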
$\max_{T(x)} \mathrm{E}_{x \sim p}[T(x)] - \mathrm{E}_{x \sim q}[g(T(x))],$
$\min_{\theta} \max_{\varphi} \mathrm{E}_{x \sim p_D}[T_{\varphi}(x)] - \mathrm{E}_{x \sim p(x|\theta)}[g(T_{\varphi}(x))]$
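The variational objective E_p[T(x)] − E_q[g(T(x))] can be checked on a small discrete example. The sketch below assumes g(t) = exp(t − 1), the choice that corresponds to the KL divergence. The extracted slides do not specify g, so this is one illustrative instance. The two distributions are hypothetical. For this g the maximizing discriminator is T(x) = 1 + ln(p(x)/q(x)).

```python
import numpy as np

# Two hypothetical distributions on a three-letter alphabet.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

def g(t):
    """Convex conjugate of f(u) = u ln u, the choice that yields the KL divergence."""
    return np.exp(t - 1.0)

def variational_objective(T):
    """E_{x~p}[T(x)] - E_{x~q}[g(T(x))] for a discriminator given as a table T."""
    return float(p @ T - q @ g(T))

kl = float(np.sum(p * np.log(p / q)))
T_opt = 1.0 + np.log(p / q)          # maximizer of the objective for this g
value_at_opt = variational_objective(T_opt)

# Any other discriminator gives a lower bound on the divergence.
rng = np.random.default_rng(0)
values_random = [variational_objective(T_opt + rng.normal(size=3)) for _ in range(100)]
```

Here `value_at_opt` equals `kl`, and every entry of `values_random` is no larger. In a GAN the table T is replaced by a parametrized discriminator and the second distribution by the generator's output distribution, so the inner maximization is only approximate.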
◮ PHY: E2E encoding/decoding, CSI compression and feedback,
◮ MAC/Link: Clustering for resource allocation, clustering for
[Ibnkahla ’00]
[Nakashima et al ’18]
◮ Network: Clustering for group-based access control, anomaly detection
◮ Application: Community detection in social media, Internet traffic