SLIDE 1

A Brief Introduction to Machine Learning (With Applications to Communications)

Osvaldo Simeone

King’s College London

11 June 2018

SLIDE 2

Goals and Learning Outcomes

Goals:

◮ Provide an introduction to the main areas in machine learning, with a focus on probabilistic methods

◮ Offer some pointers to specific applications for telecom

Learning outcomes:

◮ Recognize scenarios in which machine learning can and cannot be useful
◮ Identify specific classes of machine learning methods that apply to a given problem, with applications to telecom networks

SLIDE 3

For More...

  • O. Simeone, “A Brief Introduction to Machine Learning for Engineers,” arXiv:1709.02840.

SLIDE 4

What is Machine Learning?

Traditional engineering approach:

◮ Acquisition of domain knowledge...

SLIDE 5

What is Machine Learning?

Traditional engineering approach:

◮ ... mathematical (physics-based) modelling...

SLIDE 6

What is Machine Learning?

Traditional engineering approach:

◮ ... and optimized algorithm design with performance guarantees

SLIDE 7

What is Machine Learning?

Machine learning approach:

◮ Selection of a general-purpose model and a learning algorithm...

SLIDE 8

What is Machine Learning?

Machine learning approach:

◮ ... learning based on data (examples) and use of the trained (black-box) “machine”

SLIDE 9

When to Use Machine Learning?

Advantages:

◮ lower cost
◮ faster development
◮ reduced implementation complexity

Disadvantages:

◮ suboptimal performance
◮ lack of interpretability
◮ limited applicability

SLIDE 10

When to Use Machine Learning?

(Slightly modified) criteria by [Brynjolfsson and Mitchell ’17]:

◮ traditional engineering flow too expensive or time-consuming
◮ the task involves a function that maps well-defined inputs to well-defined outputs
◮ the task provides clear feedback with clearly definable goals and metrics
◮ large data sets exist or can be created containing input-output pairs
◮ the task does not involve long chains of logic or reasoning that depend on diverse background knowledge or common sense
◮ the task does not require detailed explanations for how the decision was made
◮ the task has a tolerance for error and no need for provably correct or optimal solutions
◮ the phenomenon or function being learned should not change rapidly over time

SLIDE 11

Taxonomy of Machine Learning Methods

Supervised learning Unsupervised learning Reinforcement learning

SLIDE 12

Taxonomy of Machine Learning Methods

Supervised vs unsupervised learning

SLIDE 13

Taxonomy of Machine Learning Methods

Reinforcement learning: feedback-based sequential decision making

[Diagram: agent-environment loop with state st, action at, and reward rt; © D. Silver]

SLIDE 14

Communication Networks

Fog network architecture [5GPPP]

[Figure: fog network architecture spanning the wireless edge, access network, edge cloud, and core cloud]

SLIDE 15

Communication Networks

Fog network architecture [5GPPP]

[Figure: fog network architecture spanning the wireless edge, access network, edge cloud, and core cloud]

Data collection and processing can take place at the edge and/or at the cloud.

SLIDE 16

Data in Communication Networks

Data at the edge:

◮ PHY: Baseband signals, (multi-RAT) channel quality
◮ MAC/Link: Throughput, FER, random access load and latency
◮ Network: Location, traffic loads across services, users’ device types, battery levels
◮ Application: Users’ preferences, content demands, computing loads, QoS metrics

SLIDE 17

Data in Communication Networks

Data at the cloud:

◮ Network: Mobility patterns, network-wide traffic statistics, outage rates
◮ Application: Users’ behavior patterns, subscription information, service usage statistics, TCP/IP traffic statistics

SLIDE 18

Learning in Communication Networks

Which tasks?

◮ traditional engineering flow too expensive or time-consuming (depends)
◮ the task involves a function that maps well-defined inputs to well-defined outputs
◮ the task provides clear feedback with clearly definable goals and metrics
◮ large data sets exist or can be created containing input-output pairs
◮ the task does not involve long chains of logic or reasoning that depend on diverse background knowledge or common sense
◮ the task does not require detailed explanations for how the decision was made
◮ the task has a tolerance for error and no need for provably correct or optimal solutions (depends)
◮ the phenomenon or function being learned should not change rapidly over time (depends)

SLIDE 19

Overview

Supervised Learning Unsupervised Learning Reinforcement Learning

SLIDE 22

Supervised Learning

Supervised learning:

◮ regression: continuous labels
◮ classification: discrete labels

SLIDE 23

Supervised Learning: Regression

[Figure: scatter plot of a one-dimensional regression training set]

Training set D: N training points (xn, tn), n = 1, ..., N
xn = covariates, domain points, or explanatory variables
tn = dependent variables, labels, or responses (continuous)
Goal: Predict the label t for a new, that is, as of yet unobserved, domain point x

SLIDE 24

Supervised Learning: Classification

[Figure: two-class training set in the plane, with an unlabeled point marked “?”]

Training set D: N training points (xn, tn), n = 1, ..., N
xn = covariates, domain points, or explanatory variables
tn = dependent variables, labels, or responses (discrete)
Goal: Predict the label (class) t for a new, that is, as of yet unobserved, domain point x

SLIDE 25

Supervised Learning

By the no free lunch theorem, this is an impossible task without assuming a model (inductive bias).
Memorizing vs. learning: retrieving the value tn corresponding to an already observed pair (xn, tn) ∈ D vs. predicting the value t for an unseen x

SLIDE 27

Defining Supervised Learning

Training set D: (xn, tn) ∼ i.i.d. p(x, t), n = 1, ..., N
Based on the training set D, we derive a predictor t̂(x).
Test pair: (x, t) ∼ p(x, t), independent of D
Quality of the prediction t̂(x) for a pair (x, t): ℓ(t, t̂(x)) for some loss function ℓ(t, t̂), e.g., ℓ(t, t̂) = (t − t̂)² (quadratic) or ℓ(t, t̂) = 1(t ≠ t̂) (probability of error)
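
To make these loss functions concrete, here is a minimal Python sketch; the predictor and the test pair are hypothetical illustrations, not part of the original slides.

```python
import numpy as np

def quadratic_loss(t, t_hat):
    """Quadratic loss (t - t_hat)^2, typical for regression."""
    return (t - t_hat) ** 2

def zero_one_loss(t, t_hat):
    """Probability-of-error loss 1(t != t_hat), typical for classification."""
    return float(t != t_hat)

t_hat = lambda x: 2.0 * x            # hypothetical predictor derived from D
x, t = 0.4, 1.1                      # one test pair (x, t) ~ p(x, t)
print(quadratic_loss(t, t_hat(x)))   # (1.1 - 0.8)^2 = 0.09
```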

SLIDE 30

Defining Supervised Learning

Goal: minimize the average loss on the test pair (generalization loss)

Lp(t̂) = E_{(x,t)∼p(x,t)}[ℓ(t, t̂(x))]

Alternative viewpoints to the frequentist framework: Bayesian and Minimum Description Length (MDL)
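
Since p(x, t) is unknown in practice, the generalization loss can only be estimated. As a sketch, assuming a synthetic joint distribution that we are free to sample from, a Monte Carlo estimate looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pxt(n):
    """Assumed synthetic p(x, t): x ~ U[0, 1], t = sin(2*pi*x) + noise."""
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, n)

t_hat = lambda x: np.zeros_like(x)    # a trivial predictor, for illustration

x, t = sample_pxt(100_000)
L_p = np.mean((t - t_hat(x)) ** 2)    # estimate of E[(t - t_hat(x))^2]
print(f"estimated generalization loss: {L_p:.3f}")   # about 0.5 + 0.01
```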

SLIDE 32

When the True Distribution p(x,t) is Known...

... we don’t need data D ... and we have a standard inference problem, i.e., estimation (regression) or detection (classification). The solution can be computed directly from the posterior distribution p(t|x) = p(x, t)/p(x) as

t̂*(x) = argmin_{t̂} E_{t∼p(t|x)}[ℓ(t, t̂) | x]

SLIDE 34

When the Model p(x,t) is Known...

With the quadratic loss, the conditional mean: t̂*(x) = E_{t∼p(t|x)}[t | x]
With the probability of error, maximum a posteriori (MAP): t̂*(x) = argmax_t p(t|x)

Example: with the joint distribution p(x, t) given by

x\t    t = 0   t = 1
x = 0   0.05    0.45
x = 1   0.40    0.10

we have p(t = 1|x = 0) = 0.9, so t̂*(x = 0) = 0.9 × 1 + 0.1 × 0 = 0.9 for the quadratic loss and t̂*(x = 0) = 1 for the probability of error (MAP).
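
The example can be checked numerically; a minimal sketch encoding the joint table above as a 2×2 array:

```python
import numpy as np

# Joint p(x, t) from the example: rows indexed by x, columns by t.
p_xt = np.array([[0.05, 0.45],
                 [0.40, 0.10]])

x = 0
p_t_given_x = p_xt[x] / p_xt[x].sum()    # posterior p(t|x=0) = [0.1, 0.9]

t_mmse = p_t_given_x @ np.array([0, 1])  # conditional mean (quadratic loss)
t_map = int(np.argmax(p_t_given_x))      # MAP (probability of error)
print(t_mmse, t_map)                     # 0.9 1
```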

SLIDE 36

When the True Distribution p(x,t) is Not Known...

... we need data D ... and we have a learning problem

1. Model selection (inductive bias): Define a parametric model p(x, t|θ) (generative model) or p(t|x, θ) (discriminative model)
2. Learning: Given data D, optimize a learning criterion to obtain the parameter vector θ
3. Inference: Use the model to obtain the predictor t̂(x) (to be tested on new data)

SLIDE 39

Logistic Regression

Example: Binary classification (t ∈ {0,1})

1. Model selection (inductive bias): logistic regression (discriminative model)
φ(x) = [φ1(x) ··· φD′(x)]ᵀ is a vector of features (e.g., a bag-of-words model for a text).

SLIDE 40

Logistic Regression

Parametric probabilistic model: p(t = 1|x, w) = σ(wᵀφ(x)), where σ(a) = (1 + exp(−a))⁻¹ is the sigmoid function.
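
A minimal sketch of this model (the feature map and the weights below are hypothetical choices); the MAP classification of the next slide then reduces to thresholding the logit wᵀφ(x) at zero:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def phi(x):
    """Hypothetical feature vector, e.g., [1, x, x^2]."""
    return np.array([1.0, x, x * x])

w = np.array([-1.0, 2.0, 0.5])    # model parameters (to be learned)

x = 0.8
logit = w @ phi(x)                # w^T phi(x): the logit, or LLR
p1 = sigmoid(logit)               # p(t = 1 | x, w)
t_map = int(logit >= 0)           # MAP decision: t = 1 iff logit >= 0
print(p1, t_map)
```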

SLIDE 41

Logistic Regression

2. Learning: To be discussed
3. Inference: With the probability-of-error loss, MAP classification: decide t = 1 if the logit (or LLR) wᵀφ(x) ≥ 0, and t = 0 otherwise

SLIDE 42

Multi-Layer Neural Networks

1. Model selection (inductive bias): multi-layer neural network (discriminative model)
Multiple layers of learnable weights enable feature learning.

SLIDE 43

Supervised Learning

1. Model selection (inductive bias): Define a parametric model p(x, t|θ) (generative model) or p(t|x, θ) (discriminative model)
2. Learning: Given data D, optimize a learning criterion to obtain the parameter vector θ
3. Inference: Use the model to obtain the predictor t̂(x) (to be tested on new data)

SLIDE 45

Learning: Maximum Likelihood

ML selects a value of θ that is the most likely to have generated the observed training set D:

maximize p(D|θ) ⇐⇒ maximize ln p(D|θ) (log-likelihood, or LL) ⇐⇒ minimize −ln p(D|θ) (negative log-likelihood, or NLL)

For discriminative models: minimize −ln p(tD|xD, θ) = −Σ_{n=1}^{N} ln p(tn|xn, θ)
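
For the logistic-regression model above, the NLL is simple to evaluate; a sketch on toy data (all values illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def nll(w, Phi, t):
    """NLL -sum_n ln p(t_n|x_n, w) for logistic regression;
    Phi is the (N, D) matrix of features phi(x_n), t the (N,) labels in {0, 1}."""
    p1 = sigmoid(Phi @ w)
    return -np.sum(t * np.log(p1) + (1 - t) * np.log(1 - p1))

Phi = np.array([[1.0, 0.2], [1.0, 0.9], [1.0, -0.5], [1.0, 1.5]])
t = np.array([0, 1, 0, 1])
print(nll(np.array([0.0, 1.0]), Phi, t))
```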

SLIDE 47

Learning: Maximum Likelihood

The problem rarely has analytical solutions and is typically addressed by Stochastic Gradient Descent (SGD). For discriminative models, we have

θ_new ← θ_old + γ ∇θ ln p(tn|xn, θ)|θ=θ_old

where γ is the learning rate. With multi-layer neural networks, this approach yields the backpropagation algorithm.
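
A sketch of this update for the logistic-regression NLL, for which ∇w ln p(tn|xn, w) = (tn − σ(wᵀφn))φn; the data, learning rate, and iteration count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

# Toy training set: features Phi (N, D), labels t (N,) drawn from a known model.
Phi = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
t = (rng.uniform(size=200) < sigmoid(Phi @ w_true)).astype(float)

w, gamma = np.zeros(3), 0.1
for _ in range(2000):
    n = rng.integers(len(t))                        # draw one training example
    grad = (t[n] - sigmoid(Phi[n] @ w)) * Phi[n]    # grad of ln p(t_n|x_n, w)
    w = w + gamma * grad                            # SGD step (ascent on the LL)
print(w)    # roughly recovers w_true
```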

SLIDE 48

Supervised Learning

1. Model selection (inductive bias): Define a parametric model p(x, t|θ) (generative model) or p(t|x, θ) (discriminative model)
2. Learning: Given data D, optimize a learning criterion to obtain the parameter vector θ
3. Inference: Use the model to obtain the predictor t̂(x) (to be tested on new data)

SLIDE 49

Model Selection

How to select a model (inductive bias)?
Model selection typically requires choosing the model order, i.e., the capacity of the model.
Ex.: For logistic regression,

◮ Model order M: number of features

SLIDE 51

Model Selection

Example: Regression using a discriminative model p(t|x):

t = Σ_{m=0}^{M} wm x^m + N(0, 1)

t̂(x): polynomial of order M

[Figure: noisy training points on [0, 1]]

SLIDE 52

Model Selection

With M = 1, using ML learning of the coefficients:

[Figure: the resulting M = 1 (linear) fit to the training points]

SLIDE 53

Model Selection: Underfitting...

With M = 1, the ML predictor t̂(x) underfits the data:

◮ the model is not rich enough to capture the variations present in the data;
◮ large training loss

LD(θ) = (1/N) Σ_{n=1}^{N} (tn − t̂(xn))²

SLIDE 54

Model Selection

With M = 9, using ML learning of the coefficients:

[Figure: the M = 9 fit passing through all training points, shown alongside the M = 1 fit]

SLIDE 55

Model Selection: ... vs Overfitting

With M = 9, the ML predictor overfits the data:

◮ the model is too rich and, in order to account for the observations in the training set, it appears to yield inaccurate predictions outside it;
◮ presumably we have a large generalization loss

Lp(t̂) = E_{(x,t)∼p(x,t)}[(t − t̂(x))²]

SLIDE 56

Model Selection

M = 3 seems to be a reasonable choice... but how do we know, given that we have no data outside of the training set?

[Figure: the M = 1, M = 3, and M = 9 fits on the same training set]

SLIDE 57

Model Selection: Validation

Keep some data (the validation set) to estimate the generalization error for different values of M. (See cross-validation for a more efficient way to use the data.)
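
A sketch of validation-based selection of M on the running polynomial example; the synthetic data-generating function and the train/validation split are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data in the spirit of the running example: t = sin(2*pi*x) + noise.
x = rng.uniform(0.0, 1.0, 30)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, len(x))

x_tr, t_tr = x[:20], t[:20]      # training set
x_va, t_va = x[20:], t[20:]      # held-out validation set

for M in range(1, 10):
    w = np.polyfit(x_tr, t_tr, deg=M)    # ML fit of an order-M polynomial
    rmse_tr = np.sqrt(np.mean((t_tr - np.polyval(w, x_tr)) ** 2))
    rmse_va = np.sqrt(np.mean((t_va - np.polyval(w, x_va)) ** 2))
    print(f"M={M}: training {rmse_tr:.2f}, validation {rmse_va:.2f}")
```

The training loss keeps decreasing with M, while the validation loss exhibits the underfitting/overfitting behavior of the next slide.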

SLIDE 58

Model Selection: Validation

Validation allows model order selection.

[Figure: root average squared loss vs. model order M; the training loss decreases with M, while the generalization loss (estimated via validation) is large both for small M (underfitting) and large M (overfitting)]

Validation can also be used more generally to select other hyperparameters (e.g., learning rate).

SLIDE 60

Model Selection: Validation

Model order selection should depend on the amount of data... It is a problem of bias (asymptotic error) versus generalization gap.

[Figure: root average quadratic loss vs. training set size for M = 1 and M = 7, showing training and generalization (via validation) losses]

SLIDE 61

Application to Communication Networks

Fog network architecture [5GPPP]

[Figure: fog network architecture spanning the wireless edge, access network, edge cloud, and core cloud]

SLIDE 62

At the Edge: Overview

At the edge:

◮ PHY: Detection and decoding, precoding and power allocation, modulation recognition, localization, interference cancellation, joint source-channel coding, equalization in the presence of non-linearities
◮ MAC/Link: Radio resource allocation, scheduling, multi-RAT handover, dynamic spectrum access, admission control
◮ Network: Proactive caching
◮ Application: Computing resource allocation, content request prediction

SLIDE 63

At the Edge: PHY

Channel detection and decoding – classification [Cammerer et al '17]

SLIDE 64

At the Edge: PHY

Channel detection and decoding – classification

[Farsad and Goldsmith '18]

SLIDE 65

At the Edge: PHY

Channel equalization in the presence of non-linearities, e.g., for optical links – regression

[Wang et al ‘16]

SLIDE 66

At the Edge: PHY

Channel equalization in the presence of non-linearities, e.g., for satellite links with non-linear amplifiers – regression

[Bouchired et al ’98]

SLIDE 67

At the Edge: PHY

Channel decoding for modulation schemes with complex optimal decoders, e.g., continuous phase modulation – classification

[De Veciana and Zakhor '92]

SLIDE 68

At the Edge: PHY

Channel decoding – classification
Leverage domain knowledge to set up the parametrized model to be learned

[Nachmani et al ‘16]

SLIDE 69

At the Edge: PHY

Modulation recognition – classification

[Agirman-Tosun et al '11]

SLIDE 70

At the Edge: PHY

Localization – regression

(coordinates)

[Fang and Lin ‘08]

SLIDE 71

At the Edge: PHY

Precoding and power allocation – regression [Sun et al ’17]

SLIDE 72

At the Edge: PHY

Interference cancellation – regression

[Balatsoukas-Stimming ‘17]

SLIDE 73

At the Edge: MAC/ Link

Spectrum sensing – classification

[Tumuluru et al '10]

SLIDE 74

At the Edge: MAC/ Link

mmWave channel quality prediction using depth images – regression

[Okamoto et al '18]

SLIDE 75

At the Edge: Network and Application

Content prediction for proactive caching – classification

[Chen et al '17]

SLIDE 76

At the Cloud: Overview

At the cloud:

◮ Network: Routing (classification vs. look-up tables), SDN flow table updating, proactive caching, congestion control
◮ Application: Cloud/fog computing, Internet traffic classification

SLIDE 77

At the Cloud: Network

Link prediction for wireless routing – classification/regression [Wang et al ’06]

SLIDE 78

At the Cloud: Network

Link prediction for optical routing – classification/ regression [Musumeci et al ’18]

SLIDE 79

At the Cloud: Network

Congestion prediction for smart routing – classification

[Tang et al ‘17]

SLIDE 80

At the Cloud: Network and Application

Traffic classification – classification

[Nguyen et al '08]

SLIDE 81

Overview

Supervised Learning Unsupervised Learning Reinforcement Learning

SLIDE 82

Unsupervised Learning

Unsupervised learning tasks operate over unlabelled data sets.
General goal: discover properties of the data, e.g., for compressed representation.
“Some of us see unsupervised learning as the key towards machines with common sense.” (Y. LeCun)

SLIDE 84

“Defining” Unsupervised Learning

Training set D: xn ∼ i.i.d. p(x), n = 1, ..., N
Goal: Learn some useful properties of the distribution p(x)
Alternative viewpoints to the frequentist framework: Bayesian and MDL

SLIDE 86

Unsupervised Learning Tasks

Density estimation: estimate p(x), e.g., for use in plug-in estimators or compression algorithms, or to detect outliers
Clustering: partition all points in D into groups of similar objects (e.g., document clustering)
Dimensionality reduction, representation, and feature extraction: represent each data point xn in a space of lower dimensionality, e.g., to highlight independent explanatory factors, and/or to ease visualization, interpretation, or successive tasks
Generation of new samples: learn a machine that produces samples approximately distributed according to p(x), e.g., to produce artificial scenes for games or films

SLIDE 90

Unsupervised Learning

1. Model selection (inductive bias): Define a parametric model p(x|θ)
2. Learning: Given data D, optimize a learning criterion to obtain the parameter vector θ
3. Clustering, feature extraction, sample generation...

SLIDE 92

Models

Unsupervised learning models typically involve hidden or latent variables.
zn = hidden, or latent, variables for each data point xn
Ex.: zn = cluster index of xn

SLIDE 94

(a) Directed Generative Models

Model data x as being caused by z:

p(x|θ) = Σ_z p(z|θ) p(x|z, θ)
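
Such a model is sampled “ancestrally”: first z ∼ p(z|θ), then x ∼ p(x|z, θ). A minimal sketch for a two-component Gaussian mixture with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

pi = 0.3                       # p(z = 1 | theta)
mu = {0: -2.0, 1: 2.0}         # component means
sigma = {0: 0.5, 1: 1.0}       # component standard deviations

def sample_x():
    z = int(rng.uniform() < pi)               # z ~ p(z | theta)
    return rng.normal(mu[z], sigma[z]), z     # x ~ p(x | z, theta)

print([sample_x() for _ in range(5)])
```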

SLIDE 95

(a) Directed Generative Models

Ex.: Document clustering

◮ x is a document, and z is (interpreted as) a topic
◮ p(z|θ) = distribution of topics
◮ p(x|z, θ) = distribution of words in a document given the topic

Basic representatives:

◮ Mixture of Gaussians
◮ Likelihood-free models

SLIDE 97

(d) Autoencoders

Model the encoding from data to hidden variables, as well as the decoding from hidden variables back to data: p(z|x, θ) and p(x|z, θ).

SLIDE 98

(d) Autoencoders

Ex.: Compression

◮ x is an image and z is (interpreted as) a compressed (e.g., sparse) representation
◮ p(z|x, θ) = compression of the image to the representation
◮ p(x|z, θ) = decompression of the representation into an image

Basic representatives: Principal Component Analysis (PCA), dictionary learning, neural network-based autoencoders
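
As a concrete instance, PCA implements the encoder/decoder pair with a single linear map (deterministic rather than probabilistic); a minimal sketch via the SVD, on arbitrary synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data matrix: N = 200 points in D = 10 dimensions, centered.
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))
X = X - X.mean(axis=0)

K = 3                                          # latent dimensionality
_, _, Vt = np.linalg.svd(X, full_matrices=False)
W = Vt[:K].T                                   # (D, K) principal directions

Z = X @ W          # encode: z = W^T x
X_hat = Z @ W.T    # decode: x_hat = W z
print("reconstruction MSE:", np.mean((X - X_hat) ** 2))
```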

SLIDE 100

Unsupervised Learning

1. Model selection (inductive bias): Define a parametric model p(x|θ)
2. Learning: Given data D, optimize a learning criterion to obtain the parameter vector θ
3. Clustering, feature extraction, sample generation...

SLIDE 101

Learning: Maximum Likelihood

Focus on directed generative models (a). To simplify the notation, consider a single data point x (sum over the data set D to generalize).

ML problem: max_θ ln p(x|θ) = ln Σ_z p(x, z|θ)

Key issue: Need to marginalize over the latent variables, whose distribution is not known, in order to evaluate the LL.

SLIDE 103

ELBO

To tackle this issue, a standard approach is the introduction of a variational distribution q(z) and the use of the Evidence Lower BOund (ELBO). For any fixed value x and any distribution q(z) on the latent variables z (possibly dependent on x), the ELBO L(q, θ) is defined as

L(q, θ) = E_{z∼q(z)}[ln p(x, z|θ) − ln q(z)],

where the bracketed term is the learning signal.

SLIDE 105

ELBO

The ELBO is a global lower bound on the LL function:

ln p(x|θ) ≥ L(q, θ),

where equality holds at a value θ0 if and only if the distribution q(z) satisfies q(z) = p(z|x, θ0).

[Figure: the LL ln p(x|θ) and two ELBOs, each tangent to the LL at its respective θ0 (θ0 = 2 and θ0 = 3)]
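
The bound is easy to verify numerically for a binary latent variable; a sketch (the joint values below are arbitrary):

```python
import numpy as np

# Values p(x, z | theta) for a fixed observed x and z in {0, 1}.
p_xz = np.array([0.1, 0.3])
ll = np.log(p_xz.sum())            # ln p(x | theta)

def elbo(q1):
    q = np.array([1 - q1, q1])     # variational distribution q(z)
    return np.sum(q * (np.log(p_xz) - np.log(q)))

print(ll, elbo(0.5), elbo(0.75))   # ELBO <= LL, equality at q1 = p(z=1|x) = 0.75
```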

SLIDE 106

Expectation-Maximization (EM) Algorithm

[Figure: the ELBO is tangent to the LL at θ_old; maximizing it yields θ_new]

SLIDE 108

Expectation-Maximization (EM) Algorithm

Initialize the parameter vector θ_old. For each iteration:

◮ E step: For fixed parameter vector θ_old,
max_q L(q, θ_old) → q_new(z) = p(z|x, θ_old)
(Bayesian inference of the latent variables)

◮ M step: For fixed variational distribution q_new(z),
max_θ L(q_new, θ) → max_θ E_{z∼q_new(z)}[ln p(x, z|θ)]
(solve a supervised learning problem)

SLIDE 109

Expectation-Maximization (EM) Algorithm

EM guarantees non-decreasing objective values, which ensures convergence to a local optimum of the original problem.

[Figure: successive EM iterations climb the LL from θ_old to θ_new]

SLIDE 110

Example: Mixture of Gaussians

Directed generative model: z ∼ Bern(π), x|z = k ∼ N(µk, Σk)
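
A compact EM sketch for this model in one dimension; the data, initialization, and iteration count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data from a two-component mixture (ground truth assumed).
x = np.concatenate([rng.normal(-2, 0.7, 150), rng.normal(2, 1.0, 100)])

pi, mu, var = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E step: responsibilities q(z_n = k) = p(z_n = k | x_n, theta_old)
    lik = np.stack([np.exp(-(x - mu[k]) ** 2 / (2 * var[k]))
                    / np.sqrt(2 * np.pi * var[k]) for k in (0, 1)], axis=1)
    joint = lik * np.array([1 - pi, pi])
    q = joint / joint.sum(axis=1, keepdims=True)
    # M step: maximize E_q[ln p(x, z | theta)] in closed form
    Nk = q.sum(axis=0)
    pi = Nk[1] / len(x)
    mu = (q * x[:, None]).sum(axis=0) / Nk
    var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
print(pi, mu, var)    # approaches the ground-truth parameters
```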

[Figure sequence: successive EM iterations on the Gaussian mixture example]
SLIDE 117

Scaling EM

The EM algorithm may be impractical for large-scale problems: one needs to compute the posterior in the E step and to average over z in the M step. Solutions:

◮ E step: Parametrize the variational distribution q(z|ϕ) or q(z|x, ϕ) and maximize the ELBO over ϕ (variational autoencoder)
◮ M step: Approximate E_{z∼q_new(z)}[ln p(x, z|θ)] via Monte Carlo
◮ Use gradient descent for the E and/or M steps

SLIDE 119

Learning: Beyond Maximum Likelihood

ML tends to provide inclusive and “blurry” estimates of the data distribution.

[Figure: a multimodal data distribution and its “blurry” ML fit]

This can be a problem for tasks such as data generation.

SLIDE 120

Learning: Beyond Maximum Likelihood

ML can be proven to minimize the KL divergence

KL(pD(x) || p(x|θ)) = E_{x∼pD}[ln(pD(x)/p(x|θ))]

between the empirical distribution pD(x) = N[x]/N (with counts N[x] = |{n : xn = x}|) and the model.
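
This claim can be checked on a small discrete alphabet, using NLL/N = KL(pD||p(·|θ)) + H(pD); a sketch:

```python
import numpy as np

data = np.array([0, 0, 1, 2, 0, 1])               # N = 6 samples over {0, 1, 2}
p_D = np.bincount(data, minlength=3) / len(data)  # empirical distribution N[x]/N

def kl(p, q):
    m = p > 0
    return np.sum(p[m] * np.log(p[m] / q[m]))

def nll(q):
    return -np.sum(np.log(q[data]))

q_other = np.array([0.2, 0.4, 0.4])
print(kl(p_D, p_D), nll(p_D) / len(data))          # KL = 0 at the ML solution q = p_D
print(kl(p_D, q_other), nll(q_other) / len(data))  # NLL/N = KL + entropy of p_D
```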

SLIDE 121

Learning: Beyond Maximum Likelihood

The KL divergence is part of the larger class of f-divergences between two distributions p(x) and q(x):

D_f(p||q) = max_{T(x)} E_{x∼p}[T(x)] − E_{x∼q}[g(T(x))],

for some concave increasing function g(·).

[Figure: a discriminator U(y) assigns large values to samples y ∼ q(y) and small values to samples y ∼ r(y)]
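
For a concrete instance of this variational form: the choice g(T) = exp(T − 1) is known to recover the KL divergence, and the objective can be estimated from samples for any fixed discriminator T. A sketch with 1-D Gaussians and hand-picked discriminators (all choices illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

xp = rng.normal(1.0, 1.0, 200_000)    # samples from p = N(1, 1)
xq = rng.normal(0.0, 1.0, 200_000)    # samples from q = N(0, 1)

def objective(T):
    """E_{x~p}[T(x)] - E_{x~q}[g(T(x))] with the KL choice g(T) = exp(T - 1)."""
    return T(xp).mean() - np.exp(T(xq) - 1.0).mean()

# The maximizing T is 1 + ln(p(x)/q(x)) = x + 0.5 here; weaker choices give less.
print(objective(lambda x: x + 0.5))    # close to KL(p||q) = 0.5
print(objective(lambda x: 0.5 * x))    # a weaker discriminator: smaller value
```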

SLIDE 122

Learning: Generative Adversarial Networks (GANs)

Generalizing the ML problem, GANs attempt to solve the problem

min_θ max_ϕ E_{x∼pD}[Tϕ(x)] − E_{x∼p(x|θ)}[g(Tϕ(x))]

for some differentiable function Tϕ(x) of the parameter vector ϕ. Choice of the divergence (via the discriminator) is tailored to the data. Can be applied to likelihood-free models.

SLIDE 123

Learning: Generative Adversarial Networks (GANs)

[Figure: GAN-generated face images; NVIDIA]

SLIDE 124

Applications to Communication Networks

Fog network architecture [5GPPP]

[Figure: fog network architecture spanning the wireless edge, access network, edge cloud, and core cloud]

SLIDE 125

At the Edge: Overview

At the edge:

◮ PHY: E2E encoding/decoding, CSI compression and feedback, fingerprinting for localization, blind source separation, blind channel equalization
◮ MAC/Link: Clustering for resource allocation, clustering for self-organizing multi-hop networks

SLIDE 126

At the Edge: PHY

End-to-end encoding/decoding for wireless channels – autoencoders [O’Shea and Hoydis ’17]

SLIDE 127

At the Edge: PHY

End-to-end encoding/decoding for optical channels – autoencoders

[Karanov et al ‘18]

SLIDE 128

At the Edge: PHY

Channel State Information (CSI) compression and feedback – autoencoders [Wen et al ‘17]

SLIDE 129

At the Edge: PHY

Fingerprinting for localization – autoencoders [Xiao et al '17]

SLIDE 130

At the Edge: PHY

Mimicking a propagation channel - GAN

[O’Shea et al ‘18]

SLIDE 131

At the Edge: PHY

Mimicking and identifying a propagation channel (e.g., satellite) – generative models
Leveraging domain knowledge improves the learned model.

[Ibnkahla ‘00]

SLIDE 132

At the Edge: MAC/ Link

Generating artificial examples to augment training set for spectrum sensing - GAN

[Nakashima et al '18]

SLIDE 133

At the Edge: MAC/ Link

Resource allocation – clustering

[Abdelnasser et al '14]

SLIDE 134

At the Cloud: Overview

At the cloud:

◮ Network: Clustering for group-based access control, anomaly detection
◮ Application: Community detection in social media, Internet traffic clustering

SLIDE 135

At the Cloud: Network

Self-organizing multi-hop networks – clustering

[Abbassi and Younis '07]

SLIDE 136

At the Cloud: Network

Anomaly detection – density estimation

[Musumeci et al ’18]

SLIDE 137

At the Cloud: Application

Community detection in social networks - clustering [Abbe et al ‘16]

SLIDE 138

Concluding Remarks

Machine learning tools can leverage the availability of data and computing resources in modern communication systems.
Supervised, unsupervised, and reinforcement learning paradigms lend themselves to different key communication (sub)tasks.
Not a universal solution – a case-by-case analysis of advantages and disadvantages is needed.

SLIDE 139

Concluding Remarks

Engineering the integration of traditional model-based techniques and data-driven machine learning methods

[Reich ‘96]

SLIDE 140

Acknowledgements

This work has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 725731).

SLIDE 141

References

[O’Shea and Hoydis ’17] T. J. O’Shea and J. Hoydis, “An Introduction to Machine Learning Communications Systems,” 2017.
[Cammerer et al ’17] S. Cammerer, T. Gruber, J. Hoydis, and S. ten Brink, “Scaling Deep Learning-based Decoding of Polar Codes via Partitioning,” arXiv:1702.06901, 2017.
[Balatsoukas-Stimming ’17] A. Balatsoukas-Stimming, “Non-Linear Digital Self-Interference Cancellation for In-Band Full-Duplex Radios Using Neural Networks,” arXiv:1711.00379, 2017.
[Sun et al ’17] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, “Learning to Optimize: Training Deep Neural Networks for Wireless Resource Management,” arXiv:1705.09412, 2017.
[de Kerret et al ’17] P. de Kerret, D. Gesbert, and M. Filippone, “Decentralized Deep Scheduling for Interference Channels,” arXiv, 2017.
[Wen et al ’17] C.-K. Wen, W.-T. Shih, and S. Jin, “Deep Learning for Massive MIMO CSI Feedback,” arXiv.

SLIDE 142

References

[Agirman-Tosun et al ’11] H. Agirman-Tosun et al., “Modulation classification of MIMO-OFDM signals by independent component analysis and support vector machines,” in Proc. Asilomar, 2011.
[Fang and Lin ’08] S.-H. Fang and T.-N. Lin, “Indoor location system based on discriminant-adaptive neural network in IEEE 802.11 environments,” IEEE Transactions on Neural Networks, vol. 19, no. 11, pp. 1973–1978, Nov. 2008.
[Tumuluru et al ’10] V. K. Tumuluru, P. Wang, and D. Niyato, “A neural network based spectrum prediction scheme for cognitive radio,” in Proc. ICCC, 2010.
[Chen et al ’17] M. Chen et al., “Echo state networks for proactive caching in cloud-based radio access networks with mobile users,” IEEE Transactions on Wireless Communications, 2017.
[Xiao et al ’17] C. Xiao, D. Yang, Z. Chen, and G. Tan, “3-D BLE Indoor Localization Based on Denoising Autoencoder,” IEEE Access, vol. 5, pp. 12751–12760, 2017.
[Abdelnasser et al ’14] A. Abdelnasser et al., “Clustering and resource allocation for dense femtocells in a two-tier cellular OFDMA network,” IEEE Transactions on Wireless Communications, 2014.

SLIDE 143

References

[Nguyen et al ’08] T. T. Nguyen and G. Armitage, “A survey of techniques for internet traffic classification using machine learning,” IEEE Communications Surveys & Tutorials, vol. 10, no. 4, pp. 56–76, 2008.
[Wang et al ’06] Y. Wang, M. Martonosi, and L.-S. Peh, “A supervised learning approach for routing optimizations in wireless sensor networks,” in Proc. ACM Workshop on Multi-hop Ad Hoc Networks, 2006.
[Abbassi and Younis ’07] A. A. Abbasi and M. Younis, “A survey on clustering algorithms for wireless sensor networks,” Computer Communications, 2007.
[Abbe et al ’16] E. Abbe, A. S. Bandeira, and G. Hall, “Exact recovery in the stochastic block model,” IEEE Transactions on Information Theory, 2016.
[Wang et al ’18] Z. Wang et al., “Handover Control in Wireless Systems via Asynchronous Multi-User Deep Reinforcement Learning,” arXiv.
[Wang et al ’16] D. Wang et al., “Nonlinearity Mitigation Using a Machine Learning Detector Based on k-Nearest Neighbors,” IEEE Photonics Technology Letters, 2016.

SLIDE 144

References

[Venkatraman et al ’10] P. Venkatraman et al., “Opportunistic bandwidth sharing through reinforcement learning,” IEEE Transactions on Vehicular Technology, 2010.
[Iannello et al ’12] F. Iannello, O. Simeone, and U. Spagnolini, “Optimality of myopic scheduling and whittle indexability for energy harvesting sensors,” in Proc. CISS, 2012.
[Xu et al ’17] Z. Xu et al., “A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs,” in Proc. IEEE ICC, 2017.
[Mnih et al ’15] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, 2015.
[Bogale et al ’18] T. Bogale et al., “Machine Intelligence Techniques for Next-Generation Context-Aware Wireless Networks,” arXiv.
[Tang et al ’17] F. Tang et al., “On Removing Routing Protocol from Future Wireless Networks: A Real-time Deep Learning Approach for Intelligent Traffic Control,” IEEE Wireless Communications, 2018.
[Sallent et al ’15] O. Sallent et al., “Learning-based coexistence for LTE operation in unlicensed bands,” in Proc. IEEE ICC, 2015.

SLIDE 145

References

[Kato et al ’17] N. Kato et al., “The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective,” IEEE Wireless Communications, June 2017.
[Siracusano and Bifulco ’18] G. Siracusano and R. Bifulco, “In-network Neural Networks,” arXiv:1801.05731.
[He et al ’17] Y. He et al., “Deep Reinforcement Learning (DRL)-based Resource Management in Software-Defined and Virtualized Vehicular Ad Hoc Networks,” in Proc. ACM SDAIVN, 2017.
[Farsad and Goldsmith ’18] N. Farsad and A. Goldsmith, “Neural Network Detection of Data Sequences in Communication Systems,” arXiv:1802.02046.
[Emigh et al ’15] M. Emigh et al., “A model based approach to exploration of continuous-state MDPs using Divergence-to-Go,” in Proc. IEEE Machine Learning for Signal Processing (MLSP), 2015.
[Caciularu and Burshtein ’18] A. Caciularu and D. Burshtein, “Blind Channel Equalization using Variational Autoencoders,” in Proc. IEEE ICC Workshops, 2018.

SLIDE 146

References

[Musumeci et al ’18] F. Musumeci, C. Rottondi, A. Nag, I. Macaluso, D. Zibar, M. Ruffini, and M. Tornatore, “A Survey on Application of Machine Learning Techniques in Optical Networks,” arXiv:1803.07976.
[Okamoto et al ’18] H. Okamoto et al., “Machine-Learning-Based Future Received Signal Strength Prediction Using Depth Images for mmWave Communications,” arXiv:1804.00709.
[Davaslioglu and Sagduyu ’18] K. Davaslioglu and Y. E. Sagduyu, “Generative Adversarial Learning for Spectrum Sensing,” in Proc. IEEE ICC, 2018.
[Aoudia and Hoydis ’18] F. Ait Aoudia and J. Hoydis, “End-to-End Learning of Communications Systems Without a Channel Model,” arXiv:1804.02276.
[Karanov et al ’18] B. Karanov et al., “End-to-end Deep Learning of Optical Fiber Communications,” arXiv:1804.04097.
[Nachmani et al ’16] E. Nachmani et al., “Learning to decode linear codes using deep learning,” in Proc. Allerton, 2016.

SLIDE 147

References

[O’Shea et al ’18] T. J. O’Shea, T. Roy, and N. West, “Approximating the Void: Learning Stochastic Channel Models from Observation with Variational Generative Adversarial Networks,” arXiv:1805.06350.
[Zhao et al ’18] Z. Zhao et al., “Deep Reinforcement Learning for Network Slicing,” arXiv:1805.06591.
[Ibnkahla ’00] M. Ibnkahla, “Applications of neural networks to digital communications – a survey,” Signal Processing, vol. 80, no. 7, pp. 1185–1215, 2000.
[Bouchired et al ’98] S. Bouchired, D. Roviras, and F. Castanié, “Equalisation of satellite mobile channels with neural network techniques,” Space Communications, 1998.
[De Veciana and Zakhor ’92] G. De Veciana and A. Zakhor, “Neural net-based continuous phase modulation receivers,” IEEE Transactions on Communications, 1992.
