Machine learning for cancer genomics


SLIDE 1

Machine learning for cancer genomics

Jean-Philippe Vert (Jean-Philippe.Vert@mines.org)

Mines ParisTech / Curie Institute / Inserm

"Informatics and mathematical sciences: interactions with biomedical sciences" workshop, Paris, June 17, 2011.

SLIDE 2

Outline

1. Introduction
2. Cancer prognosis from DNA copy number variations
3. Diagnosis and prognosis from gene expression data
4. Conclusion


SLIDE 4

Chromosomal aberrations in cancer

SLIDE 5

Comparative Genomic Hybridization (CGH)

Motivation

Comparative genomic hybridization (CGH) data measure DNA copy number along the genome. They are very useful, in particular in cancer research, to observe systematically the variations in DNA content.

[Figure: CGH profile; log-ratio plotted along the genome, chromosome by chromosome]

SLIDE 6

Cancer prognosis: can we predict the future evolution?

[Figure: CGH profiles of several melanoma samples]

Aggressive (left) vs non-aggressive (right) melanoma

SLIDE 7

DNA → RNA → protein

CGH shows the (static) DNA. Cancer cells also have abnormal (dynamic) gene expression (= transcription).

SLIDE 8

Tissue profiling with DNA chips

Data

Gene expression is measured for more than 10,000 genes, typically on fewer than 100 samples from two (or more) classes (e.g., different tumors).

SLIDE 9

Can we identify the cancer subtype? (diagnosis)

SLIDE 10

Can we predict the future evolution? (prognosis)

SLIDE 11

Pattern recognition, aka supervised classification

[Figure: CGH profiles]


SLIDE 15

Pattern recognition, aka supervised classification

Challenges

Few samples
High dimension
Structured data
Heterogeneous data
Prior knowledge
Fast and scalable implementations
Interpretable models

SLIDE 16

Shrinkage estimators

1. Define a large family of "candidate classifiers", e.g., linear predictors: fβ(x) = β⊤x for x ∈ Rp.

2. For any candidate classifier fβ, quantify how "good" it is on the training set with some empirical risk, e.g.:

   R(β) = (1/n) ∑_{i=1}^{n} ℓ(fβ(xi), yi) .

3. Choose the β that achieves the minimum empirical risk, subject to some constraint:

   min_β R(β)   subject to   Ω(β) ≤ C .
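As a minimal sketch of this recipe, assuming scikit-learn and synthetic expression-like data (not from the original slides), the constrained problem is solved here in its equivalent penalized Lagrangian form R(β) + λΩ(β):

```python
# Minimal sketch: penalized empirical risk minimization on synthetic data.
# Assumes scikit-learn; C in sklearn is the inverse of the penalty strength.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, p = 80, 5000                      # few samples, high dimension
X = rng.normal(size=(n, p))          # hypothetical expression matrix
beta_true = np.zeros(p)
beta_true[:20] = 1.0                 # only 20 informative genes
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

# Ridge penalty Omega(beta) = sum(beta_i^2), i.e. an l2-penalized linear model.
clf = LogisticRegression(penalty="l2", C=0.1, max_iter=1000)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```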


SLIDE 19

Why shrinkage classifiers?

min_β R(β)   subject to   Ω(β) ≤ C .

[Figure: empirical risk contours with the constraint set Ω(β) ≤ C; shrinking C pulls the estimate away from the unconstrained minimizer β*, increasing bias and decreasing variance]


SLIDE 25

Why shrinkage classifiers?

Shrinking the constraint set "increases bias and decreases variance". Common choices are:

Ω(β) = ∑_{i=1}^{p} βi²   (ridge regression, SVM, ...)

Ω(β) = ∑_{i=1}^{p} |βi|   (lasso, boosting, ...)
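A small sketch contrasting the two penalties on synthetic data, assuming scikit-learn: the ℓ2 (ridge) penalty shrinks all coefficients, while the ℓ1 (lasso) penalty sets most of them exactly to zero.

```python
# Contrast the two common penalties on the same high-dimensional problem.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))                 # 60 samples, 500 features
y = X[:, :5] @ np.ones(5) + 0.1 * rng.normal(size=60)

ridge = Ridge(alpha=10.0).fit(X, y)            # Omega(beta) = sum beta_i^2
lasso = Lasso(alpha=0.1).fit(X, y)             # Omega(beta) = sum |beta_i|
print("non-zero ridge coefficients:", np.sum(ridge.coef_ != 0))  # essentially all
print("non-zero lasso coefficients:", np.sum(lasso.coef_ != 0))  # only a few
```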

SLIDE 26

Including prior knowledge in the penalty?

min_β R(β)   subject to   Ω(β) ≤ C .

[Figure: the shape of the constraint set Ω(β) ≤ C determines which estimates are favored]



SLIDE 33

CGH array classification

Prior knowledge

For a CGH profile x ∈ Rp, we focus on linear classifiers, i.e., the sign of fβ(x) = β⊤x. We expect β to be:

sparse: not all positions should be discriminative;
piecewise constant: within a selected region, all probes should contribute equally.

[Figure: CGH profile; log-ratio plotted along the genome, chromosome by chromosome]

SLIDE 34

Promoting sparsity with the ℓ1 penalty

The ℓ1 penalty (Tibshirani, 1996; Chen et al., 1998)

The solution of

min_{β∈Rp} R(β) + λ ∑_{i=1}^{p} |βi|

is usually sparse.

SLIDE 35

Promoting piecewise constant profiles with the fusion penalty

The variable fusion penalty (Land and Friedman, 1996)

The solution of

min_{β∈Rp} R(β) + λ ∑_{i=1}^{p−1} |βi+1 − βi|

is usually piecewise constant.

SLIDE 36

Fused Lasso signal approximator (Tibshirani et al., 2005)

min_{β∈Rp} ∑_{i=1}^{p} (yi − βi)² + λ1 ∑_{i=1}^{p} |βi| + λ2 ∑_{i=1}^{p−1} |βi+1 − βi| .

The first penalty term leads to sparse solutions; the second penalty term leads to piecewise constant solutions.
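A minimal sketch of this signal approximator, assuming cvxpy as a generic convex solver and a hypothetical noisy piecewise-constant profile (the slides do not prescribe a solver):

```python
# Fused lasso signal approximation of a noisy piecewise-constant profile.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
y = np.concatenate([np.zeros(50), 0.8 * np.ones(30), np.zeros(40)])
y = y + 0.3 * rng.normal(size=y.size)        # noisy CGH-like log-ratios

beta = cp.Variable(y.size)
lam1, lam2 = 0.5, 2.0
objective = cp.Minimize(cp.sum_squares(y - beta)          # data-fitting term
                        + lam1 * cp.norm1(beta)           # sparsity
                        + lam2 * cp.norm1(cp.diff(beta))) # piecewise constancy
cp.Problem(objective).solve()
print(np.round(beta.value[:10], 2))          # estimates in the flat segments are ~0
```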

SLIDE 37

Fused lasso for supervised classification (Rapaport et al., 2008)

min_{β∈Rp} ∑_{i=1}^{n} ℓ(yi, β⊤xi) + λ1 ∑_{i=1}^{p} |βi| + λ2 ∑_{i=1}^{p−1} |βi+1 − βi| ,

where ℓ is, e.g., the hinge loss ℓ(y, t) = max(1 − yt, 0).

Implementation

When ℓ is the hinge loss (fused SVM), this is a linear program → feasible up to p ≈ 10³–10⁴.
When ℓ is convex and smooth (logistic, quadratic), efficient implementations exist with proximal methods → feasible up to p ≈ 10⁸–10⁹.
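A hedged sketch of the hinge-loss (fused SVM) variant on tiny synthetic data, again assuming cvxpy rather than the dedicated linear-programming or proximal solvers mentioned above:

```python
# Fused lasso with the hinge loss on synthetic copy-number-like data.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 200
X = rng.normal(size=(n, p))
w_true = np.zeros(p)
w_true[50:80] = 1.0                              # one discriminative region
y = np.sign(X @ w_true + 0.5 * rng.normal(size=n))

beta = cp.Variable(p)
hinge = cp.sum(cp.pos(1 - cp.multiply(y, X @ beta)))   # hinge loss
lam1, lam2 = 0.5, 2.0
objective = cp.Minimize(hinge
                        + lam1 * cp.norm1(beta)
                        + lam2 * cp.norm1(cp.diff(beta)))
cp.Problem(objective).solve()
print("probes with non-zero weight:", int(np.sum(np.abs(beta.value) > 1e-4)))
```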


SLIDE 39

Example: predicting metastasis in melanoma

[Figure: two learned weight vectors plotted along the genome; x-axis BAC index, y-axis weight]

SLIDE 40

Extension: joint segmentation of many profiles

[Figure: several CGH profiles to be segmented jointly]

SLIDE 41

Fused group Lasso signal approximator

min_{β∈Rn×p} ‖Y − β‖² + λ ∑_{i=1}^{p−1} ‖βi+1 − βi‖ ,

where βi ∈ Rn denotes the i-th column of β (the values of all n profiles at probe i) and ‖·‖ is the Euclidean norm, so that all profiles share the same breakpoints.

[Figure: (a), (b) probe vs. log-ratio; (c) probe vs. score]
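A minimal sketch of this group-fused objective, assuming cvxpy and a few hypothetical synthetic profiles; the Euclidean norm on column differences couples the profiles so that they change at the same probes:

```python
# Joint segmentation: the group fusion penalty couples all profiles at each probe.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n_profiles, p = 4, 120
Y = np.zeros((n_profiles, p))
Y[:, 40:70] = rng.normal(1.0, 0.2, size=(n_profiles, 1))  # shared amplified region
Y = Y + 0.3 * rng.normal(size=Y.shape)

beta = cp.Variable((n_profiles, p))
diffs = cp.diff(beta, axis=1)                     # column differences, shape (n, p-1)
group_fusion = cp.sum(cp.norm(diffs, 2, axis=0))  # Euclidean norm per probe boundary
objective = cp.Minimize(cp.sum_squares(Y - beta) + 5.0 * group_fusion)
cp.Problem(objective).solve()
print(np.round(beta.value[:, 35:45], 2))          # breakpoints shared across profiles
```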


SLIDE 43

Molecular diagnosis / prognosis / theragnosis

SLIDE 44

Gene selection, signature

The idea

We look for a limited set of genes that are sufficient for prediction. Equivalently, the linear classifier will be sparse.

Why?

Bet on sparsity: we believe the "true" model is sparse.
Interpretation: a biological interpretation is easier to obtain by looking at the selected genes.
Statistics: this is one way to constrain the solution and reduce the complexity enough to allow learning.

SLIDE 45

But...

Challenging the idea of gene signature

We often observe little stability in the genes selected.
Is gene selection the most biologically relevant hypothesis?
What about thinking instead in terms of "pathway" or "module" signatures?

SLIDE 46

Gene networks

[Figure: gene network organized into functional modules: glycan biosynthesis; protein kinases; DNA and RNA polymerase subunits; glycolysis / gluconeogenesis; sulfur metabolism; porphyrin and chlorophyll metabolism; riboflavin metabolism; folate biosynthesis; biosynthesis of steroids, ergosterol metabolism; lysine biosynthesis; phenylalanine, tyrosine and tryptophan biosynthesis; purine metabolism; oxidative phosphorylation, TCA cycle; nitrogen, asparagine metabolism]

SLIDE 47

Graph-based penalty

Prior hypothesis

Genes near each other on the graph should have similar weights.

Two solutions (Rapaport et al., 2007, 2008)

Ωspectral(β) = ∑_{i∼j} (βi − βj)² ,

Ωgraphfusion(β) = ∑_{i∼j} |βi − βj| + ∑_i |βi| .
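To make the two penalties concrete, here is a small NumPy sketch evaluating them on a hypothetical toy network (an illustration, not code from the original work):

```python
# Evaluate the two graph-based penalties for a toy weight vector.
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4)]         # small chain-shaped gene network
beta = np.array([1.0, 0.9, 0.8, 0.0, 0.0])       # smooth on one module, zero elsewhere

omega_spectral = sum((beta[i] - beta[j]) ** 2 for i, j in edges)
omega_graphfusion = (sum(abs(beta[i] - beta[j]) for i, j in edges)
                     + np.sum(np.abs(beta)))
print(omega_spectral, omega_graphfusion)

# The spectral penalty can also be written beta^T L beta with L the graph Laplacian:
p = beta.size
L = np.zeros((p, p))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1
print(float(beta @ L @ beta))                    # equals omega_spectral
```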


SLIDE 49

Classifiers

[Figure: gene network with the same functional modules as the previous figure]

SLIDE 50

Classifiers

SLIDE 51

Limits

We are happy to see pathways appear. However, in some cases connected genes should have "opposite" weights (inhibition, pathway branching, etc.). How can we capture pathways without constraints on the weight similarities?

SLIDE 52

Selecting pre-defined groups of variables

Group lasso (Yuan & Lin, 2006)

If groups of covariates are likely to be selected together, the ℓ1/ℓ2-norm induces sparse solutions at the group level:

Ωgroup(w) = ∑_g ‖wg‖₂

For example, with groups {1, 2} and {3}: Ω(w1, w2, w3) = ‖(w1, w2)‖₂ + ‖w3‖₂ .
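A small sketch of group-level selection with this ℓ1/ℓ2 penalty, assuming cvxpy and hypothetical pre-defined groups (illustration only):

```python
# Group lasso: whole groups of coefficients are switched on or off together.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(3)
n, p, group_size = 50, 60, 10
groups = [slice(g, g + group_size) for g in range(0, p, group_size)]  # 6 groups
X = rng.normal(size=(n, p))
y = X[:, :group_size] @ rng.normal(size=group_size) + 0.1 * rng.normal(size=n)

w = cp.Variable(p)
penalty = sum(cp.norm(w[g], 2) for g in groups)            # sum_g ||w_g||_2
objective = cp.Minimize(cp.sum_squares(y - X @ w) + 5.0 * penalty)
cp.Problem(objective).solve()
for k, g in enumerate(groups):
    print(f"group {k}: max |w| = {np.max(np.abs(w.value[g])):.3f}")
```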

SLIDE 53

Graph lasso

Hypothesis: selected genes should form connected components on the graph.

Two solutions (Jacob et al., 2009):

Ωgroup(β) = ∑_{i∼j} √(βi² + βj²) ,

Ωoverlap(β) = sup { α⊤β : α ∈ Rp, αi² + αj² ≤ 1 for all i ∼ j } .
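Since Ωoverlap is defined through a supremum, it can itself be evaluated by a small convex program; a hedged sketch with cvxpy on a toy chain graph (illustration only, not the solver used by Jacob et al.):

```python
# Evaluate the two graph-lasso penalties for a toy weight vector.
import cvxpy as cp
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
beta = np.array([0.0, 0.0, 1.0, 1.0, 0.0])       # support is connected on the chain

omega_group = sum(np.sqrt(beta[i] ** 2 + beta[j] ** 2) for i, j in edges)

alpha = cp.Variable(beta.size)                   # dual variable of the overlap norm
constraints = [cp.square(alpha[i]) + cp.square(alpha[j]) <= 1 for i, j in edges]
problem = cp.Problem(cp.Maximize(cp.sum(cp.multiply(alpha, beta))), constraints)
omega_overlap = problem.solve()
print(round(omega_group, 3), round(omega_overlap, 3))
```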


SLIDE 55

Overlap and group unit balls

Unit balls for Ω^G_group(·) (middle) and Ω^G_overlap(·) (right) for the groups G = {{1, 2}, {2, 3}}, where w2 is represented as the vertical coordinate.

SLIDE 56

Summary: Graph lasso vs kernel

Graph lasso:

Ωgraph lasso(w) = ∑_{i∼j} √(wi² + wj²)

constrains the sparsity pattern, not the values.

Graph kernel:

Ωgraph kernel(w) = ∑_{i∼j} (wi − wj)²

constrains the values (smoothness), not the sparsity.

SLIDE 57

Preliminary results

Breast cancer data

Gene expression data for 8,141 genes in 295 breast cancer tumors. Canonical pathways from MSigDB containing 639 groups of genes, 637 of which involve genes from our study.

Pathway groups:

METHOD          ℓ1            Ω^G_overlap(·)
ERROR           0.38 ± 0.04   0.36 ± 0.03
MEAN # PATH.    130           30

Graph on the genes:

METHOD          ℓ1            Ωgraph(·)
ERROR           0.39 ± 0.04   0.36 ± 0.01
AV. SIZE C.C.   1.03          1.30

SLIDE 58

Lasso signature

SLIDE 59

Graph Lasso signature


SLIDE 61

Conclusion

Many challenging problems for statistical learning in genomics (high dimension, structure, noise, ...).
Integrating prior knowledge in the penalization / regularization function is an efficient way to fight the curse of dimensionality.
Several computationally efficient approaches exist (structured lasso, kernels, ...).
Tight collaborations with domain experts can help develop specific learning machines for specific data.
Natural extensions exist for data integration.

SLIDE 62

People I need to thank

Franck Rapaport (MSKCC); Emmanuel Barillot, Andrei Zynoviev, Kevin Bleakley, Anne-Claire Haury (Institut Curie / ParisTech); Laurent Jacob (UC Berkeley); Guillaume Obozinski (INRIA).