Machine Learning for Pairwise Data, Adriana Bîrluțiu

SLIDE 1

Machine Learning for Pairwise Data

Adriana Bîrluțiu

SLIDE 2

A few words about myself

◮ 2012–present: Scientific programmer, S.A.I.A. & OncoPredict, working on applications of machine learning and computational intelligence in medical oncology.
◮ 2006–2011: Research assistant and Ph.D. student, Institute for Computing and Information Sciences, Radboud University Nijmegen, the Netherlands. Thesis: Machine Learning for Pairwise Data.
◮ 2000–2005: B.Sc. and M.Sc., Faculty of Mathematics and Computer Science, Babeș-Bolyai University, Cluj-Napoca.

SLIDE 3

Machine learning

◮ Branch of AI focused on the design and development of methods that allow machines to learn from observations.
◮ Applications: spam filtering, speech and handwriting recognition, medical diagnosis, credit card fraud detection, stock market analysis.
◮ Made practical by the availability of empirical data and computational power.

SLIDE 4

Supervised learning

Learning a latent function from observations:

◮ Data $D = \{(x_i, y_i),\ i = 1, \dots, n\}$
◮ Input space $X \subseteq \mathbb{R}^d$, output space $Y \subseteq \mathbb{R}$
◮ Goal: predict the functional relation $f : X \to Y$
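To make the supervised setting concrete, here is a minimal sketch assuming the simplest case: a one-dimensional input ($d = 1$), a linear $f$ and squared loss. The slide does not fix a model, so these choices and the function name `fit_linear_1d` are illustrative assumptions only.

```python
def fit_linear_1d(xs, ys):
    """Least-squares fit of f(x) = slope * x + intercept (assumed linear model, d = 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Toy data set D = {(x_i, y_i)} generated from y = 2x + 1 (hypothetical example)
f = fit_linear_1d([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(f(4.0))  # 9.0
```

The learned `f` is then used to predict outputs for inputs that were not in $D$, which is the goal stated above.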

SLIDE 5

Machine learning

In many cases obtaining labeled data to train the algorithms is expensive!

SLIDE 6

Machine learning ↔ human learning

Characteristics of learning:

◮ Based on prior experience → multi-task/transfer learning
◮ Selects the most useful information → active learning

SLIDE 7

Machine learning ↔ human learning

Characteristics of learning:

◮ Based on prior experience → multi-task/transfer learning
◮ Selects the most useful information → active learning

Applied to: preference learning and supervised network inference.

SLIDE 8

Machine learning ↔ human learning

Characteristics of learning:

◮ Based on prior experience → multi-task/transfer learning
◮ Selects the most useful information → active learning

Applied to: preference learning and supervised network inference. Connection between the two: pairwise data.
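One simple way to "select the most useful information" with pairwise data is uncertainty sampling: ask the user about the pair whose outcome the current model is least sure of. The sketch below assumes each candidate item already has a utility score and that preferences follow a logistic model of the score difference; the scores and function names are hypothetical, and this is one possible selection criterion, not necessarily the one used in the thesis.

```python
import math
from itertools import combinations


def outcome_entropy(p):
    """Entropy (in bits) of a binary outcome that occurs with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def most_informative_pair(scores):
    """Pair (i, j) whose comparison outcome is most uncertain under the assumed model
    P(i preferred over j) = 1 / (1 + exp(-(scores[i] - scores[j])))."""
    def pair_entropy(pair):
        i, j = pair
        p = 1.0 / (1.0 + math.exp(-(scores[i] - scores[j])))
        return outcome_entropy(p)

    return max(combinations(range(len(scores)), 2), key=pair_entropy)


# Hypothetical current utility estimates for four candidate items
print(most_informative_pair([2.0, 0.5, 0.4, -1.0]))  # (1, 2)
```

Items 1 and 2 have nearly equal scores, so their comparison carries the most information; pairs with a clear favourite would add little.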

SLIDE 9

Preference learning

◮ Learning from observations that reveal information about the preferences of an individual or a class of individuals.
◮ Used in decision support systems and recommender systems.
◮ Application areas: e-commerce, marketing, health care, computer games.
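Preferences are often observed as pairwise comparisons ("A is better than B"). A minimal sketch of learning from such data, assuming the classic Bradley-Terry model $P(i \succ j) = \sigma(s_i - s_j)$ fitted by gradient ascent with a small L2 penalty; the slides do not name a model, so this choice and all parameter values are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_bradley_terry(n_items, comparisons, lr=0.1, l2=0.01, steps=2000):
    """Fit utility scores s so that P(i preferred over j) = sigmoid(s[i] - s[j]).

    comparisons: list of (winner, loser) index pairs. The small L2 penalty (assumed)
    keeps scores finite when one item wins every comparison it appears in.
    """
    s = [0.0] * n_items
    for _ in range(steps):
        grad = [-l2 * si for si in s]  # gradient of the L2 penalty
        for winner, loser in comparisons:
            p = sigmoid(s[winner] - s[loser])
            grad[winner] += 1.0 - p  # gradient of log P(winner preferred over loser)
            grad[loser] -= 1.0 - p
        s = [si + lr * g for si, g in zip(s, grad)]  # gradient ascent step
    return s


# Hypothetical comparisons among three items: each pair compared four times
comparisons = [(0, 1)] * 3 + [(1, 0)] + [(1, 2)] * 3 + [(2, 1)] + [(0, 2)] * 3 + [(2, 0)]
scores = fit_bradley_terry(3, comparisons)
print(sorted(range(3), key=lambda i: -scores[i]))  # [0, 1, 2]
```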

SLIDE 10

Personalization of hearing-aids

Goal: tune the hearing-aid parameters so as to maximize user satisfaction.

Problems:

◮ Large dimensionality of the parameter space
◮ The determinants of satisfaction for hearing-impaired users are unknown
◮ Listening tests are costly and unreliable

=> Personalized fitting based on a probabilistic framework

SLIDE 11

Personalization and decision making for hearing-aids

Finding the hearing-aid parameters that are optimal for a patient

SLIDE 12

Bayesian updating

◮ Suppose θ is unknown.
◮ Start from a prior distribution P(θ).
◮ Update this prior based on observations D using Bayes' rule:

$$P(\theta \mid D) = \frac{P(D \mid \theta)\,P(\theta)}{P(D)}$$
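The update can be carried out numerically by discretizing θ. This sketch assumes θ is the probability of a binary outcome (for instance, whether a user prefers one of two settings), independent Bernoulli observations, an 11-point grid and a uniform prior; all of these are illustrative assumptions, not choices made on the slide.

```python
import math


def grid_posterior(thetas, prior, observations):
    """Posterior P(theta | D) on a discrete grid of theta values.

    observations: list of 0/1 outcomes, assumed independent Bernoulli(theta).
    """
    likelihood = [
        math.prod(t if y == 1 else 1.0 - t for y in observations) for t in thetas
    ]
    unnormalized = [lik * p for lik, p in zip(likelihood, prior)]  # P(D|theta) P(theta)
    evidence = sum(unnormalized)  # P(D), the normalizing constant
    return [u / evidence for u in unnormalized]


thetas = [i / 10 for i in range(11)]
prior = [1 / len(thetas)] * len(thetas)  # uniform prior (assumption)
data = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]  # hypothetical: 7 successes out of 10
posterior = grid_posterior(thetas, prior, data)
best = max(range(len(thetas)), key=posterior.__getitem__)
print(thetas[best])  # 0.7
```

Dividing by the evidence $P(D)$ is what turns the products $P(D \mid \theta)P(\theta)$ into a distribution that sums to one.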

SLIDE 13

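Bayes' rule

◮ P(rain) = 20%
◮ P(umbrella | rain) = 70% and P(umbrella | no rain) = 10%

These two conditional probabilities do not need to sum to 100%, unlike P(umbrella | rain) + P(no umbrella | rain), which must.

◮ Bayes' rule:

$$P(\text{rain} \mid \text{umbrella}) = \frac{P(\text{rain})\,P(\text{umbrella} \mid \text{rain})}{P(\text{rain})\,P(\text{umbrella} \mid \text{rain}) + P(\text{no rain})\,P(\text{umbrella} \mid \text{no rain})}$$

$$= \frac{0.2 \times 0.7}{0.2 \times 0.7 + 0.8 \times 0.1} = \frac{0.14}{0.22} \approx 64\%$$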
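The same calculation as a small function, a sketch where the evidence P(E) is expanded over the hypothesis and its complement exactly as in the denominator above; the function and argument names are illustrative.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(H | E) by Bayes' rule, with P(E) expanded over H and not-H."""
    joint = prior_h * p_e_given_h
    evidence = joint + (1.0 - prior_h) * p_e_given_not_h  # P(E)
    return joint / evidence


p = posterior(0.2, 0.7, 0.1)  # H = rain, E = umbrella
print(f"P(rain | umbrella) = {p:.2%}")  # P(rain | umbrella) = 63.64%
```

Seeing an umbrella raises the probability of rain from the 20% prior to about 64%.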

SLIDE 14

Multi-task learning

Learning multiple functions → multi-task/transfer learning
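One common way to let a new task benefit from related ones is to learn a shared prior from the earlier tasks and then apply the Bayesian updating from the earlier slides to the new task's few observations. This is a minimal sketch assuming Gaussian models with a known observation noise; the function names and numbers are hypothetical, and it is not presented as the method used in the thesis.

```python
import statistics


def learn_prior(task_estimates):
    """Fit a Gaussian prior N(mu0, tau2) to parameter estimates from earlier tasks."""
    return statistics.mean(task_estimates), statistics.variance(task_estimates)


def posterior_mean(observations, mu0, tau2, noise_var):
    """Conjugate Gaussian update (assumed model): combine the learned prior with
    the new task's observations, each with known noise variance noise_var."""
    n = len(observations)
    precision = 1.0 / tau2 + n / noise_var
    return (mu0 / tau2 + sum(observations) / noise_var) / precision


# Hypothetical estimates of one parameter for users seen earlier
mu0, tau2 = learn_prior([2.0, 2.5, 3.0, 2.5])
# A new user with a single noisy observation is pulled towards the shared prior
print(round(posterior_mean([4.0], mu0, tau2, noise_var=1.0), 3))  # 2.714
```

With no observations the estimate falls back to the prior learned from the other tasks; as the new user's data grow, their own observations dominate.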

SLIDE 15

Supervised inference of biological networks

◮ Infer missing edges in a graph (dotted edges) where a few edges are already known (solid edges).
◮ Use attributes available for individual vertices, such as vectors of expression levels across different experiments when the vertices are genes.

SLIDE 16

Supervised edge inference or link prediction

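◮ $o, o'$: two proteins
◮ $x(o)$ and $x(o')$: input feature vectors encoding some properties of $o$ and $o'$

Learn a function $f : (x(o), x(o')) \to \{0, 1\}$ from training data $D = \{x(o_i),\ i = 1, \dots, p;\ A_{ij},\ i, j = 1, \dots, p\}$, where $A$ is the adjacency matrix of the known network ($A_{ij} = 1$ if an edge between $o_i$ and $o_j$ is known).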
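The slides do not say how $f$ is learned. As one possible sketch: represent each pair by a symmetric feature vector (here the elementwise product of $x(o)$ and $x(o')$, an assumed choice) and train a logistic-regression classifier by batch gradient descent on all vertex pairs in $A$, treating $A_{ij} = 0$ as a known non-edge (also an assumption). The toy network, feature values and helper names are hypothetical.

```python
import math


def pair_features(u, v):
    """Symmetric pair representation: elementwise product, so phi(u, v) == phi(v, u)."""
    return [a * b for a, b in zip(u, v)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_link_predictor(x, A, lr=1.0, steps=10000):
    """Logistic regression on pair features of every vertex pair (i < j), labelled by A[i][j]."""
    p = len(x)
    pairs = [(pair_features(x[i], x[j]), A[i][j]) for i in range(p) for j in range(i + 1, p)]
    w = [0.0] * len(x[0])
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for phi, label in pairs:
            err = sigmoid(b + sum(wk * fk for wk, fk in zip(w, phi))) - label
            grad_w = [g + err * fk for g, fk in zip(grad_w, phi)]
            grad_b += err
        w = [wk - lr * g / len(pairs) for wk, g in zip(w, grad_w)]
        b -= lr * grad_b / len(pairs)

    def predict(u, v):
        """Estimated probability that an edge links vertices with features u and v."""
        return sigmoid(b + sum(wk * fk for wk, fk in zip(w, pair_features(u, v))))

    return predict


# Hypothetical toy network: proteins 0-1 and 2-3 interact; features are 2-D profiles
x = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
A = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
predict = train_link_predictor(x, A)
new_protein = [0.95, 0.05]
print(round(predict(new_protein, x[0]), 3), round(predict(new_protein, x[2]), 3))
```

Because the pair representation is symmetric, the predicted probability does not depend on the order of the two proteins, matching an undirected interaction network.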

SLIDE 17

Network topology

◮ Scale-free architecture
◮ Clustering coefficient, network diameter, average shortest path
◮ Network motifs: small subgraphs that appear in the network significantly more frequently than in a randomized network

How can this information be used?
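The topology statistics listed above can be computed directly from an edge list. A minimal sketch for an undirected, unweighted graph, assuming integer vertex labels; path statistics use breadth-first search and only count pairs that are connected. The helper names and the toy graph are illustrative.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked (0 if degree < 2)."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))


def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def path_statistics(adj):
    """Average shortest path length and diameter over connected vertex pairs."""
    lengths = [d for s in adj for t, d in bfs_distances(adj, s).items() if s < t]
    return sum(lengths) / len(lengths), max(lengths)


# Toy graph: a triangle 0-1-2 with a pendant vertex 3 attached to 2
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
avg_c = sum(local_clustering(adj, v) for v in adj) / len(adj)
aspl, diameter = path_statistics(adj)
print(round(avg_c, 3), round(aspl, 3), diameter)  # 0.583 1.333 2
```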

SLIDE 18

Personalized cancer medicine

◮ Microarray data
◮ Diagnosis, predicting recurrence, predicting progression
◮ Problem: large number of features → high dimensionality
◮ Feature selection: maximum relevance, minimum redundancy
◮ Misclassification costs
◮ Cancer pathways
◮ Personalized cancer treatment
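Maximum relevance, minimum redundancy (mRMR) selection picks features greedily: each new feature should be strongly related to the target but weakly related to the features already chosen. The mRMR criterion is usually defined with mutual information; this sketch swaps in absolute Pearson correlation for both terms to stay dependency-free, and assumes no feature is constant. For two classes the target can be coded as 0/1; the toy data below use a numeric target and are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson correlation (assumes neither vector is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def mrmr_select(features, target, k):
    """Greedy correlation-based mRMR; features is a list of per-feature value lists."""
    relevance = [abs(pearson(f, target)) for f in features]
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(abs(pearson(features[j], features[s])) for s in selected)
            return relevance[j] - redundancy / len(selected)  # relevance minus mean redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


a = [1, 1, -1, -1, 1, 1, -1, -1]
near_copy = [1, 1, -1, -1, 1, 1, -1, -0.9]  # almost identical to a
b = [1, -1, 1, -1, 1, -1, 1, -1]  # uncorrelated with a
target = [ai + 0.8 * bi for ai, bi in zip(a, b)]
print(mrmr_select([a, near_copy, b], target, k=2))  # [0, 2]
```

The near-duplicate feature is more relevant than `b` on its own, but it adds almost nothing once `a` is selected, so the redundancy term pushes `b` ahead of it.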

SLIDE 19

Decision tree classifier

An i-Biomarker represented by a decision tree. Samples are classified at the terminal nodes of the tree as cancer (red rectangles) or normal (blue rectangles). To classify a new sample, we observe the expression values of the three genes and compare them with the threshold identified at each node.
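Classifying a sample with such a tree means following one threshold test per node until a terminal node is reached. A minimal sketch; the gene names `GENE_A`, `GENE_B`, `GENE_C`, the thresholds and the tree shape are hypothetical placeholders, not the genes of the actual i-Biomarker.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # "cancer" or "normal"


@dataclass
class Node:
    gene: str  # gene whose expression value is tested at this node
    threshold: float  # go left if expression <= threshold, otherwise right
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def classify(tree: Tree, sample: dict) -> str:
    """Follow the threshold tests from the root down to a terminal node."""
    while isinstance(tree, Node):
        tree = tree.left if sample[tree.gene] <= tree.threshold else tree.right
    return tree.label


# Hypothetical i-Biomarker over three genes (names and thresholds are made up)
biomarker = Node("GENE_A", 2.5,
                 left=Node("GENE_B", 1.0, Leaf("normal"), Leaf("cancer")),
                 right=Node("GENE_C", 4.0, Leaf("cancer"), Leaf("normal")))
print(classify(biomarker, {"GENE_A": 3.1, "GENE_B": 0.2, "GENE_C": 3.0}))  # cancer
```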

SLIDE 20

Ensemble methods

Ensemble method classification flow chart. The ensemble is composed of n i-Biomarkers (decision trees), each trained on a data set derived from the original data set. Each i-Biomarker makes predictions on the samples from the test data set, and the votes of the individual i-Biomarkers are integrated into a final decision (diagnosis).
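How the derived data sets are built is not stated on the slide. This sketch assumes bootstrap resampling (bagging), uses one-split decision stumps in place of full decision trees for brevity, and integrates the votes by simple majority. Labels, toy data and function names are hypothetical.

```python
import random
from collections import Counter


def train_stump(samples):
    """Fit a depth-1 decision tree: the (gene, threshold, orientation) with fewest errors."""
    labels = [y for _, y in samples]
    majority = Counter(labels).most_common(1)[0][0]
    best_errors = sum(y != majority for y in labels)
    best_rule = None  # None means "always predict the majority label"
    for j in range(len(samples[0][0])):
        for t in sorted({x[j] for x, _ in samples}):
            for low, high in (("normal", "cancer"), ("cancer", "normal")):
                errors = sum((low if x[j] <= t else high) != y for x, y in samples)
                if errors < best_errors:
                    best_errors, best_rule = errors, (j, t, low, high)
    if best_rule is None:
        return lambda x: majority
    gene, thr, below, above = best_rule
    return lambda x: below if x[gene] <= thr else above


def train_ensemble(samples, n_models=11, seed=0):
    """Train each stump on a bootstrap resample (drawn with replacement) of the data."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(samples) for _ in samples]) for _ in range(n_models)]


def predict(models, x):
    """Integrate the individual votes by simple majority."""
    return Counter(model(x) for model in models).most_common(1)[0][0]


# Toy training set: (expression of gene 0, expression of gene 1) -> label
train = [((0.1, 0.5), "normal"), ((0.2, 0.4), "normal"), ((0.3, 0.6), "normal"),
         ((0.8, 0.5), "cancer"), ((0.9, 0.4), "cancer"), ((0.7, 0.6), "cancer")]
ensemble = train_ensemble(train)
print(predict(ensemble, (0.95, 0.5)), predict(ensemble, (0.05, 0.5)))
```

An odd number of models avoids ties between the two labels; a bootstrap sample that happens to contain only one class yields a stump that always votes for that class, and the majority vote absorbs such outliers.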