SLIDE 1

Complex Aggregates over Subsets of Elements

Celine Vens <Celine.Vens@irc.vib-ugent.be> Sofie Van Gassen <Sofie.VanGassen@intec.ugent.be> Tom Dhaene <Tom.Dhaene@intec.ugent.be> Yvan Saeys <Yvan.Saeys@irc.vib-ugent.be>

September 16, 2014

SLIDE 2

complex aggregates over subsets of elements

Introduction Complex aggregates Results

Flow Cytometry

Measurement of cell properties in a fluidic system

Celine Vens et al., September 16, 2014 (2/11)

SLIDE 3

Flow Cytometry

Hundreds of thousands of cells are measured for each patient ⇒ Relational data

Patient table (primary): Patient ID | Class | Clinical information

Cell table (secondary): Cell ID | Cell measurements

The primary (patient) table has a one-to-many relationship with the secondary (cell) table.
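A minimal sketch of this layout using Python's built-in `sqlite3` module. The table and column names (`patient`, `cell`, `patient_id`, `class_label`, `cd3`) are hypothetical, and storing the one-to-many link as a `patient_id` foreign key in the cell table is an assumption; the slide only names the columns listed above.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not taken from the original study.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        patient_id  INTEGER PRIMARY KEY,
        class_label TEXT,            -- class to predict (e.g. diagnosis)
        clinical    TEXT             -- clinical information
    );
    CREATE TABLE cell (
        cell_id     INTEGER PRIMARY KEY,
        patient_id  INTEGER REFERENCES patient(patient_id),  -- many cells per patient
        cd3         REAL             -- one measured cell property
    );
""")
conn.executemany("INSERT INTO patient VALUES (?, ?, ?)",
                 [(1, "healthy", ""), (2, "disease", "")])
conn.executemany("INSERT INTO cell (patient_id, cd3) VALUES (?, ?)",
                 [(1, 0.5), (1, 1.2), (1, 0.8), (2, 4.1), (2, 3.9)])

# Follow the one-to-many link: how many cell rows belong to each patient row?
cells_per_patient = dict(conn.execute(
    "SELECT p.patient_id, COUNT(*) FROM patient AS p "
    "JOIN cell AS c ON c.patient_id = p.patient_id "
    "GROUP BY p.patient_id"
).fetchall())
print(cells_per_patient)
```

In real flow cytometry data each patient row would link to hundreds of thousands of cell rows, with one column per measured marker.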

SLIDE 4

How to diagnose patients?

Cell measurements: Cell ID | CD3

Based on individual cells

  • E.g. Does the patient have a cell with a CD3 value larger than x?
  • Too specific
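To make this first kind of test concrete, a minimal Python sketch; `has_cell_above` and the CD3 values are invented for illustration. It shows why the test is too specific: a single outlying cell is enough to flip the answer.

```python
def has_cell_above(cd3_values: list[float], x: float) -> bool:
    """Existential test on individual cells: is some cell's CD3 value > x?"""
    return any(v > x for v in cd3_values)

# Made-up CD3 values for two patients.
patient_a = [0.4, 0.6, 0.5, 7.9]   # a single outlying cell
patient_b = [2.0, 2.1, 1.9, 2.2]

print(has_cell_above(patient_a, 5.0))  # True, because of one cell
print(has_cell_above(patient_b, 5.0))  # False
```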

SLIDE 6

How to diagnose patients?

Cell measurements: Cell ID | CD3

Based on aggregates over cells

  • E.g. Is the mean CD3 value of all cells from the patient larger than x?
  • Too general
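A minimal sketch of the plain aggregate test, with a hypothetical function name and made-up data. The two patients below have very different cell populations but the same mean CD3 value, so any threshold on the mean treats them identically, which is why the test is too general.

```python
from statistics import mean

def mean_above(cd3_values: list[float], x: float) -> bool:
    """Aggregate test over all cells: is the mean CD3 value > x?"""
    return mean(cd3_values) > x

# Made-up data: two different cell populations with the same mean (3.0).
patient_a = [0.0, 0.0, 6.0, 6.0]   # two distinct groups of cells
patient_b = [3.0, 3.0, 3.0, 3.0]   # one homogeneous group

print(mean_above(patient_a, 2.5), mean_above(patient_b, 2.5))  # True True
```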

SLIDE 8

How to diagnose patients?

Cell measurements: Cell ID | CD3

Based on complex aggregates of cells

  • E.g. Is the number of cells with a CD3 value larger than x itself larger than y?
  • Good compromise, but ...
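A sketch of this complex aggregate in Python (hypothetical function name, made-up data): a condition first selects a subset of cells, and the subset is then aggregated with a count. A single outlying cell no longer suffices, and two populations with the same mean CD3 value are still told apart.

```python
def count_above(cd3_values: list[float], x: float, y: int) -> bool:
    """Complex aggregate: is the number of cells with CD3 > x larger than y?"""
    n_selected = sum(1 for v in cd3_values if v > x)
    return n_selected > y

# Made-up data, tested with x = 5.0 and y = 1.
outlier_only = [0.4, 0.6, 0.5, 7.9]   # one high cell: not enough
two_groups = [0.0, 0.0, 6.0, 6.0]     # a real subpopulation of high cells
homogeneous = [3.0, 3.0, 3.0, 3.0]    # same mean as two_groups, no high cells

for cells in (outlier_only, two_groups, homogeneous):
    print(count_above(cells, 5.0, 1))  # False, True, False
```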

SLIDE 10

Problems with traditional complex aggregates

[Figure: example data plots (axes from −5 to 10)]

SLIDE 11

Complex aggregates over subsets

Complex aggregate over clusters

We present a new type of complex aggregate, in which a subset is defined as a cluster of the set.

Advantages

  • Clusters can define more specific subsets than conditions on individual attribute values
  • Clusters can be aggregated in more advanced ways, capturing information about the shape of the cluster
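A minimal sketch of shape-aware cluster aggregation in Python. It assumes each cell is described by two markers and that cluster membership is already known; `cluster_aggregates` and the chosen statistics (size, centroid, variances, covariance) are illustrative, not necessarily the aggregates used in this work. The two made-up clusters have the same size, centroid and variances; only the covariance, which reflects their orientation, tells them apart.

```python
from statistics import mean

def cluster_aggregates(points: list[tuple[float, float]]) -> dict:
    """Summarize a 2-D cluster of cells (two markers per cell, assumed).

    Size and centroid describe where the cluster is; the sample variances
    and covariance describe its spread and orientation, i.e. its shape.
    Requires at least two points.
    """
    n = len(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    mx, my = mean(xs), mean(ys)
    var_x = sum((x - mx) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - my) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return {"size": n, "centroid": (mx, my), "var": (var_x, var_y), "cov": cov_xy}

rising = [(0, 0), (1, 1), (2, 2), (3, 3)]    # elongated along the diagonal
falling = [(0, 3), (1, 2), (2, 1), (3, 0)]   # elongated along the anti-diagonal

print(cluster_aggregates(rising))
print(cluster_aggregates(falling))
```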

SLIDE 12

Partitional (flat) clustering

  • Cluster the data with a clustering algorithm of choice
      • One cluster structure for all patients, or
      • Independent clustering for each patient
  • Pre-compute aggregate functions on the obtained clusters
  • Add this new information to the relational database
  • Possibility to remove the original data, to speed up the relational learning
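A minimal Python sketch of this pipeline for the per-patient option, under several assumptions: each cell is described by a single CD3 value, plain 1-D k-means stands in for "a clustering algorithm of choice", and the only aggregates are cluster size and mean. The table `cluster_agg` and both helper names are hypothetical. The final query shows a rule testing a cluster-level condition without reading the original cell rows.

```python
import sqlite3
from statistics import mean

def kmeans_1d(values: list[float], k: int, iters: int = 20) -> list[list[float]]:
    """Lloyd's k-means on a single marker (stand-in for any clustering method)."""
    s = sorted(values)
    # Deterministic start: k evenly spaced order statistics as initial centers.
    centers = [s[i * (len(s) - 1) // max(k - 1, 1)] for i in range(k)]
    groups: list[list[float]] = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Recompute centers; keep the old center if a cluster is empty.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return groups

def precompute_cluster_table(conn, cells_by_patient: dict[int, list[float]], k: int) -> None:
    """Cluster each patient's cells independently and store per-cluster aggregates."""
    conn.execute("CREATE TABLE cluster_agg ("
                 "patient_id INTEGER, cluster_id INTEGER, size INTEGER, mean_cd3 REAL)")
    for pid, values in cells_by_patient.items():
        for cid, group in enumerate(kmeans_1d(values, k)):
            if group:  # skip clusters that ended up empty
                conn.execute("INSERT INTO cluster_agg VALUES (?, ?, ?, ?)",
                             (pid, cid, len(group), mean(group)))

conn = sqlite3.connect(":memory:")
cells = {1: [0.1, 0.2, 0.3, 5.0, 5.2], 2: [1.0, 1.1, 0.9, 1.2]}
precompute_cluster_table(conn, cells, k=2)

# Rule on the new table: does the patient have a cluster with mean CD3 > 4?
hits = [pid for (pid,) in conn.execute(
    "SELECT DISTINCT patient_id FROM cluster_agg WHERE mean_cd3 > 4")]
print(hits)
```

The alternative option on the slide, one cluster structure shared by all patients, would cluster the pooled cells once and then compute the same aggregates per patient and cluster.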

SLIDE 13

Online hierarchical clustering

  • Cluster the data while the learning algorithm runs
  • Clustering takes place “on demand” to refine or generalize a rule
  • E.g. “Is the mean CD3 value of all cells from the patient larger than x?” ⇒
      • Perform a hierarchical clustering step
      • Is the mean CD3 value of one of the resulting clusters larger than x?
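A minimal sketch of one on-demand refinement step. It assumes a single marker and uses a split at the widest gap between sorted values as a simple stand-in for a hierarchical (divisive) clustering step; the slides do not specify the clustering method, and both function names are hypothetical. With the made-up values below, the rule over all cells fails, while the refined rule holds because one of the two resulting clusters has a high mean.

```python
from statistics import mean

def split_at_widest_gap(values: list[float]) -> list[list[float]]:
    """One divisive clustering step on a single marker: cut at the widest gap."""
    s = sorted(values)
    if len(s) < 2:
        return [s]
    gaps = [s[i + 1] - s[i] for i in range(len(s) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return [s[:cut], s[cut:]]

def refined_mean_test(values: list[float], x: float) -> bool:
    """Refinement of "mean CD3 of all cells > x": after one clustering step,
    does one of the resulting clusters have mean CD3 > x?"""
    return any(mean(cluster) > x for cluster in split_at_widest_gap(values))

cells = [0.5, 0.7, 0.6, 8.0, 8.4]            # made-up CD3 values
print(mean(cells) > 5.0)                     # False: overall mean is about 3.64
print(split_at_widest_gap(cells))            # [[0.5, 0.6, 0.7], [8.0, 8.4]]
print(refined_mean_test(cells, 5.0))         # True
```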

SLIDE 14

Online hierarchical clustering

  • The many opportunities to split or merge clusters result in a larger search space

  • Original data needs to be stored
  • In this work we focus on the partitional clustering approach

SLIDE 15

Synthetic dataset 1

[Figure: plots of synthetic dataset 1 (axes from −5 to 10)]

Mean accuracy (Tilde decision tree, 5-fold cross-validation)

SLIDE 16

HIV Vaccine Trials Network

  • Dataset from the FlowCAP II challenge
  • 48 patients (27 for training, 21 for testing)
  • Two samples from each patient, challenged with different antigens
  • Goal: automatically detect which antigen has been used

SLIDE 17

HIV Vaccine Trials Network

  • Clustering with the flowMeans algorithm
  • Bagging ensemble of 100 Tilde trees
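Bagging trains each model on a bootstrap resample of the training set and predicts by majority vote. Tilde learns first-order decision trees over the relational data; the sketch below only illustrates the bagging scheme and substitutes a hypothetical one-threshold stump on a single numeric feature per sample (for instance a precomputed cluster aggregate) for a Tilde tree. The default of 100 models mirrors the slide; the names and data are made up.

```python
import random
from collections import Counter
from typing import Callable

Predictor = Callable[[float], int]

def train_stump(X: list[float], y: list[int]) -> Predictor:
    """Stand-in base learner: rule "x > t => class 1", t minimizing training errors."""
    def errors(t: float) -> int:
        return sum((x > t) != bool(label) for x, label in zip(X, y))
    t = min(sorted(set(X)), key=errors)
    return lambda x: int(x > t)

def bagging(X: list[float], y: list[int], train: Callable[..., Predictor],
            n_models: int = 100, seed: int = 0) -> Predictor:
    """Train n_models learners on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(train([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x: float) -> int:
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Made-up training data: one numeric feature per sample and a binary label.
X = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
ensemble = bagging(X, y, train_stump)
print(ensemble(1), ensemble(13))  # 0 1
```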

SLIDE 18

HIV Vaccine Trials Network

  • We obtain a test accuracy of 0.95
  • In line with other FlowCAP results, but several state-of-the-art algorithms obtain perfect accuracy

  • Further optimization might improve our results

SLIDE 19

Conclusion

Our contributions are twofold:

  • We introduced flow cytometry data to the ILP community
  • We introduced a new kind of complex aggregate: cluster aggregates

SLIDE 20

Thank You!

Any questions?
