Complex Aggregates over Subsets of Elements

  1. Complex Aggregates over Subsets of Elements
     Celine Vens <Celine.Vens@irc.vib-ugent.be>, Sofie Van Gassen <Sofie.VanGassen@intec.ugent.be>, Tom Dhaene <Tom.Dhaene@intec.ugent.be>, Yvan Saeys <Yvan.Saeys@irc.vib-ugent.be>
     September 16, 2014

  2. Flow Cytometry
     Measurement of cell properties in a fluidic system
     (Outline: Introduction, Complex aggregates, Results)
     Celine Vens et al., September 16, 2014

  3. Flow Cytometry
     Hundreds of thousands of cells are measured for each patient ⇒ relational data
     • Primary table: Patient ID, Class, Clinical information
     • Secondary table: Cell ID, Cell measurements
     • The primary table has a one-to-many relationship with the secondary table
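
The one-to-many structure described on this slide can be sketched in a few lines of Python. This is only an illustration: the class names, the `label` values, the presence of a `patient_id` foreign key on each cell, and the toy CD3 numbers are all assumptions, not taken from the talk.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Patient:
    """Primary table: one row per patient."""
    patient_id: str
    label: str  # class to predict (hypothetical values below)


@dataclass
class Cell:
    """Secondary table: many rows per patient."""
    cell_id: int
    patient_id: str  # assumed foreign key into the primary table
    cd3: float       # one measured marker; real data has several


patients = [Patient("p1", "pos"), Patient("p2", "neg")]
cells = [
    Cell(1, "p1", 7.2), Cell(2, "p1", 0.4), Cell(3, "p1", 6.9),
    Cell(4, "p2", 0.3), Cell(5, "p2", 1.1),
]


def cells_by_patient(cells):
    """Group the secondary table by patient, i.e. follow the one-to-many link."""
    grouped = defaultdict(list)
    for cell in cells:
        grouped[cell.patient_id].append(cell)
    return dict(grouped)


grouped = cells_by_patient(cells)
```

Each patient then maps to a set of cells of arbitrary size, which is why a learner needs some way to summarize that set before it can test a condition on the patient.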

  4. How to diagnose patients?
     [Table: Cell measurements, with columns Cell ID and CD3]
     Based on individual cells
     • E.g. Does the patient have a cell with a CD3 value larger than x?

  5. How to diagnose patients?
     [Table: Cell measurements, with columns Cell ID and CD3]
     Based on individual cells
     • E.g. Does the patient have a cell with a CD3 value larger than x?
     • Too specific

  6. How to diagnose patients?
     [Table: Cell measurements, with columns Cell ID and CD3]
     Based on aggregates over cells
     • E.g. Is the mean CD3 value of all cells from the patient larger than x?

  7. How to diagnose patients?
     [Table: Cell measurements, with columns Cell ID and CD3]
     Based on aggregates over cells
     • E.g. Is the mean CD3 value of all cells from the patient larger than x?
     • Too general

  8. How to diagnose patients?
     [Table: Cell measurements, with columns Cell ID and CD3]
     Based on complex aggregates of cells
     • E.g. Is the number of cells with a CD3 value larger than x larger than y?

  9. How to diagnose patients?
     [Table: Cell measurements, with columns Cell ID and CD3]
     Based on complex aggregates of cells
     • E.g. Is the number of cells with a CD3 value larger than x larger than y?
     • Good compromise, but ...
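
The three kinds of test from the "How to diagnose patients?" slides can be compared side by side. The sketch below is a minimal illustration; the function names, the thresholds, and the two toy patients are assumptions chosen to make the contrast visible.

```python
from statistics import mean


def has_cell_above(cd3_values, x):
    """Individual-cell test: is there any cell with CD3 > x?"""
    return any(v > x for v in cd3_values)


def mean_above(cd3_values, x):
    """Simple aggregate: is the mean CD3 over all cells > x?"""
    return mean(cd3_values) > x


def count_above_exceeds(cd3_values, x, y):
    """Complex aggregate: is the number of cells with CD3 > x larger than y?"""
    return sum(1 for v in cd3_values if v > x) > y


# Hypothetical patients: A has one isolated high cell, B has a group of them.
patient_a = [0.2, 0.5, 0.3, 6.1]
patient_b = [5.8, 6.3, 6.0, 0.4, 0.1]

for name, values in (("A", patient_a), ("B", patient_b)):
    print(
        name,
        has_cell_above(values, 5),
        mean_above(values, 5),
        count_above_exceeds(values, 5, 2),
    )
```

With x = 5, the individual-cell test is true for both patients and the mean test is false for both, so neither separates them; only the complex aggregate (more than 2 cells above 5) distinguishes patient B from patient A.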

  10. Problems with traditional complex aggregates
      [Figure: grid of plot panels (4 rows of 5), both axes ticked from −5 to 10]

  11. Complex aggregates over subsets
      Complex aggregate over clusters: We present a new type of complex aggregate, where a subset is defined as a cluster of the set.
      Advantages
      • Clusters can define more specific subsets than conditions on individual attribute values
      • Clusters can be aggregated in more advanced ways, capturing information about the shape of the cluster

  12. Partitional (flat) clustering
      • Cluster the data with a clustering algorithm of choice
        • One cluster structure for all patients, or
        • Independent clustering for each patient
      • Pre-compute aggregate functions on the obtained clusters
      • Add this new information to the relational database
      • Possibility to remove the original data, to speed up the relational learning
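
The partitional pipeline on this slide (cluster, pre-compute aggregates per cluster, store them per patient) might look roughly as follows. This is a sketch under stated assumptions: it uses a plain 1-D k-means as the "clustering algorithm of choice", takes the "one cluster structure for all patients" option, and picks count and mean as the aggregates. The function names and toy data are hypothetical.

```python
from statistics import mean


def nearest(value, centroids):
    """Index of the centroid closest to value."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))


def kmeans_1d(values, k=2, iterations=20):
    """Plain Lloyd-style k-means on 1-D data (requires k >= 2).

    Centroids start evenly spaced over the value range, so the result
    is deterministic.
    """
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[nearest(v, centroids)].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids


def cluster_aggregates(values_by_patient, centroids):
    """Pre-compute per-patient, per-cluster aggregates (count and mean)."""
    table = {}
    for patient_id, values in values_by_patient.items():
        row = {}
        for i in range(len(centroids)):
            members = [v for v in values if nearest(v, centroids) == i]
            row[f"cluster{i}_count"] = len(members)
            row[f"cluster{i}_mean"] = mean(members) if members else None
        table[patient_id] = row
    return table


# Hypothetical CD3 values per patient.
cd3 = {"p1": [0.2, 0.5, 0.3, 6.1], "p2": [5.8, 6.3, 6.0, 0.4, 0.1]}

# One cluster structure for all patients: fit on the pooled cells.
pooled = [v for values in cd3.values() for v in values]
centroids = kmeans_1d(pooled, k=2)
features = cluster_aggregates(cd3, centroids)
```

The resulting `features` table has one fixed-width row per patient, which is what makes it possible to drop the original cell-level data before relational learning, as the last bullet suggests.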

  13. Online hierarchical clustering
      • Cluster the data during the learning algorithm
      • Clustering takes place "on demand" to refine or generalize a rule
      • E.g. "Is the mean CD3 value of all cells from the patient larger than x?" ⇒
        • Perform a hierarchical clustering step
        • Is the mean CD3 value of one of the resulting clusters larger than x?
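
The refinement example on this slide can be illustrated with a single divisive step. The sketch below is not the authors' procedure: it assumes 1-D data and uses "split at the widest gap" as a stand-in for one hierarchical clustering step. All names and values are hypothetical.

```python
from statistics import mean


def split_at_widest_gap(values):
    """One divisive step: split sorted 1-D values at the largest gap.

    Assumes at least one value; a single value is returned as one cluster.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        return [ordered]
    gaps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return [ordered[:cut], ordered[cut:]]


def mean_above(values, x):
    """Original rule: is the mean over all cells larger than x?"""
    return mean(values) > x


def refined_mean_above(values, x):
    """Refined rule: is the mean of one of the resulting clusters larger than x?"""
    return any(mean_above(cluster, x) for cluster in split_at_widest_gap(values))


patient = [5.8, 6.3, 6.0, 0.4, 0.1]  # hypothetical CD3 values
clusters = split_at_widest_gap(patient)
```

For this patient the original rule with x = 5 fails, because the low cells pull the overall mean down, while the refined rule succeeds on the cluster of high cells.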

  14. Online hierarchical clustering
      • Many opportunities to split or merge will result in a larger search space
      • Original data needs to be stored
      • In this work we focus on the partitional clustering approach

  15. Synthetic dataset (1)
      Mean accuracy
      [Figure: grid of plot panels (4 rows of 5), both axes ticked from −5 to 10]
      (1) Tilde decision tree, 5-fold cross-validation

  16. HIV Vaccine Trials Network
      • Dataset from the FlowCAP II challenge
      • 48 patients (27 for training, 21 for testing)
      • Two samples from each patient, challenged with different antigens
      • Goal: automatically detect which antigen has been used

  17. HIV Vaccine Trials Network
      • Clustering with the FlowMeans algorithm
      • Bagging ensemble of 100 Tilde trees
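
Bagging itself is independent of the base learner, so the idea can be sketched without Tilde. In the sketch below a one-feature threshold stump stands in for a Tilde tree; that substitution, the labels "A" and "B" (standing for the two antigens), the single feature, and the toy training data are all assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(samples):
    """Stand-in base learner: the best single-threshold rule on one feature."""
    best = None
    for threshold, _ in samples:
        for high_label, low_label in (("A", "B"), ("B", "A")):
            errors = sum(
                1
                for value, label in samples
                if (high_label if value > threshold else low_label) != label
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, high_label, low_label)
    _, threshold, high_label, low_label = best
    return lambda value: high_label if value > threshold else low_label


def bagging(samples, n_models=100, seed=0):
    """Train n_models base learners, each on a bootstrap resample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]  # with replacement
        models.append(train_stump(bootstrap))
    return models


def predict(models, value):
    """Majority vote over the ensemble."""
    votes = Counter(model(value) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (cluster-aggregate feature, antigen label).
train = [(0.1, "A"), (0.2, "A"), (0.3, "A"), (0.7, "B"), (0.8, "B"), (0.9, "B")]
ensemble = bagging(train)
```

Calling `predict(ensemble, 0.85)` returns the majority label among the 100 stumps; individual stumps trained on unlucky resamples can be wrong, which is the variance that the vote is meant to average out.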

  18. HIV Vaccine Trials Network
      • We obtain a test accuracy of 0.95
      • In line with other FlowCAP results, but several state-of-the-art algorithms obtain perfect accuracy
      • Further optimization might improve our results

  19. Conclusion
      Our contributions are twofold:
      • We introduced flow cytometry data to the ILP community
      • We introduced a new kind of complex aggregates: cluster aggregates

  20. Thank You! Any questions?
