A method for similarity-based grouping of biological data


  1. A method for similarity-based grouping of biological data. Vaida Jakonienė, David Rundqvist, Patrick Lambrix

  2. Outline
     - Environments for supporting grouping algorithms needed
     - Method for similarity-based grouping
     - Test cases
     - Summary and future work

     V. Jakonienė, D. Rundqvist, P. Lambrix. Linköpings universitet, Sweden

  3. Tools for biological data analysis: hierarchical microarray clustering (J-Express Pro); classification of abstracts

  4. Tools for biological data analysis. Other applications of grouping:
     - structuring search results
     - data cleaning
     - data integration

  5. Similarity of biological data
     - Similarity between data entries, e.g. sequence alignment (BLAST)
     - Lord PW, Stevens RD, Brass A, Goble CA. Bioinformatics, 19(10):1275-83, 2003.
     - Basic task: computation of a similarity value between objects

  6. Similarity-based grouping
     - Similarity-based grouping of biological data is needed
     - Not a trivial task
       - a number of aspects influence the result
       - the data is complex
       - a variety of grouping algorithms is available: which method performs best for which grouping task?
       - existing grouping algorithms may not be straightforward to apply

  7. Similarity-based grouping
     - Environments that support comparison and evaluation of different grouping strategies are needed

  8. Method for similarity-based grouping (diagram)
     - Components: data source; grouping attributes; library of similarity functions (domain-independent and domain-dependent); other knowledge; library of classifications
     - Steps: specification of grouping rules, pairwise grouping, grouping, evaluation, analysis

  9. Kitega: A toolKit for Evaluating Grouping Algorithms

  10. Test cases
     - Grouping task: grouping of proteins with respect to
       - biological function
       - the class of isozymes they belong to
     - Data source
       - human proteins involved in glycolysis
       - 190 data entries retrieved via Entrez

  11. Test cases. Data entry: Entrez Protein database

  12. Test cases. Data entry

  13. Test cases. Data entry: GOann, Sequence

  14. Test cases. Data sources and mappings
     - Only terms of the GO function ontology analyzed; only data entries having GO terms
     - Mappings between data values and ontological terms (GO Consortium):
       - ec2go: ec_numbers translated into GO terms
       - spkw2go: swissprot keywords translated into GO terms
     - DS1: GOann; 67 data entries
     - DS2: GOcomb, combining Keywords (via spkw2go), Ec_number (via ec2go) and GOann; 93 data entries
     - DS3: GOcomb, combining Ec_number (via ec2go) and GOann; 92 data entries

  15. Test cases. Other components
     - Library of similarity functions
       - EditDist(v1, v2)
       - SeqSim(v1, v2)
       - SemSim(v1, v2)
     - Other knowledge
       - GO ontology
     - Classifications: manual classification according to
       - biological function
       - classes of isozymes
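
As an illustration of what a string-based similarity function such as EditDist(v1, v2) might compute, here is a minimal Python sketch. The slides do not define EditDist, so the choice of Levenshtein distance, the normalization to the range [0, 1], and the names `edit_distance` and `edit_dist_sim` are all assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_dist_sim(a: str, b: str) -> float:
    """Hypothetical normalization: 1.0 for identical strings,
    0.0 when every position has to change."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - edit_distance(a, b) / longest
```

A similarity value in [0, 1] makes it easy to compare against a threshold in the pairwise grouping step.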

  16. Method. Specification of grouping rules (example shown for DS3)

  17. Method. Pairwise grouping: all pairs of data entries are compared (example shown for DS3)
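
The pairwise grouping step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `pairwise_grouping`, the use of a single threshold on one similarity function, and the toy Jaccard similarity over sets of terms are hypothetical and not taken from the slides or from Kitega.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Toy similarity over sets of terms (an assumption for this sketch)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_grouping(entries: dict, sim, threshold: float) -> list:
    """Compare all pairs of data entries and keep the pairs whose
    similarity value reaches the threshold."""
    similar = []
    for id_a, id_b in combinations(sorted(entries), 2):
        if sim(entries[id_a], entries[id_b]) >= threshold:
            similar.append((id_a, id_b))
    return similar


# Usage with made-up entries and made-up term sets
entries = {
    "p1": {"x", "y"},
    "p2": {"x", "y", "z"},
    "p3": {"w"},
}
similar_pairs = pairwise_grouping(entries, jaccard, threshold=0.5)
```

The resulting list of similar pairs is the input to the grouping step.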

  18. Method. Grouping
     - ConnectedComponents: data entries in a group are directly or transitively similar to each other
     - Cliques: all data entries in a group are similar to each other
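
The difference between the two grouping strategies can be sketched in Python by treating the similar pairs as edges of an undirected graph. The function names are hypothetical, and enumerating maximal cliques with the basic Bron-Kerbosch procedure is an assumption of this sketch; the slides do not say how Cliques is implemented.

```python
def build_graph(nodes, pairs):
    """Adjacency sets for an undirected similarity graph."""
    adj = {n: set() for n in nodes}
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def connected_components(adj):
    """Groups whose members are directly or transitively similar."""
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # depth-first traversal from start
            n = stack.pop()
            comp.add(n)
            for m in adj[n] - seen:
                seen.add(m)
                stack.append(m)
        groups.append(comp)
    return groups


def maximal_cliques(adj):
    """Groups in which every member is similar to every other member
    (basic Bron-Kerbosch, no pivoting). Groups may overlap."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(set(r))  # r cannot be extended any further
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# a~b and b~c are similar, a~c is not; d is similar to nothing
adj = build_graph(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])
components = connected_components(adj)
cliques = maximal_cliques(adj)
```

In this toy graph ConnectedComponents puts a, b and c in one group because a and c are transitively similar through b, while Cliques splits them into the overlapping groups {a, b} and {b, c}.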

  19. Method. Grouping

  20. Method. Evaluation. Types of quality measures:
     - internal: based on information obtained during the grouping
     - external: with respect to known classes of the grouped data

  21. Method. Analysis: true positives, false positives, false negatives
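
One common way to obtain true positives, false positives and false negatives for a grouping is to count pairs of entries against a reference classification. The Python sketch below uses that pair-counting convention as an assumption; the slide only names the three categories, and the function names are hypothetical.

```python
from itertools import combinations


def co_grouped_pairs(groups):
    """All unordered pairs of entries that share at least one group."""
    pairs = set()
    for group in groups:
        pairs.update(combinations(sorted(group), 2))
    return pairs


def pair_counts(groups, classes):
    """Pair-based counts of a grouping against a reference classification.

    tp: pairs grouped together that also share a class
    fp: pairs grouped together that do not share a class
    fn: pairs sharing a class that were not grouped together
    """
    predicted = co_grouped_pairs(groups)
    expected = co_grouped_pairs(classes)
    return {
        "tp": len(predicted & expected),
        "fp": len(predicted - expected),
        "fn": len(expected - predicted),
    }


# Made-up grouping result and made-up manual classification
groups = [{"a", "b", "c"}, {"d"}]
classes = [{"a", "b"}, {"c", "d"}]
counts = pair_counts(groups, classes)
```

Because the counts are taken over sets of pairs, the same function also works for overlapping groups such as those produced by a clique-based strategy.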

  22. Method. Analysis

  23. Method. Analysis
     - Studied aspects, e.g. use of different data sources, grouping algorithms, and classifications; grouping on different attributes; impact of threshold

  24. Test cases. Observations
     - Best-suited grouping approaches for data source Glyc-Funct-AnnEc-onlyGO (DS3)
       - SemSim(GOcomb) for grouping on biological function
       - SeqSim(Sequence) for grouping on classes of isozymes
     - Suitability of mappings for the grouping approaches used
       - spkw2go: too general, e.g. ’Glycolysis’
       - ec2go: specific enough, e.g. ’6-phosphofructokinase activity’

  25. Summary and future work
     - Motivated the need for environments that support the development and evaluation of similarity-based grouping procedures
     - Proposed a method that identifies the main components and steps that are important for such environments
     - Illustrated the grouping method with test cases based on different strategies and classifications
     - Future work: extend the Kitega implementation
