
Improved Gene Ontology Annotation Predictions through Bayesian Network Post-processing Marco Tagliasacchi, Marco Masseroli Dipartimento di Elettronica e Informazione, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milano, Italy


  2. Improved GO Annotation Predictions through Bayesian Network Post-processing
     Summary
     • Motivation
     • Related work
     • Problem statement and goal
     • SVD method
     • Bayesian network method
     • Evaluation results
     • Conclusions
     BITS 2009, Genova, 18-20 March 2009

  3. Motivation
     Several controlled vocabularies and ontologies are available and used to functionally annotate genes and proteins:
     • Gene Ontology is the most widely used, covering
       – Biological processes
       – Molecular functions
       – Cellular components
     Controlled annotations are paramount to:
     • Support the biological interpretation of experimental results
     • Derive new biomedical knowledge

  4. Motivation
     Annotation issues:
     • Not exhaustive
       – Only a subset of the genes and proteins of sequenced organisms is known and annotated
     • Incomplete annotations
       – Biological knowledge is yet to be discovered
     • Incorrect annotations
       – Possibly those inferred from electronic annotation
     • Only a few reliable annotations
       – Produced by time-consuming human curation
     Computational methods are extremely useful to:
     • Reliably predict annotations
     • Provide prioritized lists of predicted annotations to be checked by curators

  5. Related work
     Prediction of annotation profiles has been addressed in the literature:
     • Methods based on existing annotations:
       – Decision trees/Bayesian networks [King et al., 2003]
       – Singular value decomposition (SVD) [Khatri et al., 2005]
       – k-NN classifiers [Tao et al., 2007]
       – ...
     • Methods based on other information sources:
       – Microarray data [Barutcuoglu et al., 2006]
       – Mined textual information [Raychaudhuri et al., 2002], [Perez et al., 2004]
       – ...
     • For a survey: Pandey et al., "Computational approaches for protein function prediction: A survey" (2006)

  6. Problem statement and goal
     Propose a post-processing method to be applied to the output of the SVD method [Khatri et al., 2005]
     Fix the issue of anomalous predictions of ontological annotations:
     • A gene might be predicted as annotated to an ontology term, but not to one of its ancestors
     [Figure: output score of the SVD method on a GO subgraph (GO:0003674 Molecular function, GO:0005215 Transporter activity, GO:0022857 Transmembrane transporter activity, GO:0022804 Active transmembrane transporter activity, GO:0015291 Secondary active transmembrane transporter activity, GO:0022891 Substrate-specific transmembrane transporter activity, GO:0015075 Ion transmembrane transporter activity, GO:0008509 Anion transmembrane transporter activity), with an anomalous prediction highlighted]

  7. Proposed solution
     Leverage the semantic relationships between ontological terms, as expressed by the ontology structure
     Construct a Bayesian network based on the ontology topology and use the output of SVD as prior evidence
     Produce corrected, anomaly-free annotation profiles
     [Figure: output score of the proposed method on the same eight GO terms as in slide 6]
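To see why a network built on the ontology cannot reproduce the anomaly, consider a toy two-term network: child term c, parent term r, edges t_c -> t_r, t_c -> e_c, t_r -> e_r. The sketch below is not the authors' implementation. It assumes a hypothetical true-path CPT, p(t_r = 1 | t_c = 1) = 1, and one Gaussian score likelihood per node state. The function `posteriors` and all of its default parameter values are made up for illustration. It computes exact posteriors by enumeration.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def posteriors(e_child, e_parent, prior_child=0.3, q=0.2,
               mean_on=0.8, mean_off=0.1, std=0.25):
    """Exact posteriors P(t_c = 1 | e) and P(t_r = 1 | e) in the toy network
    t_c -> t_r, t_c -> e_c, t_r -> e_r. All parameters are hypothetical."""
    def likelihood(e, t):
        # Assumed p(e | t): a single Gaussian per state of the term node.
        return gaussian_pdf(e, mean_on if t else mean_off, std)

    joint = {}
    for t_c in (0, 1):
        for t_r in (0, 1):
            p_c = prior_child if t_c else 1.0 - prior_child
            if t_c == 1:
                # Assumed true-path CPT: an annotated child forces the parent.
                p_r = 1.0 if t_r == 1 else 0.0
            else:
                p_r = q if t_r == 1 else 1.0 - q
            joint[(t_c, t_r)] = (p_c * p_r
                                 * likelihood(e_child, t_c)
                                 * likelihood(e_parent, t_r))
    z = sum(joint.values())
    post_child = (joint[(1, 0)] + joint[(1, 1)]) / z
    post_parent = (joint[(0, 1)] + joint[(1, 1)]) / z
    return post_child, post_parent


# Anomalous SVD-like scores: the child scores higher than its parent.
child_post, parent_post = posteriors(e_child=0.7, e_parent=0.3)
print(f"P(t_c=1 | e) = {child_post:.3f}, P(t_r=1 | e) = {parent_post:.3f}")
```

Every joint state with t_c = 1 also has t_r = 1. The parent's posterior therefore sums a superset of the child's terms and can never fall below it, whatever the input scores.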

  8. SVD method
     1. Input: available direct annotations, as a binary matrix A with one row per gene and one column per ontological term (e.g. GO terms), where A(i, j) = 1 if gene i is directly annotated to term j:
     $$
     A = \begin{bmatrix}
     0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots & 0 \\
     0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & \cdots & 0 \\
     0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 & \cdots & 0 \\
     1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots & 0 \\
     \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
     1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0
     \end{bmatrix}
     $$
     [Figure: the same GO subgraph as in slide 6]
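Step 1 can be sketched as follows, assuming the direct annotations arrive as (gene, term) pairs. The helper `annotation_matrix` and the gene names are hypothetical, and plain nested lists stand in for a matrix type.

```python
def annotation_matrix(pairs, genes, terms):
    """Binary genes x terms matrix A with A[i][j] = 1 iff gene i is directly
    annotated to term j."""
    row = {g: i for i, g in enumerate(genes)}
    col = {t: j for j, t in enumerate(terms)}
    A = [[0] * len(terms) for _ in genes]
    for gene, term in pairs:
        A[row[gene]][col[term]] = 1
    return A


# Hypothetical direct annotations for two genes.
genes = ["geneA", "geneB"]
terms = ["GO:0005215", "GO:0022857", "GO:0015075"]
pairs = [("geneA", "GO:0015075"), ("geneB", "GO:0005215")]
A = annotation_matrix(pairs, genes, terms)
print(A)
```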

  9. SVD method
     2. Annotation unfolding: every direct annotation of a gene to a term is propagated to all the ancestors of that term, giving the unfolded matrix Ã:
     $$
     A = \begin{bmatrix}
     0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & \cdots & 0 \\
     0 & 1 & 0 & 0 & 0 & 1 & 1 & 0 & \cdots & 0 \\
     1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots & 0 \\
     \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
     1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & \cdots & 0
     \end{bmatrix}
     \quad\longrightarrow\quad
     \tilde{A} = \begin{bmatrix}
     0 & 1 & 1 & 0 & 0 & 0 & 1 & 1 & \cdots & 0 \\
     0 & 1 & 1 & 0 & 0 & 1 & 1 & 1 & \cdots & 0 \\
     1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \cdots & 0 \\
     \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
     1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & \cdots & 0
     \end{bmatrix}
     $$
     [Figure: the same GO subgraph as in slide 6]
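Unfolding is an ancestor closure applied row by row. The sketch below uses a hypothetical four-term ontology given as a {term: [parent terms]} map. It is not the real GO graph, and the helpers `ancestors` and `unfold` are illustrative names.

```python
def ancestors(term, parents):
    """All ancestors of `term` in an ontology DAG given as {term: [parents]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def unfold(A, terms, parents):
    """Return a copy of A in which every ancestor of each directly annotated
    term is also set to 1 (A itself is not modified)."""
    col = {t: j for j, t in enumerate(terms)}
    unfolded = [row[:] for row in A]
    for i, row in enumerate(A):
        for j, value in enumerate(row):
            if value:
                for anc in ancestors(terms[j], parents):
                    unfolded[i][col[anc]] = 1
    return unfolded


# Hypothetical ontology: T2 is_a T1 is_a root, and T3 is_a root.
terms = ["root", "T1", "T2", "T3"]
parents = {"T1": ["root"], "T2": ["T1"], "T3": ["root"]}
A = [[0, 0, 1, 0],   # gene 0: directly annotated to T2
     [0, 0, 0, 1]]   # gene 1: directly annotated to T3
A_tilde = unfold(A, terms, parents)
print(A_tilde)
```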

  10. SVD method
     3. Compute the SVD:
     $$\tilde{A} = U \Sigma V^T$$
     4. Compute the reduced-rank approximation, keeping only the k largest singular values in $\Sigma_k$ and the corresponding columns of U and V in $U_k$ and $V_k$:
     $$\tilde{A}_k = U_k \Sigma_k V_k^T$$
     5. Apply a threshold τ:
     • If $\tilde{A}_k(i,j) > \tau$ and $\tilde{A}(i,j) = 0$: predicted new annotation (FP)
     • If $\tilde{A}_k(i,j) > \tau$ and $\tilde{A}(i,j) = 1$: confirmed annotation (TP)
     • If $\tilde{A}_k(i,j) \le \tau$ and $\tilde{A}(i,j) = 0$: confirmed no annotation (TN)
     • If $\tilde{A}_k(i,j) \le \tau$ and $\tilde{A}(i,j) = 1$: annotation to be checked (FN)
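Steps 3 to 5 can be sketched with NumPy's `numpy.linalg.svd`. The toy matrix, k = 1 and τ = 0.45 are arbitrary illustrative choices, not values from the slides. As in step 5, the FP and FN labels are relative to the current annotations: FP marks a predicted new annotation and FN marks an annotation to be checked.

```python
import numpy as np


def truncated_svd(A_tilde, k):
    """Rank-k approximation Ã_k = U_k Σ_k V_k^T of the unfolded matrix."""
    U, s, Vt = np.linalg.svd(A_tilde, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def classify(A_tilde, A_k, tau):
    """Label every (gene, term) cell by comparing Ã_k(i, j) with tau."""
    predicted = A_k > tau
    annotated = A_tilde == 1
    labels = np.full(A_tilde.shape, "", dtype=object)
    labels[predicted & ~annotated] = "FP"   # predicted new annotation
    labels[predicted & annotated] = "TP"    # confirmed annotation
    labels[~predicted & ~annotated] = "TN"  # confirmed no annotation
    labels[~predicted & annotated] = "FN"   # annotation to be checked
    return labels


# Toy unfolded matrix: 3 genes x 2 terms.
A_tilde = np.array([[1, 1], [1, 1], [1, 0]], dtype=float)
A_k = truncated_svd(A_tilde, k=1)
labels = classify(A_tilde, A_k, tau=0.45)
print(np.round(A_k, 3))
print(labels)
```

With k equal to the full rank, Ã_k reproduces Ã and no cell changes class. Lowering k is what lets the method propose new annotations.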

  11. Anomalous predictions
     The output of the SVD method might contain anomalous predictions: its real-valued output might be such that
     $$\tilde{A}_k(i,j) > \tilde{A}_k(i,r)$$
     where r is an ancestor of j
     After thresholding, gene i might then end up annotated to term j, but not to its ancestor r
     [Figure: output score of the SVD method, with an anomalous prediction highlighted]
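Detecting these anomalies only requires comparing each score with the scores of the term's ancestors. The sketch below uses the hypothetical four-term ontology and a made-up row of Ã_k. With no threshold it reports every score inversion. Given τ, it reports only the inversions that survive thresholding, where $\tilde{A}_k(i,j) > \tau \ge \tilde{A}_k(i,r)$. The name `find_anomalies` is illustrative.

```python
def ancestors(term, parents):
    """All ancestors of `term` in an ontology DAG given as {term: [parents]}."""
    seen, stack = set(), list(parents.get(term, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def find_anomalies(scores, terms, parents, tau=None):
    """Return (gene, term, ancestor) triples where the term outscores its
    ancestor; with tau, only those where the term passes tau and the ancestor
    does not."""
    col = {t: j for j, t in enumerate(terms)}
    found = []
    for i, row in enumerate(scores):
        for term in terms:
            s_j = row[col[term]]
            for anc in sorted(ancestors(term, parents)):
                s_r = row[col[anc]]
                anomalous = s_j > s_r if tau is None else s_j > tau >= s_r
                if anomalous:
                    found.append((i, term, anc))
    return found


terms = ["root", "T1", "T2", "T3"]
parents = {"T1": ["root"], "T2": ["T1"], "T3": ["root"]}
scores = [[0.9, 0.4, 0.7, 0.1]]  # hypothetical Ã_k row for one gene
print(find_anomalies(scores, terms, parents))
print(find_anomalies(scores, terms, parents, tau=0.5))
```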

  12. Bayesian network method
     Design a Bayesian network to remove anomalous predictions:
     • Input: real-valued scores computed by the SVD method
     • Output: anomaly-free real-valued scores
     Bayesian network structure based on the ontology topology:
     • Term nodes: $t_j$ for each term j
     • Evidence nodes: $e_j$, attached to $t_j$
     [Figure: network fragment with term node $t_j$ and its evidence node $e_j$, and the term nodes $t_{c_1}, t_{c_2}, \ldots, t_{c_L}$ of the children $c_1, \ldots, c_L$ of term j, each with its evidence node $e_{c_1}, e_{c_2}, \ldots, e_{c_L}$]
     The conditional probabilities need to be defined
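The network structure can be derived mechanically from the ontology. The sketch below reads the term-node probability $p(t_j \mid t_{c_1}, \ldots, t_{c_L})$ as saying that each term node has the term nodes of its ontology children as parents. This edge direction is inferred from that notation, not stated explicitly. The hypothetical ontology and the function `bayesian_network_edges` are illustrative.

```python
def bayesian_network_edges(terms, parents):
    """Directed edges: t[child] -> t[parent term] for every ontology link,
    plus t[j] -> e[j] for every term j."""
    edges = []
    for term in terms:
        for parent in parents.get(term, []):
            # The ontology child's node conditions its parent term's node.
            edges.append((f"t[{term}]", f"t[{parent}]"))
    for term in terms:
        # Each term node generates its own evidence node (the SVD score).
        edges.append((f"t[{term}]", f"e[{term}]"))
    return edges


terms = ["root", "T1", "T2", "T3"]
parents = {"T1": ["root"], "T2": ["T1"], "T3": ["root"]}
for src, dst in bayesian_network_edges(terms, parents):
    print(src, "->", dst)
```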

  13. Bayesian network method
     For each gene i:
     [Figure: the network fragment of slide 12, with term nodes $t_j, t_{c_1}, \ldots, t_{c_L}$ and evidence nodes $e_j, e_{c_1}, \ldots, e_{c_L}$]
     Term node (t-node) conditional probabilities, estimated from available annotations:
     $$p(t_j \mid t_{c_1}, t_{c_2}, \ldots, t_{c_L})$$
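The slide says only that these probabilities are estimated from available annotations. One way to do this is to count genes in the unfolded matrix, grouped by the configuration of the children's annotations. The following choices are assumptions, not taken from the slides: the probability is fixed to 1 whenever any child is annotated (true path rule), and add-α smoothing is applied to the all-children-off case. The function `estimate_cpt` and the `children` map are hypothetical.

```python
from collections import defaultdict


def estimate_cpt(A_tilde, terms, children, term, alpha=1.0):
    """Estimate p(t_term = 1 | states of its children's term nodes) by
    counting genes (rows) of the unfolded matrix A_tilde."""
    col = {t: j for j, t in enumerate(terms)}
    kids = children.get(term, [])
    ones = defaultdict(float)    # genes annotated to `term`, per configuration
    totals = defaultdict(float)  # all genes, per configuration
    for row in A_tilde:
        config = tuple(row[col[c]] for c in kids)
        totals[config] += 1
        ones[config] += row[col[term]]
    cpt = {}
    for config, n in totals.items():
        if any(config):
            # Assumed true-path rule: an annotated child implies the parent.
            cpt[config] = 1.0
        else:
            # Assumed add-alpha smoothing when no child is annotated.
            cpt[config] = (ones[config] + alpha) / (n + 2 * alpha)
    return cpt


terms = ["root", "T1", "T3"]
children = {"root": ["T1", "T3"]}
A_tilde = [[1, 1, 0],
           [1, 0, 1],
           [1, 0, 0],
           [0, 0, 0]]
print(estimate_cpt(A_tilde, terms, children, "root"))
```

For a leaf term, the configuration is the empty tuple, so the same function returns a smoothed marginal $p(t_j = 1)$.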

  14. Bayesian network method
     [Figure: the network fragment of slide 12]
     Evidence node (e-node) conditional probabilities $p(e_j \mid t_j)$:
     • Gaussian Mixture Model (estimated from available $\langle t_j, e_j \rangle$ pairs)
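A minimal sketch of the evidence model fits one 1-D Gaussian mixture to the SVD scores of pairs with $t_j = 1$ and another to pairs with $t_j = 0$, using expectation-maximization written from scratch. The following are all assumptions for illustration: two components, 100 iterations, the variance floor, the helper names `fit_gmm_1d` and `gmm_pdf`, and the synthetic scores.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, n_components=2, n_iter=100, seed=0, min_var=1e-4):
    """Fit a 1-D Gaussian mixture to xs with EM; returns (weights, means, variances)."""
    rng = random.Random(seed)
    means = rng.sample(xs, n_components)  # initialize means at random data points
    mu = sum(xs) / len(xs)
    var0 = max(sum((x - mu) ** 2 for x in xs) / len(xs), min_var)
    variances = [var0] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iter):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            s = sum(p) or 1e-300
            resp.append([pk / s for pk in p])
        # M-step: re-estimate weights, means and floored variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, min_var)
    return weights, means, variances


def gmm_pdf(x, model):
    """Mixture density at x, used as the evidence likelihood p(e_j | t_j)."""
    weights, means, variances = model
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


# Synthetic (hypothetical) SVD scores for annotated / non-annotated pairs.
data_rng = random.Random(1)
scores_on = [data_rng.gauss(0.8, 0.1) for _ in range(200)]
scores_off = [data_rng.gauss(0.1, 0.1) for _ in range(200)]
model_on = fit_gmm_1d(scores_on)    # p(e_j | t_j = 1)
model_off = fit_gmm_1d(scores_off)  # p(e_j | t_j = 0)
print(gmm_pdf(0.5, model_on), gmm_pdf(0.5, model_off))
```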
