www.mlda.swu.edu.cn
Interspecies gene function prediction using semantic similarity
Guoxian Yu*, Wei Luo, Guangyuan Fu, Jun Wang
Machine Learning and Data Analysis Lab., Southwest University
gxyu@swu.edu.cn

Outline
1 Backgrounds
2
• Gene Ontology (GO) annotations of a human gene and a
• Semantic similarity metrics:
Edge-based: distance (shortest paths, average paths)
Node-based: information content (IC), common ancestors, common descendants, number of ancestors, node depth
Lin's measure (node-based):
$$\mathrm{sim}_{Lin}(t_1, t_2) = \frac{2 \times IC(t_A)}{IC(t_1) + IC(t_2)}$$
where IC(t) = −log p(t) is the information content of term t, p(t) is the probability of observing t in the annotation corpus, and t_A is the most informative common ancestor of t_1 and t_2.
Pesquita, C., et al. Semantic similarity in biomedical ontologies. PLoS Computational Biology, 2009, 5(7), e1000443.
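Lin's measure can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy DAG `PARENTS`, the gene set `genes`, and the helper names are hypothetical, and p(t) is assumed to be the fraction of genes annotated with t after propagating each annotation to its ancestors (the GO true-path rule).

```python
import math

# Hypothetical toy fragment of a GO-like DAG: term -> list of parent terms.
PARENTS = {
    "root": [],
    "t1": ["root"],
    "t2": ["root"],
    "t3": ["t1"],
    "t4": ["t1", "t2"],
}


def ancestors(term):
    """Return the term itself together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations):
    """IC(t) = -log p(t), with p(t) the fraction of genes annotated with t
    after propagating every annotation to its ancestors (assumption)."""
    counts = {t: 0 for t in PARENTS}
    for terms in annotations.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(annotations)
    # log(n / c) equals -log(c / n); terms never observed get no IC entry.
    return {t: math.log(n / c) for t, c in counts.items() if c > 0}


def sim_lin(t1, t2, ic):
    """2 * IC(t_A) / (IC(t1) + IC(t2)), t_A = most informative common ancestor."""
    ic_mica = max(ic[t] for t in ancestors(t1) & ancestors(t2))
    denom = ic[t1] + ic[t2]
    return 2 * ic_mica / denom if denom > 0 else 1.0


genes = {"g1": {"t3"}, "g2": {"t4"}, "g3": {"t2"}, "g4": {"t3", "t4"}}
ic = information_content(genes)
print(round(sim_lin("t3", "t4", ic), 3))  # 0.415
```

Because the root annotates every gene, its IC is 0, so two terms whose only shared ancestor is the root get a Lin similarity of 0.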
• Semantic similarity metrics
Pairwise
  - Best pairs: MAX, BMA
  - All pairs: AVG
Groupwise
  - Set: Term Overlap (TO)
  - Vector: CoSim
  - Graph: UI, GIC
[1] Pesquita, C., et al. Metrics for GO based protein semantic similarity: a systematic evaluation. BMC Bioinformatics, 2008, 9(S5), 4.
[2] Tao, Y., et al. Information theory applied to the sparse gene ontology annotation network to predict novel gene function. Bioinformatics, 2007, 23(13), 529-538.
[3] Yu, G., et al. Predicting protein function via downward random walks on a gene ontology. BMC Bioinformatics, 2015, 16: 271.
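The pairwise and set/vector groupwise strategies listed above can be sketched as follows. This is an illustration under assumptions: `term_sim` stands for any term-to-term measure (such as Lin's), the toy scores in `TOY` are hypothetical, and BMA is shown in one common variant (the mean of all best matches in both directions; some papers instead average the two directional means). The groupwise functions expect term sets already propagated to their ancestors.

```python
import math
from itertools import product
from statistics import mean


# Pairwise strategies: combine term-to-term similarities into a gene score.
def sim_max(terms_a, terms_b, term_sim):
    """MAX: similarity of the single best-matching term pair."""
    return max(term_sim(a, b) for a, b in product(terms_a, terms_b))


def sim_avg(terms_a, terms_b, term_sim):
    """AVG: mean similarity over all term pairs."""
    return mean(term_sim(a, b) for a, b in product(terms_a, terms_b))


def sim_bma(terms_a, terms_b, term_sim):
    """BMA: mean of every term's best match in the other gene, both directions."""
    best_a = [max(term_sim(a, b) for b in terms_b) for a in terms_a]
    best_b = [max(term_sim(a, b) for a in terms_a) for b in terms_b]
    return (sum(best_a) + sum(best_b)) / (len(best_a) + len(best_b))


# Groupwise measures on term sets already propagated to their ancestors.
def term_overlap(anc_a, anc_b):
    """TO: number of terms shared by the two genes."""
    return len(anc_a & anc_b)


def sim_ui(anc_a, anc_b):
    """UI: shared terms over all terms of the pair (Jaccard index)."""
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def sim_cos(anc_a, anc_b):
    """Cosine similarity of the two binary term vectors."""
    return len(anc_a & anc_b) / math.sqrt(len(anc_a) * len(anc_b))


# Hypothetical term-to-term scores for a gene with terms a, b and one with x, y.
TOY = {("a", "x"): 0.8, ("a", "y"): 0.2, ("b", "x"): 0.4, ("b", "y"): 0.6}


def toy_sim(a, b):
    return TOY[(a, b)]


print(sim_max(["a", "b"], ["x", "y"], toy_sim))  # 0.8
```

MAX rewards a single strong match, AVG penalizes genes with many unrelated terms, and BMA sits between the two, which is why it is a frequent default.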
[1] Pesquita, C., Faria, D., Bastos, H., Ferreira, A.E., Falcao, A.O., Couto, F.M. Metrics for GO based protein semantic similarity: a systematic evaluation. BMC Bioinformatics, 2008, 9(S5), 4.
[2] Mistry, M., Pavlidis, P. Gene Ontology term overlap as a measure of gene functional
[Figure: binary annotation matrices over GO terms f1–f5 for two species (genes p1–p4 and genes p1–p5) and their combined matrix (genes p1–p9, whose rows p5–p9 correspond to p1–p5 of the second species)]
• Intraspecies semantic similarity
[Figure: annotation matrix of genes p1–p4 over GO terms f1–f5]
       p1     p2     p3     p4
p1     1      1      0.23   0.35
p2     1      1      0.23   0.35
p3     0.23   0.23   1
p4     0.35   0.35          1
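A gene-by-gene similarity matrix like the one above can be computed directly from a binary annotation matrix. The sketch below uses simGIC (IC of shared terms divided by IC of all terms of the pair); it assumes the matrix already includes ancestor terms and estimates each term's IC from its annotation frequency within the species. The matrix `Y` and the helper names are hypothetical and are not meant to reproduce the slide's numbers.

```python
import math


def term_ic(Y):
    """IC of each term column: -log(fraction of genes annotated with it)."""
    n = len(Y)
    counts = [sum(column) for column in zip(*Y)]
    return [math.log(n / c) if c else 0.0 for c in counts]


def simgic_matrix(Y, ic):
    """Symmetric matrix of simGIC scores between the rows (genes) of Y.

    Y is a binary gene-by-term matrix assumed to already include ancestor
    terms; simGIC = IC of shared terms / IC of all terms of the pair."""
    n = len(Y)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            shared = sum(w for w, a, b in zip(ic, Y[i], Y[j]) if a and b)
            total = sum(w for w, a, b in zip(ic, Y[i], Y[j]) if a or b)
            S[i][j] = S[j][i] = shared / total if total > 0 else 0.0
    return S


# Hypothetical species with 4 genes (rows p1..p4) and 3 terms (columns f1..f3).
Y = [[1, 1, 0],
     [1, 1, 0],
     [1, 0, 1],
     [0, 0, 1]]
S = simgic_matrix(Y, term_ic(Y))
print([round(v, 2) for v in S[0]])  # [1.0, 1.0, 0.17, 0.0]
```

Genes with identical annotations (p1 and p2 here) score 1, and genes sharing no term score 0, matching the pattern of ones and blanks in the table above.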
[Figure: annotation matrix of genes p1–p9 over GO terms f1–f5 and the resulting 9 × 9 simGIC similarity matrix]
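Because GO terms are species-independent, annotations from several species can share one column space, and any similarity measure computed on the stacked matrix yields both intra- and interspecies scores. A minimal sketch of that stacking step, assuming each species is given as a mapping from gene ID to GO terms; `stack_species` and the toy species are hypothetical. Rows keep the species name because gene IDs may coincide across species, as p1 does here.

```python
def stack_species(species):
    """Stack per-species annotations into one gene-by-term matrix.

    species: list of (species_name, {gene_id: set_of_GO_terms}) pairs.
    Returns (rows, vocab, Y): rows[r] = (species_name, gene_id),
    vocab[c] = GO term of column c, Y[r][c] = 1 if gene r has term c."""
    # GO terms are shared across species, so one column space serves all.
    vocab = sorted(set().union(*(terms
                                 for _, genes in species
                                 for terms in genes.values())))
    column = {term: c for c, term in enumerate(vocab)}
    rows, Y = [], []
    for name, genes in species:
        for gene_id, terms in genes.items():
            row = [0] * len(vocab)
            for term in terms:
                row[column[term]] = 1
            rows.append((name, gene_id))
            Y.append(row)
    return rows, vocab, Y


species_a = {"p1": {"f1", "f2"}, "p2": {"f2"}}
species_b = {"p1": {"f3"}, "p2": {"f1", "f3"}}
rows, vocab, Y = stack_species([("A", species_a), ("B", species_b)])
print(rows)  # [('A', 'p1'), ('A', 'p2'), ('B', 'p1'), ('B', 'p2')]
print(Y)     # [[1, 1, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1]]
```

The off-diagonal block of the resulting similarity matrix (rows of species A against rows of species B) holds the interspecies similarities.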
• Gene function prediction using kNN
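$$\hat{y}_i = \frac{1}{k} \sum_{j \in N(i)} y_j$$
$$\hat{y}_i = \frac{\sum_{j \in N(i)} s_{ij}\, y_j}{\sum_{j \in N(i)} s_{ij}}$$
where N(i) is the set of the k genes most similar to gene i, y_j is the annotation vector of gene j, and s_ij is the semantic similarity between genes i and j.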
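The kNN step scores each candidate term for a gene from the annotations of its k most semantically similar genes. The sketch below is a similarity-weighted variant under stated assumptions: `S` is any gene-by-gene similarity matrix (e.g. simGIC), `Y` is a binary gene-by-term matrix, the query gene itself is excluded from its neighbours, and `knn_scores` is a hypothetical name. Setting every weight to 1 gives the plain average over the k neighbours.

```python
def knn_scores(S, Y, i, k):
    """Predict term scores for gene i from its k most similar genes.

    S: gene-by-gene similarity matrix; Y: binary gene-by-term matrix.
    Returns the similarity-weighted average of the neighbours' rows of Y."""
    others = [j for j in range(len(S)) if j != i]
    neighbours = sorted(others, key=lambda j: S[i][j], reverse=True)[:k]
    weight = sum(S[i][j] for j in neighbours)
    n_terms = len(Y[0])
    if weight == 0:
        return [0.0] * n_terms  # no informative neighbour
    return [sum(S[i][j] * Y[j][t] for j in neighbours) / weight
            for t in range(n_terms)]


# Hypothetical 3-gene similarity matrix and 2-term annotation matrix.
S = [[1.0, 0.8, 0.2],
     [0.8, 1.0, 0.5],
     [0.2, 0.5, 1.0]]
Y = [[1, 0],
     [1, 1],
     [0, 1]]
print(knn_scores(S, Y, 0, 1))  # [1.0, 1.0]
```

Terms of gene i whose score is high but which are not yet in Y[i] are the candidate new annotations; in the interspecies setting the neighbours may come from either species.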
• Datasets
• Investigations:
  - The improvement brought by interspecies gene function prediction
  - The impact of the semantic similarity metric
  - The influence of homology between species
• Results on archived annotations using TO in the CC branch
• Results on archived annotations using GIC in the CC branch
• Observations:
  - Interspecies gene function prediction works better than single
  - A species with high homology contributes more than one with
  - The choice of semantic similarity metric (TO or GIC) does not change the above observations and results.
• Combining annotations in CC, MF and BP
• Observation:
  - Interspecies gene function prediction also yields improvement,
  - BP, CC and MF provide functional clues for one another.
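One way to combine the three branches, assumed here because the slides do not spell out the mechanism, is to take the union of each gene's CC, MF and BP terms before computing semantic similarity. GO term IDs are unique across the three branches, so the union keeps them distinguishable. `combine_branches` and the toy branch dictionaries are hypothetical.

```python
def combine_branches(*branches):
    """Merge per-branch annotations into one term set per gene.

    Each branch is a dict {gene_id: set_of_GO_terms} (e.g. CC, MF, BP);
    a gene absent from a branch simply contributes no terms from it."""
    merged = {}
    for branch in branches:
        for gene, terms in branch.items():
            # setdefault creates a fresh set, so the inputs are never mutated.
            merged.setdefault(gene, set()).update(terms)
    return merged


cc = {"g1": {"c1"}, "g2": {"c2"}}
mf = {"g1": {"m1"}}
bp = {"g2": {"b1", "b2"}}
combined = combine_branches(cc, mf, bp)
print(sorted(combined["g1"]))  # ['c1', 'm1']
```

With merged term sets, a gene annotated only in BP can still find neighbours through genes whose CC or MF terms it shares, which is one plausible route by which the branches inform each other.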
• Results on simulated missing GO annotations
• Observation:
  - Interspecies gene function prediction brings more prominent
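The slides do not describe how missing annotations were simulated. One common protocol, assumed here, hides a random fraction of each gene's known annotations and asks the predictor to recover the hidden terms from the rest. `mask_annotations` and its `fraction` and `seed` parameters are hypothetical.

```python
import random


def mask_annotations(genes, fraction, seed=0):
    """Hide a random fraction of each gene's annotations.

    genes: {gene_id: set_of_GO_terms}. Returns (visible, hidden): visible
    keeps the terms a predictor may use, hidden holds the removed terms
    that it should recover."""
    rng = random.Random(seed)  # local RNG for reproducible masks
    visible, hidden = {}, {}
    for gene, terms in genes.items():
        ordered = sorted(terms)  # fixed order so the seed fully determines the mask
        n_hide = int(len(ordered) * fraction)
        removed = set(rng.sample(ordered, n_hide))
        visible[gene] = set(ordered) - removed
        hidden[gene] = removed
    return visible, hidden


genes = {"g1": {"a", "b", "c", "d"}, "g2": {"e"}}
visible, hidden = mask_annotations(genes, 0.5, seed=1)
print(len(hidden["g1"]), hidden["g2"])  # 2 set()
```

Rounding down means genes with very few annotations keep all of them; a real protocol might instead mask a fixed fraction of all gene-term pairs.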
• Interspecies
• GO
• Future work: integrate semantic similarity with other