
A linear-time algorithm for comparing similar ordered trees



  1. A linear-time algorithm for comparing similar ordered trees
     Hélène Touzet, LIFL, University of Lille 1, France

  2. Comparison with k errors
     ◮ Problem:
        Input: two ordered trees (assumed to be similar) and a natural number k
        Output: the best mapping M containing fewer than k errors, if it exists
     ◮ Error: insertion of a node, deletion of a node
     ◮ Edit operations: substitution, deletion, insertion
     ◮ Comparison model: edit distance vs. alignment
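     Since substitutions are not counted as errors, the number of errors of a mapping M between trees A and B depends only on how many nodes M leaves unmatched: every node of A outside M is deleted, and every node of B outside M is inserted (here |M| denotes the number of matched pairs):

$$
\mathrm{errors}(M) = \underbrace{(|A| - |M|)}_{\text{deletions}} + \underbrace{(|B| - |M|)}_{\text{insertions}} = |A| + |B| - 2|M|
$$

     Because |M| ≤ min(|A|, |B|), a mapping with fewer than k errors can only exist when ||A| − |B|| < k.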

  3. How to compare trees: edit operations
     (Figure: substitution, deletion and insertion of a node.)

  4. How to compare trees: comparison model
     ◮ Edit distance [Tai 1979, Zhang-Shasha 1989, Klein 1998, Dulucq & Touzet 2003]
        ◮ all mappings are valid
        ◮ largest common subtree
     ◮ Alignment [Jiang et al. 1995]
        ◮ insertions should precede deletions
        ◮ smallest common supertree
     (Figure: example trees with nodes labeled a to f.)

  5–6. Previous results

     |               | Tree distance                          | Tree alignment                   | Strings |
     |---------------|----------------------------------------|----------------------------------|---------|
     | Full mapping  | O(n⁴) Zhang-Shasha; O(n³ log n) Klein  | O(n²d²) Jiang et al.             | O(n²)   |
     | k errors      | O(k³n)                                 | O(n log(n) d³k²) Jansson-Lingas  | O(kn)   |

     n: size of the tree; d: maximal degree of the tree; k: bound on the number of errors, known in advance.

  7–8. Edit graph for the string alignment problem
     ◮ Two-dimensional grid
     ◮ Three kinds of arcs: deletion, insertion and substitution
     (Figure: edit grid for CATGGA and CTGGAC, with the alignment CATGGA- over C-TGGAC.)
     ◮ Time complexity: O(n²)
     ◮ With k errors: O(kn)
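     The O(kn) bound for strings comes from filling only the cells within distance k of the main diagonal: every step off the diagonal is an insertion or a deletion and costs one error, so a path of cost at most k never leaves that band. A minimal Python sketch, assuming unit costs (a mismatch costs 1, a match 0) and bounding the total distance by k; the function name `banded_edit_distance` is ours:

```python
def banded_edit_distance(s, t, k):
    """Unit-cost edit distance between s and t if it is at most k, else None.

    Only cells (i, j) with |i - j| <= k are filled, so the work is O(k * len(s)).
    """
    n, m = len(s), len(t)
    if abs(n - m) > k:
        return None  # at least |n - m| insertions or deletions are needed
    inf = float("inf")
    # Row 0 restricted to the band: (0, j) is reached by j insertions.
    prev = {j: j for j in range(min(m, k) + 1)}
    for i in range(1, n + 1):
        cur = {}
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if j == 0:
                cur[j] = i  # (i, 0) is reached by i deletions
                continue
            cur[j] = min(
                prev.get(j, inf) + 1,                           # deletion of s[i-1]
                cur.get(j - 1, inf) + 1,                        # insertion of t[j-1]
                prev.get(j - 1, inf) + (s[i - 1] != t[j - 1]),  # match or mismatch
            )
        prev = cur
    d = prev[m]  # (n, m) always lies in the band because |n - m| <= k
    return d if d <= k else None
```

     For the two strings of the figure, `banded_edit_distance("CATGGA", "CTGGAC", 2)` returns 2, and with k = 1 it returns None. Cells outside the band are treated as unreachable, which is safe because any path through them already costs more than k.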

  9–13. Tree edit graph
     ◮ Trees as strings: enumerate the nodes in postorder traversal
     ◮ Supplementary constraints imposed by the tree structure
     (Figure: edit grid for two trees whose nodes are numbered 1 to 6 in postorder; successive frames highlight a legal path and an illegal path.)
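     Numbering the nodes in postorder is what lets a tree be read as a string: the subtree rooted at node x then occupies the contiguous range x − size(x) + 1, ..., x. A minimal Python sketch, assuming trees are given as hypothetical `(label, children)` tuples:

```python
def postorder(tree):
    """Number the nodes of `tree` in postorder, starting at 1.

    `tree` is a (label, children) tuple. Returns (labels, sizes), where
    labels[x] is the label of node x and sizes[x] is the number of nodes
    in the subtree rooted at x (index 0 is unused).
    """
    labels, sizes = [None], [0]

    def visit(node):
        label, children = node
        first = len(labels)  # rank that the first descendant will receive
        for child in children:
            visit(child)
        labels.append(label)  # a node is numbered after all its descendants
        sizes.append(len(labels) - first)

    visit(tree)
    return labels, sizes


# Hypothetical six-node example: f(d(a, c(b)), e)
tree = ("f", [("d", [("a", []), ("c", [("b", [])])]), ("e", [])])
labels, sizes = postorder(tree)
# labels: [None, 'a', 'b', 'c', 'd', 'e', 'f'], sizes: [0, 1, 1, 2, 4, 1, 6]
```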

  14. Edit graph for trees (A(x): subtree of A rooted at x; size(x): number of nodes of A(x))
     ◮ Deletion arcs (horizontal arcs): (x, y) → (x − 1, y), labeled by del
     ◮ Insertion arcs (vertical arcs): (x, y) → (x, y − 1), labeled by ins
     ◮ Substitution arcs: (x, y) → (x − size(x), y − size(y)), labeled by the distance between A(x) and B(y)
     ◮ Size of the graph: O(mn), where m and n are the sizes of the two trees
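     Given postorder subtree sizes, the three kinds of arcs leaving a vertex (x, y) can be listed directly. A sketch, assuming hypothetical size arrays indexed by postorder rank (index 0 unused) and leaving the substitution labels symbolic:

```python
def out_arcs(x, y, size_a, size_b):
    """Arcs leaving vertex (x, y) of the tree edit graph, as (target, label) pairs.

    size_a[x] and size_b[y] are subtree sizes of nodes numbered in postorder
    from 1. Labels are symbolic strings in this sketch.
    """
    arcs = []
    if x > 0:
        arcs.append(((x - 1, y), "del"))  # horizontal arc: delete node x of A
    if y > 0:
        arcs.append(((x, y - 1), "ins"))  # vertical arc: insert node y of B
    if x > 0 and y > 0:
        # Substitution arc: jumps over the whole subtrees A(x) and B(y);
        # its label is the distance between A(x) and B(y).
        arcs.append(((x - size_a[x], y - size_b[y]), f"td({x},{y})"))
    return arcs


size_a = [0, 1, 1, 2, 4, 1, 6]  # f(d(a, c(b)), e) in postorder
size_b = [0, 1, 1, 3, 4, 1, 6]  # f(c(d(a, b)), e) in postorder
print(out_arcs(4, 4, size_a, size_b))
# -> [((3, 4), 'del'), ((4, 3), 'ins'), ((0, 0), 'td(4,4)')]
```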

  15–21. (Figure sequence: step-by-step construction of the tree edit graph for two trees whose nodes are numbered 1 to 6 in postorder.)

  22. Usage of the tree edit graph
     ◮ How to compute the labels of the arcs?
        The label of the substitution arc starting from (x, y) is the weight of an optimal path in the subgraph delimited by A(x) × B(y).
        Time complexity: O(n⁴). Space complexity: O(n²).
     ◮ How to recover the mapping from the tree edit graph? Multi-level tracing back:
        ◮ Construction of an optimal path for A × B
        ◮ Iteration on the subgraphs induced by matching pairs of nodes
        Time complexity: O(n³). Space complexity: O(n²).

  23. Optimal paths for td(x, y)
     ◮ With h = x − size(x) and l = y − size(y):

$$
\begin{aligned}
fd(h, l, h, l) &= 0\\
fd(i, l, h, l) &= fd(i-1, l, h, l) + \mathrm{del}\\
fd(h, j, h, l) &= fd(h, j-1, h, l) + \mathrm{ins}\\
fd(i, j, h, l) &= \min\begin{cases}
fd(i-1, j, h, l) + \mathrm{del}\\
fd(i, j-1, h, l) + \mathrm{ins}\\
fd(i - \mathrm{size}(i), j - \mathrm{size}(j), h, l) + td(i, j)
\end{cases}
\end{aligned}
$$

     ◮ For the subtrees:

$$
td(x, y) \leftarrow
\begin{cases}
fd(x-1, y-1, h, l) + \mathrm{sub}(x, y) & \text{if } fd(x-1, y-1, h, l) + \mathrm{sub}(x, y) < \min\{fd(x-1, y, h, l) + \mathrm{del},\ fd(x, y-1, h, l) + \mathrm{ins}\}\\
+\infty & \text{otherwise}
\end{cases}
$$

     ◮ This is the Zhang-Shasha algorithm.
     ◮ The Klein and Dulucq & Touzet algorithms build the same edit graph, but use alternative strategies to compute the labels of the arcs.
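     The recurrences above translate almost line for line into code. The sketch below is a plain Python rendering under assumptions of our own: unit costs (del = ins = 1, sub(x, y) = 0 for equal labels and 1 otherwise), trees given as hypothetical `(label, children)` tuples, and one dictionary per subgraph for fd. It recomputes each subgraph A(x) × B(y) from scratch, in line with the O(n⁴) bound of slide 22, rather than reproducing any optimized implementation; `postorder` and `tree_edit_distance` are our names.

```python
import math

INF = math.inf


def postorder(tree):
    """Return (labels, sizes), indexed by postorder rank from 1 (index 0 unused)."""
    labels, sizes = [None], [0]

    def visit(node):
        label, children = node
        first = len(labels)
        for child in children:
            visit(child)
        labels.append(label)
        sizes.append(len(labels) - first)

    visit(tree)
    return labels, sizes


def tree_edit_distance(a, b, dele=1, ins=1):
    """Unit-substitution-cost edit distance between two ordered trees."""
    la, sa = postorder(a)
    lb, sb = postorder(b)
    n, m = len(la) - 1, len(lb) - 1

    def sub(x, y):
        return 0 if la[x] == lb[y] else 1

    td = [[INF] * (m + 1) for _ in range(n + 1)]  # substitution-arc labels
    dist = 0
    for x in range(1, n + 1):  # postorder: every subtree before its root
        for y in range(1, m + 1):
            h, l = x - sa[x], y - sb[y]
            fd = {(h, l): 0}
            for i in range(h + 1, x + 1):
                fd[i, l] = fd[i - 1, l] + dele
            for j in range(l + 1, y + 1):
                fd[h, j] = fd[h, j - 1] + ins
            for i in range(h + 1, x + 1):
                for j in range(l + 1, y + 1):
                    if (i, j) == (x, y):
                        continue  # td(x, y) is the value being computed
                    fd[i, j] = min(fd[i - 1, j] + dele,
                                   fd[i, j - 1] + ins,
                                   fd[i - sa[i], j - sb[j]] + td[i][j])
            matched = fd[x - 1, y - 1] + sub(x, y)
            other = min(fd[x - 1, y] + dele, fd[x, y - 1] + ins)
            # Keep the substitution arc only if matching x with y is strictly better.
            td[x][y] = matched if matched < other else INF
            dist = min(matched, other)  # distance between A(x) and B(y)
    return dist  # the last pair visited is (n, m): the two roots
```

     Setting td(x, y) to +∞ loses nothing: when matching x with y is not strictly better, an equally cheap path through a deletion or insertion arc exists in every larger subgraph containing A(x) × B(y).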

  24. Edit distance with k errors
     ◮ Error: insertion of a node, deletion of a node
     ◮ Problem:
        Input: two ordered trees and a natural number k
        Output: the best mapping containing fewer than k errors, if it exists
     ◮ Method: pruning the tree edit graph

  25–27. Edit distance with k errors
     Idea 1: the paths of the best mappings lie near the main diagonal.
     ◮ k-strip = {(x, y) ; |x − y| ≤ k}
     ◮ Size of the graph: O(nk)
     ◮ Computation time for each node (x, y): O(size(A, x) · k)
     ◮ Total: O(k² Σ_x size(A, x)), since each x belongs to at most 2k + 1 pairs of the strip; this sum can still be quadratic in n, for instance when A is a path
     (Figure: edit grid for two six-node trees, with the k-strip around the main diagonal.)
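     The k-strip can be enumerated directly, and counting its cells shows why the pruned graph has O(nk) vertices instead of O(nm). A small sketch; the function name `k_strip` is ours and nodes are assumed to be numbered from 1 in postorder:

```python
def k_strip(n, m, k):
    """Yield the pairs (x, y), 1 <= x <= n, 1 <= y <= m, with |x - y| <= k."""
    for x in range(1, n + 1):
        # For a fixed x, at most 2k + 1 values of y are kept.
        for y in range(max(1, x - k), min(m, x + k) + 1):
            yield x, y


cells = list(k_strip(6, 6, 1))
print(len(cells), "of", 6 * 6)  # -> 16 of 36
```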

  28–31. Edit distance with k errors
     Idea 2: when inspecting the subtree rooted at x, there is no need to visit the nodes lying more than k + 1 levels below x.
     ◮ A(x, k) = {i ∈ A(x) ; depth(i) − depth(x) ≤ k + 1}
     ◮ Size of the graph: O(nk) pairs of subtrees
     ◮ Computation time for each pair: O(size(A, x, k) · k)
     ◮ Total: O(k² Σ_x size(A, x, k)) = O(k³n), because each node lies in A(x, k) for at most k + 2 nodes x (itself and its k + 1 nearest ancestors), so Σ_x size(A, x, k) ≤ (k + 2)n
     (Figure: edit grid for two six-node trees.)
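     The counting argument behind O(k³n) can be checked directly: walking up at most k + 1 ancestors from each node computes every size(A, x, k), and the total never exceeds (k + 2)n. A sketch, assuming hypothetical `(label, children)` tuples and postorder numbering from 1 (`depth_limited_sizes` is our name):

```python
def depth_limited_sizes(tree, k):
    """Return size(A, x, k) for every node x, nodes numbered 1..n in postorder.

    A(x, k) = {i in A(x) : depth(i) - depth(x) <= k + 1}.
    """
    parent = [0]  # parent[i]: postorder rank of the parent of i (0 for the root)

    def visit(node):
        _, children = node
        ranks = [visit(child) for child in children]
        parent.append(0)
        rank = len(parent) - 1
        for r in ranks:
            parent[r] = rank
        return rank

    visit(tree)
    n = len(parent) - 1
    size = [0] * (n + 1)
    for i in range(1, n + 1):
        # i belongs to A(x, k) exactly for x = i and its k + 1 nearest ancestors.
        x, steps = i, 0
        while x != 0 and steps <= k + 1:
            size[x] += 1
            x, steps = parent[x], steps + 1
    return size


# A path of 10 nodes: the sizes of A(x) sum to 55, those of A(x, 1) to at most 3 * 10.
path = ("v", [])
for _ in range(9):
    path = ("v", [path])
print(sum(depth_limited_sizes(path, 1)))  # -> 27
```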

  32. Tree edit graph for k errors: O(k³n)
     Input: two trees A and B, positive integer k
     Output: tree edit graph

        for (x, y) ∈ k-strip(A, B) do                              // O(k² Σ_x size(A, x, k)) = O(k³n)
           if not k-relevant(x, y) then
              td(x, y) ← +∞
           else
              for i ∈ A(x, k) do                                   // O(k · size(A, x, k))
                 for j ∈ B such that (i, j) ∈ k-strip(A, B) do     // O(k)
                    compute fd(i, j)                               // O(1)
                 end do
              end do
              compute td(x, y)                                     // O(1)
           end if
        end do

     ◮ Recovering the optimal mapping: O(k³n)
