
Approximate Search of Regular Expressions Using Bit-Parallel Algorithms



  1. Approximate Search of Regular Expressions Using Bit-Parallel Algorithms. Kristo Tammeoja, Jaak Vilo. Teooriapäevad (Theory Days), Rõuge, 2007

  2. Contents
     - Regular expression (RE) syntax
     - Glushkov’s automaton
     - Existing bit-parallel algorithms
     - Exact matching
     - Approximate matching
     - New feature added
     - Error-free regions

  3. Regular expression
     - Syntax
       - Grouping: ( )
       - Alternation: |
       - Quantifiers: *, +, ?, {m,n}, {m,}
       - Character classes (example: [a-z])
     - Matching as used in this presentation: the whole string must match the RE
       - Regular expression: A*
       - AAAAA: match
       - BAAAC: no match
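
The matching convention above (the whole string has to match) can be checked with a minimal sketch using Python's standard `re` module. The helper name `matches` is an illustrative assumption, not something from the slides.

```python
import re

def matches(pattern: str, text: str) -> bool:
    # Whole-string matching, as in the A* example on the slide.
    return re.fullmatch(pattern, text) is not None

print(matches("A*", "AAAAA"))  # True
print(matches("A*", "BAAAC"))  # False

# For contrast: substring search accepts BAAAC, because A* also
# matches the empty string at position 0.
print(re.search("A*", "BAAAC") is not None)  # True
```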

  5. Regular expression, 1 error allowed
     - Variants shown: 1:R(E|G)(EX)*, 1:R<E|G>(EX)*, 1:R(E|G)<EX>*
     - Strings matching R(E|G)(EX)* exactly: RE, RG, REEX, RGEX, REEXEX
     - 1:R<E|G>(EX)*
       - RE → RR (substitution): no match
       - RG → R (deletion): no match
       - REEX → REX (deletion): match
     - 1:R(E|G)<EX>*
       - RGEX → REGEX (insertion): match
       - REEXEX → REEEXEX (insertion): match
       - REEXEX → REERXEX (insertion): no match

  10. Glushkov’s automaton for R ( E | G ) ( E X ) *
      - Each character in the RE = a state in the automaton, plus one state for the beginning of the RE
      - Transitions show which characters/positions can follow each other
      - [Figure: automaton with an initial state and the states R, E, G, E, X, built up step by step by tracing the prefixes R, RE, RG, REE, RGE, RGEX, RGEXE]

  23. Glushkov’s automaton
      - All transitions entering a node are labeled by the same character
      - For example, after reading the character ‘E’, only states with label ‘E’ can be active
      - [Figure: automaton for R ( E | G ) ( E X ) * with the states R, E, G, E, X]
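
The construction described on the Glushkov slides can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the tuple-based AST format and the names `glushkov`, `walk`, `follow` and `finals` are assumptions made for this example. State 0 is the extra initial state and states 1..5 are the character positions of R(E|G)(EX)*.

```python
def glushkov(ast):
    """Build the position automaton from a tiny regex AST.

    AST nodes (hypothetical format): ("sym", ch), ("cat", a, b),
    ("alt", a, b), ("star", a).
    """
    labels = []          # labels[p - 1] is the character of position p
    follow = {0: set()}  # state 0 is the extra initial state

    def walk(node):
        # Returns (first positions, last positions, nullable).
        kind = node[0]
        if kind == "sym":
            labels.append(node[1])
            p = len(labels)
            follow[p] = set()
            return {p}, {p}, False
        if kind == "star":
            first, last, _ = walk(node[1])
            for p in last:
                follow[p] |= first   # loop back to the start of the body
            return first, last, True
        first1, last1, null1 = walk(node[1])
        first2, last2, null2 = walk(node[2])
        if kind == "alt":
            return first1 | first2, last1 | last2, null1 or null2
        # kind == "cat": ends of the left part continue into the right part
        for p in last1:
            follow[p] |= first2
        first = first1 | first2 if null1 else first1
        last = last1 | last2 if null2 else last2
        return first, last, null1 and null2

    first, last, nullable = walk(ast)
    follow[0] = set(first)
    finals = set(last) | ({0} if nullable else set())
    return labels, follow, finals

# R ( E | G ) ( E X ) *
AST = ("cat",
       ("cat", ("sym", "R"), ("alt", ("sym", "E"), ("sym", "G"))),
       ("star", ("cat", ("sym", "E"), ("sym", "X"))))

labels, follow, finals = glushkov(AST)
print(labels)  # ['R', 'E', 'G', 'E', 'X']
print(follow)
print(finals)
```

Every transition into position p reads the character `labels[p - 1]`, which is the property the slides rely on: after reading ‘E’, only positions labeled ‘E’ can be active.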

  26. Exact search
      - Simulation of an NFA = changing the set of active states based on the character read from the text
      - We use bit-vectors (one bit for each state) to hold the active states
      - δ(D, a)
        - D: bit-vector of active states
        - a: character read
        - Returns the new bit-vector
      - 2^|D| · |Σ| different sets of parameters
        - |D|: number of states in the automaton
        - |Σ|: size of the alphabet

  27. Exact search
      - “After reading character ‘E’ only states with label ‘E’ can be active”, so ...
      - δ(D, a) = T[D] & B[a]
        - T[D]: states that can be reached from the states in D by any character
        - B[a]: states that can be reached by character a

  28. Exact search: δ(D, a) = T[D] & B[a], example for the RE AA|AB|AC
      - Bits from left to right: initial state, then one state per character position (A, A, A, B, A, C)
      - Table B:
          a      B[a]
          ‘A’    0111010
          ‘B’    0000100
          ‘C’    0000001
          ...
      - Table T:
          D          T[D]
          1000000    0101010
          0100000    0010000
          ...
          0101010    0010101
          ...
      - Example: δ(0101010, ‘A’) = T[0101010] & B[‘A’] = 0010101 & 0111010 = 0010000
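
The table values for AA|AB|AC can be reproduced with a short Python sketch. Two things here are assumptions for illustration: the transition lists in `FOLLOW` were written out by hand from the RE, and `T` is computed on the fly by a function instead of being stored as a table with one entry per bit-vector. The helper names `bit` and `fmt` are also hypothetical.

```python
LABELS = "AAABAC"        # labels of states 1..6; state 0 is the initial state
N = len(LABELS) + 1      # 7 states in total
FOLLOW = {0: [1, 3, 5], 1: [2], 3: [4], 5: [6]}  # hand-written for AA|AB|AC

def bit(state):
    # Leftmost bit = state 0, matching the bit order on the slides.
    return 1 << (N - 1 - state)

def fmt(mask):
    return format(mask, f"0{N}b")

# B[a]: states that can be reached by character a.
B = {}
for state, ch in enumerate(LABELS, start=1):
    B[ch] = B.get(ch, 0) | bit(state)

def T(mask):
    # States reachable from the active states in mask by any character.
    out = 0
    for state in range(N):
        if mask & bit(state):
            for nxt in FOLLOW.get(state, []):
                out |= bit(nxt)
    return out

def delta(mask, ch):
    return T(mask) & B.get(ch, 0)

print(fmt(B["A"]), fmt(B["B"]), fmt(B["C"]))  # 0111010 0000100 0000001
print(fmt(T(0b1000000)))                      # 0101010
print(fmt(delta(0b0101010, "A")))             # 0010000
```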

  36. Exact search
      D ← 100..00                  // initial state active
      F ← bit-vector of final states
      For pos ∈ 1 ... n Do         // scanning text
          D ← T[D] & B[t_pos]
          If D & F ≠ 000..00 Then match
      End of For
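
The scanning loop above can be sketched in runnable Python for the automaton of R(E|G)(EX)*. The transition lists, the final states, and the choice of bit i for state i (lowest bit = initial state) are assumptions of this example, and `T` is again computed by a function instead of a precomputed table. The function `search` is a hypothetical name; it returns the 1-based text positions at which a final state is active.

```python
LABELS = "REGEX"   # labels of states 1..5; state 0 is the initial state
FOLLOW = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5], 5: [4]}
FINAL_STATES = [2, 3, 5]

# B[a]: states reachable by character a (bit i stands for state i).
B = {}
for state, ch in enumerate(LABELS, start=1):
    B[ch] = B.get(ch, 0) | (1 << state)

F = sum(1 << s for s in FINAL_STATES)

def T(D):
    # States reachable from the active states in D by any character.
    out = 0
    for state, targets in FOLLOW.items():
        if D & (1 << state):
            for t in targets:
                out |= 1 << t
    return out

def search(text):
    D = 1                      # only the initial state is active
    ends = []
    for pos, ch in enumerate(text, start=1):
        D = T(D) & B.get(ch, 0)
        if D & F:
            ends.append(pos)   # a match ends at this position
    return ends

print(search("REEXEX"))  # [2, 4, 6]
print(search("BREEX"))   # []
```

As written on the slide, the initial state is active only before the first character, so this loop reports matches that start at the beginning of the text; the second call finds nothing because the text starts with ‘B’.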

  37. Approximate search
      - Errors
        - Insertion
        - Deletion
        - Substitution

  38. Approximate search
      - When searching with k errors we make k+1 replicas of the automaton, one for each error level
      - In addition, we need transitions for the errors
      - [Figure: two copies of the automaton R, E, G, E, X, labeled “No errors” and “Up to 1 error”, with the transitions between them still marked “?”]

  39. Approximate search
      - R0, R1: current bit-vectors
      - R0’, R1’: bit-vectors after processing character c
      - R0’ = T[R0] & B[c]
      - R1’ = T[R1] & B[c] | R0 | T[R0’] | ...
        - T[R1] & B[c] (no errors): same as in exact search
        - R0 (del): active states remain the same
        - T[R0’] (ins): insert a new character after the current one; just one step in the automaton
      - [Figure: “No errors” and “Up to 1 error” copies of the automaton R, E, G, E, X, connected by Σ-labeled transitions (del) and ε-labeled transitions (ins)]
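
The update rule above can be sketched for one error level in Python, using only the three terms that appear on the slides. The remaining terms, written as "..." in the source, are deliberately not filled in, so this is not a complete approximate matcher. The automaton for R(E|G)(EX)*, the bit order (bit i = state i), the starting value R1 = R0, and the names `step` and `run` are assumptions of this example.

```python
LABELS = "REGEX"   # labels of states 1..5; state 0 is the initial state
FOLLOW = {0: [1], 1: [2, 3], 2: [4], 3: [4], 4: [5], 5: [4]}
F = (1 << 2) | (1 << 3) | (1 << 5)   # final states 2, 3 and 5

B = {}
for state, ch in enumerate(LABELS, start=1):
    B[ch] = B.get(ch, 0) | (1 << state)

def T(D):
    # States reachable from the active states in D by any character.
    out = 0
    for state, targets in FOLLOW.items():
        if D & (1 << state):
            for t in targets:
                out |= 1 << t
    return out

def step(R0, R1, c):
    mask = B.get(c, 0)
    R0_new = T(R0) & mask        # exact level
    R1_new = ((T(R1) & mask)     # "no errors" term
              | R0               # "del" term: active states remain the same
              | T(R0_new))       # "ins" term: one extra step in the automaton
    # Terms elided as "..." on the slide are intentionally left out.
    return R0_new, R1_new

def run(text):
    R0 = R1 = 1                  # assumed start: only the initial state
    for c in text:
        R0, R1 = step(R0, R1, c)
    return bool(R0 & F), bool(R1 & F)

print(run("REEX"))  # (True, True)
print(run("REX"))   # (False, True)
```

The second call mirrors the REEX → REX example from the earlier slides: the exact level rejects REX, while the level that allows one error accepts it.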
