CS 61A/CS 98-52
Mehrdad Niknami
University of California, Berkeley
Mehrdad Niknami (UC Berkeley) CS 61A/CS 98-52 1 / 23
Motivation
How would you find a substring inside a string?
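One straightforward answer is to try every starting position and compare character by character. Below is a minimal Python sketch; the function name find_naive is hypothetical and not from the slides.

```python
def find_naive(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    Tries every starting position and compares character by character,
    so the worst case takes O(len(text) * len(pattern)) comparisons.
    """
    n, m = len(text), len(pattern)
    for start in range(n - m + 1):
        # Compare pattern against the window text[start:start + m].
        i = 0
        while i < m and text[start + i] == pattern[i]:
            i += 1
        if i == m:
            return start
    return -1


print(find_naive("hello world", "o w"))   # 4
print(find_naive("hello world", "xyz"))   # -1
```

It agrees with Python's built-in str.find. The quadratic worst case (for example, many a's followed by a b, searched for "aab") is what motivates smarter approaches.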
¹ If you’ve seen backreferences (such as \1): those are not technically valid in regexes in the formal sense, because they let a pattern match languages that are not regular.
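For instance, Python's re module supports backreferences. The pattern below is our own illustration, not from the slides: it matches exactly the strings made of some number of a's, then a b, then the same number of a's again. No finite automaton can recognize that language, so no regex in the formal sense can describe it.

```python
import re

# (a*) captures a run of a's; \1 must then repeat exactly that run.
pattern = re.compile(r"(a*)b\1")

for s in ["b", "aba", "aabaa", "aaba", "abaa"]:
    # fullmatch requires the whole string to match, not just a prefix
    print(s, bool(pattern.fullmatch(s)))
# b True, aba True, aabaa True, aaba False, abaa False
```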
1 Parse the regex (pattern) to “understand” its structure
2 Use the regex to parse the actual text (corpus)

1 Step 1 is theoretically harder, but practically easier.
2 Step 2 is theoretically easier, but practically harder.
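The two steps can be sketched in Python for a small subset of regex syntax: literal characters, |, *, + and parentheses, which are the operators in this lecture's example. The tuple-based tree and the names parse, ends and full_match are hypothetical. Step 2 here uses a simple "set of positions" matcher rather than a finite automaton, so treat it as an illustration of the split, not as the lecture's method.

```python
def parse(pattern):
    """Step 1: turn a pattern into a tree of tuples.

    Grammar (lowest to highest precedence):
        alt     := concat ('|' concat)*
        concat  := postfix*
        postfix := atom ('*' | '+')*
        atom    := '(' alt ')' | literal character
    """
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alt():
        nonlocal pos
        node = concat()
        while peek() == "|":
            pos += 1
            node = ("alt", node, concat())
        return node

    def concat():
        node = ("empty",)
        while peek() is not None and peek() not in "|)":
            item = postfix()
            node = item if node == ("empty",) else ("cat", node, item)
        return node

    def postfix():
        nonlocal pos
        node = atom()
        while peek() in ("*", "+"):
            node = ("star" if peek() == "*" else "plus", node)
            pos += 1
        return node

    def atom():
        nonlocal pos
        c = peek()
        if c == "(":
            pos += 1
            node = alt()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        if c in "*+":
            raise ValueError(f"nothing to repeat at position {pos}")
        pos += 1
        return ("char", c)

    tree = alt()
    if pos != len(pattern):
        raise ValueError(f"unexpected {pattern[pos]!r} at position {pos}")
    return tree


def ends(node, text, starts):
    """Step 2: every position where node can finish matching, given the
    set of positions where it may start."""
    kind = node[0]
    if kind == "empty":
        return set(starts)
    if kind == "char":
        return {i + 1 for i in starts if i < len(text) and text[i] == node[1]}
    if kind == "cat":
        return ends(node[2], text, ends(node[1], text, starts))
    if kind == "alt":
        return ends(node[1], text, starts) | ends(node[2], text, starts)
    # "star" / "plus": keep repeating the body until no new positions appear.
    reached = set(starts) if kind == "star" else set()
    frontier = set(starts)
    while frontier:
        frontier = ends(node[1], text, frontier) - reached
        reached |= frontier
    return reached


def full_match(pattern, text):
    tree = parse(pattern)                       # step 1: understand the pattern
    return len(text) in ends(tree, text, {0})   # step 2: run it over the text


print(full_match("(a|b)*(1+2|3)", "abba112"))   # True
print(full_match("(a|b)*(1+2|3)", "ab2"))       # False
```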
² Note that an FA is not quite the same thing as a finite-state machine (FSM).
1 Convert regex pattern to FA
2 Feed corpus to FA in linear time!
3 ...
4 Profit!
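As a rough illustration, the table below does step 1 by hand for the example pattern (a|b)*(1+2|3) used in this lecture, then feeds text through it. The state names and the functions dfa_accepts and match_ends are hypothetical, and the table was derived by hand rather than by an automatic construction. dfa_accepts does one table lookup per character. match_ends searches a corpus by advancing one DFA copy per possible start position in lockstep, which is still linear in the corpus length for a fixed pattern.

```python
# Hand-derived DFA for (a|b)*(1+2|3).  A missing entry means "reject".
START, ONES, DONE = "start", "ones", "done"
TRANSITIONS = {
    START: {"a": START, "b": START, "1": ONES, "3": DONE},  # inside (a|b)*
    ONES: {"1": ONES, "2": DONE},                          # read 1+, need 1 or 2
    DONE: {},                                              # whole pattern read
}
ACCEPTING = {DONE}


def dfa_accepts(text):
    """Full match: one table lookup per character, so O(len(text))."""
    state = START
    for c in text:
        state = TRANSITIONS[state].get(c)
        if state is None:          # no transition: cannot match any more
            return False
    return state in ACCEPTING


def match_ends(corpus):
    """Indices i such that some substring ending at corpus[i] matches.

    A match may start anywhere, so a new copy of the DFA is started at
    every position; all copies advance together as one set of states,
    so each character costs at most len(TRANSITIONS) lookups."""
    active = set()
    ends = []
    for i, c in enumerate(corpus):
        active.add(START)
        active = {TRANSITIONS[s][c] for s in active if c in TRANSITIONS[s]}
        if active & ACCEPTING:
            ends.append(i)
    return ends


print(dfa_accepts("ab112"), dfa_accepts("ab2"))   # True False
print(match_ends("xx3yyab12z"))                   # [2, 8]
```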
³ Pumping lemma: any long-enough string accepted by an FA contains a nonempty substring that can be repeated any number of times (including zero), and the result is still accepted. (Why?)
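One way to see the "why" is the pigeonhole principle: an FA with k states that reads k or more characters visits at least k + 1 states (counting the start), so some state repeats, and the characters read between the two visits form a loop that can be taken any number of times. A sketch in Python, using a hand-built DFA for (a|b)*(1+2|3); the state names and the functions accepts and split_for_pumping are hypothetical.

```python
# Hand-derived DFA for (a|b)*(1+2|3); missing entries mean "reject".
TRANSITIONS = {
    "start": {"a": "start", "b": "start", "1": "ones", "3": "done"},
    "ones": {"1": "ones", "2": "done"},
    "done": {},
}
START, ACCEPTING = "start", {"done"}


def accepts(text):
    state = START
    for c in text:
        state = TRANSITIONS[state].get(c)
        if state is None:
            return False
    return state in ACCEPTING


def split_for_pumping(text):
    """For an accepted text, return (x, y, z) with text == x + y + z and y
    non-empty, such that the DFA is in the same state before and after y.
    Returns None if no state repeats (only possible when text is shorter
    than the number of states)."""
    if not accepts(text):
        raise ValueError("text must be accepted by the DFA")
    first_seen = {START: 0}            # state -> number of characters read
    state = START
    for i, c in enumerate(text, start=1):
        state = TRANSITIONS[state][c]
        if state in first_seen:        # pigeonhole: this state was visited before
            j = first_seen[state]
            return text[:j], text[j:i], text[i:]
        first_seen[state] = i
    return None


x, y, z = split_for_pumping("ab12")
print(repr(x), repr(y), repr(z))       # '' 'a' 'b12'
# Going around the loop 0, 1, 2, ... times keeps the string accepted.
print([accepts(x + y * k + z) for k in range(4)])   # [True, True, True, True]
```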
1 s0 = •(a|b)*(1+2|3)
2 s1 = (a|b)*(•1+•2|3)
3 s2 = (a|b)*(1+2•|3)
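In this notation the dots mark every place the match could continue, so s1 has two dots: after a 1 we may see another 1 or the 2. One mechanical way to compute such states is the standard position (Glushkov) construction; that it matches the lecture's exact method is an assumption. It numbers the literals of the pattern and records, for each one, which literals may come next. The Python sketch below builds the syntax tree by hand; the names TREE, analyze, step, show, accepts and END are hypothetical.

```python
END = "END"   # marker: the dot sits after the whole pattern (match complete)

PATTERN = "(a|b)*(1+2|3)"
# Each literal in PATTERN gets a position number; OFFSET says where it sits.
SYMBOL = {0: "a", 1: "b", 2: "1", 3: "2", 4: "3"}
OFFSET = {0: 1, 1: 3, 2: 7, 3: 9, 4: 11}
# Hand-built syntax tree for PATTERN, using the position numbers above.
TREE = ("cat",
        ("star", ("alt", ("lit", 0), ("lit", 1))),
        ("alt", ("cat", ("plus", ("lit", 2)), ("lit", 3)), ("lit", 4)))


def analyze(node, follow):
    """Return (nullable, first, last) for node; record in follow[p] which
    positions may come right after position p."""
    kind = node[0]
    if kind == "lit":
        return False, {node[1]}, {node[1]}
    if kind in ("alt", "cat"):
        n1, f1, l1 = analyze(node[1], follow)
        n2, f2, l2 = analyze(node[2], follow)
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:                 # the right side may start after the left ends
            follow[p] |= f2
        return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)
    # "star" / "plus": the body may start again right after it ends
    n, f, l = analyze(node[1], follow)
    for p in l:
        follow[p] |= f
    return (kind == "star") or n, f, l


follow = {p: set() for p in SYMBOL}
nullable, first, last = analyze(TREE, follow)
for p in last:                       # a last position may be followed by the end
    follow[p].add(END)

START = frozenset(first | ({END} if nullable else set()))


def step(state, c):
    """Move every dot that sits in front of the character c past it."""
    return frozenset(q for p in state if p != END and SYMBOL[p] == c
                     for q in follow[p])


def accepts(text):
    state = START
    for c in text:
        state = step(state, c)
    return END in state


def show(state):
    """Render a state in the slide's dotted notation."""
    dotted = {OFFSET[p] for p in state if p != END}
    text = "".join(("•" if i in dotted else "") + ch
                   for i, ch in enumerate(PATTERN))
    return text + ("•" if END in state else "")


state = START
print(show(state))
for c in "ab12":
    state = step(state, c)
    print(c, show(state))
# (•a|•b)*(•1+2|•3)
# a (•a|•b)*(•1+2|•3)
# b (•a|•b)*(•1+2|•3)
# 1 (a|b)*(•1+•2|3)
# 2 (a|b)*(1+2|3)•
```

The sketch always pushes a dot forward onto the literals it can reach next, so s0 prints in the expanded form (•a|•b)*(•1+2|•3) and s2 as (a|b)*(1+2|3)•; these describe the same choices as •(a|b)*(1+2|3) and (a|b)*(1+2•|3). In this representation, reading a 3 from s0 lands in the same state as s2.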