AUTOMATA AND FORMAL LANGUAGES, # COURSE [15103]
REGULAR LANGUAGES, EXPRESSIONS AND APPLICATIONS
DR. VADIM ZAYTSEV A.K.A. @GRAMMARWARE
ROADMAP
• Chomsky hierarchy revisited
• How to see if the language is regular?
• The class of regular languages
• Tools to work with regular languages
• Advanced methods
(The source is given at the bottom of each slide.)
CHOMSKY HIERARCHY
Photo: Duncan Rawlinson, Chomsky.jpg, 2004, CC-BY.
CHOMSKY HIERARCHY
[Diagram of nested classes, outermost to innermost: languages ⊃ recursively enumerable ⊃ context-sensitive ⊃ context-free ⊃ regular ⊃ finite]
Noam Chomsky. On Certain Formal Properties of Grammars, Information & Control 2(2):137–167, 1959.
CHOMSKY: AUTOMATA
[Same nested diagram, with each class labelled by its automaton]
• recursively enumerable: Turing machine
• context-sensitive: linear bounded automaton
• context-free: pushdown automaton
• regular: finite state automaton
• (too many to list)
CHOMSKY: TOOLS
[Same nested diagram (languages ⊃ recursively enumerable ⊃ context-sensitive ⊃ context-free ⊃ regular ⊃ finite), annotated with tools: imaginary computer, grammarware, regexp]
CHOMSKY: REWRITING
[Same nested diagram (languages ⊃ recursively enumerable ⊃ context-sensitive ⊃ context-free ⊃ regular ⊃ finite), annotated with rule shapes, from the least to the most restricted]
• α → β
• αXβ → αγβ
• X → γ
• X → aB
• X → a
Axel Thue. Probleme über Veränderungen von Zeichenreihen nach gegebenen Regeln, 1914. http://arxiv.org/abs/1308.5858
REGEXPS REVISITED
• Regular sets by Stephen Kleene in 1956
  • ∅, ε, letters from Σ
  • concatenation
  • iteration
  • alternation
• Precisely fit the regular class
S. C. Kleene, Representation of Events in Nerve Nets and Finite Automata. In Automata Studies, pp. 3–42, 1956.
Photo from: Konrad Jacobs, S. C. Kleene, 1978, MFO.
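The building blocks listed above can be tried out with Python's `re` module. This is a minimal sketch, not part of the slides: the helper `accepts` and the sample patterns are illustrative names, and only the purely regular fragment of `re` is used (Python regexps also offer non-regular extensions such as backreferences).

```python
import re

# Building blocks from the slide: letters, concatenation, alternation, iteration.
letter_a = "a"
letter_b = "b"
concatenation = letter_a + letter_b           # ab
alternation = f"(?:{letter_a}|{letter_b})"    # a | b
iteration = f"(?:{concatenation})*"           # (ab)*


def accepts(pattern: str, word: str) -> bool:
    """True if the whole word belongs to the language of the pattern."""
    return re.fullmatch(pattern, word) is not None


print(accepts(iteration, "abab"))   # the word is two repetitions of "ab"
print(accepts(iteration, "aba"))    # the word ends in the middle of a repetition
```

`re.fullmatch` is used instead of `re.search` so that the pattern describes the whole word, which is how membership in a language is defined.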
DETERMINISTIC FINITE AUTOMATON
C. E. Shannon, W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, 1949. (finite state grammars and finite diagrams and finite state Markov processes)
TO WHICH CLASS DO LANGUAGES BELONG?
• ∅: FINITE
• {ε}: FINITE
• {ε} in a non-empty alphabet: FINITE
• {x, y, z}: FINITE
• {0ⁿ | n > 1}: REGULAR
• decimal numbers: REGULAR
• {0ⁿ1ⁿ | n > 1}: CONTEXT-FREE
• {0ⁿ1ⁿ | n > 1}²: CONTEXT-FREE
• {0ⁿ1ⁿ2ⁿ | n > 1}: CONTEXT-SENSITIVE
[Answers are placed on the nested diagram: all ⊃ recursively enumerable ⊃ context-sensitive ⊃ context-free ⊃ regular ⊃ finite]
(interactive)
IS A TASK SOLVABLE BY REGULAR MEANS?
• Substring search ✓
  • grep, contains(), find(), substring(), …
• Substring replacement ✓
  • sed, awk, perl, vim, replace(), replaceAll(), …
• Pretty-printing ✗
  • VS.NET, Sublime, TextMate, …
(interactive)
IS A TASK SOLVABLE BY REGULAR MEANS?
• Counting [non-empty] lines in a file ✓
  • wc -l, grep -c ""
  • grep -v "^$", sed -n /./p | wc -l, …
• Parsing HTML ✗
  • <BODY><TABLE><P><A HREF= …
• Parsing a postcode ✓
  • 1098 XG, …
(interactive)
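The two tasks marked ✓ above can be sketched with plain regular expressions. The postcode shape below (four digits, an optional space, two capital letters) is an assumption modelled on the single example "1098 XG", and the function names are illustrative.

```python
import re

# Assumed postcode shape, modelled on the slide's "1098 XG":
# four digits, an optional space, two capital letters.
POSTCODE = re.compile(r"[0-9]{4} ?[A-Z]{2}")


def is_postcode(text: str) -> bool:
    """True if the whole text has the assumed postcode shape."""
    return POSTCODE.fullmatch(text) is not None


def count_non_empty_lines(text: str) -> int:
    """Count lines that contain at least one character."""
    return sum(1 for line in text.splitlines() if re.search(r".", line))


print(is_postcode("1098 XG"))
print(count_non_empty_lines("first\n\nthird\n"))
```

Recognising a non-empty line is the regular part; the counting itself is done by the surrounding tool, just as `wc -l` does on the slide.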
HOW TO PROVE WHICH CLASS A LANGUAGE BELONGS TO
PUMPING LEMMA FOR REGULAR LANGUAGES
• In simple terms
  • sufficiently long words have repeatable parts
  • (works for all infinite regular languages)
• L is regular ⇒ formula holds
• Formula does not hold ⇒ L is finite or not regular
Jos C.M. Baeten, Models of Computation: Automata, Formal Languages and Communicating Processes, §2.9, p.58.
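The "formula" is not spelled out on the slide. The usual textbook statement of the pumping lemma, whose notation may differ from the one in Baeten's book, reads as follows:

```latex
\[
L \text{ regular} \;\Rightarrow\;
\exists p \ge 1 \;\;
\forall w \in L \text{ with } |w| \ge p \;\;
\exists x, y, z :\;
w = xyz,\quad |xy| \le p,\quad |y| \ge 1,\quad
\forall i \ge 0 :\; x y^{i} z \in L
\]
```

A typical use is the contrapositive. For L = {0ⁿ1ⁿ | n > 1} and any candidate p ≥ 2, take w = 0ᵖ1ᵖ: since |xy| ≤ p, the part y consists of zeroes only, so xy²z has more zeroes than ones and is not in L. The formula fails, hence L is not regular.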
JOHN MYHILL AND ANIL NERODE
Cornell, Faculty and Senior Researcher Profiles. Who's That Mathematician? Paul R. Halmos Collection - Page 36.
MYHILL–NERODE THEOREM
• Myhill-Nerode equivalence
  • u ~ v ⟺ ∀w: (uw ∈ L ∧ vw ∈ L) ∨ (uw ∉ L ∧ vw ∉ L)
• Theorem: L is regular iff the number of Myhill-Nerode equivalence classes is finite.
• In simple terms
  • few groups of forgettable prefixes
• Works both ways
Anil Nerode, Linear Automaton Transformations, Proceedings of the AMS 9, 1958.
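The equivalence can be explored experimentally. The sketch below is an illustration under stated assumptions, not a proof: it takes L = {0ⁿ1ⁿ | n > 1} from the earlier slide and searches, up to a bounded length, for a suffix w that tells two prefixes apart. The names `in_language` and `distinguishing_suffix` are made up for this example.

```python
from itertools import product


def in_language(word: str) -> bool:
    """Membership in {0^n 1^n | n > 1}."""
    n = len(word) // 2
    return n > 1 and word == "0" * n + "1" * n


def distinguishing_suffix(u: str, v: str, max_len: int = 8):
    """Return some w with exactly one of uw, vw in the language, or None.

    The search is bounded by max_len, so None only means that no short
    distinguishing suffix exists; it does not prove that u ~ v.
    """
    for length in range(max_len + 1):
        for letters in product("01", repeat=length):
            w = "".join(letters)
            if in_language(u + w) != in_language(v + w):
                return w
    return None


print(distinguishing_suffix("00", "000"))   # a suffix separating 00 from 000
print(distinguishing_suffix("1", "10"))     # two prefixes that no word of L extends
```

Every pair of prefixes 0ⁱ and 0ʲ with 2 ≤ i < j is separated by the suffix 1ⁱ, so there are infinitely many equivalence classes and, by the theorem, the language is not regular.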
LIMITED MEMORY
• Advice from teh internetz:
  • how many characters must you remember from the stream?
  • bounded ⇒ regular
  • unbounded ⇒ ?
• Correct or not?
  • Correct! Memory is limited, alphabet is limited ⇒ prefixes are limited
Brian M. Scott, http://math.stackexchange.com/questions/282216/determine-if-a-language-is-regular-from-the-first-sight
NUMBER OF COUNTERS
• {0ⁱ1ⁿ…}
  • no relation between i and n ⇒ regular
  • 1 counter ⇒ context-free
  • n counters ⇒ context-sensitive
  • ∞ counters ⇒ recursively enumerable
Himanshu Saikia, http://math.stackexchange.com/questions/282216/determine-if-a-language-is-regular-from-the-first-sight
DISASSEMBLE / MASSAGE
• {0ⁿ1ⁿ | n > 1}
• {0ⁱ1ⁿ | n > 1, i > 1, i ≠ n}
• the matching brackets language is not regular
• ⇒ the “no matching pairs” language is not regular either: removing it from the regular language 00⁺11⁺ leaves exactly the matching brackets language
• Many combinations of regular languages are regular
• Proving by decomposition is valid
THE CLASS OF REGULAR LANGUAGES
CLASS CLOSED UNDER COMPLEMENT
• If A is a regular language, then
  • Ā is regular
• Meaning …
  • grep -v "123" file.txt
  • (Must know the alphabet Σ)
  • (Actually stronger: any finite number of errors)
J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Chapter 4.
E. Stearns, J. Hartmanis, Regularity Preserving Modifications of Regular Expressions, Information & Control 6:55–69, 1963.
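One standard way to see the closure is to swap accepting and non-accepting states of a complete deterministic automaton. The sketch below assumes the alphabet {1, 2, 3} (the slide notes that Σ must be known); the `DFA` class and the example automaton for "contains 123" are illustrative and not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset
    alphabet: frozenset
    delta: dict          # (state, symbol) -> state; must be total
    start: str
    accepting: frozenset

    def accepts(self, word: str) -> bool:
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


def complement(dfa: DFA) -> DFA:
    """Same automaton with accepting and non-accepting states swapped."""
    return DFA(dfa.states, dfa.alphabet, dfa.delta, dfa.start,
               dfa.states - dfa.accepting)


# Hypothetical example: words over {1, 2, 3} that contain "123".
# q0: no progress, q1: seen "1", q2: seen "12", q3: seen "123" (sink).
contains_123 = DFA(
    states=frozenset({"q0", "q1", "q2", "q3"}),
    alphabet=frozenset("123"),
    delta={
        ("q0", "1"): "q1", ("q0", "2"): "q0", ("q0", "3"): "q0",
        ("q1", "1"): "q1", ("q1", "2"): "q2", ("q1", "3"): "q0",
        ("q2", "1"): "q1", ("q2", "2"): "q0", ("q2", "3"): "q3",
        ("q3", "1"): "q3", ("q3", "2"): "q3", ("q3", "3"): "q3",
    },
    start="q0",
    accepting=frozenset({"q3"}),
)

without_123 = complement(contains_123)
print(without_123.accepts("1213"))    # no occurrence of "123" in this word
print(without_123.accepts("31231"))   # this word contains "123"
```

The swap only works because the transition function is total: with missing transitions, rejected words would stay rejected after complementation.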
CLASS CLOSED UNDER SET UNION
• If A and B are regular languages, then
  • A ⋃ B is regular
• Meaning …
  • [a-z]
  • x | y | z (in some notations)
J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Chapter 4.
CLASS CLOSED UNDER INTERSECTION
• If A and B are regular languages, then
  • A ⋂ B is regular
• Meaning …
  • cat file.txt | grep "abc" | grep "xyz"
  • (Not true for context-free languages!)
J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Chapter 4.
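The textbook argument for this closure is the product construction: run both automata in lockstep and accept when both accept. Below is a minimal sketch under the assumption that each automaton is given as a triple (total transition dictionary, start state, accepting set); the two example automata are made up for illustration.

```python
from itertools import product


def run(delta, start, accepting, word):
    """Run a complete DFA on a word and report acceptance."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


def intersect(dfa_a, dfa_b, alphabet):
    """Product construction: states are pairs, transitions move both parts."""
    (delta_a, start_a, acc_a), (delta_b, start_b, acc_b) = dfa_a, dfa_b
    states_a = {s for (s, _) in delta_a}
    states_b = {s for (s, _) in delta_b}
    delta = {((p, q), c): (delta_a[(p, c)], delta_b[(q, c)])
             for p, q, c in product(states_a, states_b, alphabet)}
    accepting = {(p, q) for p in acc_a for q in acc_b}
    return delta, (start_a, start_b), accepting


# Hypothetical examples over the alphabet {0, 1}.
even_zeros = ({("e", "0"): "o", ("e", "1"): "e",
               ("o", "0"): "e", ("o", "1"): "o"}, "e", {"e"})
ends_in_one = ({("n", "0"): "n", ("n", "1"): "y",
                ("y", "0"): "n", ("y", "1"): "y"}, "n", {"y"})

both = intersect(even_zeros, ends_in_one, "01")
print(run(*both, "001"))   # two zeroes, last symbol is 1
print(run(*both, "01"))    # one zero, last symbol is 1
```

Changing the accepting set to the pairs in which at least one component accepts gives an automaton for the union instead.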
CLASS CLOSED UNDER DIFFERENCE
• If A and B are regular languages, then
  • A ∖ B is regular
• Meaning …
  • cat file.txt | grep "abc" | grep -v "123"
  • (Not true for context-free languages!)
J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Chapter 4.
CLASS CLOSED UNDER ITERATION
• If A is a regular language, then
  • A* and A⁺ are regular
• Meaning …
  • [a]*
  • [a]⁺
J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Chapter 4.
CLASS CLOSED UNDER CONCATENATION
• If A and B are regular languages, then
  • AB is regular
• Meaning …
  • [Bb][Oo][Dd][Yy]
  • (Just glue regexps; in practice, watch out for subgroups)
J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Chapter 4.
CLASS [SOMETIMES] CLOSED UNDER DECOMPOSITION
• If A is a regular language, then
  • “front halves” is regular
  • “tail halves” is regular
  • “middle thirds” is regular
  • “arbitrary halves/thirds” is regular
• NB: glued side thirds is NOT regular
E. Stearns, J. Hartmanis, Regularity Preserving Modifications of Regular Expressions, Information & Control 6:55–69, 1963.
CLASS CLOSED UNDER HOMOMORPHISM
• If A is a regular language and
  • h : Σ → Σ*
• then
  • h(A) is regular
• Meaning that debugging is feasible
• (Even better for context-free languages: substitutions)
J. E. Hopcroft, R. Motwani, J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Chapter 4.
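The closure can be illustrated directly on a regular expression: replacing every letter by its image under h yields an expression for h(A). The sketch below is a simplification with loud assumptions: letters are single characters, every letter of the pattern is a key of `h`, and the only other characters in the pattern are the operators `(`, `)`, `|`, `*` and `+`. The function name `apply_homomorphism` is made up for this example.

```python
import re


def apply_homomorphism(pattern: str, h: dict) -> str:
    """Replace every letter of the pattern by a group matching its image.

    Assumes single-character letters that are all keys of h; any other
    character is treated as a regexp operator and copied unchanged.
    """
    return "".join(f"(?:{re.escape(h[ch])})" if ch in h else ch
                   for ch in pattern)


# Hypothetical homomorphism: a -> x, b -> yy.
h = {"a": "x", "b": "yy"}
image = apply_homomorphism("a(b|a)*", h)

print(image)
print(re.fullmatch(image, "xyyx") is not None)   # image of the word "aba"
print(re.fullmatch(image, "xy") is not None)     # a lone y is not the image of any letter
```

A letter may also be mapped to the empty word: the corresponding group becomes `(?:)`, which matches ε.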
REGEXPS YOU NEED TO DEBUG
var whitelist = @"</?p>|<br\s?/?>|</?b>|</?strong>|</?i>|</?em>|
    </?s>|</?strike>|</?blockquote>|</?sub>|</?super>|
    </?h(1|2|3)>|</?pre>|<hr\s?/?>|</?code>|</?ul>|
    </?ol>|</?li>|</a>|<a[^>]+>|<img[^>]+/?>";
Jeff Atwood, If You Like Regular Expressions So Much, Why Don't You Marry Them?, 22 Mar 2005.
Jeff Atwood, Regular Expressions: Now You Have Two Problems, 27 Jun 2008.