R E G U L A R L A N G U A G E S , E X P R E S S I O N S A N D A P P L I C AT I O N S
A U T O M ATA A N D F O R M A L L A N G U A G E S , # C O U R S E [ 1 5 1 0 3 ] D R . V A D I M Z AY T S E V A . K . A . @ G R A M M A R W A R E
R E G U L A R L A N G U A G E S , E X P R E S S I O N S A N D A P P L - - PowerPoint PPT Presentation
A U T O M ATA A N D F O R M A L L A N G U A G E S , # C O U R S E [ 1 5 1 0 3 ] R E G U L A R L A N G U A G E S , E X P R E S S I O N S A N D A P P L I C AT I O N S D R . V A D I M Z AY T S E V A . K . A . @ G R A M M A R W A R E R O A D M A P
A U T O M ATA A N D F O R M A L L A N G U A G E S , # C O U R S E [ 1 5 1 0 3 ] D R . V A D I M Z AY T S E V A . K . A . @ G R A M M A R W A R E
source is given at the bottom of each slide
Duncan Rawlinson, Chomsky.jpg, 2004, CC-BY.
l a n g u a g e s re c u r s i v e l y e n u m e ra b l e c o n t e x t - s e n s i t i v e c o n t e x t - f re e re g u l a r f i n i t e
Noam Chomsky. On Certain Formal Properties of Grammars, Information & Control 2(2):137–167, 1959.
l a n g u a g e s re c u r s i v e l y e n u m e ra b l e c o n t e x t - s e n s i t i v e c o n t e x t - f re e re g u l a r f i n i t e
Tu r i n g m a c h i n e p u s h d o w n a u t o m a t o n f i n i t e s t a t e a u t o m a t o n l i n e a r b o u n d e d a u t o m a t o n
(too many to list)
l a n g u a g e s re c u r s i v e l y e n u m e ra b l e c o n t e x t - s e n s i t i v e c o n t e x t - f re e re g u l a r f i n i t e
i m a g i n a r y g r a m m a r w a re re g e x p c o m p u t e r
l a n g u a g e s re c u r s i v e l y e n u m e ra b l e c o n t e x t - s e n s i t i v e c o n t e x t - f re e re g u l a r f i n i t e
α → β X → γ X → a X → a B α X β → α γ β
Axel Thue. Probleme über Veränderungen von Zeichenreihen nach gegebenen Regeln, 1914. http://arxiv.org/abs/1308.5858
photo from: Konrad Jacobs, S. C. Kleene, 1978, MFO.
(finite state grammars and finite diagrams and finite state Markov processes)
interactive
a l l re c u r s i v e l y e n u m e ra b l e c o n t e x t - s e n s i t i v e c o n t e x t - f re e re g u l a r f i n i t e
²
R E G U L A R C - F R E E C - S E N S I T I V E F I N I T E F I N I T E F I N I T E F I N I T E R E G U L A R C - F R E E
interactive
✓ ✗ ✓
✓
interactive
✓ ✗
a l l re c u r s i v e l y e n u m e ra b l e c o n t e x t - s e n s i t i v e c o n t e x t - f re e re g u l a r f i n i t e
Jos C.M. Baeten, Models of Computation: Automata, Formal Languages and Communicating Processes, §2.9, p.58.
F O R R E G U L A R L A N G U A G E S
Cornell, Faculty and Senior Researcher Profiles. Who's That Mathematician? Paul R. Halmos Collection - Page 36.
Anil Nerode, Linear Automaton Transformations, Proceedings of the AMS 9, 1958.
Brian M. Scott, http://math.stackexchange.com/questions/282216/determine-if-a-language-is-regular-from-the-first-sight
m e m o r y i s l i m i t e d , a l p h a b e t i s l i m i t e d ⇒ p re f i x e s a re l i m i t e d
Himanshu Saikia, http://math.stackexchange.com/questions/282216/determine-if-a-language-is-regular-from-the-first-sight
var whitelist = @"</?p>|<br\s?/?>|</?b>|</?strong>|</?i>|</?em>| </?s>|</?strike>|</?blockquote>|</?sub>|</?super>| </?h(1|2|3)>|</?pre>|<hr\s?/?>|</?code>|</?ul>| </?ol>|</?li>|</a>|<a[^>]+>|<img[^>]+/?>";
Jeff Atwood, If You Like Regular Expressions So Much, Why Don't You Marry Them?, 22 Mar 2005. Jeff Atwood, Regular Expressions: Now You Have Two Problems, 27 Jun 2008.
(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r \n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\ [\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)? [ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?: (?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r \n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;: \\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)? [ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)? [ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)* \](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\ [\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*\>(?:(?:\r \n)?[ \t])*)|(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)? [ \t]))*"(?:(?:\r\n)?[ \t])*)*:(?:(?:\r\n)?[ \t])*(?:(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\ [\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)? [ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)? [ \t])*)*\<(?:(?:\r\n)?[ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)* \](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\ [\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\". \[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\ ["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)? [ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\ [\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r \n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*) (?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r \n)?[ \t])*))*\>(?:(?:\r\n)?[ \t])*)(?:,\s*(?:(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^ \"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(? =[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*|(?:[^()<>@,;:\\".\[\] \000- \031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)*\<(?:(?:\r\n)? [ \t])*(?:@(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*) (?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r \n)?[ \t])*))*(?:,@(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\ \.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\ [([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*))*)*:(?:(?:\r\n)?[ \t])*)?(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\ ["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r\n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+ (?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|"(?:[^\"\r\\]|\\.|(?:(?:\r\n)?[ \t]))*"(?:(?:\r\n)?[ \t])*))*@(?:(?:\r\n)?[ \t])*(?: [^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)?[ \t])*)(?:\.(?:(?:\r \n)?[ \t])*(?:[^()<>@,;:\\".\[\] \000-\031]+(?:(?:(?:\r\n)?[ \t])+|\Z|(?=[\["()<>@,;:\\".\[\]]))|\[([^\[\]\r\\]|\\.)*\](?:(?:\r\n)? [ \t])*))*\>(?:(?:\r\n)?[ \t])*))*)?;\s*)
Jeff Atwood, Regex use vs. Regex abuse, 16 Feb 2005. RFC822. Paul Warren, Mail::RFC822::Address: regexp-based address validation, 17/09/2012.
(who the hell uses [[:digit:]] anyway?)
photo from: Archetypal hackers ken (left) and dmr (right).
Lee E. McMahon, sed, Stream EDitor, 1973 or 1974, http://www.columbia.edu/~rh120/ch106.x09 photo from: http://merdivengo.blogspot.com/2012/03/turnuva-sistemleri-uzerine.html
. J. Weinberger, AWK — A Pattern Scanning and Processing Language. SPE, 9(4): 267-279, 1979.
https://en.wikipedia.org/wiki/AWK
int lineno=1;
[a-zA-Z] digit [0-9] id {letter}({letter}|{digit})* number {digit}+ %% printf("\nTokeniser running -- ^D to exit\n");
{line();printf("<id>");} {id} printf("<id>"); ^{number} {line();printf("<number>");} {number} printf("<number>"); ^[ \t]+ line(); [ \t]+ printf(" "); [\n] ECHO; ^[^a-zA-Z0-9 \t\n]+ {line();printf("\\%s\\",yytext);} [^a-zA-Z0-9 \t\n]+ printf("\\%s\\",yytext); %% line() { printf("%4d: ",lineno++); }
c h a ra c t e r- l e v e l g ra m m a r
::= ::= ::= ::=
Tutorialspoint, PERL Regular Expressions, http://www.tutorialspoint.com/perl/perl_regular_expression.htm
http://www.pcre.org
http://rascal-mpl.org/
a l l re c u r s i v e l y e n u m e ra b l e c o n t e x t - s e n s i t i v e c o n t e x t - f re e re g u l a r f i n i t e
Understand Your Data and Be More Productive, O’Reilly, 2006.
grammarware.github.com, …
2010), VU (2004–2008), UTwente (2002–2004), Rostov State Transport University (1999–2008), Rostov State University (1998–2003), Desk.nl (1999, 2001)