Regular Expressions: REs and NFAs, NFA simulation

BBM 202 - ALGORITHMS. DEPT. OF COMPUTER ENGINEERING. REGULAR EXPRESSIONS. Acknowledgement: The course slides are adapted from the slides prepared by R. Sedgewick and K. Wayne of Princeton University.


  1. 
 BBM 202 - ALGORITHMS
 DEPT. OF COMPUTER ENGINEERING
 REGULAR EXPRESSIONS
 Acknowledgement: The course slides are adapted from the slides prepared by R. Sedgewick and K. Wayne of Princeton University.

  2. TODAY
 ‣ Regular Expressions
 ‣ REs and NFAs
 ‣ NFA simulation
 ‣ NFA construction
 ‣ Applications

  3. Pattern matching
 Substring search. Find a single string in text.
 Pattern matching. Find one of a specified set of strings in text.
 Ex. [genomics]
 • Fragile X syndrome is a common cause of mental retardation.
 • Human genome contains triplet repeats of CGG or AGG, bracketed by GCG at the beginning and CTG at the end.
 • Number of repeats is variable, and correlated with syndrome.
 pattern: GCG(CGG|AGG)*CTG
 text: GCGGCGTGTGTGCGAGAGAGTGGGTTTAAAGCTGGCGCGGAGGCGGCTGGCGCGGAGGCTG
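
 The genomics example can be tried directly with Java's built-in `java.util.regex` package. This is only a sketch: the class name `RepeatSearch` and the helper `findAll` are hypothetical names chosen for illustration, and `java.util.regex` uses its own matching engine rather than the NFA simulation developed in this lecture.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatSearch {
    // Collects every non-overlapping substring of text that matches regexp,
    // scanning from left to right. (Hypothetical helper, not part of NFA.java.)
    static List<String> findAll(String regexp, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regexp).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String pattern = "GCG(CGG|AGG)*CTG";
        String text = "GCGGCGTGTGTGCGAGAGAGTGGGTTTAAAGCTGGCGCGGAGGCGGCTGGCGCGGAGGCTG";
        // Print each bracketed run of CGG/AGG triplets found in the text.
        for (String hit : findAll(pattern, text)) {
            System.out.println(hit);
        }
    }
}
```

 The number of triplets in each hit is (hit length - 6) / 3, since GCG and CTG contribute three characters each.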

  4. Syntax highlighting
 input (source code):

 /*************************************************************************
  *  Compilation:  javac NFA.java
  *  Execution:    java NFA regexp text
  *  Dependencies: Stack.java Bag.java Digraph.java DirectedDFS.java
  *
  *  % java NFA "(A*B|AC)D" AAAABD
  *  true
  *
  *  % java NFA "(A*B|AC)D" AAAAC
  *  false
  *
  *************************************************************************/

 public class NFA {
     private Digraph G;       // digraph of epsilon transitions
     private String regexp;   // regular expression
     private int M;           // number of characters in regular expression

     // Create the NFA for the given RE
     public NFA(String regexp) {
         this.regexp = regexp;
         M = regexp.length();
         Stack<Integer> ops = new Stack<Integer>();
         G = new Digraph(M+1);
         ⋮

 input languages: Ada, Asm, Applescript, Awk, Bat, Bib, Bison, C/C++, C#, Cobol, Caml, Changelog, Css, D, Erlang, Flex, Fortran, GLSL, Haskell, Html, Java, Javalog, Javascript, Latex, Lisp, Lua, ⋮
 output formats: HTML, XHTML, LATEX, MediaWiki, ODF, TEXINFO, ANSI, DocBook
 (GNU source-highlight 3.1.4)

  5. Google code search http://code.google.com/p/chromium/source/search

  6. Pattern matching: applications
 Test if a string matches some pattern.
 • Process natural language.
 • Scan for virus signatures.
 • Specify a programming language.
 • Access information in digital libraries.
 • Search genome using PROSITE patterns.
 • Filter text (spam, NetNanny, Carnivore, malware).
 • Validate data-entry fields (dates, email, URL, credit card).
 ...
 Parse text files.
 • Compile a Java program.
 • Crawl and index the Web.
 • Read in data stored in ad hoc input file format.
 • Create Java documentation from Javadoc comments.
 ...

  7. Regular expressions
 A regular expression is a notation to specify a set of strings (a "language").

 operation       order   example RE    matches               does not match
 concatenation   3       AABAAB        AABAAB                every other string
 or              4       AA | BAAB     AA, BAAB              every other string
 closure         2       AB*A          AA, ABBBBBBBBA        AB, ABABA
 parentheses     1       A(A|B)AAB     AAAAB, ABAAB          every other string
                         (AB)*A        A, ABABABABABA        AA, ABBA
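
 The rows of the table above can be checked mechanically. The sketch below uses Java's standard `Pattern.matches`, which tests whether the entire string is in the language; the class name `BasicOperations` and the helper `matches` are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

public class BasicOperations {
    // True if the whole string s is in the language described by regexp.
    // (Hypothetical wrapper around the standard Pattern.matches.)
    static boolean matches(String regexp, String s) {
        return Pattern.matches(regexp, s);
    }

    public static void main(String[] args) {
        // Each row: regular expression, then strings to test against it.
        String[][] rows = {
            { "AABAAB",    "AABAAB", "AABAA" },
            { "AA|BAAB",   "AA", "BAAB", "AAB" },
            { "AB*A",      "AA", "ABBBBBBBBA", "AB", "ABABA" },
            { "A(A|B)AAB", "AAAAB", "ABAAB", "AAAB" },
            { "(AB)*A",    "A", "ABABABABABA", "AA", "ABBA" },
        };
        for (String[] row : rows) {
            for (int i = 1; i < row.length; i++) {
                System.out.println(row[0] + "  " + row[i] + "  " + matches(row[0], row[i]));
            }
        }
    }
}
```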

  8. Regular expression shortcuts
 Additional operations are often added for convenience.

 operation         example RE            matches                   does not match
 wildcard          .U.U.U.               CUMULUS, JUGULUM          SUCCUBUS, TUMULTUOUS
 character class   [A-Za-z][a-z]*        word, Capitalized         camelCase, 4illegal
 at least 1        A(BC)+DE              ABCDE, ABCBCDE            ADE, BCDE
 exactly k         [0-9]{5}-[0-9]{4}     08540-1321, 19072-5541    111111111, 166-54-111
 complement        [^AEIOU]{6}           RHYTHM                    DECADE

 Ex. [A-E]+ is shorthand for (A|B|C|D|E)(A|B|C|D|E)*
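
 As a rough check that a shortcut adds convenience rather than expressive power, the sketch below compares `[A-E]+` with its longhand form on a few sample strings, and tries two of the shortcut rows from the table. `ShortcutDemo`, `matches`, and `sameAnswer` are hypothetical names; the sample strings beyond those on the slide are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

public class ShortcutDemo {
    // True if the whole string s matches regexp.
    static boolean matches(String regexp, String s) {
        return Pattern.matches(regexp, s);
    }

    // True if the two regular expressions agree on every sample string.
    // Agreement on samples is evidence, not a proof of equivalence.
    static boolean sameAnswer(String re1, String re2, String[] samples) {
        for (String s : samples) {
            if (matches(re1, s) != matches(re2, s)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String shorthand = "[A-E]+";
        String longhand  = "(A|B|C|D|E)(A|B|C|D|E)*";
        String[] samples = { "", "A", "ACE", "ABF", "EDCBA", "abc" };
        System.out.println(sameAnswer(shorthand, longhand, samples));

        // Two rows from the shortcut table.
        System.out.println(matches("[^AEIOU]{6}", "RHYTHM"));
        System.out.println(matches("[0-9]{5}-[0-9]{4}", "166-54-111"));
    }
}
```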

  9. Regular expression examples
 RE notation is surprisingly expressive.

 regular expression                                   matches                                  does not match
 .*SPB.*  (substring search)                          RASPBERRY, CRISPBREAD                    SUBSPACE, SUBSPECIES
 [0-9]{3}-[0-9]{2}-[0-9]{4}  (Social Security numbers)   166-11-4433, 166-45-1111              11-55555555, 8675309
 [a-z]+@([a-z]+\.)+(edu|com)  (email addresses)       wayne@princeton.edu, rs@princeton.edu    spam@nowhere
 [$_A-Za-z][$_A-Za-z0-9]*  (Java identifiers)         ident3, PatternMatcher                   3a, ident#3

 REs play a well-understood role in the theory of computation.
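
 The patterns in this table translate almost directly into data-entry validators. The sketch below assumes Java's standard `java.util.regex.Pattern`; the class name `FieldValidator` and its method names are hypothetical. Note that the backslash in the email pattern must be doubled inside a Java string literal, and that these simplified patterns are the slide's teaching examples, not complete validators for real email addresses or identifiers.

```java
import java.util.regex.Pattern;

public class FieldValidator {
    // Patterns copied from the slide; compiled once and reused.
    private static final Pattern SSN =
        Pattern.compile("[0-9]{3}-[0-9]{2}-[0-9]{4}");
    private static final Pattern EMAIL =
        Pattern.compile("[a-z]+@([a-z]+\\.)+(edu|com)");
    private static final Pattern IDENTIFIER =
        Pattern.compile("[$_A-Za-z][$_A-Za-z0-9]*");

    // Each method tests whether the entire input matches the pattern.
    static boolean isSsn(String s)        { return SSN.matcher(s).matches(); }
    static boolean isEmail(String s)      { return EMAIL.matcher(s).matches(); }
    static boolean isIdentifier(String s) { return IDENTIFIER.matcher(s).matches(); }

    // Substring search expressed as the RE .*SPB.* from the slide.
    static boolean containsSpb(String s)  { return Pattern.matches(".*SPB.*", s); }

    public static void main(String[] args) {
        System.out.println(isSsn("166-11-4433"));
        System.out.println(isEmail("spam@nowhere"));
        System.out.println(isIdentifier("ident#3"));
        System.out.println(containsSpb("CRISPBREAD"));
    }
}
```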

  10. Can the average web surfer learn to use REs? Google. Supports * for full word wildcard and | for union.

  11. Regular expressions to the rescue http://xkcd.com/208
