perl regular expressions

Perl: Regular expressions. A powerful tool for searching and transforming text - PowerPoint PPT Presentation



  1. Perl: Regular expressions. A powerful tool for searching and transforming text. SENG 265: Software Development Methods, University of Victoria, Department of Computer Science. Perl Regular Expression: Slide 1

  2. Motivation

     • We have seen many operations involving string comparisons.
     • Several Perl built-in functions also help with operations on strings:
       – split & join
       – substr
       – length
     • There is a lot we can do with such functions.
     • Example:
       – Given a string holding some timestamp, extract the different parts of the date & time.

         while (my $line = <STDIN>) {
             chomp $line;
             if ($line eq "BEGIN:VSTART") {
                 # ...
             }
         }
         # ...
         my ($property, $value) = split /:/, $foo;
         if ($property eq "DSTART") {
             # ... etc etc etc
         }

         @csv_fields = split /,/, $input_line;
         $output = join ":", @data;
         $first_char = substr $input, 0, 1;
         $width = length $heading;
         print $heading, "\n";
         print "-" x $width;

  3. Motivation

     • Recall:
       – iCalendar dates are used by iCal-like programs.
       – The year, month, etc. portions of the code are fixed in position.
     • How could we use "substr" to help us?

         my $datetime = "20051225T053000";

         $year  = substr $datetime, 0, 4;
         $month = substr $datetime, 4, 2;
         $day   = substr $datetime, 6, 2;
         $hour  = substr $datetime, 9, 2;
         $min   = substr $datetime, 11, 2;
         $sec   = substr $datetime, 13, 2;

     • This code certainly obtains what we need.
       – But it can be a bit tricky to get right.
       – Adapting code to use another date/time format is not trivial…
       – … and is bugbait!

       The adapted code below is labelled "Hazardous to your health" on the slide; several of its offsets and lengths are wrong, which is the point being made:

         # ISO 8601 time format
         my $datetime = "i2003-10-31T13:37:14-0500";

         $year  = substr $datetime, 1, 5;
         $month = substr $datetime, 7, 8;
         # coffee break
         # ...
         $day   = substr $datetime, 9, 2;
         $hour  = substr $datetime, 12, 2;
         $min   = substr $datetime, 14, 2;
         $sec   = substr $datetime, 16, 2;
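To make the "bugbait" point concrete, here is a minimal sketch, not taken from the slides, that applies the iCalendar offsets unchanged to the ISO 8601 string. The helper name `parse_by_offset` is hypothetical and exists only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: extract six fields using the fixed offsets that are
# correct for the compact iCalendar form "YYYYMMDDTHHMMSS".
sub parse_by_offset {
    my ($datetime) = @_;
    return (
        substr($datetime, 0, 4),     # year
        substr($datetime, 4, 2),     # month
        substr($datetime, 6, 2),     # day
        substr($datetime, 9, 2),     # hour
        substr($datetime, 11, 2),    # minute
        substr($datetime, 13, 2),    # second
    );
}

# The offsets fit this format, so the fields come out as expected.
my @ical = parse_by_offset("20051225T053000");
print join("|", @ical), "\n";    # 2005|12|25|05|30|00

# The same offsets applied to the ISO 8601 string run without any error,
# yet the fields are wrong: nothing tells us the format changed.
my @iso = parse_by_offset("i2003-10-31T13:37:14-0500");
print join("|", @iso), "\n";     # i200|3-|10|31|T1|3:
```

Note that the second call fails silently: `substr` has no notion of what a year or month looks like, so it cannot report that the input no longer fits.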

  4. Motivation

     • A better method is to indicate the string's pattern in a way that reflects the actual order of pattern components:
       – The date begins at the start of the string.
       – The year is four digits.
       – The month follows (two digits)…
       – … and then the day.
       – The "T" character separates the date and time.
       – Hour, minute and second follow, each two digits long.

         my $datetime = "20051225T053000";

         my ($year, $month, $day, $hour, $minute, $second)
             = $datetime =~ m{ \A        # start of string
                               (\d{4})   # year
                               (\d{2})   # month
                               (\d{2})   # day
                               T         # literal T
                               (\d{2})   # hour
                               (\d{2})   # minute
                               (\d{2})   # second
                               \z        # end of string
                             }xms;

     • For the elder Perlmongers:

         if ($datetime =~
             /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})$/) {
             ($year, $month, $day, $hour, $min, $sec)
                 = ($1, $2, $3, $4, $5, $6);
         }
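One property of the list-context match above is worth spelling out: when the pattern does not match, the match returns an empty list, so malformed input can be detected rather than silently mis-parsed. The following is a small sketch under that assumption; the sub name `parse_ical_datetime` is hypothetical and not part of the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the six captured fields in list context,
# or an empty list when the string does not have the expected shape.
sub parse_ical_datetime {
    my ($datetime) = @_;
    return $datetime =~ m{
        \A          # start of string
        (\d{4})     # year
        (\d{2})     # month
        (\d{2})     # day
        T           # literal T
        (\d{2})     # hour
        (\d{2})     # minute
        (\d{2})     # second
        \z          # end of string
    }xms;
}

my @fields = parse_ical_datetime("20051225T053000");
print "parsed: @fields\n";

# A string in some other format yields no fields at all.
my @bad = parse_ical_datetime("2005-12-25 05:30");
print "no match\n" if !@bad;
```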

  5. Motivation

     • Back to our "code modification" example:
       – Now we have a different date format (ISO 8601 time format).
       – Using a regular expression, we can greatly reduce the possibility of bugs.
       – String begins with an "i"…
       – followed by year…
       – followed by a dash…
       – followed by month…
       – etc…

         # ISO 8601 time format
         my $datetime = "i2003-10-31T13:37:14-0500";

         my ($year, $month, $day, $hour, $minute, $second)
             = $datetime
                 =~ m{ \A        # start of string
                       i         # literal i
                       (\d{4})   # year
                       -         # literal dash
                       (\d{2})   # month
                       -         # literal dash
                       (\d{2})   # day
                       T         # literal T
                       (\d{2})   # hour
                       :         # literal colon
                       (\d{2})   # minute
                       :         # literal colon
                       (\d{2})   # second
                       .+        # ignore remainder
                       \z        # end of string
                     }xms;
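As a possible variation on the pattern above, named captures let each field be retrieved by name instead of by position. This sketch is not from the slides and assumes Perl 5.10 or later, where the `(?<name>...)` syntax and the `%+` hash are available.

```perl
use strict;
use warnings;

my $datetime = "i2003-10-31T13:37:14-0500";
my %parts;

if ($datetime =~ m{
        \A                  # start of string
        i                   # literal i
        (?<year>\d{4})      # year
        -                   # literal dash
        (?<month>\d{2})     # month
        -                   # literal dash
        (?<day>\d{2})       # day
        T                   # literal T
        (?<hour>\d{2})      # hour
        :                   # literal colon
        (?<minute>\d{2})    # minute
        :                   # literal colon
        (?<second>\d{2})    # second
        .+                  # ignore remainder
        \z                  # end of string
    }xms) {
    %parts = %+;            # copy the named captures into an ordinary hash
}

print "$parts{year}/$parts{month}/$parts{day} ",
      "$parts{hour}:$parts{minute}:$parts{second}\n";
```

With names attached to the groups, reordering or inserting a group in the pattern does not require renumbering anything on the receiving side.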

  6. Topics

     • Simple matching
     • Metacharacters
     • Anchored search
     • Character classes
     • Range operators in character classes
     • Matching any character
     • Grouping
     • Extracting matches
     • Search and replace

     • Our coverage of regex syntax will be much more slowly paced than the "motivation" just shown!
       – Previous slides have been shown to give you a "flavour" of what regular expressions can achieve.
       – We will learn how to construct such expressions over the next few lectures.
     • We have a range of topics.
     • Regular expressions can seem complex and cryptic.
       – However, slow and patient work with such expressions will improve your productivity.

  7. Perl Regular Expressions

     • Perl is renowned for its excellence at text processing.
     • Handling of regular expressions plays a big factor in its fame.
     • Mastering even the basics will allow you to manipulate text with ease.
     • Regular expressions have a strong formalism (FSA).
     • You have already used some and seen others:

         % ls *.c
         % ps aux | grep "s265s*" | less

     • Other languages have some support for regexes, usually via some library:

         Java:    import java.util.regex.*;
         Python:  import re
         C#:      using System.Text.RegularExpressions;

  8. Simple String Matching

     • Regular expressions are usually used in conjunction with an "if":
       – "if <string matches this pattern> …"
       – "... then <do something with that match>."
     • The simplest such match refers to a string.
     • But note: this is much different than using "eq".

         my $line = <SOMEINPUT>;
         chomp $line;

         # Unbeknownst to programmer, the first line
         # of the input is the line "Hello, World";

         if ($line =~ m/World/xms) {
             print "Regexp matches!\n";
         }
         else {
             print "Oh, poop.\n";
         }

         if ($line eq "World") {
             print "line is equal to 'World'\n";
         }
         else {
             print "line sure ain't equal to 'World'\n";
         }
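The difference between a match and "eq" can be condensed into a short sketch. It is an illustration rather than part of the slides, and it borrows the `\A` and `\z` anchors seen in the motivation slides to show a match that must cover the whole string.

```perl
use strict;
use warnings;

my $line = "Hello, World";

# An unanchored match succeeds if the pattern occurs anywhere in the string.
my $contains = ($line =~ m/World/xms) ? 1 : 0;

# "eq" succeeds only if the two strings are identical.
my $equals = ($line eq "World") ? 1 : 0;

# Anchoring both ends forces the pattern to cover the entire string.
my $anchored = ($line =~ m/\A World \z/xms) ? 1 : 0;

print "contains=$contains equals=$equals anchored=$anchored\n";
# prints: contains=1 equals=0 anchored=0
```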

  9. A word about "m/yadayada/xms"

     • The text between the two slashes is the regular expression ("regex").
     • Leading "m" indicates the regex is used for a match.
     • Trailing "xms" are three regex options:
       – "x": extended formatting (whitespace in the regex is ignored)
       – "m": "^" and "$" match at line boundaries (this eliminates a cause of some subtle bugs)
       – "s": ensures everything, including a newline, is matched by the "." symbol
     • Why all of this verbiage instead of plain old "/yadayada/" as of old? Compare the following two forms of the same regex:

         /'[^\\']*(?:\\.[^\\']*)*'/

         m{ '              # an opening single quote
            [^\\']*        # any non-special chars
            (?:            # then all of..
               \\ .        #   any explicitly backslashed char
               [^\\']*     #   followed by any non-special chars
            )*             # repeated zero or many times
            '              # a closing single quote
          }xms

     • Also note: the delimiters can be written as "m{ }" or "m//".
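The effect of each of the three options can be observed in isolation. The sketch below is an illustration, not taken from the slides; the sample text is made up.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# /s: "." also matches a newline, so the pattern can span two lines.
my $without_s = ($text =~ m/first.+second/)  ? 1 : 0;    # 0
my $with_s    = ($text =~ m/first.+second/s) ? 1 : 0;    # 1

# /m: "^" matches at the start of every line, not just the start of the string.
my $without_m = ($text =~ m/^second/)  ? 1 : 0;          # 0
my $with_m    = ($text =~ m/^second/m) ? 1 : 0;          # 1

# /x: whitespace in the pattern is ignored, so a space to be matched
# must be written explicitly (here as \s).
my $with_x        = ($text =~ m/ first \s line /x)       ? 1 : 0;    # 1
my $space_ignored = ("firstline" =~ m/first line/x)      ? 1 : 0;    # 1

print "s: $without_s -> $with_s\n";
print "m: $without_m -> $with_m\n";
print "x: $with_x, space ignored: $space_ignored\n";
```

The last case is the one to watch when adopting "x": a pattern that relied on a literal space stops requiring it once the option is added.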

  10. Another example

     • The code below searches for a pattern in some dictionary file.
       – Note that a command-line argument is being used for a regex!
       – Also note the "<>" syntax: this takes the first unused command-line argument, and uses it as a filename for opening!

         #!/usr/bin/perl
         use strict;
         my $regexp = shift @ARGV;
         while (my $word = <>) {
             if ($word =~ m/$regexp/xms) {
                 print $word;
             }
         }

         % ./search.pl pter /usr/share/dict/linux.words
         abrupter
         Acalypterae
         acanthopteran
         Acanthopteri
         ... <snip> ...
         unchapter
         unchaptered
         underprompter
         ... <snip> ...
         Zygopteris
         zygopteron
         zygopterous
         %
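Because the pattern comes from the command line, it may not be a valid regex. One way to guard against that, sketched here as an assumption rather than as part of the original script, is to compile the pattern once with `qr//` inside an `eval` block. The helper name `compile_pattern` is hypothetical, and a small in-memory word list stands in for the dictionary file so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern once.
# Returns a compiled regex, or undef if the pattern is not valid.
sub compile_pattern {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/xms };    # an invalid pattern dies inside eval
    return $re;
}

# Stand-in for lines read from a dictionary file.
my @words = ("abrupter\n", "chapter\n", "zebra\n");

my $re   = compile_pattern("pter");
my @hits = grep { $_ =~ $re } @words;
print @hits;                              # abrupter and chapter

# An unbalanced parenthesis is not a valid regex.
my $broken = compile_pattern("(pter");
print "invalid pattern\n" if !defined $broken;
```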

  11. Metacharacters

     • Regexes obtain their power by describing sets of strings.
     • Such descriptions involve the use of "metacharacters":

         { } [ ] ( )
         ^ $ .
         | * + ?
         / \

     • Of course, some strings that we want to match will contain these characters.
       – Therefore we must "escape" them.

         "2+2=4" =~ m/2+2/xms     # doesn't match
         "2+2=4" =~ m/2\+2/xms    # does match

         "The interval is [0,1)." =~
             m/[0,1)./xms         # syntax error
         "The interval is [0,1)." =~
             m/\[0,1\)\./xms      # does match

         "/usr/bin/perl" =~ m/\/usr\/bin\/perl/xms    # matches
         "/usr/bin/perl" =~ m{/usr/bin/perl}xms       # better

         'C:\WINDOWS' =~ m/C:\\WINDOWS/    # matches
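When the text to be matched literally is held in a variable, escaping by hand is not practical. Perl's built-in `quotemeta` function and the `\Q...\E` escape inside a pattern do the escaping automatically. The sketch below is an illustration, not taken from the slides.

```perl
use strict;
use warnings;

my $literal = "2+2";

# Interpolated as-is, "+" acts as a quantifier, so this looks for "22".
my $unescaped = ("2+2=4" =~ m/$literal/xms)     ? 1 : 0;    # 0

# \Q...\E escapes every metacharacter in the interpolated text.
my $escaped   = ("2+2=4" =~ m/\Q$literal\E/xms) ? 1 : 0;    # 1

# quotemeta returns the escaped string itself.
my $quoted   = quotemeta("[0,1).");
my $interval = ("The interval is [0,1)." =~ m/$quoted/xms) ? 1 : 0;    # 1

print "unescaped=$unescaped escaped=$escaped interval=$interval\n";
print "$quoted\n";    # \[0\,1\)\.
```

`quotemeta` escapes every ASCII character that is not a letter, digit or underscore, which is why the comma is backslashed as well; that is harmless in a regex.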
