Day 9: Regular Expressions


SLIDE 1

Cartwright 2012 Summer

Computer Sciences 368 Introduction to Perl

Day 9: Regular Expressions

Suggested reading: Learning Perl (6th Ed.)

Chapter 7: In the World of Regular Expressions
Chapter 8: Matching with Regular Expressions

SLIDE 2

Homework Review

SLIDE 3

Patterns

SLIDE 4

Can You Identify a Phone Number?

  • Tim's office 24002
  • 608-262-4002
  • (608) 262-4002
  • 608/262 4002
  • 6 \0/ 8-2-6-2-4 \0/ (02)
  • +1 (608) 262 4002
  • 6082624002
  • 6,082,624,002
  • 000-000-0000
  • 193-241-8827

SLIDE 5

Some Other (Possible) Patterns

  • Telephone numbers (NANP)
  • Dates (e.g., 22 July 2011, 2011-07-22)
  • Image filenames (e.g., cs-logo.png)
  • Hostnames
  • Email addresses (VERY hard)
  • Specific data records
  • Specific lines from a log file

SLIDE 6

Regular Expressions

SLIDE 7

A regular expression is a formal description of a pattern that partitions all strings into matching / non-matching
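
To make the "partition" idea concrete, here is a minimal sketch in Perl. The helper name `partition` and the sample strings are illustrative assumptions, not part of the course material.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the slides): split a list of strings into
# those that match a pattern and those that do not.
sub partition {
    my ($re, @strings) = @_;
    my @matching     = grep { $_ =~ $re } @strings;
    my @non_matching = grep { $_ !~ $re } @strings;
    return (\@matching, \@non_matching);
}

my ($yes, $no) = partition(qr/cat/, 'cat', 'catalog', 'dog', 'Cat');
print "matching:     @$yes\n";
print "non-matching: @$no\n";
```

Every string lands in exactly one of the two lists: here `cat` and `catalog` match, while `dog` and `Cat` do not.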

SLIDE 8

Matching Patterns

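#!/usr/bin/perl
use strict;
use warnings;

# Read the pattern from the user and compile it once.
print 'Enter reg. expression (no delimiters): ';
chomp(my $re_string = <STDIN>);
my $re = qr/$re_string/;

# The file to search is the first command-line argument.
open(INPUT, '<', $ARGV[0])
    or die "Could not open file: $!\n";

# Print every line that matches the pattern.
while (<INPUT>) {
    print if /$re/;
}
close INPUT;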

SLIDE 9

Matching Basics

SLIDE 10

Metacharacters I

Most characters match themselves (letters, digits, !, @, …)

/cat/
  matches: cat, a cat, catalog, scatter, tomcat
  does not match: empty string, a, at, act, cart, Cat

^ matches start of line

/^cat/
  matches: cat, catalog, cathedral, cat's meow
  does not match: ^cat, a cat, scatter, tomcat, ␣cat

$ matches end of line

/cat$/
  matches: cat, bobcat, scat, tomcat, nice cat
  does not match: cat$, cats, scatter, cat␣

/^cat$/
  matches: cat
  does not match: anything else
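
The anchors above can be tried out with a short sketch like the following. The helper `hits` and the word list are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the words that a pattern (given as a string) matches.
sub hits {
    my ($pattern, @words) = @_;
    return grep { /$pattern/ } @words;
}

my @words = ('cat', 'catalog', 'tomcat', 'a cat', 'Cat');
for my $pattern ('cat', '^cat', 'cat$', '^cat$') {
    print "/$pattern/ matches: ", join(', ', hits($pattern, @words)), "\n";
}
```

With this word list, /^cat/ keeps only `cat` and `catalog`, /cat$/ keeps `cat`, `tomcat`, and `a cat`, and /^cat$/ keeps `cat` alone.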

SLIDE 11

Metacharacters II

. matches any single character

/d.g/
  matches: dog, dig, d.g, adage, mid-game, add2go
  does not match: Dog, drag, edge, add-2-go

\ makes the following metacharacter “normal” (literal)

/1\.0/
  matches: 1.0, 131.0.73.12, $21.03
  does not match: 1\.0, 120, 1e0, 10.1

/2\^8/
  matches: 2^8
  does not match: 2\^8, 2\8

/C:\\/
  matches: C:\Documents, file:///C:\Documents, C:\\
  does not match: c:\..., C:foo
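
A small sketch of the difference between an unescaped and an escaped dot, plus the backslash case. The helper `yes_no` and the test strings are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report 'yes' or 'no' for one string and one pattern.
sub yes_no {
    my ($string, $re) = @_;
    return $string =~ $re ? 'yes' : 'no';
}

for my $string ('1.0', '120', '1e0') {
    printf "%-4s unescaped dot: %-3s  escaped dot: %s\n",
        $string, yes_no($string, qr/1.0/), yes_no($string, qr/1\.0/);
}

# Single quotes keep the backslash in the test string as-is;
# in the pattern, a literal backslash is written as \\.
print 'C:\Documents: ', yes_no('C:\Documents', qr/C:\\/), "\n";
```

Only `1.0` passes the escaped version; `120` and `1e0` pass the unescaped one because `.` accepts any character in the middle.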

SLIDE 12

Counting Modifiers I

* matches the preceding item 0–n times (aka “maybe some …”)

/an*y/
  matches: any, canyon, botany, granny, days, play
  does not match: an*y, a, n, y, an, andy, an-y

+ matches the preceding item 1–n times (aka “some …”)

/an+y/
  matches: any, canyon, botany, granny, tannyl
  does not match: an+y, days, play, Any, a+y

? matches the preceding item 0–1 times (aka “maybe a …”)

/an?y/
  matches: any, canyon, botany, days, play
  does not match: an?y, a, n, y, an, andy, ann, granny
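
The three counting modifiers can be compared side by side with a sketch like this. The helper `hits` and the word list are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the words that a pattern (given as a string) matches.
sub hits {
    my ($pattern, @words) = @_;
    return grep { /$pattern/ } @words;
}

my @words = ('any', 'granny', 'days', 'andy');
for my $pattern ('an*y', 'an+y', 'an?y') {
    print "/$pattern/ matches: ", join(', ', hits($pattern, @words)), "\n";
}
```

Here `days` matches with `*` and `?` (zero n's are allowed) but not with `+`, while `granny` matches with `*` and `+` but not with `?` (two n's are too many).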

SLIDE 13

Counting Modifiers II

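.* and .+ give you superpowers

/a.*z/
  matches: azimuth, dazzle, waltz, abuzz, a.*z
  does not match: a, z, apples, buzz, Azimuth

/a.+z/
  matches: dazzle, waltz, abuzz, a.*z
  does not match: a, z, azimuth, apples, buzz, Abuzz

{n,m} matches the preceding item n–m times; also: {n} {n,} {,m}
(Perl accepts {,m} as a quantifier only from version 5.34 on; in older versions write {0,m})

/^a.{3,6}e$/
  matches: above, ashore, achieve, airframe
  does not match: ae, ate, able, manager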
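
As a rough check of the patterns above, the following sketch runs them over a few words. The helper `hits` and the word list are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the words that a pattern (given as a string) matches.
sub hits {
    my ($pattern, @words) = @_;
    return grep { /$pattern/ } @words;
}

my @words = ('azimuth', 'dazzle', 'above', 'able', 'airframe');
for my $pattern ('a.*z', 'a.+z', '^a.{3,6}e$') {
    print "/$pattern/ matches: ", join(', ', hits($pattern, @words)), "\n";
}
```

`azimuth` matches /a.*z/ but not /a.+z/, because `.+` needs at least one character between the `a` and the `z`. `able` fails /^a.{3,6}e$/ because only two characters sit between the `a` and the final `e`.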

SLIDE 14

Character Classes I

[…] matches one of the enclosed chars (use - for range)

/q[aeio]/
  matches: Iraqi, qanat, qintar
  does not match: q[aeio], q, queue, question, q?

/:[0-5][0-9]/
  matches: 1:00, 11:50 a.m., 12:59, page:08
  does not match: 1:60, 2:3 ratio, 256, 42, :

[^…] matches one of anything but the enclosed chars

/q[^u]/
  matches: Iraqi, qanat, qintar, miqra, q[^u]
  does not match: q, queue, question

/^[^A-Za-z]+$/
  matches: 1, 1:23, 1,234,567, :), \@/, ^_^
  does not match: ^[^A-Za-z]+$, word, 11:50 a.m.
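
The character classes above can be exercised with a sketch such as this one. The helper `hits` and the sample strings are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the strings that a pattern (given as a string) matches.
sub hits {
    my ($pattern, @strings) = @_;
    return grep { /$pattern/ } @strings;
}

my @strings = ('12:59', '1:60', 'Iraqi', 'queue', '^_^');
for my $pattern (':[0-5][0-9]', 'q[aeio]', 'q[^u]', '^[^A-Za-z]+$') {
    print "/$pattern/ matches: ", join(', ', hits($pattern, @strings)), "\n";
}
```

Note that `1:60` fails the minutes pattern because `6` is outside `[0-5]`, yet it passes /^[^A-Za-z]+$/ since it contains no letters.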

SLIDE 15

Character Classes II

\d  matches a digit (= [0-9])
\D  matches a non-digit (= [^0-9] or [^\d])
\w  matches a “word” char (= [A-Za-z0-9_])
\W  matches a non-“word” char (= [^\w])
\s  matches whitespace (= [ \t\n…])
\S  matches non-whitespace (= [^\s])

/^-?\d+$/
  matches: 0, 1, -1, 1234, -000
  does not match: a1, 1e4, 1.0, empty string

/^\s*word/
  matches: word, maybe with some whitespace before it
  does not match: this line has a word
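
A minimal sketch of both example patterns in use. The function name `is_integer` and the sample inputs are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true (1) if the string is an optional minus sign
# followed by one or more digits, false (0) otherwise.
sub is_integer {
    my ($string) = @_;
    return $string =~ /^-?\d+$/ ? 1 : 0;
}

for my $string ('0', '-1', '1234', 'a1', '1e4', '1.0', '') {
    printf "%-8s %s\n", "'$string'",
        is_integer($string) ? 'integer' : 'not an integer';
}

# Optional leading whitespace before "word" at the start of the line.
for my $line ('word', '   word', 'this line has a word') {
    print "'$line': ", ($line =~ /^\s*word/ ? 'match' : 'no match'), "\n";
}
```

The first three inputs are reported as integers and the rest are not, including the empty string, since `\d+` requires at least one digit.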

SLIDE 16

Boundaries

\b  matches at a word boundary
\B  matches anywhere that is not a word boundary

/word\b/
  matches: word, reword, sword
  does not match: wordy, wordless, swordplay

/\bword\B/
  matches: wordy, wordless, wordplay
  does not match: word, sword, swordplay
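
The two boundary patterns can be compared with a sketch like the following. The helper `hits` and the word list are illustrative assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the words that a pattern (given as a string) matches.
sub hits {
    my ($pattern, @words) = @_;
    return grep { /$pattern/ } @words;
}

my @words = ('word', 'sword', 'wordy', 'swordplay');
for my $pattern ('word\b', '\bword\B') {
    print "/$pattern/ matches: ", join(', ', hits($pattern, @words)), "\n";
}
```

/word\b/ keeps `word` and `sword`, where the word ends right after the `d`. /\bword\B/ keeps only `wordy`, where a word starts at the `w` and continues past the `d`.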

SLIDE 17

Case-Insensitivity

/…/i ignores case in matching

/cat/
  matches: cat, a cat, catalog, scatter, tomcat
  does not match: Cat, a Cat, Cathy, TomCat

/cat/i
  matches: cat, Cat, Cathy, tomcat, TomCat
  does not match: dog
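
A short sketch of the same pattern with and without /i. The helper `hits` and the name list are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the strings that a compiled pattern matches.
sub hits {
    my ($re, @strings) = @_;
    return grep { $_ =~ $re } @strings;
}

my @names = ('cat', 'Cathy', 'TomCat', 'dog');
print 'case-sensitive:   ', join(', ', hits(qr/cat/,  @names)), "\n";
print 'case-insensitive: ', join(', ', hits(qr/cat/i, @names)), "\n";
```

Without /i only `cat` matches; with /i, `Cathy` and `TomCat` match as well.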

SLIDE 18

Commenting Regular Expressions

/…/x allows whitespace and comments inside the RE

To be part of the RE, literal whitespace and # must be quoted with \

$text =~ s{
    (               # start of opening
      <hostname>    # open hostname element
      \s*           # maybe some whitespace
    )               # end of opening
    .*?             # old hostname, matched lazily (not captured)
    (               # start of closing
      \s*           # maybe some whitespace
      </hostname>   # end hostname element
    )               # end of closing
} {$1$host$2}imx;
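
For a runnable version of the substitution above, here is a sketch that wraps it in a function. The name `set_hostname`, the sample XML, and the example host names are assumptions for illustration. Comments inside the pattern avoid `$`, `@`, and braces, because Perl interpolates variables and looks for the closing delimiter before it strips /x comments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace whatever sits between <hostname> tags
# with a new host name and return the updated text.
sub set_hostname {
    my ($text, $host) = @_;
    $text =~ s{
        ( <hostname> \s* )      # group 1: opening tag, maybe whitespace
        .*?                     # old hostname, matched lazily
        ( \s* </hostname> )     # group 2: maybe whitespace, closing tag
    }{$1$host$2}ix;
    return $text;
}

my $config = "<config>\n  <hostname> old.example.com </hostname>\n</config>\n";
print set_hostname($config, 'new.example.com');
```

The tags and the whitespace around the host name are kept through `$1` and `$2`; only the text between them changes.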

SLIDE 19

Delimiters

print if /cat/i;                      # checks $_ for match
print if m/cat/i;
print if m,cat,i;
print if m{cat}i;
print if $some_string =~ /cat/i;
print if $some_string =~ m/cat/i;
print if $some_string =~ m,cat,i;
print if $some_string =~ m{cat}i;
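
One common reason to pick a different delimiter is a pattern that contains slashes. The sketch below shows the same test written both ways; the function names and the sample paths are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers: the same test with two different delimiters.
sub in_usr_bin_slashes {
    my ($path) = @_;
    return $path =~ /^\/usr\/bin\// ? 1 : 0;   # every / must be escaped
}

sub in_usr_bin_braces {
    my ($path) = @_;
    return $path =~ m{^/usr/bin/} ? 1 : 0;     # no escaping needed
}

for my $path ('/usr/bin/perl', '/usr/local/bin/perl') {
    print "$path: slashes=", in_usr_bin_slashes($path),
          " braces=", in_usr_bin_braces($path), "\n";
}
```

Both functions give the same answer for every path; the brace form is simply easier to read.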

SLIDE 20

Other Scripting Languages

  • Most have regular expressions
  • Perl has the best, by far (cf. PCRE library)
  • Others may have limited REs or different syntax
  • OO languages often have match objects

SLIDE 21

Homework

  • No Perl coding — just use provided script
  • Write regular expressions
  • Need to get 11 correct expressions for full credit
  • Some require that you explain what will and will not match: Provide examples!!!
