
Practical Extraction and Report Language

(PERL)


Internet & Web Based Technology 2

Introduction

  • What is PERL?

– Practical Extraction and Report Language.
– An interpreted language optimized for scanning arbitrary text files, extracting information from them, and printing reports based on that information.
– Very powerful string handling features.
– Available on virtually all platforms.


Main Advantages

  • Speed of development

– You can type the program into a text file and just run it. Perl is an interpreted language; no separate compilation step is needed.

  • It is powerful

– The regular expressions of Perl are extremely powerful.
– Uses sophisticated pattern matching techniques to scan large amounts of data very quickly.

  • Portability

– Perl is available on almost all platforms.
– It is free software; free versions can be downloaded from the Internet.

  • Editing Perl programs

– No sophisticated editing tool is needed.
– Any simple text editor, such as Notepad or vi, will do.


  • Flexibility

– Perl does not limit the size of your data.
– If memory is available, Perl can handle the whole file as a single string.
– Allows one to write simple programs to perform complex tasks.


How to run Perl?

  • Perl can be downloaded from the Internet.

– Available on almost all platforms.

  • Assumptions:

– For Windows operating system, you can run Perl programs from the command prompt.

  • Run “cmd” to get a command prompt window.

– For Unix/Linux, you can run directly from the shell prompt.


Working through an example

  • Recommended steps:

– Create a directory/folder where you will store the Perl files.
– Using any text editor, create a file “test.pl” with the following content:

print "Good day\n";
print "This is my first Perl program\n";

– Execute the program by typing the following at the command prompt:

perl test.pl


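  • On Unix/Linux, a program can also be run directly as a command (./test.pl instead of perl test.pl). For this, give the path of the Perl interpreter in a first line starting with #! (the path may differ on your system), and make the file executable with chmod +x test.pl:

#!/usr/bin/perl
print "Good day\n";
print "This is my first Perl program\n";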


Variables

  • Scalar variables

– A scalar variable holds a single value.
– Other variable types are also available (arrays and associative arrays), to be discussed later.
– A ‘$’ before the name of a variable indicates that it is a scalar variable:

$xyz = 20;


  • Some examples:

$a = 10;
$name = "Indranil Sen Gupta";
$average = 28.37;

– Variables do not have fixed types.
– Variables can be printed as:

print "My name is $name, the average temperature is $average\n";


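  • Data types:

– Perl does not require the types of variables to be declared.

  • A scalar can hold a number or a string, and Perl converts between them as needed; Perl is a loosely (dynamically) typed language.
  • Languages like C or Java are statically typed: every variable has a declared type.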

Variable Interpolation

  • A powerful feature

– Variable names are automatically replaced by their values when they appear in double-quoted strings.

  • An example:

$stud = "Rupak";
$marks = 75;
print "Marks obtained by $stud is $marks\n";
print 'Marks obtained by $stud is $marks\n';


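– The program gives the following output:

Marks obtained by Rupak is 75
Marks obtained by $stud is $marks\n

– What we see:

  • Inside single quotes, neither variables nor escape sequences such as \n are interpreted, so the second line is printed literally, without a trailing newline.
  • If we need variable interpolation, use double quotes; otherwise, use single quotes.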


  • Another example:

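$Expense = '$100';
print "The expenditure is $Expense.\n";

– This prints “The expenditure is $100.” The single quotes keep $100 from being treated as a variable, and the value substituted into the double-quoted string is not interpolated a second time.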


Expressions with Scalars

  • Illustrated through examples (syntax similar to C):

$abc = 10;
$abc++;
$total--;
$a = $b ** 10; # exponentiation
$a = $b % 10; # modulus
$balance = $balance + $deposit;
$balance += $deposit;


  • Operations on strings:

– Concatenation: the dot (.) operator is used.

$a = "Good";
$b = " day";
$c = "\n";
$total = $a . $b . $c; # concatenate the strings: "Good day\n"
$a .= " day\n"; # append to the string $a: "Good day\n"


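– Auto-increment on strings. Applying ++ to a variable that holds a purely alphabetic (or letters-then-digits) string, and has never been used as a number, increments the string “magically”:

$a = "bat";
$b = $a;
$b++;
print $a, " and ", $b;

will print bat and bau.
– The increment works like an odometer within the ranges a-z, A-Z and 0-9: "az" becomes "ba", and "zz" becomes "aaa".

  • Only ++ behaves this way. Ordinary arithmetic converts the string to a number, so "bat" + 1 gives 1 (a string without leading digits counts as 0), which is rarely meaningful.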
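A short script makes the string increment rules concrete. This is a sketch assuming Perl 5 with strict and warnings enabled; the sample strings are arbitrary, and the results in the comments follow the auto-increment rules described in the perlop documentation.

```perl
use strict;
use warnings;

# Magic string increment: ++ on strings made of letters (optionally followed by digits).
my %next_of;
for my $orig ("bat", "az", "zz", "a9", "Zz") {
    my $s = $orig;    # work on a copy so the original stays unchanged
    $s++;             # odometer-style increment within a-z, A-Z and 0-9
    $next_of{$orig} = $s;
    print "$orig -> $s\n";    # bat -> bau, az -> ba, zz -> aaa, a9 -> b0, Zz -> AAa
}

# Ordinary addition is numeric: "bat" counts as 0, so the result is 1.
my $plus_one = do { no warnings 'numeric'; "bat" + 1 };
print "\"bat\" + 1 = $plus_one\n";
```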

– String repetition operator (x).

$a = $b x 3;

will concatenate three copies of $b and assign the result to $a.

print "Ba" . "na" x 2;

will print the string “Banana” (x binds more tightly than ., so "na" is repeated first).


String as a Number

  • A string can be used in an arithmetic expression.

– How is the value evaluated?
– When converting a string to a number, Perl skips leading white space, then takes an optional minus sign and as many digits as it can find (with a decimal point) at the beginning of the string, and ignores everything else:

"23.54" evaluates to 23.54
"123Hello25" evaluates to 123
"banana" evaluates to 0

– With warnings enabled (use warnings), Perl warns when a string used as a number is not entirely numeric.
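The conversion rule can be tried directly by adding 0 to each string. A minimal sketch, assuming Perl 5; the fourth input is a made-up example with leading white space and a sign, and the numeric warnings are switched off only inside the small block to keep the output clean.

```perl
use strict;
use warnings;

# Strings that start with a number, and one that does not.
my @inputs = ("23.54", "123Hello25", "banana", "  -7 apples");
my @values;
{
    no warnings 'numeric';               # silence "isn't numeric" warnings here only
    @values = map { $_ + 0 } @inputs;    # adding 0 forces numeric conversion
}
for my $i (0 .. $#inputs) {
    print "\"$inputs[$i]\" evaluates to $values[$i]\n";
}
```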


Escaping

  • The character ‘\’ is used as the escape character.

– Inside double-quoted strings, it removes the special meaning of characters such as $, @, " and \ itself.

$num = 20;
print "Value of \$num is $num\n"; # Value of $num is 20
print "The windows path is c:\\perl\\"; # The windows path is c:\perl\


Line Oriented Quoting

  • Perl supports strings that span multiple lines (“here-documents”).

– Use the marker ‘<<’.
– Follow it immediately, with no space, by an identifier. The quoted material starts on the next line and ends at a line containing only that identifier.

  • Example:

print <<terminator;
Hello, how are you?
Good day.
terminator


  • Another example:

print "<HTML>\n";
print "<HEAD><TITLE>Test page</TITLE></HEAD>\n";
print "<BODY>\n";
print "<H2>This is a test document.</H2>\n";
print "</BODY></HTML>\n";


– The same page, written as a here-document:

print <<EOM;
<HTML>
<HEAD><TITLE>Test page</TITLE></HEAD>
<BODY>
<H2>This is a test document.</H2>
</BODY></HTML>
EOM
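Here-documents follow the quoting rules of ordinary strings: a double-quoted (or unquoted) terminator interpolates variables, while a single-quoted terminator keeps the text literal. A sketch that builds the page above in a variable; $title is a hypothetical variable introduced for illustration.

```perl
use strict;
use warnings;

my $title = "Test page";   # hypothetical value for illustration

# Double-quoted here-document: $title is interpolated.
my $html = <<"EOM";
<HTML>
<HEAD><TITLE>$title</TITLE></HEAD>
<BODY>
<H2>This is a test document.</H2>
</BODY></HTML>
EOM

# Single-quoted here-document: the text is taken literally.
my $raw = <<'EOM';
<TITLE>$title</TITLE>
EOM

print $html;
print $raw;
```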


Lists and Arrays


Basic Difference

  • A list is an ordered collection of scalars.
  • An array is a variable that holds a list.
  • Each element of an array is a scalar.
  • The size of an array:

– Indexing starts at 0.
– There is no fixed upper limit; the size depends only on available (virtual) memory.


List Literal

  • Examples:

(10, 20, 50, 100)
('red', "blue", "green")
("a", 1, 2, 3, 'b')
($a, 12)
() # empty list
(10..20) # range operator: 10, 11, ..., 20
('A'..'Z') # same, for letters


Specifying Array Variable

  • We use the special character ‘@’.

@months # denotes an array

The individual elements of the array are scalars, and can be referred to as:

$months[0] # first element of @months
$months[1] # second element of @months
...


Initializing an Array

  • Two ways:

– Specify values, separated by commas:

@color = ('red', 'green', "blue", "black");

– Use the quote words (qw) operator, which splits its contents on white space:

@color = qw(red green blue black);


Array Assignment

– Assign from a list of literals:

@numbers = (1, 2, 3);
@colors = ("red", "green", "blue");

– From the contents of another array:

@array1 = @array2;

– Using qw:

@word = qw(Hello good morning);

– A combination of the above:

@allcolors = ("white", @colors, "brown");


– Some other examples:

@xyz = (2..5); # (2, 3, 4, 5)
@xyz = (1, @xyz); # (1, 2, 3, 4, 5)
@xyz = (@xyz, 6); # (1, 2, 3, 4, 5, 6)


Multiple Assignments

($x, $y, $z) = (10, 20, 30);
($x, $y) = ($y, $x); # swap the values of $x and $y
($a, @col) = ('red', 'green', 'blue');
# $a gets the value 'red'
# @col gets the value ('green', 'blue')
($first, @val, $last) = (1, 2, 3, 4);
# $first gets the value 1
# @val gets the value (2, 3, 4): an array takes all remaining values
# $last is undefined


Number of Elements in Array

  • Two ways:

$size = scalar @colors; # explicit scalar context
$size = @colors; # an array in scalar context gives its number of elements


Accessing Elements

@list = (1, 2, 3, 4);
$first = $list[0];
$fourth = $list[3];
$list[1]++; # array becomes (1, 3, 3, 4)
$x = $list[5]; # $x gets the value undef
$list[2] = "Go"; # array becomes (1, 3, "Go", 4)


  • For an array @value, $#value is the index of its last element.

@value = (1, 2, 3, 4, 5);
print "$#value\n"; # prints 4

  • For an empty array, $#value is -1.


shift and unshift

  • They operate on the front of the array.

– ‘shift’ removes the first element of the array and returns it.
– ‘unshift’ adds one or more elements at the start of the array.


  • Example:

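@color = qw(red blue green black);
$first = shift @color;
# $first gets "red", and @color becomes
# (blue, green, black)
unshift(@color, "white");
# @color becomes (white, blue, green, black)

– Note: qw(...) separates words by white space only; writing qw(red, blue) would give the words “red,” and “blue”.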


pop and push

  • They operate on the end of the array.

– ‘pop’ removes the last element of the array and returns it.
– ‘push’ adds one or more elements at the end of the array.


  • Example:

@color = qw(red blue green black);
$last = pop @color;
# $last gets "black", and @color becomes
# (red, blue, green)
push(@color, "white");
# @color becomes (red, blue, green, white)
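shift, unshift, push and pop are enough to treat an array as a queue or a stack. A small sketch with made-up data; note that unshift (like push) returns the new number of elements in the array.

```perl
use strict;
use warnings;

my @queue = qw(red blue green);

# Queue (first in, first out): add at the end, remove from the front.
push @queue, "black";            # (red blue green black)
my $served = shift @queue;       # "red"; queue is (blue green black)

# Stack (last in, first out): add and remove at the end.
my @stack = (1, 2);
push @stack, 3;                  # (1 2 3)
my $top = pop @stack;            # 3; stack is (1 2)

# unshift returns the new number of elements.
my $count = unshift @queue, "white";   # (white blue green black), 4

print "served=$served queue=@queue top=$top stack=@stack count=$count\n";
```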


Reversing an Array

  • By using the ‘reverse’ function.

@names = ("Mina", "Tina", "Rina");
@rev = reverse @names; # reversed list stored in @rev
@names = reverse @names; # the original array is reversed


Printing an Array

  • Example:

@colors = qw(red green blue);
print @colors; # prints without spaces: redgreenblue
print "@colors"; # prints with spaces: red green blue


Sort the Elements of an Array

  • Using the ‘sort’ function, by default we can sort the elements of an array lexicographically.

– Elements are compared as strings.

@colors = qw(red blue green black);
@sort_col = sort @colors; # @sort_col is (black, blue, green, red)


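– Another example:

@num = qw(10 2 5 22 7 15);
@new = sort @num; # @new will contain (10, 15, 2, 22, 5, 7)

– How do we sort numerically? Give sort a comparison block; inside it, $a and $b are the two elements being compared, and <=> compares them as numbers:

@num = qw(10 2 5 22 7 15);
@new = sort { $a <=> $b } @num; # @new will contain (2, 5, 7, 10, 15, 22)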
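The difference between the default string sort and a numeric sort is easy to check. A sketch assuming Perl 5; the descending order and the case-insensitive variant using lc and cmp are extra illustrations beyond the slide.

```perl
use strict;
use warnings;

my @num = qw(10 2 5 22 7 15);

my @lexical    = sort @num;                  # string order
my @ascending  = sort { $a <=> $b } @num;    # numeric, smallest first
my @descending = sort { $b <=> $a } @num;    # numeric, largest first

my @words  = qw(banana Apple cherry);
my @nocase = sort { lc($a) cmp lc($b) } @words;   # case-insensitive string order

print "@lexical\n@ascending\n@descending\n@nocase\n";
```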


The ‘splice’ function

  • Arguments to the ‘splice’ function:

– The first argument is an array.
– The second argument is an offset (index number of the list element to begin splicing at).
– The third argument is the number of elements to remove.

@colors = ("red", "green", "blue", "black");
@middle = splice(@colors, 1, 2);
# @middle contains the removed elements ("green", "blue")
# @colors is now ("red", "black")
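The splice call above can be checked as follows. The second part goes beyond the slide: after the length, splice also accepts a list of elements to insert in place of the removed ones (as documented for splice in perlfunc).

```perl
use strict;
use warnings;

my @colors = ("red", "green", "blue", "black");

# Remove 2 elements starting at index 1; they are returned as a list.
my @middle = splice(@colors, 1, 2);
print "removed: @middle; left: @colors\n";

# Extra arguments after the length are inserted where the removed elements were.
my @palette = ("red", "green", "blue", "black");
splice(@palette, 1, 2, "white", "grey", "pink");
print "palette: @palette\n";
```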


File Handling


Interacting with the user

  • Read from the keyboard (standard input).

– Use the line input operator with the standard input file handle: <STDIN>.
– Very simple to use:

print "Enter your name: ";
$name = <STDIN>; # read a line from the keyboard
print "Good morning, $name. \n";

– $name also contains the newline character typed at the end of the line.

  • Need to chop it off.

The ‘chop’ Function

  • The ‘chop’ function removes the last character of whatever it is given to chop.
  • In the following example, it chops the newline.

print "Enter your name: ";
chop($name = <STDIN>); # read from keyboard and chop newline
print "Good morning, $name. \n";

  • ‘chop’ removes the last character irrespective of whether it is a newline or not.

– Sometimes dangerous: if the input does not end with a newline (for example, the last line of a file), chop removes a real character.


Safe chopping: ‘chomp’

  • The ‘chomp’ function works like ‘chop’, with the difference that it removes the last character only if it is a newline (more precisely, it removes a trailing input record separator, which is a newline by default).

print "Enter your name: ";
chomp($name = <STDIN>); # read from keyboard and chomp newline
print "Good morning, $name. \n";
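A sketch contrasting chop and chomp on strings with and without a trailing newline; the values are arbitrary. chomp returns the number of characters it removed, so 0 signals that there was no newline.

```perl
use strict;
use warnings;

my $chop_nl    = "Indranil\n";
my $chop_none  = "Indranil";
my $chomp_nl   = "Indranil\n";
my $chomp_none = "Indranil";

chop $chop_nl;                     # removes "\n"             -> "Indranil"
chop $chop_none;                   # removes "l" (data lost)  -> "Indrani"
chomp $chomp_nl;                   # removes "\n"             -> "Indranil"
my $removed = chomp $chomp_none;   # nothing to remove; returns 0

print "$chop_nl|$chop_none|$chomp_nl|$chomp_none|$removed\n";
```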


File Operations

  • Opening a file

– The ‘open’ function opens a file and associates it with a file handle (XYZ below); it returns true on success.
– For standard input, we have the predefined handle STDIN.

$fname = "/home/isg/report.txt";
open XYZ, $fname;
while (<XYZ>) {
    print "Line number $. : $_";
}


– Checking for errors:

$fname = "/home/isg/report.txt";
open XYZ, $fname or die "Error in open: $!";
while (<XYZ>) {
    print "Line number $. : $_";
}

– $. holds the current line number (starting at 1).
– $_ holds the line just read.
– $! holds the error message (or code) of the last failed system call.


  • Reading from a file:

– The last example also illustrates file reading.
– The angle brackets (< >) form the line input operator.

  • In a while (<XYZ>) loop, each line read goes into $_.

  • Writing into a file:

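$out = "/home/isg/out.txt";
open XYZ, ">$out" or die "Error in write: $!";
for $i (1..20) {
    print XYZ "$i :: Hello, the time is ", scalar(localtime), "\n";
}

– The ‘>’ in front of the file name opens the file for writing, creating it or overwriting its old contents.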


  • Appending to a file:

$out = "/home/isg/out.txt";
open XYZ, ">>$out" or die "Error in write: $!";
for $i (1..20) {
    print XYZ "$i :: Hello, the time is ", scalar(localtime), "\n";
}

– ‘>>’ opens the file for appending: new lines are added at the end.


  • Closing a file:

close XYZ;

where XYZ is the file handle of the file being closed.


  • Printing a file:

– This is very easy to do in Perl.

$input = "/home/isg/report.txt";
open IN, $input or die "Error in open: $!";
while (<IN>) {
    print; # with no argument, print outputs $_
}
close IN;
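The slides use the two-argument form of open with a bareword handle. Current Perl documentation recommends the three-argument form with a lexical handle, which keeps the mode ('<', '>' or '>>') separate from the file name. A sketch of writing, appending and reading, assuming the core File::Temp module is available; a temporary file stands in for /home/isg/out.txt.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for /home/isg/out.txt in this sketch.
my (undef, $fname) = tempfile(UNLINK => 1);

# Write: mode ">" creates the file or truncates it.
open(my $out, '>', $fname) or die "Error in write: $!";
for my $i (1 .. 3) {
    print {$out} "$i :: Hello\n";
}
close($out) or die "Error in close: $!";

# Append: mode ">>" adds to the end of the file.
open(my $app, '>>', $fname) or die "Error in append: $!";
print {$app} "4 :: Goodbye\n";
close($app);

# Read: mode "<"; $. is the current line number.
my @lines;
open(my $in, '<', $fname) or die "Error in open: $!";
while (my $line = <$in>) {
    print "Line number $. : $line";
    push @lines, $line;
}
close($in);
```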


Command Line Arguments

  • Perl uses a special array called @ARGV.

– List of arguments passed along with the script name on the command line.
– Example: if you invoke Perl as

perl test.pl red blue green

then @ARGV will be ("red", "blue", "green").
– Printing the command line arguments:

foreach (@ARGV) {
    print "$_\n";
}


Standard File Handles

  • STDIN

– Read from standard input (keyboard), e.g. with <STDIN>.

  • STDOUT

– Print to standard output (screen); print uses it by default.

  • STDERR

– For outputting error messages.

  • ARGV

– Takes the names of the files given on the command line and opens them one after another.

– The @ARGV array contains the text after the program’s name on the command line.

  • <ARGV> takes each file in turn.
  • If nothing is specified on the command line, it reads from the standard input.

– Since this is very commonly used, Perl provides an abbreviation for <ARGV>, namely <> (with nothing between the brackets).
– An example:


$lineno = 1;
while (<>) {
    print $lineno++, ": $_";
}

– In this program, the names of the files are given on the command line:

perl list_lines.pl file1.txt
perl list_lines.pl a.txt b.txt c.txt

– With several files, the numbering simply continues from one file to the next.


Control Structures


Introduction

  • There are many control constructs in Perl.

– Similar to those in C.
– Illustrated through examples.
– The available constructs:

  • for
  • foreach
  • if/elsif/else
  • while
  • do, etc.

Concept of Block

  • A statement block is a sequence of statements enclosed in a matching pair of { and }.

if ($year == 2000) {
    print "You have entered the new millennium.\n";
}

  • Blocks may be nested within other blocks.

Definition of TRUE in Perl

  • In Perl, only these scalar values are considered FALSE:

– The number 0 (and the string "0")
– The empty string ("")
– undef

  • Everything else in Perl is TRUE (including "0.0", "00" and " ").
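A quick check of the truth rules; the sample values are arbitrary. Strings such as "0.0" and " " count as TRUE, because only the exact string "0" is treated specially.

```perl
use strict;
use warnings;

my @samples = (0, "0", "", undef, "0.0", "00", " ", "abc", -1);
my @results;
for my $v (@samples) {
    my $label = defined $v ? "\"$v\"" : "undef";
    my $truth = $v ? "TRUE" : "FALSE";     # boolean test of the value itself
    push @results, $truth;
    print "$label is $truth\n";
}
```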

if .. else

  • General syntax:

if (test expression) {
    # if TRUE, do this
} else {
    # if FALSE, do this
}


  • Examples:

if ($name eq 'isg') {
    print "Welcome Indranil. \n";
} else {
    print "You are somebody else. \n";
}

if ($flag == 1) {
    print "There has been an error. \n";
} # The else block is optional


elsif

  • Perl spells this keyword ‘elsif’ (not ‘elseif’ or ‘else if’).
  • Example:

print "Enter your id: ";
chomp($name = <STDIN>);
if ($name eq 'isg') {
    print "Welcome Indranil. \n";
} elsif ($name eq 'bkd') {
    print "Welcome Bimal. \n";
} elsif ($name eq 'akm') {
    print "Welcome Arun. \n";
} else {
    print "Sorry, I do not know you. \n";
}


while

  • Example: (Guessing the correct word)

$your_choice = ' ';
$secret_word = 'India';
while ($your_choice ne $secret_word) {
    print "Enter your guess: \n";
    chomp($your_choice = <STDIN>);
}
print "Congratulations! Mera Bharat Mahan.\n";


for

  • Syntax same as in C.
  • Example:

for ($i = 1; $i < 10; $i++) {
    print "Iteration number $i \n";
}


foreach

  • A very commonly used construct that iterates over a list.
  • Example:

@colors = qw(red blue green);
foreach $name (@colors) {
    print "Color is $name. \n";
}

  • We can use ‘for’ in place of ‘foreach’.

  • Example: Counting odd numbers in a list

@xyz = qw(10 15 17 28 12 77 56);
$count = 0;
foreach $number (@xyz) {
    if (($number % 2) == 1) {
        print "$number is odd. \n";
        $count++;
    }
}
print "Number of odd numbers is $count. \n"; # prints 3


Breaking out of a loop

  • The statement ‘last’, if it appears in the body of a loop, causes Perl to exit the loop immediately.

– Usually used with a conditional:

last if ($i > 10);


Skipping to end of loop

  • For this we use the statement ‘next’.

– When executed, the remaining statements in the loop body are skipped, and the next iteration begins.
– Also used with a conditional.
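‘last’ and ‘next’ are often combined in one loop. A sketch with made-up numbers: the loop stops at the first 0 and skips negative values, so only 3, 8, 12 and 5 are added.

```perl
use strict;
use warnings;

my @numbers = (3, 8, -1, 12, 5, 0, 40, 7);
my $sum = 0;
foreach my $n (@numbers) {
    last if $n == 0;     # stop the whole loop at the first 0
    next if $n < 0;      # skip negative numbers, continue with the next one
    $sum += $n;
}
print "Sum before the first 0, ignoring negatives: $sum\n";
```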


Relational Operators


The Operators Listed

Comparison          Numeric   String
Equal               ==        eq
Not equal           !=        ne
Greater than        >         gt
Less than           <         lt
Greater or equal    >=        ge
Less or equal       <=        le
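Choosing between the numeric and string operators matters whenever values look like numbers. A sketch with arbitrary values showing that == and eq (or < and lt) can give different answers for the same pair.

```perl
use strict;
use warnings;

my ($x, $y) = ("10", "10.0");
my $num_equal = ($x == $y) ? 1 : 0;      # numeric: 10 == 10 -> true
my $str_equal = ($x eq $y) ? 1 : 0;      # string: "10" vs "10.0" -> false

my $num_less = (2 < 10) ? 1 : 0;         # numeric: true
my $str_less = ("2" lt "10") ? 1 : 0;    # string: "2" sorts after "10" -> false

print "== $num_equal, eq $str_equal, < $num_less, lt $str_less\n";
```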


Logical Connectives

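  • If $a and $b are logical expressions, Perl supports the following connectives, each with two spellings:

– $a and $b :: same as $a && $b
– $a or $b :: same as $a || $b
– not $a :: same as ! $a

  • Both spellings give the same logical result, and the word forms are more readable. However, and, or and not have much lower precedence than &&, || and !: $x = $a || $b assigns the result of the test to $x, whereas $x = $a or $b assigns $a to $x and only then evaluates $b.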


String Functions


The Split Function

  • ‘split’ is used to split a string into multiple pieces using a delimiter, and create a list out of it.

$_ = 'Red:Blue:Green:White:255';
@details = split /:/, $_;
foreach (@details) {
    print "$_\n";
}

– The first parameter to ‘split’ is a regular expression that specifies what to split on.
– The second specifies the string to split (if omitted, $_ is used).


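  • Another example:

$_ = 'Indranil isg@iitkgp.ac.in 283496';
($name, $email, $phone) = split / /, $_;

– Single quotes are used here so that @iitkgp is not interpolated as an array.

  • By default (with no pattern, or with the pattern ' '), ‘split’ breaks a string on runs of white space and ignores leading white space.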
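The difference between split / / and the default white-space split shows up when a line contains repeated or leading spaces. A sketch; the sample line is made up (the extra spaces are deliberate), and the @ is escaped so it is not interpolated.

```perl
use strict;
use warnings;

my $line = "  Indranil   isg\@iitkgp.ac.in 283496";

my @by_space   = split / /, $line;   # each single space is a separator
my @by_default = split ' ', $line;   # special case: runs of white space,
                                     # leading white space ignored
my @no_args;
{
    local $_ = $line;
    @no_args = split;                # no arguments: same as split ' ', $_
}

print scalar(@by_space),   " fields with split / /\n";
print scalar(@by_default), " fields with split ' '\n";
print scalar(@no_args),    " fields with plain split\n";
```

With split / /, every single space produces a field, so the leading spaces and the run of three spaces give empty strings among the results.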


The Join Function

  • ‘join’ is used to concatenate several elements into a single string, with a specified delimiter in between.

$new = join ' ', $x1, $x2, $x3, $x4, $x5, $x6;
$sep = '::';
$new = join $sep, $x1, $x2, $x3, @abc, $x4, $x5;
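join and split undo each other when the same delimiter is used. A small sketch reusing the colour record from the split example; the array slice @back[0 .. 2] is an extra illustration.

```perl
use strict;
use warnings;

my @fields = ("Red", "Blue", "Green", "White", 255);
my $record = join ':', @fields;          # "Red:Blue:Green:White:255"
my @back   = split /:/, $record;         # split undoes the join
my $csv    = join ', ', @back[0 .. 2];   # first three fields: "Red, Blue, Green"

print "$record\n$csv\n";
```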


Regular Expressions


Introduction

  • One of the most useful features of Perl.
  • What is a regular expression (RegEx)?

– A pattern, written in a special syntax, that describes a set of strings.
– Basically specifies a chunk of text to look for.
– A very powerful way to specify string patterns.


An Example: without RegEx

$found = 0;
$_ = "Hello good morning everybody";
$search = "every";
foreach $word (split) { # split $_ on white space
    if ($word eq $search) {
        $found = 1;
        last;
    }
}
if ($found) {
    print "Found the word 'every' \n";
}


Using RegEx

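$_ = "Hello good morning everybody";
if ($_ =~ /every/) {
    print "Found the word 'every' \n";
}

  • Very easy to use.
  • The text between the forward slashes defines the regular expression.
  • If we use “!~” instead of “=~”, the test succeeds when the pattern is not present in the string.
  • Note that the two programs are not equivalent: the loop compares whole words and finds no word equal to “every”, whereas /every/ matches anywhere, including inside “everybody”. To require a whole word, use /\bevery\b/.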


  • The previous example illustrates literal text used as a regular expression.

– The simplest form of regular expression.

  • Point to remember:

– When performing the matching, all the characters in the pattern are significant, including punctuation and white space.

  • For example, /every / (with a trailing space) will not match in the previous example.


Another Simple Example

$_ = "Welcome to IIT Kharagpur, students";
if (/IIT K/) {
    print "'IIT K' is present in the string\n";
}
if (/Kharagpur students/) {
    print "This will not match\n"; # the string has a comma after Kharagpur
}


Types of RegEx

  • Basically two types:

– Matching

  • Checking if a string contains a substring.
  • The symbol ‘m’ is used (optional if the forward slash is used as delimiter).

– Substitution

  • Replacing a substring by another substring.
  • The symbol ‘s’ is used.

Matching


The =~ Operator

  • Tells Perl to apply the regular expression on the right to the value on the left.
  • The regular expression is contained within delimiters (forward slash by default).

– If some other delimiter is used, then a preceding ‘m’ is essential.


Examples

$string = "Good day";
if ($string =~ m/day/) {
    print "Match successful \n";
}
if ($string =~ /day/) {
    print "Match successful \n";
}

  • Both forms are equivalent.
  • The ‘m’ in the first form is optional.

$string = "Good day";
if ($string =~ m@day@) {
    print "Match successful \n";
}
if ($string =~ m[day]) {
    print "Match successful \n";
}

  • Both forms are equivalent.
  • The character following ‘m’ is the delimiter; bracketing characters such as [ ] or { } are used in matching pairs.
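Alternative delimiters are most useful when the pattern itself contains slashes, such as a path. A sketch using the example file name from the file-handling slides.

```perl
use strict;
use warnings;

my $path = "/home/isg/report.txt";

# With the default delimiter, every "/" in the pattern must be escaped.
my $slash_form = ($path =~ /^\/home\/isg\//) ? 1 : 0;

# With m{...} the slashes need no escaping; the braces form a matching pair.
my $brace_form = ($path =~ m{^/home/isg/}) ? 1 : 0;

print "$slash_form $brace_form\n";
```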

Character Class

  • Use square brackets to specify “any value in the list of possible values”.

my $string = "Some test string 1234";
if ($string =~ /[0123456789]/) {
    print "Found a digit \n";
}
if ($string =~ /[aeiou]/) {
    print "Found a vowel \n";
}
if ($string =~ /[0123456789ABCDEF]/) {
    print "Found a hex digit \n";
}


Character Class Negation

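  • Use ‘^’ at the beginning of the character class to specify “any single character that is not one of these values”.

my $string = "Some test string 1234";
if ($string =~ /[^aeiou]/) {
    print "Found a character that is not a lowercase vowel\n";
}

– Note that [^aeiou] matches consonants, but also upper-case letters, digits, spaces and punctuation; here it already matches the leading “S”.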


Pattern Abbreviations

  • Useful in common cases:

\d    A digit, same as [0-9]
\w    A word character, [0-9a-zA-Z_]
\s    A space character (space, tab, newline, etc.)
\D    Not a digit, same as [^0-9]
\W    Not a word character
\S    Not a space character
.     Anything except newline (\n)


$string = "Good and bad days";
if ($string =~ /d..s/) {
    print "Found something like days\n";
}
if ($string =~ /\w\w\w\w\s/) {
    print "Found a four-letter word!\n"; # matches "Good "; it would also match the end of a longer word
}


Anchors

  • Three ways to define an anchor:

^ :: anchors to the beginning of the string
$ :: anchors to the end of the string
\b :: anchors to a word boundary


if ($string =~ /^\w/) :: does the string start with a word character?
if ($string =~ /\d$/) :: does the string end with a digit?
if ($string =~ /\bGood\b/) :: does the string contain the word “Good”?


Multipliers

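  • There are three multiplier characters:

* :: find zero or more occurrences
+ :: find one or more occurrences
? :: find zero or one occurrence

  • Some example usages:

$string =~ /^\w+/; # one or more word characters at the start
$string =~ /\d?/; # an optional digit (always succeeds, possibly matching nothing)
$string =~ /\b\w+\s+/; # a word followed by white space
$string =~ /\w+\s?$/; # a word at the end, optionally followed by one white-space character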
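The anchors and multipliers can be tried together on a few strings. A sketch with made-up test strings; for each string it records, as 1 or 0, whether the three anchor patterns from these slides match, plus one pattern combining ^, $ and +. The final line shows ? making a single character optional.

```perl
use strict;
use warnings;

my %result;
for my $s ("Good day", "Room 101", "Goodness", " 42") {
    my @flags = (
        ($s =~ /^\w/)         ? 1 : 0,   # starts with a word character?
        ($s =~ /\d$/)         ? 1 : 0,   # ends with a digit?
        ($s =~ /\bGood\b/)    ? 1 : 0,   # contains the whole word "Good"?
        ($s =~ /^\w+\s+\d+$/) ? 1 : 0,   # a word, white space, then only digits?
    );
    $result{$s} = join '', @flags;
    print "'$s' => $result{$s}\n";
}

# ? makes the preceding character optional: zero or one "u".
my @spellings = grep { /^colou?r$/ } qw(color colour colouur);
print "@spellings\n";
```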


Substitution


Basic Usage

  • Uses the ‘s’ operator.
  • Basic syntax is:

$new =~ s/pattern_to_match/new_pattern/;

What does this do?

  • Looks for pattern_to_match in $new and, if found, replaces it with new_pattern.
  • It looks for the pattern once; that is, only the first occurrence is replaced.
  • There is a way to replace all occurrences (to be discussed shortly).


Examples

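$xyz = "Rama and Lakshman went to the forest";
$xyz =~ s/Lakshman/Bharat/; # "Rama and Bharat went to the forest"
$xyz =~ s/R\w+a/Bharat/; # "Bharat and Bharat went to the forest"
$xyz =~ s/[aeiou]/i/; # "Bhirat and Bharat went to the forest"
$abc = "A year has 11 months \n";
$abc =~ s/\d+/12/; # "A year has 12 months \n"
$abc =~ s/\n$//; # removes the trailing newline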


Common Modifiers

  • Two commonly used modifiers:

/i :: ignore case
/g :: match/substitute all occurrences

$string = "Ram and Shyam are very honest";
if ($string =~ /RAM/i) {
    print "Ram is present in the string\n";
}
$string =~ s/m/j/g; # Ram -> Raj, Shyam -> Shyaj
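A sketch of the two modifiers on the slide's sentence. It also shows two details beyond the slide: s///g returns the number of replacements made, and (my $copy = $string) =~ s/.../.../ edits a copy while leaving the original unchanged.

```perl
use strict;
use warnings;

my $string = "Ram and Shyam are very honest";

my $found = ($string =~ /RAM/i) ? 1 : 0;   # /i: case-insensitive match

(my $once = $string) =~ s/m/j/;     # without /g: first occurrence only
(my $all  = $string) =~ s/m/j/g;    # /g: every occurrence

my $copy  = $string;
my $count = ($copy =~ s/a/A/g);     # s///g returns the number of replacements

print "$found\n$once\n$all\n$count: $copy\n";
```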


Use of Memory in RegEx

  • We can use parentheses to capture a piece of matched text for later use.

– Perl remembers the text matched by each pair of parentheses.
– Multiple sets of parentheses can be used.

  • How to recall the captured text?

– Use \1, \2, \3, etc. inside the same regular expression.
– Use $1, $2, $3, etc. after the regular expression (including in the replacement part of s///).


Examples

$string = "Ram and Shyam are honest";
$string =~ /^(\w+)/;
print $1, "\n"; # prints "Ram\n"
$string =~ /(\w+)$/;
print $1, "\n"; # prints "honest\n"
$string =~ /^(\w+)\s+(\w+)/;
print "$1 $2\n"; # prints "Ram and\n"


$string = "Ram and Shyam are very poor";
if ($string =~ /(\w)\1/) {
    print "found 2 in a row\n"; # the "oo" in "poor"
}
if ($string =~ /(\w+).*\1/) {
    print "found repeat\n"; # "am" occurs in "Ram" and in "Shyam"
}
$string =~ s/(\w+) and (\w+)/$2 and $1/; # "Shyam and Ram are very poor"
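Captured text can also be taken directly from a match in list context, instead of reading $1 and $2 afterwards. A sketch with made-up strings covering list-context captures, a \1 back-reference that finds a doubled word, and $1/$2 in a replacement.

```perl
use strict;
use warnings;

# In list context a match returns the captured groups directly.
my $record = "Rupak scored 75 marks";
my ($name, $marks) = $record =~ /^(\w+)\D+(\d+)/;

# \1 inside the pattern refers back to the first group: find a doubled word.
my $text = "Perl is is fun";
my ($dup) = $text =~ /\b(\w+)\s+\1\b/;

# $1 and $2 in the replacement part of s/// swap two words.
(my $swapped = "Ram and Shyam") =~ s/(\w+) and (\w+)/$2 and $1/;

print "$name $marks | $dup | $swapped\n";
```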


Example 1

  • Validating user input:

print "Enter age (or 'q' to quit): ";
chomp(my $age = <STDIN>);
exit if ($age =~ /^q$/i);
if ($age =~ /\D/) {
    print "$age is a non-number!\n";
}


Example 2: validation contd.

  • The file has 2 columns, name and age, delimited by one or more spaces. It can also have blank lines or comment lines (starting with #).

open IN, $file or die "Cannot open $file: $!";
while (my $line = <IN>) {
    chomp $line;
    next if ($line =~ /^\s*$/ or $line =~ /^\s*#/); # skip blank and comment lines
    my ($name, $age) = split /\s+/, $line;
    print "The age of $name is $age. \n";
}
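Example 2 can be made self-contained by reading from an in-memory file handle (open on a reference to a string) instead of $file. A sketch with made-up sample data; unlike the slide, it strips leading white space before splitting, because split /\s+/ on an indented line would return an empty first field.

```perl
use strict;
use warnings;

# In-memory stand-in for the input file (made-up sample data).
my $data = <<'END';
# name   age
Rupak    21

Indranil   45
   Sruti 19
END

open(my $in, '<', \$data) or die "Cannot open in-memory data: $!";
my %age_of;
while (my $line = <$in>) {
    chomp $line;
    next if $line =~ /^\s*$/ or $line =~ /^\s*#/;   # skip blank and comment lines
    $line =~ s/^\s+//;                             # drop leading white space
    my ($name, $age) = split /\s+/, $line;
    $age_of{$name} = $age;
    print "The age of $name is $age.\n";
}
close($in);
```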


Some Special Variables


$&, $` and $'

  • What is $&?

– It represents the string matched by the last successful pattern match.

  • What is $`?

– It represents the string preceding whatever was matched by the last successful pattern match.

  • What is $'?

– It represents the string following whatever was matched by the last successful pattern match.


– Example:

$_ = 'abcdefghi';
/def/;
print "$`:$&:$'\n"; # prints abc:def:ghi


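  • So actually:

– $` represents the pre-match
– $& represents the match itself
– $' represents the post-match

  • In Perl versions before 5.20, using any of these three variables anywhere in a program slowed down every regular expression match.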


Associative Arrays


Introduction

  • Associative arrays are also known as hashes.

– Similar to a list.

  • Every element consists of a pair: a hash key and a value.
  • Hash keys must be unique.

– Accessing an element

  • Unlike an array, an element value is found by specifying its hash key.
  • Associative search.

– A hash name must begin with a ‘%’.


Specifying Hash Array

  • Two ways to specify:

– Specifying hash keys and values, in proper sequence:

%directory = (
    "Rabi", "258345",
    "Chandan", "325129",
    "Atul", "445287",
    "Sruti", "237221"
);


– Using the => operator:

%directory = (
    Rabi => "258345",
    Chandan => "325129",
    Atul => "445287",
    Sruti => "237221"
);

– A simple identifier on the left-hand side of ‘=>’ is automatically quoted, so Rabi is treated as the string "Rabi".


Conversion Array <=> Hash

  • An array can be converted to hash.

@list = qw (Rabi 258345 Chandan 325129 Atul 445287 Sruti 237221); %directory = @list;

  • A hash can be converted to an array:

@list = %directory; # key/value pairs, in no particular order


Accessing a Hash Element

  • Given the hash key, the value can be accessed using ‘{ }’.
  • Example:

@list = qw(Rabi 258345 Chandan 325129 Atul 445287 Sruti 237221);
%directory = @list;
print "Atul's number is $directory{Atul} \n";


Modifying a Value

  • By simple assignment:

@list = qw(Rabi 258345 Chandan 325129 Atul 445287 Sruti 237221);
%directory = @list;
$directory{Sruti} = "453322";
$directory{'Chandan'}++; # "325129" becomes 325130


Deleting an Entry

  • A (hash key, value) pair can be deleted from a hash using the ‘delete’ function.

– The hash key has to be specified.

@list = qw(Rabi 258345 Chandan 325129 Atul 445287 Sruti 237221);
%directory = @list;
delete $directory{Atul};


Swapping Keys and Values

  • Why needed?

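– Suppose we want to search for a person, given the phone number.

@list = qw(Rabi 258345 Chandan 325129 Atul 445287 Sruti 237221);
%directory = @list;
%revdir = reverse %directory; # phone numbers become the keys
print "$revdir{237221} \n"; # prints Sruti

– This works only if the values are unique; duplicate values would collapse into a single key.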


Using Functions ‘keys’, ‘values’

  • ‘keys’ returns all the hash keys as a list.
  • ‘values’ returns all the values as a list.
  • The order is unpredictable, but for an unchanged hash both functions return their elements in the same order.

@list = qw(Rabi 258345 Chandan 325129 Atul 445287 Sruti 237221);
%directory = @list;
@all_names = keys %directory;
@all_phones = values %directory;


An Example

  • List all person names and telephone numbers.

@list = qw(Rabi 258345 Chandan 325129 Atul 445287 Sruti 237221);
%directory = @list;
foreach $name (keys %directory) {
    print "$name \t $directory{$name} \n";
}
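A sketch of common hash tasks on the same directory: printing in sorted key order (hash order is unpredictable), testing for a key with exists before and after delete, and the reverse lookup. exists is an addition beyond the slides.

```perl
use strict;
use warnings;

my %directory = (Rabi => "258345", Chandan => "325129",
                 Atul => "445287", Sruti   => "237221");

# Hash order is unpredictable, so sort the keys for stable output.
my @names = sort keys %directory;
for my $name (@names) {
    print "$name \t $directory{$name}\n";
}

# exists tests whether a key is present in the hash.
my $had_atul = exists $directory{Atul} ? 1 : 0;
delete $directory{Atul};
my $has_atul = exists $directory{Atul} ? 1 : 0;

# Reverse lookup: phone number -> name (values must be unique).
my %revdir = reverse %directory;
print "237221 belongs to $revdir{237221}\n";
```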


Subroutines


Introduction

  • A subroutine:

– is a user-defined function;
– allows code reuse;
– is defined once and can be used multiple times.


How to use?

  • Defining a subroutine

sub test_sub {
    # the body of the subroutine goes here
    # ……..
}

  • Calling a subroutine

– Use the ‘&’ prefix to call a subroutine.

&test_sub;
&gcd($val1, $val2);    # Two parameters

– However, the ‘&’ is optional when the arguments are written in parentheses, e.g. test_sub(); or gcd($val1, $val2);
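A runnable sketch of both calling styles. The slides only show the call to gcd, so its body here is an assumption: a hypothetical implementation using Euclid's algorithm on non-negative integers.

```perl
use strict;
use warnings;

# Hypothetical gcd for illustration: Euclid's algorithm,
# reading its two arguments from @_
sub gcd {
    my ($x, $y) = @_;
    ($x, $y) = ($y, $x % $y) while $y != 0;
    return $x;
}

sub test_sub {
    print "test_sub called\n";
}

&test_sub;            # '&' form without parentheses
test_sub();           # the same call without '&'

my $val1 = 12;
my $val2 = 18;
print gcd($val1, $val2), "\n";     # prints 6
print &gcd($val1, $val2), "\n";    # prints 6
```

One subtlety: ‘&test_sub;’ written without parentheses hands the caller's current @_ to the subroutine instead of an empty list, so test_sub() is usually the safer habit.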


Subroutine Return Values

  • Use the ‘return’ statement.

– This is also optional.
– If the keyword ‘return’ is omitted, a Perl subroutine returns the value of the last expression evaluated.

  • A subroutine can also return a non-scalar value, such as a list.
  • Some examples are given next.
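Both points can be seen in one small sketch; the subroutine names larger and min_max are hypothetical. The first returns its last evaluated expression without ‘return’, and the second returns a two-element list that the caller unpacks.

```perl
use strict;
use warnings;

# No 'return': the value of the last expression evaluated is returned
sub larger {
    my ($p, $q) = @_;
    $p > $q ? $p : $q;
}

# Returning a list (a non-scalar); the caller can unpack it
sub min_max {
    my @sorted = sort { $a <=> $b } @_;
    return ($sorted[0], $sorted[-1]);
}

my $big = larger(3, 9);
my ($lo, $hi) = min_max(5, 10, -12, 7, 40);
print "$big $lo $hi\n";    # prints "9 -12 40"
```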

Example 1

$name = 'Indranil';
welcome();         # call the first sub
welcome_name();    # call the second sub
exit;

sub welcome {
    print "hi there\n";
}

sub welcome_name {
    print "hi $name\n";    # uses global $name variable
}


Example 2

# Return a non-scalar
sub return_alpha_and_beta {
    return ($alpha, $beta);
}

$alpha = 15;
$beta = 25;
@c = return_alpha_and_beta;    # @c gets (15, 25)


Passing Arguments

  • All arguments are passed into a Perl function through the special array @_.

– Thus, we can send as many arguments as we want.

  • Individual arguments can also be accessed as $_[0], $_[1], $_[2], etc. These are the elements of @_ and have nothing to do with the scalar variable $_.
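One consequence worth knowing: the elements of @_ are aliases for the caller's variables, so assigning to $_[0] changes the caller's copy, while copying @_ into ‘my’ variables does not. The subroutine names below are hypothetical.

```perl
use strict;
use warnings;

# $_[0] is an alias for the caller's first argument
sub double_in_place {
    $_[0] *= 2;
}

# @_ can hold any number of arguments
sub count_args {
    return scalar @_;
}

my $n = 21;
double_in_place($n);
print "$n\n";                          # prints 42
print count_args(1, 2, 3, 4), "\n";    # prints 4
```

Passing a literal instead, as in double_in_place(21), dies with a "Modification of a read-only value attempted" error.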


Example 3

# Two different ways to write a subroutine to add two numbers
sub add_ver1 {
    ($first, $second) = @_;
    return ($first + $second);
}

sub add_ver2 {
    return $_[0] + $_[1];    # $_[0] and $_[1] are the first two
                             # elements of @_
}


Example 4

$total = find_total(5, 10, -12, 7, 40);    # $total gets 50

sub find_total {
    # adds all numbers passed to the sub
    $sum = 0;
    for $num (@_) {
        $sum += $num;
    }
    return $sum;
}
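For comparison, a sketch of the same subroutine written with ‘my’ variables under ‘use strict’, next to the ready-made ‘sum’ function from List::Util, a module that ships with Perl. Note that List::Util's sum returns undef for an empty list, whereas find_total returns 0.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Same logic as find_total above, with lexical variables
sub find_total {
    my $sum = 0;
    $sum += $_ for @_;
    return $sum;
}

my $total = find_total(5, 10, -12, 7, 40);
print "$total\n";                       # prints 50
print sum(5, 10, -12, 7, 40), "\n";     # prints 50
```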


‘my’ variables

  • We can define local variables using the ‘my’ keyword.

– It confines a variable to a region of code (the enclosing block { }).
– A ‘my’ variable’s storage is freed when the variable goes out of scope, provided nothing else still refers to it.
– Variables not declared with ‘my’ are global (package) variables by default.
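A small sketch of nested scopes: each ‘my’ creates a new variable that hides any outer one of the same name until its block ends. The global is declared with ‘our’ because ‘use strict’ requires every variable to be declared; the subroutine name show_scope is hypothetical.

```perl
use strict;    # every variable must be declared
use warnings;

our $count = 10;        # a package ("global") variable

sub show_scope {
    my $count = 1;      # lexical: hides the global inside this sub only
    {
        my $count = 2;  # another new variable, visible only in this block
        print "inner block: $count\n";   # prints 2
    }
    print "sub body: $count\n";          # prints 1
    return $count;
}

my $seen = show_scope();
print "global: $count\n";                # prints 10
```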


Example 5

$sum = 7;
$total = add_any(20, 10, -15);    # $total gets 15

sub add_any {
    # local variable, won't interfere
    # with global $sum
    my $sum = 0;
    for my $num (@_) {
        $sum += $num;
    }
    return $sum;
}


Writing CGI Scripts in Perl


Introduction

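  • Perl provides a number of facilities that make it easier to write CGI scripts.

– Standard library modules, in particular CGI.pm.

  • Up to Perl 5.20, CGI.pm was included in the Perl distribution as a core module, so no separate installation was needed.
  • From Perl 5.22 onward it is no longer part of the core distribution and must be installed separately, for example from CPAN.

#!/usr/bin/perl
use CGI qw(:standard);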


  • Some of the functions provided by the CGI.pm module (named just ‘CGI’ in the use statement) are:

– header

  • Returns the “Content-type” header, which the script then prints.
  • With no arguments, the type is assumed to be “text/html”.

– start_html

  • Returns the opening <html>, <head>, <title> and <body> tags.
  • Accepts optional arguments, such as the page title.

– end_html

  • Returns the closing HTML tags, </body> and </html>.

  • Typical usages and arguments are illustrated through the examples that follow.


Example 1 (without using CGI.pm)

#!/usr/bin/perl
print <<TO_END;
Content-type: text/html

<HTML>
<HEAD> <TITLE> Server Details </TITLE> </HEAD>
<BODY>
Server name: $ENV{SERVER_NAME} <BR>
Server port number: $ENV{SERVER_PORT} <BR>
Server protocol: $ENV{SERVER_PROTOCOL}
</BODY>
</HTML>
TO_END


Example 2 (using CGI.pm)

#!/usr/bin/perl -wT
use CGI qw(:standard);
print header("text/html");
print start_html("Hello World");
print "<h2>Hello, world!</h2>\n";
print end_html;


Example 3: Decoding Form Input

sub parse_form_data {
    my %form_data;
    my $name_value;
    # GET data arrives in the QUERY_STRING environment variable
    my @nv_pairs = split /&/, ($ENV{QUERY_STRING} || '');
    # POST data arrives on standard input; CONTENT_LENGTH gives its size
    if ( $ENV{REQUEST_METHOD} eq 'POST' ) {
        my $query = "";
        read (STDIN, $query, $ENV{CONTENT_LENGTH});
        push @nv_pairs, split /&/, $query;
    }

    foreach $name_value (@nv_pairs) {
        # split on the first '=' only, so a value may itself contain '='
        my ($name, $value) = split /=/, $name_value, 2;
        # '+' encodes a space and %XX encodes one byte in hex
        $name =~ tr/+/ /;
        $name =~ s/%([\da-f][\da-f])/chr (hex($1))/egi;
        $value =~ tr/+/ /;
        $value =~ s/%([\da-f][\da-f])/chr (hex($1))/egi;
        $form_data{$name} = $value;
    }
    return %form_data;
}
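The decoding rules can be tested without a web server by feeding a query string directly. This is a minimal sketch of the same logic as parse_form_data; the helper names url_decode and parse_query, and the sample query string, are our own illustrations, not library functions.

```perl
use strict;
use warnings;

# Hypothetical helper: decode one URL-encoded component
# ('+' means a space, %XX is one byte given in hex)
sub url_decode {
    my ($s) = @_;
    $s = '' unless defined $s;    # a field with no '=' has no value
    $s =~ tr/+/ /;
    $s =~ s/%([\da-f][\da-f])/chr(hex($1))/egi;
    return $s;
}

# Hypothetical helper: parse a string like $ENV{QUERY_STRING}
sub parse_query {
    my ($query) = @_;
    my %form_data;
    for my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        $form_data{ url_decode($name) } = url_decode($value);
    }
    return %form_data;
}

my %form = parse_query('name=Rabi+Kumar&city=Kharagpur%21&expr=a%3Db&flag');
print "$_ = $form{$_}\n" for sort keys %form;
```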


Using CGI.pm

  • The decoded form value can be directly accessed as:

$value = param('fieldname');

  • Perl code equivalent to the last example, using CGI.pm, is shown next.


Example 4

#!/usr/bin/perl -wT
use CGI qw(:standard);
my %form_data;
foreach my $name (param()) {
    $form_data{$name} = param($name);
}


Example 5: sending mail

#!/usr/bin/perl -wT
use CGI qw(:standard);
print header;
print start_html("Response to Guestbook");

$ENV{PATH} = "/usr/sbin";    # to locate sendmail
# Under -T, Perl also refuses to run external programs while these
# hold tainted values; deleting them is the usual precaution
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

# open the pipe to sendmail
open (MAIL, "| /usr/sbin/sendmail -oi -t")
    or die "Cannot start sendmail: $!";
my $recipient = 'xyz@hotmail.com';
print MAIL "To: $recipient\n";
print MAIL "From: isg\@cse.iitkgp.ac.in\n";
print MAIL "Subject: Submitted data\n\n";

foreach my $xyz (param()) {
    print MAIL "$xyz = ", scalar(param($xyz)), "\n";
}
close (MAIL);

print <<EOM;
<h2>Thanks for the comments</h2>
<p>Hope you visit again.</p>
EOM
print end_html;