Practical Extraction and Report Language (Perl)



SLIDE 1

VI, March 2005

Practical Extraction and Report Language
« Perl is a language for getting your job done » (Larry Wall)
« There is more than one way to do it »

SLIDE 2

Practical Extraction and Report Language

http://perl.oreilly.com

" Perl is both a programming language and an application on your computer that runs those programs "

SLIDE 3

Perl history

A few dates:

1969 UNIX was born at Bell Labs.
1970 Brian Kernighan suggested the name "Unix" and the operating system we know today was born.
1972 The programming language C is born at the Bell Labs (C is one of Perl's ancestors).
1973 "grep" is introduced by Ken Thompson as an external utility: Global REgular expression Print.
1976 Steven Jobs and Steven Wozniak found Apple Computer (1 April).
1977 The computer language awk is designed by Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan (awk is one of Perl's ancestors).

SLIDE 4

Perl history

1987 Perl 1.000 is unleashed upon the world

NAME
    perl - Practical Extraction and Report Language
SYNOPSIS
    perl [options] filename args
DESCRIPTION
    Perl is a interpreted language optimized for scanning arbitrary text files, extracting information from those text files, and printing reports based on that information. It's also a good language for many system management tasks. The language is intended to be practical (easy to use, efficient, complete) rather than beautiful (tiny, elegant, minimal). It combines (in the author's opinion, anyway) some of the best features of C, sed, awk, and sh, so people familiar with those languages should have little difficulty with it (Language historians will also note some vestiges of csh, Pascal, and even BASIC-PLUS). Expression syntax corresponds quite closely to C expression syntax. If you have a problem that would ordinarily use sed or awk or sh, but it exceeds their capabilities or must run a little faster, and you don't want to write the silly thing in C, then perl may be for you. There are also translators to turn your sed and awk scripts into perl scripts. OK, enough hype.

SLIDE 5

Perl history

1994 Perl5: last major release (Currently Perl 5.8.6).
1996 Creation of the CPAN repository of modules and documentation (Comprehensive Perl Archive Network).
2005 Perl 5.8.6

Supported Operating Systems: Unix systems / Macintosh (OS 7-9 and X) / Windows / VMS

Perl Features

Perl's database integration interface (DBI) supports third-party databases including Oracle, Sybase, Postgres, MySQL and others.
Perl works with HTML, XML, and other markup languages.
Perl supports Unicode.
Perl is Y2K compliant.
Perl supports both procedural and object-oriented programming.
Perl interfaces with external C/C++ libraries through XS or SWIG.
Perl is extensible. There are over 500 third-party modules available from CPAN.

SLIDE 6

Perl history

Perl and the Web

Perl is the most popular web programming language due to its text manipulation capabilities and rapid development cycle.
Perl's CGI.pm module, part of Perl's standard distribution, makes handling HTML forms simple.
Perl can handle encrypted Web data, including e-commerce transactions.
Perl can be embedded into web servers (mod_perl) to speed up processing by as much as 2000%.
Perl's DBI package makes web-database integration easy.

SLIDE 7

Perl Hello world !

My first program (hello.pl):

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    #tell the program to print "Hello world"
    print "Hello world" ;
    #tell the program to exit
    exit ;

The first line of a Perl program is called the "command interpretation" or "shebang" line. It starts with "#!" followed by the path to the Perl interpreter, and tells the computer that this is a Perl program. To find out whether you should use /usr/bin/perl OR /usr/local/bin/perl, type "which perl" in your shell:

    computerX: vioannid$ which perl
    /usr/bin/perl
    computerY: vioannid$ which perl
    /usr/local/bin/perl

SLIDE 8

Perl Hello world !

My first program (hello.pl):

use strict;

A command like "use strict" is called a pragma. Pragmas are instructions to the Perl interpreter to do something special when it runs your program. "use strict" does two things that make it harder to write bad software: it makes you declare all your variables, and it makes it harder for Perl to mistake your intentions when you are using subroutines.

ALL STATEMENTS END IN A SEMICOLON ";" (similar to the use of the period "." in the English language)


SLIDE 9

Perl Hello world !

My first program (hello.pl):


use warnings;

Comments are good, but the most important tool for writing good Perl is the "warnings" pragma. Turning on warnings will make Perl yelp and complain at a huge variety of things that are almost always sources of bugs in your programs. Perl normally takes a relaxed attitude toward things that may be problems: it assumes that you know what you're doing, even when you don't…

SLIDE 10

Perl Hello world !

My first program (hello.pl):


Comments

All lines starting with "#" are not taken into account in the execution of the program. Good comments are short, but instructive. They tell you things that aren't clear from reading the code. Blank lines or spaces are also not taken into account in the execution of the program. However, they help in the reading of the code.

SLIDE 11

Perl Hello world !

My first program (hello.pl):


Print statement: … prints ! By default, the standard output is the shell window from which the program is executed.

ALL STATEMENTS END IN A SEMICOLON ";" (similar to the use of the period "." in the English language)

SLIDE 12

Perl Hello world !

My first program (hello.pl):


The exit statement: tells the computer to exit the program. Although not explicitly required in Perl, it is definitely common.

SLIDE 13

Perl Hello world !

My first program (hello.pl):


Do not forget to make the file executable:

    vioannid$ chmod a+x perl_01.pl

Output (the program prints no newline, so the shell prompt follows directly):

    vioannid$ ./perl_01.pl
    Hello worldvioannid$
SLIDE 14

Perl Hello world !! Print:

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    #play with the print statement
    #words separated by newline
    print "Hello\nworld\n" ;
    #words separated by tabs & a final newline
    print "Hello\tworld\n" ;
    #usage of the period to cat strings
    print "Hello"."world"."\n";
    #tell the program to exit
    exit ;

Output:

    vioannid$ ./perl_02.pl
    Hello
    world
    Hello	world
    Helloworld
    vioannid$

Important: the end-of-line character differs between systems:
Unix & all Unix flavors (including Mac OS X): \n
Mac OS (classic, up to version 9): \r
Windows: \r\n

SLIDE 15

Perl variables

Perl has 3 data types: scalars / arrays / hashes

scalars

A scalar is a single string (of any size, limited only by the available memory), or a number, or a reference to something. Scalar values are always named with '$' (even when referring to a scalar that is part of an array or a hash). The '$' symbol works semantically like the English word "the" in that it indicates a single value is expected.

    my $variable_1 = "Hello world !\n"; #note the quotes
    my $variable_two = 30;              #note the absence of quotes
    $marks[4]                           #the fifth element of the array "marks" (a single value, hence '$')

SLIDE 16

Perl variables

Perl has 3 data types: scalars / arrays / hashes

arrays (of scalars)

Normal arrays are ordered lists of scalars indexed by number (starting with 0). Entire arrays are denoted by '@', which works much like the word "these" or "those" does in English, in that it indicates multiple values are expected.

    my @numbers = ("One", "Two", "Three", "Four", "Five");
    @numbers = (1..5); #same as "@numbers = (1, 2, 3, 4, 5);"
    #"my" declares a variable once; single elements are assigned without it
    $numbers[0] = "One";
    $numbers[1] = "Two";
    …
    my @anyarray = (6, "hello", @numbers);

    index   0     1     2       3      4
    value   One   Two   Three   Four   Five
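A minimal sketch of how indexing works on the @numbers array above. The variable names other than @numbers and @anyarray are made up for illustration.

```perl
use strict;
use warnings;

my @numbers = ("One", "Two", "Three", "Four", "Five");

# A single element is a scalar, so it is written with '$'
my $first = $numbers[0];             # "One"
my $fifth = $numbers[4];             # "Five"

# $#numbers is the index of the last element; scalar(@numbers) is the length
my $last_index = $#numbers;          # 4
my $count      = scalar(@numbers);   # 5

# An array placed inside a list is flattened into that list
my @anyarray = (6, "hello", @numbers);

print "first=$first fifth=$fifth last_index=$last_index count=$count\n";
print "anyarray has ", scalar(@anyarray), " elements\n";
```

Note that @anyarray ends up with seven plain scalars, not with a nested array.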

SLIDE 17

Perl variables

Perl has 3 data types:

hashes (associative arrays of scalars)

Hashes are unordered collections of scalar values indexed by their associated string key. Entire hashes are denoted by '%'

    my %var = ("a","first","b","3");
    my %codon3 = (
        "TTT" => "Phe",
        "TTA" => "Leu",
    );
    print $codon3{'TTT'};

    Key   Value
    TTT   Phe
    TCT   Ser
    TGT   Cys
    TAT   Tyr
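As a rough illustration of the key/value table above, the following sketch builds the hash, looks up one key, and lists all pairs. Sorting the keys is a choice made here only to get a reproducible order, since hashes are unordered.

```perl
use strict;
use warnings;

my %codon3 = (
    "TTT" => "Phe",
    "TCT" => "Ser",
    "TGT" => "Cys",
    "TAT" => "Tyr",
);

# Look up one value by its key (a single value, hence '$')
print "TTT codes for $codon3{'TTT'}\n";

# exists() tests for a key without creating it
print "TTA is not in the table\n" unless exists $codon3{'TTA'};

# Hashes are unordered, so sort the keys for a reproducible listing
foreach my $codon (sort keys %codon3) {
    print "$codon => $codon3{$codon}\n";
}
```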

SLIDE 18

Perl special variables (small extract)

$_      The default input and pattern-searching space.
$&      The string matched by the last successful pattern match.
$`      The string preceding whatever was matched by the last successful pattern match.
$'      The string following whatever was matched by the last successful pattern match.
$!      If a system or library call fails, it sets this variable. This means that the value of $! is meaningful only immediately after a failure.
$/      The input record separator, newline by default.
$$      The process number of the Perl running this script.
@ARGV   The command-line arguments (separated by spaces by default). Note: $ARGV[0] is the first command-line argument …
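A short sketch showing a few of these variables in use. The word list and the file path are invented for the example, and the three-argument open with a lexical filehandle is a choice made here, not something taken from the slides.

```perl
use strict;
use warnings;

# $_ is the default variable of foreach and of pattern matching
my @hits;
foreach ("thioredoxin", "kinase", "peroxiredoxin") {
    push @hits, $_ if /redoxin/;    # same as: $_ =~ /redoxin/
}
print "matched: @hits\n";

# $! is only meaningful right after a failed system call
my $missing = "/no/such/dir/no_such_file.txt";    # hypothetical path
if (!open(my $fh, "<", $missing)) {
    print "Can't open $missing: $!\n";
}

# $$ is the process id, @ARGV holds the command-line arguments
print "process $$ received ", scalar(@ARGV), " argument(s)\n";
print "first argument: $ARGV[0]\n" if @ARGV;
```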

SLIDE 19

Perl variables

Programs using variables :

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my $name = "John Doe";
    print "Hello $name !\n" ;
    exit ;

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my $name = $ARGV[0];
    print "Hello $name !\n" ;
    exit ;

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    print "\nEnter your name (then press \"return\" when done):\t";
    #get information from the terminal window
    my $name = <STDIN>;
    print "Hello $name !\n" ;
    exit ;

Interpolation & quoting: single and double quotes have different meanings …

    my $price = '$100';            #single quotes: the text is taken literally
    print "the price is $price";   #double quotes: $price is replaced by its value; this is called interpolation …
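To make the difference between the two kinds of quotes concrete, here is a small sketch. The variable names $double, $single and $escape are illustrative only.

```perl
use strict;
use warnings;

my $price = '$100';    # single quotes: the text is taken literally

my $double = "the price is $price";                # $price is interpolated
my $single = 'the price is $price';                # no interpolation
my $escape = "the variable \$price holds $price";  # \$ keeps a literal dollar sign

print "$double\n";    # the price is $100
print "$single\n";    # the price is $price
print "$escape\n";    # the variable $price holds $100
```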

SLIDE 20

Perl variables

Program using variables :

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my @names = ("Pedro", "Claire", "Yemima", "Fabien" , "RochPhilippe",
                 "Francisco", "Sandra Yukie", "Simona", "Christophe", "Dominique",
                 "Michaela", "Lionel", "Gabriele", "Michael", "Charlotte", "Subhash",
                 "Adam", "Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Uta",
                 "Viviane", "Stanislav", "Kyrill", "Petr", "Sebastien");
    print "Hello\n @names !\n" ;
    exit ;

Some array functions:
sort             returns the elements of an array in sorted order.
reverse          returns the elements of an array in reverse order.
shift, unshift   shift removes and returns the first element; unshift inserts an element at the first position of the array.
pop, push        pop removes and returns the last element; push appends an element at the last position of the array.
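The array functions listed above can be sketched as follows, using a shortened version of the name list. The extra names "Uta" and "Joel" are taken from the same slide; the result variables are illustrative.

```perl
use strict;
use warnings;

my @names = ("Pedro", "Claire", "Yemima", "Fabien");

# sort and reverse return new lists; @names itself is unchanged
my @sorted   = sort @names;       # Claire Fabien Pedro Yemima
my @reversed = reverse @names;    # Fabien Yemima Claire Pedro

# shift/unshift work on the front of the array, in place
my $first = shift @names;         # removes "Pedro"
unshift @names, "Uta";            # Uta Claire Yemima Fabien

# pop/push work on the end of the array, in place
my $last = pop @names;            # removes "Fabien"
push @names, "Joel";              # Uta Claire Yemima Joel

print "sorted:   @sorted\n";
print "reversed: @reversed\n";
print "removed:  $first, $last\n";
print "now:      @names\n";
```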

SLIDE 21

Perl statement modifiers

Any simple statement may optionally be followed by a SINGLE modifier, just before the terminating semicolon (or block ending). The possible modifiers are:

    STATEMENT if EXPR;
    STATEMENT unless EXPR;
    STATEMENT while EXPR;
    STATEMENT until EXPR;
    STATEMENT foreach LIST;

The block forms "if (EXPR) { }", "unless (EXPR) { }", "while (EXPR) { }", "until (EXPR) { }" and "foreach (LIST) { }" are the corresponding compound statements, shown on the following slides.

The EXPR following the modifier is referred to as the "condition". Its truth or falsehood determines how the modifier will behave.

if executes the statement once if and only if the condition is true.
unless is the opposite: it executes the statement if the condition is false (unless the condition is true).
The foreach modifier is an iterator: it executes the statement once for each item in the LIST (with $_ aliased to each item in turn).
while repeats the statement while the condition is true.
until does the opposite: it repeats the statement until the condition is true (or while the condition is false).

The while and until modifiers have the usual "while loop" semantics (conditional evaluated first).
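A minimal sketch of each modifier in its trailing form. The variables @out and $n are made up for the example.

```perl
use strict;
use warnings;

my @out;
my $n = 3;

push @out, "n is positive" if $n > 0;         # runs once: the condition is true
push @out, "n is not zero" unless $n == 0;    # runs once: the condition is false
push @out, "item $_" foreach (1 .. 3);        # runs once per item, $_ set to each in turn

$n-- while $n > 0;                            # repeats while the condition is true
$n++ until $n >= 2;                           # repeats until the condition is true

push @out, "n ends at $n";
print "$_\n" foreach @out;
```

After the two loops $n has gone from 3 down to 0 and back up to 2.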

SLIDE 22

Perl statement modifiers

if / if else / if elsif else

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    print "\nEnter your name (then press \"return\" when done):\t";
    #get information from the terminal window
    my $name = <STDIN>;
    #remove trailing "\n" if any
    chomp $name;
    if ($name eq "Couchepin") {
        print "Hello Mr President !\n" ;
    } else {
        print "Hello $name !\n" ;
    }
    exit ;

SLIDE 23

Perl statement modifiers

if / if else / if elsif else (name.pl) :

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    print "\nEnter your name (then press \"return\" when done):\t";
    #get information from the terminal window
    my $name = <STDIN>;
    #remove trailing "\n" if any
    chomp $name;
    if ($name eq "Couchepin") {
        print "Hello Mr President !\n" ;
    } elsif ($name eq "Falquet") {
        print "Good day to you Master $name !\n" ;
    } else {
        print "Hello $name !\n" ;
    }
    exit ;

SLIDE 24

Perl statement modifiers

Perl looping the for/foreach loop :

"Passing an array":

    foreach my $element ( @array ) {
        # do something with the element
    }

"Passing a hash":

    foreach my $key (keys %hash) {
        print "The value of $key is $hash{$key}\n";
    }

"Specify 3 EXPR inside the (): initial state, condition and loop expression":

    for (my $i = 0; $i <= 10; $i = $i + 1) {
        #execute the contents of the block as long as $i is less than, or equal to 10
    }

SLIDE 25

Perl statement modifiers

Perl looping the for/foreach loop :

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my @names = ("Pedro", "Claire", "Yemima", "Fabien" , "RochPhilippe",
                 "Francisco", "Sandra Yukie", "Simona", "Christophe", "Dominique",
                 "Michaela", "Lionel", "Gabriele", "Michael", "Charlotte", "Subhash",
                 "Adam", "Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Uta",
                 "Viviane", "Stanislav", "Kyrill", "Petr", "Sebastien", "Haleh");
    foreach my $name (@names) {
        print "Hello $name !\n";
    }
    exit ;

SLIDE 26

Perl statement modifiers

Perl looping the for/foreach loop :

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my $counter;
    for ($counter=1;$counter<=10;$counter++){
        print "I can count up to $counter !\n";
    }
    exit ;

SLIDE 27

Perl statement modifiers

Perl looping the while loop

    while ( condition ) {
        #execute the contents of the block
    }

ATTENTION: Infinite Loop !!!

    while (1) {
        #execute the contents of the block forever !
    }

True/False

In Perl some values are considered true:

  • number with a nonzero value
  • string with nonzero length (except the string "0", which is false)
  • array with at least one element
  • hash with at least one key/value pair

For example:

    my $lang = "Perl";          # <- true
    my $version = 5.6;          # <- true
    my $zero = 0;               # <- false
    my $empty = "";             # <- false
    my @states = ();            # <- false
    my %table = (1 => "one");   # <- true
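The rules above can be checked with a small sketch. The helper truth() is hypothetical and only reports how Perl evaluates a scalar in a condition.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how Perl evaluates a scalar in a condition
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

print 'truth("Perl") = ', truth("Perl"), "\n";    # non-empty string
print 'truth(5.6)    = ', truth(5.6),    "\n";    # non-zero number
print 'truth(0)      = ', truth(0),      "\n";    # zero
print 'truth("")     = ', truth(""),     "\n";    # empty string
print 'truth("0")    = ', truth("0"),    "\n";    # the one non-empty false string
print 'truth("0.0")  = ', truth("0.0"),  "\n";    # a string other than "0", so true

# Arrays and hashes are tested by their size
my @states = ();
my %table  = (1 => "one");
print "states is empty\n"   unless @states;
print "table has entries\n" if %table;
```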

SLIDE 28

Perl statement modifiers

Perl looping the while loop

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my $number = 1;
    while ($number<=10) {
        print "I can count up to $number !";
    }
    exit ; #really ?

$number never changes inside the block, so the condition stays true and the loop never ends.

Tip: To stop a "looping" script press CTRL+C …

    #!/usr/bin/perl
    use strict;
    use warnings;
    my $number = 1;
    while ($number<=10) {
        print "I can count up to $number !";
        $number+=1; #Ha !
    }
    exit ;

SLIDE 29

Perl statement modifiers

Perl looping while loop / do until

while loop: the condition is tested first, so "Activity" may never be executed.
do until: the condition is tested last, so "Activity" is executed at least once !

SLIDE 30

Perl operators

Perl operators

Arithmetic

+    addition
-    subtraction
*    multiplication
/    division

Numeric comparison

==   equality
!=   inequality
<    less than
>    greater than
<=   less than or equal
>=   greater than or equal

String comparison

eq   equality
ne   inequality
lt   less than
gt   greater than
le   less than or equal
ge   greater than or equal

Why do we have separate numeric and string comparisons?

Because we don't have special variable types, and Perl needs to know whether to sort numerically (where 99 is less than 100) or alphabetically (where 100 comes before 99).
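The sorting point can be illustrated with a short sketch. The operators <=> (numeric) and cmp (string) are the three-way counterparts of the comparison operators listed above; they are not on the slide, and the value list is made up.

```perl
use strict;
use warnings;

my @values = (100, 99, 9, 1000);

# <=> compares as numbers, cmp compares as strings
my @numeric = sort { $a <=> $b } @values;    # 9 99 100 1000
my @string  = sort { $a cmp $b } @values;    # 100 1000 9 99

print "numeric order: @numeric\n";
print "string order:  @string\n";
```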

SLIDE 31

Perl operators

Perl operators

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my $x = 100;
    my $y = 99;
    if ($x > $y) {
        print "\"$x\" is numerically greater than \"$y\"\n" ;
    } else {
        print "\"$x\" is numerically smaller than \"$y\"\n" ;
    }
    if ($x gt $y) {
        print "\"$x\" is alphabetically greater than \"$y\"\n" ;
    } else {
        print "\"$x\" is alphabetically smaller than \"$y\"\n" ;
    }
    exit ;

Output:

    "100" is numerically greater than "99"
    "100" is alphabetically smaller than "99"

SLIDE 32

Perl operators

Perl operators

Boolean logic

&&   and
||   or
!    not

Miscellaneous

=    assignment
.    string concatenation
x    string multiplication
..   range operator (creates a list of numbers)

Many operators can be combined with a "=" as follows:

    $a += 1;      # same as $a = $a + 1, same as $a++
    $a -= 1;      # same as $a = $a - 1, same as $a--
    $a .= "\n";   # same as $a = $a . "\n";
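A brief sketch of the miscellaneous and combined operators. All variable names are illustrative.

```perl
use strict;
use warnings;

my $line  = "-" x 10;                    # string multiplication: ten dashes
my @range = (1 .. 5);                    # range: 1 2 3 4 5
my $text  = "Hello" . " " . "world";     # concatenation

my $count = 1;
$count += 4;      # 5
$count -= 2;      # 3
$count *= 10;     # 30
$text  .= " !";   # "Hello world !"

# && evaluates its right side only when the left side is true
my $ok = ($count > 10 && @range == 5) ? "yes" : "no";

print "$line\n$text\n@range\ncount=$count ok=$ok\n";
```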

SLIDE 33

Perl functions

Functions in Perl are called subroutines.

Functions save you from typing the same code over and over, and they make scripts clearer. Many functions are already built into Perl:

http://search.cpan.org/~nwclark/perl-5.8.6/pod/perlfunc.pod

Syntax of Perl subroutines (the arguments arrive in the special array @_):

    sub name {
        my (list of arguments) = @_;
        list of statements to execute
        return some value;
    }
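As a minimal sketch of this syntax, the hypothetical subroutine average() below takes its arguments from @_ and returns a single value.

```perl
use strict;
use warnings;

# Hypothetical subroutine: arguments arrive in @_, not in a parameter list
sub average {
    my @values = @_;
    return 0 unless @values;    # avoid dividing by zero on an empty list
    my $sum = 0;
    $sum += $_ foreach @values;
    return $sum / @values;      # @values in numeric context is its length
}

print "average: ", average(2, 4, 9), "\n";    # 5
```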

SLIDE 34

Perl functions

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my $height = 220;
    my $weight = 120;
    #to calculate the BFI you need the height in cm and the weight in kg
    my $bfi = &cal($height, $weight);
    print "$bfi\n";
    exit;

    sub cal {
        if (@_ != 2) {
            die "&cal should get exactly two arguments!\n" ;
        }
        my ($cm, $kg) = @_ ;
        my $index = ($kg)/(($cm / 100)*($cm / 100));
        return $index;
    }

Output:

    24.7933884297521

Notice on Body Fat Index (BFI):
BFI < 20 => weight is too low
20 < BFI < 25 => weight is correct
BFI > 25 => Oops !

SLIDE 35

Perl functions

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my @names = ("Pedro", "Claire", "Yemima", "Fabien", "Uta");
    foreach (@names) {
        my $size = length($_);
        #the comma separates the items of the list given to print
        print "*"x($size+2),"\n";
        print "*$_*\n";
        print "*"x($size+2),"\n";
    }
    exit ;

Output:

    *******
    *Pedro*
    *******
    ********
    *Claire*
    ********
    ********
    *Yemima*
    ********
    ********
    *Fabien*
    ********
    *****
    *Uta*
    *****

    my @names1 = ("Pedro", "Claire", "Yemima", "Fabien" ,"Uta");
    my @names2 = ("Sandra Yukie", "Simona", "Christophe", "Dominique");
    my @names3 = ("Lionel", "Michael", "Charlotte", "Subhash", "Adam");
    my @names4 = ("Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Viviane");
    my @names5 = ("Stanislav", "Kyrill", "Petr", "Sebastien", "Haleh");

What if you need this "pretty print" more than once ?

SLIDE 36

Perl functions

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    my @names1 = ("Pedro", "Claire", "Yemima", "Fabien" ,"Francisco");
    my @names2 = ("Sandra Yukie", "Simona", "Christophe", "Dominique", "Michaela");
    my @names3 = ("Lionel", "Gabriele", "Michael", "Charlotte", "Subhash", "Adam");
    my @names4 = ("Sebastian", "Tu", "Sergey", "Olusegun", "Joel", "Uta", "Viviane");
    my @names5 = ("Stanislav", "Kyrill", "Petr", "Sebastien", "Haleh");
    &pretty_print(@names1);
    &pretty_print(@names2);
    &pretty_print(@names3);
    &pretty_print(@names4);
    &pretty_print(@names5);
    exit ;

    sub pretty_print {
        foreach (@_) {
            my $size = length($_);
            print '*'x($size+2),"\n";
            print "*$_*\n";
            print '*'x($size+2),"\n";
        }
    }

SLIDE 37

Perl File handles

A "file handle" is a connection between your Perl script and the outside world.

You can open a file for input or output using the open() function.

    open(INFILE, "input.txt") or die "Can't open input.txt: $!";
    open(OUTFILE, ">output.txt") or die "Can't open output.txt: $!";
    open(LOGFILE, ">>logfile") or die "Can't open logfile: $!";

A leading ">" opens the file for writing (overwriting it), ">>" opens it for appending.

print() can also take an optional first argument specifying which filehandle to print to:

    print STDERR "This is your final warning\n";
    print OUTFILE $record;
    print LOGFILE $logmessage;

Use whatever filehandle name you like, EXCEPT the reserved names STDIN, STDOUT and STDERR !
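The write, append and read modes can be tried together in one sketch. It uses a scratch file from the core File::Temp module so that no real data is touched, and the three-argument form of open(), which is a swap-in here for the two-argument form shown on the slide. The record texts are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file so the sketch does not touch real data
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# ">" truncates and writes, ">>" appends
open(OUTFILE, ">", $path) or die "Can't open $path: $!";
print OUTFILE "first record\n";
close OUTFILE;

open(LOGFILE, ">>", $path) or die "Can't open $path: $!";
print LOGFILE "appended record\n";
close LOGFILE;

# Read the file back, one line per loop iteration
open(INFILE, "<", $path) or die "Can't open $path: $!";
my @lines;
while (my $line = <INFILE>) {
    chomp $line;
    push @lines, $line;
}
close INFILE;

print scalar(@lines), " lines: @lines\n";
```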

SLIDE 38

Perl File handles

Perl special file handles

There are three connections that always exist and are always "open" when your program starts: STDIN, STDOUT, and STDERR. Actually, these names are file handles. File handles are variables used to manipulate files.

STDIN reads from standard input, which is usually the keyboard in a normal Perl script (or input from a browser in a CGI script; Cgi-lib.pl reads from this automatically).

STDOUT (Standard Output) and STDERR (Standard Error) by default write to a console (or a browser in CGI).

We have been using the STDOUT file handle without knowing it for every print() statement during this presentation. The print() function uses STDOUT as the default if no other file handle is specified.
SLIDE 39

Perl File handles

You can read from an open filehandle using the "<>" operator.

In scalar context it reads a single line (or a single record) from the filehandle, and in list context it reads the whole file in, assigning each line to an element of the list:

    my $line = <INFILE>;
    my @lines = <INFILE>;

Reading in the whole file at one time is called slurping. It can be useful but it may be a memory hog. Most text file processing can be done a line at a time with Perl's looping constructs.

The "<>" operator is most often seen in a while loop:

    while (<INFILE>) { # assigns each line in turn to $_
        print "Just read in this line: $_";
    }

When you're done with your filehandles, you should close() them (though Perl will clean up after you if you forget…):

    close INFILE;

You can replace the default record separator "\n" with something else:

    $/ = "\/\/\n"; # for a file containing SwissProt entries
    $/ = ">";      # for a fasta file
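A sketch of reading fasta-style records by changing $/. The data string is invented, and it is read through an in-memory filehandle (opening a reference to a string) so that the example needs no file on disk.

```perl
use strict;
use warnings;

# Made-up fasta-like data held in a string, read through an in-memory filehandle
my $data = ">seq1\nVKLIESKE\nAFQEALAA\n>seq2\nVKQIESKY\n";
open(my $fh, "<", \$data) or die "Can't open in-memory file: $!";

my %sequence;
{
    local $/ = ">";                # one record per ">"; restored at the end of the block
    while (my $record = <$fh>) {
        chomp $record;             # chomp removes the current $/, here a trailing ">"
        next if $record eq "";     # the text before the first ">" is empty
        my ($header, @parts) = split /\n/, $record;
        $sequence{$header} = join "", @parts;
    }
}
close $fh;

print "$_ $sequence{$_}\n" foreach sort keys %sequence;
```

Using local inside a block keeps the change to $/ from affecting any other reads in the program.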

SLIDE 40

Perl regular expressions

Idea: powerful way to search for text patterns …

    >sw:THIO_RAT/110
    VKLIESKEAFQEALAAAGDKLVVVDFSATWCGPCKMIKPFFHSLCDKY ……
    >te:CB530525/66168
    VKQIESKYAFQEALNSAGEKLVVVDFSATWCGPCKMIKPFFHSLSEKY ……
    >tr:Q5R9M3_PONPY/210
    VKQIESKTAFQEALDAAGDKLVVVDFSATWCGPCKMIKPFFHSLSEKY ……
    >tg:NT039170_956/56151
    VKLIESKEAFQEALAAERDKLVMVDFSATWCGPCKMIKPFFHSSCDKY ……
    >te:CV502349/88193
    VSLITTKESWDQKLAEAKKegKIVIANFSASWCGPCRMISPFYCELKY ……
    >sw:TRXL2_ARATH/98174
    ITSAEQFLNALKDAGDRLVIVDFYGTWCGSCRAMFPKLCKFGHTAKEH ……
    >te:OMY_1368_2/13111
    ISSEEQWEEALSGPGLLVIEVYQRWCGPCKAVQNIFRKLRSHTHHTEY ……
    >te:CA246724/110160
    SKATYDEQWAAhkSSGKLMVIDFSASWCGPCRFIEPAFKELTHTASRF ……
    >tr:Q84XR8_CHLRE/68169
    ILTADTYHGFLEKNAEKLVVTDFYAVWCGPCKVIAPEIERTLANEMMT ……
    >tg:AL772421_11/578
    KLVVIEFGASWCEPSRRIAPVFAEYAKKMNKDKNDHDKDGDKDGMKEF ……
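As a rough sketch of searching such sequences for a text pattern, the code below tests three of the sequences above (identifiers shortened). The pattern WC..C is an assumption chosen for illustration and does not come from the slides.

```perl
use strict;
use warnings;

# Three sequences copied from the list above, keyed by shortened identifiers
my %seq = (
    "sw:THIO_RAT"    => "VKLIESKEAFQEALAAAGDKLVVVDFSATWCGPCKMIKPFFHSLCDKY",
    "sw:TRXL2_ARATH" => "ITSAEQFLNALKDAGDRLVIVDFYGTWCGSCRAMFPKLCKFGHTAKEH",
    "tg:AL772421_11" => "KLVVIEFGASWCEPSRRIAPVFAEYAKKMNKDKNDHDKDGDKDGMKEF",
);

# Illustrative pattern (an assumption): W, C, any two residues, then C
my $pattern = qr/WC..C/;

my %found;
foreach my $id (sort keys %seq) {
    if ($seq{$id} =~ /($pattern)/) {
        $found{$id} = $1;              # $1 holds the text captured by ( )
        print "$id matches with $1\n";
    } else {
        print "$id has no match\n";
    }
}
```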

SLIDE 41

Perl