Introduction to Perl – Scott Hazelhurst



slide-1
SLIDE 1

Introduction to Perl

Introduction to Perl

Scott Hazelhurst http://www.bioinf.wits.ac.za/~scott/perl.pdf August 2013

slide-2
SLIDE 2

Introduction to Perl Introduction and Motivation

Introduction and Motivation

Practical Extraction and Report Language

◮ General language – intended for systems programming, scripting
◮ Popular with systems programmers & Web developers
◮ Relatively new language (ca. 1990)
◮ Big language – support for concurrency and OO.
◮ Portable.

slide-3
SLIDE 3

Introduction to Perl Introduction and Motivation

Perl is a powerful and flexible language:

◮ Many devotees
◮ Many criticisms of the language

Easy to write difficult-to-understand code.

◮ Focus on writing readable code (for yourself, others)
slide-4
SLIDE 4

Introduction to Perl Introduction and Motivation

Objectives

◮ Describe basic features of Perl
◮ Write simple Perl programs
◮ Use basic matching facilities

Lots of resources:

◮ Books
◮ http://cpan.mirror.ac.za/, www.cpan.org, www.perl.org
◮ perldoc, info perl

Assume: knowledge of programming, C-like language

slide-5
SLIDE 5

Introduction to Perl Perl’s Data Structures

Perl’s Data Structures I

Perl is an imperative language.

◮ State is represented by a set of variables.
◮ Program is an ordered sequence of commands.
◮ Computation is accomplished by the execution of these commands in the specified order

Will explore Perl’s OO features later.

slide-6
SLIDE 6

Introduction to Perl Perl’s Data Structures

Flexible (perhaps too flexible) language.

◮ Scalars: numbers, strings
◮ Arrays, lists
◮ Hash tables
◮ References

Variables carry a special prefix character that indicates the kind of variable ($, @, %, ...)

slide-7
SLIDE 7

Introduction to Perl Perl’s Data Structures

Feature Warning

◮ Variables automatically declared
◮ Variables given default values
◮ Implicit type coercion common
◮ Meaning dependent on context

Good idea to use the "use strict; use warnings" pragmas
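A minimal sketch of what the two pragmas change in practice; the variable names are illustrative only and not from the slides.

```perl
use strict;     # every variable must be declared (here with my)
use warnings;   # report suspicious constructs at run time

my $count = "3";          # a string that looks like a number
my $total = $count + 4;   # implicit coercion: "3" is used as the number 3
print "total = $total\n";

# Under strict, a misspelt name such as $totl would be a compile-time
# error instead of silently creating a new global variable.
```

Without the pragmas the same program runs, but a typo in a variable name goes unreported.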

slide-8
SLIDE 8

Introduction to Perl Perl’s Data Structures Scalar types

Scalar types

All scalar variables have a $ as prefix: e.g. $x, $year, etc.

$c = 10;
$f = 9/5 * $c + 32;
$cname = 'Introduction to Perl';
print "Course $cname: size $c\n";


slide-10
SLIDE 10

Introduction to Perl Perl’s Data Structures String operations

Interpolation

◮ Inside a double-quoted string, Perl does variable interpolation, and interprets escape characters

print "Course $cname size \t $f\n";

◮ Not done inside single-quoted strings.

print 'Course $cname size \t $f\n';
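The difference between the two quoting styles can be checked with a short sketch. The values assigned to $cname and $f are assumptions chosen for the illustration.

```perl
use strict;
use warnings;

my $cname = 'Introduction to Perl';
my $f     = 50;

# Double quotes: variables are interpolated, \t and \n become tab and newline.
my $double = "Course $cname size \t $f\n";

# Single quotes: the text is kept literally, dollar signs and backslashes included.
my $single = 'Course $cname size \t $f\n';

print $double;
print $single, "\n";
```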

slide-11
SLIDE 11

Introduction to Perl Perl’s Data Structures String operations

Other string operations

Concatenation: .

$fruits = 'apples and pears';
$base = 'pies';
$full = $fruits . ' make ' . $base;
$full .= ': ' . $fruits;

◮ Length of string: length. Substrings: substr
◮ Repeat operator: x
◮ Powerful & flexible string matching and processing – regular expressions.
◮ Type conversion to and from integers as necessary !!?!!

slide-12
SLIDE 12

Introduction to Perl Perl’s Data Structures Arithmetic operations

Arithmetic Operations

The basic Perl arithmetic operations are: +, -, *, /, %, **.

◮ Default: floating point arithmetic
◮ There are many arithmetic procedures: int, sqrt, ...

Short-cuts: Perl has many C-like features, e.g.

$x = 3;
$x += $y+1;
$x++;
--$y;
slide-13
SLIDE 13

Introduction to Perl Perl’s Data Structures Truth and Logical operations

Truth and Logical operations

1. Any string is true except for the empty string and "0";
2. Any number is true except for 0
3. Any reference is true
4. Any undefined value is false
slide-14
SLIDE 14

Introduction to Perl Perl’s Data Structures Truth and Logical operations

Logical operators

High binding    Low binding
&&              and
||              or
!               not

Short-circuit evaluation! A logical expression returns the last value evaluated, e.g.

2 or 3 and 5
2 and (0 or 5)
$a ||= 2
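A small sketch of what the three example expressions evaluate to. The operands are held in variables (names are illustrative) so that the values returned are easy to inspect.

```perl
use strict;
use warnings;

my ($zero, $two, $three, $five) = (0, 2, 3, 5);

# 'and' binds tighter than 'or', so this is $two or ($three and $five).
# $two is true, so evaluation stops there and 2 is returned.
my $r1 = ($two or $three and $five);

# $two is true, so the right-hand side decides: (0 or 5) returns 5.
my $r2 = ($two and ($zero or $five));

# ||= assigns only when the left-hand side is false (here: undefined).
my $limit;
$limit ||= 2;

my $kept = 7;
$kept ||= 2;      # already true, so it stays 7

print "$r1 $r2 $limit $kept\n";
```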

slide-15
SLIDE 15

Introduction to Perl Perl’s Data Structures Truth and Logical operations

Relational operators

Comparison              Numeric    String
Equal, Not equal        ==, !=     eq, ne
Less than (equal)       <, <=      lt, le
Greater than (equal)    >, >=      gt, ge
Comparison              <=>        cmp
Smart matching          ~~         ~~
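Choosing the numeric or the string operator changes the answer, as this short sketch with made-up values shows.

```perl
use strict;
use warnings;

# Numerically 10 and 10.0 are equal; as strings '10' and '10.0' differ.
my $num_eq = (10 == 10.0)     ? 1 : 0;
my $str_eq = ('10' eq '10.0') ? 1 : 0;

# <=> and cmp return -1, 0 or 1.
my $num_cmp = (2 <=> 10);      # 2 is numerically smaller than 10
my $str_cmp = ('2' cmp '10');  # but '2' sorts after '1...' as a string

print "$num_eq $str_eq $num_cmp $str_cmp\n";
```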

slide-16
SLIDE 16

Introduction to Perl Perl’s Data Structures Truth and Logical operations

File operators

Example    Name
-e $a      File exists
-T $a      File is a text file
-r $a      Readable file
-w $a      Writable file
-d $a      File is a directory
-f $a      Regular file

slide-17
SLIDE 17

Introduction to Perl Standard input and output

Standard input and output

Default: file handle STDIN associated with keyboard (input), STDOUT with console output.

◮ Input is line oriented.

slide-18
SLIDE 18

Introduction to Perl Standard input and output

Output

print command by default sends output to STDOUT.

print "Hello";

same as

print STDOUT "Hello";

slide-19
SLIDE 19

Introduction to Perl Standard input and output

Input

A reference to <STDIN> waits for input from console.

◮ data typed in is returned

print "Enter the temp in C: ";
$c = <STDIN>;
print "Temperature in F is " . (9/5*$c+32);

slide-20
SLIDE 20

Introduction to Perl Control Structures Making decisions

Control structures

Conditional

◮ if/else/elsif:

if ($x < $y) {$x=1} else {$y=1};

◮ unless:

unless ($ok) { die "Error in file I/O"; }

◮ no explicit switch/case
◮ implicit through use of short-circuit

(-e "f.dat") or ($numf++)

◮ ternary conditional operator: cond ? ex1 : ex2

slide-21
SLIDE 21

Introduction to Perl Control Structures Making decisions

Example

unless (-e "data.dat") {
  die "File data.dat not found"
};

slide-22
SLIDE 22

Introduction to Perl Control Structures Making decisions

Example

$c = $a cmp $b;
if ($c == 0) {
  print "The strings are the same.\n";
} elsif ($c < 0) {
  print "$a comes before $b\n";
} else {
  print "$b comes before $a\n";
}

slide-23
SLIDE 23

Introduction to Perl Control Structures Loops

Loops

◮ while: while (cond) { ... }
◮ for statement: C-style:

for (init; condition; change) { ... }

◮ foreach

slide-24
SLIDE 24

Introduction to Perl Control Structures Loops

Example while loop

Read in a list of numbers terminated by 0. Compute the sum.

$sum = 0;
$num = <STDIN>;
while ($num != 0) {
  $sum = $sum + $num;
  $num = <STDIN>;
}
print "Sum is $sum\n";

slide-25
SLIDE 25

Introduction to Perl Control Structures Loops

use strict;
my $i; my $j;
for ($i=1; $i<11; $i++) {
  for ($j=0; $j<$i; $j++) {
    print "*";
  }
  print "\n";
}

slide-26
SLIDE 26

Introduction to Perl Control Structures Loops

use strict;
my $i;
for ($i=1; $i<11; $i++) {
  print "*" x $i;
  print "\n";
}

slide-27
SLIDE 27

Introduction to Perl Control Structures Other flow of control. . .

Other flow of control. . .

◮ next, last: Perl's counterparts of C's continue and break
◮ scalar range operator: ..
  don't use unless you're an expert.
◮ goto
  don't use
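As a sketch of next and last inside a loop (the numbers and the array name are made up for the illustration):

```perl
use strict;
use warnings;

my @seen;
for my $n (1 .. 10) {
    next if $n % 2;    # odd number: skip to the next iteration
    last if $n > 6;    # first even number above 6: leave the loop
    push @seen, $n;
}
print "@seen\n";       # the even numbers up to 6
```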

slide-28
SLIDE 28

Introduction to Perl Simple process control

Simple process control

Can execute Unix commands or executables using the backtick operator `

$today = `date "+%C%y%m%d"`;
print "Today is $today\n";

slide-29
SLIDE 29

Introduction to Perl Simple process control

Other system calls

system: similar to backtick, but returns the command's exit status instead of its output

$x = system("ls");
print "Returned <$x>\n";

exec: similar to system, but replaces the running Perl program with the command, so it does not return on success.

slide-30
SLIDE 30

Introduction to Perl File operations

File operations

To read data from a file or write it to a file, use a file handle.

◮ Need to associate file handle with external file.
◮ File can be virtual – could be a pipe.

slide-31
SLIDE 31

Introduction to Perl File operations Files in general

Files in General

File handle is the Perl construct used to manipulate files

◮ Open it – associate with external file or pipe
◮ Use it
◮ Close it


slide-39
SLIDE 39

Introduction to Perl File operations Files in general

◮ open(DATAF, "figs.dat")             read
◮ open(DATAF, "<figs.dat")            read
◮ open(DATAF, ">figs.dat")            write
◮ open(DATAF, ">>figs.dat")           append
◮ open(DATAF, "| output-pipe-cmd");   set up output filter
◮ open(DATAF, "input-pipe-cmd |");    set up input filter

To print to a file that is open for writing, use the related file handle in the print statement.
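The open modes can be exercised end to end with a scratch file. This sketch swaps in the three-argument form of open with lexical file handles (not the bareword DATAF style used on the slide) and uses the core File::Temp module to obtain a temporary path; both choices are assumptions made for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file so that the example does not touch real data.
my ($tmpfh, $path) = tempfile();
close $tmpfh;

# '>' truncates and writes; the handle is named in the print statement.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print $out "first line\n";
close $out;

# '>>' appends to what is already there.
open($out, '>>', $path) or die "Cannot append to $path: $!";
print $out "second line\n";
close $out;

# '<' reads; in list context <$in> returns all remaining lines.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my @lines = <$in>;
close $in;
unlink $path;

print scalar(@lines), " lines read\n";
```

The three-argument form keeps the mode separate from the file name, so a name that happens to start with > or | cannot change how the file is opened.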

slide-40
SLIDE 40

Introduction to Perl File operations Input from files

Input from files

Suppose that INP is a file handle. A reference to <INP> does the following:

◮ Returns the next line of input;
◮ Consumes the input – in the case of a file, advances the file pointer to the next line of the file.
◮ NB: end-of-line marker read in.


slide-42
SLIDE 42

Introduction to Perl File operations Input from files

for ($i=0; $i<3; $i++) {
  $x = <INP>;
  print "**$x##"
}

Assuming that the file contains apple, banana, cherry (one per line), the output is:

**apple
##**banana
##**cherry
##
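Because the end-of-line marker is part of each line read, chomp is commonly used to strip it. The sketch below reads from an in-memory file handle (an assumption made so that the example needs no external file) and builds the output in a string.

```perl
use strict;
use warnings;

my $data = "apple\nbanana\ncherry\n";

# Opening a reference to a scalar gives a file handle over that string.
open(my $inp, '<', \$data) or die "Cannot open in-memory file: $!";

my $out = '';
while (my $x = <$inp>) {
    chomp $x;              # remove the trailing newline, if any
    $out .= "**$x##";
}
close $inp;

print "$out\n";
```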

slide-43
SLIDE 43

Introduction to Perl File operations Input from files

To add up the numbers in a file with three numbers:

open(INP, "nums.txt");
$x1 = <INP>; $x2 = <INP>; $x3 = <INP>;
print "Answer is " . ($x1 + $x2 + $x3);

or

open(INP, "nums.txt");
print "Answer is " . (<INP> + <INP> + <INP>);

slide-44
SLIDE 44

Introduction to Perl File operations Input from files

What’s the difference between

print "Answer is " . (<INP> + <INP> + <INP>);

and

print "Answer is . (<INP> + <INP> + <INP>) ";

slide-45
SLIDE 45

Introduction to Perl File operations Input from files

Add up odd numbers in file

print "Enter the file name: ";
$namef = <STDIN>;
open(DATAF, $namef);
while ($x = <DATAF>) {
  if ($x % 2) { $sum = $sum + $x; }
}
close(DATAF);
print "The sum is $sum\n";

slide-46
SLIDE 46

Introduction to Perl File operations Input from files

A common Perl idiom is

open(DATAF, $namef) or die "Can't open ...";
slide-47
SLIDE 47

Introduction to Perl File operations Implicit operands

Implicit Operands

Perl uses implicit operands extensively.

◮ We are told that this is a feature.
◮ The main culprit: $_ (named $ARG under use English)

Set (among other places) by a reference to a file handle used as the condition of a while loop. Many operators use $_ if not given an explicit operand.

◮ The following prints out the contents of a file:

while (<DATAF>) {print};

slide-48
SLIDE 48

Introduction to Perl File operations Implicit operands

Exercise

Write a Perl program that reads integers from a file called nums.dat and counts how many numbers are

◮ less than zero
◮ between zero and 10
◮ greater than or equal to 11

If there is no such file, your program should print an error message and halt.


slide-50
SLIDE 50

Introduction to Perl Arrays/lists

Arrays/lists

Array variables are prefixed by @

my @days;
@days = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat");

◮ Arrays indexed from 0
◮ Individual elements are scalars: so $days[0]

slide-51
SLIDE 51

Introduction to Perl Arrays/lists

Common to use foreach

@days = ("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat");
foreach $d (@days) {
  print "Day $d\n";
}

slide-52
SLIDE 52

Introduction to Perl Arrays/lists

Quote words

Short hand for arrays of words

@days = qw(Sun Mon Tue Wed Thu Fri Sat);

slide-53
SLIDE 53

Introduction to Perl Arrays/lists

@ARGV – program’s arguments

@ARGV: Predefined variable array

◮ Contains the program's arguments – values passed by the caller.
◮ If the program is run as follows:

./example.pl apple pear 1 2 3

array @ARGV contains those values

◮ $ARGV[0] is apple;
◮ $ARGV[3] is 2;


slide-55
SLIDE 55

Introduction to Perl Arrays/lists

Array semantics

Expressions are evaluated in scalar or list context.

◮ Similar or identical expressions can evaluate to different things in different contexts.

Context determined by operation:

◮ LHS of assignment determines context
◮ Operators or functions may determine context


slide-59
SLIDE 59

Introduction to Perl Arrays/lists

Array context

◮ print @days yields SunMonTueWedThuFriSat
◮ @weekdays = @days[1..5] array @weekdays set to Mon .. Fri
◮ @days[1] is a one element array

Scalar context

◮ $numdays = @days; sets $numdays to 7
◮ $y = ("Sun","Mon","Tue","Wed","Thu","Fri","Sat") sets $y to Sat
◮ $weekdays = @days[1..5] sets $weekdays to Fri (a slice in scalar context yields its last element)
◮ $#days is 6 – the highest index of @days
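The effect of context can be checked directly. This is a small sketch using the @days array from the slides; the other variable names are illustrative.

```perl
use strict;
use warnings;

my @days = qw(Sun Mon Tue Wed Thu Fri Sat);

my @weekdays = @days[1 .. 5];   # list context: a slice, Mon .. Fri
my $numdays  = @days;           # scalar context: the number of elements
my ($first)  = @days;           # list assignment: the first element
my $highest  = $#days;          # highest index
my $nweek    = scalar(@weekdays);  # scalar() forces scalar context

print "@weekdays\n";
print "$numdays $first $highest $nweek\n";
```

The parentheses around $first are what make the third assignment a list assignment. Without them $first would receive the element count.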

slide-60
SLIDE 60

Introduction to Perl Arrays/lists

Useful functions

◮ push: adds something to the right of the array
◮ pop: removes the rightmost element of the array.
◮ shift/unshift
◮ sort: returns a sorted list
  by default string order ascending, but can be changed.
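A short sketch of these functions on a made-up array, including the difference between the default string sort and a numeric sort:

```perl
use strict;
use warnings;

my @q = (3, 10, 2);

push @q, 7;              # add on the right:       (3, 10, 2, 7)
my $right = pop @q;      # remove from the right:  7, leaving (3, 10, 2)
unshift @q, 1;           # add on the left:        (1, 3, 10, 2)
my $left = shift @q;     # remove from the left:   1, leaving (3, 10, 2)

my @as_strings = sort @q;                 # default: string order
my @as_numbers = sort { $a <=> $b } @q;   # numeric order

print "@as_strings\n";
print "@as_numbers\n";
```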

slide-61
SLIDE 61

Introduction to Perl Arrays/lists

Arrays are inherently 1D

◮ Need references to handle multi-dimensional arrays

slide-62
SLIDE 62

Introduction to Perl Arrays/lists Asides

Asides on strings

◮ splitting

@x = split ";", "1;2;3";
foreach $s (@x) { print "$s\n"; }

◮ chop, chomp.
◮ see regexes later

slide-63
SLIDE 63

Introduction to Perl Hash tables

Hash tables

Unordered set of scalars which allows fast information retrieval:

◮ Elements accessed/indexed by a string value (the key) associated with it
◮ Variables prefixed by a %, e.g. %currency
◮ Individual elements are scalar: $currency{"South Africa"}

slide-64
SLIDE 64

Introduction to Perl Hash tables

$currency{"South Africa"} = "ZAR";
$currency{"Britain"} = "GBP";

slide-65
SLIDE 65

Introduction to Perl Hash tables

Example

%day2num = ("Sun",0, "Mon",1, "Tue",2, "Wed",3, "Thu",4, "Fri",5, "Sat",6);

Better is

%day2num = ("Sun"=>0, "Mon"=>1, "Tue"=>2, "Wed"=>3, "Thu"=>4, "Fri"=>5, "Sat"=>6);

Referred to as $day2num{"Wed"}

slide-68
SLIDE 68

Introduction to Perl Hash tables

Useful hash operations

◮ keys returns list of keys in hash table. Order non-deterministic.
◮ values returns list of values stored in the hash table. Order of no significance.
◮ sort keys %table lists the keys in lexicographic order

slide-69
SLIDE 69

Introduction to Perl Hash tables

Example

# read in from file
while ($name = <INP>) {
  $reg = <INP>;
  chomp $name; chomp $reg;   # strip the end-of-line markers
  $carreg{$name} = $reg;
}
# print out in order of name
foreach $n (sort keys %carreg) {
  print "Car reg of $n is $carreg{$n}\n";
}

slide-70
SLIDE 70

Introduction to Perl Hash tables

Can also tell sort how to sort:

◮ sort {$a <=> $b} list sorts the list in numeric order.
◮ sort {$table{$a} cmp $table{$b}} keys %table sorts the keys so that corresponding hash table elements are in order

slide-71
SLIDE 71

Introduction to Perl Hash tables

# print out in order of car reg
foreach $n (sort {$carreg{$a} cmp $carreg{$b}} keys %carreg) {
  print "$carreg{$n} owned by $n\n";
}
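The two orderings can be compared on a small table. The owners and registration strings below are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical data: owner name => car registration.
my %carreg = (
    Thandi => 'ZZZ 111 GP',
    Abel   => 'BBB 222 GP',
    Maria  => 'AAA 333 GP',
);

# Keys in lexicographic order of the keys themselves.
my @by_name = sort keys %carreg;

# Keys ordered by the values they map to.
my @by_reg = sort { $carreg{$a} cmp $carreg{$b} } keys %carreg;

print "By name: @by_name\n";
print "By reg:  @by_reg\n";
```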

slide-72
SLIDE 72

Introduction to Perl Procedures

Procedures

Perl has procedures but its parameter passing mechanism is poor.

◮ Suppose we have a procedure plus that adds up two numbers. Called:

$x = &plus($a, $b);

where $a and $b are the parameters. Call by value semantics.

slide-73
SLIDE 73

Introduction to Perl Procedures

Declaring a subroutine

To declare a subroutine: sub NAME BLOCK

e.g.

sub printhead {
  print "Name Age Number Balance\n";
}
...
&printhead();

slide-74
SLIDE 74

Introduction to Perl Procedures

Parameters

Procedures have one formal parameter: @_

◮ The values of the actual parameters are given to the formal parameter. Copying them out of @_, as below, gives call-by-value behaviour; the elements of @_ themselves are aliases of the caller's arguments.
◮ @_: a list local to the procedure.

sub plus {
  $x = $_[0];
  $y = $_[1];
  return $x + $y;
}

or

sub plus {
  my ($x,$y) = @_;
  return $x+$y;
}

slide-75
SLIDE 75

Introduction to Perl Procedures

A common idiom uses shift

sub plus {
  $x = shift @_;
  $y = shift @_;
  return $x + $y;
}

slide-76
SLIDE 76

Introduction to Perl Procedures

Example

A procedure to add up a list:

sub addlist {
  my $sum = 0;
  foreach my $num (@_) { $sum += $num };
  return $sum;
}

sub proclist {
  my ($f1,$f2,@nums) = @_;
  my $sum = 0;   # own running total, not the global $sum
  foreach my $n (@nums) { $sum = $sum+$n; }
  return ($f1*$f2*$sum);
}

$x = &addlist(1,2,3,5,6,9);
$y = &proclist(3,4,1,1,0,2);

slide-77
SLIDE 77

Introduction to Perl Procedures

The following does not work: @a takes every element of @_ and @b is left empty.

sub dotprod {
  (@a,@b) = @_;
  ...
  ...
}
@x = (1,2,3);
@y = (4,5,6);
&dotprod(@x,@y);

slide-78
SLIDE 78

Introduction to Perl Procedures

◮ Formal parameter is an unstructured list (possible danger in passing multiple lists as parameters)
◮ Forward declarations: sub addlist;
  Tells Perl that addlist is a procedure which will be declared elsewhere.
◮ Can have anonymous procedures that get assigned to variables.
◮ Weakness in parameter system can be overcome by using references.
◮ Default: all variables are global wherever defined. This is recognised even by the Perl community to be a Bad Thing. Declare variables local inside procedures. Best way: prefix first use of a variable in a procedure with my. Lexical scoping.

slide-79
SLIDE 79

Introduction to Perl References

References

References are mechanisms for a variable to refer to something, e.g. (1) another variable; (2) a piece of data; (3) a function

Symbolic references

Can turn a string into a variable

$x = 123;
$y = "x";
$$y = 5;
print $x;

slide-80
SLIDE 80

Introduction to Perl References

Hard references

Can use the \ operator to take a reference to a variable, and -> to dereference.

@a = (10,20,30,40,50);
$x = \@a;
for $e ( @{$x} ) { print "$e\n"; }
print $x->[0];

slide-81
SLIDE 81

Introduction to Perl References

Call by reference

sub count {
  my ($fname, $rcount) = @_;
  chomp $fname;
  unless (-T $fname) { return; }
  open(FINP, $fname);
  while (<FINP>) { $$rcount++ }
  close(FINP);
}

slide-82
SLIDE 82

Introduction to Perl References

@files = `ls -1 $ARGV[0]`;
$n = 0;
for $f (@files) { &count($f, \$n); }
print "Total number of lines is $n\n";

slide-83
SLIDE 83

Introduction to Perl References

Passing multiple arrays

sub dotprod {
  ($a,$b) = @_;
  ....
}
@x = (1,2,3);
@y = (4,5,6);
&dotprod(\@x,\@y);

Note that only scalars are passed.
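One way the elided body of dotprod might be filled in is sketched below. The body, the length check and the variable names $u and $v are assumptions for illustration, not the author's code.

```perl
use strict;
use warnings;

# Dot product of two arrays passed as references.
sub dotprod {
    my ($u, $v) = @_;                      # two scalars, each an array reference
    die "Arrays differ in length\n" unless @$u == @$v;

    my $sum = 0;
    for my $i (0 .. $#{$u}) {
        $sum += $u->[$i] * $v->[$i];       # -> dereferences one element
    }
    return $sum;
}

my @x = (1, 2, 3);
my @y = (4, 5, 6);
my $dp = dotprod(\@x, \@y);                # 1*4 + 2*5 + 3*6
print "Dot product is $dp\n";
```

Because each array arrives as a single scalar, the two lists stay separate inside the procedure.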

slide-84
SLIDE 84

Introduction to Perl References

Refs to anonymous arrays and hashes

Square brackets create an array and return a reference to it

$a = [ 10, 20, 30, 40];

◮ Pitfall: @a = [ 10, 20, 30, 40]; gives @a a single element, the reference

Curly braces create a hash and return a reference to it

$day = {"sun"=>0, "mon"=>1, "tues"=> .....}
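A sketch of how such references are used, including a two-dimensional array built from anonymous arrays. All names and values here are illustrative.

```perl
use strict;
use warnings;

my $nums = [10, 20, 30, 40];          # reference to an anonymous array
my $day  = { sun => 0, mon => 1 };    # reference to an anonymous hash

my $second = $nums->[1];              # element access through the reference
my $mon    = $day->{mon};
my $size   = scalar @$nums;           # dereference the whole array

# An array of array references behaves like a 2D array.
my @grid = ([1, 2, 3],
            [4, 5, 6]);
my $cell = $grid[1][2];               # row 1, column 2

print "$second $mon $size $cell\n";
```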

slide-85
SLIDE 85

Introduction to Perl Modules

Modules

A module is a collection of code and data structures that can be used as a library

◮ Often see it in OO programming

Magic word is use

use IO;

◮ To access something in a module use ::. So, module::thing
◮ Modules can be nested.


slide-87
SLIDE 87

Introduction to Perl Modules

Example

use IO::Compress::Bzip2;
IO::Compress::Bzip2::bzip2("do.pl", "do.pl.bz2");

Can also import specific things

use IO::Compress::Bzip2 qw(bzip2 $Bzip2Error);
unless (bzip2("do.plx", "do.pl.bz2")) {
  warn "Compression failed returning: $Bzip2Error";
}

slide-88
SLIDE 88

Introduction to Perl Objects

Objects

Data structures which

◮ contain data, know functions that can apply to them;
◮ anonymous
◮ accessed through references
◮ typically organised in classes

slide-89
SLIDE 89

Introduction to Perl Objects

use Bio::Seq;
use Bio::SeqIO;
$seqio = Bio::SeqIO->new('-format' => 'embl', -file => ...);
$seqobj = $seqio->next_seq();
$seqstr = $seqobj->seq();
$seqstr = $seqobj->subseq(10,50);
@features = $seqobj->get_SeqFeatures();
foreach my $feat ( @features ) {
  print "Feature ", $feat->primary_tag,
        " starts ", $feat->start,
        " ends ", $feat->end,
        " strand ", $feat->strand, "\n";
}

slide-90
SLIDE 90

Introduction to Perl Regular expressions

Regular expressions, matching, and more

Perl's regular expression support is powerful

◮ concise way of describing a set of strings.

Typical: a command uses a regular expression to process some argument.

◮ Example: split uses a regex to split a string

$line = "Gauteng;Johannesburg GP 7";
@info = split("[ ;]", $line);
($prov, $capital, $reg, $pop) = split("[ ;]", $line);
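The split example can be run as is. The sketch below writes the pattern between slashes, which is the more usual way to give split a regular expression, and keeps the variable names from the slide.

```perl
use strict;
use warnings;

my $line = "Gauteng;Johannesburg GP 7";

# Split wherever a single space or semicolon occurs.
my @info = split /[ ;]/, $line;

# The same split, unpacked straight into named scalars.
my ($prov, $capital, $reg, $pop) = split /[ ;]/, $line;

print scalar(@info), " fields: $prov | $capital | $reg | $pop\n";
```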


slide-95
SLIDE 95

Introduction to Perl Regular expressions

Specifying Regular Expressions

◮ Most characters stand for themselves: F Fred 6312
◮ \ | ( ) [ { ^ $ * + ? . are metacharacters (have special meaning)
◮ escape with a backslash for the chars: \(Fred\) stands for the string (Fred)
◮ To group things together, use parentheses.
◮ To specify alternatives, use |

(green|red) apples stands for green apples or red apples

slide-96
SLIDE 96

Introduction to Perl Regular expressions

Specifying Regular Expressions

◮ A list of characters in square brackets matches any of the characters.
◮ [YyNn] matches any of an upper or lower case "y" or "n".
◮ [A-Za-z0-9] is all the alphanumeric characters


slide-103
SLIDE 103

Introduction to Perl Regular expressions

◮ \n new line; \t tab; \s a whitespace;
◮ \d digit; \D non-digit;
◮ \w a word character, \W a non-word character
◮ . anything but a \n
◮ m{3} stands for mmm
  (map){2,3} stands for mapmap or mapmapmap
◮ m* stands for 0 or more m's
◮ m+ stands for 1 or more m's
◮ Many others. Rules long and complex.

slide-104
SLIDE 104

Introduction to Perl Regular expressions

Processing with regular expressions

Various commands that allow

◮ finding a pattern matching a regular expression in a string;
◮ extracting out the regular expression;
◮ substituting;
◮ or other modification

e.g. matching, substitution, translation.


slide-107
SLIDE 107

Introduction to Perl Regular expressions

Splitting up input

split(pat,string) takes the input string and splits the string wherever the pattern pat occurs.

◮ returns a list of strings as a result
◮ can choose whether the split pattern is part of the returned strings or not

@x = split /\*|::/, $c processes the string $c, splits it wherever a * or :: appears, and returns the split list. So, if

$c = "Dat: 23 * Mon: 11; LgA -632 ::: LgB -217* a",

@x = ?


slide-111
SLIDE 111

Introduction to Perl Regular expressions

Matching

◮ m/regexpr/ returns true if the regular expression matches $_. The m is optional in most cases (unless slash in string): m-/apple-
◮ To match a particular string use the binding operator: =~

$inp =~ m/(Pascal)|(\WC\W)|(C\+\+)/;

◮ A number of modifiers available: i (case insensitive), g (global)

Count number of matches in a string:

while ($inp =~ m/(Pascal)|(\WC\W)|(C\+\+)/g) { $wrd++ }
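A runnable sketch of counting with the g modifier and of the i modifier. The input string is invented and the pattern is a simplified version of the one above.

```perl
use strict;
use warnings;

my $inp = "I like C++ and Pascal, and C++ again";

# In scalar context each /g match resumes where the previous one ended,
# so the loop body runs once per match.
my $wrd = 0;
$wrd++ while $inp =~ /Pascal|C\+\+/g;

# The i modifier ignores case.
my $ci = ("PERL" =~ /perl/i) ? 1 : 0;
my $cs = ("PERL" =~ /perl/)  ? 1 : 0;

print "$wrd matches; case-insensitive: $ci; case-sensitive: $cs\n";
```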

slide-112
SLIDE 112

Introduction to Perl Regular expressions

Extracting patterns

◮ Can refer to and extract out subexpressions matched: $1, $2, ...
◮ To find a repeat: m/\b(\w+)\s+\1/
◮ To extract out

while ($line = <INP>) {
  $line =~ m/(\w+)\s+(\w+)/;
  $name = $1; $reg = $2;
  $carreg{$name} = $reg;
}
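Both uses of captured subexpressions can be tried on fixed strings. The sample line and sentence are made up, and the sketch tests that the match succeeded before reading $1 and $2, since they keep their old values after a failed match.

```perl
use strict;
use warnings;

# Extracting two fields from a line.
my $line = "Thabo CA123456";
my ($name, $reg);
if ($line =~ /(\w+)\s+(\w+)/) {
    ($name, $reg) = ($1, $2);
}

# Finding a repeated word: \1 must match the same text that group 1 matched.
my $dup = ("this is is a test" =~ /\b(\w+)\s+\1\b/) ? $1 : '';

print "$name has $reg; repeated word: $dup\n";
```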

slide-113
SLIDE 113

Introduction to Perl Regular expressions

Substitution

s/regexprA/replacement/ replaces text matching regular expression A with the replacement string. Default string processed is $_.

s/Apple/Orange/;

Replaces the first occurrence of Apple in $_ with Orange. Similar modifiers to m: i, g, ...

Use the binding operator to choose the string processed:

$name = "Alan Turing";
$name =~ s/(\w)\w*/$1./;
$c = ($name =~ s/(\w)\w*/$1./);
$c = ($name =~ s/(\w)\w*/$1./g);
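The effect of the g modifier and of the value returned by s/// can be seen in a short sketch. It works on copies of the string so that both variants start from the same input; the copy variables are an addition for the example.

```perl
use strict;
use warnings;

my $name = "Alan Turing";

# Without g: only the first word is abbreviated.
(my $first_only = $name) =~ s/(\w)\w*/$1./;

# With g: every word is abbreviated; s/// returns the number of substitutions.
my $all   = $name;
my $count = ($all =~ s/(\w)\w*/$1./g);

print "$first_only\n";
print "$all ($count substitutions)\n";
```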

slide-114
SLIDE 114

Introduction to Perl Regular expressions

Translation

Use the tr or y operator.

◮ Operates on strings, rather than regular expressions.

tr/range1/range2/ replaces any character in range1 with the corresponding character in range2.

◮ $c =~ tr/a-z/A-Z/ changes every lower case letter in $c to uppercase.
◮ $cnt = $c =~ tr/a-z/a-z/ sets $cnt to the number of lower case letters in $c.
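Both uses of tr can be checked on a sample string. The string and the copy variable are assumptions made for the sketch.

```perl
use strict;
use warnings;

my $c = "Hello, World";

# Translate a copy to upper case, leaving $c itself untouched.
(my $upper = $c) =~ tr/a-z/A-Z/;

# Mapping a range onto itself changes nothing but still returns the
# number of characters that matched, which gives a count.
my $cnt = ($c =~ tr/a-z/a-z/);

print "$upper has $cnt lower case letters in its original form\n";
```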

slide-115
SLIDE 115

Introduction to Perl And more

And more

◮ maths functions
◮ more sophisticated I/O
◮ lots of built in variables: e.g. $NR
◮ modules, OO
◮ IPC, networking
◮ process control