Object Oriented Programming (OOP) and introduction to BioPerl

Laurent Falquet (original course by Marco Pagni), Basel, October 2006
Swiss Institute of Bioinformatics, Swiss EMBnet node

LF Basel October 2006

Overview in 3 parts

  • Motivation for OOP
  • The key concepts of OOP
  • Using BioPerl objects

Part I Motivation for OOP

Background

OOP didn't come out of the blue. It has strong historical roots in other paradigms and practices.

It came about to address problems commonly grouped together as the "software crisis".

The "software crisis" manifests itself in:

  • 1. cost overruns
  • 2. user dissatisfaction with the final product
  • 3. buggy software
  • 4. brittle software

Complexity

Software is inherently complex because:

  • we attempt to solve problems in complex domains
  • we are forced by the size of the problem to work in teams
  • software is an incredibly malleable building material
  • discrete systems are prone to unpredictable behavior
  • software systems consist of many pieces, many of which communicate.

Some factors that impact on and reflect complexity in software:

  • The number of names (variables, functions, etc.) that are visible
  • Constraints on the time-sequence of operations (real-time constraints)
  • Memory management (garbage collection and address spaces)
  • Concurrency
  • Event-driven user interfaces.

How do humans cope with complexity in everyday life?

Humans deal with complexity by abstracting details away.

E.g., surfing the Internet doesn't require knowledge of the processor's internal registers; it is sufficient to think of a computer as a simple visualization tool.

To be useful, an abstraction (model) must be smaller than what it represents.

E.g., road map vs. photographs of terrain vs. physical model.

Exercise 1

Memorize as many numbers from the following sequence as you can. I'll show them for 30 seconds. Now write them down.

1759376099873461324287593345108941120765934

How many did you remember? How many could you remember with unlimited amounts of time?

Exercise 2

Write down as many of the following telephone numbers as you can:

  • Pizza:
  • Friend 2:
  • Friend 1:
  • Fax:
  • Post Office:
  • Parents:
  • Co-worker:
  • Boss:
  • Cellular:
  • Home:

Answer to Exercises 1 and 2

By abstracting the details of the numbers away and grouping them into a new concept (telephone number), we have increased our information handling capacity by nearly an order of magnitude!

Working with abstractions lets us handle more information.

Exercise 3

How many of these (unrelated) concepts can you memorize in 30 seconds?

Answer to Exercise 3

Miller (Psychological Review, vol. 63(2)): "The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information"

Working with abstractions lets us handle more information (e.g. phone numbers), but we're still limited by Miller's observation.

What if you have more than 7 things to juggle in your head simultaneously?

Hierarchy

A common strategy: form a hierarchy to classify and order our abstractions.

Common examples are:

  • military, large companies, administration
  • Linnaeus' classification system of organisms
  • EC numbers for enzymatic reactions
  • UNIX file system.

Decomposition

Divide and conquer is a handy skill for many thorny life problems.

We want to compose a system from small pieces, rather than build a large monolithic system, because the former can be made more reliable.

In a properly designed system, failure in one part won't cause failure of the whole. This depends on the issue of coupling.

A system composed of many parts has many places where it can fail. We can beat this grim view by properly decomposing and decoupling. Another reason to decompose is that we can divide up the work more easily.

Object technology

There is nothing unique about forming abstractions, but in OOP this is a main focus of activity and organization.

We can take advantage of the natural human tendency to anthropomorphise.

We'll call our abstractions objects.

We'll put our abstractions into a hierarchy to keep them organized and minimize redundancy.

This is a natural way to "divide and conquer" the large state spaces we face (complexity).

Part II The Key Concepts of OOP

Class and Instance

A class is a part of a program that describes the properties of an object. These properties fall into two broad categories:

  • attributes - the data associated with an object,
  • methods - the functions, procedures or subroutines that come along with an object.

An instance is a member of a class whose attributes have received particular values.

Class and Instance example

A "square" class could have size and color attributes and methods to alter them:

  Name        square
  Attributes  size, color
  Methods     set_size, set_color, perimeter

Two instances of the square class may consist of a large blue square and a tiny red one.
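
The square class above could be sketched in Perl roughly as follows. This is only an illustration, assuming a hash-based object; the constructor argument names and the numeric sizes are choices made for this example and do not come from the course.

```perl
use strict;
use warnings;

package Square;

# Constructor: stores the two attributes in a hash and blesses it.
sub new {
    my ($class, %args) = @_;
    my $self = { size => $args{size}, color => $args{color} };
    return bless $self, $class;
}

# Methods that alter the attributes.
sub set_size  { my ($self, $size)  = @_; $self->{size}  = $size;  return $self; }
sub set_color { my ($self, $color) = @_; $self->{color} = $color; return $self; }

# A method that computes something from the attributes.
sub perimeter { my ($self) = @_; return 4 * $self->{size}; }

package main;

# Two instances of the same class with different attribute values.
my $big  = Square->new(size => 10, color => 'blue');
my $tiny = Square->new(size => 1,  color => 'red');

print $big->perimeter(),  "\n";   # 40
print $tiny->perimeter(), "\n";   # 4

$tiny->set_size(2);
print $tiny->perimeter(), "\n";   # 8
```

Both instances share the same methods, but each one carries its own attribute values.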

Encapsulation

The attributes of an object usually receive some degree of privacy.

  • Private attributes are not accessible from outside the object.
  • Public attributes can be directly accessed by any other object.

The methods of an object usually receive some degree of privacy, too.

Encapsulation: the values of an object's attributes should only be altered by its own methods. Private attributes should always be favored.

Encapsulation

A "person" class could have two attributes, name and credit card number for example.

Nobody wants their credit card number to be public!

Encapsulation hides the implementation away from the user.

One should be able to drive a car without knowing how the engine works…
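
Plain hash-based Perl objects do not enforce privacy, so the person example needs some extra technique. The sketch below uses one possible approach (an "inside-out" attribute kept in a lexical hash); it is an assumption of this example, not necessarily what the course had in mind, and the method name masked_card and the card number are invented for illustration.

```perl
use strict;
use warnings;

{
    package Person;
    use Scalar::Util qw(refaddr);

    # Lexical hash, visible only inside this block: code outside
    # the class cannot reach the card numbers directly.
    my %card_number_of;

    sub new {
        my ($class, %args) = @_;
        my $self = bless { name => $args{name} }, $class;
        $card_number_of{ refaddr($self) } = $args{card_number};
        return $self;
    }

    # Public accessor for a public attribute.
    sub name { my ($self) = @_; return $self->{name}; }

    # The only public view of the private attribute: last four digits.
    sub masked_card {
        my ($self) = @_;
        my $number = $card_number_of{ refaddr($self) };
        return ('*' x (length($number) - 4)) . substr($number, -4);
    }

    # Destructor: forget the private data when the object goes away.
    sub DESTROY {
        my ($self) = @_;
        delete $card_number_of{ refaddr($self) };
    }
}

my $person = Person->new(name => 'Alice', card_number => '1234567890123456');
print $person->name(), "\n";          # Alice
print $person->masked_card(), "\n";   # ************3456
```

The caller can use name and masked_card without knowing where or how the card number is stored.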

Benefit of encapsulation

Encapsulation is a technique for minimizing interdependencies among objects by defining a strict external interface.

This way, internal coding can be changed without affecting the interface, as long as the new implementation supports the same (or upwards compatible) external interface.

Methods

One could broadly distinguish four kinds of methods:

  • A method that creates a new object is called a constructor.
  • A method that destroys an object is called a destructor.
  • It is frequent to define many simple access methods just to set or get attribute values. These methods ensure data encapsulation.
  • Other methods permit the object to perform some useful actions.
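
The four kinds of methods can be seen side by side in a small Perl sketch. The Counter class and all its method names are hypothetical, chosen only for this illustration; in Perl the destructor is the specially named DESTROY subroutine.

```perl
use strict;
use warnings;

package Counter;

my $live = 0;   # number of Counter objects currently alive

# 1. Constructor: creates a new object.
sub new {
    my ($class, %args) = @_;
    my $start = defined $args{start} ? $args{start} : 0;
    $live++;
    return bless { count => $start }, $class;
}

# 2. Destructor: called by Perl when the object is destroyed.
sub DESTROY { $live--; }

# 3. Access method: gets the attribute, or sets it when given a value.
sub count {
    my ($self, $value) = @_;
    $self->{count} = $value if defined $value;
    return $self->{count};
}

# 4. Another method that performs a useful action.
sub increment { my ($self) = @_; return ++$self->{count}; }

# Class method reporting how many instances exist.
sub live_instances { return $live; }

package main;

{
    my $counter = Counter->new(start => 5);
    $counter->increment();
    print $counter->count(), "\n";            # 6
    print Counter->live_instances(), "\n";    # 1
}   # $counter goes out of scope here, so DESTROY runs

print Counter->live_instances(), "\n";        # 0
```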

Hierarchy through inheritance

Classes can have children, that is, one class can be created out of another class.

A sub-class inherits all the attributes and methods of the super-class, and may have additional attributes and behaviors.

Inheritance aids in the reuse of code.

  Name        parent
  Attributes  familyname

  Name        child
  Attributes  familyname

Hierarchy example

A very simple class diagram:

  Vehicle            Attr: speed       Meth: start, stop
    Air-Vehicle      Attr: nr-wing     Meth: take-off, land
      Plane          Attr: fuel        Meth:
    Land-Vehicle     Attr: nr-wheel    Meth:
      Bike           Attr:             Meth:
      Car            Attr: fuel        Meth:
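
One branch of this diagram (vehicle, land vehicle, car) might look like this in Perl. It is a minimal sketch: the package names are adapted to Perl naming rules, the speed value set by start is arbitrary, and the accessor methods are additions made for this example.

```perl
use strict;
use warnings;

package Vehicle;

sub new {
    my ($class, %attributes) = @_;
    return bless { speed => 0, %attributes }, $class;
}
sub start { my ($self) = @_; $self->{speed} = 10; return $self; }   # 10 is arbitrary
sub stop  { my ($self) = @_; $self->{speed} = 0;  return $self; }
sub speed { my ($self) = @_; return $self->{speed}; }

package LandVehicle;
our @ISA = ('Vehicle');            # LandVehicle is a child of Vehicle
sub nr_wheel { my ($self) = @_; return $self->{nr_wheel}; }

package Car;
our @ISA = ('LandVehicle');        # Car is a child of LandVehicle
sub fuel { my ($self) = @_; return $self->{fuel}; }

package main;

my $car = Car->new(nr_wheel => 4, fuel => 'petrol');

$car->start();                     # inherited from Vehicle
print $car->speed(), "\n";         # 10
print $car->nr_wheel(), "\n";      # 4, inherited from LandVehicle
print $car->fuel(), "\n";          # petrol, defined in Car
print $car->isa('Vehicle') ? "a Car is a Vehicle\n" : "a Car is not a Vehicle\n";
```

Car defines only what is specific to cars; new, start, stop and speed are reused from the classes above it.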

A real example of hierarchy

The BioPerl class diagram:

Polymorphism

Polymorphism means the ability to request that the same operations be performed by a wide range of different objects.

Polymorphism is a consequence of inheritance. From the previous example, any vehicle can start and stop.

Think of a computer desktop where one likes to open, resize and close any window, whatever its content.
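
As a rough Perl illustration of the vehicle case: the same start request is sent to three different objects, and each class decides how to answer. The messages and the choice of which classes override start are invented for this sketch.

```perl
use strict;
use warnings;

package Vehicle;
sub new   { my ($class) = @_; return bless {}, $class; }
sub start { my ($self) = @_; return ref($self) . " starts"; }   # default behavior

package Bike;
our @ISA = ('Vehicle');
sub start { return "Bike starts: push the pedals"; }            # overrides Vehicle::start

package Plane;
our @ISA = ('Vehicle');
sub start { return "Plane starts: spin up the engines"; }       # overrides Vehicle::start

package Car;
our @ISA = ('Vehicle');                                         # inherits start unchanged

package main;

my @fleet = (Bike->new(), Plane->new(), Car->new());

# The caller does not need to know which kind of vehicle it holds.
my @messages = map { $_->start() } @fleet;
print "$_\n" for @messages;
# Bike starts: push the pedals
# Plane starts: spin up the engines
# Car starts
```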

Objects In Perl Are Deceptively Simple

An object instance is simply a reference that happens to know which class it belongs to (a reference is a scalar, just like a number or a string).

A class is simply a package that happens to provide methods to deal with object references.

A method is simply a subroutine that expects an object reference (or a package name, for class methods) as the first argument.

A class inherits through the @ISA array.
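
Each of these four statements can be checked directly in a few lines of Perl. The Animal and Dog classes below are hypothetical and exist only for this demonstration; blessed and reftype come from the core Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

package Animal;
sub new   { my ($class, %args) = @_; return bless { %args }, $class; }
sub name  { my ($self) = @_; return $self->{name}; }
sub sound { return 'some sound'; }

package Dog;
our @ISA = ('Animal');             # inheritance is just this array
sub sound { return 'Woof'; }

package main;

my $dog = Dog->new(name => 'Rex');

# An instance is a reference that knows its class.
print blessed($dog), "\n";         # Dog
print reftype($dog), "\n";         # HASH

# A method is a subroutine taking the object as first argument,
# so these two calls do the same thing.
print $dog->name(), "\n";          # Rex
print Animal::name($dog), "\n";    # Rex

# Method lookup follows @ISA: sound is found in Dog, name in Animal.
print $dog->sound(), "\n";         # Woof
print "@Dog::ISA\n";               # Animal
```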

Class vs module

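    # A plain module: a package that only provides subroutines
    package MyMod;
    sub f { … }
    sub g { … }
    …
    1;

    # A class: a package whose constructor returns a blessed reference
    package MyObj;
    sub new {                               # constructor
        my $proto = shift;                  # class name, or an existing instance
        my $class = ref($proto) || $proto;  # get the class name in both cases
        my $self  = { @_ };                 # set attributes from key => value pairs
        bless($self, $class);               # create object
        return($self);                      # return instance ref
    }
    sub other_methods { … }
    …
    1;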

Class vs module

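    # Using a plain module: call its subroutines by their full name
    use MyMod;
    …
    MyMod::f($param);
    my $result = MyMod::g();

    # Using a class
    use MyObj;
    …
    # call constructor (attributes given as key => value pairs)
    my $instance = MyObj->new(%attr);
    # call method
    $instance->other_methods();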

Part III Using BioPerl Objects

What is BioPerl?

  • It is a collection of Perl modules for processing data for the life sciences
  • A project made up of biologists, bioinformaticians, computer scientists
  • An open source toolkit of building blocks for life sciences applications
  • http://www.bioperl.org
  • First work in 1996
  • BioPerl 1.0 was released in May 2002
  • Current version 1.5.1, October 2005
  • Part of the open-bio.org foundation (BioJava, BioPython, BioPerl, EMBOSS, BioMoby)

What to expect from BioPerl?

If you're looking for a script built to fit your exact need, it's likely you won't find it.

What you will find is a diverse set of Perl modules that will enable you to write your own script, and a community of people who are willing to help you.

  • The toolkit is divided into several packages; most people will only want to deal with the Core package
  • Core package provides the main parsers; this is the basic package and it's required by all the other packages
  • Run package provides wrappers for executing some 60 common bioinformatics applications
  • Ext package is for C-language extensions, including some alignment algorithms and an interface to the Staden IO library
  • GUI package includes some basic widgets in Perl-Tk
  • BioPerl db is a subproject to store sequence and annotation data in a BioSQL relational database
  • Pedigree package is for manipulating genotype, marker, and individual data for linkage studies
  • Microarray package has preliminary objects for manipulating some microarray data formats
  • Network package parses and analyzes protein-protein interaction data
  • Pipeline package is a project for creating analysis pipelines out of bioperl-run modules

Code Sample

The following piece of code

    #!/usr/local/bin/perl
    use Bio::Seq;
    $seq_obj = Bio::Seq->new('-seq' => 'acgt');
    print $seq_obj->seq(), "\n";

prints out

    acgt

Let's have a look at it, one line after the other.

Line 1: Invoking Perl

This line tells your operating system where to find the Perl interpreter

    #!/usr/local/bin/perl
    use Bio::Seq;
    $seq_obj = Bio::Seq->new('-seq' => 'acgt');
    print $seq_obj->seq(), "\n";

Nothing object-oriented here!

Line 2: Import class

This line tells Perl to use a module on your machine called Seq.pm, found in the directory Bio

    #!/usr/local/bin/perl
    use Bio::Seq;
    $seq_obj = Bio::Seq->new('-seq' => 'acgt');
    print $seq_obj->seq(), "\n";

The code of the object class Bio::Seq is located in this module (as well as the associated documentation).

The :: notation reflects the organization of the modules in the file system: Bio::Seq corresponds to the file Bio/Seq.pm in one of Perl's library directories.

What is Bio::Seq ?

The Bio::Seq object, or "Sequence object", or "Seq object", is ubiquitous in BioPerl. It contains a single sequence and associated names, identifiers, and properties.

This generic "Sequence object" could be either protein or DNA, and it is not linked to a particular format, like the SwissProt, EMBL or GenBank ones.

Line 3: Create instance

This line creates a sequence object (in memory)

    #!/usr/local/bin/perl
    use Bio::Seq;
    $seq_obj = Bio::Seq->new('-seq' => 'acgt');
    print $seq_obj->seq(), "\n";

The Perl variable $seq_obj refers to an instance of the Bio::Seq class.

new is a subroutine found in the module Bio/Seq.pm. The method call Bio::Seq->new acts as the constructor of the object. '-seq' => 'acgt' assigns the value acgt to the seq attribute.

More details

In BioPerl, most constructors take arguments in the form of key => value pairs. This is to provide maximal flexibility to the programmer. Many other keys (e.g., '-id' or '-desc') are available for the Bio::Seq->new constructor.

Read the documentation! http://doc.bioperl.org/
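
To show why key => value arguments are flexible, here is a sketch of how such a constructor can be written in plain Perl. My::Record is a hypothetical class invented for this example; it only imitates the calling style and is not how Bio::Seq is implemented.

```perl
use strict;
use warnings;

package My::Record;   # hypothetical class, not part of BioPerl

# Constructor accepting '-key' => value pairs in any order.
sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    for my $key (keys %args) {
        (my $name = lc $key) =~ s/^-//;   # '-seq' becomes 'seq'
        $self->{$name} = $args{$key};
    }
    return $self;
}

# Simple read accessors.
sub seq  { my ($self) = @_; return $self->{seq}; }
sub id   { my ($self) = @_; return $self->{id}; }
sub desc { my ($self) = @_; return $self->{desc}; }

package main;

# Arguments can be given in any order, and optional ones can be left out.
my $full  = My::Record->new('-desc' => 'example 1', '-seq' => 'acgt', '-id' => 'X1');
my $short = My::Record->new('-seq' => 'ttga');

print join(' | ', $full->id(), $full->desc(), $full->seq()), "\n";   # X1 | example 1 | acgt
print $short->seq(), "\n";                                           # ttga
```

With positional arguments the caller would have to remember the order and pass placeholders for unused values; named pairs avoid both problems.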

Line 4: Method call

This line prints out what is returned by the method seq() of the object $seq_obj (here, acgt)

    #!/usr/local/bin/perl
    use Bio::Seq;
    $seq_obj = Bio::Seq->new('-seq' => 'acgt');
    print $seq_obj->seq(), "\n";

The -> notation means that one specifically intends to call the subroutine seq that is attached to $seq_obj. Indeed, a different object might have a method named seq, with a possibly different implementation if it belongs to a different class (polymorphism).

More methods

The BioPerl documentation tells us that the Bio::Seq object has many other methods. Some, like seq(), return a scalar. Read the documentation!

For example, the following piece of code

    #!/usr/local/bin/perl
    use Bio::Seq;
    $seq_obj = Bio::Seq->new('-seq' => 'acgt');
    print $seq_obj->seq(), "\n";
    print $seq_obj->alphabet(), "\n";
    print $seq_obj->subseq(3,4), "\n";

prints out

    acgt
    dna
    gt

More methods

The BioPerl documentation tells us that the Bio::Seq object has methods that return a new instance of the Bio::Seq object. Read the documentation!

The following piece of code

    #!/usr/local/bin/perl
    use Bio::Seq;
    $seq_obj   = Bio::Seq->new('-seq' => 'acgt');
    $seq_obj_2 = $seq_obj->trunc(1,3);
    print $seq_obj_2->seq(), "\n";
    $seq_obj_3 = $seq_obj_2->revcom();
    print $seq_obj_3->seq(), "\n";
    $seq_obj_4 = $seq_obj_2->translate();
    print $seq_obj_4->seq(), "\n";

prints out

    acg
    cgt
    T

More objects

The Bio::SeqIO object is responsible for reading sequences from files and writing them to files. It provides support for the various database formats. The next script creates a 'sequence' object and saves it to a file named 'test.seq' in FASTA format:

    #!/usr/local/bin/perl
    use Bio::Seq;
    use Bio::SeqIO;
    $seq_obj = Bio::Seq->new('-seq' => 'acgt', '-id' => '#12345', '-desc' => 'example 1');
    $seqio_obj = Bio::SeqIO->new('-file' => '>test.seq', '-format' => 'fasta');
    $seqio_obj->write_seq($seq_obj);

saves the file test.seq containing

    >#12345 example 1
    acgt

Simple change…

If one replaces '-format' => 'fasta' with '-format' => 'embl' in the Bio::SeqIO constructor, one gets

    ID   #12345 standard; DNA; UNK; 4 BP.
    XX
    AC   unknown;
    XX
    DE   example 1
    XX
    FH   Key             Location/Qualifiers
    FH
    XX
    SQ   Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
         acgt                                                    4
    //

A format converter

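    #!/usr/local/bin/perl
    use Bio::SeqIO;
    # input stream: read sequences in FASTA format
    $in  = Bio::SeqIO->new('-file' => 'infile',   '-format' => 'Fasta');
    # output stream: '>' opens the file for writing, in EMBL format
    $out = Bio::SeqIO->new('-file' => '>outfile', '-format' => 'EMBL');
    # copy every sequence from the input to the output
    while (my $seq = $in->next_seq()) {
        $out->write_seq($seq);
    }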

What's next with BioPerl?

There are several tutorials and plenty of examples to help you start with BioPerl.

Read the documentation and play with the examples.

Many have learned through practice.

What's next with OOP?

The design of object internals was not covered in detail here, because this requires some familiarity with programming. This is not especially difficult.

There are problems that greatly benefit from OOP, and others that are more easily managed without.

Applied improperly, or by people without the skills, knowledge, and experience, OOP doesn't solve any problems, and might even make things worse. It can be an important piece of the solution, but isn't a guarantee or a silver bullet.

Many programmers like the OOP way. Maybe you too? ;-)