Bioinformatics Vocabulary Processing, analyzing, experimenting with - - PowerPoint PPT Presentation



SLIDE 1

Genome Revolution: COMPSCI 004G 2.1

Bioinformatics Vocabulary

 Processing, analyzing, experimenting with data

  • Where does the data come from?
  • How do we get it?
  • What does it mean?
  • What do we do with it?

 From nucleotide to protein to gene

  • Identification is important
  • Annotation is important

SLIDE 2

What does DNA (data) look like?

 TGAAC vs. ACTTG

  • Which direction is right?
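
The two strings on this slide are base-by-base complements of each other (A pairs with T, C pairs with G). A minimal sketch of that relationship follows; the class name StrandSketch and its method names are illustrative assumptions, not part of the course code.

```java
public class StrandSketch {
    // Returns the pairing partner of one base (A<->T, C<->G).
    static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'C': return 'G';
            case 'G': return 'C';
            default:
                throw new IllegalArgumentException("not a DNA base: " + base);
        }
    }

    // Complements every base, keeping the left-to-right order.
    static String complement(String dna) {
        StringBuilder out = new StringBuilder();
        for (int k = 0; k < dna.length(); k++) {
            out.append(complement(dna.charAt(k)));
        }
        return out.toString();
    }

    // Complements every base and then reverses the order, which gives
    // the partner strand read in its own 5'-to-3' direction.
    static String reverseComplement(String dna) {
        return new StringBuilder(complement(dna)).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(complement("TGAAC"));        // ACTTG
        System.out.println(reverseComplement("TGAAC")); // GTTCA
    }
}
```

Which of the two outputs is "right" depends on the direction in which the partner strand is read, which is the question the slide raises.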

 What is a base-pair?

  • nucleotide?

 What is a protein, how coded?

  • Identification?

 What is an amino acid?

  • Codon? Coding?

 Why are proteins important?

  • Finding? Using?…

http://www.blc.arizona.edu/Molecular_Graphics/DNA_Structure/DNA_Tutorial.HTML

SLIDE 3

How do we get CGATC into software?

http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/sequencing.html

SLIDE 4

From Shotgun to Gene

 Comparing two approaches

  • HGP: human genome project
  • Celera Genomics

 Why was there a race? Is the race over?

  • Who owns the data?
  • What public good does the data serve?

 Should scientists be concerned about public policy?

  • Was the Manhattan project like the HGP?

SLIDE 5

What is a program? What is code?

 Instructions in a language a computer executes

  • Languages have different characteristics, strengths, weaknesses

  • Scheme, BASIC, C++, Fortran, Java, Perl, PHP, …

 Computer executes one instruction at a time

  • Memory and state of machine change
  • Execute the next instruction
  • Repeat
  • Stop, run out of memory, pull plug, …

SLIDE 6

From browser to genome analysis

 Netscape, first widely distributed browser

  • Who wrote it?
  • What operating systems did it run on?
  • What does it mean for a program to run?

 When you execute a Google query what happens?

  • Where does code run?
  • How do you see the results?

 Search at NCBI, jimwatsonsequence, …

  • Where does the code execute?

SLIDE 7

Writing a program

 Create the program using a computer language

  • Design, test, document, maintain, …

 Test and debug the program

  • Does the program do what you want?
  • How do you know what the program does?
  • How do you fix it?
  • What skills are needed?

SLIDE 8

More on understanding programs

 You write code in Java, or Perl, or C++ or PHP or …

  • The code must run/execute somewhere
  • You must understand what it does (how?)
  • In your mind and on paper, simulate/understand the computer’s execution of your code

  • What you wrote, not what you meant
  • How do you make a drawing?

SLIDE 9

Creating a Program

Specify the problem

  • remove ambiguities
  • identify constraints

Develop algorithms, design classes, design software architecture

Implement program

  • revisit design
  • test, code, debug
  • revisit design

Documentation, testing, maintenance of program

From ideas to electrons

SLIDE 10

Writing and Understanding Java

 Language independent skills in programming

  • What is a loop, how do you design a program?
  • What is an array, how do you access files?

 However, writing programs in any language requires understanding the syntax and semantics of the programming language

  • Syntax is similar to rules of spelling and grammar:

  • i before e except after c
  • Two spaces after a period, then use a capital letter

SLIDE 11

Syntax and Semantics

 Semantics is what a program (or English sentence) means

  • You ain’t nothing but a hound dog.
  • La chienne de ma tante est sur votre tête. (My aunt’s dog is on your head.)

 At first it seems like the syntax is hard to master, but the semantics are much harder

  • Natural languages are more forgiving than programming languages.

SLIDE 12

Toward an Understanding of Java

 Traditional first program, doesn’t convey power of computing but it illustrates basic components of a simple program

public class SayHello {
    // traditional first program
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

 This program must be edited/typed, compiled and executed

SLIDE 13

How Things Work: PrintLots.java

public class PrintLots {
    // …
    public void once() {
        twice();
        twice();
    }

    public static void main(String[] args) {
        PrintLots printer = new PrintLots();
        printer.once();
    }
}
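
The slide elides the body of twice(), so the class as shown does not compile on its own. The sketch below fills that gap with a hypothetical twice() that prints one line two times, purely to make the chain of calls visible; the class name PrintLotsSketch and the printed text are assumptions.

```java
public class PrintLotsSketch {
    // Hypothetical stand-in for the part the slide elides.
    public void twice() {
        System.out.println("hello");
        System.out.println("hello");
    }

    // Calls twice() two times, so four lines are printed in total.
    public void once() {
        twice();
        twice();
    }

    public static void main(String[] args) {
        // An object must exist before its non-static methods can be called.
        PrintLotsSketch printer = new PrintLotsSketch();
        printer.once();
    }
}
```

Execution starts in main, which creates the object and calls once(); each call to twice() runs to completion before control returns to once().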

SLIDE 14

Java Vocabulary

 Variable, object, identifier, method, call

  • Name of something: object or method
  • The car starts, the dog barks, I speak

 Invoke or call method: method lives in object (or in a class)

  • An object is an instance of a class
  • My car is a Volvo 850, yours is a BMW …
  • My car starts, yours stops: v850.start();
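
The slide names the cars but shows no class definition. A minimal sketch of the vocabulary, assuming a hypothetical Car class with start() and stop() methods (the names Car, CarSketch, bmw, and isRunning are not from the slide):

```java
public class CarSketch {
    // Hypothetical class: each Car object keeps its own state.
    static class Car {
        private final String model;
        private boolean running = false;

        Car(String model) { this.model = model; }

        void start() { running = true; }
        void stop() { running = false; }
        boolean isRunning() { return running; }
        String getModel() { return model; }
    }

    public static void main(String[] args) {
        // Two objects (instances) of the same class, named by two identifiers.
        Car v850 = new Car("Volvo 850");
        Car bmw = new Car("BMW");

        v850.start(); // call the start method on my car
        bmw.start();
        bmw.stop();   // your car stops; mine is unaffected

        System.out.println(v850.getModel() + " running: " + v850.isRunning()); // Volvo 850 running: true
        System.out.println(bmw.getModel() + " running: " + bmw.isRunning());   // BMW running: false
    }
}
```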

SLIDE 15

Methods/Functions can return values

 What does the square root function do?

  • When called with parameters of 4, 6.2, -1
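
One way to explore the question is to call Java's Math.sqrt with the three values from the slide. This is a small sketch; the class name SqrtSketch and the helper describe are assumptions for illustration.

```java
public class SqrtSketch {
    // Builds a one-line description of the value Math.sqrt returns for x.
    static String describe(double x) {
        return "sqrt(" + x + ") = " + Math.sqrt(x);
    }

    public static void main(String[] args) {
        double[] inputs = {4, 6.2, -1};
        for (double x : inputs) {
            // 4 gives 2.0, 6.2 gives a value near 2.49, -1 gives NaN
            System.out.println(describe(x));
        }
    }
}
```

The call with -1 returns the special double value NaN ("not a number") rather than stopping the program.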

 What does the method getGcount() return?

public class DNAstuff {
    // Counts the lowercase 'g' characters in dna.
    public int getGcount(String dna) {
        int total = 0;
        for (int k = 0; k < dna.length(); k++) {
            if (dna.charAt(k) == 'g') {
                total = total + 1;
            }
        }
        return total;
    }
}
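
A short sketch of what the method returns for a few inputs. The counting logic is the slide's; the class name GcountSketch, the static modifier, and the sample strings are assumptions made to keep the demo self-contained.

```java
public class GcountSketch {
    // Same loop as the slide's getGcount: counts lowercase 'g' only.
    static int getGcount(String dna) {
        int total = 0;
        for (int k = 0; k < dna.length(); k++) {
            if (dna.charAt(k) == 'g') {
                total = total + 1;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(getGcount("agtccg"));               // 2
        System.out.println(getGcount("AGTCCG"));               // 0, the comparison is case-sensitive
        System.out.println(getGcount("AGTCCG".toLowerCase())); // 2
        System.out.println(getGcount(""));                     // 0, the loop body never runs
    }
}
```

Because the test compares against 'g' exactly, uppercase sequence data would need to be converted first (as in the third call) or the comparison extended.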

SLIDE 16

Lydia Kavraki

Awards

  • Grace Murray Hopper
  • Brilliant 10

"I like to work on problems that will generally improve the quality of our life,"

What's the thing you love most about science? “Working with students and interacting with people from diverse intellectual backgrounds. Discovery and the challenge of solving a tough problem, especially when it can really affect the quality of our lives. I find the whole process energizing.”

SLIDE 17

John Kemeny (1926-1992)

Co-invented BASIC (with Thomas Kurtz), assistant to Einstein, Professor and President of Dartmouth

"If you have a large number of unrelated ideas, you have to get quite a distance away from them to get a view of all of them, and this is the role of abstraction."

"...it is the greatest achievement of a teacher to enable his students to surpass him."

SLIDE 18

Anatomy of for-loop

String s = new String("AGTCCG");
String rs = new String("");
for (int k = 0; k < 3; k++) {
    rs = rs + s.charAt(k);
}

Initialization happens once

Loop test evaluated

  • If true, the body executes
  • If false, execution skips to the statement after the loop

After loop body, increment executed and test re-evaluated
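
The order of these steps can be traced by printing the state after each pass through the body. A minimal sketch, using the slide's loop unchanged apart from the added print statement (the class name ForLoopTrace and the method name trace are assumptions):

```java
public class ForLoopTrace {
    // Runs the slide's loop on s and prints k and rs after each pass.
    // Assumes s has at least 3 characters; a shorter s makes charAt throw.
    static String trace(String s) {
        String rs = "";
        for (int k = 0; k < 3; k++) {  // k = 0 happens once; k < 3 is tested before every pass
            rs = rs + s.charAt(k);
            System.out.println("k=" + k + " rs=" + rs);
        }                              // k++ runs here, then the test is re-evaluated
        return rs;
    }

    public static void main(String[] args) {
        // Prints k=0 rs=A, k=1 rs=AG, k=2 rs=AGT, then the result line.
        System.out.println("result: " + trace("AGTCCG")); // result: AGT
    }
}
```

The test k < 3 is evaluated four times in total: three times true, and once false when k reaches 3.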

What should be true about test?

What about body?

What about together?

SLIDE 19

Program Style

 People who use your program don’t read your code

  • You’ll write programs to match user needs

 People who maintain or modify your program do read code

  • Must be readable, understandable without you next door
  • Use a consistent programming style, adhere to conventions

 Identifiers are names of functions, parameters, variables, classes, …

  • Sequence of letters, digits, underscore (_) characters
  • Cannot begin with a digit (we won’t begin with _)
  • big_head vs. BigHead, we’ll use AlTeRnAtInG format
  • Make identifiers meaningful, not droll and witty
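
The rules above can be checked mechanically with two methods from java.lang.Character. This is a rough sketch: the class and method names are assumptions, it does not reject reserved words such as class, and Java itself also accepts some characters the slide does not mention (for example $).

```java
public class IdentifierSketch {
    // Returns true if name has the shape of a Java identifier:
    // a legal first character followed by legal later characters.
    static boolean looksLikeIdentifier(String name) {
        if (name.isEmpty() || !Character.isJavaIdentifierStart(name.charAt(0))) {
            return false;
        }
        for (int k = 1; k < name.length(); k++) {
            if (!Character.isJavaIdentifierPart(name.charAt(k))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIdentifier("big_head")); // true
        System.out.println(looksLikeIdentifier("BigHead"));  // true
        System.out.println(looksLikeIdentifier("gCount2"));  // true
        System.out.println(looksLikeIdentifier("2fast"));    // false, begins with a digit
        System.out.println(looksLikeIdentifier("g count"));  // false, contains a space
    }
}
```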

SLIDE 20

Equality of values and objects

int x = 3 * 12;
if (x == 36) { /* is executed */ }

String s = new String("genetic");
String t = s.substring(0, 4);
if (t == "gene") { /* not executed */ }
if (t.equals("gene")) { /* is executed */ }

Primitive types are boxes

Object types are labels on boxes

  • If we don't call new there's no box for the label
  • Having no box is called null: the variable/pointer/reference refers to no object
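
The comparisons on this slide can be run directly. A small sketch that prints the result of each test (the class name EqualitySketch, the variable u, and the output labels are assumptions):

```java
public class EqualitySketch {
    public static void main(String[] args) {
        // Primitive: == compares the values stored in the boxes.
        int x = 3 * 12;
        System.out.println("int ==: " + (x == 36));                // int ==: true

        // Object: == asks whether two labels are on the same box.
        String s = new String("genetic");
        String t = s.substring(0, 4);                              // a new String object holding "gene"
        System.out.println("String ==: " + (t == "gene"));         // String ==: false
        System.out.println("String equals: " + t.equals("gene"));  // String equals: true

        // A label with no box refers to null.
        String u = null;
        System.out.println("null check: " + (u == null));          // null check: true
    }
}
```

Calling a method such as u.equals("gene") on the null label would throw a NullPointerException, since there is no box to ask.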

SLIDE 21

Objects and values

 Primitive variables are boxes

  • think memory location with value

 Object variables are labels that are put on boxes

String s = new String("genome");
String t = new String("genome");
if (s == t) { /* they label the same box */ }
if (s.equals(t)) { /* contents of boxes the same */ }

What's in the boxes? "genome" is in the boxes

SLIDE 22

Objects, values, classes

For primitive types: int, char, double, boolean

  • Variables have names and are themselves boxes (metaphorically)
  • Two int variables assigned 17 are equal with ==

For object types: String, Sequence, others

  • Variables have names and are labels for boxes
  • If no box has been created or assigned, the label is null
  • Can assign label to existing box (via another label)
  • Can create new box using new

Object types are references or pointers or labels to storage
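
A sketch of the box-and-label picture. It uses StringBuilder rather than String, an assumption made here because a String cannot be changed after creation, so a second label on the same box would be hard to observe. The class and variable names are illustrative.

```java
public class LabelsSketch {
    public static void main(String[] args) {
        // Primitives: each variable is its own box holding a value.
        int a = 17;
        int b = 17;
        System.out.println("a == b: " + (a == b));                      // a == b: true

        // Objects: new creates a box, assignment adds a second label to it.
        StringBuilder first = new StringBuilder("gen");
        StringBuilder second = first;   // same box, two labels
        second.append("ome");           // change the box through one label
        System.out.println("first: " + first);                          // first: genome
        System.out.println("first == second: " + (first == second));    // first == second: true

        // A separate box with the same contents is still a different box.
        StringBuilder third = new StringBuilder("genome");
        System.out.println("first == third: " + (first == third));      // first == third: false
        System.out.println("same contents: "
                + first.toString().equals(third.toString()));           // same contents: true

        // A label that has been given no box is null.
        StringBuilder none = null;
        System.out.println("none == null: " + (none == null));          // none == null: true
    }
}
```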

SLIDE 23

Don Knuth (The Art of Computer Programming)

“My feeling is that when we prepare a program, it can be like composing poetry or music; as Andrei Ershov has said, programming can give us both intellectual and emotional satisfaction, because it is a real achievement to master complexity and to establish a system of consistent rules.”

“We have seen that computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty.”

SLIDE 24

Ada Lovelace, 1815-1852

Daughter of Byron, advocate of work of Charles Babbage, designer of early “computer” (the Analytical Engine)

  • Made Babbage’s work accessible

“It would weave algebraic patterns the way the Jacquard loom weaved patterns in textiles”

Tutored in mathematics by Augustus de Morgan

Marched around the billiard table playing the violin

Ada is a notable programming language