
Biology & CS



  1. Biology & CS
     Philip Chan

     Biology: Different levels
     - Evolution: organisms over time
     - Ecology: interactions among organisms and environment
     - Individual organisms: Anatomy, Physiology
     - Cell Biology: cells
     - Molecular Biology: chemical molecules

     Molecular Biology
     - DNA
       - Stands for? Deoxyribonucleic Acid
       - Double helix structure
         - Watson and Crick, 1953
         - Nobel Prize in Physiology or Medicine, 1962

     Genome
     - Chromosomes
       - Inside where? Inside the cell nucleus
       - How many pairs?

  2. Genome
     - Chromosomes
       - inside the cell nucleus
       - 23 pairs (one determines what? One pair determines sex)
       - contain genetic information
       - copied during cell division
       - made of DNA
     - Gene
       - (roughly) a segment of DNA that encodes a protein
     - Genome
       - Human: 20,000-25,000 genes

     DNA to Protein
     - Transcription: DNA -> RNA
     - Translation: RNA -> Protein

  3. DNA Encoding for Proteins
     - DNA
       - Sequence of nucleotides
       - 4 possible nucleotides: Adenine (A), Cytosine (C), Guanine (G), Thymine (T)
       - [Thymine (T) becomes Uracil (U) in RNA]
     - Protein
       - Sequence of amino acids
       - 20 possible amino acids
     - How many nucleotides are needed to encode one amino acid?

     Sequencing Human Genome
     - Human Genome Project
       - International (governments/universities)
     - Celera Corporation (US)
     - Many short sequences
       - Algorithms to merge them into longer sequences
     - Complete genome sequence in ~2003

     Why Study the Genome?
     - Understanding how genes, proteins, … interact with each other
     - Understanding diseases
       - Mistakes in copying DNA
       - Mutations cause changes in DNA

     Comparing Genes
     - After a gene is found
       - Biologist might not know its function
       - Find "similarities" with genes of known function

     Cancer (1984)
     - Cancer-causing gene is similar to a normal growth gene
     - Cancer might be caused by a normal growth gene being switched on at the wrong time
     - A good gene doing the right thing at the wrong time

     Cystic Fibrosis (1989)
     - Cystic Fibrosis is a fatal disease associated with abnormal secretions (clogs in lungs)
     - A segment of the Cystic Fibrosis gene is similar to the sequence for ATP binding proteins
     - These proteins affect cell membrane and secretions
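The question on the encoding slide can be approached with a counting argument: a word of k nucleotides over a 4-letter alphabet can take 4^k distinct values, so k must be large enough that 4^k is at least 20. The sketch below finds the smallest such k. The function name `min_codon_length` is an illustrative choice, not something from the slides. The result is only a lower bound from counting; it happens to agree with the triplet codons used in the actual genetic code.

```python
def min_codon_length(alphabet_size: int, num_targets: int) -> int:
    """Smallest k such that alphabet_size**k >= num_targets."""
    if alphabet_size < 2:
        raise ValueError("need at least two distinct symbols")
    k = 1
    # Grow the word length until there are enough distinct words.
    while alphabet_size ** k < num_targets:
        k += 1
    return k


# 4 nucleotides, 20 amino acids: 4 < 20, 16 < 20, 64 >= 20
print(min_codon_length(4, 20))  # 3
```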

  4. Similarity/Distance of Sequences
     - Position by position
         ACACAC
         CACACA
       Hamming Distance = 6
     - Shift the second sequence by one character
         ACACAC_
         _CACACA
       Distance = 2

     Subsequence
     - Sequence of characters that might NOT be consecutive
     - Example: ATTGCTA
       - TTGC -> subsequence
       - AGCA -> subsequence
       - ATTA -> subsequence
       - TGTT -> not a subsequence
       - TCG -> not a subsequence

     Problem 1: Longest Common Subsequence

     Common Subsequence
     - Given two sequences
         ATCTGAT
         TGCATA
     - Common subsequences? For example:
       - TCTA
       - TA
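The two notions on this page can be checked with a few lines of code. This is a minimal sketch, assuming equal-length strings for the Hamming distance and using the slide's `_` as the padding character for the shifted comparison. The function names are illustrative, not from the slides.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def is_subsequence(candidate: str, text: str) -> bool:
    """True if candidate can be read off text left to right, skipping characters."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to the first match,
    # so later characters must be found strictly after earlier ones.
    return all(ch in remaining for ch in candidate)


print(hamming_distance("ACACAC", "CACACA"))    # 6
print(hamming_distance("ACACAC_", "_CACACA"))  # 2
print(is_subsequence("AGCA", "ATTGCTA"))       # True
print(is_subsequence("TCG", "ATTGCTA"))        # False
```

A string is a common subsequence of two sequences when `is_subsequence` holds against both, as with TCTA for ATCTGAT and TGCATA.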

  5. Problem Formulation
     - Many different common subsequences
     - Want to find the longest
     - Length of LCS helps determine similarity of two sequences/genes

     Longest Common Subsequence (LCS)
     - Given (input)
       - Two sequences v, w
     - Find (output)
       - Longest common substring of v and w (a simpler problem: the matched characters must be consecutive)

     Algorithm
     - Any ideas?

     Algorithm 1
     - Find common substring of length 1
     - Find common substring of length 2
     - …
     - What is the time complexity?
     - Are we repeating unnecessary work?
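One way to read Algorithm 1 is as an exhaustive search: for each length in turn, try every substring of v of that length and test whether it occurs in w. The sketch below follows that reading. It is an interpretation of the slide's outline rather than the lecturer's exact procedure, and the function name is illustrative.

```python
def longest_common_substring_bruteforce(v: str, w: str) -> int:
    """Length of the longest substring shared by v and w, by trying every length."""
    best = 0
    for length in range(1, min(len(v), len(w)) + 1):
        # Try every substring of v with this length.
        for start in range(len(v) - length + 1):
            if v[start:start + length] in w:
                best = length
                break  # one hit is enough for this length
    return best


print(longest_common_substring_bruteforce("BABA", "ABAB"))  # 3
```

Each length is handled from scratch, so a match of length L found in one round is rediscovered as part of the work for length L+1. That is the repeated work the slide asks about.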

  6. Algorithm 2
     - Observation:
       - If a common substring of length L+1 exists
       - A common substring of length L must also exist
     - Idea
       - Use common substring of length L to find common substring of length L+1
     - Time complexity?

     Algorithm ?
     - Tree Search
       - What would be the nodes and branches?
     - Could recursion help?
     - Time complexity?

     Algorithm 3
     - Consider
       - String v, indexed by i
       - String w, indexed by j
     - LCS(i, j) returns the length of LCS ending at i, j
     - LCS(i, j) = LCS(i - 1, j - 1) + 1   if v[i] = w[j]
                   0                       otherwise
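The recurrence of Algorithm 3 translates directly into a recursive function. The sketch below assumes 1-based positions i and j as on the slide, with LCS(i, j) = 0 whenever i or j is 0. The names `ending_at` and `longest_common_substring_recursive` are illustrative. The overall answer is the maximum over all end positions.

```python
def ending_at(v: str, w: str, i: int, j: int) -> int:
    """Length of the longest common substring ending at v[i], w[j] (1-based)."""
    if i == 0 or j == 0:
        return 0
    if v[i - 1] == w[j - 1]:  # Python strings are 0-based
        return ending_at(v, w, i - 1, j - 1) + 1
    return 0


def longest_common_substring_recursive(v: str, w: str) -> int:
    """Try every end position (i, j) and keep the best length."""
    return max(
        (ending_at(v, w, i, j)
         for i in range(1, len(v) + 1)
         for j in range(1, len(w) + 1)),
        default=0,
    )


print(longest_common_substring_recursive("BABA", "ABAB"))  # 3
```

Calls started from different (i, j) pairs walk back along the same diagonals, so the same subproblems are solved more than once.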

  7. Algorithm 3
     - Consider
       - String v, indexed by i
       - String w, indexed by j
     - LCS(i, j) returns the length of LCS ending at i, j
     - LCS(i, j) = LCS(i - 1, j - 1) + 1   if v[i] = w[j]
                   0                       otherwise
     - Different initial i, j pairs
     - Any redundant work?

     Algorithm 3
     - Dynamic programming
       - Eliminate redundant work
       - By storing partial answers
     - LCS[] is a table
       - LCS[i, j] is the length of LCS ending at i, j
     - LCS[i, j] = LCS[i - 1, j - 1] + 1   if v[i] = w[j]
                   0                       otherwise

     Table after initialization (row 0 and column 0 are all zeroes):

              A  B  A  B
           0  0  0  0  0
        B  0
        A  0
        B  0
        A  0

     Table after filling the first two rows:

              A  B  A  B
           0  0  0  0  0
        B  0  0  1  0  1
        A  0  1  0  2  0
        B  0
        A  0
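The table-filling version of Algorithm 3 can be sketched as follows. It assumes v = BABA labels the rows and w = ABAB labels the columns, as in the slide's table, with an extra row 0 and column 0 of zeroes. The function name is illustrative.

```python
def common_substring_table(v: str, w: str) -> list[list[int]]:
    """table[i][j] = length of the longest common substring ending at v[i], w[j]."""
    rows, cols = len(v), len(w)
    # Row 0 and column 0 stay zero: nothing can end before the strings start.
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if v[i - 1] == w[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            # otherwise the cell keeps its initial 0
    return table


table = common_substring_table("BABA", "ABAB")
for row in table:
    print(row)
print(max(max(row) for row in table))  # 3
```

Each cell is computed once from its upper-left neighbour, so the work is proportional to the number of cells in the table.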

  8. Algorithm 3

     Completed table:

              A  B  A  B
           0  0  0  0  0
        B  0  0  1  0  1
        A  0  1  0  2  0
        B  0  0  2  0  3
        A  0  1  0  3  0

     Problem Formulation
     - Given (input)
       - Two sequences v, w
     - Find (output)
       - Longest common subsequence of v and w
       - Skipping character(s) is allowed

     Problem
     - String editing
       - Transform one string to another by keeping/adding/deleting characters
     - Can also be viewed as aligning two strings
     - Any ideas?

         -- T  G  C  A  T  -- A  -- C
         A  T  -- C  -- T  G  A  T  C

     LCS: Example

         i coords:       0  0  1  2  3  4  5  5  6  6  7
         elements of v:     -- T  G  C  A  T  -- A  -- C
         elements of w:     A  T  -- C  -- T  G  A  T  C
         j coords:       0  1  2  2  3  3  4  5  6  7  8

     (0,0) -> (0,1) -> (1,2) -> (2,2) -> (3,3) -> (4,3) -> (5,4) -> (5,5) -> (6,6) -> (6,7) -> (7,8)

     Matches shown in red
     - positions in v: 1 < 3 < 5 < 6 < 7
     - positions in w: 2 < 3 < 4 < 6 < 8

     Every common subsequence is a path in 2-D grid

     Edit Graph for LCS Problem
     [Figure: 2-D grid with w = A T C T G A T C across the top (j = 0..8) and v = T G C A T A C down the side (i = 0..7)]
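The coordinate rows in the example can be reproduced mechanically from the two aligned rows. This sketch assumes a single `-` stands for each gap (the slide writes `--`). The function name `alignment_path` is illustrative. A column advances i when it holds a character of v, advances j when it holds a character of w, and counts as a match when both characters are present and equal.

```python
def alignment_path(row_v: str, row_w: str, gap: str = "-"):
    """Return the grid path and the matched (i, j) positions of an alignment."""
    i = j = 0
    path = [(0, 0)]
    matches = []
    for a, b in zip(row_v, row_w):
        if a != gap:
            i += 1  # consumed one character of v
        if b != gap:
            j += 1  # consumed one character of w
        path.append((i, j))
        if a != gap and a == b:
            matches.append((i, j))  # diagonal edge on a matching pair
    return path, matches


path, matches = alignment_path("-TGCAT-A-C", "AT-C-TGATC")
print(path)
print(matches)  # [(1, 2), (3, 3), (5, 4), (6, 6), (7, 8)]
```

The first components of the matches are the positions in v (1, 3, 5, 6, 7) and the second components are the positions in w (2, 3, 4, 6, 8).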

  9. Edit Graph for LCS Problem
     [Figure: 2-D grid with w = A T C T G A T C across the top (j = 0..8) and v = T G C A T A C down the side (i = 0..7)]
     - Every path is a common subsequence.
     - Every diagonal edge adds an extra element to common subsequence.
     - LCS Problem: Find a path with maximum number of diagonal edges.

     Computing LCS
     The length of LCS(v_i, w_j), the LCS of the first i characters of v and the first j characters of w, is computed by:

     $$
     s_{i,j} = \max
     \begin{cases}
       s_{i-1,j} + 0 \\
       s_{i,j-1} + 0 \\
       s_{i-1,j-1} + 1, & \text{if } v_i = w_j
     \end{cases}
     $$

     [Figure: cell (i, j) with incoming edges from (i-1, j) and (i, j-1) of weight 0 and from (i-1, j-1) of weight 1]

     Dynamic Programming Example
     - Initialize 1st row and 1st column to be all zeroes.
     - Or, to be more precise, initialize 0th row and 0th column to be all zeroes.

     $$
     s_{i,j} = \max
     \begin{cases}
       s_{i-1,j-1} + 1, & \text{if } v_i = w_j \quad \text{(value from NW)} \\
       s_{i-1,j} & \text{(value from North, top)} \\
       s_{i,j-1} & \text{(value from West, left)}
     \end{cases}
     $$
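The recurrence above can be sketched as a table fill followed by a walk back from the bottom-right cell to recover one longest common subsequence. It assumes the 0th row and column are zero, as the slide states. The traceback step and the function name `lcs` are additions for illustration, not part of the slides. When several subsequences tie for longest, which one is returned depends on the tie-breaking order in the traceback.

```python
def lcs(v: str, w: str) -> tuple[int, str]:
    """Length of the LCS of v and w, plus one subsequence achieving it."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]  # 0th row/column stay zero
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(s[i - 1][j], s[i][j - 1])      # North, West
            if v[i - 1] == w[j - 1]:
                best = max(best, s[i - 1][j - 1] + 1)  # NW (diagonal)
            s[i][j] = best

    # Walk back from (n, m), collecting characters on diagonal edges.
    chars = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1] and s[i][j] == s[i - 1][j - 1] + 1:
            chars.append(v[i - 1])
            i, j = i - 1, j - 1
        elif s[i][j] == s[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return s[n][m], "".join(reversed(chars))


length, subsequence = lcs("TGCATAC", "ATCTGATC")
print(length)  # 5
print(subsequence)
```

Filling the table takes time proportional to len(v) * len(w), one constant-time step per grid cell.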
