MOTIFS DISTRIBUTION IN DNA SEQUENCES, Stéphane ROBIN

MOTIFS DISTRIBUTION IN DNA SEQUENCES
Stéphane ROBIN, robin@inapg.inra.fr
UMR INA-PG / INRA, Paris, Mathématique et Informatique Appliquées
Bio-Info-Math Workshop, Tehran, April 2005
S. Robin (Motif statistics in DNA)

Biological interest of motif statistics: four examples

Ex 1: Promoter motifs = structured motifs where the polymerase binds to DNA

[Figure: two boxes v and w separated by a distance d, 16 bps ≤ d ≤ 18 bps, within the ≃ 100 bps preceding the gene]

Which structured motifs occur almost (too?) systematically in the upstream regions of the genes of a given species?
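A structured motif of this kind can be searched for directly. The sketch below is only an illustration: the function name `find_structured_motifs` is hypothetical, and the boxes used in the usage line (`ttgaca` and `tataat`, the textbook E. coli -35 and -10 consensus boxes) are an assumption chosen for the example, not taken from the slides.

```python
def find_structured_motifs(seq, v, w, d_min=16, d_max=18):
    """Return (start, d) for every occurrence of v followed by w
    after a spacer of d letters, with d_min <= d <= d_max."""
    hits = []
    for i in range(len(seq) - len(v) + 1):
        if not seq.startswith(v, i):
            continue
        # Try every allowed spacer length after this occurrence of v.
        for d in range(d_min, d_max + 1):
            if seq.startswith(w, i + len(v) + d):
                hits.append((i, d))
    return hits


# Toy upstream region: v, a 17-letter spacer, then w.
upstream = "ttgaca" + "a" * 17 + "tataat"
hits = find_structured_motifs(upstream, "ttgaca", "tataat")
```

Counting such hits over all upstream regions of a genome gives the observed frequency that the statistical model is then asked to judge.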

Ex 2: CHI motifs in bacterial genomes

Crossover Hot-spot Initiator: defense function of the genome against the degradation activity of an enzyme (RecBCD).

Known in several bacterial genomes:
E. coli: gctggtgg
H. influenzae: gNtggtgg

[Figure (Schbath, 95), Fig. 9.1: Model of interaction between RecBCD and Chi, panels A to D]

Text accompanying the figure, translated from French: In order to study the frequency of the motif Chi = GCTGGTGG in the E. coli sequence, we successively fitted each of the models M0, M1, ..., M6 and computed the corresponding asymptotically standard Gaussian statistics U (Part I); model M0 is the one in which the bases of the sequence are assumed independent. Table 9.1 shows that Chi is the most over-represented 8-letter word when any of the three models M0, M1 and M2 is fitted to the sequence. Moreover, it remains among the most over-represented 8-letter words when the order of the Markov model is increased. The fact that Chi is exceptionally frequent under every model therefore reflects a strong constraint with respect to all its subwords of length 2 to 7, since the number of GCTGGTGG is always larger than that predicted by the di-, tri-, tetra-nucleotides, etc.

Is this motif unexpectedly frequent in some regions of the genome? If so, these regions may contain crucial functions.

Ex 3: Palindromes = self-complementary words

g t t a a c
| | | | | |
c a a t t g

Palindromes of length 6 are restriction sites (i.e. frailty sites) of the genome of E. coli.

If they are especially avoided in some regions, these regions may be of major importance for the organism.
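The definition of a palindrome used here (a word equal to its own reverse complement) is easy to check in code. The following is a minimal sketch; the helper names are hypothetical.

```python
from itertools import product

# Watson-Crick complement of each base.
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(word):
    """Complement every base, then read the word backwards."""
    return word.translate(COMPLEMENT)[::-1]


def is_palindrome(word):
    """A DNA palindrome is a word equal to its reverse complement."""
    return word == reverse_complement(word)


# Enumerate every palindrome of length 6 over {a, c, g, t}.
palindromes6 = [
    "".join(letters)
    for letters in product("acgt", repeat=6)
    if is_palindrome("".join(letters))
]
```

Because the last three letters are determined by the first three, there are 4^3 = 64 palindromes of length 6, and `gttaac` from the slide is one of them.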

Ex 4: Detection of unknown motifs

• Motifs with favorable functions should be unexpectedly frequent,
• Motifs with damaging functions should be unexpectedly rare.

Even when we know nothing about them (except their length), such motifs may be detected only because they have unexpected frequencies.
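As a rough illustration of this screening idea, the sketch below ranks every word of length k by the ratio of its observed count to the count expected if bases were independent (the simplest reference model). The ratio is a crude score chosen for this example only; it is not the statistic used in the talk, and the function name `rank_words` is hypothetical.

```python
from collections import Counter
from itertools import product


def rank_words(seq, k):
    """Return (word, observed, expected) for all k-letter words,
    sorted from most over-represented to most under-represented."""
    n = len(seq)
    base_freq = {b: c / n for b, c in Counter(seq).items()}
    # Overlapping occurrences of every k-letter word in the sequence.
    observed = Counter(seq[i:i + k] for i in range(n - k + 1))
    rows = []
    for letters in product(sorted(base_freq), repeat=k):
        word = "".join(letters)
        prob = 1.0
        for b in letters:
            prob *= base_freq[b]
        # Expected count under independent bases.
        expected = (n - k + 1) * prob
        rows.append((word, observed[word], expected))
    rows.sort(key=lambda r: r[1] / r[2], reverse=True)
    return rows


rows = rank_words("acacacacgt", 2)
```

On a real genome the same loop would be run with k = 6 or 8, and the raw ratio would be replaced by a properly normalised statistic.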

A model: what for?

Model = Reference

To be able to decide if something is unexpected, we first need to know what to expect.

To avoid artifacts, the model should typically account for
• the frequencies of nucleotides, or di-, or tri-nucleotides in the sequence,
• the overlapping structure of the word,
• possibly, the overall frequency of the word in the sequence.

The choice of the model (Markov chain / compound Poisson process) depends on the question. (R., Rodolphe & Schbath; 05)
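To make "what to expect" concrete, the sketch below compares the observed count of a word with a plug-in estimate of its expected count under a first-order Markov chain, which accounts for the dinucleotide frequencies of the sequence. The estimator (product of the dinucleotide counts of the word divided by the product of the counts of its inner letters) is a commonly used one and is an assumption here, since the slides do not spell it out; the function names are hypothetical.

```python
from collections import Counter


def count_overlapping(seq, word):
    """Observed number of occurrences, overlaps included."""
    return sum(
        1 for i in range(len(seq) - len(word) + 1) if seq.startswith(word, i)
    )


def expected_count_m1(seq, word):
    """Plug-in expected count of `word` under a first-order Markov model:
    prod N(w_i w_{i+1}) / prod N(w_i) over the inner letters."""
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    numerator = 1.0
    for i in range(len(word) - 1):
        numerator *= di[word[i:i + 2]]
    denominator = 1.0
    for letter in word[1:-1]:
        denominator *= mono[letter]
    return numerator / denominator if denominator else 0.0


seq = "acgacgacg"
observed = count_overlapping(seq, "acg")
expected = expected_count_m1(seq, "acg")
```

A word whose observed count is far from this expectation is a candidate for being exceptional; deciding how far is "far" requires the distribution of the count, which is the subject of the following slides.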

Overlapping structure of the word

Some words can overlap themselves (see Conway (Gardner, 74); Guibas & Odlyzko, 81). Such words tend to occur in clumps and have a less regular distribution along the sequence.

Cdf of the distance Y between two occurrences under model M00:

w = (gatc): E(Y) = 256 bps, V(Y) = (256.2 bps)^2
w = (aaaa): E(Y) = 256 bps, V(Y) = (326.7 bps)^2
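The effect of self-overlap on the distance between occurrences can be computed in closed form when letters are independent and uniform, which is assumed here to be what model M00 means. With p = 4^(-l) for a word of length l and c(1) the sum of 4^(-k) over the shifts k at which the word overlaps itself (k = 0 included), the generating-function argument of Guibas & Odlyzko gives E(Y) = 1/p and V(Y) = (2c(1) - 1)/p^2 - (2l - 1)/p. The sketch below implements this; the function names are hypothetical.

```python
from math import sqrt


def overlap_indicators(word):
    """eps[k] = 1 if the word shifted by k letters matches itself."""
    n = len(word)
    return [1 if word[k:] == word[:n - k] else 0 for k in range(n)]


def distance_moments(word, alphabet_size=4):
    """Mean and variance of the distance Y between successive
    occurrences, for independent, uniformly distributed letters."""
    n = len(word)
    p = alphabet_size ** -n
    c1 = sum(
        eps * alphabet_size ** -k
        for k, eps in enumerate(overlap_indicators(word))
    )
    mean = 1 / p
    variance = (2 * c1 - 1) / p ** 2 - (2 * n - 1) / p
    return mean, variance


mean_aaaa, var_aaaa = distance_moments("aaaa")
mean_gatc, var_gatc = distance_moments("gatc")
sd_aaaa, sd_gatc = sqrt(var_aaaa), sqrt(var_gatc)
```

For `aaaa` this gives a mean of 256 bps and a standard deviation of about 326.7 bps, matching the slide. For `gatc` the same formula gives about 252.5 bps rather than the 256.2 bps quoted, so the slide value may rest on a different convention or contain a typo.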

Probabilities and distributions of interest

Positions, distances, counts

• Probability for a motif to occur in a sequence: $X_1$ → promoter motifs
• Distribution of the number of occurrences: $N$
• Distribution of the occurrences along the sequence: $Y^r$, $N(x) - N(x - y)$ → CHI motifs, palindromes

[Figure: background page 32, Chapter 2 "Occurrences of motifs". Fig. 2.1: Occurrences of a word $w$ along a sequence $S$, with $N(w) = 6$. $X_1$ and $X_4$ are the positions of its first and fourth occurrences; $Y$ is the distance between two successive occurrences and $Y^4$ the cumulated distance of order 4.]

Text of the background page, translated from French:

... and the distances $Y$ that separate them. Figure 2.1 illustrates these definitions. By convention, the position of an occurrence is defined as the position of its last letter in the sequence. This convention is convenient, but arbitrary and by no means universal.

Our goal is to determine the exact distributions of the random variables $X$, $X_n$, $Y$ and $Y^r$. The interest of the cumulated distances will appear in Chapter 3.

Method for obtaining the distribution. The distribution of the position $X = X_1$ of the first occurrence is obtained from its (probability) generating function. Writing $p(x) = \Pr\{X = x\}$, the generating function of $X$ is defined by

$$\phi_X(t) = \sum_{x \ge 1} p(x)\, t^x .$$

This generating function is obtained by a two-step argument.
(i) A recurrence is established on the probabilities $p(x)$: $p(x) = f[p(1), \dots, p(x-1)]$ (Theorem 1, Section 4.2.1).
(ii) The generating function $\phi_X(t)$ is deduced by multiplying this recurrence by $t^x$ and summing over $x \ge 1$ (Theorem 2, Section 4.2.2).

The generating function $\phi_Y$ of the distance $Y$ is obtained along the same lines. This distance is meaningful in model M1 because, in this model, the distances separating successive occurrences are independent and identically distributed (i.i.d.).

The generating functions of the later positions $X_n$ and of the cumulated distances $Y^r$ then follow directly from the independence of the distances:

$$\phi_{X_n}(t) = \phi_X(t)\,[\phi_Y(t)]^{n-1}, \qquad \phi_{Y^r}(t) = [\phi_Y(t)]^r . \tag{2.2}$$
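The product of generating functions in relation (2.2) corresponds to a convolution of the underlying probability mass functions, because the distances are independent. The sketch below illustrates this numerically. It assumes the distributions of $X$ and $Y$ are already available as lists (index k holds the probability of the value k); the function names and the geometric toy distribution are hypothetical choices for the example.

```python
def convolve(p, q):
    """pmf of the sum of two independent variables with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def cumulated_distance_pmf(pmf_y, r):
    """pmf of Y^r, the sum of r i.i.d. distances (phi_Y to the power r)."""
    result = [1.0]  # point mass at 0, i.e. generating function 1
    for _ in range(r):
        result = convolve(result, pmf_y)
    return result


def position_pmf(pmf_x, pmf_y, n):
    """pmf of X_n, i.e. phi_X times phi_Y to the power n - 1."""
    return convolve(pmf_x, cumulated_distance_pmf(pmf_y, n - 1))


# Toy case: a one-letter word with probability 1/4 per position,
# so Y is geometric on {1, 2, ...}; the list is truncated at 49.
p = 0.25
pmf_y = [0.0] + [p * (1 - p) ** (k - 1) for k in range(1, 50)]
pmf_y2 = cumulated_distance_pmf(pmf_y, 2)
```

Truncating the input list does not affect the output at indices up to the truncation point, since larger distances cannot contribute to smaller sums. In the toy case, `pmf_y2[k]` equals the negative binomial value (k - 1) p^2 (1 - p)^(k - 2).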
