Paul Kirk MASAMB 2016, Cambridge October 4, 2016 Central dogma of - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Retroviruses integrate into a shared, non-palindromic motif

Paul Kirk

MASAMB 2016, Cambridge October 4, 2016

slide-2
SLIDE 2

Central dogma of molecular biology (Crick, 1956)

General transfers of biological sequential information:

DNA → DNA (replication)
DNA → RNA (transcription)
RNA → Protein (translation)

MRC | Medical Research Council

1 of 22

slide-3
SLIDE 3

Central dogma of molecular biology (Crick, 1956)

There are also special transfers of sequential information.

slide-6
SLIDE 6

For example: retroviruses

[Figure: a retrovirus, with labels for the viral RNA and the enzymes integrase, reverse transcriptase and protease.]

Retroviruses are obligate parasites: they require a host cell to complete their “life”-cycle.

Examples: HIV, HTLV-1, ...

slide-7
SLIDE 7

For example: retroviruses

[Animated figure, slides 7-12: a host cell containing host DNA. The build steps are:]

  • INFECTION: the viral RNA enters the host cell.
  • Reverse transcriptase: viral RNA → viral DNA.
  • Integrase: the host DNA is cut (“SNIP!”) and the viral DNA is inserted between the two pieces of host DNA.
  • The integrated viral DNA is the provirus.

slide-13
SLIDE 13

Characterising retroviral integration sites

[Animated figure, slides 13-17: the host DNA is cut (“CUT!”) and the retrovirus DNA intermediate is pasted in (“PASTE!”).]

HOST DNA:                       ...ATCCCGCTTA...
RETROVIRUS DNA INTERMEDIATE:    TGAC...CGT
HOST DNA with PROVIRUS:         ...ATCCCG [TGAC...CGT] CTTA...

We would like to characterise the target integration site

  • i.e. the regions flanking the provirus
  • Is there a motif?

slide-18
SLIDE 18

Aligning integration sites

Given a collection of integration sites, we can align them according to the position of the provirus...

INTEGRATION SITE 1:   ...ATCCCG [TGAC...CGT] CTTA...
INTEGRATION SITE 2:   ...TTAGAG [TGAC...CGT] GGTA...
INTEGRATION SITE 3:   ...AACGAA [TGAC...CGT] CTTC...
INTEGRATION SITE 4:   ...TTCTCC [TGAC...CGT] CGGA...
INTEGRATION SITE 5:   ...AGCTTC [TGAC...CGT] CTGC...

... and then ignore/remove/mask the provirus sequence, so that we just look at the target sites:

INTEGRATION SITE 1:   ...ATCCCG CTTA...
INTEGRATION SITE 2:   ...TTAGAG GGTA...
INTEGRATION SITE 3:   ...AACGAA CTTC...
INTEGRATION SITE 4:   ...TTCTCC CGGA...
INTEGRATION SITE 5:   ...AGCTTC CTGC...

slide-20
SLIDE 20

Summarising a collection of target sites

Example (5 sequences)

Sequences:             ...ATC...  ...TTA...  ...AAC...  ...TTC...  ...AGC...
Complements:           ...TAG...  ...AAT...  ...TTG...  ...AAG...  ...TCG...
Reverse complements:   ...GAT...  ...TAA...  ...GTT...  ...GAA...  ...GCT...

Consensus sequence

Just take the most frequent letter at each position: ...ATC...

Position probability matrix (PPM), P

Estimate the probability of each letter at each position:

$$
P = \begin{array}{c|ccccc}
A & \cdots & 3/5 & 1/5 & 1/5 & \cdots \\
T & \cdots & 2/5 & 3/5 & 0   & \cdots \\
C & \cdots & 0   & 0   & 4/5 & \cdots \\
G & \cdots & 0   & 1/5 & 0   & \cdots
\end{array}
$$
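
The consensus and PPM of the five example sequences can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names `ppm` and `consensus` are made up here, exact fractions are used to match the slide, and ties in the consensus are broken arbitrarily (first letter seen).

```python
from collections import Counter
from fractions import Fraction


def ppm(seqs):
    """Return one {letter: probability} dict per aligned position."""
    n = len(seqs)
    # zip(*seqs) transposes the alignment: one tuple of letters per position.
    return [{b: Fraction(c, n) for b, c in Counter(col).items()}
            for col in zip(*seqs)]


def consensus(seqs):
    """Most frequent letter at each position (ties: first letter seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


seqs = ["ATC", "TTA", "AAC", "TTC", "AGC"]
P = ppm(seqs)
print(consensus(seqs))
for position, column in enumerate(P, start=1):
    print(position, {b: str(p) for b, p in column.items()})
```

Letters that never occur at a position are simply absent from that column's dictionary, which corresponds to the zero entries of P.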

slide-21
SLIDE 21

Summarising a collection of target sites

Example (5 sequences), reverse complements: ...GAT... ...TAA... ...GTT... ...GAA... ...GCT...

Reverse complement PPM, P(RC)

The PPM for the reverse complement sequences:

$$
P^{(RC)} = \begin{array}{c|ccccc}
A & \cdots & 0   & 3/5 & 2/5 & \cdots \\
T & \cdots & 1/5 & 1/5 & 3/5 & \cdots \\
C & \cdots & 0   & 1/5 & 0   & \cdots \\
G & \cdots & 4/5 & 0   & 0   & \cdots
\end{array}
$$

Note: we can get P(RC) from P (and vice versa) by swapping the rows A ↔ T and C ↔ G, and reversing the order of the columns.
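
The row-swap and column-reversal rule can be checked directly on the three-column example. The sketch below stores a PPM as a dictionary of rows; that layout and the name `reverse_complement_ppm` are assumptions made for illustration.

```python
from fractions import Fraction as F

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement_ppm(P):
    """Swap rows A<->T and C<->G, then reverse the column order."""
    # The row for base b in P(RC) is the row for complement(b) in P,
    # read from right to left.
    return {b: list(reversed(P[COMPLEMENT[b]])) for b in P}


# The example PPM P from the five sequences ATC, TTA, AAC, TTC, AGC.
P = {
    "A": [F(3, 5), F(1, 5), F(1, 5)],
    "T": [F(2, 5), F(3, 5), F(0)],
    "C": [F(0), F(0), F(4, 5)],
    "G": [F(0), F(1, 5), F(0)],
}
P_rc = reverse_complement_ppm(P)
for base in "ATCG":
    print(base, [str(p) for p in P_rc[base]])
```

Applying the function twice returns the original matrix, which is the "and vice versa" part of the note.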

slide-25
SLIDE 25

Palindromic consensus sequences for HTLV-1 and HIV-1 target integration sites

From 4,521 HTLV-1 target integration sites, we find the consensus:

AAGTGGATATCCACTT
TTCACCTATAGGTGAA   (complementary strand)

From 13,442 HIV-1 target integration sites, we find the consensus:

TTTGGTAACCAAA

[Figure: the HIV-1 consensus shown against its opposite strand, with the matching blocks TTT, GG, T, A, CC, AAA highlighted.]

The target integration sites are palindromic (as already known!)
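
"Palindromic" here means that a sequence equals its own reverse complement. A minimal check on the two consensus sequences, where the helper name `reverse_complement` is an illustrative choice:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


htlv1 = "AAGTGGATATCCACTT"
hiv1 = "TTTGGTAACCAAA"

print(reverse_complement(htlv1) == htlv1)

# Positions (0-based) at which the HIV-1 consensus differs from its
# reverse complement.
mismatches = [i for i, (a, b) in enumerate(zip(hiv1, reverse_complement(hiv1)))
              if a != b]
print(mismatches)
```

The HTLV-1 consensus is an exact reverse-complement palindrome. The HIV-1 consensus has odd length, so its central base can never match its own complement; it is palindromic everywhere else.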

slide-26
SLIDE 26

Palindromic PPMs for HTLV-1 and HIV-1 target integration sites

For both HTLV-1 and HIV-1, we have P(RC) ≈ P

[Figure, two panels (HTLV-1 and HIV-1): entries of the reverse-complement PPM, P(RC), plotted against the entries of the PPM, P (both axes from 0.1 to 0.6), together with the line P(RC) = P and a 95% credible region.]

slide-27
SLIDE 27

Palindromic sequence logos

[Figure, two panels: sequence logos (y-axis: bits, 0.0 to 0.4) of the target integration sites. HTLV-1: positions -13 to 13. HIV-1: positions -12 to 12.]

slide-30
SLIDE 30

An attack of aibohphobia

  • There is an almost unbelievable amount of symmetry (!)
  • Is this “real”? Do we see evidence of the symmetry within individual sequences, or just at the level of these summaries?
  • We introduce a palindrome index to quantify “how palindromic” each sequence is

slide-34
SLIDE 34

The palindrome index

AAGTGGATATCCACTT

$$S = s_{-8}\, s_{-7}\, s_{-6}\, s_{-5}\, s_{-4}\, s_{-3}\, s_{-2}\, s_{-1}\; s_{1}\, s_{2}\, s_{3}\, s_{4}\, s_{5}\, s_{6}\, s_{7}\, s_{8}$$

Define

$$\rho(S) = \frac{1}{n} \sum_{i=1}^{n} I\big(s_{i} = c(s_{-i})\big),$$

where 2n is the sequence length, I is the indicator function, and c(x) is the complement of x (e.g. c(T) = A).

(In practice, we use an “adjusted for chance” version, which is maximally 1, and is 0 if S is no more palindromic than expected by chance.)
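
The definition of ρ(S) translates directly into code. In the sketch below, `palindrome_index` follows the formula above. The slides do not give the chance adjustment, so `adjusted_index` is an assumption: a kappa-style rescaling with a hypothetical expected match rate of 0.25 (uniform, independent bases). It is not necessarily the authors' adjustment.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def palindrome_index(seq):
    """rho(S): fraction of mirrored pairs (s_i, s_-i) that are complementary."""
    if len(seq) % 2:
        raise ValueError("expects an even-length sequence (2n bases)")
    n = len(seq) // 2
    # With 0-based indexing, s_i is seq[n + i - 1] and s_-i is seq[n - i].
    matches = sum(seq[n + i - 1] == COMPLEMENT[seq[n - i]]
                  for i in range(1, n + 1))
    return matches / n


def adjusted_index(rho, expected=0.25):
    """ASSUMED chance adjustment: 1 at rho = 1, 0 at the expected match rate."""
    return (rho - expected) / (1 - expected)


print(palindrome_index("AAGTGGATATCCACTT"))
print(adjusted_index(palindrome_index("AAGTGGATATCCACTT")))
```

For the HTLV-1 consensus every mirrored pair is complementary, so both the raw and the adjusted index equal 1.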

slide-35
SLIDE 35

Observed palindrome indices

[Figure, two panels (HTLV-1 and HIV-1): histograms of the Adjusted Palindrome Index (API) of the individual sequences (x-axis: API, from about -0.39 to about 0.88; y-axis: frequency), with the API for the consensus marked in each panel. The values 0.88 and 0.87 are printed on the slide, apparently for the HTLV-1 and HIV-1 panels respectively.]

slide-37
SLIDE 37

Where do the palindromes come from?

  • The individual sequences are not palindromic
  • So why do we see palindromes when we average over a large number of sequences?

slide-39
SLIDE 39

Where do the palindromes come from?

  • One possible explanation is that we have a mix of “forward” and “reverse complement” sequence orientations, e.g. in the noiseless case

Sequence 1: AATTTAAGTGGAT (Forward)
Sequence 2: ATCCACTTAAATT (Reverse complement)
Sequence 3: ATCCACTTAAATT (Reverse complement)
Sequence 4: AATTTAAGTGGAT (Forward)
Sequence 5: ATCCACTTAAATT (Reverse complement)
Sequence 6: AATTTAAGTGGAT (Forward)

$$
P = \begin{array}{c|ccccccccccccc}
A & 1 & 0.5 & 0   & 0   & 0.5 & 0.5 & 0.5 & 0   & 0.5 & 0.5 & 0.5 & 0.5 & 0 \\
T & 0 & 0.5 & 0.5 & 0.5 & 0.5 & 0   & 0.5 & 0.5 & 0.5 & 0   & 0   & 0.5 & 1 \\
C & 0 & 0   & 0.5 & 0.5 & 0   & 0.5 & 0   & 0   & 0   & 0   & 0   & 0   & 0 \\
G & 0 & 0   & 0   & 0   & 0   & 0   & 0   & 0.5 & 0   & 0.5 & 0.5 & 0   & 0
\end{array}
= P^{(RC)}
$$
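
The noiseless example can be verified numerically: an equal mix of a non-palindromic sequence and its reverse complement gives a PPM that equals its own reverse-complement PPM. This is an illustrative sketch; the helper names are not from the talk.

```python
from collections import Counter
from fractions import Fraction

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def ppm(seqs):
    """One {base: probability} dict per position, zeros included."""
    n = len(seqs)
    return [{b: Fraction(Counter(col)[b], n) for b in "ACGT"}
            for col in zip(*seqs)]


forward = "AATTTAAGTGGAT"
# Three copies in each orientation, as in the six-sequence example.
seqs = [forward, reverse_complement(forward)] * 3

P = ppm(seqs)
P_rc = ppm([reverse_complement(s) for s in seqs])

print(reverse_complement(forward) == forward)  # is the motif itself palindromic?
print(P == P_rc)                               # is the averaged summary palindromic?
```

The first comparison is false and the second is true: no individual sequence is a palindrome, yet the summary is perfectly symmetric.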

slide-41
SLIDE 41

Analogy

If we have a sample of many real numbers, and we take their mean and find it to be exactly zero, one possibility is that this mean is representative of the sample:

[Figure]

Another possibility is that we have 2 symmetric components, one positive and one negative:

[Figure]

slide-44
SLIDE 44

Mixture modelling

  • We model the sequences as coming from two populations
      ◮ one with PPM P; and
      ◮ one with reverse complement PPM P(RC).

π(S) = ωπ(S|P) + (1 − ω)π(S|P(RC)).

  • Here, ω is the proportion of sequences coming from the population with PPM P.
  • The parameters, ω and P, can be estimated/inferred in numerous ways. I will show results from using an EM algorithm, but identical results are obtained by: (i) maximum profile likelihood; (ii) Gibbs sampling; (iii) greedy Gibbs.
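
One way such an EM algorithm could look is sketched below. It is not the authors' Matlab implementation: the pseudocount, the symmetry-breaking initialisation (treating the first sequence as "forward") and all function names are assumptions made for illustration. It relies on the identity π(S|P(RC)) = π(rc(S)|P), where rc(S) is the reverse complement of S.

```python
import math

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def weighted_ppm(seqs, resp, pseudo=0.01):
    """M-step for P: each sequence counts as itself with weight r and
    as its reverse complement with weight 1 - r (pseudocount is assumed)."""
    ppm = []
    for j in range(len(seqs[0])):
        counts = dict.fromkeys("ACGT", pseudo)
        for s, r in zip(seqs, resp):
            counts[s[j]] += r
            counts[reverse_complement(s)[j]] += 1.0 - r
        total = sum(counts.values())
        ppm.append({b: c / total for b, c in counts.items()})
    return ppm


def log_lik(seq, ppm):
    """log pi(S | P) under independent positions."""
    return sum(math.log(col[b]) for col, b in zip(ppm, seq))


def em(seqs, n_iter=50):
    # Symmetry breaking (assumption): start by calling sequence 1 "forward".
    resp = [1.0] + [0.5] * (len(seqs) - 1)
    omega = 0.5
    for _ in range(n_iter):
        ppm = weighted_ppm(seqs, resp)
        w = min(max(omega, 1e-9), 1 - 1e-9)  # keep the logarithms finite
        resp = []
        for s in seqs:
            # E-step: posterior probability that s came from the P population.
            diff = (math.log(1 - w) + log_lik(reverse_complement(s), ppm)
                    - math.log(w) - log_lik(s, ppm))
            resp.append(1.0 / (1.0 + math.exp(min(diff, 700.0))))
        omega = sum(resp) / len(resp)  # M-step for omega
    return omega, ppm, resp


seqs = ["AATTTAAGTGGAT", "ATCCACTTAAATT"] * 3
omega, ppm, resp = em(seqs)
motif = "".join(max(col, key=col.get) for col in ppm)
print(round(omega, 3), motif)
```

The model is only identifiable up to orientation: (ω, P) and (1 − ω, P(RC)) give the same likelihood, so which strand is reported as "forward" depends on the initialisation. A fully symmetric start (all responsibilities 0.5) would stay at the palindromic solution, which is why the sketch breaks the symmetry explicitly.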

slide-45
SLIDE 45

Unmixing the forward and reverse sequences

[Figure: sequence logos (y-axis: bits) of the two inferred subpopulations, Subpopulation 1 and Subpopulation 2, for HIV-1 (13442 seqs) and HTLV-1 (4521 seqs). The label “0.5 0.5” appears above the panels, presumably the estimated mixture proportions.]

slide-47
SLIDE 47

Unmixing the forward and reverse sequences

[Figure, four panels: sequence logos (y-axis: bits, 0.0 to 0.5) of the unmixed target-site motif for HIV-1, HTLV-1, ASLV and MLV.]

slide-52
SLIDE 52

Summary

  • The palindrome is not observed within individual sequences.
  • Hypothesis: the palindrome results from a mixture of sequences that contain a non-palindromic motif in approximately equal proportions in “forward” and “reverse complement” orientations
  • Modelling this hypothesis revealed a common nucleotide motif across 4 retroviruses: 5’-T(N1/2)[C(N0/1)T|(W1/2)C]CW-3’
  • Potential implications for understanding retroviral integration.
  • True validation requires further structural information about retroviral intasomes.
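
For scanning sequences, the motif can be written as a regular expression. The translation below rests on an assumed reading of the notation: N is any base, W is A or T (the IUPAC weak bases), and a subscript such as 1/2 means "one or two" repeats. The names `MOTIF` and `find_motif` are illustrative only.

```python
import re

# 5'-T(N1/2)[C(N0/1)T|(W1/2)C]CW-3' under the assumed reading:
#   T, then 1-2 of any base, then either C + 0-1 of any base + T
#   or 1-2 of A/T + C, then C, then A or T.
MOTIF = re.compile(r"T[ACGT]{1,2}(?:C[ACGT]?T|[AT]{1,2}C)C[AT]")


def find_motif(seq):
    """Start positions (0-based) of non-overlapping motif matches in seq."""
    return [m.start() for m in MOTIF.finditer(seq)]


print(find_motif("GGTACTCAGG"))
```

This scans one strand only; to find the motif in the “reverse complement” orientation, the reverse complement of the sequence would have to be scanned as well.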

slide-53
SLIDE 53

Availability

  • Accepted for publication in Nature Microbiology.
  • Preprint:
      ◮ Kirk, Huvet, Melamed, Maertens & Bangham (2015). Retroviruses integrate into a shared, non-palindromic motif. bioRxiv.
  • Matlab code (and the HTLV-1 dataset) are available online: http://www.mrc-bsu.cam.ac.uk/software/bioinformatics-and-statistical-genomics/ Just click on retroCode to download!

slide-54
SLIDE 54

Acknowledgements

Charles Bangham
Maxime Huvet
Anat Melamed
Goedele Maertens
Sylvia Richardson
MRC Biostatistics Unit
Michael Stumpf
Imperial College Theoretical Systems Biology group

slide-55
SLIDE 55

Thanks for listening!

@pauldwkirk http://www.mrc-bsu.cam.ac.uk/people/paul-kirk/
