CSE182-L14 Population Genetics: Basics, Population Structure (PowerPoint presentation)



SLIDE 1

CSE182-L14

Population Genetics: Basics

SLIDE 2

Population Structure

  • 377 locations (loci) were sampled in 1000 people from 52 populations.
  • 6 genetic clusters were obtained, 5 of which corresponded to major geographic regions (Rosenberg et al., Science 2002)

(Figure: cluster assignments across the regions Africa, Eurasia, East Asia, America, Oceania)

SLIDE 3

Population Genetics

  • What is it about our genetic makeup that makes us measurably different?
  • These genetic differences are correlated with phenotypic differences.
  • As the cost of sequencing and genotyping falls, we will know the sequences of entire populations of individuals.
  • Here, we study the basics of this polymorphism data and the tools being developed to analyze it.

SLIDE 4

What causes variation in a population?

  • Mutations (may lead to SNPs)
  • Recombinations
  • Other genetic events (may lead to microsatellite repeats)

SLIDE 5

Single Nucleotide Polymorphisms

00000101011
10001101001
01000101010
01000000011
00011110000
00101100110

Infinite Sites Assumption: Each site mutates at most once.
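As a small illustration, the sketch below treats each string on this slide as one sequence and each character position as a SNP site, and counts how many sequences carry a 1 at every site. Reading 0 as the ancestral and 1 as the mutated value is an assumption made for illustration; the function names are hypothetical.

```python
from typing import List

# The six sequences from the slide; each character position is one SNP site.
SNPS: List[str] = [
    "00000101011",
    "10001101001",
    "01000101010",
    "01000000011",
    "00011110000",
    "00101100110",
]


def ones_per_site(rows: List[str]) -> List[int]:
    """For each site (column), count the sequences that carry a '1'."""
    n_sites = len(rows[0])
    return [sum(row[j] == "1" for row in rows) for j in range(n_sites)]


def minor_allele_counts(rows: List[str]) -> List[int]:
    """Count of the rarer value at each site, whichever of 0/1 it is."""
    n = len(rows)
    return [min(c, n - c) for c in ones_per_site(rows)]


if __name__ == "__main__":
    print(ones_per_site(SNPS))        # [1, 2, 1, 1, 3, 5, 1, 3, 1, 4, 3]
    print(minor_allele_counts(SNPS))  # [1, 2, 1, 1, 3, 1, 1, 3, 1, 2, 3]
```

Under the infinite sites assumption, each column with at least one 1 records exactly one mutation event in the history of these sequences.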
SLIDE 6

Short Tandem Repeats

Sequence                    Repeats
GCTAGATCATCATCATCATTGCTAG   4
GCTAGATCATCATCATTGCTAGTTA   3
GCTAGATCATCATCATCATCATTGC   5
GCTAGATCATCATCATTGCTAGTTA   3
GCTAGATCATCATCATTGCTAGTTA   3
GCTAGATCATCATCATCATCATTGC   5
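The repeat counts above can be recovered by scanning each sequence for the longest run of back-to-back copies of the repeat unit. This is a minimal sketch; taking "CAT" as the repeat unit is an assumption read off these sequences, and `max_tandem_copies` is a hypothetical helper name.

```python
from typing import List


def max_tandem_copies(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found anywhere in `seq`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        # Extend the run while the next k characters are another copy of unit.
        while seq.startswith(unit, pos):
            count += 1
            pos += k
        best = max(best, count)
    return best


# The six sequences from the slide (repeat unit assumed to be "CAT").
READS: List[str] = [
    "GCTAGATCATCATCATCATTGCTAG",
    "GCTAGATCATCATCATTGCTAGTTA",
    "GCTAGATCATCATCATCATCATTGC",
    "GCTAGATCATCATCATTGCTAGTTA",
    "GCTAGATCATCATCATTGCTAGTTA",
    "GCTAGATCATCATCATCATCATTGC",
]

if __name__ == "__main__":
    print([max_tandem_copies(r, "CAT") for r in READS])  # [4, 3, 5, 3, 3, 5]
```

The brute-force scan is quadratic in the sequence length, which is fine for short reads like these.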

SLIDE 7

STRs can be used as a DNA fingerprint

  • Consider a collection of regions with variable-length repeats.
  • Variable-length repeats lead to variable-length DNA.
  • The vector of lengths is a fingerprint.

(Figure: repeat lengths of individuals at each position; values shown: 4 2 3 3 5 1 3 2 3 1 5 3)
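To make the fingerprint idea concrete, the sketch below stores each individual's repeat lengths as a tuple over a fixed, ordered set of STR regions and ranks database entries by how many regions differ from a query. The one-length-per-region representation follows the slide; the database contents and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# A fingerprint: one repeat length per STR region, in a fixed region order.
Fingerprint = Tuple[int, ...]


def mismatches(a: Fingerprint, b: Fingerprint) -> int:
    """Number of regions whose repeat lengths differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must cover the same regions")
    return sum(x != y for x, y in zip(a, b))


def rank_matches(sample: Fingerprint,
                 database: Dict[str, Fingerprint]) -> List[Tuple[str, int]]:
    """Database entries sorted by mismatch count (ties broken by name)."""
    scored = [(name, mismatches(sample, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical repeat-length vectors over four STR regions.
    db = {"ind1": (4, 3, 5, 3), "ind2": (3, 3, 5, 5), "ind3": (4, 2, 5, 3)}
    print(rank_matches((4, 3, 5, 3), db))
    # [('ind1', 0), ('ind3', 1), ('ind2', 2)]
```

An exact match has zero mismatches; the more regions are typed, the less likely two unrelated individuals share the whole vector.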

SLIDE 8

Recombination

00000000
11111111
00011111
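The third sequence on this slide can be read as a recombinant whose first three sites come from the first sequence and whose remaining sites come from the second. A minimal sketch of a single crossover, assuming one breakpoint at index `k` (a hypothetical parameter), reproduces it.

```python
def single_crossover(parent1: str, parent2: str, k: int) -> str:
    """Recombinant taking sites [0, k) from parent1 and sites [k, n) from parent2."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    if not 0 <= k <= len(parent1):
        raise ValueError("crossover point out of range")
    return parent1[:k] + parent2[k:]


if __name__ == "__main__":
    print(single_crossover("00000000", "11111111", 3))  # 00011111
```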

SLIDE 9

What if there were no recombinations?

  • Life would be simpler
  • Each sequence would have a single parent
  • The relationships would be expressed as a tree.
SLIDE 10

The Infinite Sites Assumption

(Figure: tree relating four 8-site sequences, with mutations on its edges at positions 3, 8 and 5)

00000000
00100000
00100001
00101000

  • The different sites are linked. A 1 in position 8 implies a 0 in position 5, and vice versa.

  • Some phenotypes could be linked to the polymorphisms
  • Some of the linkage is “destroyed” by recombination
SLIDE 11

Infinite sites assumption and Perfect Phylogeny

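  • Each site is mutated at most once in the history.
  • All descendants of the edge carrying the mutation at site i must carry the mutated value, and all others must carry the ancestral value.

(Figure: an edge labeled i; leaves below it have 1 in position i, all other leaves have 0 in position i)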

SLIDE 12

Perfect Phylogeny

  • Assume an evolutionary model in which no recombination takes place, only mutation.
  • The evolutionary history is explained by a tree in which every mutation is on an edge of the tree. Removing that edge splits the species into two sub-trees: all the species in one contain a 0 at the mutated site, and all species in the other contain a 1. Such a tree is called a perfect phylogeny.

  • How can one reconstruct such a tree?
SLIDE 13

The 4-gamete condition

  • A column i partitions the set of species into two sets, i0 and i1.
  • A column is homogeneous w.r.t. a set of species if it has the same value for all species in the set. Otherwise, it is heterogeneous.
  • EX: i is heterogeneous w.r.t. {A,D,E}

   i
A  0
B  0
C  0
D  1
E  1
F  1

i0 = {A,B,C}, i1 = {D,E,F}
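The two definitions on this slide translate directly into set operations. This is a minimal sketch that represents one column as a mapping from species name to 0/1; `partition` and `is_homogeneous` are hypothetical names.

```python
from typing import Dict, Iterable, Set, Tuple


def partition(column: Dict[str, int]) -> Tuple[Set[str], Set[str]]:
    """Split species into i0 (value 0) and i1 (value 1) for one column."""
    i0 = {s for s, v in column.items() if v == 0}
    i1 = {s for s, v in column.items() if v == 1}
    return i0, i1


def is_homogeneous(column: Dict[str, int], species: Iterable[str]) -> bool:
    """True if the column takes the same value on every species in the set."""
    return len({column[s] for s in species}) <= 1


if __name__ == "__main__":
    col_i = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
    print(partition(col_i))                        # i0 = {A, B, C}, i1 = {D, E, F}
    print(is_homogeneous(col_i, {"A", "D", "E"}))  # False: heterogeneous
```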

SLIDE 14

4 Gamete Condition

  • 4 Gamete Condition

– There exists a perfect phylogeny if and only if for all pairs of columns (i,j), either j is homogeneous w.r.t. i0 or j is homogeneous w.r.t. i1.

– Equivalently, there exists a perfect phylogeny if and only if for all pairs of columns (i,j), the following 4 rows do not all appear:

(0,0), (0,1), (1,0), (1,1)
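The second form of the condition gives a direct test: for every pair of columns, collect the distinct row patterns and look for all four. The sketch below assumes binary strings (one per species) and uses 0-based column indices, whereas the slides number columns from 1; the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Tuple


def four_gamete_violation(rows: List[str]) -> Optional[Tuple[int, int]]:
    """Return the first pair of (0-based) columns showing all four gametes, or None."""
    n_cols = len(rows[0])
    for i, j in combinations(range(n_cols), 2):
        gametes = {(row[i], row[j]) for row in rows}
        if len(gametes) == 4:  # (0,0), (0,1), (1,0) and (1,1) all present
            return (i, j)
    return None


def has_perfect_phylogeny(rows: List[str]) -> bool:
    return four_gamete_violation(rows) is None


if __name__ == "__main__":
    example = ["11000", "00100", "11010", "00101", "10000"]  # rows A..E of the later example
    print(has_perfect_phylogeny(example))                 # True
    print(four_gamete_violation(["00", "01", "10", "11"]))  # (0, 1)
```

Checking all pairs costs O(m^2 n) for n species and m columns.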

SLIDE 15

4-gamete condition: proof

  • Depending on which edge mutation j occurs on, column j must be homogeneous w.r.t. either i0 or i1.
  • (only if) Every perfect phylogeny satisfies the 4-gamete condition.
  • (if) If the 4-gamete condition is satisfied, does a perfect phylogeny exist?

(Figure: tree in which edge i separates i0 from i1)

SLIDE 16

An algorithm for constructing a perfect phylogeny

  • We will consider the case where 0 is the ancestral state and 1 is the mutated state. This restriction is removed later.
  • In any tree, each node (except the root) has a single parent.

– It is sufficient to construct a parent for every node.

  • In each step, we add a column and refine some of the nodes containing multiple children.
  • Stop when all columns have been considered.
SLIDE 17

Inclusion Property

  • For any pair of columns i,j

– i < j if and only if i1 ⊇ j1

  • Note that if i < j, then the edge containing i is an ancestor of the edge containing j.

(Figure: edge i above edge j)
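The inclusion order can be checked with a superset test on the sets i1. This is a minimal sketch using the matrix from the example later in the lecture, with columns 1-5 stored at indices 0-4; `ones` and `precedes` are hypothetical names.

```python
from typing import Dict, Set

# Rows of the example matrix (species -> values in columns 1..5).
M: Dict[str, str] = {"A": "11000", "B": "00100", "C": "11010", "D": "00101", "E": "10000"}


def ones(rows: Dict[str, str], col: int) -> Set[str]:
    """The set i1: species with a 1 in column `col`."""
    return {name for name, row in rows.items() if row[col] == "1"}


def precedes(rows: Dict[str, str], i: int, j: int) -> bool:
    """Inclusion order from the slide: i < j iff i1 is a superset of j1."""
    return ones(rows, i) >= ones(rows, j)


if __name__ == "__main__":
    print(precedes(M, 0, 1), precedes(M, 1, 0))  # True False   ({A,C,E} vs {A,C})
    print(precedes(M, 1, 2), precedes(M, 2, 1))  # False False  (disjoint clades)
```

Pairs that are disjoint are unordered; in a perfect phylogeny their edges lie on different branches.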

SLIDE 18

Example

   1 2 3 4 5
A  1 1 0 0 0
B  0 0 1 0 0
C  1 1 0 1 0
D  0 0 1 0 1
E  1 0 0 0 0

(Figure: a single clade r with children A, B, C, D, E)

Initially, there is a single clade r, and each node has r as its parent.

SLIDE 19

Sort columns

  • Sort columns according to the inclusion property (note that the columns are already sorted here).
  • This can be achieved by treating each column as the binary representation of a number (most significant bit in row 1) and sorting in decreasing order.

   1 2 3 4 5
A  1 1 0 0 0
B  0 0 1 0 0
C  1 1 0 1 0
D  0 0 1 0 1
E  1 0 0 0 0
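The binary-number trick can be sketched as follows; function names are hypothetical. If i1 strictly contains j1, column i has every 1-bit of column j plus at least one more, so its value is strictly larger and it sorts first.

```python
from typing import List


def column_value(rows: List[str], col: int) -> int:
    """Read a column top-to-bottom as a binary number (row 1 = most significant bit)."""
    return int("".join(row[col] for row in rows), 2)


def sort_columns(rows: List[str]) -> List[int]:
    """Column indices in decreasing binary value, so supersets come before subsets."""
    return sorted(range(len(rows[0])), key=lambda c: column_value(rows, c), reverse=True)


if __name__ == "__main__":
    example = ["11000", "00100", "11010", "00101", "10000"]  # rows A..E
    print([column_value(example, c) for c in range(5)])  # [21, 20, 10, 4, 2]
    print(sort_columns(example))                         # [0, 1, 2, 3, 4]
```

Disjoint columns may end up in either relative order, which does not affect the construction. Python's `sorted` is stable, so identical columns keep their original order.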

SLIDE 20

Add first column

  • In adding column i

– Check each edge and decide which side of it each species belongs to.
– Finally, add a node if you can resolve a clade.

(Figure: tree rooted at r with a new node u; leaves A, B, C, D, E)

   1 2 3 4 5
A  1 1 0 0 0
B  0 0 1 0 0
C  1 1 0 1 0
D  0 0 1 0 1
E  1 0 0 0 0

SLIDE 21

Adding other columns

  • Add other columns on edges using the ordering property

(Figure: final tree rooted at r with leaves E, B, C, D, A and edges labeled by columns 1, 2, 4, 3, 5)

   1 2 3 4 5
A  1 1 0 0 0
B  0 0 1 0 0
C  1 1 0 1 0
D  0 0 1 0 1
E  1 0 0 0 0
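The column-by-column refinement can be sketched with parent pointers, in the spirit of the algorithm slide. This is a minimal sketch assuming 0 is ancestral and that species names never collide with the generated internal-node names (`u` plus the column number, a hypothetical naming scheme). Each column, in sorted order, creates one new node; all species with a 1 in that column must currently share a parent, otherwise the column conflicts with an earlier one and no rooted perfect phylogeny exists.

```python
from typing import Dict, List

# Example matrix from the slides (species -> values in columns 1..5).
EXAMPLE: Dict[str, str] = {
    "A": "11000",
    "B": "00100",
    "C": "11010",
    "D": "00101",
    "E": "10000",
}


def build_perfect_phylogeny(rows: Dict[str, str], root: str = "r") -> Dict[str, str]:
    """Return a parent map (child -> parent); raise ValueError on a conflict."""
    names: List[str] = list(rows)
    n_cols = len(rows[names[0]])

    def value(c: int) -> int:
        # Column read as a binary number, first species = most significant bit.
        return int("".join(rows[s][c] for s in names), 2)

    parent: Dict[str, str] = {s: root for s in names}  # initially a single clade r
    for c in sorted(range(n_cols), key=value, reverse=True):
        clade = [s for s in names if rows[s][c] == "1"]
        if not clade:
            continue  # all-zero column: no mutation to place
        current = {parent[s] for s in clade}
        if len(current) != 1:
            raise ValueError(f"column {c + 1} conflicts with an earlier column")
        node = f"u{c + 1}"  # hypothetical name for the node below mutation c + 1
        parent[node] = current.pop()
        for s in clade:
            parent[s] = node
    return parent


if __name__ == "__main__":
    for child, par in sorted(build_perfect_phylogeny(EXAMPLE).items()):
        print(child, "->", par)
```

On the example, the edges above u1, u2, u3, u4, u5 carry mutations 1, 2, 3, 4, 5, with u1 and u3 hanging off r, u2 under u1, u4 under u2 and u5 under u3.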

SLIDE 22

Unrooted case

  • In each column, switch 0s and 1s where needed, so that 0 is the majority element.

  • Apply the algorithm for the rooted case
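The preprocessing step for the unrooted case can be sketched as below. Leaving columns with an exact 50/50 split unchanged is an assumption (the slide does not say how ties are handled), and the function names are hypothetical.

```python
from typing import List


def flip(bit: str) -> str:
    return "1" if bit == "0" else "0"


def make_zero_majority(rows: List[str]) -> List[str]:
    """Flip every column in which 1 is the strict majority, so 0 becomes the majority value."""
    n = len(rows)
    n_cols = len(rows[0])
    flip_col = [2 * sum(row[c] == "1" for row in rows) > n for c in range(n_cols)]
    return ["".join(flip(ch) if flip_col[c] else ch for c, ch in enumerate(row)) for row in rows]


if __name__ == "__main__":
    print(make_zero_majority(["10", "11", "01"]))  # ['01', '00', '10']
```

The flipped matrix can then be passed to the rooted construction unchanged.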
SLIDE 23

Handling recombination

  • A tree is not sufficient, as a sequence may have 2 parents
  • Recombination leads to loss of correlation between columns
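A small, hypothetical population illustrates why a tree is no longer enough. Three sequences that pass the 4-gamete test gain a single-crossover recombinant of "10" and "01", which supplies the missing fourth gamete (1,1). The function name is hypothetical.

```python
from itertools import combinations
from typing import List


def shows_four_gametes(rows: List[str]) -> bool:
    """True if some pair of columns contains all of (0,0), (0,1), (1,0), (1,1)."""
    return any(len({(r[i], r[j]) for r in rows}) == 4
               for i, j in combinations(range(len(rows[0])), 2))


if __name__ == "__main__":
    population = ["00", "01", "10"]                 # consistent with a tree
    child = population[2][:1] + population[1][1:]  # crossover of "10" and "01" after site 1 -> "11"
    print(shows_four_gametes(population))            # False
    print(shows_four_gametes(population + [child]))  # True
```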

SLIDE 24

Linkage (Dis)-equilibrium (LD)

  • Consider sites A & B
  • Case 1: No recombination

– Pr[(A,B) = (0,1)] = 0.25

  • Linkage disequilibrium
  • Case 2: Extensive recombination

– Pr[(A,B) = (0,1)] = 0.125

  • Linkage equilibrium

(Figure: sequences shown at sites A and B)
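Linkage equilibrium means that each joint probability equals the product of the single-site probabilities, Pr[(A,B) = (a,b)] = Pr[A = a] · Pr[B = b]. The sketch below checks this on a sample of two-site haplotypes, using exact fractions to avoid floating-point comparisons. The two samples are hypothetical and are not the sequences in the slide's figure; the function names are also hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple

Haplotype = Tuple[int, int]  # alleles at sites A and B


def joint_and_product(haps: List[Haplotype], a: int, b: int) -> Tuple[Fraction, Fraction]:
    """Return (Pr[(A,B) = (a,b)], Pr[A = a] * Pr[B = b]) estimated from the sample."""
    n = len(haps)
    joint = Fraction(sum(h == (a, b) for h in haps), n)
    p_a = Fraction(sum(h[0] == a for h in haps), n)
    p_b = Fraction(sum(h[1] == b for h in haps), n)
    return joint, p_a * p_b


def in_linkage_equilibrium(haps: List[Haplotype]) -> bool:
    """True if every joint frequency equals the product of its marginals."""
    for a in (0, 1):
        for b in (0, 1):
            joint, product = joint_and_product(haps, a, b)
            if joint != product:
                return False
    return True


if __name__ == "__main__":
    mixed = [(0, 0), (0, 1), (1, 0), (1, 1)]   # hypothetical: all combinations equally common
    linked = [(0, 0), (0, 0), (1, 1), (1, 1)]  # hypothetical: alleles always travel together
    print(in_linkage_equilibrium(mixed))    # True
    print(in_linkage_equilibrium(linked))   # False
    print(joint_and_product(linked, 0, 1))  # (Fraction(0, 1), Fraction(1, 4))
```

In the linked sample the combination (0,1) never occurs even though each allele is common, which is the kind of correlation between columns that recombination gradually erodes.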

SLIDE 25