Recent adaptive selection in Tibet and Greenland, Anders Albrechtsen



slide-1
SLIDE 1

Recent adaptive selection in Tibet and Greenland

Anders Albrechtsen

The Bioinformatics Centre, Copenhagen University

February 9, 2017

slide-2
SLIDE 2

Signatures of recent/ongoing selection Tibet Greenland SFS for NGS data

Outline

  1. Signatures of recent/ongoing selection
  2. Tibet
       • Introduction
  3. Greenland
       • Intro
  4. SFS for NGS data
       • Bias for low/medium depth sequencing data
       • Genotype likelihood based SFS

slide-3
SLIDE 3

slide-4
SLIDE 4

slide-5
SLIDE 5

Probability of fixation

slide-6
SLIDE 6

slide-7
SLIDE 7

slide-8
SLIDE 8

Altitude adaptation in Tibet

slide-9
SLIDE 9

Altitude adaptation in Tibet

Yi et al. 2010

  • Low oxygen has a large effect on fitness
  • People living at high altitude generally have more birth defects
slide-10
SLIDE 10

Oxygen and height

slide-11
SLIDE 11

Altitude adaptation in Tibet

Yi et al. 2010

  • The full exomes of 50 Tibetan individuals were sequenced at an average coverage of 18X.
  • Compared to 40 Han Chinese individuals sequenced at an average of 6X (1000G).
  • Joint allele frequencies were estimated for each SNP using a Bayesian approach.

slide-12
SLIDE 12

Target Region

Target region: 34,108,810 bases long (non-redundant)

Genome elements   Number of elements   Total length
Promoter          1,917                490,995
5'-UTR            17,046               672,142
CDS               209,095              32,584,741
3'-UTR            6,524                401,002
Intron            55,372               4,349,839

In target regions we used

  • reads that mapped uniquely without multiple best hits
  • bases with a quality score of at least Q20
slide-13
SLIDE 13

PPARG

[Figure: UCSC Genome Browser view of PPARG on chr3, positions 12,360,000 to 12,470,000 (50 kb scale bar). Tracks: target region, total depth summed over all individuals (0 to 50,000), UCSC Genes]

PPARG - zoom

[Figure: UCSC Genome Browser view of PPARG on chr3, positions 12,396,000 to 12,398,000 (1 kb scale bar). Tracks: target region, total depth summed over all individuals (0 to 50,000), UCSC Genes]

slide-14
SLIDE 14

2D site frequency spectrum

slide-15
SLIDE 15

Population Branch Statistic (PBS)

$$\mathrm{PBS} = \mathrm{TBS} = \frac{T^{TH} + T^{TD} - T^{HD}}{2}, \qquad T^{AB} = -\log\left(1 - F_{ST}^{AB}\right)$$
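As a minimal sketch of the statistic above, the following Python computes PBS from three pairwise Fst values. The function names and the example Fst values are illustrative assumptions, not numbers from Yi et al. 2010. The superscripts T, H and D are assumed to stand for the Tibetan, Han and Danish samples that appear in the allele frequency table of this talk.

```python
import math

def branch_length(fst):
    """Transform a pairwise Fst into a divergence scale: T = -log(1 - Fst)."""
    return -math.log(1.0 - fst)

def pbs(fst_th, fst_td, fst_hd):
    """Population branch statistic for the T population.

    fst_th, fst_td, fst_hd: pairwise Fst for T vs H, T vs D and H vs D.
    """
    t_th = branch_length(fst_th)
    t_td = branch_length(fst_td)
    t_hd = branch_length(fst_hd)
    return (t_th + t_td - t_hd) / 2.0

# Hypothetical Fst values, chosen only to show the call.
example = pbs(fst_th=0.30, fst_td=0.40, fst_hd=0.15)
print(round(example, 3))
```

A large PBS means that the branch leading to the T population is long relative to the other two, which is the pattern expected at a locus under selection in that population only.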

slide-16
SLIDE 16

Population frequencies

EPAS1 SNP allele frequencies

Allele   Tibetan   Han      Danish
C        0.13      0.9125   1
G        0.87      0.0875   0

slide-17
SLIDE 17

EPAS1

  • a type of hypoxia-inducible factor
  • active under low oxygen
  • a variant of the gene confers increased athletic performance; called the "super athlete gene"

slide-18
SLIDE 18

Genotyping in 366 individuals

Independent genotyping

  • 366 Tibetans
  • Genotyped for the EPAS1 SNP
  • Phenotypes available

Associations within the Tibetan population

                           CC     CG      GG      p-value
N                          10     84      272
Hemoglobin concentration   178    178.9   167.5   0.0013
Erythrocyte counts         5.3    5.6     5.2     0.0015
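The genotype counts in the table allow a quick consistency check against the sequencing-based frequency estimate. The sketch below (function names are illustrative assumptions) computes the G allele frequency from the counts and the genotype counts expected under Hardy-Weinberg equilibrium. The resulting frequency is about 0.86, close to the 0.87 estimated from the exome data.

```python
def allele_frequency(n_cc, n_cg, n_gg):
    """Frequency of the G allele from genotype counts."""
    n = n_cc + n_cg + n_gg
    return (n_cg + 2 * n_gg) / (2 * n)

def hwe_expected(n, freq_g):
    """Expected CC, CG and GG counts under Hardy-Weinberg equilibrium."""
    q = freq_g
    p = 1.0 - q
    return (n * p * p, 2 * n * p * q, n * q * q)

# Genotype counts (CC, CG, GG) from the association table.
counts = (10, 84, 272)
freq_g = allele_frequency(*counts)
expected = hwe_expected(sum(counts), freq_g)

print("G allele frequency:", round(freq_g, 4))
print("expected CC, CG, GG:", [round(x, 1) for x in expected])
```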

slide-19
SLIDE 19

Is this extreme compared to other populations?

slide-20
SLIDE 20

Other genes with large FST

slide-21
SLIDE 21

EPAS1

slide-22
SLIDE 22

Haplotype is extremely different

slide-23
SLIDE 23

How did they adapt so fast?

slide-24
SLIDE 24

Adaptive introgression

slide-25
SLIDE 25

Conclusion

  • Tibetans have adapted to life at high altitude
  • A locus, EPAS1, was found that has undergone strong adaptive selection
  • The locus is associated with hemoglobin concentration and erythrocyte counts
  • The mutations were introduced by Denisovan introgression
  • First (and only) example of adaptive introgression in humans
slide-26
SLIDE 26

Human adaptation to the Arctic environment

slide-27
SLIDE 27

Brief overview of Greenland’s history

  • Inhabited on and off by different Arctic cultures for ∼4500 years
  • Visited by Vikings, Danish colony from 1814, now an autonomous country

slide-28
SLIDE 28

The modern Greenlandic population

  • Small: N≃57,000
  • Live in coastal towns
  • Descendants of Inuit
  • But most also have European ancestry
  • On average ∼25%

From Moltke et al. 2014, AJHG

slide-29
SLIDE 29

A mutation causes 15% of type 2 diabetes in Greenland¹

Very large, almost recessive effect

Recessive model

  • 2-h glucose: 3.8 mmol/l
  • T2D: OR = 10.3

Heritability

  • The variant explains 15% of all T2D in Greenland

¹ Moltke et al., Nature, 2014

slide-30
SLIDE 30

Life in the Arctic is extreme: cold temperatures & fat-rich diet

slide-31
SLIDE 31

Questions we recently tried to answer

Long-term history

  • Who are the ancestors of the Inuit and Greenlanders?

Recent history

  • How do modern Greenlanders relate to each other and to Europe?

Disease and selective pressure

  • Effect of being a small population: can we identify the genetic basis?

Adaptation

  • How did the Inuit adapt to the extreme environment?

slide-32
SLIDE 32

Effect of being a small and isolated population

Allele frequencies

Drift

  • By far the most important factor
  • Stronger effect in small populations

Selection

  • Important for alleles with phenotypic effect
  • For small populations only alleles under very strong selection will be significantly affected

Causal loci

  • Loci with a strong effect will be at very low frequency in large populations
  • Loci with a strong effect can have a large frequency in small populations

All loci

  • Allele frequencies will differ from all other populations
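The claim that drift acts more strongly in small populations can be illustrated with a short Wright-Fisher simulation. This is a minimal sketch under stated assumptions: diploid populations of constant size, no selection or migration, and population sizes, generation count and seed chosen only for illustration.

```python
import numpy as np

def simulate_drift(pop_size, start_freq, generations, replicates, rng):
    """Neutral Wright-Fisher drift in a diploid population.

    Returns the allele frequency of each replicate after the last generation.
    """
    n_chrom = 2 * pop_size
    freq = np.full(replicates, start_freq, dtype=float)
    for _ in range(generations):
        # Each generation resamples 2N chromosomes from the current frequency.
        freq = rng.binomial(n_chrom, freq) / n_chrom
    return freq

rng = np.random.default_rng(1)
small = simulate_drift(pop_size=50, start_freq=0.5, generations=20,
                       replicates=2000, rng=rng)
large = simulate_drift(pop_size=5000, start_freq=0.5, generations=20,
                       replicates=2000, rng=rng)

print("variance of final frequency, N = 50:  ", small.var())
print("variance of final frequency, N = 5000:", large.var())
```

The spread of final frequencies is much wider for the small population, which is why an allele with a strong effect can reach a high frequency there by chance.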

slide-33
SLIDE 33

Allele frequencies and population size

slide-34
SLIDE 34

Frequency spectrum of Inuit

slide-35
SLIDE 35

2D SFS between GL and Han

slide-36
SLIDE 36

Split time from East Asia

Analyses of the exome data using ∂a∂i

Tree based on Fst

slide-37
SLIDE 37

Recent changes in population size

slide-38
SLIDE 38

2D SFS and Fst

Fst from heterozygosity

$$F_{ST} = \frac{\sigma_B}{\sigma_T} = \frac{H_{total} - H_{subpopulations}}{H_{total}}$$
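The heterozygosity form of Fst can be worked through for a single biallelic SNP. The sketch below is an assumption-laden illustration: it treats the two populations as equally weighted, uses population allele frequencies directly, and applies no sample-size correction. The input frequencies are the EPAS1 C allele frequencies for Tibetans and Han given earlier in the talk.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) for a biallelic site."""
    return 2.0 * p * (1.0 - p)

def fst_from_heterozygosity(p1, p2):
    """Fst = (H_total - H_subpopulations) / H_total for two populations."""
    h_sub = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    h_total = heterozygosity((p1 + p2) / 2.0)
    if h_total == 0.0:
        # Site is monomorphic in the pooled sample: no differentiation.
        return 0.0
    return (h_total - h_sub) / h_total

# EPAS1 C allele: Tibetan 0.13, Han 0.9125.
fst_tibetan_han = fst_from_heterozygosity(0.13, 0.9125)
print(round(fst_tibetan_han, 3))
```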

slide-39
SLIDE 39

Selection scan using PBS - ((HAN, GR) CEU)

slide-40
SLIDE 40

Top loci

FADS

  • fatty acid desaturase

TBX15

  • TBX15 plays an important role in differentiation of brown (subcutaneous) adipocytes.
  • Upon stimulation by cold exposure, these can produce heat by lipid oxidation.

FN3KRP

  • an enzyme acting on fructosamines, psicosamines and ribulosamines that protects against nonenzymatic glycation.
  • FN3KRP can act to counteract the negative fitness effects caused by a PUFA-rich diet.

slide-41
SLIDE 41

Why selection?

  • Tested for association between top SNPs and metabolic traits
  • Marginally significant associations with multiple traits, including LDL
  • Selected alleles associated with decreased weight and height
slide-42
SLIDE 42

Why selection?

  • The association with height replicates in Europe:

[Figure: effect size (SD) in the cohorts ADDITION (N = 0), SDC (N = 1306) and Inter99 (N = 6116)]
slide-43
SLIDE 43

Why selection? Take 2

  • Testing for association with red blood cell membrane fatty acid composition:
  • Mutation seems to compensate for high-fat diet
  • Height due to effect of fatty acid composition on growth hormone levels?
  • Either way, the results suggest that selection in this region is a new example of human adaptation where we know the genetic basis

slide-44
SLIDE 44

Conclusion

  • We find multiple loci with recent adaptation to life in the Arctic
  • As expected, the genes are involved in polyunsaturated fatty acid metabolism and cold adaptation
  • Surprisingly, the loci also affect height and weight
  • Mutations also have an effect in Europe
slide-45
SLIDE 45

Exercises

slide-46
SLIDE 46

How are the SFS estimated?

Can we construct the SFS using NGS data?

Yes, but be careful

slide-47
SLIDE 47

When can calling SNPs and genotypes be a problem?

low/medium depth data

  • Capture data
  • low depth sequencing due to price
  • ancient DNA (only a finite amount of DNA)

What depth is high enough? Depends on the analysis

  • SFS is extremely sensitive to both genotype and SNP calling
  • admixture proportions are sensitive to genotype calling
  • ABBA-BABA (D-stats) can be used regardless of depth
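To see why genotype calling is unreliable at low depth, it helps to compute genotype likelihoods for one site at two depths. The sketch below assumes a deliberately simple model that is not taken from the talk: independent reads, a single fixed base error rate, and errors spread evenly over the three other bases. The normalization uses a flat prior and is only there to make the numbers comparable.

```python
def genotype_likelihoods(bases, ref, alt, error=0.01):
    """P(reads | G = g) for g = 0, 1, 2 copies of the alt allele."""
    def p_base(base, allele):
        # Probability of observing `base` when the true allele is `allele`.
        return 1.0 - error if base == allele else error / 3.0

    liks = []
    for g in (0, 1, 2):
        lik = 1.0
        for base in bases:
            # Each read samples one of the two chromosomes at random.
            lik *= (g / 2.0) * p_base(base, alt) + (1.0 - g / 2.0) * p_base(base, ref)
        liks.append(lik)
    return liks

def normalized(liks):
    """Scale likelihoods to sum to one (flat prior over genotypes)."""
    total = sum(liks)
    return [lik / total for lik in liks]

# Two reads versus twenty reads, all showing the reference base.
low = normalized(genotype_likelihoods("AA", ref="A", alt="G"))
high = normalized(genotype_likelihoods("A" * 20, ref="A", alt="G"))

print("depth 2: ", [round(x, 4) for x in low])
print("depth 20:", [round(x, 8) for x in high])
```

With two reads a heterozygous genotype remains quite plausible even though no alternative base was seen, while with twenty reads it is effectively excluded. Calling the genotype as homozygous in the first case discards that uncertainty, which is what biases the SFS.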
slide-48
SLIDE 48

Site frequency spectrum for low/medium depth data

slide-49
SLIDE 49

There are no possible filters that can solve the problem

slide-50
SLIDE 50

Estimating SFS using uncertainty of the data

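Likelihood of SFS for a single site:

$$P(X^s \mid \eta) = \sum_{j=0}^{2N} p(X^s \mid \eta, J = j)\, p(J = j \mid \eta) \propto \sum_{j=0}^{2N} \eta_j \sum_{g \in \{0,1,2\}^N} p(G = g \mid J = j) \prod_{i=1}^{N} P(X_i^s \mid G_i = g_i),$$

where

$$p(G = g \mid J = j) = \binom{2N}{j}^{-1} 2^{\sum_{i=1}^{N} I_1(g_i)} \quad \text{when } \sum_{i=1}^{N} g_i = j, \text{ else } 0,$$

and $I_1(g_i)$ is 1 if individual $i$ is heterozygous and 0 otherwise.

SFS for a region:

$$P(X \mid \eta) = \prod_{s=1}^{r} P(X^s \mid \eta)$$

Fast calculations with dynamic programming and EM²

² Nielsen et al. SNP Calling, Genotype Calling, and Sample Allele Frequency Estimation from New-Generation Sequencing Data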
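The sum over all genotype configurations does not have to be enumerated. A minimal sketch of the dynamic programming and EM steps follows, assuming genotype likelihoods are already available as an N x 3 array per site. Function names, the uniform starting point and the toy input values are illustrative assumptions; this is not the implementation used in the cited work.

```python
import numpy as np
from math import comb

def allele_count_likelihoods(gl):
    """P(X^s | J = j) for j = 0..2N from genotype likelihoods.

    gl: (N, 3) array with P(X_i | G_i = g) for g = 0, 1, 2.
    """
    gl = np.asarray(gl, dtype=float)
    n = gl.shape[0]
    h = np.array([1.0])
    for l0, l1, l2 in gl:
        # Add one individual: it contributes 0, 1 or 2 derived alleles.
        new = np.zeros(len(h) + 2)
        new[:-2] += l0 * h
        new[1:-1] += 2.0 * l1 * h  # factor 2 per heterozygote
        new[2:] += l2 * h
        h = new
    norm = np.array([comb(2 * n, j) for j in range(2 * n + 1)], dtype=float)
    return h / norm

def em_sfs(site_liks, n_iter=100):
    """EM estimate of eta from an (r, 2N+1) array of P(X^s | J = j)."""
    site_liks = np.asarray(site_liks, dtype=float)
    k = site_liks.shape[1]
    eta = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        post = site_liks * eta                     # E-step: unnormalized posterior
        post /= post.sum(axis=1, keepdims=True)
        eta = post.mean(axis=0)                    # M-step: average over sites
    return eta

# Hypothetical genotype likelihoods: 3 sites, 2 individuals each.
sites = [
    [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0]],
    [[0.1, 0.8, 0.1], [0.7, 0.3, 0.0]],
    [[0.0, 0.2, 0.8], [0.1, 0.6, 0.3]],
]
site_liks = np.array([allele_count_likelihoods(gl) for gl in sites])
eta = em_sfs(site_liks)
print(eta)
```

The recursion costs on the order of N² operations per site instead of 3^N, which is what makes the likelihood usable for realistic sample sizes.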

slide-51
SLIDE 51

Site frequency spectrum for low/medium depth data

slide-52
SLIDE 52

Site frequency spectrum for low/medium depth data

slide-53
SLIDE 53

Empirical Bayes

The full ML method is computationally intensive. We therefore propose an empirical Bayes approach:

$$\hat{\eta}^{EB}_s(j) = p(J = j \mid X^s) = \frac{P(X^s \mid J = j)\, p(J = j)}{\sum_{j'=0}^{2N} P(X^s \mid J = j')\, p(J = j')},$$

with $p(J = j)$ being our ML estimate from the optimization. For a region of size $r$ the local SFS is:

$$\hat{\eta}^{EB}(j) = \sum_{s=1}^{r} \hat{\eta}^{EB}_s(j)$$
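The empirical Bayes step can be sketched in a few lines once the per-site likelihoods P(X^s | J = j) and a genome-wide prior are available. In the sketch below the function name and all numbers are hypothetical; the prior stands in for the ML estimate of the SFS, and the three rows stand in for the sites of one window.

```python
import numpy as np

def empirical_bayes_sfs(site_liks, prior):
    """Per-site posteriors p(J = j | X^s) and their sum over the window.

    site_liks: (r, 2N+1) array of P(X^s | J = j).
    prior:     (2N+1,) array of p(J = j), e.g. a genome-wide ML estimate.
    """
    site_liks = np.asarray(site_liks, dtype=float)
    prior = np.asarray(prior, dtype=float)
    joint = site_liks * prior
    post = joint / joint.sum(axis=1, keepdims=True)
    return post, post.sum(axis=0)

# Hypothetical inputs for a single diploid individual (j = 0, 1, 2).
site_liks = np.array([
    [0.90, 0.10, 0.00],
    [0.20, 0.70, 0.10],
    [0.80, 0.15, 0.05],
])
prior = np.array([0.90, 0.08, 0.02])

post, local_sfs = empirical_bayes_sfs(site_liks, prior)
print(local_sfs)
```

The local SFS sums to the number of sites in the window, so it can be passed on to window-based statistics as expected counts.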

slide-54
SLIDE 54

Selection scan using empirical Bayes

LCT locus in Europeans based on 3X data

[Figure: Tajima's D on chr2 in 100 kb windows. x-axis: position (Mb), about 120 to 130; y-axis: Tajima's D, −2 to 2. Series: EB, p1e−6mLike, p1e−3mLike, p1e−6HWE, p1e−3HWE]

slide-55
SLIDE 55

SFS based on genotype likelihoods

  • Can be estimated even with low(ish) depth, e.g. 2X
  • Must be done with genotype likelihoods unless depth is high (>10X)
  • Can be done in any dimension

1D: thetas, e.g. Tajima's pi, Tajima's D, population sizes
2D: Fst and PBS
>2D: useful for demography inference
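As an illustration of the 1D case, the sketch below computes pairwise diversity (pi), Watterson's theta and Tajima's D from an unfolded SFS given as expected site counts. The function name and the example spectra are assumptions made for illustration. An SFS estimated as proportions would first need to be multiplied by the number of sites in the window.

```python
import math

def tajimas_d(sfs):
    """Return (pi, theta_w, D) from an unfolded SFS.

    sfs[j] is the (expected) number of sites with j derived alleles,
    for j = 0..n, where n is the number of sampled chromosomes.
    """
    n = len(sfs) - 1
    segregating = sfs[1:n]                      # classes j = 1..n-1
    s = sum(segregating)
    pi = sum(count * 2.0 * j * (n - j) / (n * (n - 1))
             for j, count in enumerate(segregating, start=1))
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, d

# Hypothetical spectra for n = 10 chromosomes.
neutral = [1000.0] + [100.0 / j for j in range(1, 10)] + [0.0]
singleton_excess = [1000.0, 100.0] + [0.0] * 9

print(tajimas_d(neutral))
print(tajimas_d(singleton_excess))
```

A spectrum proportional to 1/j gives D near zero, while an excess of rare alleles, as expected after a selective sweep, gives a negative D.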

slide-56
SLIDE 56

Time for exercises

Data from 1000 Genomes

  • 2500 individuals sequenced at low/medium depth (3-8X)
  • multiple populations

Human genomes

  • 3Gb
  • BAM file size 5Gb per X

Reduced genome

  • 22 100k regions (one for each autosome)
  • 1Mb region on chr5
  • 3 x 10 individuals from African (YRI), European (CEU) and East Asian (JPT) populations