eQTL ANALYSIS, BIG BIO, David Pan



SLIDE 1

BIG BIO

eQTL ANALYSIS

David Pan

SLIDE 2

THANKS

BIO BIG

SLIDE 3

eQTL Analysis

  • eQTL: Expression Quantitative Trait Loci
  • Linear regression to find an association between gene expression and a specific variant/SNP/locus
  • eQTL analysis is important for determining the genetic elements underlying variation and differences in gene expression

SLIDE 4

REVIEW

SLIDE 5

Double Stranded DNA

…CTCGTCACTTCACGTATG…
 ||||||||||||||||||
…GAGCAGTGAAGTGCATAC…
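The pairing between the two strands can be checked with a short sketch. It assumes standard Watson-Crick pairing (A with T, C with G); the function name `complement` is illustrative and not from the slides.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (same orientation)."""
    return strand.translate(COMPLEMENT_TABLE)


top_strand = "CTCGTCACTTCACGTATG"
print(complement(top_strand))  # GAGCAGTGAAGTGCATAC
```

The result matches the lower strand shown on the slide.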

SLIDE 6

ALLELES

…CTCGTCACTTCACGTATG…
…CACGTCACTTCACGTATG…
…CTCCTCTCATCAC---TG…

How can I refer to these alleles?

           Pos 2   Pos 4   Pos 7   Pos 14
Reference  T       G       ACT     GTA
Alternate  A       C       TCA     ---
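One way to see where the alleles sit is to compare the aligned sequences column by column. This is a minimal sketch that treats the first sequence as the reference; the function name is hypothetical. It reports single columns, whereas the slide groups columns 7 to 9 into one ACT/TCA allele and columns 14 to 16 into one GTA deletion.

```python
REFERENCE = "CTCGTCACTTCACGTATG"
OTHERS = ["CACGTCACTTCACGTATG", "CTCCTCTCATCAC---TG"]


def differing_columns(reference, sequences):
    """Return 1-based alignment columns where any sequence differs from the reference."""
    return [
        i + 1
        for i, ref_base in enumerate(reference)
        if any(seq[i] != ref_base for seq in sequences)
    ]


print(differing_columns(REFERENCE, OTHERS))  # [2, 4, 7, 9, 14, 15, 16]
```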

SLIDE 7

ALLELES

…CTCGTCACTTCTC---TG…
…CACGTCACTTCACGTATG…
…CTCCTCTCATCAC---TG…

How can I refer to these alleles?

           Pos 2   Pos 4   Pos 7   Pos 14
Ancestral  T       G       ACT     ---
Derived    A       C       TCA     GTA

SLIDE 8

ALLELE FREQUENCY

Variant positions: Pos 2, Pos 4, Pos 7, Pos 14

…CACGTCACTTCACGTATG…
…CTCCTCTCATCAC---TG…
…CTCCTCACTTCACGTATG…
…CTCCTCACTTCAC---TG…
…CACGTCTCATCACGTATG…
…CACGTCTCATCACGTATG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CACCTCACTTCACGTATG…

SLIDE 9

ALLELE FREQUENCY

…CACGTCACTTCACGTATG…
…CTCCTCTCATCAC---TG…
…CTCCTCACTTCACGTATG…
…CTCCTCACTTCAC---TG…
…CACGTCTCATCACGTATG…
…CACGTCTCATCACGTATG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CACCTCACTTCACGTATG…

          Pos 2   Pos 4   Pos 7   Pos 14
Allele 1  T       G       ACT     ---
Allele 2  A       C       TCA     GTA

SLIDE 10

ALLELE FREQUENCY

…CACGTCACTTCACGTATG…
…CTCCTCTCATCAC---TG…
…CTCCTCACTTCACGTATG…
…CTCCTCACTTCAC---TG…
…CACGTCTCATCACGTATG…
…CACGTCTCATCACGTATG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CACCTCACTTCACGTATG…

                Pos 2   Pos 4   Pos 7   Pos 14
Allele 1        T       G       ACT     ---
Allele 2        A       C       TCA     GTA
Allele 1 count  6       3       7       5
Allele 2 count  4       7       3       5

SLIDE 11

ALLELE FREQUENCY

…CACGTCACTTCACGTATG…
…CTCCTCTCATCAC---TG…
…CTCCTCACTTCACGTATG…
…CTCCTCACTTCAC---TG…
…CACGTCTCATCACGTATG…
…CACGTCTCATCACGTATG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CACCTCACTTCACGTATG…

                    Pos 2   Pos 4   Pos 7   Pos 14
Allele 1            T       G       ACT     ---
Allele 2            A       C       TCA     GTA
Allele 1 count      6       3       7       5
Allele 2 count      4       7       3       5
Allele 1 frequency  60%     30%     70%     50%
Allele 2 frequency  40%     70%     30%     50%

SLIDE 12

ALLELE FREQUENCY

…CACGTCACTTCACGTATG…
…CTCCTCTCATCAC---TG…
…CTCCTCACTTCACGTATG…
…CTCCTCACTTCAC---TG…
…CACGTCTCATCACGTATG…
…CACGTCTCATCACGTATG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CACCTCACTTCACGTATG…

                    Pos 2   Pos 4   Pos 7   Pos 14
Allele 1            T       G       ACT     ---
Allele 2            A       C       TCA     GTA
Allele 1 count      6       3       7       5
Allele 2 count      4       7       3       5
Allele 1 frequency  60%     30%     70%     50%
Allele 2 frequency  40%     70%     30%     50%
Major               T       C       ACT     ---
Minor               A       G       TCA     GTA
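The counts and frequencies on this slide can be reproduced from the ten sequences. The sketch below assumes the variant columns are Pos 2 (1 base), Pos 4 (1 base), Pos 7 (3 bases) and Pos 14 (3 bases), counted from 1 within the displayed window; the helper names are illustrative.

```python
from collections import Counter

SEQUENCES = [
    "CACGTCACTTCACGTATG", "CTCCTCTCATCAC---TG", "CTCCTCACTTCACGTATG",
    "CTCCTCACTTCAC---TG", "CACGTCTCATCACGTATG", "CACGTCTCATCACGTATG",
    "CTCCTCACTTCAC---TG", "CTCCTCACTTCAC---TG", "CTCCTCACTTCAC---TG",
    "CACCTCACTTCACGTATG",
]

# (label, 0-based start, allele length) -- assumed from the slide's position labels
VARIANTS = [("Pos 2", 1, 1), ("Pos 4", 3, 1), ("Pos 7", 6, 3), ("Pos 14", 13, 3)]


def allele_counts(sequences, start, length):
    """Count how many sequences carry each allele at one variant site."""
    return Counter(seq[start:start + length] for seq in sequences)


def allele_frequencies(counts):
    """Convert allele counts to fractions of all observed sequences."""
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


for label, start, length in VARIANTS:
    counts = allele_counts(SEQUENCES, start, length)
    print(label, dict(counts), allele_frequencies(counts))
```

With ten sequences, each count maps directly to a percentage (for example, 6 of 10 gives 60%).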

SLIDE 13

REPRESENTING ALLELES

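Haplotype Matrix (phasing necessary)

Columns: Chr, Pos, Ref, Alt, Ind1-H1, Ind1-H2, Ind2-H1, Ind2-H2
12   2,147,839   C     T     entries: 1 1 1
12   2,147,913   T     A     entries: 1
12   2,152,882   G--   ATC   entries: 1 1 1

Genotype Matrix (unphased or phased)

Columns: Chr, Pos, Ref, Alt, Ind1, Ind2
12   2,147,839   C     T     entries: 1 2
12   2,147,913   T     A     entries: 1
12   2,152,882   G--   ATC   entries: 1 2

(Only the nonzero entries of each row survive here; where a row has fewer entries than sample columns, the column of each entry is not recoverable.)

Other column options: Ancestral Allele, Derived Allele, rsID, genome feature, error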
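The two matrices are related: a genotype entry is the number of alternate alleles an individual carries, which is the sum of that individual's two haplotype entries. The sketch below uses made-up haplotype values (assumed for illustration, not read from the slide) chosen so the first and third rows give genotypes 1 and 2.

```python
def haplotypes_to_genotypes(haplotype_row):
    """Sum each (H1, H2) pair into one alternate-allele count per individual."""
    if len(haplotype_row) % 2 != 0:
        raise ValueError("expected two haplotype columns per individual")
    return [
        haplotype_row[i] + haplotype_row[i + 1]
        for i in range(0, len(haplotype_row), 2)
    ]


# Columns: Ind1-H1, Ind1-H2, Ind2-H1, Ind2-H2 (hypothetical values)
HAPLOTYPES = [
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 0, 1, 1],
]

genotypes = [haplotypes_to_genotypes(row) for row in HAPLOTYPES]
print(genotypes)  # [[1, 2], [0, 1], [1, 2]]
```

The reverse conversion is not possible without phasing, since a genotype of 1 does not say which haplotype carries the alternate allele.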

SLIDE 14

VCF files

##fileformat=VCFv4.0
##fileDate=20090805
##source=myImputationProgramV3.1
##reference=1000GenomesPilot-NCBI36
##phasing=partial
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele">
##INFO=<ID=DB,Number=0,Type=Flag,Description="dbSNP membership, build 129">
##INFO=<ID=H2,Number=0,Type=Flag,Description="HapMap2 membership">
##FILTER=<ID=q10,Description="Quality below 10">
##FILTER=<ID=s50,Description="Less than 50% of samples have data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=HQ,Number=2,Type=Integer,Description="Haplotype Quality">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
20 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
20 1234567 microsat1 GTCT G,GTACT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP 0/1:35:4 0/2:17:2 1/1:40:3
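To connect the VCF to a genotype matrix, the GT field of each sample can be turned into an alternate-allele count. This is a minimal sketch, not a full VCF parser: the function name is hypothetical, it splits on whitespace (real VCF files are tab-delimited), and it counts every non-reference allele as 1, which is a simplification at multi-allelic sites.

```python
SAMPLES = ["NA00001", "NA00002", "NA00003"]
RECORD = (
    "20 14370 rs6054257 G A 29 PASS NS=3;DP=14;AF=0.5;DB;H2 "
    "GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,."
)


def genotype_dosages(record, sample_names):
    """Map each sample to its count of non-reference alleles (None if missing)."""
    fields = record.split()
    gt_index = fields[8].split(":").index("GT")  # FORMAT is the 9th column
    dosages = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        gt = sample_field.split(":")[gt_index]
        # "|" marks a phased genotype and "/" an unphased one; both separate alleles.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            dosages[name] = None
        else:
            dosages[name] = sum(1 for allele in alleles if allele != "0")
    return dosages


print(genotype_dosages(RECORD, SAMPLES))
# {'NA00001': 0, 'NA00002': 1, 'NA00003': 2}
```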
SLIDE 15

MINOR ALLELE FREQUENCY

SLIDE 16

MINOR ALLELE FREQUENCY

…CACGTCACTTCACGTATG…
…CTCCTCTCATCAC---TG…
…CTCCTCACTTCACGTATG…
…CTCCTCACTTCAC---TG…
…CACGTCTCATCACGTATG…
…CACGTCTCATCACGTATG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CTCCTCACTTCAC---TG…
…CACCTCACTTCACGTATG…

                    Pos 2   Pos 4   Pos 7   Pos 14
Allele 1            T       G       ACT     ---
Allele 2            A       C       TCA     GTA
Allele 1 count      6       3       7       5
Allele 2 count      4       7       3       5
Allele 1 frequency  60%     30%     70%     50%
Allele 2 frequency  40%     70%     30%     50%
Major               T       C       ACT     ---
Minor               A       G       TCA     GTA
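For a site with two alleles, the minor allele frequency (MAF) is the smaller of the two allele frequencies. The sketch below applies that to the frequencies on this slide. The cutoff `MAF_THRESHOLD` is a hypothetical value chosen only so the toy data shows a filter in action; it is not a recommendation from the slides.

```python
SITE_FREQUENCIES = {
    "Pos 2": {"T": 0.6, "A": 0.4},
    "Pos 4": {"G": 0.3, "C": 0.7},
    "Pos 7": {"ACT": 0.7, "TCA": 0.3},
    "Pos 14": {"---": 0.5, "GTA": 0.5},
}
MAF_THRESHOLD = 0.35  # hypothetical cutoff, for illustration only


def minor_allele_frequency(allele_frequencies):
    """Return the frequency of the least common allele at a site."""
    return min(allele_frequencies.values())


maf = {site: minor_allele_frequency(f) for site, f in SITE_FREQUENCIES.items()}
kept = [site for site, value in maf.items() if value >= MAF_THRESHOLD]

print(maf)   # {'Pos 2': 0.4, 'Pos 4': 0.3, 'Pos 7': 0.3, 'Pos 14': 0.5}
print(kept)  # ['Pos 2', 'Pos 14']
```

At Pos 14 the two alleles are tied at 50%, so the major/minor labels there are arbitrary and the MAF is 0.5.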

SLIDE 17

DATA FOR EQTL ANALYSIS

SLIDE 18

GENE EXPRESSION

Gene  Ind1  Ind2  Ind3  Ind4  ...
1     ...   ...   ...   ...
2     ...   ...   ...   ...
3
4
5
...
n

Columns: Individuals (n = 100s to 1000s)
Rows: Genes (n ≈ 20,000)

SLIDE 19

COVARIATES

Covariate      Ind1  Ind2  Ind3  Ind4  ...
Genotype PC1   ...   ...   ...   ...
Genotype PC2   ...   ...
Genotype PC3
Age
Age²
Sex

Columns: Individuals (n = 100s to 1000s)
Rows: Covariates

SLIDE 20

eQTL ANALYSIS

SLIDE 21

eQTL ANALYSIS VISUALLY

[Figure: plot grouped by genotype (AA, AT, TT); x-axis label: Alleles]

SLIDE 22

eQTL ANALYSIS MATH

Linear regression: find the coefficient for the effect of genotype on expression, conditioned on the covariates in a linear model, and test whether it is significantly different from 0.

Expression ~ β0 + β1·Genotype + β2·Covariates

[Figure: inputs for one test, each measured on Ind1 to Ind4: expression of Gene 1, covariates Cov1 to Cov3, and genotype Geno 1]
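The regression for a single gene-variant pair can be sketched with ordinary least squares. The sketch uses the usual eQTL form, with expression as the response and genotype dosage (0, 1 or 2 alternate alleles) plus covariates as predictors. All data values and function names are made up for illustration; a p-value would come from comparing the t statistic to a t distribution with `df` degrees of freedom, which is not computed here.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def eqtl_regression(expression, genotype, covariates):
    """Fit expression ~ b0 + b1*genotype + covariates; return (b1, se, t, df)."""
    n = len(expression)
    design = [[1.0, float(g)] + list(c) for g, c in zip(genotype, covariates)]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, expression)) for i in range(p)]
    beta = solve(xtx, xty)
    residuals = [y - sum(b * x for b, x in zip(beta, row))
                 for y, row in zip(expression, design)]
    df = n - p
    sigma2 = sum(r * r for r in residuals) / df
    # Var(b1) = sigma^2 * [(X'X)^-1]_11, taken from the solution for unit vector e1.
    unit = [0.0] * p
    unit[1] = 1.0
    se = math.sqrt(sigma2 * solve(xtx, unit)[1])
    t_stat = beta[1] / se if se > 0 else math.inf
    return beta[1], se, t_stat, df


# Hypothetical data for eight individuals: dosage, one covariate (age), expression.
GENOTYPE = [0, 1, 2, 0, 1, 2, 0, 1]
COVARIATES = [[30.0], [45.0], [52.0], [61.0], [38.0], [47.0], [55.0], [29.0]]
EXPRESSION = [2.40, 2.85, 3.57, 2.56, 2.88, 3.57, 2.45, 2.79]

beta1, se, t_stat, df = eqtl_regression(EXPRESSION, GENOTYPE, COVARIATES)
print(f"beta1={beta1:.3f} se={se:.3f} t={t_stat:.2f} df={df}")
```

In a full analysis this fit is repeated for every gene-variant pair under test, so the resulting p-values need multiple-testing correction.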

SLIDE 23

cis-eQTL vs trans-eQTL

Panels: cis-eQTL, trans-eQTL

[Diagram labels: 1Mb, 1Mb; Interchromosomal; 1Mb, 1Mb]

OR:
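A common convention, consistent with the 1Mb labels on this slide, is to call a variant cis when it lies on the same chromosome as the gene and within a fixed window of it, and trans otherwise. The sketch below assumes a 1Mb window measured from a single gene coordinate (for example the transcription start site); both the function name and that choice of coordinate are assumptions, not taken from the slides.

```python
CIS_WINDOW_BP = 1_000_000  # 1Mb, matching the slide's labels


def classify_eqtl(variant_chrom, variant_pos, gene_chrom, gene_pos,
                  window=CIS_WINDOW_BP):
    """Label a variant-gene pair as 'cis' or 'trans' by chromosome and distance."""
    if variant_chrom != gene_chrom:
        return "trans"  # interchromosomal
    if abs(variant_pos - gene_pos) <= window:
        return "cis"
    return "trans"  # same chromosome, outside the window


print(classify_eqtl("12", 2_147_839, "12", 2_500_000))  # cis
print(classify_eqtl("12", 2_147_839, "12", 9_000_000))  # trans
print(classify_eqtl("12", 2_147_839, "20", 2_200_000))  # trans
```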