
Data analysis in genetic epi

Yurii Aulchenko

Independent researcher & consultant yurii [dot] aulchenko [at] gmail [dot] com

Wednesday, November 23, 11

Areas of expertise

  • (Infra)structure and project evaluation and advice
  • Methodological advice (study design, planning of analyses, methods, software)
  • Teaching and training
  • Methods, algorithms and software

Data generation
Analysis
Meta-analysis & replication
Functional genomics

Aim: Prepare (phenotypic & genotypic) data for genetic epidemiologic analysis

Aim: Establish relation of phenotype to genotype

Aim: Provide initial insight into biological mechanisms underlying established genotype-phenotype relation

[Figure, labelled TC: Lipid transform; Regulation of protein transport; Regulation of hydrolase activity; Humoral immune response; Response to nutrient; Cholesterol metabolic process; Response to nutrient levels; Sterol metabolic process; Response to extracellular stimulus; Steroid metabolic process]

[Diagram labels, in order of appearance: ESP, GE02; GE03, this course; ESP, GE03, GE05, this course; ESP, GE03, this course; ESP, this course]

Analysis

  • Additive genetic models
  • Analysis of interactions

Single SNP analysis

  • Analysis of each SNP in turn, independent of others
  • For each SNP, simple regression is performed, resulting in estimates of regression coefficients, their standard errors and the p-value

Linear regression model

The value of the trait in the i-th individual is assumed to follow the linear model $Y_i = m + b_g g_i + e_i$, where $m$ is the intercept, $b_g$ is the regression coefficient of the genotype, $g_i$ is the genotypic value, and $e_i$ is the random residual error
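
A minimal sketch of fitting this model for one SNP by ordinary least squares. The 0/1/2 genotype coding, the toy data and the function name `single_snp_regression` are assumptions made for illustration, and the p-value uses a large-sample normal approximation rather than the exact t distribution.

```python
import math

def single_snp_regression(genotypes, phenotypes):
    """Fit Y_i = m + b_g * g_i + e_i by ordinary least squares for one SNP."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    # Sum of squares of the genotype and its cross-product with the trait
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    b_g = sxy / sxx               # regression coefficient of the genotype
    m = mean_y - b_g * mean_g     # intercept
    # Residual sum of squares gives the standard error of b_g
    rss = sum((y - m - b_g * g) ** 2 for g, y in zip(genotypes, phenotypes))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = b_g / se
    # Two-sided p-value, normal approximation (assumption: large sample)
    p = math.erfc(abs(z) / math.sqrt(2))
    return {"m": m, "b_g": b_g, "se": se, "z": z, "p": p}

# Hypothetical data: genotypes coded as the number of copies of one allele
genotypes = [0, 0, 1, 1, 2, 2]
phenotypes = [2.0, 2.4, 3.0, 3.2, 4.1, 3.9]
fit = single_snp_regression(genotypes, phenotypes)
print(fit)
```

In a genome-wide scan the same function would be called once per SNP, which is what makes the multiple testing issue arise.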

Linear regression

[Figure: phenotypic score (y-axis, ticks from 2.0 to 4.5) against SNP genotype (x-axis); the labels m and b_g are added in successive builds]

As a result...

Multiple testing

  • Hence the nominal single-test p-value corresponding to an experiment-wise type 1 error rate of 5% is << 0.05
  • Usual practice is to use a fixed threshold of $5 \times 10^{-8}$
  • Note: this threshold is defined for GWAS of common variants in a population of European ancestry
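
The arithmetic behind these bullets can be sketched as follows. The figure of one million independent tests is an assumption (a commonly quoted round number, not stated on the slide), and the function names are illustrative; the correction shown is a plain Bonferroni division.

```python
def familywise_error(nominal_alpha, n_tests):
    """Chance of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - nominal_alpha) ** n_tests

def per_test_threshold(experiment_wise_alpha, n_tests):
    """Bonferroni correction: divide the experiment-wise rate by the number of tests."""
    return experiment_wise_alpha / n_tests

# With a nominal 0.05 threshold, even 100 tests almost guarantee a false positive
print(familywise_error(0.05, 100))

# Assumption: about one million independent tests in a GWAS
threshold = per_test_threshold(0.05, 1_000_000)
print(f"{threshold:.1e}")
```

Under that assumption the per-test threshold matches the fixed value quoted above.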

Linear regression model

The value of the trait in the i-th individual is assumed to follow the linear model $Y_i = m + b_g g_i + e_i$, where $m$ is the intercept, $b_g$ is the regression coefficient of the genotype, $g_i$ is the genotypic value, and $e_i$ is the random residual error

Independence assumption

What happens if violated?

GWAS of skin color using the HapMap data

Relationship structure

Methods to deal with genetic sub-structure

ESP29 25.08.2010 Yurii Aulchenko

Genomic control

[Figure panels: GWAS of skin color using the HapMap data; GWAS without any association]

The idea is to scale the test statistic “back to normal” by dividing it by the genomic inflation factor λ (which is estimated from the data)
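
A minimal sketch of this correction, assuming 1-df chi-square test statistics and the common median-based estimate of λ (the slide does not say which estimator is used). Setting λ to 1 when the estimate falls below 1 is also an assumed convention, and the input values are made up.

```python
import statistics

# Median of the chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(chi2_stats):
    """Estimate the inflation factor and rescale the test statistics by it."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: never inflate statistics when lambda is below 1
    lam = max(lam, 1.0)
    corrected = [x / lam for x in chi2_stats]
    return lam, corrected

# Hypothetical inflated statistics from a genome-wide scan
observed = [0.2, 0.5, 0.9098728, 3.0, 6.0]
lam, corrected = genomic_control(observed)
print(lam, corrected)
```

The median is used rather than the mean so that a small number of truly associated SNPs has little influence on the estimate.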

Linear mixed model

The value of the trait in the i-th individual is assumed to follow the linear model $Y_i = m + b_g g_i + G_i + e_i$, where $m$ is the intercept, $g_i$ is the genotypic value, $G_i$ is a random effect with variance-covariance structure proportional to the relationship matrix, and $e_i$ is the random residual error
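
Written out, the variance-covariance structure described above could look as follows. The symbols $\Phi$ (relationship matrix), $\sigma_G^2$, $\sigma_e^2$ and $I$ (identity matrix) are notation assumed here, and the random effect and the residual are assumed to be independent of each other.

```latex
\begin{aligned}
Y_i &= m + b_g g_i + G_i + e_i, \\
\operatorname{Var}(G) &= \sigma_G^2 \, \Phi, \qquad
\operatorname{Var}(e) = \sigma_e^2 \, I, \\
\operatorname{Var}(Y) &= \sigma_G^2 \, \Phi + \sigma_e^2 \, I .
\end{aligned}
```

When $\sigma_G^2 = 0$ this reduces to the simple linear regression model with independent residuals.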

Structure of NFBC66 (Kang et al., 2010)

Test statistic inflation (Kang et al., 2010)

Interaction models

The value of the trait in the i-th individual is assumed to follow the linear model $Y_i = m + b_f F_i + b_g g_i + b_{fg} F_i g_i + e_i$, where $m$ is the intercept, $F_i$ is the value of some “factor”, $g_i$ is the genotypic value, and $e_i$ is the random residual error
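
Grouping the terms that contain $F_i$ shows what the interaction coefficient does: the effect of the factor on the trait depends on the genotype. The 0/1/2 genotype coding in the last line is an assumption made for illustration.

```latex
\begin{aligned}
Y_i &= m + b_f F_i + b_g g_i + b_{fg} F_i g_i + e_i \\
    &= m + b_g g_i + \left( b_f + b_{fg} g_i \right) F_i + e_i, \\
\text{effect of } F \text{ given } g_i = 0, 1, 2 &: \quad
b_f, \quad b_f + b_{fg}, \quad b_f + 2 b_{fg} .
\end{aligned}
```

With $b_{fg} = 0$ the effect of the factor is the same for every genotype, and the model reduces to one without interaction.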

What could “F” be?

  • An environment (gene-environment interaction)
  • Other genotype (gene-gene)
  • Indicator of transmitting parent (imprinting models)
  • ... etc.

A Genome-Wide Screen for Interactions Reveals a New Locus on 4p15 Modifying the Effect of Waist-to-Hip Ratio on Total Cholesterol

  • A meta-analysis of genome-wide association (GWA) data from 18 population-based cohorts with European ancestry (maximum N = 32,225)
  • Eight further cohorts (N = 17,102) for replication
  • SNP rs6448771 demonstrated genome-wide significant interaction with waist-to-hip ratio (WHR) on total cholesterol (TC), with a combined P-value of $4.79 \times 10^{-9}$

Analysis of genetic interactions

Type        Problem
GxE         Heteroscedasticity (but using robust models helps)
GxG         Computationally intensive and methodologically challenging
Imprinting  No software implementation

Data analysis: other important aspect

Data
Best method

Data analysis: other important aspect

Data
Software
Results

Method to software

Method
Approximation
Algorithm
Implementation
Software

Doing genetic epi analysis

  • Think about your design
  • Do invest into QC & verification of assumptions
  • Think of best possible method
  • Look for software, test it
  • Do not forget about the context

Good luck with your analysis!
