

SLIDE 1

Gene Regulation Bioinformatics

Wyeth W. Wasserman

University of British Columbia

www.cisreg.ca

SLIDE 2

Lake Barkley 2006

The Grand Challenge: Reliably Define Cis-Regulatory Mechanisms of Regulons

[Figure labels: EXPRESSION DATA; CLUSTERING; SEQUENCE ANALYSIS]

SLIDE 3

Inferring Gene Regulation from Expression Profiling Data

SLIDE 4

REGULATORY PATHWAY INFERENCE from CO-EXPRESSED GENES

  • What is the appeal?
  • Understand how signals perceived at the cell surface result in downstream changes in cell phenotype
  • TFs occasionally serve as therapeutically relevant targets
  • PPARγ, Estrogen Receptor, Glucocorticoid Receptor
  • Builds on data from powerful profiling technologies
  • Expression profiling; ChIP-chip

SLIDE 5

Bioinformatics and Promoter Analysis

What can we do?

SLIDE 6

What can we do?

  • Predict Transcription Factor Binding Sites

SLIDE 7

Representing Binding Sites for a TF

  • A set of sites represented as a consensus
  • VDRTWRWWSHD (IUPAC degenerate DNA)
  • A matrix describing a set of sites:

A   14 16  4  0  1 19 20  1  4 13  4  4 13 12  3
C    3  0  0  0  0  0  0  0  7  3  1  0  3  1 12
G    4  3 17  0  0  2  0  0  9  1  3  0  5  2  2
T    0  2  0 21 20  0  1 20  1  4 13 17  0  6  4

  • A single site
  • AAGTTAATGA

Set of binding sites:

AAGTTAATGA CAGTTAATAA GAGTTAAACA CAGTTAATTA
GAGTTAATAA CAGTTATTCA GAGTTAATAA CAGTTAATCA
AGATTAAAGA AAGTTAACGA AGGTTAACGA ATGTTGATGA
AAGTTAATGA AAGTTAACGA AAATTAATGA GAGTTAATGA
AAGTTAATCA AAGTTGATGA AAATTAATGA ATGTTAATGA
AAGTAAATGA AAGTTAATGA AAGTTAATGA AAATTAATGA
AAGTTAATGA AAGTTAATGA AAGTTAATGA AAGTTAATGA

  • Logo: a graphical representation of the frequency matrix. The y-axis is information content, which reflects the strength of the pattern in each column of the matrix
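
The step from a set of aligned sites to a frequency matrix, and from there to the per-column information content plotted in a logo, can be sketched as follows. This is a minimal illustration, not the software used for the slide: it uses only the first four sites listed above, assumes a uniform background (so the maximum is 2 bits per column), and applies no small-sample correction. The function names `build_pfm` and `information_content` are hypothetical.

```python
import math

# First four binding sites from the slide (a subset, for brevity).
SITES = ["AAGTTAATGA", "CAGTTAATAA", "GAGTTAAACA", "CAGTTAATTA"]


def build_pfm(sites):
    """Count each base at each position of equal-length aligned sites."""
    width = len(sites[0])
    pfm = {base: [0] * width for base in "ACGT"}
    for site in sites:
        if len(site) != width:
            raise ValueError("all sites must have the same length")
        for i, base in enumerate(site):
            pfm[base][i] += 1
    return pfm


def information_content(pfm):
    """Per-column information content in bits: 2 minus the Shannon entropy.

    Assumes a uniform background and applies no small-sample correction.
    """
    width = len(pfm["A"])
    result = []
    for i in range(width):
        total = sum(pfm[base][i] for base in "ACGT")
        entropy = 0.0
        for base in "ACGT":
            p = pfm[base][i] / total
            if p > 0:
                entropy -= p * math.log2(p)
        result.append(2.0 - entropy)
    return result


pfm = build_pfm(SITES)
ic = information_content(pfm)
for position, bits in enumerate(ic, start=1):
    print(position, round(bits, 2))
```

A fully conserved column (such as the second position, always A in this subset) reaches the 2-bit maximum, while a variable column scores lower.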

SLIDE 8

Conversion of PFM to Position Specific Scoring Matrix (PSSM)

Add the following features to the matrix profile:

  • 1. Correct for nucleotide frequencies in genome
  • 2. Weight for the confidence (depth) in the pattern
  • 3. Convert to log-scale probability for easy arithmetic

PFM:

A    5    0    1    0    0
C    0    2    2    4    0
G    0    3    1    0    4
T    0    0    1    1    1

PSSM:

A    1.6 -1.7 -0.2 -1.7 -1.7
C   -1.7  0.5  0.5  1.3 -1.7
G   -1.7  1.0 -0.2 -1.7  1.3
T   -1.7 -1.7 -0.2 -0.2 -0.2

Each PFM entry is converted to a PSSM entry with:

$$\log\left(\frac{f(b,i) + s(n)}{p(b)}\right)$$

Example score: TGCTG = 0.9
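
The conversion can be checked numerically. The sketch below reads f(b,i) as the count of base b at position i, s(n) as a pseudocount that depends on the number of sites n, and p(b) as the background frequency of b. The specific choices here are assumptions, not stated on the slide: a total pseudocount of sqrt(n) split according to the background, normalisation by n + sqrt(n), a uniform background of 0.25, and log base 2. Under those assumptions the output matches the PSSM values shown on the slide to one decimal place.

```python
import math

# PFM from the slide (five positions, five sites).
PFM = {
    "A": [5, 0, 1, 0, 0],
    "C": [0, 2, 2, 4, 0],
    "G": [0, 3, 1, 0, 4],
    "T": [0, 0, 1, 1, 1],
}


def pfm_to_pssm(pfm, background=None):
    """Convert counts to log2 odds scores.

    Assumed details: pseudocount sqrt(n) * p(b) per base, normalised by
    n + sqrt(n), where n is the number of sites in the column.
    """
    bases = "ACGT"
    if background is None:
        background = {b: 0.25 for b in bases}
    width = len(pfm["A"])
    pssm = {b: [] for b in bases}
    for i in range(width):
        n = sum(pfm[b][i] for b in bases)
        total_pseudo = math.sqrt(n)
        for b in bases:
            corrected = (pfm[b][i] + total_pseudo * background[b]) / (n + total_pseudo)
            pssm[b].append(math.log2(corrected / background[b]))
    return pssm


def score_site(pssm, site):
    """Sum the per-position scores for one candidate site."""
    return sum(pssm[base][i] for i, base in enumerate(site.upper()))


pssm = pfm_to_pssm(PFM)
tgctg_score = score_site(pssm, "TGCTG")
print(round(tgctg_score, 1))  # 0.9, as on the slide
```

Because the scores are logarithms, scoring a site is a simple sum over positions, which is the "easy arithmetic" the slide refers to.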

SLIDE 9

What can we do?

  • Predict TFBS
  • Predict Cis-Regulatory Modules

SLIDE 10

Combinatorial interactions between TFs

SLIDE 11

CRM Models

Trained models take as input a set of TF binding profiles and return significant clusters of TFBS

[Figure: plot of which only the axis ticks survive (y-axis up to 1; x-axis 100 to 5840)]
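
To make the idea of "clusters of TFBS" concrete, here is a deliberately simplified sketch. It is not the trained model from the talk: instead of a statistical model it just reports regions where predicted sites for at least a minimum number of distinct TFs fall within one window. The function name `find_clusters`, the window size, and the hit positions are all illustrative assumptions.

```python
def find_clusters(hits, window=200, min_distinct_tfs=2):
    """Report regions where sites for several distinct TFs lie close together.

    hits: list of (position, tf_name) predictions.
    Returns a list of (start, end, set_of_tf_names); overlapping regions
    are merged.
    """
    hits = sorted(hits)
    clusters = []
    left = 0
    for right in range(len(hits)):
        # Drop hits that are a full window or more behind the current hit.
        while hits[right][0] - hits[left][0] >= window:
            left += 1
        tfs = {tf for _, tf in hits[left:right + 1]}
        if len(tfs) >= min_distinct_tfs:
            start, end = hits[left][0], hits[right][0]
            if clusters and start <= clusters[-1][1]:
                prev_start, _, prev_tfs = clusters[-1]
                clusters[-1] = (prev_start, end, prev_tfs | tfs)
            else:
                clusters.append((start, end, tfs))
    return clusters


# Toy predictions (positions invented for illustration).
toy_hits = [
    (120, "HNF-1"), (180, "cEBP"), (250, "HNF-3beta"),
    (2000, "HNF-1"),
    (4100, "cEBP"), (4150, "HNF-1"),
]
clusters = find_clusters(toy_hits)
for start, end, tfs in clusters:
    print(start, end, sorted(tfs))
```

The isolated hit at position 2000 is ignored, which is the point of requiring a combination of sites rather than any single prediction.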

SLIDE 12

What can we do?

  • Predict TFBS
  • Predict CRMs
  • Phylogenetic Footprinting

SLIDE 13

Phylogenetic Footprinting

[Figure: % identity in 200 bp windows for the actin gene compared between human and mouse. x-axis: window start position (human sequence); y-axis: % identity]
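
The windowed identity profile in the figure can be sketched as below, assuming the two sequences have already been aligned (with `-` for gaps). The function name `window_identity` is hypothetical, and the window start is reported as an alignment column rather than a human sequence coordinate, which is a simplification.

```python
def window_identity(aligned_a, aligned_b, window=200, step=1):
    """Percent identity in sliding windows over a pairwise alignment.

    Gap columns count as mismatches. Returns (window_start, percent) pairs,
    where window_start is a 0-based alignment column.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    a = aligned_a.upper()
    b = aligned_b.upper()
    profile = []
    for start in range(0, len(a) - window + 1, step):
        matches = sum(
            1
            for x, y in zip(a[start:start + window], b[start:start + window])
            if x == y and x != "-"
        )
        profile.append((start, 100.0 * matches / window))
    return profile


# Toy alignment with a tiny window so the result is easy to check by hand.
profile = window_identity("ACGTACGTAC", "ACGTTCGTAC", window=5, step=5)
print(profile)  # [(0, 80.0), (5, 100.0)]
```

Windows whose identity exceeds a chosen threshold would then be kept as candidate conserved regions for the TFBS search.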

SLIDE 14

What can we do?

  • Predict TFBS
  • Predict CRMs
  • Phylogenetic Footprinting
  • Motif Over-Representation

SLIDE 15

Deciphering Regulation of Co-Expressed Genes

[Figure labels: Co-Expressed; Controls]

SLIDE 16

oPOSSUM Procedure

  • Set of co-expressed or co-precipitated genes
  • Automated sequence retrieval from EnsEMBL
  • Phylogenetic Footprinting
  • Detection of transcription factor binding sites
  • Statistical significance of binding sites
  • Putative mediating transcription factors

[Figure label: ORCA]

SLIDE 17

Statistical Methods for Identifying Over-represented TFBS

  • Z scores
    – Based on the number of occurrences of the TFBS relative to background
    – Normalized for sequence length
    – Simple binomial distribution model
  • Fisher exact probability scores
    – Based on the number of genes containing the TFBS relative to background
    – Hypergeometric probability distribution
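
Both statistics can be sketched with the Python standard library. These are simplified versions under stated assumptions, not the exact formulas used by the tool: the Z score uses a plain binomial model with the background hit rate per nucleotide and no continuity correction, and the Fisher score is a one-tailed hypergeometric probability that treats the target and background gene sets as disjoint. Function names are hypothetical.

```python
import math


def binomial_z_score(target_hits, target_length, background_hits, background_length):
    """Z score for site occurrences, normalised for sequence length."""
    rate = background_hits / background_length      # hits per nucleotide
    expected = rate * target_length
    std_dev = math.sqrt(target_length * rate * (1.0 - rate))
    return (target_hits - expected) / std_dev


def fisher_one_tailed(target_with, target_total, background_with, background_total):
    """P(at least target_with genes carry the site) under the hypergeometric model."""
    pooled = target_total + background_total
    pooled_with = target_with + background_with
    upper = min(pooled_with, target_total)
    numerator = sum(
        math.comb(pooled_with, k) * math.comb(pooled - pooled_with, target_total - k)
        for k in range(target_with, upper + 1)
    )
    return numerator / math.comb(pooled, target_total)


z = binomial_z_score(target_hits=20, target_length=1000,
                     background_hits=100, background_length=10000)
p = fisher_one_tailed(target_with=2, target_total=2,
                      background_with=0, background_total=2)
print(round(z, 2), round(p, 4))
```

The two scores answer different questions: the Z score reacts to many sites in a few genes, while the Fisher score only asks how many genes carry at least one site.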

SLIDE 18

Validation using Reference Gene Sets

TFs with experimentally-verified sites in the reference sets.

A. Muscle-specific (23 input; 16 analyzed)

Rank  TF          Z-score  Fisher
1     SRF         21.41    1.18e-02
2     MEF2        18.12    8.05e-04
3     c-MYB_1     14.41    1.25e-03
4     Myf         13.54    3.83e-03
5     TEF-1       11.22    2.87e-03
6     deltaEF1    10.88    1.09e-02
7     S8          5.874    2.93e-01
8     Irf-1       5.245    2.63e-01
9     Thing1-E47  4.485    4.97e-02
10    HNF-1       3.353    2.93e-01

B. Liver-specific (20 input; 12 analyzed)

Rank  TF          Z-score  Fisher
1     HNF-1       38.21    8.83e-08
2     HLF         11.00    9.50e-03
3     Sox-5       9.822    1.22e-01
4     FREAC-4     7.101    1.60e-01
5     HNF-3beta   4.494    4.66e-02
6     SOX17       4.229    4.20e-01
7     Yin-Yang    4.070    1.16e-01
8     S8          3.821    1.61e-02
9     Irf-1       3.477    1.69e-01
10    COUP-TF     3.286    2.97e-01

SLIDE 19

Empirical Selection of Parameters based on Reference Studies

[Figure: Fisher p-value plotted against Z-score for the Muscle, Liver and NF-κB reference sets, with the Z-score cutoff and Fisher cutoff marked. Labeled points: p65, c-Rel, p50, NF-κB, HNF-1, SRF, TEF-1, MEF2, FREAC-2, Myf, cEBP, SP1, HNF-3β]

SLIDE 20

C-Myc SAGE Data

  • c-Myc transcription factor dimerizes with the Max protein
  • Key regulator of cell proliferation, differentiation and apoptosis
  • Menssen and Hermeking identified 216 different SAGE tags corresponding to unique mRNAs that were induced after adenoviral expression of c-Myc in HUVEC cells
  • They then went on to confirm the induction of 53 genes using microarray analysis and RT-PCR

SLIDE 21

Induced Genes after Ectopic Expression of c-Myc (SAGE) (53 input; 36 analyzed)

TF        TF Class         Rank  Z-score  Fisher    No. Genes
Myc-Max   bHLH-ZIP         1     21.68    5.35e-03  7
Staf      ZN-FINGER, C2H2  2     20.17    1.70e-02  2
Max       bHLH-ZIP         3     18.32    2.16e-02  12
SAP-1     ETS              4     13.23    1.61e-04  13
USF       bHLH-ZIP         5     11.90    1.84e-01  16
SP1       ZN-FINGER, C2H2  6     11.68    4.40e-02  12
n-MYC     bHLH-ZIP         7     11.11    1.55e-01  20
ARNT      bHLH             8     11.11    1.55e-01  20
Elk-1     ETS              9     10.92    3.88e-03  19
Ahr-ARNT  bHLH             10    10.17    1.11e-01  25

SLIDE 22

C-Fos Microarray Experiment

  • In a study examining the role of transcriptional repression in oncogenesis, Ordway et al. compared the gene expression profiles of fibroblasts transformed by c-fos to the parental 208F rat fibroblast cell line
  • We mapped the list of 252 induced Affymetrix Rat Genome U34A GeneChip sequences to 136 human orthologs

SLIDE 23

Induced Genes after Ectopic Expression of c-Fos (Affymetrix) (136 input; 86 analyzed)

TF               TF Class          Rank  Z-score  Fisher    No. Genes
c-FOS            bZIP              1     17.53    2.60e-05  45
RREB-1           ZN-FINGER, C2H2   2     8.899    1.41e-01  1
PPARgamma-RXRal  NUCLEAR RECEPTOR  3     3.991    2.98e-01  1
CREB             bZIP              4     3.626    1.25e-01  10
E2F              Unknown           5     2.965    7.67e-02  15

SLIDE 24

NF-κB inhibition microarray study

SLIDE 25

Genes significantly down-regulated by the NF-κB pathway inhibitor (326 input; 179 analyzed)

TF         TF Class     Rank  Z-score  Fisher    No. Genes
p65        REL          1     36.57    5.66e-12  62
NF-kappaB  REL          2     32.58    5.82e-11  61
c-REL      REL          3     26.02    8.59e-08  63
Irf-2      TRP-CLUSTER  4     20.39    5.74e-04  6
SPI-B      ETS          5     16.59    1.23e-03  135
Irf-1      TRP-CLUSTER  6     15.4     9.55e-04  23
Sox-5      HMG          7     15.38    2.56e-02  126
p50        REL          8     14.72    2.23e-03  19
Nkx        HOMEO        9     13.66    2.29e-03  111
Bsap       PAIRED       10    13.2     9.92e-02  1
FREAC-4    FORKHEAD     11    12.05    1.66e-03  92

SLIDE 26

Identifying over-represented pairs of TFBSs in co-expressed genes

[Figure labels: d; Target; Background]

  • Calculate a Fisher exact probability that the pair of sites is over-represented
  • Correct for multiple testing

SLIDE 27

Over-represented Pairs of Sites in Yeast Fermentation Clusters

                          Target          Background
cluster  motif1  motif2   Hits  No hits   Hits  No hits   p-value   Adjusted
4        CSRE    STRE     15    46        362   6311      8.33E-07  6.49E-04
4        CSRE    GCR1     43    18        2881  3792      1.62E-05  1.26E-02
7        STRE    ADR1P    67    262       835   5838      6.38E-05  4.97E-02
7        STRE    PHO2     70    259       881   5792      5.63E-05  4.39E-02
7        STRE    TBP      69    260       868   5805      6.36E-05  4.96E-02
7        STRE    UASPHR   55    274       628   6045      3.77E-05  2.94E-02
7        STRE    GCR1     68    261       813   5860      1.58E-05  1.23E-02
8        STRE    CAR1_r   25    150       372   6301      2.24E-05  1.75E-02
16       PAC     RRPE     188   293       1958  4715      6.54E-06  5.10E-03
16       RRPE    XBP1     424   57        5354  1319      5.11E-06  3.98E-03
16       RRPE    SCB      411   70        5121  1552      2.78E-06  2.17E-03
16       RRPE    PHO2     425   56        5388  1285      9.28E-06  7.24E-03
16       RRPE    ROX1     273   208       3056  3617      2.09E-06  1.63E-03
16       RRPE    TBP      425   56        5362  1311      3.74E-06  2.92E-03
16       RRPE    FKH1     404   77        5097  1576      4.72E-05  3.68E-02
17       LYS14   RRPE     31    23        1857  4816      5.47E-06  4.27E-03
18       PAC     RRPE     152   206       1958  4715      1.98E-07  1.55E-04
18       RAP1    RRPE     204   154       2901  3772      3.91E-07  3.05E-04
18       RRPE    XBP1     326   32        5354  1319      3.08E-08  2.40E-05
18       RRPE    SCB      309   49        5121  1552      6.59E-06  5.14E-03
18       RRPE    PHO2     325   33        5388  1285      2.38E-07  1.86E-04
18       RRPE    TBP      323   35        5362  1311      5.07E-07  3.96E-04
18       RRPE    UASPHR   256   102       4051  2622      2.02E-05  1.57E-02
18       RRPE    FKH1     312   46        5097  1576      4.20E-07  3.28E-04
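
The counting and correction steps behind a table like this can be sketched as follows. Two assumptions are made here that the slides do not state outright: a gene counts as a "hit" when the two motifs occur within some maximum distance d of each other, and the multiple-testing correction is Bonferroni. The second assumption is an inference from the numbers: the adjusted values in the table are roughly 780 times the raw p-values, which would match a Bonferroni correction over all pairs drawn from 40 motifs. Gene names and positions are invented toy data.

```python
import math


def genes_with_pair(sites_by_gene, motif1, motif2, max_distance):
    """Return the genes in which motif1 and motif2 occur within max_distance.

    sites_by_gene: {gene: {motif_name: [positions]}}
    """
    hits = set()
    for gene, sites in sites_by_gene.items():
        positions1 = sites.get(motif1, [])
        positions2 = sites.get(motif2, [])
        if any(abs(a - b) <= max_distance for a in positions1 for b in positions2):
            hits.add(gene)
    return hits


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


toy_sites = {
    "geneA": {"STRE": [100, 420], "CSRE": [130]},
    "geneB": {"STRE": [50], "CSRE": [900]},
    "geneC": {"CSRE": [10]},
}
pair_hits = genes_with_pair(toy_sites, "CSRE", "STRE", max_distance=50)

n_pairs = math.comb(40, 2)            # 780 pairs, assuming 40 motifs
adjusted = bonferroni(8.33e-07, n_pairs)
print(sorted(pair_hits), n_pairs, adjusted)
```

The hit and no-hit counts for the target and background sets would then feed a Fisher exact test, with the resulting p-value adjusted as shown.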

SLIDE 28

  • oPOSSUM Server

SLIDE 29

The Hidden Jewel

SLIDE 30

What can we do?

  • Predict TFBS
  • Predict CRMs
  • Phylogenetic Footprinting
  • Motif Over-Representation
  • Motif Discovery

SLIDE 31

Gibbs Sampling

(grossly over-simplified)

tgacttcc
tgctacct
agacctca
ctgtagtg
acgcatct
cgatacgc
ttcgctcc

     1  2  3  4  5  6  7  8
A    2  0  2  2  2  1  0  1
C    0  2  3  3  2  1  6  2
G    0  4  1  0  1  0  1  1
T    4  1  1  2  2  5  0  2
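
In the same over-simplified spirit, a bare-bones Gibbs sampler for motif discovery might look like the sketch below. It is an illustration under assumptions, not the algorithm as implemented in any published tool: one site per sequence, a fixed motif width, a pseudocount of 1 per base, a uniform background, and a fixed number of iterations. The function name, the toy sequences and the seed are all invented for the example.

```python
import random


def gibbs_motif_search(seqs, width, iterations=200, seed=0):
    """Return one motif start position per sequence after Gibbs sampling."""
    rng = random.Random(seed)
    seqs = [s.lower() for s in seqs]
    bases = "acgt"
    # Start from a random site in every sequence.
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for held_out in range(len(seqs)):
            # Build a count matrix from every sequence except the held-out one.
            counts = [{b: 1.0 for b in bases} for _ in range(width)]
            for j, (seq, start) in enumerate(zip(seqs, starts)):
                if j == held_out:
                    continue
                for i in range(width):
                    counts[i][seq[start + i]] += 1.0
            total = (len(seqs) - 1) + 4.0  # sites used plus pseudocounts
            # Weight every candidate start in the held-out sequence by its
            # likelihood ratio against a uniform background.
            seq = seqs[held_out]
            weights = []
            for start in range(len(seq) - width + 1):
                weight = 1.0
                for i in range(width):
                    weight *= (counts[i][seq[start + i]] / total) / 0.25
                weights.append(weight)
            # Sample a new start in proportion to the weights.
            starts[held_out] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


# Toy sequences (invented), each containing a site resembling AAGTTAATGA.
TOY_SEQS = [
    "acgcataagttaatgaataac",
    "ggcaagttaatgacctgca",
    "ttcaagttaatgatcgga",
    "cagtaagttaacgaggtca",
]
WIDTH = 10
starts = gibbs_motif_search(TOY_SEQS, WIDTH)
for seq, start in zip(TOY_SEQS, starts):
    print(start, seq[start:start + WIDTH])
```

Because each new position is sampled rather than chosen greedily, different seeds can give different answers, which is one reason real tools run many restarts.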

SLIDE 32

There are problems…

Exploring limitations

SLIDE 33

Why can’t we do better?

  • Predict TFBS

SLIDE 34

Futility Conjecture

Human Cardiac α-Actin gene analyzed with the JASPAR set of profiles

(each vertical line represents a TFBS prediction)

Futility Conjecture: TFBS predictions are almost always wrong

Red boxes are protein coding exons - TFBS predictions excluded in this analysis

SLIDE 35

Why can’t we do better?

  • Predict TFBS
  • Predict CRMs

SLIDE 36

Cis-regulatory modules (CRMs) for specific expression in hepatocytes

SLIDE 37

Why can’t we do better?

  • Predict TFBS
  • Predict CRMs
  • Phylogenetic Footprinting

SLIDE 38

Regulatory Resolution Varies Widely Between Genes

Gene: NR2E1

SLIDE 39

Why can’t we do better?

  • Predict TFBS
  • Predict CRMs
  • Phylogenetic Footprinting
  • Motif Over-Representation

SLIDE 40

Ets TF Family

Structural classes of TFs often bind identical target sequences – we cannot specify which TF interacts with a motif.

SLIDE 41

Challenges for Motif Over-Representation

  • Methods fail when noise (genes not co-regulated) exceeds 20-50%
  • Most expression profiling experiments are not sufficiently resolved to identify such co-regulated clusters
  • Works well for studies linked to a primary TF response, but fails over long time periods or complex (multi-pathway) responses

SLIDE 42

Why can’t we do better?

  • Predict TFBS
  • Predict CRMs
  • Phylogenetic Footprinting
  • Motif Over-Representation
  • Motif Discovery

SLIDE 43

Applied Pattern Discovery is Acutely Sensitive to Noise

[Figure: pattern similarity vs. true MEF2 profile (axis ticks 10 to 18) plotted against sequence length (axis ticks 100 to 600), for true Mef2 binding sites. Pink line is negative control with no Mef2 sites included]

SLIDE 44

The Signal-to-Noise Battle

  • Background models
  • Phylogenetic footprinting
  • Motif combinations
  • Familial Binding Profiles
  • Concurrent motif discovery and expression clustering

SLIDE 45

Where are we going now?

Snippets of Active Projects

SLIDE 46

An impending transition in promoter analysis…

  • Transitions in promoter analysis algorithms separated by periods of slow progress
  • Focus on same tired reference collections using progressively more convoluted algorithms
  • Advances can be triggered from new data-producing technologies, but more commonly from adopting principles well-known to laboratory researchers
  • CpG islands; CRMs; phylogenetic footprinting
  • The next transition: Incorporating data from laboratory studies

SLIDE 47

Informed Motif Discovery

Enhance the Signal or Reduce the Noise

SLIDE 48

Informed Initial Choice

SLIDE 49

SLIDE 50

FBPs enhance sensitivity of pattern detection

SLIDE 51

A new direction?

  • Laboratory (WET) data indicating the locations of regulatory regions and/or specific TFBS can constrain the motif discovery process to improve the success rate
  • Extension – We should be able to determine how much WET data is required for successful prediction

SLIDE 52

[Figure labels: TF binding data; rod-specific genes; METHOD; predicted regulatory regions; METHOD; identification of overrepresented patterns corresponding to putative TFBS]

SLIDE 53

Knowledge Directed CRM Discovery

[Figure labels: Co-expressed genes; Retrieve orthologs; Align sequences; Phylogenetic footprinting; Prior prob of being part of a RR; Prior prob of being part of a TFBS; 1) Sample regions; 2) Sample sites within regions; Known RR; Known TFBS; Profile for known TF; Pattern discovery algorithm; CRMs, TFBS and profiles]

SLIDE 54

ROC curve (exons excluded)

[Figure: sensitivity plotted against 1 - specificity, with one curve each for windows = 10, 20, 50, 100, 200 and 300]

SLIDE 55

Software Just Finished

  • Test all forms of prior knowledge
  • CRM Length
  • Locations of Known CRMs
  • Location of Known TFBS
  • PSSMs for Contributing TFs
  • Etc
  • A limitation - Where to get organized prior data?

SLIDE 56

Open-access regulatory sequence repository – an information mall

Stefan Kirov, Elodie Portales-Casamar, Jonathan Lim, Jay Snoddy

SLIDE 57

PAZAR

Grand Bazaar, Istanbul

SLIDE 58

JASPAR: AN OPEN-ACCESS DATABASE OF TF BINDING PROFILES

SLIDE 59

SLIDE 60

SLIDE 61

SLIDE 62

SLIDE 63

SLIDE 64

SLIDE 65

COHO

SLIDE 66

Retrieval/Browsing Interface

SLIDE 67

Status

  • PAZAR – Database Implemented
  • API/Perl Modules – Available
  • Streamlined Submission Interface – Available
  • COHO - In Progress
  • Release impending
  • Open-Access/Open-Software: see www.pazar.info for details

SLIDE 68

Putting It All Together

SLIDE 69

TF Candidate Assessment Project and Tool (TF CAT)

Debra Fulton and Wyeth Wasserman (UBC)
Jared Roach (ISB)
Gwenael Breard and Tim Hughes (UoT)
Sarav Sundararajan and Rob Sladek (QGC/McGill)

SLIDE 70

Overview

  • Project Objective: Specify All Mouse and Human TFs
  • Trans Canada collaboration to compare lists of TFs and reach consensus
  • TF Candidate Assessment Tool (TF CAT) linked to WIKI system for storing opinions

SLIDE 71

Objectives

  • To derive a comprehensive collection of human and mouse transcription factors
  • To establish methods for extraction of new transcription factor candidates (TFCs)
  • To design tools for the assessment of the preliminary list of TFCs and on-going assessment of new TFCs

SLIDE 72

TF List Compilation Methods

  • ISB
    – Mostly manually curated
  • Toronto
    – Assembled cDNA collection mined with PFAM DBDs and expanded by BLAST similarity
  • McGill
    – Gene Ontology Annotations
    – InterPro domains with TF indicated in annotation
  • UBC
    – SwissProt and InterPro scanned for reference to TFs and curated set of DNA Binding Domains used to select proteins

SLIDE 73

3230 Candidate Mouse TFs

[Figure labels: U.Toronto; UBC; ISB; McGill]

SLIDE 74

Mouse TFCs

Superclass               Class                                  Count  Superclass total
Basic Domain             BHLH                                   4      197
Basic Domain             bHLH-ZIP                               6
Basic Domain             bHSH (helix-span-helix)                5
Basic Domain             bZIP                                   57
Basic Domain             CTF/NF-1                               4
Basic Domain             Helix Loop Helix                       121
Beta Scaffold            CCAAT                                  20     74
Beta Scaffold            Cold-Shock Domain                      15
Beta Scaffold            Dwarfin                                20
Beta Scaffold            Rel                                    10
Beta Scaffold            Runt Domain                            2
Beta Scaffold            Stat                                   7
Helix Turn Helix         Fork Head Domain                       39     299
Helix Turn Helix         Homeodomain                            251
Helix Turn Helix         Paired Box                             9
Other                    Bromodomain                            3      70
Other                    Jumonji                                39
Other                    RFX Domain                             7
Other                    T-Box                                  17
Other                    GCM Domain                             2
Other                    TEA                                    4
Other Alpha-Helix        High Mobility Group                    33     180
Other Alpha-Helix        HMG                                    142
Other Alpha-Helix        MADS-Box                               5
Winged Helix Turn Helix  E2F/dimerisation partner               10     136
Winged Helix Turn Helix  ARID Domain                            15
Winged Helix Turn Helix  ETS Domain                             65
Winged Helix Turn Helix  Tryptophan Clusters                    46
Zinc Coordinating        Loop-Sheet-Helix                       3      892
Zinc Coordinating        Zinc Finger - C4HC3                    74
Zinc Coordinating        Zinc Finger- C2H2                      37
Zinc Coordinating        Zinc Finger-Beta-Beta-Alpha            15
Zinc Coordinating        Zinc Finger-C2H2                       689
Zinc Coordinating        Zinc Finger-C2HC                       5
Zinc Coordinating        Zinc Finger-C4                         7
Zinc Coordinating        Zinc Finger-Cx4-Cyx8-Hx3-C             2
Zinc Coordinating        Zinc Finger-intertwined CCHC-HCCC      7
Zinc Coordinating        Zinc Finger-NF-X1 Type                 3
Zinc Coordinating        Zinc Finger-Steroid Receptor-C4        50

SLIDE 75

Presently Reviewing 3500 Candidates and Recording Judgment

1) TF Gene - there is adequate evidence to make this judgment

2) TF Candidate Gene - there is some evidence for transcription factor activity but current evidence is inadequate. This might include characterization inferred through homology

3) Not a TF Gene

  • a. there is no evidence that X is a transcription factor
  • b. there is evidence (computational or experimental) that X is not a transcription factor

SLIDE 76

Analysis of Variation in TFBS

ACGCATAAGTTAATGAATAACAGAT
ACGCATAAGTTAATGAATAACAGAT
ACGCATAAGTTAATGAATAACAGAT
ACGCATAAGTTAATGAATAACAGAT
ACGCATAAGTTAATGAATAACAGAT
ACGCATAAGTTAACGAATAACAGAT
ACGCATAAGTTAACGAATAACAGAT
ACGCATAAGTTAACGAATAACAGAT
ACGCATAAGTTAACGAATAACAGAT

SLIDE 77

Sequence Variation in TFBS

[Figure labels: TSS; AaGT; URF]

GENE       DISEASE/CONDITION (associated)  REFERENCE
LDLR       Familial hypercholesterolemia   Koivisto et al., 1994
PR         Endometrial cancer              I DeVivo et al., 2002
PEPCK      Obesity                         Y. Olswang et al., 2002
Ob         Leptin levels                   J Hager et al., 1998
ABCA1      Coronary artery disease         KY Zwarts et al., 2002
IL4Ralpha  Reduced soluble IL4R            H Hackstein et al., 2001
Resistin   Elevated Body Mass              JC Engert et al., 2002
TNFalpha   Malaria Susceptibility          JC Knight et al., 1999
UCP3       Elevated Body Mass              S Otabe et al., 2000
UGT1A1     Gilbert’s Syndrome – jaundice   PJ Bosma, et al., 1995

SLIDE 78

Identifying allele-specific binding site predictions

1234567890123456789012345
ACGCATAAGTTAAtGAATAACAGAT
.............c...........

[Figure: plot of Swt-Smt, of which only the axis ticks survive (one axis from -4 to 4, the other from 1 to 11)]
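
A comparison of this kind can be sketched by scoring both alleles with the same matrix in every window that overlaps the variant. The sketch assumes that Swt and Smt denote the PSSM scores of the wild-type and mutant alleles, which the slide does not spell out. The matrix is built from eight of the binding sites listed on slide 7, using an assumed sqrt(n) pseudocount and a uniform background, and only the forward strand is scanned. All function names are hypothetical.

```python
import math

# Eight of the binding sites listed on slide 7.
SITES = [
    "AAGTTAATGA", "CAGTTAATAA", "GAGTTAAACA", "CAGTTAATTA",
    "GAGTTAATAA", "CAGTTATTCA", "AAGTTAACGA", "AGGTTAACGA",
]

WILD_TYPE = "ACGCATAAGTTAATGAATAACAGAT"
SNP_INDEX = 13  # 0-based; position 14 on the slide's ruler (t > c)
MUTANT = WILD_TYPE[:SNP_INDEX] + "C" + WILD_TYPE[SNP_INDEX + 1:]


def build_pssm(sites, background=0.25):
    """Log2 odds matrix with an assumed sqrt(n) pseudocount."""
    n = len(sites)
    total_pseudo = math.sqrt(n)
    pssm = []
    for i in range(len(sites[0])):
        column = {}
        for base in "ACGT":
            count = sum(1 for site in sites if site[i] == base)
            freq = (count + total_pseudo * background) / (n + total_pseudo)
            column[base] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def score_window(pssm, seq, start):
    """Score the window of seq that begins at start."""
    return sum(column[seq[start + i]] for i, column in enumerate(pssm))


def allele_differences(pssm, wild, mutant, snp_index):
    """Swt - Smt for every window that overlaps the variant position."""
    width = len(pssm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(wild) - width)
    return [
        (start, score_window(pssm, wild, start) - score_window(pssm, mutant, start))
        for start in range(first, last + 1)
    ]


pssm = build_pssm(SITES)
differences = allele_differences(pssm, WILD_TYPE, MUTANT, SNP_INDEX)
for start, delta in differences:
    print(start + 1, round(delta, 2))  # 1-based window start, Swt - Smt
```

The window starting at position 7 covers AAGTTAATGA in the wild-type sequence, so a positive difference there indicates that the c allele weakens the predicted site.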

SLIDE 79

RAVEN screenshots

SLIDE 80

Final Thoughts

  • The grand challenge remains for the analysis of co-regulated human genes
  • Significant progress in the past five years suggests that we will be able to decipher regulatory mechanisms for targeted experiments
  • Numerous attractive problems remain available for bioinformatics students

SLIDE 81

Thanks!

  • Tim Hughes Lab
  • Jared Roach Lab
  • Rob Sladek Lab
  • Jay Snoddy
  • Stefan Kirov (ORNL)

VANDERBILT

  • CIHR
  • IBM
  • MSFHR
  • MerckFrosst
  • GenomeBC
  • GenomeCanada
  • CFI

$

  • Malin Andersson
  • Jacob Odeberg
  • Boris Lenhard (UB)
  • James Mortimer
  • Brian Kennedy
  • Hennie van Vuuren

SLIDE 82

The Lab

  • Dora Pak
  • David Arenillas
  • Jonathan Lim
  • Miroslav Hatas
  • Jonathan Falkowski

Contributing Alumni

  • Carol Huang (MIT)
  • Albin Sandelin (RIKEN)
  • Elodie Portales-Casamar
  • David Martin
  • Jochen Brumm
  • Alice Chou
  • Debra Fulton
  • Shannan Ho Sui
  • Andrew Kwon
  • Raf Podowski
  • Nels Thorsteinson
  • Dimas Yusuf

SLIDE 83

THE END

Questions?