SLIDE 1

Reg-Creative 2006 1

JASPAR, TFCAT and PAZAR

Wyeth W. Wasserman

University of British Columbia

www.cisreg.ca

SLIDE 2

Defining Cis-Regulatory Mechanisms for Co-Expressed Genes

GENOMICS DATA SEQUENCE ANALYSIS CLUSTERING

SLIDE 3

JASPAR: AN OPEN-ACCESS DATABASE OF TF BINDING PROFILES

SLIDE 4

Data Challenges

  • Need larger and more complete collections of TFBS Profiles and Regulatory Sequence Annotation
  • Need annotated catalog of TFs both for evaluation of results and for selection of candidate members from families of TFs with similar target site recognition
  • Need larger compendium of reference collections for evaluation of system performance

SLIDE 5

TF Catalog – Taking inventory of mouse and human TFs

Debra Fulton and Wyeth Wasserman (UBC)
Jared Roach (ISB)
Gwenael Breard and Tim Hughes (UoT)
Sarav Sundararajan and Rob Sladek (QGC/McGill)

SLIDE 6

U.Toronto UBC ISB McGill

3230 Candidate Mouse TFs

SLIDE 7

TFCat Review Process

  • Genes reviewed: 841
  • Assign category/judgement
  • Link PMIDs for category basis
  • Set is biased toward TFs with available literature
  • Positive TF: 82%
  • DNA Binding: 63%
      – Sequence-specific subset: 92%
  • Independent re-review process
SLIDE 8

DBD Super Class Taxonomy

(Luscombe/Thornton)

  • BASIC DOMAIN (BD): proteins which include a basic DNA-binding domain region
  • BETA SCAFFOLD (BS): characterized by large beta-sheet structures used to bind DNA
  • ZINC CLUSTERING (ZC): composed of tetrahedral coordination of 1 or 2 zinc ions by conserved cysteine and histidine residues
  • HELIX TURN HELIX (HTH): two alpha helices connected by a beta turn or longer linkers such as loops
  • WINGED HELIX TURN HELIX (WHTH): extension of HTH that includes a third alpha helix and an adjacent beta sheet
  • OTHER ALPHA HELIX (OAH): all proteins that use alpha helices as the method for DNA binding
  • OTHER (O): this superclass accommodates all other DNA-binding structures

SLIDE 9

Extensions to Luscombe Taxonomy

  • 1.1) Homeodomain-like

– 100) Myb Domain Family

  • 1.1) Helix-Turn-Helix

– 101) GTF2I

  • 1.2) Winged Helix-Turn-Helix

– 102) Forkhead Domain Family

  • 1.2) Winged Helix-Turn-Helix

– 103) RFX Domain Family

  • 2.1) Zinc-coordinating Group

– 104) GATA Domain Family

  • 2.1) Zinc-coordinating Group

– 105) Glial Cells Missing (GCM) Domain Family

  • 2.1) Zinc-coordinating Group

– 106) SMAD MH1 Domain

  • 4) Other Alpha-Helix Group

– 28) High Mobility Group-Box Family

  • 4) Other Alpha-Helix Group

– 107) Sand Domain Family

  • 6) Beta Hairpin_Ribbon Group

– 108) Methyl-CpG-binding domain, MBD family

  • 7) Other

– 109) High Mobility Group HMG-AT-hook Family

  • 7) Other

– 110) Runt Domain Family

  • 7) Other

– 111) IPT/TIG Domain Family

SLIDE 10

Protein Group  Protein Group Description  Family  Family Description                        TF Count
1.1            Helix-Turn-Helix           101     GTF2I                                     6
1.1            Helix-Turn-Helix Group     100     Myb Domain Family                         19
1.1            Helix-Turn-Helix Group     2       Homeodomain Family                        122
1.2            Winged Helix-Turn-Helix    102     Forkhead Domain Family                    19
1.2            Winged Helix-Turn-Helix    103     RFX Domain Family                         2
1.2            Winged Helix-Turn-Helix    13      Interferon Regulatory Factor              6
1.2            Winged Helix-Turn-Helix    15      Transcription Factor Family               8
1.2            Winged Helix-Turn-Helix    16      Ets Domain Family                         15
2              Zinc-coordinating Group    104     GATA Domain Family                        8
2              Zinc-coordinating Group    105     Glial Cells Missing (GCM Domain Family)   2
2              Zinc-coordinating Group    106     SMAD MH1 Domain                           5
2              Zinc-coordinating Group    17      BetaBetaAlpha-zinc finger family          370
2              Zinc-coordinating Group    18      Hormone-nuclear Receptor Family           34
2              Zinc-coordinating Group    19      Loop-Sheet-Helix                          1
3              Zipper-Type Group          21      Leucine Zipper Family                     53
3              Zipper-Type Group          22      Helix-Loop-Helix Family                   44
4              Other Alpha Helix Group    29      MADS Box Family                           4
4              Other Alpha-Helix Group    107     Sand Domain Family                        3
4              Other Alpha-Helix Group    28      High Mobility Group (HMG-box Family)      18
5              Beta-sheet group           30      TATA box-binding family                   2
6              Beta Hairpin_Ribbon Group  108     Methyl-CpG-binding domain, MBD family     1
6              Beta-Hairpin_Ribbon        34      Transcription Factor T-Domain             10
7              Other                      109     High Mobility Group HMG-AT-hook Family    1
7              Other                      110     Runt Domain Family                        2
7              Other                      111     TIG Domain Family                         8
7              Other                      37      Rel Homology Region Family                7
7              Other                      38      Stat Protein Family                       5
8              Enzyme Group               47      DNA Polymerase-Beta Family                7

[Chart: Classification; labelled shares 47%, 15%, 12%, 4%]
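The family counts in the table above can be tallied with a short script. The sketch below is only an illustration: the rows are transcribed from the slide, while the names FAMILIES, totals_by_group and share are hypothetical and not part of TFCat. Under this transcription the counts sum to 782, and the BetaBetaAlpha-zinc finger family (370) works out to about 47% of the total, which appears to match the largest share shown on the Classification chart.

```python
from collections import defaultdict

# (protein group, family ID, family description, TF count), transcribed from the slide.
# The tuple layout and all names here are illustrative assumptions, not a TFCat schema.
FAMILIES = [
    ("1.1", 101, "GTF2I", 6),
    ("1.1", 100, "Myb Domain Family", 19),
    ("1.1", 2, "Homeodomain Family", 122),
    ("1.2", 102, "Forkhead Domain Family", 19),
    ("1.2", 103, "RFX Domain Family", 2),
    ("1.2", 13, "Interferon Regulatory Factor", 6),
    ("1.2", 15, "Transcription Factor Family", 8),
    ("1.2", 16, "Ets Domain Family", 15),
    ("2", 104, "GATA Domain Family", 8),
    ("2", 105, "Glial Cells Missing (GCM Domain Family)", 2),
    ("2", 106, "SMAD MH1 Domain", 5),
    ("2", 17, "BetaBetaAlpha-zinc finger family", 370),
    ("2", 18, "Hormone-nuclear Receptor Family", 34),
    ("2", 19, "Loop-Sheet-Helix", 1),
    ("3", 21, "Leucine Zipper Family", 53),
    ("3", 22, "Helix-Loop-Helix Family", 44),
    ("4", 29, "MADS Box Family", 4),
    ("4", 107, "Sand Domain Family", 3),
    ("4", 28, "High Mobility Group (HMG-box Family)", 18),
    ("5", 30, "TATA box-binding family", 2),
    ("6", 108, "Methyl-CpG-binding domain, MBD family", 1),
    ("6", 34, "Transcription Factor T-Domain", 10),
    ("7", 109, "High Mobility Group HMG-AT-hook Family", 1),
    ("7", 110, "Runt Domain Family", 2),
    ("7", 111, "TIG Domain Family", 8),
    ("7", 37, "Rel Homology Region Family", 7),
    ("7", 38, "Stat Protein Family", 5),
    ("8", 47, "DNA Polymerase-Beta Family", 7),
]


def totals_by_group(rows):
    """Sum the TF counts for each protein group."""
    totals = defaultdict(int)
    for group, _family_id, _description, count in rows:
        totals[group] += count
    return dict(totals)


def share(count, total):
    """Return count as a percentage of total, rounded to a whole percent."""
    return round(100 * count / total)


total_tfs = sum(count for *_, count in FAMILIES)
group_totals = totals_by_group(FAMILIES)

print("Total TFs tabulated:", total_tfs)
for group, count in sorted(group_totals.items()):
    print(f"Group {group}: {count} TFs ({share(count, total_tfs)}%)")
```

Grouping by the protein-group column rather than by family makes it easy to see how much of the catalog falls under each Luscombe superclass.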
SLIDE 11

TFCat Summary

  • Collection available
  • Ongoing curation
  • Website release pending
  • Building WIKI to collect user feedback
  • Linking to PAZAR
  • Questions? Debra Fulton is here
SLIDE 12

Open-access regulatory sequence repository – an information mall

Elodie Portales-Casamar Jonathan Lim Stefan Kirov Jay Snoddy Wyeth Wasserman

SLIDE 13

Transcriptional Regulatory Element Database

Numerous Regulatory Databases – No Coordination

SLIDE 14

PAZAR

Grand Bazaar, Istanbul

SLIDE 15

Retrieval/Browsing Interface

SLIDE 16

Highlights

  • Available: www.pazar.info
  • All data linked to genome assemblies available in EnsEMBL (which limits the supported species)
  • Three project classes
      – Open: you can modify data
      – Published: you can read (and copy) everything
      – Restricted: only owner-approved users
  • Open-Access/Open-Software
      – Code on SourceForge
      – Data can be extracted from “open” and “published” projects
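The three project classes can be read as a small access model. The sketch below is a hypothetical illustration, not PAZAR code: ProjectClass, can_read, can_modify and can_extract are invented names, and the rules for cases the slide does not spell out (for example, who may modify a published or restricted project) are assumptions marked in the comments.

```python
from enum import Enum


class ProjectClass(Enum):
    """The three project classes named on the slide (names are illustrative)."""
    OPEN = "open"
    PUBLISHED = "published"
    RESTRICTED = "restricted"


def can_read(project_class, owner_approved=False):
    """Open and published projects are readable by anyone;
    restricted projects only by owner-approved users."""
    if project_class in (ProjectClass.OPEN, ProjectClass.PUBLISHED):
        return True
    return owner_approved


def can_modify(project_class, owner_approved=False):
    """Open projects can be modified by anyone (per the slide).
    Assumption: other classes need owner approval to modify."""
    if project_class is ProjectClass.OPEN:
        return True
    return owner_approved


def can_extract(project_class):
    """Data can be extracted from open and published projects (per the slide).
    Assumption: this models public extraction only."""
    return project_class in (ProjectClass.OPEN, ProjectClass.PUBLISHED)


for cls in ProjectClass:
    print(cls.value, can_read(cls), can_modify(cls), can_extract(cls))
```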
SLIDE 21

Some Statistics

  • “Restricted” but going public soon
      – “PLEIADES PROJECT” NEURO GENES
          Regulated Genes: 77
          Regulatory sequence (genomic): 303
          Transcription Factors: 78
          Annotated Publications: 143
  • “Published” projects include
      – JASPAR
      – Muscle
      – Liver
      – ARE collection
SLIDE 22

Current Efforts

  • Three full-time annotators at work
  • Pleiades collection
  • Improving annotation interface
  • Ontology links for expression
  • TFCat integration
  • Graphical display of annotations
SLIDE 23

PAZAR and OREGANNO

  • Different systems and intentions
  • PAZAR allows private curation projects
  • Differ in style of annotations
  • PAZAR data is not validated: you must choose data collections that you trust
  • PAZAR is a mall; OREGANNO is a super-store
  • PAZAR allows for broad range of data
      – SELEX
      – Promoter deletion experiments
      – TF Complexes
      – Mutations
      – TSS definition/Alternative Promoters
  • Working together
      – Ontologies
      – Data exchange
SLIDE 24

Help?

  • Text mining tools to accelerate annotation
  • Graphical display of information in database
  • Ontology building expertise
  • Collaborative projects
  • Open to expansion and improvements to facilitate research projects

  • Questions? Elodie Portales-Casamar is here
SLIDE 25

Putting It All Together

SLIDE 26

Thanks!

  • Jay Snoddy
  • Stefan Kirov (BMS)

VANDERBILT

  • CIHR
  • IBM
  • MSFHR
  • MerckFrosst
  • GenomeBC
  • GenomeCanada
  • CFI
  • BC Children’s Hospital Foundation

FUNDING

  • James Mortimer
  • Brian Kennedy
  • Elodie Portales-Casamar
  • Debra Fulton
  • Jonathan Lim
  • Stuart Lithwick
  • Magdalena Swanson
  • Amy Ticoll
  • David Martin
  • David Arenillas
  • Jochen Brumm
  • Alice Chou
  • Shannan Ho Sui
  • Andrew Kwon
  • Dimas Yusuf
  • Miroslav Hatas
  • Dora Pak

THE AMAZING PEOPLE WHO DID THE WORK!