Visualizing ENCODE Data in the UCSC Genome Browser Pauline Fujita, - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Visualizing ENCODE Data in the UCSC Genome Browser

Pauline Fujita, Ph.D.

UCSC Genome Bioinformatics Group

slide-2
SLIDE 2

Training Resources

genome@soe.ucsc.edu

  • Genomewiki: genomewiki.ucsc.edu
  • Mailing list archives: genome.ucsc.edu/FAQ/
  • Training page: genome.ucsc.edu/training.html
  • Twitter: @GenomeBrowser
  • Tutorial videos: YouTube channel
  • Open Helix: openhelix.com/ucsc
slide-3
SLIDE 3

Outline

  • Basics: search, display, more info
  • Tools for finding ENCODE data
  • Annotating a BED file: RNAseq example
  • Annotating a VCF file
  • Track Hubs: What are they? How do I make one?

  • Exercises
slide-4
SLIDE 4

Basic Navigation: Main Display

genome.ucsc.edu/cgi-bin/hgTracks?db=hg19
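
The URL above selects an assembly through the `db` parameter. A minimal sketch of building such links in Python follows; the helper name `browser_url` is hypothetical, and the `position` parameter is included on the assumption that a region should be opened directly.

```python
from urllib.parse import urlencode

BROWSER_CGI = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(db, position=None):
    """Return an hgTracks URL for an assembly and an optional position."""
    params = {"db": db}
    if position is not None:
        # urlencode percent-encodes the colon in "chrom:start-end"
        params["position"] = position
    return BROWSER_CGI + "?" + urlencode(params)


print(browser_url("hg19"))
print(browser_url("hg19", "chr21:33034804-33037719"))
```

The second call opens the region used later in the exercises.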

slide-5
SLIDE 5

Display Configuration

  • Visibility: hide, dense, squish, pack, full

  • Track ordering: drag and drop
  • Drag and zoom/highlighting
  • Configuration page
  • Right click menu
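
The five visibility modes can also be set from a link by adding `trackName=mode` pairs to the hgTracks URL. The sketch below assumes that convention; the function name `visibility_url` is hypothetical, and the track name `knownGene` is used only as an illustration.

```python
from urllib.parse import urlencode

# The five visibility modes listed on the slide
VISIBILITIES = ("hide", "dense", "squish", "pack", "full")


def visibility_url(db, track_modes):
    """Build an hgTracks URL that sets a visibility mode per track."""
    for track, mode in track_modes.items():
        if mode not in VISIBILITIES:
            raise ValueError(f"unknown visibility {mode!r} for track {track!r}")
    params = {"db": db}
    params.update(track_modes)
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params)


# "knownGene" is an assumed track name, used for illustration only
print(visibility_url("hg19", {"knownGene": "pack"}))
```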
slide-6
SLIDE 6

How to find more info

Track Description Item Description

slide-7
SLIDE 7

More info: Track Description

slide-8
SLIDE 8

More info: Item Description

slide-9
SLIDE 9

ENCODE

slide-10
SLIDE 10

ENCODE: Super-track Settings

slide-11
SLIDE 11

ENCODE: Track Settings

slide-12
SLIDE 12

ENCODE: Item Details

slide-13
SLIDE 13

ENCODE Tools

slide-14
SLIDE 14

ENCODE


genome.ucsc.edu/ENCODE/

slide-15
SLIDE 15

ENCODE: Experiment Matrix

slide-16
SLIDE 16

ENCODE: ChIP-Seq Matrix

slide-17
SLIDE 17

ENCODE: Experiment Summary

slide-18
SLIDE 18

ENCODE: Track Search

slide-19
SLIDE 19

File Formats

[Browser screenshot: hg19, chr2 around 191,876,000-191,881,000 (STAT1), showing these tracks:]

  • Basic Gene Annotation Set from GENCODE Version 19
  • DNaseI Hypersensitivity by Digital DNaseI from ENCODE/University of Washington
  • GM12878 DNaseI HS Raw Signal Rep 2 from ENCODE/UW
  • K562 TFBS Uniform Peaks of Znf143_(16618-1-AP) from ENCODE/Stanford/Analysis
  • K562 Znf143 IgG-rab ChIP-seq Peaks from ENCODE/SYDH
  • K562 Znf143 IgG-rab ChIP-seq Signal from ENCODE/SYDH
  • K562 polyA+ IFNa30 RNA-seq Alignments from ENCODE/SYDH
  • Exome Aggregation Consortium (ExAC) - Variants from 60,706 Exomes

Formats shown: BED, wig(gle), BAM, VCF

bit.ly/fileformatsession

slide-20
SLIDE 20

File Formats

[Same browser screenshot of the STAT1 region on hg19, with each format annotated:]

  • BED: Positional annotations (ex. regions with enriched ChIP-seq signal for TF binding, differential methylation, splice junctions from RNA-seq)
  • wig(gle): Continuous signal data, number of reads (ex. DNase I HS and ChIP-seq signals)
  • BAM: Alignments of sequencing reads, mapped to the genome (ex. RNA-seq alignments)
  • VCF: Variation data: SNPs, indels, Copy Number Variants, Structural Variants (ex. ExAC data)

slide-21
SLIDE 21

Indexed File Formats

[Same browser screenshot of the STAT1 region on hg19.]

  • BED → bigBed
  • wig(gle) → bigWig
  • BAM
  • VCF

slide-22
SLIDE 22

Indexed File Formats

  • Only the displayed portions of files are transferred to UCSC
  • Display large files (which would otherwise time out)
  • File + index on your web-accessible server (http, https, or ftp)
  • Faster display
  • More user control
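
To illustrate why an index lets only the displayed portion be transferred, here is a toy sketch. It is not how bigBed, bigWig, BAM, or tabix indexes are actually laid out (those formats use more elaborate index structures); the records and the function name `fetch_region` are hypothetical.

```python
import bisect

# Hypothetical toy "file": records sorted by start, as (start, end, value)
RECORDS = [(0, 100, "a"), (100, 250, "b"), (250, 400, "c"), (400, 900, "d")]

# Toy index: only the start coordinates, far smaller than the data itself
STARTS = [record[0] for record in RECORDS]


def fetch_region(start, end):
    """Return only the records overlapping the half-open range [start, end)."""
    # Step back one record in case it spans the start of the query
    first = max(bisect.bisect_right(STARTS, start) - 1, 0)
    last = bisect.bisect_left(STARTS, end)
    return [r for r in RECORDS[first:last] if r[1] > start and r[0] < end]


print(fetch_region(120, 260))
```

Only the slice located through the index is read; the rest of the data is never touched, which is the property that keeps large remote files from timing out.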
slide-23
SLIDE 23

File Formats

slide-24
SLIDE 24

File Formats

slide-25
SLIDE 25

File Formats

slide-26
SLIDE 26

File Formats

www.encodeproject.org/help/file-formats/

Help -> File formats

slide-27
SLIDE 27

Custom Tracks

slide-28
SLIDE 28

Custom Tracks

genome.ucsc.edu/cgi-bin/hgCustom

slide-29
SLIDE 29

Custom Tracks

genome.ucsc.edu/cgi-bin/hgCustom

track name="BED_custom_track"
chr7 127471196 127472363 Gene1
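
A custom track like the one above is a `track` line followed by one BED line per feature. The sketch below generates that text for pasting into the Custom Tracks page; the helper name `bed_custom_track` is hypothetical, and it assumes four-column BED (chrom, start, end, name) as in the slide's example.

```python
def bed_custom_track(name, features):
    """Return custom track text: a track line plus one BED line per feature."""
    lines = ['track name="%s"' % name]
    for chrom, start, end, label in features:
        # BED starts are zero-based and ends are exclusive, so start < end
        if not 0 <= start < end:
            raise ValueError(f"bad interval for {label}: {start}-{end}")
        lines.append("\t".join([chrom, str(start), str(end), label]))
    return "\n".join(lines) + "\n"


text = bed_custom_track(
    "BED_custom_track",
    [("chr7", 127471196, 127472363, "Gene1")],
)
print(text)
```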

slide-30
SLIDE 30

Annotating your data: BED

Tools -> Data Integrator

slide-31
SLIDE 31

Data Integrator

genome.ucsc.edu/cgi-bin/hgIntegrator

slide-32
SLIDE 32

Data Integrator

slide-33
SLIDE 33

Data Integrator

slide-34
SLIDE 34


Data Integrator

#ct_SYDHTFBS_4733.chrom	ct_SYDHTFBS_4733.chromStart	ct_SYDHTFBS_4733.chromEnd	ct_SYDHTFBS_4733.name	ct_SYDHTFBS_4733.score	wgEncodeGencodeBasicV19.name	wgEncodeGencodeBasicV19.name2
chr21	33031473	33032186	.	608	ENST00000449339.1	AP000253.1
chr21	33031473	33032186	.	608	ENST00000270142.6	SOD1
chr21	33031473	33032186	.	608	ENST00000389995.4	SOD1
chr21	33031473	33032186	.	608	ENST00000470944.1	SOD1
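
Output like this repeats each custom track region once per overlapping transcript. A minimal sketch for collapsing it to one gene list per region follows, assuming the output is tab-separated with a `#`-prefixed header row; the function name `genes_by_region` is hypothetical.

```python
import csv
import io

# The rows from the slide, assumed to be tab-separated
SAMPLE = (
    "#ct_SYDHTFBS_4733.chrom\tct_SYDHTFBS_4733.chromStart\t"
    "ct_SYDHTFBS_4733.chromEnd\tct_SYDHTFBS_4733.name\t"
    "ct_SYDHTFBS_4733.score\twgEncodeGencodeBasicV19.name\t"
    "wgEncodeGencodeBasicV19.name2\n"
    "chr21\t33031473\t33032186\t.\t608\tENST00000449339.1\tAP000253.1\n"
    "chr21\t33031473\t33032186\t.\t608\tENST00000270142.6\tSOD1\n"
    "chr21\t33031473\t33032186\t.\t608\tENST00000389995.4\tSOD1\n"
    "chr21\t33031473\t33032186\t.\t608\tENST00000470944.1\tSOD1\n"
)


def genes_by_region(text):
    """Map each (chrom, start, end) region to its distinct gene symbols."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [column.lstrip("#") for column in next(reader)]
    # The gene symbol is in the column whose name ends with ".name2"
    gene_col = next(i for i, h in enumerate(header) if h.endswith(".name2"))
    result = {}
    for row in reader:
        if not row:
            continue
        key = (row[0], int(row[1]), int(row[2]))
        genes = result.setdefault(key, [])
        if row[gene_col] not in genes:
            genes.append(row[gene_col])
    return result


print(genes_by_region(SAMPLE))
```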

slide-35
SLIDE 35

Annotating your VCF file

  • 1. Make a VCF custom track
  • 2. Go to the Variant Annotation Integrator
  • 3. Choose your track
  • 4. Add annotations
slide-36
SLIDE 36

Remotely Hosted Custom Tracks

  • Put data file (bigBed/bigWig/BAM/VCF, etc.) in an internet-accessible location
  • Must have: 1. track info, 2. bigDataUrl
  • VCF example:

track type=vcfTabix name="VCF_Example" description="VCF Ex. 1: 1000 Genomes phase 1 interim SNVs" bigDataUrl=http://hgwdev.cse.ucsc.edu/~pauline/presentations/vcfExample.vcf.gz
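
The two required pieces, track info and `bigDataUrl`, fit on a single track line. A small sketch for composing one is shown here; the helper name `remote_track_line` is hypothetical, and the URL scheme check reflects the http, https, or ftp requirement from the Indexed File Formats slide.

```python
def remote_track_line(track_type, name, description, big_data_url):
    """Compose a track line for a remotely hosted custom track."""
    # The browser fetches the file itself, so the URL must be web-accessible
    if not big_data_url.startswith(("http://", "https://", "ftp://")):
        raise ValueError("bigDataUrl must start with http://, https:// or ftp://")
    return 'track type=%s name="%s" description="%s" bigDataUrl=%s' % (
        track_type, name, description, big_data_url)


line = remote_track_line(
    "vcfTabix",
    "VCF_Example",
    "VCF Ex. 1: 1000 Genomes phase 1 interim SNVs",
    "http://hgwdev.cse.ucsc.edu/~pauline/presentations/vcfExample.vcf.gz",
)
print(line)
```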

slide-37
SLIDE 37

Variant Annotation Integrator

  • Upload pgSnp or VCF custom track
  • Associate UCSC annotations with your uploaded variant calls
  • Add dbSNP info if dbSNP identifier found
  • Select custom track and VAI options
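
The dbSNP lookup depends on the ID column of each VCF record. The sketch below shows how such an identifier could be picked out of a VCF data line; the function names are hypothetical, the sample line is invented for illustration, and the check handles only a single ID per record.

```python
def parse_vcf_line(line):
    """Parse the first five fixed columns of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line has at least 8 tab-separated columns")
    chrom, pos, variant_id, ref, alt = fields[:5]
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": variant_id,
        "ref": ref,
        "alt": alt.split(","),  # multiple ALT alleles are comma-separated
    }


def dbsnp_id(record):
    """Return the rs identifier if the ID column holds one, else None."""
    variant_id = record["id"]
    if variant_id.startswith("rs") and variant_id[2:].isdigit():
        return variant_id
    return None


# Invented example record, not taken from the 1000 Genomes example file
sample = "chr21\t33035000\trs12345\tA\tG\t.\tPASS\t."
print(dbsnp_id(parse_vcf_line(sample)))
```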

slide-38
SLIDE 38

Variant Annotation Integrator

Tools -> Variant Annotation Integrator

slide-39
SLIDE 39

Variant Annotation Integrator

genome.ucsc.edu/cgi-bin/hgVai

slide-40
SLIDE 40

Track Data Hubs


  • Remotely hosted
  • Data persistence
  • File formats: bigBed, bigWig, BAM, VCF
  • Track organization: groups, supertracks

  • multiWigs
  • Assembly hubs
slide-41
SLIDE 41

Track Hubs

My Data -> Track Hubs

slide-42
SLIDE 42

Track Hubs

genome.ucsc.edu/cgi-bin/hgHubConnect

My Data -> Track Hubs

slide-43
SLIDE 43

My Hubs

genome.ucsc.edu/cgi-bin/hgHubConnect

My Data -> Track Hubs

slide-44
SLIDE 44

Make Your Own Track Hub

You will need:

  • Data (compressed binary indexed formats: bigBed, bigWig, BAM, VCF)
  • Text files to define properties of the track hub
  • Internet-enabled web/ftp server
  • Assembly Hubs: a twoBit sequence file

slide-45
SLIDE 45

Track Hubs

genome.ucsc.edu/cgi-bin/hgHubConnect

My Data -> Track Hubs

myHub/ - directory containing track hub files
    hub.txt - a short description of hub properties
    genomes.txt - list of genome assemblies included
    hg19/ - directory of data for the hg19 human assembly
        Data files! BAM, bigBed, bigWig, VCF
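
The directory layout above can be generated with a short script. This is a sketch under stated assumptions: the labels, the e-mail address, and the bigWig file name `example.bw` are placeholders, and `hg19/trackDb.txt` (the per-assembly track definition file that `genomes.txt` points to) is not listed on the slide but is part of the standard hub layout.

```python
import os
import tempfile


def write_hub(root, email):
    """Write a minimal track hub skeleton under root and list the files."""
    os.makedirs(os.path.join(root, "hg19"), exist_ok=True)
    files = {
        "hub.txt": (
            "hub myHub\n"
            "shortLabel My Hub\n"
            "longLabel An example track hub\n"
            "genomesFile genomes.txt\n"
            f"email {email}\n"
        ),
        "genomes.txt": "genome hg19\ntrackDb hg19/trackDb.txt\n",
        # example.bw is a placeholder for a real bigWig placed in hg19/
        os.path.join("hg19", "trackDb.txt"): (
            "track exampleSignal\n"
            "bigDataUrl example.bw\n"
            "shortLabel Example signal\n"
            "longLabel Example bigWig signal track\n"
            "type bigWig\n"
        ),
    }
    for relative_path, content in files.items():
        with open(os.path.join(root, relative_path), "w") as handle:
            handle.write(content)
    return sorted(files)


with tempfile.TemporaryDirectory() as tmp:
    created = write_hub(os.path.join(tmp, "myHub"), "user@example.org")
    print(created)
```

After copying the directory and the data files to a web server, the URL of `hub.txt` is what gets pasted into the My Hubs tab.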

slide-46
SLIDE 46

An Example Assembly Hub

An Arabidopsis hub: http://genome-test.cse.ucsc.edu/~pauline/hubs/Plants/hub.txt

slide-47
SLIDE 47

Acknowledgements

UCSC Genome Browser team

  • David Haussler – co-PI
  • Jim Kent – Browser Concept, BLAT, Team Leader, PI
  • Bob Kuhn – Associate Director, Outreach – co-PI
  • Donna Karolchik, Ann Zweig – Project Management

Engineering, QA, Docs, Support, Sys-admins:

Angie Hinrichs, Katrina Learned, Jorge Garcia, Pauline Fujita, Erich Weiler, Kate Rosenbloom, Hiram Clawson, Luvina Guruvadoo, Gary Moro, Steve Heitner, Galt Barber, Brian Raney, Brian Lee, Max Haeussler, Jonathan Caspar, Matt Speir

slide-48
SLIDE 48

THE GB TEAM

UC Santa Cruz Genomics Institute

slide-49
SLIDE 49

Funding Sources

  • National Human Genome Research Institute (NHGRI)
  • National Cancer Institute (NCI)
  • National Institute for Dental and Cranio-Facial Research (NIDCR)
  • National Institute for Child Health and Human Development (NICHD)
  • QB3 (UCBerkeley, UCSF, UCSC)
  • American Recovery and Reinvestment Act (ARRA) stimulus funds


slide-50
SLIDE 50

genome.ucsc.edu 


THANK YOU!


slide-51
SLIDE 51

Exercises

  • 1. Load example BED and VCF tracks via url
  • 2. Look at custom track data by pasting url into a web browser.
  • 3. Annotate the TFBS custom track using the Data Integrator.
  • 4. Annotate the VCF custom track using the Variant Annotation Integrator.

slide-52
SLIDE 52

Exercise 1

Load example BED and VCF tracks via url

  • 1. Go to the Custom tracks menu
      • My Data -> Custom Tracks
  • 2. Input this url: http://bit.ly/customtracks (note that you must include the "http" part of this url or you will get an error) and click [submit].
  • 3. Click the [Go to genome browser] button.
  • 4. Once in the main Browser, jump to this position:
      • chr21:33,034,804-33,037,719
  • 5. See if you can drag your 2 custom tracks to the top of the display
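
Steps 2 to 4 can also be folded into a single link. The sketch below assumes the `hgt.customText` URL parameter, which hands the browser a URL to load custom tracks from; the function names are hypothetical, and dragging the tracks to the top (step 5) still has to be done by hand.

```python
from urllib.parse import urlencode


def parse_position(text):
    """Turn 'chr21:33,034,804-33,037,719' into ('chr21', 33034804, 33037719)."""
    chrom, span = text.replace(",", "").split(":")
    start, end = (int(part) for part in span.split("-"))
    return chrom, start, end


def custom_track_link(db, position, custom_url):
    """Build an hgTracks URL that loads custom tracks and jumps to a position."""
    chrom, start, end = parse_position(position)
    params = {
        "db": db,
        "position": f"{chrom}:{start}-{end}",
        "hgt.customText": custom_url,  # must include the "http" part
    }
    return "https://genome.ucsc.edu/cgi-bin/hgTracks?" + urlencode(params)


link = custom_track_link(
    "hg19", "chr21:33,034,804-33,037,719", "http://bit.ly/customtracks")
print(link)
```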

slide-53
SLIDE 53

Exercise 2

Exploring your BED and VCF tracks

  • 1. Now that you have 2 custom tracks loaded, take a look at the data by pasting that same url into a web browser: http://bit.ly/customtracks
  • 2. These custom tracks are actually data copied from some existing tracks. See if you can find them, turn them on, and observe that the original tracks and custom tracks look the same in the browser:
      • Track 1 (BED format): Group (Regulation), Super Track (ENC TF Binding), Track (SYDH TFBS)
      • Track 2 (VCF format): Group (Variation), Track (1000G Ph1 Vars)
  • 3. Navigate to this position for best comparison (esp. for the VCF track): chr21:33,034,804-33,037,719

slide-54
SLIDE 54

Exercise 3

Annotate your BED with the Data Integrator

  • 1. Go to the Data Integrator
  • 2. Once there select:
      • 1. Region to annotate: chr21:33031597-33041570
      • 2. Add data source: group (custom tracks), track (SYDH…) [click add]
  • 3. Now choose which annotations you want to add by [add]ing more tracks to the list, e.g.:
      • 1. Find the genes that overlap with your regions: group (Genes and Gene Prediction), track (GENCODE V19), view (Genes), subtrack (Basic) [add]
      • 2. Find the SNPs that overlap with your regions: group (Variation), track (Common SNPs) [add]
  • Choose which fields to include in your output: Output options -> Choose fields [Done] -> [get output]
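
Conceptually, this exercise pairs each custom track region with the annotations that overlap it. The sketch below illustrates that overlap test only; it is not the Data Integrator's implementation, the gene intervals are hypothetical, and coordinates are treated as half-open BED intervals.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def annotate(regions, genes):
    """Pair each (chrom, start, end) region with names of overlapping genes."""
    result = []
    for chrom, start, end in regions:
        hits = [
            name
            for g_chrom, g_start, g_end, name in genes
            if g_chrom == chrom and overlaps(start, end, g_start, g_end)
        ]
        result.append(((chrom, start, end), hits))
    return result


# Region taken from the slide 34 output; gene intervals are made up
regions = [("chr21", 33031473, 33032186)]
genes = [
    ("chr21", 33031000, 33041000, "geneA"),
    ("chr21", 33050000, 33060000, "geneB"),
]
print(annotate(regions, genes))
```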

slide-55
SLIDE 55

Exercise 4

Annotate your VCF with the Variant Annotation Integrator

  • 1. Go to the Variant Annotation Integrator
      • Tools -> V.A.I.
  • 2. Select Variants:
      • Variants: “VCF Ex. 1…”
  • 3. Now choose which annotations you want to add:
      • To determine which gene regions your variants fall into, select a gene track (Select Genes = “Basic Gene Annotation Set… GENCODE”)
      • Add regulatory annotations: Under “Select Regulatory Annotations” click the “+” button to choose which TFs to include (or select none to include all binding sites)

slide-56
SLIDE 56

Bonus Material!

slide-57
SLIDE 57

Where to search

genome.ucsc.edu/cgi-bin/hgGateway

slide-58
SLIDE 58

Where to search

genome.ucsc.edu/cgi-bin/hgGateway

slide-59
SLIDE 59

Where to search: Main Browser

genome.ucsc.edu/cgi-bin/hgTracks

slide-60
SLIDE 60

Public Hubs

My Data -> Track Hubs

slide-61
SLIDE 61

Where to search

genome.ucsc.edu/cgi-bin/hgHubConnect

slide-62
SLIDE 62

Track search

slide-63
SLIDE 63

Track search

slide-64
SLIDE 64

Track search