SLIDE 1
SLIDE 2 In The Beginning
Data. Lots of it.
SLIDE 3 In The Beginning
- Goal: Build a web-based interface on top of a fast backend to help navigate and explore the data
SLIDE 4
Origin: Prototype
SLIDE 5
Origin: Challenges
Linking: All views should be interactive
SLIDE 6
Origin: Challenges
Scalability: Creating, editing, and linking should be fast to drive data discovery
SLIDE 7
Origin: Challenges
Interface: Exploring data should be natural, informative, and easy to follow
SLIDE 8 Structures
View: Visual representation
Filter:
- on genomic positions, genes
- on data parameters (e.g. threshold, experiment type)
Data: Underlying data source (i.e. by sample ID, project)
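The view / filter / data structure above could be modelled roughly as follows. This is only an illustrative sketch: the class names, field names, the "scatterplot" view type, and the threshold value are all assumptions made for the example, not ESV's actual code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DataSource:
    """Underlying data source, selected by sample ID and/or project."""
    sample_ids: List[str] = field(default_factory=list)
    project: Optional[str] = None


@dataclass
class Filter:
    """Restricts a data source by genomic position, genes, and data parameters."""
    chrom: Optional[str] = None
    start: Optional[int] = None
    end: Optional[int] = None
    genes: List[str] = field(default_factory=list)
    # Data parameters such as a threshold or an experiment type.
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class View:
    """A visual representation drawn from a filtered data source."""
    representation: str
    data_filter: Filter
    data: DataSource


# Hypothetical example: one view over a single sample, limited to one chromosome.
view = View(
    representation="scatterplot",  # assumed view type
    data_filter=Filter(chrom="01", parameters={"threshold": 0.9}),  # assumed threshold
    data=DataSource(sample_ids=["DG1155"]),
)
```

Keeping the filter separate from the data source is one way to let several views share a source while each applies its own restrictions.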
SLIDE 9 Progress
Major Highlights
- Redesigned interface and editor
- New query engine
- Improved views / visualizations to support linking and interaction
- Supertable
- Data denormalization contributions
SLIDE 10
Live Demo
ESV Demonstration
SLIDE 11 Data Denormalization: Why?
- ElasticSearch is an extremely fast text-search engine, but it is schema-free
  ○ No set column names, no defined structure
- How do we find relations then?
SLIDE 12 Data Denormalization: Why?
TITAN Dataset MutationSeq Dataset
How do we know which mutations fall within which copy number alterations at a given genomic coordinate?
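Stated as code, the question is an interval check: a mutation falls within a copy number segment when both share a sample and chromosome and the mutation's position lies between the segment's start and end. The sketch below assumes inclusive endpoints and dictionary records with the field names shown; the second segment is made up purely to show a non-match.

```python
from typing import Dict, List


def overlaps(mutation: Dict, segment: Dict) -> bool:
    """True when the mutation lies inside the segment (same sample, same chromosome)."""
    return (
        mutation["sample_id"] == segment["sample_id"]
        and mutation["chrom"] == segment["chrom"]
        and segment["start"] <= mutation["position"] <= segment["end"]  # inclusive (assumed)
    )


mutation = {"sample_id": "DG1155", "chrom": "01", "position": 104589}
segments: List[Dict] = [
    {"sample_id": "DG1155", "chrom": "01", "start": 103062, "end": 109114, "state": "GAIN"},
    # Hypothetical segment on another chromosome, included only as a non-match.
    {"sample_id": "DG1155", "chrom": "02", "start": 103062, "end": 109114, "state": "LOSS"},
]

matching = [seg for seg in segments if overlaps(mutation, seg)]
```

Running this comparison across two separate datasets at query time is the cost that denormalization is meant to avoid.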
SLIDE 13
Data Denormalization: How?
MutationSeq
  sample id: DG1155
  chrom: 01
  position: 104,589
  ref_allele: A
  alt_allele: T
  probability: 0.91
  ...
TITAN
  sample id: DG1155
  chrom: 01
  start: 103,062
  end: 109,114
  state: GAIN
  ...
SLIDE 14
Data Denormalization: How?
MutationSeq
  sample id: DG1155
  chrom: 01
  position: 104,589
  ref_allele: A
  alt_allele: T
  probability: 0.91
  events: {...}
  ...
TITAN
  sample id: DG1155
  chrom: 01
  start: 103,062
  end: 109,114
  state: GAIN
SLIDE 15 Data Denormalization: How?
MutationSeq
  sample id: DG1155
  chrom: 01
  position: 104,589
  ref_allele: A
  alt_allele: T
  probability: 0.91
  events: {
    chrom: 01
    start: 103,062
    end: 109,114
    state: GAIN
    ...
  }
- Unlike Facebook or Twitter, our data is mainly static
- Exploit ElasticSearch’s very fast term query search
- Ask questions like: Find me all the TITAN segments that overlap a particular MutationSeq event
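A minimal sketch of the denormalization step, assuming records are plain dictionaries: each MutationSeq record gets the overlapping TITAN segments copied into an `events` field before indexing. The function names, the field names, and the choice to store `events` as a list (so that more than one overlapping segment can be kept) are assumptions for illustration. The query body uses Elasticsearch's `bool` / `filter` / `term` Query DSL and assumes `sample_id` and `events.state` are indexed as exact, non-analyzed values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def denormalize(mutations: List[Dict], segments: List[Dict]) -> List[Dict]:
    """Return copies of the mutation records with overlapping segments embedded."""
    # Group segments by (sample, chromosome) so each mutation scans only its candidates.
    index: Dict[Tuple[str, str], List[Dict]] = defaultdict(list)
    for seg in segments:
        index[(seg["sample_id"], seg["chrom"])].append(seg)

    docs = []
    for mut in mutations:
        candidates = index.get((mut["sample_id"], mut["chrom"]), [])
        events = [
            {key: seg[key] for key in ("chrom", "start", "end", "state")}
            for seg in candidates
            if seg["start"] <= mut["position"] <= seg["end"]  # inclusive (assumed)
        ]
        docs.append({**mut, "events": events})
    return docs


def gain_query(sample_id: str) -> Dict:
    """Query body for mutations of one sample that sit inside a GAIN segment."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"sample_id": sample_id}},
                    {"term": {"events.state": "GAIN"}},
                ]
            }
        }
    }


mutations = [{"sample_id": "DG1155", "chrom": "01", "position": 104589,
              "ref_allele": "A", "alt_allele": "T", "probability": 0.91}]
segments = [{"sample_id": "DG1155", "chrom": "01", "start": 103062,
             "end": 109114, "state": "GAIN"}]

docs = denormalize(mutations, segments)
body = gain_query("DG1155")
```

Because the data is mainly static, the join is paid once at indexing time, and the linking question becomes a plain term lookup on the embedded fields.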
SLIDE 16
Data Denormalization: Result
SLIDE 17 To Infinity and Beyond
- Applications to other areas of research and/or industry in the future, as ESV was designed to be as general as possible
- Addition of new datasets/datatypes (e.g. single-sample MutationSeq)
- User contributed views and additional default views
SLIDE 18 Summary
Over the past 3 months:
- Redesigned interface to support integration of complex views
- Added support to easily add new views
- Realtime search and filtering through ElasticSearch
- Integrated and improved views/visualizations
- Used denormalized data to support linking between any number of views
http://cbioportal.mo.bccrc.ca:8000/
SLIDE 19
Acknowledgements
Sohrab Shah
Cydney Nielsen
Development Team
Daniel Machev
Kelsey Hamer
Ali Bashashati
Kevin Wagner
Shah Lab