In The Beginning Data. Lots of it. eg. VCF, BAM files In The - - PowerPoint PPT Presentation

May 23, 2023

SLIDE 1

SLIDE 2

In The Beginning

Data. Lots of it.

e.g. VCF, BAM files

SLIDE 3

In The Beginning

Goal. Build a web-based interface on top of a fast backend to help navigate and explore the data

esv

SLIDE 4

Origin: Prototype

SLIDE 5

Origin: Challenges

Linking: All views should be interactive

SLIDE 6

Origin: Challenges

Scalability: Creating, editing, and linking should be fast to drive data discovery

SLIDE 7

Origin: Challenges

Interface: Exploring data should be natural, informative, and easy to follow

SLIDE 8

Structures

[Diagram: two views, each with a filter and data]

Visual Representation

Genomic positions, genes
Data parameters (e.g. threshold, experiment type)

Underlying data source (i.e. by sample ID, project)
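
The structure above can be sketched as three small records. This is only one possible reading of the slide: the class names, field names, and the gene symbol are hypothetical, and the slides do not show how ESV actually represents views, filters, or data sources.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical names throughout: the slide names the three parts
# (view, filter, data) but not how ESV implements them.
@dataclass
class DataSource:
    """Underlying data source, selected by sample ID and/or project."""
    sample_ids: List[str] = field(default_factory=list)
    project: str = ""


@dataclass
class Filter:
    """Genomic positions or genes, plus free-form data parameters."""
    genes: List[str] = field(default_factory=list)
    positions: List[int] = field(default_factory=list)
    parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class View:
    """A visual representation bound to one filter and one data source."""
    representation: str
    filter: Filter
    data: DataSource


# Illustrative values only; "TP53" and the threshold are made up for the example.
view = View(
    representation="table",
    filter=Filter(genes=["TP53"], parameters={"threshold": 0.9}),
    data=DataSource(sample_ids=["DG1155"]),
)
print(view)
```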

SLIDE 9

Progress

Major Highlights

Redesigned interface and editor
New query engine
Improved views / visualizations to support linking and interaction
Supertable
Data denormalization contributions

SLIDE 10

Live Demo

ESV Demonstration

SLIDE 11

Data Denormalization: Why?

ElasticSearch is an extremely fast text-search engine, but it is schema-free

No set column names, no defined structure

How do we find relations then?

SLIDE 12

Data Denormalization: Why?

[Figure: TITAN dataset and MutationSeq dataset]

How do we know which mutations fall within which copy number alteration for a given genomic coordinate?
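
Without a join in the search engine, the question above has to be answered by comparing records from the two datasets directly. The following is a minimal sketch of that comparison, assuming each record is a plain dictionary, that segment ends are inclusive, and that the field names are as written here (all of these are assumptions, not ESV's actual code). The first segment uses the example values from this deck; the second is made up to show a non-match.

```python
def segments_containing(mutation, segments):
    """Return the copy number segments that contain the mutation's position.

    A segment matches when it comes from the same sample and chromosome and
    its [start, end] range (assumed inclusive) covers the position.
    """
    return [
        seg
        for seg in segments
        if seg["sample_id"] == mutation["sample_id"]
        and seg["chrom"] == mutation["chrom"]
        and seg["start"] <= mutation["position"] <= seg["end"]
    ]


mutation = {"sample_id": "DG1155", "chrom": "01", "position": 104589}
segments = [
    {"sample_id": "DG1155", "chrom": "01", "start": 103062, "end": 109114, "state": "GAIN"},
    # Illustrative non-overlapping segment, not taken from the slides.
    {"sample_id": "DG1155", "chrom": "01", "start": 200000, "end": 250000, "state": "LOSS"},
]

matches = segments_containing(mutation, segments)
print(matches)
```

Doing this at query time means scanning segments for every mutation, which is the cost the denormalization step is meant to avoid.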

SLIDE 13

Data Denormalization: How?

MutationSeq
  sample id: DG1155
  chrom: 01
  position: 104,589
  ref_allele: A
  alt_allele: T
  probability: 0.91
  ...

TITAN
  sample id: DG1155
  chrom: 01
  start: 103,062
  end: 109,114
  state: GAIN
  ...

SLIDE 14

Data Denormalization: How?

MutationSeq
  sample id: DG1155
  chrom: 01
  position: 104,589
  ref_allele: A
  alt_allele: T
  probability: 0.91
  events: {...}
  ...

TITAN
  sample id: DG1155
  chrom: 01
  start: 103,062
  end: 109,114
  state: GAIN

SLIDE 15

Data Denormalization: How?

MutationSeq
  sample id: DG1155
  chrom: 01
  position: 104,589
  ref_allele: A
  alt_allele: T
  probability: 0.91
  events: {
    chrom: 01
    start: 103,062
    end: 109,114
    state: GAIN
    ...
  }
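
The embedding step shown in the record above can be sketched as follows. This is an assumption-laden illustration rather than ESV's implementation: the function name, the underscore field names, the inclusive range check, and the choice to embed only the first matching segment are all hypothetical.

```python
def denormalize(mutation, segments):
    """Return a copy of the mutation record with the overlapping TITAN
    segment embedded under an "events" key (None when nothing overlaps)."""
    doc = dict(mutation)  # copy, so the input record is left untouched
    doc["events"] = None
    for seg in segments:
        same_locus = (
            seg["sample_id"] == mutation["sample_id"]
            and seg["chrom"] == mutation["chrom"]
        )
        if same_locus and seg["start"] <= mutation["position"] <= seg["end"]:
            # Copy only the segment fields shown on the slide.
            doc["events"] = {k: seg[k] for k in ("chrom", "start", "end", "state")}
            break
    return doc


mutation = {
    "sample_id": "DG1155", "chrom": "01", "position": 104589,
    "ref_allele": "A", "alt_allele": "T", "probability": 0.91,
}
segments = [
    {"sample_id": "DG1155", "chrom": "01", "start": 103062, "end": 109114, "state": "GAIN"},
]

doc = denormalize(mutation, segments)
print(doc)
```

Because the overlap is computed once, before indexing, each indexed document already carries its own copy number context.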

Unlike Facebook or Twitter, our data is mainly static

Exploit ElasticSearch’s very fast query term search

Ask questions like: Find me all the TITAN segments that overlap a particular MutationSeq event
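
With denormalized documents, a question like the one above reduces to exact-match term lookups. The sketch below only builds a query body in Elasticsearch's query DSL (a `bool` query with `term` filters) and prints it; it does not contact a cluster. The field names and the helper name are assumptions, and `term` matching as used here presumes the fields are indexed as exact values (keyword or numeric) rather than analyzed text.

```python
import json


def overlap_query(sample_id, chrom, position):
    """Build a query body that selects one MutationSeq document by exact
    term matches; the overlapping TITAN segment is then read from the
    document's embedded "events" field. Field names are hypothetical."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"sample_id": sample_id}},
                    {"term": {"chrom": chrom}},
                    {"term": {"position": position}},
                ]
            }
        }
    }


query = overlap_query("DG1155", "01", 104589)
print(json.dumps(query, indent=2))
```

No range comparison is needed at query time, because the overlap was resolved when the documents were built.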

SLIDE 16

Data Denormalization: Result

SLIDE 17

To Infinity and Beyond

Applications to other areas of research and/or industry in the future, as ESV was designed to be as general as possible

Addition of new datasets/datatypes (e.g. single-sample MutationSeq)

User-contributed views and additional default views

SLIDE 18

Summary

Over the past 3 months:

Redesigned interface to support integration of complex views
Added support to easily add new views
Realtime search and filtering through ElasticSearch
Integrated and improved views/visualizations
Used denormalized data to support linking between any number of views

http://cbioportal.mo.bccrc.ca:8000/

SLIDE 19

Acknowledgements

Sohrab Shah
Cydney Nielsen
Development Team
Daniel Machev
Kelsey Hamer
Ali Bashashati
Kevin Wagner
Shah Lab