DNA and e-science: AMC Graduate School e-Science course (PowerPoint presentation)

SLIDE 1

DNA and e-science
AMC Graduate School e-Science course, April 15, 2015

Barbera van Schaik
Bioinformatics Laboratory, KEBB, Academic Medical Center
b.d.vanschaik@amc.uva.nl
Since: 1999 @ AMC
Position: bioinformatician, junior researcher
Projects: DNA sequence analysis

Outline: Introduction; Scale of sequence data; Infrastructures; Examples (Data sharing, Grid to speed up analysis, Cloud for education, Cloud and Beehub); Question round

SLIDE 2

DNA data analysis on many infrastructures

In this session

You will get an indication of the scale of DNA sequence data and how it is processed on different types of infrastructure, and I will show you a few examples.
Questions are welcome during and after the presentation.

SLIDE 3

What is bioinformatics?

Extraction of biological knowledge from complex data

SLIDE 4

What is bioinformatics?

... one of the results *might* be a tool you can use

SLIDE 5

Manual and automated sequencing

SLIDE 6

Next generation sequencing

SLIDE 7

Sequencers around the world

http://omicsmaps.com/

SLIDE 8

Big data

SLIDE 9

DNA sequencing rate

http://www.wellcome.ac.uk/Education-resources/Education-and-learning/big-picture/all-issues/genes-genomes-and-health/wtdv027167.htm

SLIDE 10

Analysis on PCs and small servers

SLIDE 11

In-house cluster (centralized model)

SLIDE 12

Dutch life science grid (distributed model)

http://surfsara.nl/

SLIDE 13

Cloud computing

SLIDE 14

And more

Supercomputer
Hadoop
GPU cluster

SLIDE 15

Examples

1 Data sharing
2 Speed up analysis
3 Education environment
4 Immunology data flow

SLIDE 16

Sharing research data

E-mail
Dropbox
Hard drives

SLIDE 17

Sharing research data

Beehub
Grid

SLIDE 18

Grid to speed up analysis

Very rare syndrome: Nicolaides-Baraitser syndrome
Found de novo mutations in the SMARCA2 gene
Reviewers: Are these mutations specific for the disease?
Needed to check these mutations in a large cohort

SLIDE 19

Check mutations in GoNL data

http://www.dutchgenomeproject.com/
To do: annotate all variants of 500 Dutch genomes
Two weeks to finish the analysis
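
The idea of using a grid to shorten such an annotation run can be sketched as follows. This is a minimal illustration, not the pipeline used for the GoNL analysis: the split into one work unit per genome and chromosome, the job count of 200, and the half hour per work unit are all assumptions chosen for the example, and the function names are hypothetical.

```python
from itertools import product
from math import ceil


def make_work_units(n_genomes, chromosomes):
    """One independent work unit per (genome, chromosome) pair (assumed split)."""
    genomes = [f"genome{g:03d}" for g in range(1, n_genomes + 1)]
    return list(product(genomes, chromosomes))


def split_into_jobs(units, n_jobs):
    """Distribute the work units round-robin over n_jobs grid jobs."""
    return [units[i::n_jobs] for i in range(n_jobs)]


def ideal_wall_clock_hours(n_units, hours_per_unit, n_workers):
    """Best-case run time: ignores queueing, data transfer and failed jobs."""
    return ceil(n_units / n_workers) * hours_per_unit


# 500 genomes (from the slide); chromosomes 1-22 plus X and Y.
chromosomes = [str(c) for c in range(1, 23)] + ["X", "Y"]
units = make_work_units(500, chromosomes)

# Hypothetical numbers: 200 parallel grid jobs, 0.5 hour per work unit.
jobs = split_into_jobs(units, 200)
serial_hours = ideal_wall_clock_hours(len(units), 0.5, 1)
grid_hours = ideal_wall_clock_hours(len(units), 0.5, 200)

print(f"{len(units)} work units in {len(jobs)} jobs")
print(f"one worker: {serial_hours} h, 200 workers: {grid_hours} h (ideal case)")
```

Because every work unit is independent, the speed-up in the ideal case is proportional to the number of workers; in practice scheduling delays and data staging reduce it.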

SLIDE 20

Cloud for education

2nd-year bachelor course: Genomics of disease
150 students perform DNA analysis at the same time
Temporary need for extra compute power (2 weeks per year)
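
Why a temporary need suits the cloud can be made concrete with a small calculation. The sketch below only uses the two weeks per year mentioned on the slide; the function name is hypothetical, and the calculation assumes that dedicated hardware bought for the course would otherwise sit idle.

```python
WEEKS_PER_YEAR = 52


def utilization(weeks_in_use, weeks_per_year=WEEKS_PER_YEAR):
    """Fraction of the year in which dedicated hardware would actually be used."""
    return weeks_in_use / weeks_per_year


course_weeks = 2  # from the slide: extra compute power is needed 2 weeks per year
share = utilization(course_weeks)
print(f"In use {share:.1%} of the year, idle {1 - share:.1%}")
```

Hardware bought for the course alone would be in use for roughly 4% of the year, which is the case for renting capacity only while the course runs.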

SLIDE 21

Immunology data flow

Analysis pipeline runs on the cloud
Data is stored and shared via Beehub
Manual steps: data transfers, annotation, notifying each other by e-mail
Need proper data management and provenance. Automation is a bonus.
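
One simple way to start with provenance is to log a checksum record for every file that moves between steps. The sketch below is an assumption about what such a record could contain, not the workflow described on the slide: the field names, the function names, and the JSON-lines log format are all hypothetical, and only the Python standard library is used.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum a file in chunks so large sequence files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(path, step, tool_version):
    """Describe one file: what it is, which step produced it, and when."""
    path = Path(path)
    return {
        "file": path.name,
        "bytes": path.stat().st_size,
        "sha256": sha256_of(path),
        "step": step,                  # hypothetical label, e.g. "annotation"
        "tool_version": tool_version,  # hypothetical free-text version string
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


def append_to_log(record, log_path):
    """Append the record as one JSON line to a shared provenance log."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
```

Calling `append_to_log(provenance_record("reads.fastq", "transfer", "v1"), "provenance.jsonl")` after each transfer or annotation step would let a collaborator verify that the file they received is the one that was sent, replacing part of the e-mail notification with a checkable record.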

SLIDE 22

Question round
