 
DNA and e-science
AMC Graduate School, e-Science course
Barbera van Schaik, Bioinformatics Laboratory, KEBB, Academic Medical Center
b.d.vanschaik@amc.uva.nl
April 15, 2015
Since: 1999 @ AMC
Position: bioinformatician, junior researcher
Projects: DNA sequence analysis
Outline: Introduction; Scale of sequence data; Infrastructures; Examples; Data sharing; Grid to speed up analysis; Cloud for education; Cloud and Beehub; Question round
DNA data analysis on many infrastructures. In this session: you will get an indication of the scale of DNA sequence data and of how it is processed on different types of infrastructures, and I will show you a few examples. Questions are welcome during and after the presentation.
What is bioinformatics? Extraction of biological knowledge from complex data
What is bioinformatics? ... one of the results *might* be a tool you can use
Manual and automated sequencing
Next generation sequencing
Sequencers around the world: http://omicsmaps.com/
Big data
DNA sequencing rate: http://www.wellcome.ac.uk/Education-resources/Education-and-learning/big-picture/all-issues/genes-genomes-and-health/wtdv027167.htm
Analysis on PCs and small servers
In-house cluster (centralized model)
Dutch life science grid (distributed model): http://surfsara.nl/
Cloud computing
And more: supercomputer; Hadoop; GPU cluster
Examples: 1. Data sharing; 2. Speed up analysis; 3. Education environment; 4. Immunology data flow
Sharing research data: hard drives; Dropbox; e-mail
Sharing research data: Beehub; Grid
Grid to speed up analysis. Very rare syndrome: Nicolaides-Baraitser. Found de novo mutations in the SMARCA2 gene. Reviewers: are these mutations specific to the disease? Needed to check these mutations in a large cohort.
Check mutations in GoNL data: http://www.dutchgenomeproject.com/ To do: annotate all variants of 500 Dutch genomes. Two weeks to finish the analysis.
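The slides do not say how the annotation work was divided over the grid. As a rough illustration only, and assuming each genome can be annotated independently of the others, the 500 genomes could be split into batches with one batch per grid job. The sample names and the number of jobs below are hypothetical; neither comes from the presentation.

```python
def split_into_batches(items, n_batches):
    """Split items into n_batches contiguous batches whose sizes differ by at most one."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    base, extra = divmod(len(items), n_batches)
    batches = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches each take one additional item.
        size = base + (1 if i < extra else 0)
        batches.append(items[start:start + size])
        start += size
    return batches


# Hypothetical sample identifiers; the real GoNL naming is not given in the slides.
genomes = [f"genome_{i:03d}" for i in range(1, 501)]

# Assumed number of grid jobs, chosen only for illustration.
batches = split_into_batches(genomes, 40)

for job_id, batch in enumerate(batches[:3]):
    print(f"job {job_id}: {len(batch)} genomes, first = {batch[0]}")
```

Each batch would then be handed to a separate grid job, so the total wall-clock time depends on the slowest batch rather than on the sum of all 500 annotations.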
Cloud for education. 2nd yr bachelor course - Genomics of disease. 150 students perform DNA analysis at the same time. Temporary need of extra compute power (2 weeks per year).
Immunology data flow. Analysis pipeline runs on cloud. Data is stored and shared via Beehub. Manual steps: data transfers, annotation, notifying each other by e-mail. Need proper data management and provenance. Automation is a bonus.
Question round