Data Driven Innovation: Interoperability Tech Track (#agridata), 18 & 19 March 2015

SLIDE 1

Data Driven Innovation

Interoperability Tech Track (#agridata)

18 & 19 March 2015, Wageningen (@rfinkers)

SLIDE 2

Outline

§ Introduction: “Interoperable Genetic Diversity”
§ Concept: “Bring Your Own Data” party
§ Aim: BYOD Green Genetics?
§ Outcome: BYOD Green Genetics
§ Hands-on

SLIDE 3
SLIDE 4

Climate change & Social disruption

Photograph: AFP/Getty Images

http://www.theguardian.com/commentisfree/2015/mar/08/guardian-view-climate-change-social-disruption#img-1

SLIDE 5
SLIDE 6

Select a genetically diverse collection

  • Legacy databases (e.g. UniProt)
  • Genome sequence & genome annotation
  • Genome variation data (re-sequencing collections) & SNP annotation
  • Accession passport information
  • Accession phenotype information

SLIDE 7

Web-based aggregation of information

SLIDE 8

Interoperable Genetic Diversity

§ Genebanks should utilize genomics data

  • But should not store them!

§ Genomics studies should make variant data available

  • But need access to passport and characterization & evaluation data.

§ Breeders need tools to access diversity

Finkers, van Hintum et al. 2014. DOI: 10.1017/S1479262114000689

Genebank(s)

Genomics provider(s)

SLIDE 9

Intermezzo: Linked Open Data

Standardization makes the information interoperable

  • Controlled vocabularies
  • Machine-readable
  • Can all be queried with a single question instead of visiting many websites
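A minimal sketch of this idea in plain Python, assuming data are expressed as (subject, predicate, object) triples. The accession URI, the `hasAllele` predicate and all values are hypothetical; only the Darwin Core term `decimalLatitude` is taken from the example query in this deck. Because a genebank and a genomics provider describe the same accession URI, merging is a set union and one pattern query covers both sources.

```python
# Triples are (subject, predicate, object) tuples. All URIs except the
# Darwin Core term are hypothetical placeholders; values are made up.
DWC_LAT = "http://rs.tdwg.org/dwc/terms/decimalLatitude"
HAS_ALLELE = "http://example.org/vocab/hasAllele"   # hypothetical predicate
ACC = "http://example.org/accession/A1"             # hypothetical accession URI

# Passport data published by a genebank
genebank = {(ACC, DWC_LAT, "51.97")}
# Variant data published by a genomics provider about the same accession URI
genomics = {(ACC, HAS_ALLELE, "SL2.40ch03:10050:A")}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Shared identifiers make merging a plain set union.
graph = genebank | genomics

# One question over both sources: everything known about this accession.
for _, predicate, obj in sorted(match(graph, s=ACC)):
    print(predicate, obj)
```

A real deployment would use an RDF store and SPARQL rather than Python sets; the sketch only shows why shared identifiers and vocabularies make a single query possible.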

SLIDE 10

Interoperable Genetic Diversity (2)

§ Implications:

  • Data can be stored at many different locations but can still be found by computers
  • Newly published information (in the correct format) will be included automatically
  • Tools can be written for dedicated questions, such as assessing allelic variation or supporting collection management

Finkers, van Hintum et al. 2014. DOI: 10.1017/S1479262114000689

Genebank(s)

Genomics provider(s)
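The first two implications can be illustrated with a hypothetical registry of sources, assuming each source exposes a function that returns its triples. Real discovery would go through published metadata and endpoints, which this sketch does not model; the names `register` and `aggregate` are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical registry: source name -> function returning that source's triples.
# A real source would fetch its data from a remote location.
REGISTRY: Dict[str, Callable[[], List[Triple]]] = {}


def register(name: str, fetch: Callable[[], List[Triple]]) -> None:
    """Make a newly published source visible to every later query."""
    REGISTRY[name] = fetch


def aggregate() -> List[Triple]:
    """Collect triples from all registered sources, dropping duplicates."""
    seen = set()
    merged: List[Triple] = []
    for name in sorted(REGISTRY):
        for triple in REGISTRY[name]():
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged


register("genebank", lambda: [("acc:A1", "ex:origin", "NL")])
print(len(aggregate()))  # 1

# Publishing a new source requires no change to aggregate().
register("genomics", lambda: [("acc:A1", "ex:hasVariant", "snp:1")])
print(len(aggregate()))  # 2
```

The point of the design is that the query side never changes: any source published in the agreed format is picked up on the next call.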

SLIDE 11

Interdisciplinary Approach Needed

Genebanks

Genomics provider(s)

SLIDE 12

Interdisciplinary Approach Needed

Need for Data Scientists & Domain Experts

Genebanks

Genomics provider(s)

SLIDE 13

Format: Bring Your Own Data Workshop

  1. Users define the question(s)
  2. Users and linked data experts define concepts and ontologies
  3. Experts help to create linked data and formulate the query
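Step 3 can be sketched as a small conversion from tabular passport data to N-Triples. This assumes a CSV with hypothetical columns `accession`, `lat` and `long`, and a hypothetical base URI for accessions; the Darwin Core terms match those used in the example query. In a real BYOD the participants would agree on identifiers and vocabularies in step 2.

```python
import csv
import io
import re
from urllib.parse import quote

# Darwin Core terms, as used in the example query (tdwg: prefix).
DWC = "http://rs.tdwg.org/dwc/terms/"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
# Hypothetical base URI; BYOD participants would agree on the real one.
BASE = "http://example.org/accession/"
# Lexical form of xsd:decimal (no exponent allowed).
DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def rows_to_ntriples(csv_text: str) -> list[str]:
    """Convert passport rows (accession, lat, long) to N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Percent-encode the accession so it is safe inside an IRI.
        subject = "<" + BASE + quote(row["accession"].strip(), safe="") + ">"
        for column, term in (("lat", "decimalLatitude"), ("long", "decimalLongitude")):
            value = row[column].strip()
            if not DECIMAL.fullmatch(value):
                raise ValueError(f"not a decimal in column {column!r}: {value!r}")
            lines.append(f'{subject} <{DWC}{term}> "{value}"^^<{XSD_DECIMAL}> .')
    return lines


sample = "accession,lat,long\nA1,51.97,5.66\n"  # made-up example row
for line in rows_to_ntriples(sample):
    print(line)
```

Rejecting malformed values at conversion time keeps bad literals out of the graph, where they would otherwise surface only as confusing query results.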

SLIDE 14

Bring Your Own Data Workshop

n More Info: http://www.dtls.nl/fair-data/byod/

14

Data

  • wners

Domain Experts Trainers Linked Data Experts

SLIDE 15

Example: Solanaceae Trait Ontology

SLIDE 16

BYOD in action

SLIDE 17

Select a genetically diverse collection

  • Legacy databases (e.g. UniProt)
  • Genome sequence & genome annotation
  • Genome variation data (re-sequencing collections) & SNP annotation
  • Accession passport information
  • Accession phenotype information

SLIDE 18

Example Query

# Accessions in the CGN graph with their species label and coordinates
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX taxon: <http://openlifedata.org/taxonomy_resource:>
PREFIX tdwg: <http://rs.tdwg.org/dwc/terms/>
SELECT ?acc ?label (str(?lat) AS ?latitude) (str(?long) AS ?longitude)
WHERE {
  GRAPH <http://cgngenis.wageningenur.nl> {
    ?acc taxon:species ?species .
    ?species rdfs:label ?label .
    ?acc tdwg:decimalLatitude ?lat .
    ?acc tdwg:decimalLongitude ?long
  }
}
ORDER BY ?label
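A query like this can be sent to any endpoint that implements the SPARQL 1.1 Protocol. The sketch below uses only the Python standard library; the endpoint URL is hypothetical (the graph name `http://cgngenis.wageningenur.nl` identifies the data, not necessarily the endpoint), and the sample response is hand-written. `parse_bindings` flattens the standard SPARQL JSON results format into one dictionary per row.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(body: str) -> list[dict[str, str]]:
    """Flatten SPARQL JSON results into one {variable: value} dict per row.

    Unbound variables are absent from a row, so read optional ones with .get().
    """
    data = json.loads(body)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in data["results"]["bindings"]]


# Hypothetical endpoint; send with urllib.request.urlopen(req) when one is available.
req = build_request("http://example.org/sparql", "SELECT ?acc WHERE { ?acc ?p ?o }")

# A hand-written response in the standard results format, for illustration.
sample = json.dumps({
    "head": {"vars": ["acc", "label"]},
    "results": {"bindings": [
        {"acc": {"type": "uri", "value": "http://example.org/accession/A1"},
         "label": {"type": "literal", "value": "example species"}},
    ]},
})
print(parse_bindings(sample))
```

Keeping request building and result parsing separate makes both testable without network access.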

SLIDE 19

Outcome: Query Graph


SLIDE 20

FAIRport* in VLPB?

*More on FAIRport in the presentation of Luiz Bonino, Thursday 10:30

SLIDE 21

Summary

§ Blueprint “Interoperable Genetic Diversity” shown
§ BYOD resulted in interoperable data which could be queried
  • Request your own BYOD?
§ Public <-> private integration possible

SLIDE 22

Select a genetically diverse collection

  • Legacy databases (e.g. UniProt)
  • Genome sequence & genome annotation
  • Genome variation data (re-sequencing collections) & SNP annotation
  • Accession passport information
  • Accession phenotype information

SLIDE 23

Select a genetically diverse collection

  • Legacy databases (e.g. UniProt)
  • Genome sequence & genome annotation
  • Genome variation data (re-sequencing collections) & SNP annotation
  • Accession passport information
  • Accession phenotype information

SLIDE 24

Working Prototype

§ screendump


SLIDE 25

Questions?

Acknowledgements:
  • BYOD team
  • Theo van Hintum & Frank Menting (CGN)
  • Denis Guryunov & Martijn van Kaauwen (prototype)
  • et al.

SLIDE 26

HaploSmasher Hands-On Session

§ HaploSmasher prototype:

  • Input: genomic regions, e.g. SL2.40ch03:10000..10200
  • Input: Solyc gene identifiers, e.g. Solyc10g085020
  • Filter SNPs on impact type (SnpEff): HIGH, MODERATE, LOW, MODIFIER
  • No input validation yet
  • Use correct notation and existing Solyc gene IDs
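Since the prototype does not validate input yet, a validation step could look like the sketch below. The regular expressions are inferred from the two examples on this slide and are assumptions, not HaploSmasher's actual rules; `parse_input` and `parse_impacts` are hypothetical names. The four impact classes are the ones SnpEff assigns.

```python
import re

# Patterns inferred from the slide's examples (assumptions, not the
# prototype's rules): "SL2.40ch03:10000..10200" and "Solyc10g085020".
REGION = re.compile(r"(SL2\.40ch\d{2}):(\d+)\.\.(\d+)")
SOLYC_ID = re.compile(r"Solyc\d{2}g\d{6}")
IMPACTS = {"HIGH", "MODERATE", "LOW", "MODIFIER"}  # SnpEff impact classes


def parse_input(text: str) -> tuple:
    """Classify input as a genomic region or a gene ID; raise ValueError otherwise."""
    text = text.strip()
    region = REGION.fullmatch(text)
    if region:
        chrom, start, end = region.group(1), int(region.group(2)), int(region.group(3))
        if start > end:
            raise ValueError(f"start {start} lies after end {end}")
        return ("region", chrom, start, end)
    if SOLYC_ID.fullmatch(text):
        return ("gene", text)
    raise ValueError(f"not a region or Solyc gene ID: {text!r}")


def parse_impacts(names: list[str]) -> set[str]:
    """Normalize requested impact filters and reject unknown ones."""
    chosen = {name.strip().upper() for name in names}
    unknown = chosen - IMPACTS
    if unknown:
        raise ValueError(f"unknown impact type(s): {sorted(unknown)}")
    return chosen


print(parse_input("SL2.40ch03:10000..10200"))
print(parse_input("Solyc10g085020"))
print(sorted(parse_impacts(["high", "Moderate"])))
```

A pattern check only confirms the notation; confirming that a Solyc ID actually exists would still need a lookup against the gene annotation.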

SLIDE 27

HaploSmasher

SLIDE 28

HaploSmasher

§ Query CGN FAIRdata graph

  • The prototype currently only generates links to CGN passport data
  • Graph data for three CGN accessions is available in our test set

SLIDE 29

HaploSmasher examples:

§ Haplotype Output
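One way to derive haplotype groups, sketched under stated assumptions: filter SNPs by SnpEff impact, then group accessions whose alleles are identical at every remaining SNP. The SNP records and accession names are made up, and this is not necessarily how HaploSmasher computes its output. Switching the filter from HIGH & MODERATE to all impact types mirrors the "HIGH & MODERATE vs. ALL effects" comparison listed for beta-tubulin.

```python
from collections import defaultdict

# Hypothetical input: each SNP has a position, a SnpEff impact class and one
# allele per accession. Accession names and alleles are made up.
snps = [
    {"pos": 101, "impact": "MODERATE", "alleles": {"acc1": "A", "acc2": "A", "acc3": "G"}},
    {"pos": 250, "impact": "MODIFIER", "alleles": {"acc1": "C", "acc2": "T", "acc3": "T"}},
]


def haplotype_groups(snps, impacts):
    """Group accessions that share the same alleles at all selected SNPs."""
    selected = sorted((s for s in snps if s["impact"] in impacts),
                      key=lambda s: s["pos"])
    accessions = sorted({a for s in selected for a in s["alleles"]})
    groups = defaultdict(list)
    for acc in accessions:
        # Missing calls are written as "N", so they form their own pattern.
        haplotype = "".join(s["alleles"].get(acc, "N") for s in selected)
        groups[haplotype].append(acc)
    return dict(groups)


print(haplotype_groups(snps, {"HIGH", "MODERATE"}))
# {'A': ['acc1', 'acc2'], 'G': ['acc3']}
print(haplotype_groups(snps, {"HIGH", "MODERATE", "LOW", "MODIFIER"}))
# {'AC': ['acc1'], 'AT': ['acc2'], 'GT': ['acc3']}
```

Including low-impact and modifier SNPs splits the same accessions into more groups, which is why the impact filter matters when looking for functionally relevant haplotypes.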

SLIDE 30

Example queries

§ http://www.plantbreeding.wur.nl/hs/
§ Also, explore variation data & linked resources
  • http://www.tomatogenome.net

§ Examples:

  • Beta-tubulin: Solyc10g085020
    • HIGH & MODERATE vs. ALL effects
  • Glutamate dehydrogenase: Solyc05g052100
  • Uridine kinase: Solyc02g067880
  • Magnesium chelatase: Solyc04g015750

SLIDE 31

HaploSmasher examples:

§ Conserved housekeeping genes:

  • Beta-tubulin Solyc10g085020 (439 AA)
  • 1 SNP (HIGH & MODERATE effect), two haplotypes

SLIDE 32

HaploSmasher examples:

  • Beta-tubulin Solyc10g085020 (439 AA)
  • 136 SNPs (all SnpEff impact types)
  • Part of haplotype groups:

SLIDE 33

HaploSmasher examples:

  • Glutamate dehydrogenase Solyc05g052100
  • 13 SNPs (HIGH, MODERATE)
SLIDE 34

HaploSmasher examples:

  • Uridine kinase Solyc02g067880
  • 23 SNPs (HIGH, MODERATE)
  • Example haplotype groups:
SLIDE 35

HaploSmasher examples:

  • Magnesium chelatase Solyc04g015750
  • 21 SNPs (HIGH, MODERATE)
  • Example haplotype groups: