Serverless Beacon: Helping take genomic analysis from the cloud to the clinic



SLIDE 1

Serverless Beacon: Helping take genomic analysis from the cloud to the clinic

HEALTH AND BIOSECURITY

Brendan Hosking October 2019, HISA Data Analytics 2019

SLIDE 2

Custom Continuous Deployment to Uncover the Secrets in the Genome | Brendan Hosking 2 |

Genomic data discovery: Beacons

Serverless Beacon: Helping take genomic analysis from the cloud to the clinic | Brendan Hosking

SLIDE 3

SLIDE 4

Serverless-Beacon to scale up discovery across continents

Cheaper: “Only pay for the resources consumed – zero downtime cost.”

Powerful: “Scaling up to large volumes of distributed variant data.”

Used by: [organisation logos on the original slide]

SLIDE 5

Serverless Innovation for Health | Denis C. Bauer | @allPowerde 5 |

Scaling Analysis

Variant Call File

Figure labels: Records (rows), Samples (columns), Beacon Dataset.

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00097 HG00099 HG00100 HG00101 HG00102 HG00103 HG00105 HG00106 HG00107
1 15820 rs2691315 G T 100 PASS AC=6;AN=20;VT=SNP;EX_TARGET GT 1|0 0|1 0|1 0|0 0|0 1|0 1|0 0|0 0|0 0|1
1 15903 rs557514207 G GC 100 PASS AC=8;AN=20;VT=INDEL;EX_TARGET GT 0|1 0|1 0|0 1|0 0|1 0|0 0|1 0|1 0|1 0|1
1 69761 rs200505207 A T 100 PASS AC=2;AN=20;VT=SNP;EX_TARGET GT 0|0 0|0 0|0 0|0 1|1 0|0 0|0 0|0 0|0 0|0
1 69897 rs200676709 T C 100 PASS AC=17;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 0|0 1|1 1|1 1|0 1|1 1|1 1|1 1|1
1 876499 rs4372192 A G 100 PASS AC=20;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
1 877831 rs6672356 T C 100 PASS AC=20;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
1 878314 rs142558220 G C 100 PASS AC=2;AN=20;VT=SNP;EX_TARGET GT 0|0 0|0 1|1 0|0 0|0 0|0 0|0 0|0 0|0 0|0
1 881070 rs41285794 G A 100 PASS AC=1;AN=20;VT=SNP;EX_TARGET GT 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0 0|0
1 881627 rs2272757 G A 100 PASS AC=13;AN=20;VT=SNP;EX_TARGET GT 0|0 0|0 1|1 1|1 1|0 1|1 1|1 1|0 1|0 1|1
1 881918 rs35471880 G A 100 PASS AC=2;AN=20;VT=SNP;EX_TARGET GT 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 1|1
1 887560 rs3748595 A C 100 PASS AC=20;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
1 887801 rs3828047 A G 100 PASS AC=20;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
1 888639 rs3748596 T C 100 PASS AC=20;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
1 888659 rs3748597 T C 100 PASS AC=20;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
1 889158 rs13303056 G C 100 PASS AC=20;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
1 889159 rs13302945 A C 100 PASS AC=20;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
1 894573 rs13303010 G A 100 PASS AC=18;AN=20;VT=SNP;EX_TARGET GT 1|0 1|1 1|1 1|1 1|1 1|1 1|1 1|0 1|1 1|1
1 897216 rs186126206 C T 100 PASS AC=1;AN=20;VT=SNP;EX_TARGET GT 0|0 0|0 0|0 0|0 1|0 0|0 0|0 0|0 0|0 0|0
1 897325 rs4970441 G C 100 PASS AC=20;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
1 899928 rs6677386 G C 100 PASS AC=20;AN=20;VT=SNP;EX_TARGET GT 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1 1|1
1 1564952 rs535125876;rs112177324 TG TGG,T 100 PASS AC=0,8;AN=20;VT=INDEL;MULTI_ALLELIC;EX_TARGET GT 2|2 0|0 0|0 2|0 0|2 2|0 0|2 2|0 2|0 0|0
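To make the record/sample layout of the VCF excerpt concrete, here is a minimal Python sketch that parses one data line and recounts the alternate alleles from the genotype columns. It is an illustration only, not the Serverless Beacon code: the names `VcfRecord`, `parse_record` and `allele_counts` are hypothetical, and it splits on any whitespace because the excerpt on the slide is space-separated (real VCF files are tab-delimited).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class VcfRecord:
    """One VCF data line: a variant (record) plus one genotype per sample."""
    chrom: str
    pos: int
    ids: list[str]
    ref: str
    alts: list[str]
    info: dict[str, str]
    genotypes: list[str]


def parse_record(line: str) -> VcfRecord:
    """Parse a whitespace-separated VCF data line (hypothetical helper)."""
    fields = line.split()
    chrom, pos, ids, ref, alts, _qual, _filter, info, _fmt = fields[:9]
    info_map = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        info_map[key] = value  # flags such as EX_TARGET map to ""
    return VcfRecord(chrom, int(pos), ids.split(";"), ref,
                     alts.split(","), info_map, fields[9:])


def allele_counts(record: VcfRecord) -> list[int]:
    """Count how often each ALT allele occurs across all sample genotypes."""
    counts = [0] * len(record.alts)
    for genotype in record.genotypes:
        for allele in genotype.replace("/", "|").split("|"):
            if allele.isdigit() and int(allele) > 0:
                counts[int(allele) - 1] += 1
    return counts


line = ("1 15820 rs2691315 G T 100 PASS AC=6;AN=20;VT=SNP;EX_TARGET GT "
        "1|0 0|1 0|1 0|0 0|0 1|0 1|0 0|0 0|0 0|1")
record = parse_record(line)
print(allele_counts(record), record.info["AC"])  # [6] 6
```

The recomputed count matches the `AC` value in the INFO column, which is a quick sanity check when reading slices of a larger file.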

SLIDE 6

Why Serverless?

Expensive, Limited, Insecure
Manual termination, Must provision size, Security policies
Cheapest, Scalable, Secure

SLIDE 7

Challenges

Limits

  • Memory
  • Life span
  • Storage

DevOps

  • No direct access
  • Slow testing loop
  • Function versions

SLIDE 8

Bioinformatics | Denis C. Bauer | @allPowerde

Submit Workflow

1. POST request to create/update a dataset. The request includes VCF locations and dataset metadata.
2. Invoke the submitDataset Lambda function.
3. Validate and insert the dataset metadata into Datasets DynamoDB.
4. If a change was made to the VCFs in a dataset, publish the dataset to summariseDataset SNS.
5. Read the dataset id from summariseDataset SNS.
6. Read the VCF locations from Datasets DynamoDB.
7. Check VcfSummaries DynamoDB to see if all the VCFs have been summarised.
8. If any VCF is missing call, variant or sample count information, publish that VCF to summariseVCF SNS.
9. Read the VCF location from summariseVCF SNS.
10. Attempt to enter the slices of the VCF in the VCF location item in VcfSummaries DynamoDB.
11. If there are already values in the toUpdate attribute for that item, abort.
12. Read the number of samples from the VCF location.
13. Enter the sample count for the VCF location in VcfSummaries DynamoDB.
14. Publish each region slice to summariseSlice SNS.
15. Read the region and VCF location from summariseSlice SNS.
16. Parse the region in the VCF and count the total number of variants and calls.
17. Remove the region and add its counts to the VCF item in VcfSummaries DynamoDB.
18. Record the slices that remain to be updated.
19. If there are no more slices to update, get all datasets that use that VCF from Datasets DynamoDB.
20. For each dataset found, publish that dataset to summariseDataset SNS.
21. Read the dataset id from summariseDataset SNS.
22. Read the VCF locations from Datasets DynamoDB.
23. Read the VCF summaries from VcfSummaries DynamoDB.
24. If all the VCFs have been summarised, aggregate the counts and enter them in Datasets DynamoDB.
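The slice bookkeeping in steps 10 to 19 of the submit workflow can be sketched in plain Python. This is a simplified in-memory model under stated assumptions, not the deployed implementation: `VcfSummary` stands in for one item in VcfSummaries DynamoDB, `make_slices` and the slice size are hypothetical, and the per-slice counts are made-up numbers. In the real system the "abort if toUpdate already has values" check would need to be atomic on the database side.

```python
def make_slices(start, end, slice_size):
    """Split the inclusive range [start, end] into consecutive region slices."""
    return [(s, min(s + slice_size - 1, end))
            for s in range(start, end + 1, slice_size)]


class VcfSummary:
    """In-memory stand-in for one VCF item in the VcfSummaries table."""

    def __init__(self):
        self.to_update = set()   # slices that still need to be summarised
        self.variant_count = 0
        self.call_count = 0

    def begin(self, slices):
        """Steps 10-11: register the slices, or abort if a run is in progress."""
        if self.to_update:
            return False
        self.to_update = set(slices)
        self.variant_count = 0
        self.call_count = 0
        return True

    def finish_slice(self, region, variants, calls):
        """Steps 17-19: remove the slice, add its counts, report completion."""
        if region in self.to_update:  # ignore slices that were already counted
            self.to_update.remove(region)
            self.variant_count += variants
            self.call_count += calls
        return not self.to_update


summary = VcfSummary()
slices = make_slices(1, 2_500_000, 1_000_000)
started = summary.begin(slices)        # first attempt registers the slices
restarted = summary.begin(slices)      # second attempt aborts (step 11)
done = [summary.finish_slice(region, variants=10, calls=200)
        for region in slices]
print(started, restarted, done, summary.variant_count, summary.call_count)
# True False [False, False, True] 30 600
```

Only the last slice to finish sees an empty `to_update` set, which is the point at which step 19 would republish the affected datasets.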

SLIDE 9

Query Workflow

1. GET request for a summary of the available datasets.
2. Invoke the getInfo Lambda function.
3. Read Datasets DynamoDB to get the summary of each dataset.
4. Return the dataset summary information to the API Gateway.
5. Return the summary of each dataset to the client.
6. GET request for information about variants in a particular region, optionally restricted to a subset of datasets.
7. Invoke the queryDatasets Lambda function.
8. Collect metadata as well as the VCF location for each dataset from Datasets DynamoDB.
9. Invoke splitQuery for each dataset, with the desired region and variant type.
10. Split the region into slices, as well as split by VCF location, and invoke performQuery for each combination.
11. Analyse the VCF slice and collect information on the desired region and variant.
12. Return the slice information to splitQuery.
13. Aggregate the information as per the requirements in the API call, and return it to queryDatasets.
14. Note whether there were any hits and return the answer, as well as the dataset-specific information, to API Gateway.
15. Return the response from queryDatasets directly to the client.
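The fan-out in steps 7 to 14 of the query workflow follows a scatter-gather pattern, which the sketch below imitates locally with a thread pool instead of Lambda invocations. Everything here is an assumption made for illustration: the Python function names only mirror the Lambda names from the slide, each VCF is reduced to a list of `(pos, ref, alt)` tuples, the bucket paths are hypothetical, and the response shape is invented rather than taken from the Beacon API.

```python
from concurrent.futures import ThreadPoolExecutor


def perform_query(records, region, ref, alt):
    """Step 11: scan one slice of one VCF for the requested variant."""
    start, end = region
    return any(start <= pos <= end and r == ref and a == alt
               for pos, r, a in records)


def split_query(vcfs, start, end, ref, alt, slice_size):
    """Steps 10-13: one perform_query per (VCF, slice) pair, then aggregate."""
    slices = [(s, min(s + slice_size - 1, end))
              for s in range(start, end + 1, slice_size)]
    jobs = [(records, region) for records in vcfs.values() for region in slices]
    with ThreadPoolExecutor() as pool:
        hits = pool.map(lambda job: perform_query(job[0], job[1], ref, alt), jobs)
        return any(hits)


def query_datasets(datasets, start, end, ref, alt, slice_size=1_000_000):
    """Steps 7-14: one split_query per dataset; note whether anything was hit."""
    per_dataset = {name: split_query(vcfs, start, end, ref, alt, slice_size)
                   for name, vcfs in datasets.items()}
    return {"exists": any(per_dataset.values()), "datasets": per_dataset}


# Hypothetical datasets: dataset name -> {VCF location -> simplified records}
datasets = {
    "demo": {"s3://hypothetical-bucket/a.vcf.gz": [(15820, "G", "T"), (69761, "A", "T")]},
    "other": {"s3://hypothetical-bucket/b.vcf.gz": [(889159, "A", "C")]},
}
response = query_datasets(datasets, 1, 100_000, "G", "T", slice_size=50_000)
print(response)
# {'exists': True, 'datasets': {'demo': True, 'other': False}}
```

Because every slice is checked independently, adding more slices or VCFs increases the number of parallel jobs rather than the duration of any single one, which is the property the workflow relies on to stay within per-function limits.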

SLIDE 10

Performance Improvements

SLIDE 11

Performance Improvements

SLIDE 12

How to maintain consistency in the cloud?

IM&T Administered

User Administered

Project Infrastructure Definition Continuous Deployment Secure Account Storage Deployment Mechanics

SLIDE 13

Looking Forward

  • Serverless architecture will be the future, enabling rapid prototyping and scalability.
  • CSIRO builds digital health solutions that are future-ready.
  • Life-science research enables more effective clinical practice: let’s build a healthier future together!

SLIDE 14

Let’s build a healthier world together

Denis Bauer, PhD; Rob Dunne, PhD; Piotr Szul

Transformational Bioinformatics

Collaborators News Software Lynn Langit

Top 10 Australian IT stories of 2017

You?

We are hiring… email Denis

#InCoB 2019 | Jakarta KEYNOTE

Suzanne Scott; Oscar Luo, PhD; Arash Bayat, PhD; Natalie Twine, PhD; Genome Insight; Yatish Jain; Aidan O’Brien; Laurence Wilson, PhD; Brendan Hosking; Aidan Tay; Daniel Reti; Digital Genome Engineering; Suzanne Scott

Mumbai 2019