SRB in the BioEmergences project, Dominique de Waleffe



SLIDE 1

The project The architecture SRB usage iRODS?

SRB in the BioEmergences project

Dominique de Waleffe

dominique.dewaleffe@denali.be

Denali SA

CC-IN2P3 – Feb 2, 2009

  • D. de Waleffe

SRB in the BioEmergences project

SLIDE 2

1. The project
2. The architecture
3. SRB usage
4. iRODS?

SLIDE 3

Partners

Framework Program 6 project. Consortium:

  • CNRS – Centre de Recherche en Épistémologie Appliquée (CREA) (France)
  • Institut Curie (France)
  • Slovenska Technicka Univerzita v Bratislave (Slovakia)
  • Universidad de Málaga (Spain)
  • Denali Consulting S.A. (Belgium)
  • European Molecular Biology Laboratory (Germany)
  • University of Bologna (Italy)
  • CNRS – CC-IN2P3 (France)

Project fact sheet on CORDIS: http://tinyurl.com/5yc42k

SLIDE 4

Project goals

What?

With the BioEMERGENCES project, we aim at providing an experimental platform to observe in vivo emergent patterns at various scales and measure their variability between different individuals of the same species. This is a strategy towards the measurement of the individual susceptibility to genetic diseases or response to treatments. ... The main result expected from BioEMERGENCES is the specification of a European platform to achieve high throughput measurement of individual differences and screening of drugs combinations such as bi or tri-therapies.


SLIDE 11

Goals

Team

  • Multi-disciplinary team: biologists, mathematicians, engineers, computer scientists

Research

  • Observe: using high-definition microscopes, capture 4D sets of images of living embryos (zebrafish, sea urchin, ...)
  • Transform: invent methods to go from images to symbolic representations (lineage trees, contours)
  • Compare: invent methods for efficient and meaningful comparisons

Industrialize

  • Platform for high-throughput execution of the processes


SLIDE 16

Some details

Gather observations

  • Biologists place an embryo under the microscope for a number of hours.
  • A stack of horizontal images of size x × y, separated in time by δt and in space by δz, is captured.
  • A new stack is captured every ∆T.
  • This is repeated for many individuals under different conditions.

Output

  • A large set of large files containing raw images.
  • A set of metadata describing the experiment.

[Figure: a sequence of image stacks, each labelled X, Y, ∆Z, laid out along a timestep axis]


SLIDE 19

Some details

Reconstruct cell lineage tree

Invent different algorithms to:

  • filter images (remove noise)
  • detect centers of cell nuclei ((x, y, z) position)
  • determine membrane contours (set of 3-D polygons)
  • determine nucleus contours (set of 3-D polygons)
  • identify mitosis (cell divisions)
  • track each individual cell from step Ti to step Ti+1 and build the lineage tree
  • compare lineage trees, infer new results
  • visualize reconstructions
  • correct and annotate datasets
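To make the tracking step concrete, here is a minimal sketch that links each nucleus center detected at step Ti+1 to the nearest center at step Ti, and reads a parent with two linked children as a mitosis. The greedy nearest-neighbour rule, the `max_dist` threshold and the function names `link_nearest` and `mitoses` are assumptions made for this illustration; the slides do not describe the algorithms actually developed in the project.

```python
import math
from collections import defaultdict


def link_nearest(prev_centers, next_centers, max_dist):
    """Link each center of step Ti+1 to its nearest center of step Ti.

    Returns {index in prev_centers: [indices in next_centers]}.
    Centers farther than max_dist from every previous center stay unlinked.
    """
    children = defaultdict(list)
    if not prev_centers:
        return {}
    for j, center in enumerate(next_centers):
        # Euclidean distance from this center to every center of the previous step
        dists = [math.dist(center, p) for p in prev_centers]
        i = min(range(len(dists)), key=dists.__getitem__)
        if dists[i] <= max_dist:
            children[i].append(j)
    return dict(children)


def mitoses(links):
    """Parents with two or more children are candidate cell divisions."""
    return sorted(p for p, kids in links.items() if len(kids) >= 2)


if __name__ == "__main__":
    step_i = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    step_i_plus_1 = [(0.5, 0.0, 0.0), (9.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
    links = link_nearest(step_i, step_i_plus_1, max_dist=3.0)
    print(links, mitoses(links))
```

Chaining the `links` dictionaries of successive steps gives the parent/child edges of a lineage tree. A real tracker would also have to handle cells that move past each other, missed detections and false positives, which this sketch ignores.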

SLIDE 20

Some figures

  • Image sizes: 512 × 512 × 8 to 1024 × 1024 × 8 pixels, 0.5 µ < δx, δy < 1.5 µ, but soon 2048 × 2048 × 24.
  • Number of images in a stack: between 50 and 200.
  • Number of time steps: ∆T typically between 1 and 10 minutes; a few tens to a few hundreds of time intervals captured.
  • Raw data volumes: 50 to 60 gigabytes of raw image files per experiment (size: 512), but soon half a terabyte with the new microscope.
  • Number of cells: lineage trees contain several million cells.
  • Current storage used (SRB): in excess of 8 TB.
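The figures above can be combined into a rough size estimate. The sketch below assumes that the third factor in "512 × 512 × 8" is the number of bits per pixel, that images are stored uncompressed, and that gigabytes are decimal; the stack size (200) and number of time steps (300) are illustrative values picked from the quoted ranges, not measurements from the project.

```python
def raw_volume_bytes(width, height, bits_per_pixel, images_per_stack, time_steps):
    """Uncompressed size of one experiment: one stack of images per time step."""
    bytes_per_image = width * height * bits_per_pixel // 8
    return bytes_per_image * images_per_stack * time_steps


# Illustrative values taken from the ranges quoted on the slide (assumptions).
current = raw_volume_bytes(512, 512, 8, images_per_stack=200, time_steps=300)
planned = raw_volume_bytes(2048, 2048, 24, images_per_stack=200, time_steps=300)

if __name__ == "__main__":
    print(f"512 x 512 x 8:    {current / 1e9:.1f} GB")
    print(f"2048 x 2048 x 24: {planned / 1e9:.1f} GB")
```

The result only gives an order of magnitude: the 50 to 60 GB and half-terabyte figures on the slide depend on acquisition parameters that the slides do not spell out.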

SLIDE 21

Context diagram

[Figure: context diagram. Recoverable labels:]

  • Actors: Biologist, Researcher, Experimenter, System manager
  • Interactive Access Server (CC-IN2P3)
  • Researcher workstation
  • Data Storage (SRB + Oracle) (CC-IN2P3)
  • Compute farm (CC-IN2P3)
  • Compute GRID (EGEE)
  • Laser microscope and its local controller/storage server
  • BIO WKF management

SLIDE 22

Deployment viewpoint

[Figure: deployment diagram. Recoverable labels, in extraction order:]

CC-IN2P3 (Lyon)

  • AFS Storage
  • HTTP
  • IN2P3-Web Server (Apache)
  • Pistoo Cluster: job execution (BQS), storage
  • IN2P3-WebService: job submission, status tracking, JSDL translation -> BQS
  • Anastasie Cluster: job execution (BQS)
  • Job submission, status tracking
  • IN2P3-WebApplication: algo management, data management, job submission, parameters management, status tracking, experiment management, user management, user notification
  • CCALI: manual access to infrastructure
  • SRB Storage
  • Oracle Storage
  • ???
  • EGEE-WebService: JSDL translation -> JDL (EGEE), job submission, status tracking

GRIF (Paris)

  • EGEE GRID: job execution (EGEE)

Gif-sur-Yvette

  • Microscope Image Server: raw image transfer
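The "JSDL translation -> JDL (EGEE)" box can be illustrated with a small sketch that reads a JSDL job description and emits the corresponding JDL attributes. It is not the project's translator, which the slides do not show: it handles only the `Executable`, `Argument`, `Output` and `Error` elements of a JSDL POSIX application, and the sample job (`filter_images.sh`, `stack_0001`) is invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespaces of the JSDL 1.0 schema and of its POSIX application extension.
NS = {
    "jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl",
    "posix": "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix",
}

# Hypothetical job description, made up for this example.
SAMPLE_JSDL = """<jsdl:JobDefinition
    xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
    xmlns:posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <posix:POSIXApplication>
        <posix:Executable>filter_images.sh</posix:Executable>
        <posix:Argument>stack_0001</posix:Argument>
        <posix:Argument>--denoise</posix:Argument>
      </posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>"""


def jsdl_to_jdl(jsdl_text):
    """Translate a small subset of JSDL into JDL attribute lines."""
    root = ET.fromstring(jsdl_text)
    app = root.find(".//posix:POSIXApplication", NS)
    if app is None:
        raise ValueError("no POSIXApplication element found")
    executable = app.findtext("posix:Executable", default="", namespaces=NS).strip()
    arguments = " ".join(
        a.text.strip() for a in app.findall("posix:Argument", NS) if a.text
    )
    # Fall back to conventional file names when the JSDL does not name them.
    output = app.findtext("posix:Output", default="std.out", namespaces=NS).strip()
    error = app.findtext("posix:Error", default="std.err", namespaces=NS).strip()
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        f'StdOutput = "{output}";',
        f'StdError = "{error}";',
        f'OutputSandbox = {{"{output}", "{error}"}};',
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(jsdl_to_jdl(SAMPLE_JSDL))
```

Keeping JSDL as the single job description and translating it per back end (BQS at CC-IN2P3, JDL for EGEE) is what lets the web application submit the same job to either farm.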

SLIDE 23

Application: experiment list

SLIDE 24

Application: processing pipelines

Details button: brings view below:


SLIDE 32

SRB usage: What

  • Main repository: CC-IN2P3, Lyon.
  • Raw data storage. Data captured in Paris, format standardized, then copied to SRB. Srsync is used.
  • Derived data.
      • Kept for reuse in further processing.
      • Data history kept in the application's DB.
  • Some additional files (movies of reconstructions, ...)
  • Stores algorithms (scripts, sources, build procedures, executables)


SLIDE 41

SRB usage: how

  • Used in identical manner from both farms.
  • Mostly used as a file system.
  • Command line: Smkdir, Scd, Sls, Sput, Sget, Srm, Srsync
  • Jargon library used for
      • displaying raw data file lists
      • streaming movies [todo]
  • Not used in the project:
      • user metadata
      • web-based browser


SLIDE 45

SRB usage: issues

  • Slow access when doing operations requiring catalog access (e.g. Sls -l).
  • Bizarre error messages, or no exit codes (which makes scripting difficult).
  • Locking bugs (e.g. multiple Smkdir X), which impact the whole group!
  • Deleted stuff is not always fully deleted.
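One way scripts can cope with unreliable exit codes is to wrap every Scommand call, as in the sketch below. The policy shown (treat a non-zero status or any text on stderr as a failure, then retry a few times) is an assumption for illustration, not what the project's scripts actually do; `run_scommand` and its parameters are hypothetical names, and the `runner` argument exists only so the wrapper can be exercised without an SRB installation.

```python
import subprocess
import time


def run_scommand(argv, retries=3, delay=1.0, runner=subprocess.run):
    """Run an SRB Scommand and return its stdout.

    Because the exit status alone cannot be trusted, a call counts as failed
    when the status is non-zero OR anything was written to stderr (assumption:
    the command prints nothing on stderr when it succeeds).
    """
    last_error = ""
    for attempt in range(1, retries + 1):
        result = runner(argv, capture_output=True, text=True)
        if result.returncode == 0 and not result.stderr.strip():
            return result.stdout
        last_error = result.stderr.strip() or f"exit status {result.returncode}"
        if attempt < retries:
            time.sleep(delay)  # give a transient lock time to clear
    raise RuntimeError(f"{argv[0]} failed after {retries} attempts: {last_error}")
```

A script would then call, for example, `run_scommand(["Sls", "-l"])` and rely on the exception rather than on the exit status. Retrying does not fix the locking bugs themselves; it only keeps a single transient failure from silently corrupting a pipeline run.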


SLIDE 52

Why iRODS?

  • Post-processing on ingestion. Could trigger raw data format changes on upload.
  • Workflows. Probably redundant with what we already have or can have.

Looks like iRODS would be a good fit, but:

  • we have no extra budget in the current project
  • the team is currently looking at using ROOT as storage framework
  • taking advantage of iRODS implies a large re-architecting effort

  • Sources available.


SLIDE 56

Why iRODS, some risks?

  • Maintainability of a complex rule base?
  • Rule syntax (one-liners, readability, choice of operators, comments?)

    myRule|foo==1|action1(...);action2(...);...|action3(...);action4(...);...

Why not slightly more verbose?

    rule myRule {
      // this rule is triggered when foo and does bar
      when (foo == 1) do {
        /* watch that this action has side-effects */
        action1(...); action2(...); ...
      } on failure {
        action3(...); action4(...); ...
      }
    }

  • Can I define new microservices as complex jobs (e.g. submit job(s) to the farm) without resorting to C programming?
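The two notations carry the same information, which a short sketch can show by parsing the one-line form and printing it in the more verbose form proposed on the slide. The four-field layout (name, condition, actions, recovery actions) is read off the slide's example; the verbose syntax is the speaker's suggestion, not an existing iRODS syntax, and the parser is deliberately naive (it would break on a `|` or `;` inside a condition or an argument).

```python
def split_actions(text):
    """Split 'a(...);b(...)' into a list of individual actions."""
    return [a.strip() for a in text.split(";") if a.strip()]


def parse_rule(line):
    """Parse 'name|condition|actions|recovery' into a dictionary."""
    name, condition, actions, recovery = (part.strip() for part in line.split("|"))
    return {
        "name": name,
        "condition": condition,
        "actions": split_actions(actions),
        "recovery": split_actions(recovery),
    }


def to_verbose(rule):
    """Render a parsed rule in the block-structured form suggested on the slide."""
    lines = [f"rule {rule['name']} {{", f"  when ({rule['condition']}) do {{"]
    lines += [f"    {action};" for action in rule["actions"]]
    lines.append("  } on failure {")
    lines += [f"    {action};" for action in rule["recovery"]]
    lines += ["  }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    one_liner = "myRule|foo==1|action1(a);action2(b)|action3(a);action4(b)"
    print(to_verbose(parse_rule(one_liner)))
```

Since the verbose form can be generated mechanically, a rule base could in principle be written and reviewed in the readable notation and compiled down to one-liners.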


SLIDE 58

Conclusion

  • BioEmergences has complex distributed data/processing needs.
  • It could make use of iRODS if the risks are shown to be a non-issue.

SLIDE 59

Questions?