The curation of laboratory experimental data as part of the overall data lifecycle

Jeremy G. Frey, School of Chemistry, University of Southampton


SLIDE 1

21 Nov 2006, Jeremy G. Frey, University of Southampton, DCC Conference Glasgow

The curation of laboratory experimental data as part of the overall data lifecycle

Jeremy G. Frey
School of Chemistry, University of Southampton, UK
21 Nov 2006
DCC Conference, Glasgow

SLIDE 2

If you do things right at the start, all the following processes are much easier!

Exponentially growing amount of data: the future overwhelms the past.

SLIDE 3

The CombeChem Project

  • End-to-end linking of data and information
  • Publication@Source
  • So collect data with regard to how it could eventually be used
  • Make sure the metadata is of high quality
  • Record properly at source in digital form
  • The Chemistry Lab
  • People & machines working together

SLIDE 4

CombeChem

Smart Lab, R4L, e-Bank, E-Malaria, Instruments on the Grid, BioSimGrid, Statistics

SLIDE 5

The concept of Publication@Source

Diagram labels: Goal, Literature, Plan & COSHH, Synthesis, Analysis, Digital Model, Information Integration, Report, Knowledge; Smart Laboratory, Smart Storage, Smart Dissemination, Smart HCI, Smart Workflow.

Not just one laboratory but many co-laboratories working together.

SLIDE 6

Typical Laboratory

  • “If only I knew exactly how she did this experiment.”
  • “I know all this supplementary information could be useful, but will people really remember the format? Is it worth all the hassle?”
  • “I wish I could get the numbers from this graph; the PDF is not much use.”
  • “I wish I had recorded things at the start the way I do now…”

SLIDE 7

First, they do an online search.

  • Need to make the data available
  • Need to be able to find it
  • But how to expose it?

SLIDE 8

  • “I am sure we collected that information a few years ago…”
  • “The details should be in her thesis…”
  • “Can you read what he says here…?”
  • “Can you find the file of data that were used to make the plot?”

Some of these problems are due to the lack of information recorded at the time. Others are due to loss of information over time.

SLIDE 9

What are the people up to?

  • Capture data and context
  • People
  • Process
  • Environment

SLIDE 10

Permanent, documented and primary record of laboratory observations

SLIDE 11

Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook. If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA.

SLIDE 12

COSHH (Control of Substances Hazardous to Health)

Leverage off things we already have to do: “We have a cunning plan”

SLIDE 13

Plan, Process, Record

Figure: a synthesis modelled as a plan linked to the process actually carried out and to the record of observations (CombeChem, 30 January 2004; gvh, hrm, gms). Key: Process, Input, Literal, Observation.

Ingredient list:
  • Fluorinated biphenyl 0.9 g
  • Br11OCB 1.59 g
  • Potassium carbonate 2.07 g
  • Butanone 40 ml

Plan:
  1. Dissolve 4-fluorinated biphenyl in butanone
  2. Add K2CO3 powder
  3. Heat at reflux for 1.5 hours
  4. Cool and add Br11OCB
  5. Heat at reflux until completion
  6. Cool and add water (30 ml)
  7. Liquid-liquid extraction: extract with DCM (3 × 40 ml)
  8. Combine organics, dry over MgSO4 & filter
  9. Remove solvent in vacuo
  10. Fuse compound to silica & column in ether/petrol

Recorded process steps follow the plan: add, reflux, cool, liquid-liquid extraction (DCM), dry (MgSO4), filter (Buchner), remove solvent by rotary evaporation, fuse to silica, column chromatography (ether/petrol ratio).

Recorded observations:
  • Weighed 0.9031 g of 4-fluorinated biphenyl, 2.0719 g of K2CO3 and 1.5918 g (apparently Br11OCB)
  • Measured 40 ml of butanone and 30 ml of water
  • Annotations: “Butanone dried via silica column and measured into 100 ml RB flask. Used 1 ml extra solvent to wash out container. Started reflux at 13.30. (Had to change heater stirrer.) Only reflux for 45 min, next step 14:15.” “Inorganics dissolve, 2 layers. Added brine ~20 ml. Organics are yellow solution.” “Washed MgSO4 with DCM ~50 ml.”

Observation types: weight (grammes), measure (ml, drops), annotate (text), temperature (K, °C).

Future questions:
  • Whether to have many subclasses of processes or fewer with annotations
  • How to depict destructive processes
  • How to depict taking lots of samples
  • What is the observation/process boundary? (e.g. an MRI scan)
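One way to make the plan, the process and the record explicit in software is sketched below in Python. Everything in it is an assumption for illustration: the class names (`PlanStep`, `ProcessStep`, `Observation`), their fields and the unit table are hypothetical and are not the CombeChem data model. The unit table only mirrors the observation types listed on the slide (weight in grammes, measure in ml or drops, annotation as text, temperature in K or °C).

```python
# Hypothetical plan/process/record model; not the CombeChem schema.
from dataclasses import dataclass, field
from typing import List, Union

# Observation types from the slide, with the units each type may carry.
ALLOWED_UNITS = {
    "weight": {"g"},
    "measure": {"ml", "drops"},
    "annotate": {"text"},
    "temperature": {"K", "°C"},
}


@dataclass
class Observation:
    kind: str                  # a key of ALLOWED_UNITS
    value: Union[float, str]   # a number, or free text for annotations
    unit: str

    def __post_init__(self) -> None:
        # Reject units that do not belong to this observation type.
        if self.unit not in ALLOWED_UNITS.get(self.kind, set()):
            raise ValueError(f"unit {self.unit!r} not allowed for {self.kind!r}")

    def in_kelvin(self) -> float:
        """Return a temperature observation expressed in kelvin."""
        if self.kind != "temperature":
            raise ValueError("not a temperature observation")
        value = float(self.value)
        return value + 273.15 if self.unit == "°C" else value


@dataclass
class PlanStep:
    number: int
    instruction: str           # what the plan says should happen


@dataclass
class ProcessStep:
    plan: PlanStep             # the plan step this process carries out
    action: str                # e.g. "Add", "Reflux", "Weigh"
    observations: List[Observation] = field(default_factory=list)


# Link the first plan step to what was actually observed at the bench.
step1 = PlanStep(1, "Dissolve 4-fluorinated biphenyl in butanone")
record = ProcessStep(step1, "Add", [
    Observation("weight", 0.9031, "g"),
    Observation("measure", 40, "ml"),
    Observation("annotate", "Butanone dried via silica column", "text"),
])
print(record.plan.instruction, len(record.observations))
```

Because the unit check runs in `__post_init__`, a mislabelled observation such as `Observation("weight", 40, "ml")` fails at the moment it is recorded rather than years later during curation.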

SLIDE 14

Figure (detail of the previous slide): the first three plan steps (dissolve 4-fluorinated biphenyl in butanone; add K2CO3 powder; heat at reflux for 1.5 hours) linked to the ingredient list (fluorinated biphenyl 0.9 g, Br11OCB 1.59 g, potassium carbonate 2.07 g, butanone 40 ml), to the recorded observations (weighed 0.9031 g and 2.0719 g; measured 40 ml) and to the text annotation: “Butanone dried via silica column and measured into 100 ml RB flask. Used 1 ml extra solvent to wash out container. Started reflux at 13.30. (Had to change heater stirrer.) Only reflux for 45 min, next step 14:15.”

SLIDE 15

Smart Laboratory Spaces

Pub-sub (publish/subscribe) systems provide a flexible and extensible approach to distributing real-time laboratory monitoring data and archiving it.
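The slide does not say which pub-sub system was used. As a hedged illustration of why the pattern suits laboratory monitoring, the Python sketch below implements a tiny in-process broker: sensors publish readings to topics, and independent subscribers (an archiver, an alarm) receive them without the publisher knowing they exist. The `Broker` class, the topic names, the `/#` prefix wildcard and the 8 °C limit are assumptions for illustration; a real deployment would use a networked message broker instead.

```python
# Hypothetical in-process publish/subscribe broker for lab readings.
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List, Tuple

Message = Dict[str, object]
Handler = Callable[[str, Message], None]


class Broker:
    def __init__(self) -> None:
        # Topic pattern -> subscriber callbacks registered for it.
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, pattern: str, handler: Handler) -> None:
        """Subscribe to an exact topic, or to every topic under a prefix ending in '/#'."""
        self._subs[pattern].append(handler)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver a message to all matching subscribers; return how many received it."""
        delivered = 0
        for pattern, handlers in self._subs.items():
            exact = pattern == topic
            prefix = pattern.endswith("/#") and topic.startswith(pattern[:-1])
            if exact or prefix:
                for handler in handlers:
                    handler(topic, message)
                    delivered += 1
        return delivered


archive: List[Tuple[str, Message]] = []
alerts: List[str] = []


def archiver(topic: str, message: Message) -> None:
    # Keep every reading for the long-term record.
    archive.append((topic, message))


def fridge_alarm(topic: str, message: Message) -> None:
    # Flag readings above an assumed 8 °C limit.
    if float(message["value"]) > 8.0:
        alerts.append(topic)


broker = Broker()
broker.subscribe("lab/#", archiver)
broker.subscribe("lab/fridge1/temperature", fridge_alarm)

broker.publish("lab/fridge1/temperature", {"value": 4.2, "unit": "°C"})
broker.publish("lab/fridge1/temperature", {"value": 9.1, "unit": "°C"})
broker.publish("lab/balance/weight", {"value": 0.9031, "unit": "g"})
print(len(archive), alerts)  # 3 ['lab/fridge1/temperature']
```

Adding a new consumer, for example a live display, takes one more `subscribe` call; neither the sensors nor the archiver change, which is the kind of flexibility and extensibility the slide points to.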

SLIDE 16

But what about the laboratory environment?

“I just realized, Howard, that everything in this apartment is more sophisticated than we are”

SLIDE 17

Semantic DataGrid

CombeChem used, tested & strained the Semantic Web for:

  • Enhanced (annotated) DataGrid over multiple diverse stores
  • Storage of provenance information
  • Some data storage
  • Annotated multimedia streams
  • Units & Properties Ontology
  • Multiple triple stores
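To show what storing provenance alongside data in a triple store can look like, here is a toy in-memory store in Python with a wildcard pattern query. It is a sketch under stated assumptions, not CombeChem's implementation: the `TripleStore` class and the `ex:` identifiers are hypothetical, and the `prov:` predicate names are borrowed from the W3C PROV-O vocabulary, which postdates this talk. A real system would use an RDF library and a proper triple store.

```python
# Toy in-memory triple store; class and identifiers are hypothetical.
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TripleStore:
    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""
        for triple in self._triples:
            if all(q is None or q == v for q, v in zip((s, p, o), triple)):
                yield triple


store = TripleStore()
# The measurement itself, with its unit.
store.add("ex:obs42", "ex:value", "0.9031")
store.add("ex:obs42", "ex:unit", "ex:gram")
# Provenance: which activity produced it, and which agents took part.
store.add("ex:obs42", "prov:wasGeneratedBy", "ex:weighing7")
store.add("ex:weighing7", "prov:wasAssociatedWith", "ex:balance3")
store.add("ex:weighing7", "prov:wasAssociatedWith", "ex:chemistA")

# Follow the provenance chain from the observation to the agents involved.
activity = next(store.match("ex:obs42", "prov:wasGeneratedBy"))[2]
agents = sorted(o for _, _, o in store.match(activity, "prov:wasAssociatedWith"))
print(agents)  # ['ex:balance3', 'ex:chemistA']
```

The same `match` call answers both "what was measured?" and "where did this number come from?", which is the point of keeping data and provenance in one graph.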

SLIDE 18

Laboratory “Blogs”

  • Laboratory notebook is a blog
  • Encourage and facilitate collaboration
  • Need a data repository behind the blog
  • R4L
  • E-Bank
  • Flexible
  • Service-oriented approach being developed
  • A VRE (virtual research environment)

SLIDE 19

Instrument Blog

‘Blog-jects’

SLIDE 20

The ‘Scientific Blog’ is being tried in an attempt to combine laboratory notebooks and publication

SLIDE 21

Format Issues – everyday and for the long term

SLIDE 22

Note the use of “YouTube”

An experiment that failed… Publishable? Useful?

SLIDE 23

Record the ‘Scientific Conversation’: this part of the record often exists only in the ‘grey literature’.

CoAKTing, Memetic

SLIDE 24

Laboratory IRs (institutional repositories) and Information Management

SLIDE 25

Repositories

SLIDE 26

Validation

  • Increasing the value of data
  • How to bring all the necessary information together to enable appropriate validation
  • Increasingly difficult & expensive to achieve
  • Need provenance and context
  • Essential step; otherwise it is just a collection of items

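A minimal sketch of validation that brings a value together with its context follows: it checks that a recorded weighing carries provenance and that it agrees with the planned quantity. The function name, the required provenance fields and the 5 % tolerance are all assumptions for illustration, not a published validation rule.

```python
# Hypothetical validation rule combining a value with its provenance.
from typing import Dict, List

# Provenance fields assumed to be required; a real policy would differ.
REQUIRED_PROVENANCE = ("recorded_by", "recorded_at", "instrument")


def validate(record: Dict[str, object], planned: float,
             tolerance: float = 0.05) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems: List[str] = []
    # Without provenance the value is just one more item in a collection.
    for key in REQUIRED_PROVENANCE:
        if not record.get(key):
            problems.append(f"missing provenance: {key}")
    # Compare what was recorded against what the plan asked for.
    value = float(record["value"])
    if abs(value - planned) > tolerance * planned:
        problems.append(f"value {value} deviates from planned {planned}")
    return problems


# Illustrative records; names and timestamp are made up.
good = {"value": 2.0719, "unit": "g", "recorded_by": "chemistA",
        "recorded_at": "2004-01-30T13:00", "instrument": "balance3"}
bare = {"value": 1.2, "unit": "g"}
print(validate(good, planned=2.07))  # []
print(validate(bare, planned=2.07))
```

The 2.0719 g weighing passes against a planned 2.07 g, while the bare 1.2 g value fails all three provenance checks and the quantity check.
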
SLIDE 27

Why? Publishing Data and Information Loss

SLIDE 28

  • SVG “active” graphics
  • Link to data: follow links back to the raw data archive
  • Link to simulation: full simulation data archived in BioSimGrid
  • R4L paper organized using RDF
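To illustrate the idea of an “active” graphic, the Python sketch below writes an SVG scatter plot in which every point is wrapped in an SVG `<a>` link pointing back to the raw data it came from. The function name, the plot layout and the `example.org` archive URLs are placeholders; the links use the `xlink:href` attribute of SVG 1.1 (SVG 2 also accepts a plain `href`).

```python
# Hypothetical "active" SVG plot: each point links back to its raw data.
from typing import List, Tuple
from xml.sax.saxutils import quoteattr

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"


def active_scatter(points: List[Tuple[float, float, str]],
                   width: int = 200, height: int = 100) -> str:
    """Render (x, y, source_uri) points as SVG circles wrapped in links."""
    xs = [x for x, _, _ in points]
    ys = [y for _, y, _ in points]

    def scale(v: float, lo: float, hi: float, size: int) -> float:
        # Map the data range onto [10, size - 10]; a flat range sits mid-axis.
        if hi == lo:
            return size / 2
        return 10 + (v - lo) / (hi - lo) * (size - 20)

    parts = [f'<svg xmlns="{SVG_NS}" xmlns:xlink="{XLINK_NS}" '
             f'width="{width}" height="{height}">']
    for x, y, uri in points:
        cx = scale(x, min(xs), max(xs), width)
        cy = height - scale(y, min(ys), max(ys), height)  # SVG y axis points down
        # quoteattr escapes the URI and adds the surrounding quotes.
        parts.append(f'  <a xlink:href={quoteattr(uri)}>'
                     f'<circle cx="{cx:.1f}" cy="{cy:.1f}" r="3"/></a>')
    parts.append("</svg>")
    return "\n".join(parts)


svg = active_scatter([
    (1.0, 2.0, "https://example.org/archive/run-001.csv"),
    (2.0, 3.5, "https://example.org/archive/run-002.csv"),
])
print(svg)
```

Opened in a browser, clicking a point follows its link to the archived file; the same approach could point at a simulation archive instead of a CSV file.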

SLIDE 29

Access to information requires crossing administrative domains

Diagram labels: Researcher, Research Group, Institution, National Archive, International Database

SLIDE 30

Subversive and furtive sharing & exploitation of data in virtual space

Diagram labels: Data, CAS, RDF, OAI, Taxi, E-Labs, user, Digital Repository

SLIDE 31

He is charged with expressing contempt for meta-data

SLIDE 32

Metadata Lifecycle

  • Creation and maintenance of metadata
  • Need a metadata infrastructure as well as a data infrastructure
  • Capture process as well as results
  • Automatic metadata generation when possible
  • Human annotation will always be needed
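The split between automatic metadata and human annotation can be sketched in a few lines of Python: the machine records what it can observe about a data file (name, size, checksum, modification time) and people append the context it cannot infer. The function names and metadata fields are hypothetical; a real system would map them onto an agreed metadata schema.

```python
# Hypothetical metadata capture: automatic fields plus human annotations.
import hashlib
import os
import tempfile
from datetime import datetime, timezone
from typing import Any, Dict


def auto_metadata(path: str) -> Dict[str, Any]:
    """Capture metadata a machine can generate without asking anyone."""
    with open(path, "rb") as fh:
        digest = hashlib.sha256(fh.read()).hexdigest()
    info = os.stat(path)
    return {
        "filename": os.path.basename(path),
        "size_bytes": info.st_size,
        "sha256": digest,  # fixity value for later integrity checks
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "annotations": [],  # filled in by people via annotate()
    }


def annotate(meta: Dict[str, Any], author: str, text: str) -> None:
    """Append human context that automatic capture cannot supply."""
    meta["annotations"].append({"author": author, "text": text})


# Demonstrate on a throwaway data file.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"time,temperature\n0,21.5\n")
meta = auto_metadata(tmp.name)
annotate(meta, "chemistA", "Had to change heater stirrer before this run")
os.remove(tmp.name)
print(meta["size_bytes"], len(meta["annotations"]))  # 24 1
```

Because the checksum is generated at capture time, a curator can later confirm that the file has not changed, while the annotation preserves the kind of context that still needs a person.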

SLIDE 33

Plans

  • Plans are useful
  • This is the way things are supposed to be done
  • The plan provides a digital context, so it increases the value of planning
  • Key to our ‘Smart Lab’ approach…
  • Is it the best way?

SLIDE 34

Who is responsible?

  • Context is crucial for curation
  • Every person, at each step of the process of converting data to knowledge
  • Everyone needs to consider future access to this information, by themselves and by others

SLIDE 35

Information providers and information consumers are the same people: if we can ‘talk’ to ourselves efficiently over time, then that is a good start to being able to ‘talk’ to others.

SLIDE 36

“All I am saying is that now is the time to develop the technology to deflect an asteroid.”

We must speed up the knowledge discovery process.

SLIDE 37

PEOPLE

  • Southampton ECS, MATHS & CHEMISTRY
  • IT-INNOVATION
  • BRISTOL
  • UKOLN
  • CCLRC
  • INDIANA
  • SYDNEY
  • MANCHESTER
  • EPSRC e-Science & Chemistry Programmes
  • JISC e-Infrastructure
  • DTI
  • See web site for full details and links
  • www.combechem.org