A Distributed Architecture for Data Mining and Integration



  1. Advanced Data Mining and Integration Research for Europe
 A Distributed Architecture for Data Mining and Integration
 Malcolm Atkinson, Jano van Hemert, Liangxiu Han, Ally Hume, Chee Sun Liew
 www.admire-project.eu
 ADMIRE – Framework 7 ICT 215024


  2. Introduction
 • Motivation
 • Mission & Principal Innovations
 Proposed Architecture
 • High-level overview of the architecture
 • Components of the architecture
 • DMIL
 • User communities and interaction with the system
 • The path to DMI enactment
 Feasibility Study
 • Use case - EURExpressII
 • System walkthrough
 • Research Question
 ADMIRE Project
 ...making data-mining easier
 ADMIRE @ DADC'09, Munich, Germany - June 9, 2009


  3. A Revolution in Science
 http://www.geongrid.org
 http://www.us-vo.org
 http://www.neuropsygrid.org
 http://nctr.pmel.noaa.gov/Dart
 http://esdis.eosdis.nasa.gov
 http://lhc.web.cern.ch/lhc
 http://www.sinapse.ac.uk


  4. Data Driven Science
 “… continuing leadership in science relies increasingly on effective and reliable access to digital scientific data …”
 “… allow the users to identify and access spatial or geographical information from a wide range of sources, … , in an interoperable way for a variety of uses …”


  5. Combinatorial Complexity
 • Data integration
 – precursor to Data Mining from multiple sources
 • Data mining
 – key to learning from today’s wealth of data
 • Growing opportunity and challenge
 – growing number of distributed data
 – growing content and complexity per data source
 – growing number of users


  6. Our Mission
 • Radically improve enactment of Data Mining and data Integration (DMI) processes across heterogeneous and distributed data resources and data mining services.


  7. Principal Innovations
 • De-coupling of the enactment technology from the tools used to prepare data mining and integration (DMI) processes
 • Accommodate independent DMI enactment services, some of which may be tightly coupled with curated data


  8. Separating DMI levels of diversity using DMI canonical language
 Hypothesis: By enforcing logical decoupling, both the tools development and the platform engineering will proceed rapidly and independently


  9. High-level Architecture


  10. Components of the Architecture


  11. DMI Language (DMIL)
 • notation for all DMI requests to a gateway
 • encodes the following:
 – Requests for information about the services, data resources, data collections, defined components and libraries supported by the gateway.
 – Definition, redefinition and withdrawal of any of the above.
 – Submission of requests to enact a specified data mining and integration process.
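
The three kinds of request listed above can be pictured with a small Python model. This is only a sketch under stated assumptions: `RequestKind`, `Gateway` and `handle` are hypothetical names invented for illustration, the registry is an in-memory dictionary, and none of it reflects the actual ADMIRE gateway interface or DMIL syntax.

```python
from dataclasses import dataclass, field
from enum import Enum


class RequestKind(Enum):
    """The three kinds of request the slide says DMIL encodes."""
    INFORMATION = "information"   # ask what the gateway supports
    DEFINITION = "definition"     # define, redefine or withdraw an item
    ENACTMENT = "enactment"       # enact a specified DMI process


@dataclass
class Gateway:
    """Hypothetical in-memory gateway; not the ADMIRE implementation."""
    registry: dict = field(default_factory=dict)

    def handle(self, kind, name=None, body=None):
        if kind is RequestKind.INFORMATION:
            # Report the names of everything currently defined.
            return sorted(self.registry)
        if kind is RequestKind.DEFINITION:
            if body is None:
                # Assumption: a definition request without a body withdraws the item.
                self.registry.pop(name, None)
            else:
                # Defining an existing name redefines it.
                self.registry[name] = body
            return name
        if kind is RequestKind.ENACTMENT:
            if name not in self.registry:
                raise KeyError(f"unknown process: {name}")
            return f"enacting {name}"
        raise ValueError(f"unsupported request kind: {kind}")
```

For example, `Gateway().handle(RequestKind.DEFINITION, "classify", "...")` registers a process under the name `classify`, after which an `INFORMATION` request lists it and an `ENACTMENT` request for that name is accepted.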


  12. User communities
 • Domain Experts: “I recognise gene expression”
 • DADC Engineers: “I can implement and support”
 • DMI Experts: “I know DMI algorithms”


  13. User interaction with DMI systems
 [Figure showing DADC Engineers, Domain Experts and DMI Experts]


  14. The path to DMI enactment
 [Figure showing Domain Experts, DADC Engineers and DMI Experts]


  15. Use case: EURExpressII


  16. Walkthrough: Processing of a DMI Request
 [Figure with the steps: Decide gateway; Validate request; Organise computation; Initiate enactment; Coordinate and Monitor; Terminate the enactment]
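
The stages named on this slide can be read as a simple request lifecycle. The Python sketch below assumes the stages run in the order Decide gateway, Validate request, Organise computation, Initiate enactment, Coordinate and Monitor, Terminate the enactment; that ordering is inferred from the labels and is not confirmed by the slide. `Stage`, `process_request` and the `validate` callback are hypothetical names used only for illustration.

```python
from enum import Enum


class Stage(Enum):
    # Assumed order; the slide only names the stages.
    DECIDE_GATEWAY = 1
    VALIDATE_REQUEST = 2
    ORGANISE_COMPUTATION = 3
    INITIATE_ENACTMENT = 4
    COORDINATE_AND_MONITOR = 5
    TERMINATE_ENACTMENT = 6


def process_request(request, validate):
    """Walk a request through the stages and return the stages visited."""
    visited = []
    for stage in Stage:  # Enum members iterate in definition order
        visited.append(stage)
        if stage is Stage.VALIDATE_REQUEST and not validate(request):
            # In this sketch an invalid request never reaches enactment.
            break
    return visited
```

A request that passes validation visits all six stages, while one that fails stops after `VALIDATE_REQUEST`.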


  17. Walkthrough: Request in DMIL
 /* import components */
 use dmi.rdb.SQLQuery;
 use dmi.samplers.ListRandomSample;
 use dmi.image.ImageRescale;
 ...
 use dmi.classifiers.nFoldValidation;
 use dmi.classifiers.LDAClassifier;
 /* set up and identify instances of the PE */
 SQLQuery sqlQuery = new SQLQuery;
 ListRandomSample listSample = new ListRandomSample;
 TupleProjection tupleProj = new TupleProjection;
 GetFile getFile = new GetFile;
 ImageRescale imageRescale = new ImageRescale;
 MedianFilter medianFilter = new MedianFilter;
 WaveletDecomp wavelet = new WaveletDecomp;
 TupleMerge tupleMerge = new TupleMerge;
 ViaStatus deliver = new ViaStatus;
 String query = "SELECT fileName, . . .
 FROM EURExpress.images, . . .
 WHERE . . . ";


  18. Walkthrough: Request in DMIL
 /* the literal "query" gets connected to sqlQuery's input "expression" */
 |- query -| => expression->sqlQuery;
 /* sqlQuery's output "data" gets connected to listSample's input "dataIn" */
 sqlQuery->data => dataIn->listSample;
 |- 0.01 -| => fraction->listSample;
 Connection c1; listSample->dataOut => c1;
 c1 => filename->getFile;
 c1 => data->tupleProj;
 |- ["date", "assay#", . . . ] -| => columnIds->tupleProj;
 getFile->data => dataIn->imageRescale;
 imageRescale->dataOut => dataIn->medianFilter;
 |- repeat enough < 300, 200 > -| => size->medianFilter;
 medianFilter->dataOut => dataIn->wavelet;
 wavelet->dataOut => dataIn[0]->tupleMerge;
 tupleProj->result => dataIn[1]->tupleMerge;
 Validation val = nFoldValidation (10, LDAClassifier);
 tupleMerge->dataOut => data->val;
 val->results => data->deliver;
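
The connections in this DMIL request form a directed graph of processing elements. As a rough illustration, the Python sketch below transcribes the element-to-element connections from the slide and derives one valid dependency order with the standard library's `graphlib` (Python 3.9 or later). The tuple layout and the `execution_order` helper are assumptions made for this example; the sketch omits literal inputs such as the query string and the `0.01` fraction, and it says nothing about how the ADMIRE enactment engine actually schedules or streams data.

```python
from graphlib import TopologicalSorter

# Each tuple is (source element, output port, destination element, input port),
# transcribed from the DMIL connections on the slide.
CONNECTIONS = [
    ("sqlQuery", "data", "listSample", "dataIn"),
    ("listSample", "dataOut", "getFile", "filename"),    # via connection c1
    ("listSample", "dataOut", "tupleProj", "data"),      # via connection c1
    ("getFile", "data", "imageRescale", "dataIn"),
    ("imageRescale", "dataOut", "medianFilter", "dataIn"),
    ("medianFilter", "dataOut", "wavelet", "dataIn"),
    ("wavelet", "dataOut", "tupleMerge", "dataIn[0]"),
    ("tupleProj", "result", "tupleMerge", "dataIn[1]"),
    ("tupleMerge", "dataOut", "val", "data"),
    ("val", "results", "deliver", "data"),
]


def execution_order(connections):
    """Return one order in which every element follows its upstream elements."""
    sorter = TopologicalSorter()
    for src, _out_port, dst, _in_port in connections:
        # dst consumes output from src, so src must come first.
        sorter.add(dst, src)
    return list(sorter.static_order())


order = execution_order(CONNECTIONS)
print(order)
```

The resulting order starts at `sqlQuery` and ends at `deliver`, with both branches from `listSample` (the image pipeline and the tuple projection) placed before `tupleMerge`.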


  19. Walkthrough: Decide Gateway

