Data Farming: Getting the Most Out of Moore’s Law and Cluster Computing



slide-1
SLIDE 1

Data Farming

Getting the Most Out of Moore’s Law and Cluster Computing

slide-2
SLIDE 2

Data Mining vs. Data Farming

  • Miners seek valuable buried nuggets
  • Miners have no control over what’s there or how hard it is to separate it out
  • Data Mining seeks valuable information buried within massive amounts of data

  • Farmers cultivate to maximize yield
  • Farmers manipulate the environment to their advantage: pest control, irrigation, fertilizer, etc.
  • Data Farming manipulates simulation models to advantage with designed experimentation

slide-3
SLIDE 3

Simulation in DoD

  • DoD uses complex high-dimensional simulation models as an important tool in its decision-making processes for diverse areas such as logistics, humanitarian aid, peace support operations, anti-piracy & anti-terrorist efforts, future force planning, and combat modeling
  • Many simulations involve dozens, hundreds, or thousands of “factors” that can be set to different levels

slide-4
SLIDE 4

Abstracting Simulation

[Diagram: Inputs → Simulation Model → Outputs]

  • A computer simulation transforms inputs to outputs
  • Pareto Principle - a small subset of the inputs dominates in determining the outputs

slide-5
SLIDE 5

Design of Experiments

“The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters” (Hammersley and Handscomb)

slide-6
SLIDE 6

Design of Experiments

“The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters” (Hammersley and Handscomb)

But simulation experiments are different...

slide-7
SLIDE 7

Design of Experiments

“The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters” (Hammersley and Handscomb)

But simulation experiments are different...

Typical assumptions for physical experiments

  • Small/moderate # of factors
  • Univariate response
  • Homogeneous error
  • Linear
  • Sparse effects
  • Higher order interactions negligible
  • Normal errors

slide-8
SLIDE 8

Design of Experiments

“The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters” (Hammersley and Handscomb)

But simulation experiments are different...

Typical assumptions for physical experiments

  • Small/moderate # of factors
  • Univariate response
  • Homogeneous error
  • Linear
  • Sparse effects
  • Higher order interactions negligible
  • Normal errors

Characteristics of typical simulation models

  • Large # of factors
  • Many output measures of interest
  • Heterogeneous error
  • Non-linear
  • Many significant effects
  • Significant higher order interactions
  • Varied error structure

slide-9
SLIDE 9

Design of Experiments

“The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters” (Hammersley and Handscomb)

But simulation experiments are different...

Typical assumptions for physical experiments

  • Small/moderate # of factors
  • Univariate response
  • Homogeneous error
  • Linear
  • Sparse effects
  • Higher order interactions negligible
  • Normal errors

Characteristics of typical simulation models

  • Large # of factors
  • Many output measures of interest
  • Heterogeneous error
  • Non-linear
  • Many significant effects
  • Significant higher order interactions
  • Varied error structure

slide-10
SLIDE 10

Why Do We Need DOE?

Without a good plan for changing multiple factors simultaneously:

  • We limit the insights possible (can’t “untangle” effects)
  • Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions

slide-11
SLIDE 11

Why Do We Need DOE?

Without a good plan for changing multiple factors simultaneously:

  • We limit the insights possible (can’t “untangle” effects)
  • Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions

A Simple Example: Capture the Flag

slide-12
SLIDE 12

Why Do We Need DOE?

Without a good plan for changing multiple factors simultaneously:

  • We limit the insights possible (can’t “untangle” effects)
  • Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions

A Simple Example: Capture the Flag

Speed   Stealth   Success?
Low     Low       No
High    High      Yes

[Figure: runs plotted on Speed and Stealth axes]

slide-13
SLIDE 13

Why Do We Need DOE?

Without a good plan for changing multiple factors simultaneously:

  • We limit the insights possible (can’t “untangle” effects)
  • Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions

A Simple Example: Capture the Flag

Speed   Stealth   Success?
Low     Low       No
High    High      Yes

[Figure: runs plotted on Speed and Stealth axes]

Which is more important, stealth or speed?

slide-14
SLIDE 14

Why Do We Need DOE?

Without a good plan for changing multiple factors simultaneously:

  • We limit the insights possible (can’t “untangle” effects)
  • Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions

A Simple Example: Capture the Flag

Speed   Stealth   Success?
Low     Low       No
High    High      Yes

[Figure: runs plotted on Speed and Stealth axes]

Which is more important, stealth or speed?

No way to tell! The factors are “confounded”
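The confounding in the two-run example can be made concrete with a minimal sketch. It assumes the two runs shown in the table (Low/Low fails, High/High succeeds); the helper name `effect` and the 0/1 coding are illustrative choices, not part of the original slides.

```python
# Two runs from the Capture the Flag example: Speed and Stealth change together.
LOW, HIGH = 0, 1
runs = [
    {"speed": LOW, "stealth": LOW, "success": 0},
    {"speed": HIGH, "stealth": HIGH, "success": 1},
]


def effect(runs, factor):
    """Mean success at the high level minus mean success at the low level."""
    high = [r["success"] for r in runs if r[factor] == HIGH]
    low = [r["success"] for r in runs if r[factor] == LOW]
    return sum(high) / len(high) - sum(low) / len(low)


speed_effect = effect(runs, "speed")
stealth_effect = effect(runs, "stealth")

# The two factor columns are identical, so any estimate computed from one
# column is necessarily the same as the estimate computed from the other.
columns_identical = all(r["speed"] == r["stealth"] for r in runs)

print(speed_effect, stealth_effect, columns_identical)
```

Because the Speed and Stealth columns never differ, the data cannot attribute the change in outcome to either factor alone.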

slide-15
SLIDE 15

One-at-a-Time Variation?

slide-16
SLIDE 16

One-at-a-Time Variation?

Speed   Stealth   Success?
Low     Low       No
High    Low       No
Low     High      No

[Figure: runs plotted on Speed and Stealth axes]

slide-17
SLIDE 17

One-at-a-Time Variation?

Speed   Stealth   Success?
Low     Low       No
High    Low       No
Low     High      No

[Figure: runs plotted on Speed and Stealth axes]

If we vary Speed and Stealth separately, we (incorrectly) conclude neither contributes to success!

slide-18
SLIDE 18

One-at-a-Time Variation? No!

Speed   Stealth   Success?
Low     Low       No
High    Low       No
Low     High      No

[Figure: runs plotted on Speed and Stealth axes]

slide-19
SLIDE 19

One-at-a-Time Variation? No!

Speed   Stealth   Success?
Low     Low       No
High    Low       No
Low     High      No

[Figure: runs plotted on Speed and Stealth axes]

slide-20
SLIDE 20

One-at-a-Time Variation? No!

Speed   Stealth   Success?
Low     Low       No
High    Low       No
Low     High      No

[Figure: runs plotted on Speed and Stealth axes]

By varying Speed and Stealth together rather than separately, we see there is an “interaction”

slide-21
SLIDE 21

One-at-a-Time Variation? No!

Speed   Stealth   Success?
Low     Low       No
High    Low       No
Low     High      No

[Figure: runs plotted on Speed and Stealth axes]

By varying Speed and Stealth together rather than separately, we see there is an “interaction”.

This is a “factorial” or “gridded” design.
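The contrast between one-at-a-time variation and a factorial design can be sketched as follows. The response function `success` is an assumption chosen to agree with the tables in the slides (the flag is captured only when both factors are High); it is not the presenters' simulation.

```python
from itertools import product

LEVELS = ("Low", "High")


def success(speed, stealth):
    # Assumed response, consistent with the slide tables:
    # success requires both Speed and Stealth to be High.
    return speed == "High" and stealth == "High"


# One-at-a-time: start at the baseline and change a single factor per run.
one_at_a_time = [("Low", "Low"), ("High", "Low"), ("Low", "High")]

# Full factorial (gridded) design: every combination of (speed, stealth).
factorial = list(product(LEVELS, repeat=2))

oat_results = [success(*run) for run in one_at_a_time]
factorial_results = {run: success(*run) for run in factorial}

# Interaction contrast on the 2x2 grid: the effect of Speed when Stealth is
# High minus the effect of Speed when Stealth is Low. Nonzero means the
# effect of one factor depends on the level of the other.
y = {run: int(ok) for run, ok in factorial_results.items()}
interaction = (y[("High", "High")] - y[("Low", "High")]) - (
    y[("High", "Low")] - y[("Low", "Low")]
)

print(oat_results)
print(factorial_results)
print(interaction)
```

Under this assumed response, the one-at-a-time runs all fail, while the fourth run of the grid (High, High) is the only one that reveals the interaction.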

slide-22
SLIDE 22

Finer Grids

  • Which output would you prefer to see?
  • The fly in the ointment - Studying two factors at this level of detail requires 11x11=121 experiments. Three factors would take 11x11x11=1331 experiments.

[Figure: two plots with Speed and Stealth axes]

slide-23
SLIDE 23

Finer Grids

  • Which output would you prefer to see?
  • The fly in the ointment - Studying two factors at this level of detail requires 11x11=121 experiments. Three factors would take 11x11x11=1331 experiments.

[Figure: two plots with Speed and Stealth axes]

Factorial Designs grow exponentially with the number of factors!

slide-24
SLIDE 24

How Bad is That?

  • Consider a model with 100 factors
  • Study each factor at only two levels

This would require 2^100 experiments.

2^100 ≈ 10^30, i.e., a “one” followed by thirty zeros!

slide-25
SLIDE 25

How Bad is That?

  • Consider a model with 100 factors
  • Study each factor at only two levels

This would require 2^100 experiments.

2^100 ≈ 10^30, i.e., a “one” followed by thirty zeros!

If we could perform one billion experiments per second and had started running experiments at the big bang, we would have completed less than (1/2500)th of the total number of experiments!
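The arithmetic behind the big bang comparison can be checked with a short sketch. The age of the universe (about 13.8 billion years) and the 365.25-day year are assumptions supplied here; the slide states only the rate of one billion experiments per second.

```python
FACTORS = 100
total_runs = 2 ** FACTORS  # two levels per factor, full factorial

# Assumptions (not from the slides): universe age and length of a year.
AGE_OF_UNIVERSE_YEARS = 13.8e9
SECONDS_PER_YEAR = 365.25 * 24 * 3600

RUNS_PER_SECOND = 1e9  # "one billion experiments per second"

completed = RUNS_PER_SECOND * AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR
fraction_done = completed / total_runs

print(f"2^100 is about {total_runs:.3e}")
print(f"fraction completed is about 1/{1 / fraction_done:,.0f}")
```

Under these assumptions the completed fraction comes out somewhat smaller than 1/2500, which agrees with the slide's "less than" claim.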

slide-26
SLIDE 26

Can Moore’s Law Save Us?

  • Moore’s Law is not a law - it is an observation that computing power has maintained an exponential growth rate
  • In recent years, this has produced “petaflop” computers
slide-27
SLIDE 27

Can Moore’s Law Save Us?

Petaflop = 1000 trillion ops/second
Cost of “Roadrunner” = $133 million

  • Moore’s Law is not a law - it is an observation that computing power has maintained an exponential growth rate
  • In recent years, this has produced “petaflop” computers
slide-28
SLIDE 28

Can Moore’s Law Save Us?

Petaflop = 1000 trillion ops/second
Cost of “Roadrunner” = $133 million

  • Moore’s Law is not a law - it is an observation that computing power has maintained an exponential growth rate
  • In recent years, this has produced “petaflop” computers
  • Using the Roadrunner supercomputer would reduce the time required for our experiment to a mere 40 million years
  • This is better, but still not good enough to be of practical use
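The 40 million year figure can be reproduced with a rough sketch. It assumes, optimistically, that each experiment costs a single operation on a one-petaflop machine, and that a year is 365.25 days; neither assumption is stated on the slide.

```python
PETAFLOP = 1e15  # 1000 trillion operations per second
total_runs = 2 ** 100  # full factorial for 100 two-level factors

# Assumption: one experiment per operation, which is far cheaper than
# any real simulation run would be.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years_needed = total_runs / PETAFLOP / SECONDS_PER_YEAR

print(f"about {years_needed / 1e6:.0f} million years")
```

Even under this best-case assumption, raw computing power alone leaves the full factorial out of reach.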
slide-29
SLIDE 29

We Need New Types of Designs

Efficient R5 FF and CCD

slide-30
SLIDE 30

We Need New Types of Designs

Factorial (gridded) designs are most familiar

Efficient R5 FF and CCD

slide-31
SLIDE 31

We Need New Types of Designs

Efficient R5 FF and CCD

slide-32
SLIDE 32

We Need New Types of Designs

We have focused on Latin hypercubes

Efficient R5 FF and CCD

slide-33
SLIDE 33

We Need New Types of Designs

and sequential approaches

Efficient R5 FF and CCD

slide-34
SLIDE 34

We Need New Types of Designs

Efficient R5 FF and CCD

slide-35
SLIDE 35

Nearly Orthogonal Latin Hypercubes

[Figure: pairwise projection plots for factors A through G, axes scaled from 0.0 to 1.0]

slide-36
SLIDE 36

Nearly Orthogonal Latin Hypercubes

[Figure: pairwise projection plots for factors A through G, axes scaled from 0.0 to 1.0]

The pairwise projections for a 17-run, 7-factor orthogonal LH demonstrate

  • Orthogonality, which guarantees the 7 factors are unconfounded
  • Space-filling behavior so there are no large gaps in our exploration

17 total design points!
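To illustrate the Latin hypercube idea, here is a minimal sketch that builds a random 17-run, 7-factor Latin hypercube and measures how far it is from orthogonal. This is plain random sampling, not the nearly orthogonal construction used for the design on the slide; the function names and the seed are illustrative.

```python
import random


def random_latin_hypercube(n_runs, n_factors, seed=0):
    """Return n_runs design points scaled to [0, 1].

    Each factor column is a random permutation of the levels 0..n_runs-1,
    so every factor visits every level exactly once.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_factors):
        levels = list(range(n_runs))
        rng.shuffle(levels)
        columns.append(levels)
    # Transpose columns into rows (one row per run) and rescale.
    return [[col[i] / (n_runs - 1) for col in columns] for i in range(n_runs)]


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


design = random_latin_hypercube(17, 7)
cols = list(zip(*design))

# Largest absolute pairwise correlation between factor columns.
max_abs_corr = max(
    abs(correlation(cols[i], cols[j]))
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
)
print(len(design), "design points; max |correlation| =", round(max_abs_corr, 3))
```

A random Latin hypercube is space-filling in each single factor but typically leaves noticeable correlations between columns; a nearly orthogonal design is one constructed so that this maximum correlation stays close to zero.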

slide-37
SLIDE 37

SEED Center DOE Capabilities

  • NOLH designs
      • identify dominant factors
      • accommodate non-linear behaviors such as curvature, inflection points, tipping points, or thresholds
      • can study up to 29 factors using only 257 design points
  • We have developed new designs that can study all main effects and two-way interactions for up to 443 factors
  • Adaptive sequential methods that can study thousands of factors are under development and testing
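For a sense of scale, the sketch below compares the 257-point design for 29 factors against a two-level full factorial, and counts the main effects and two-way interactions among 443 factors. The comparison baseline (a two-level grid) is an assumption chosen for illustration; the slide does not name one.

```python
from math import comb

# 29 factors: 257 design points versus a two-level full factorial.
nolh_factors, nolh_points = 29, 257
two_level_grid = 2 ** nolh_factors
reduction = two_level_grid / nolh_points

# 443 factors: number of main effects plus two-way interactions.
big_factors = 443
n_main = big_factors
n_two_way = comb(big_factors, 2)
n_effects = n_main + n_two_way

print(f"two-level grid: {two_level_grid:,} runs vs {nolh_points} design points")
print(f"effects to estimate for {big_factors} factors: {n_effects:,}")
```

The second count shows why dedicated designs are needed at that scale: the number of effects to estimate grows quadratically with the number of factors.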

slide-38
SLIDE 38

High Performance Computing

  • Data Farming techniques bring run requirements back to a feasible range
      • on a single computer total run times can be measured in days or weeks
      • with high performance computing this improves to hours or days
  • Can generate massive amounts of data
      • bandwidth is a significant consideration
slide-39
SLIDE 39

Data Farming in Practice

ASC-U: Assignment Scheduling Capability for UAVs
Major Christopher J Nannini

15 Payloads and Sensors

26 factors, assessing UAV operating characteristics, payload & sensor packages, and decision options, were varied by ±20%.

Without data farming techniques, the study would have required over 9,000 centuries to complete.

The results of this study influenced DoD’s decision to cut two proposed lines of UAVs, yielding a multi-billion dollar savings.

slide-40
SLIDE 40

Data Farming in Practice

Frigate Defense Effectiveness in Asymmetrical Green Water Engagements
KptLt Heiko Abel, German Navy

Background: Piracy has increased off East Africa

  • Piracy is economically rather than politically motivated
  • Pirates mingle freely with fishing vessels, making them hard to identify, and attack opportunistically
  • Pirates may seek to reduce NATO presence via “embarrassment” so they can operate more freely

Research Questions/Issues:

  • Can a pirate swarm attack kill a typical NATO helicopter frigate?
  • Identify weapons mixes and tactics, techniques & procedures which improve the frigate’s survivability regardless of attacker’s weaponry and tactics

slide-41
SLIDE 41

Data Farming in Practice

Frigate Defense Effectiveness in Asymmetrical Green Water Engagements
KptLt Heiko Abel, German Navy

Approach: Used agent-based simulation to model swarm attacks by pirates in small agile fast craft (SAFC)

  • Used Data Farming techniques to study a broad variety of weapons mixes and TTPs for both attackers and defenders
  • Explored 72 factors in 61,680 runs of the simulation

Findings:

  • NATO FFHs ARE vulnerable to swarm attacks
  • A well balanced weapons mix improves survivability, but no single weapons package dominates
  • NATO Rules of Engagement do not adversely affect FFH survivability
slide-42
SLIDE 42

Find Out More

  • Academic papers, theses, and DOE software are available from the SEED Center’s web site: http://harvest.nps.edu
  • The SEED Center runs International Data Farming Workshops (IDFWs) twice a year in conjunction with our international partners and colleagues. The next workshop is here in Monterey, March 21-25. See the web site for details.

slide-43
SLIDE 43

QUESTIONS?