Data Farming
Getting the Most Out of Moore’s Law and Cluster Computing
Data Mining vs. Data Farming
Miners seek valuable buried nuggets
– Miners have no control over what’s there or how hard it is to separate it out
– Data Mining seeks … within massive amounts of data
advantage: pest control, irrigation, fertilizer, etc.
to advantage with designed experimentation
important tool in its decision-making processes for diverse areas such as: logistics, humanitarian aid, peace support operations, anti-piracy & anti-terrorist efforts, future force planning, and combat modeling
“factors” that can be set to different levels
[Figure: Inputs → Outputs]
dominate in determining the outputs
“The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters.” (Hammersley and Handscomb)
Typical assumptions for physical experiments
– Small/moderate number of factors
– Univariate response
– Homogeneous error
– Linear
– Sparse effects
– Higher order interactions negligible
– Normal errors
Characteristics of typical simulation models
– Large number of factors
– Many output measures of interest
– Heterogeneous error
– Non-linear
– Many significant effects
– Significant higher order interactions
– Varied error structure
Without a good plan for changing multiple factors simultaneously:
yielding answers to the fundamental questions
A Simple Example: Capture the Flag
Which is more important, stealth or speed?
Speed   Stealth   Success?
Low     Low       No
High    High      Yes
No way to tell! The factors are “confounded”
Speed   Stealth   Success?
Low     Low       No
High    Low       No
Low     High      No
If we vary Speed and Stealth separately, we (incorrectly) conclude neither contributes to success!
By varying Speed and Stealth together rather than separately, we see there is an “interaction”
This is a “factorial” or “gridded” design.
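The three designs in this example can be compared in a short Python sketch. The `success` rule is an assumption inferred from the slide tables (the flag is captured only when both factors are High), and all function and variable names are illustrative rather than taken from the original study.

```python
from itertools import product

LEVELS = ("Low", "High")

def success(speed, stealth):
    # Assumed response: success only when both factors are High.
    return speed == "High" and stealth == "High"

# Design 1: both factors change together, so their effects are confounded.
confounded = [("Low", "Low"), ("High", "High")]

# Design 2: one factor at a time, starting from the (Low, Low) baseline.
one_at_a_time = [("Low", "Low"), ("High", "Low"), ("Low", "High")]

# Design 3: full factorial (gridded) design over every combination of levels.
factorial = list(product(LEVELS, repeat=2))

def run(design):
    """Return (speed, stealth, outcome) for every design point."""
    return [(speed, stealth, success(speed, stealth)) for speed, stealth in design]

if __name__ == "__main__":
    for name, design in [("confounded", confounded),
                         ("one-at-a-time", one_at_a_time),
                         ("factorial", factorial)]:
        print(name)
        for speed, stealth, ok in run(design):
            print(f"  Speed={speed:<4} Stealth={stealth:<4} Success={'Yes' if ok else 'No'}")
```

Only the factorial design contains all four combinations, which is what allows the interaction to be separated from the individual effects.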
11 × 11 = 121 experiments. Three factors would take 11 × 11 × 11 = 1,331 experiments.
[Figure: gridded design points plotted on Speed and Stealth axes]
Factorial Designs grow exponentially with the number of factors!
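The growth is easy to check numerically. A minimal sketch, assuming every factor is explored at the same number of levels (11, matching the 121-run and 1,331-run counts quoted here); `factorial_runs` is a hypothetical helper name.

```python
def factorial_runs(levels, factors):
    """Number of design points in a full factorial with `levels` levels per factor."""
    return levels ** factors

if __name__ == "__main__":
    for k in (2, 3, 5, 10):
        print(f"{k:>2} factors at 11 levels: {factorial_runs(11, k):,} runs")
```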
power has maintained an exponential growth rate
Petaflop = 1,000 trillion ops/second
Cost of “Roadrunner” = $133 million
required for our experiment to a mere 40 million years
Efficient resolution V fractional factorial (R5 FF) and central composite (CCD) designs
Factorial (gridded) designs are most familiar
We have focused on Latin hypercubes
and sequential approaches
[Figure: pairwise projections (scatterplot matrix) of factors A through G]
The pairwise projections for a 17-run, 7-factor orthogonal Latin hypercube (LH) demonstrate:
– Orthogonality, which guarantees the 7 factors are unconfounded
– Space-filling behavior, so there are no large gaps in our exploration
17 total design points!
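For illustration, a random Latin hypercube with the same dimensions (17 runs, 7 factors) can be built by shuffling the levels independently in each column. This is a sketch of a plain random Latin hypercube, not the orthogonal construction the slide refers to, so its columns will generally show small nonzero correlations. The function names and the seed are arbitrary choices.

```python
import random

def random_latin_hypercube(runs, factors, seed=0):
    """Each column is a random permutation of the levels 0..runs-1."""
    rng = random.Random(seed)
    columns = []
    for _ in range(factors):
        levels = list(range(runs))
        rng.shuffle(levels)
        columns.append(levels)
    # Transpose so that each row is one design point.
    return [list(point) for point in zip(*columns)]

def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def max_abs_correlation(design):
    """Largest absolute pairwise correlation between columns (0 means orthogonal)."""
    cols = list(zip(*design))
    return max(abs(correlation(cols[i], cols[j]))
               for i in range(len(cols)) for j in range(i + 1, len(cols)))

if __name__ == "__main__":
    design = random_latin_hypercube(17, 7)
    for point in design:
        print(point)
    print("max |correlation| between factors:", round(max_abs_correlation(design), 3))
```

Every factor is sampled at all 17 levels exactly once, which gives the space-filling property with only 17 design points; an orthogonal Latin hypercube additionally drives the pairwise correlations to zero.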
inflection points, tipping points, or thresholds
effects and two-way interactions for up to 443 factors
factors are under development and testing
to a feasible range
measured in days or weeks
hours or days
ASC-U: Assignment Scheduling Capability for UAVs
Major Christopher J. Nannini
15 Payloads and Sensors
26 factors, assessing UAV
& sensor packages, and decision
Without data farming techniques, the study would have required over 9,000 centuries to complete. The results of this study influenced DoD’s decision to cut two proposed lines of UAVs, yielding multi-billion-dollar savings.
identify, and attack opportunistically
so they can operate more freely
Frigate Defense Effectiveness in Asymmetrical Green Water Engagements
KptLt Heiko Abel, German Navy
Background: Piracy has increased off East Africa
Research Questions/Issues:
helicopter frigate?
procedures which improve the frigate’s survivability regardless of attacker’s weaponry and tactics
TTPs for both attackers and defenders
Approach: Used agent-based simulation to model swarm attacks by pirates in small agile fast craft (SAFC)
Findings:
package dominates
available from the SEED Center’s web site:
Workshops (IDFWs) twice a year in conjunction with our international partners and colleagues. The next workshop is here in Monterey, March 21-25. See the web site for details.