Data Farming: Getting the Most Out of Moore’s Law and Cluster Computing

Data Mining vs. Data Farming
• Miners seek valuable buried nuggets
  - Miners have no control over what’s there or how hard it is to separate it out
  - Data Mining seeks valuable information buried within massive amounts of data
• Farmers cultivate to maximize yield
  - Farmers manipulate the environment to their advantage: pest control, irrigation, fertilizer, etc.
  - Data Farming manipulates simulation models to advantage with designed experimentation

Simulation in DoD
• DoD uses complex, high-dimensional simulation models as an important tool in its decision-making processes for diverse areas such as logistics, humanitarian aid, peace support operations, anti-piracy and anti-terrorist efforts, future force planning, and combat modeling
• Many simulations involve dozens, hundreds, or thousands of “factors” that can be set to different levels

Abstracting Simulation
[Diagram: Inputs → Simulation Model → Outputs]
• A computer simulation transforms inputs to outputs
• Pareto Principle - a small subset of the inputs dominates in determining the outputs

Design of Experiments
“The idea behind [simulation]…is to [replace] theory by experiment whenever the former falters”
  - Hammersley and Handscomb

But simulation experiments are different...

Typical assumptions for physical experiments
  - Small/moderate # of factors
  - Univariate response
  - Homogeneous error
  - Linear
  - Sparse effects
  - Higher order interactions negligible
  - Normal errors

Characteristics of typical simulation models
  - Large # of factors
  - Many output measures of interest
  - Heterogeneous error
  - Non-linear
  - Many significant effects
  - Significant higher order interactions
  - Varied error structure

Why Do We Need DOE?
Without a good plan for changing multiple factors simultaneously:
• We limit the insights possible (can’t “untangle” effects)
• Haphazardly choosing scenarios can use up a lot of time without yielding answers to the fundamental questions

A Simple Example: Capture the Flag

Speed   Stealth   Success?
Low     Low       No
High    High      Yes

[Figure: the two runs plotted with Speed on the horizontal axis and Stealth on the vertical axis]

Which is more important, stealth or speed?
No way to tell! The factors are “confounded”
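One way to see the confounding numerically is to code Low as 0 and High as 1 and compare the factor columns. The sketch below is illustrative only: the 0/1 coding and all variable names are assumptions, not part of the original presentation.

```python
# Runs from the Capture the Flag table, coded Low = 0, High = 1 (assumed coding).
runs = [
    {"speed": 0, "stealth": 0, "success": 0},
    {"speed": 1, "stealth": 1, "success": 1},
]

speed = [r["speed"] for r in runs]
stealth = [r["stealth"] for r in runs]
success = [r["success"] for r in runs]

# The two factor columns are identical in this design, so a rule based on
# speed alone matches the outcomes exactly as well as one based on stealth alone.
columns_identical = speed == stealth
speed_explains = speed == success
stealth_explains = stealth == success

print(columns_identical, speed_explains, stealth_explains)
```

Because the two columns never differ, no analysis of these two runs can attribute the outcome to one factor rather than the other.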

One-at-a-Time Variation?

Speed   Stealth   Success?
Low     Low       No
High    Low       No
Low     High      No

[Figure: the three runs plotted with Speed on the horizontal axis and Stealth on the vertical axis]

If we vary Speed and Stealth separately, we (incorrectly) conclude neither contributes to success!

One-at-a-Time Variation? No!

Speed   Stealth   Success?
Low     Low       No
High    Low       No
Low     High      No

[Figure: runs plotted with Speed on the horizontal axis and Stealth on the vertical axis]

By varying Speed and Stealth together rather than separately, we see there is an “interaction”
This is a “factorial” or “gridded” design
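A minimal sketch of a 2x2 factorial analysis may help make the “interaction” concrete. It assumes a hypothetical response rule, `success`, in which the flag is captured only when both factors are High; that rule is consistent with the tables on these slides (including the High/High = Yes run from the Capture the Flag example) but is not given in the presentation. The -1/+1 coding and the `effect` helper are likewise illustrative.

```python
from itertools import product

def success(speed, stealth):
    # Hypothetical response rule (an assumption): success only when
    # both factors are at their High (+1) level.
    return 1 if speed == 1 and stealth == 1 else 0

# Full 2x2 factorial: every combination of Low (-1) and High (+1).
design = list(product([-1, 1], repeat=2))
y = [success(sp, st) for sp, st in design]

def effect(contrast):
    # Mean response where the contrast is +1 minus mean where it is -1.
    hi = [yi for c, yi in zip(contrast, y) if c == 1]
    lo = [yi for c, yi in zip(contrast, y) if c == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

speed_effect = effect([sp for sp, st in design])
stealth_effect = effect([st for sp, st in design])
interaction = effect([sp * st for sp, st in design])

print(speed_effect, stealth_effect, interaction)
```

Under this assumed rule the interaction contrast is as large as either main effect, which is exactly what one-at-a-time variation from the Low/Low baseline cannot detect.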

Finer Grids
• Which output would you prefer to see?
  [Figure: two panels, each with Speed on one axis and Stealth on the other]
• The fly in the ointment - Studying two factors at this level of detail requires 11 x 11 = 121 experiments. Three factors would take 11 x 11 x 11 = 1331 experiments.

Factorial Designs grow exponentially with the number of factors!
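The growth in run counts can be tabulated directly. This is a small illustrative sketch; the function name `factorial_runs` is an assumption introduced here.

```python
def factorial_runs(levels, factors):
    # A full grid needs one run for every combination of factor levels.
    return levels ** factors

# 11 levels per factor, as on the slide, for an increasing number of factors.
for k in (1, 2, 3, 5, 10):
    print(k, factorial_runs(11, k))
```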

How Bad is That?
• Consider a model with 100 factors
• Study each factor at only two levels

This would require $2^{100}$ experiments
$2^{100} \approx 10^{30}$, i.e., a “one” followed by thirty zeros!

If we could perform one billion experiments per second and started running experiments at the big bang, we would have completed less than 1/2500th of the total number of experiments!
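The 1/2500 figure can be checked with a few lines of arithmetic. The sketch below assumes an age of the universe of about 13.8 billion years and a 365.25-day year; neither value is stated in the presentation.

```python
total_runs = 2 ** 100                    # two levels for each of 100 factors

AGE_OF_UNIVERSE_YEARS = 13.8e9           # assumed value, not from the slides
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed year length
RUNS_PER_SECOND = 1e9                    # one billion experiments per second

completed = RUNS_PER_SECOND * AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR
fraction = completed / total_runs

print(f"total runs: {total_runs:.3e}")
print(f"fraction completed: {fraction:.3e} (1/2500 = {1 / 2500:.3e})")
```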

Can Moore’s Law Save Us?
• Moore’s Law is not a law - it is an observation that computing power has maintained an exponential growth rate
• In recent years, this has produced “petaflop” computers
  - Petaflop = 1000 trillion ops/second
  - Cost of “Roadrunner” = $133 million
• Using the Roadrunner supercomputer would reduce the time required for our experiment to a mere 40 million years
• This is better, but still not good enough to be of practical use
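The 40-million-year estimate appears to follow if each experiment is treated as costing a single operation on a one-petaflop machine. That one-operation-per-experiment reading is an assumption made here to reproduce the number; real simulation runs would cost far more.

```python
total_runs = 2 ** 100                    # experiments in the 100-factor, two-level design

OPS_PER_SECOND = 1e15                    # one petaflop = 1000 trillion ops/second
SECONDS_PER_YEAR = 365.25 * 24 * 3600    # assumed year length

# Assumption: one experiment per operation (a best case for the hardware).
years = total_runs / OPS_PER_SECOND / SECONDS_PER_YEAR

print(f"{years / 1e6:.1f} million years")
```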

We Need New Types of Designs
• Factorial (gridded) designs are most familiar
• We have focused on Latin hypercubes and sequential approaches
[Figure: chart of design types; one region is labeled “Efficient R5 FF and CCD”]

Nearly Orthogonal Latin Hypercubes
[Figure: scatterplot matrix of factors A through G, with each axis running from -1.0 to 1.0]
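As a rough illustration of the idea, the sketch below builds a plain random Latin hypercube and reports its largest pairwise column correlation. It is not the nearly orthogonal construction itself, which chooses the column orderings so that those correlations are close to zero. The 17-run size, the seed, and the function names are assumptions for illustration; the seven factors mirror the A through G labels in the figure.

```python
import random

def latin_hypercube(n_runs, n_factors, seed=0):
    # Each factor takes each of n_runs equally spaced levels in [-1, 1]
    # exactly once; the columns are shuffled independently of one another.
    rng = random.Random(seed)
    levels = [-1 + 2 * i / (n_runs - 1) for i in range(n_runs)]
    columns = []
    for _ in range(n_factors):
        col = levels[:]
        rng.shuffle(col)
        columns.append(col)
    # Transpose so that each row is one design point (one simulation run).
    return [list(row) for row in zip(*columns)]

def correlation(x, y):
    # Pearson correlation between two equal-length columns.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def max_abs_correlation(design):
    # Largest absolute correlation over all pairs of factor columns.
    cols = list(zip(*design))
    return max(
        abs(correlation(cols[i], cols[j]))
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )

design = latin_hypercube(n_runs=17, n_factors=7)
print(len(design), "runs;", "max |correlation| =", round(max_abs_correlation(design), 3))
```

Every factor is sampled at 17 distinct levels using only 17 runs in total, in contrast to the exponential run counts of a gridded design.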
