
Basic Concepts



J. McNames, Portland State University, ECE 4/557 Basic Concepts, Ver. 1.07

Overview
• First-principle vs. data-driven models
• Technology-centered vs. problem-centered courses
• Five problems we will discuss
• Experimental process
• Causality
• Types of variables (measurement scales)
• Frequentist vs. Bayesian interpretations

First Principle Models
• Most of science and engineering is based on first-principle models
• Start with a model
  – Kirchhoff’s laws
  – Newton’s laws of mechanics
  – Maxwell’s equations
• Engineers then apply these to build and analyze systems
• Consider, for example, circuit analysis
• Experimental data
  – Used to verify the underlying first-principle models
  – Used to estimate unknown parameters (e.g., the acceleration of gravity)
• This approach is consistent with the scientific method
• Most common approach for Kalman filters

Scientific Method
1. Observe some aspect of the universe
2. Generate a testable hypothesis that is consistent with observations
3. Generate predictions
4. Test predictions with further observations
5. If consistent, publish. Otherwise, go to 2.

Complex Systems
• Many systems are too complex to analyze using first-principle models
• Examples: generate a model of . . .
  – automobile exhaust temperature
  – employee performance based on survey answers
  – the daily precipitation in Portland
• Alternative
  – Collect data under all normal operating conditions
  – Construct a model from the data
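The First Principle Models slide notes that experimental data are used to estimate unknown parameters such as the acceleration of gravity. The following is a minimal Python sketch of that idea, assuming the free-fall model d = 0.5 g t^2 and hypothetical, simulated drop measurements; the function name `estimate_g` and all data are illustrative, not from the course. Minimizing the squared error over g gives a closed-form least-squares estimate.

```python
import random


def estimate_g(times, distances):
    """Least-squares estimate of g for the free-fall model d = 0.5 * g * t**2.

    Minimizing sum((d_i - 0.5 * g * t_i**2)**2) over g gives the closed form
    g_hat = 2 * sum(d_i * t_i**2) / sum(t_i**4).
    """
    if len(times) != len(distances) or not times:
        raise ValueError("need equal-length, non-empty inputs")
    num = sum(d * t**2 for t, d in zip(times, distances))
    den = sum(t**4 for t in times)
    return 2.0 * num / den


if __name__ == "__main__":
    # Hypothetical "experiment": simulated drop times with simulated measurement noise.
    random.seed(0)
    g_true = 9.81
    times = [0.1 * k for k in range(1, 11)]
    distances = [0.5 * g_true * t**2 + random.gauss(0.0, 0.005) for t in times]
    print(f"estimated g = {estimate_g(times, distances):.3f} m/s^2")
```

The model form comes from first principles; the data only pin down the one unknown constant, which is what distinguishes this approach from the data-driven models discussed next.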

Technology-Centered vs. Problem-Centered
• Many classes in this topic area are technology-centered
  – Neural networks
  – Genetic algorithms
  – Fuzzy logic
  – Evolutionary computing
• These classes discuss concepts and algorithms followed by applications
• An alternative approach is problem-centered
  – Given a problem, what are reasonable and accepted solutions?
• Three of the five problems we will discuss are data-driven

Data-based (Nonparametric) Models
• This is becoming increasingly common
  – Data acquisition systems are becoming cheaper
  – Computational power is increasing
  – Methods of generating models from data are improving
• The ability to extract useful knowledge contained in data is becoming increasingly important
• Amazon.com has 50 terabytes of consumer data
  – What can be done with it?
  – What information is “contained in” the data?

Five Problems for this Class
• Hypothesis testing: Given n groups of data, how do you determine whether they have different statistical properties (e.g., mean, standard deviation, pdf)?
• Modeling: Given a set of input-output data, how do you generate a model?
• Density estimation: Given a set of data, how do you estimate the distribution from which the data were drawn?
• Pattern recognition: Given a set of labeled data divided into a set of classes, how do you classify unlabeled data?
• Optimization: Given a measure of performance and many parameters, how do you adjust the parameters to maximize performance?

Approach
• Most of the methods that we will discuss are derived from statistics
• Will not focus on biologically motivated methods of learning (neural networks)
• These problems have a very deep history
• There are essentially two stages for the three problems of learning
  1. Model construction (from the data)
  2. Prediction (i.e., application of the model to new data)
• These two stages (our focus) are only part of a larger general process
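The two learning stages, model construction and prediction, can be sketched with a simple data-based model. The example below uses k-nearest-neighbor regression, chosen here only as an illustration of a nonparametric model; the class name `KNNRegressor` and the toy data are hypothetical and not taken from the course.

```python
from typing import List, Sequence


class KNNRegressor:
    """Minimal k-nearest-neighbor regressor for scalar inputs (an illustrative choice)."""

    def __init__(self, k: int = 3) -> None:
        if k < 1:
            raise ValueError("k must be at least 1")
        self.k = k
        self.xs: List[float] = []
        self.ys: List[float] = []

    def fit(self, xs: Sequence[float], ys: Sequence[float]) -> "KNNRegressor":
        # Stage 1, model construction: this nonparametric model simply stores the data.
        if len(xs) != len(ys) or len(xs) < self.k:
            raise ValueError("need at least k paired samples")
        self.xs, self.ys = list(xs), list(ys)
        return self

    def predict(self, x: float) -> float:
        # Stage 2, prediction: average the outputs of the k stored inputs closest to x.
        nearest = sorted(range(len(self.xs)), key=lambda i: abs(self.xs[i] - x))[: self.k]
        return sum(self.ys[i] for i in nearest) / self.k


if __name__ == "__main__":
    # Hypothetical training data: y = x**2 sampled at integer x.
    model = KNNRegressor(k=2).fit([0, 1, 2, 3, 4], [0, 1, 4, 9, 16])
    print(model.predict(2.4))
```

Here model construction is trivial (store the data) and nearly all of the work happens at prediction time; a parametric model, such as a fitted line, shifts that balance toward the construction stage.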

Experimental Process
Drawing conclusions from data usually requires the following general experimental procedure:
1. State the problem. Specialized knowledge is usually necessary to have a meaningful problem statement.
2. Hypothesis formulation. What depends on what? Label outputs, pick inputs.
3. Data generation/experimental design. How are the data to be generated? What is measured? How accurate are the measurements?
4. Data collection.

Experimental Process Continued
5. Preprocessing. Outlier detection and removal. Encoding of features. Event detection. Scaling. Input selection.
6. Model estimation. Our focus. Estimate dependencies between inputs and output. Goal: accurate prediction (generalization).
7. Model interpretation and conclusions. What information did the model find in the data? How accurately can the model predict? Which inputs are most important? Which are irrelevant?

Experimental Process Comments
• Most of these steps are application-domain dependent
• They cannot be easily formalized
• They will not generally be discussed in this class

Causality
[Figure: block diagram of a process with observed input variables x_1, ..., x_n, additional observed and unobserved variables z, and output y.]
• Goal: estimate the unknown input-output dependency
• It is easy to confuse modeling with identifying a causal relationship
• The “outputs” (a.k.a. response variables) are not necessarily caused by the inputs
• Example
  – Input: exhaust temperature
  – Output: air/fuel ratio of engine input
• Could you control the air/fuel ratio by adjusting the exhaust temperature?
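Step 5 (preprocessing) lists outlier removal and scaling. A minimal sketch of both follows, assuming a median/MAD "modified z-score" rule with a 3.5 cutoff for outliers (one common convention, not something the slides specify) and standardization to zero mean and unit variance. The function names and the sample readings are hypothetical.

```python
import statistics
from typing import List, Sequence


def remove_outliers(values: Sequence[float], threshold: float = 3.5) -> List[float]:
    """Drop points whose robust z-score (based on median and MAD) exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return list(values)  # no spread to judge against; keep everything
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score for Gaussian data.
    return [v for v in values if abs(0.6745 * (v - med) / mad) <= threshold]


def standardize(values: Sequence[float]) -> List[float]:
    """Scale to zero mean and unit (population) standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    raw = [10.1, 9.9, 10.0, 10.2, 9.8, 55.0]  # hypothetical sensor readings with one glitch
    cleaned = remove_outliers(raw)
    print(cleaned)
    print(standardize(cleaned))
```

The median and MAD are used for outlier detection because, unlike the mean and standard deviation, a single extreme value barely moves them; scaling is applied only after the outlier is gone.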

Causality Continued
• It is common to mistake a statistical relationship between two variables for causality
  – Married men live longer than single men
  – Bush is president, the economy is plummeting
  – Height vs. weight
  – Florida is warm, and Florida has a higher fraction of older people than any other state
  – “Standard & Poor’s 500 index [dropped] to its lowest close since late October 1998, as investors buckled under another heap of corporate profit warnings and steep job cuts.”
• Key point: causality cannot be determined from data analysis alone
• It must be assumed or demonstrated by an argument outside of statistical analysis
• Statistical dependency does not imply causality

Causality Continued 2
[Figure: the process block diagram with observed variables, unobserved variables, and output y.]
• Second example
  – Input: price of a competitor’s stock
  – Output: your price/earnings ratio
• Inputs do not uniquely specify the output
• We could not build a perfect model
• Ideally we would like to know the complete conditional distribution p(y | x), the probability of the output given the input

Causality Continued 3
• Four possibilities
  – Outputs may causally depend on the observed inputs
  – Inputs may causally depend on the observed outputs
  – Input-output dependency may be caused by other (unobserved) factors
  – Input-output correlation is non-causal
• Each must be substantiated by arguments outside of data analysis
• Be careful in interpreting the results of “data mining” or “knowledge discovery”
• Meaningful dependencies can be found only if the problem formulation is meaningful
• Data mining cannot replace commonsense knowledge

Variable Types
There are five types of variables that we will encounter in examples and in your projects:
1. Nominal/Categorical: no order or distance relation
  • Colors
  • Gender
  • Binary variables
  • Names
2. Periodic: values have a distance relation, but no order
  • Days of the week
  • Time of day
3. Ordinal: order relation, but no distance relation
  • Class rank
  • Analog ECE course sequence
  • Gold, silver, and bronze medal positions in the Olympics
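The possibility that an input-output dependency is caused by unobserved factors can be made concrete with a small simulation. In this hypothetical process (not from the slides), an unobserved variable z drives both the input x and the output y, while x has no effect on y. Passive observation shows a strong correlation; setting x ourselves, independently of z, makes it vanish. This mirrors the question of whether adjusting the exhaust temperature could control the air/fuel ratio. The helper names `pearson` and `simulate` are illustrative.

```python
import random
import statistics
from typing import List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Sample Pearson correlation coefficient."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate(n: int, intervene: bool, seed: int = 1) -> Tuple[List[float], List[float]]:
    """Hypothetical process: unobserved z drives x and y; x has no causal effect on y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)            # unobserved factor
        if intervene:
            x = rng.gauss(0.0, 1.0)        # x is set externally, independent of z
        else:
            x = z + rng.gauss(0.0, 0.3)    # passively observed x tracks z
        y = 2.0 * z + rng.gauss(0.0, 0.3)  # y depends only on z, never on x
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    print("observed   corr(x, y):", round(pearson(*simulate(5000, intervene=False)), 2))
    print("intervened corr(x, y):", round(pearson(*simulate(5000, intervene=True)), 2))
```

A model fitted to the observed data would predict y from x quite well, yet it says nothing about what happens when x is manipulated; that distinction has to come from knowledge of the process, not from the data.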

Variable Types Continued
4. Interval/Numeric: order relation and a distance relation
  • Temperature (Celsius and Fahrenheit)
  • Potential energy
  • Voltage reference in an op amp circuit
5. Ratio: values have an order relation and a distance relation. The ratios of numbers are meaningful and there is a natural interpretation of 0. Includes most real-valued cases.
  • Income
  • Mass
  • Speed
  • Current

Frequentist vs. Bayesian Interpretations
• In most cases, we will assume the data have been drawn from a statistical distribution
• We will treat data collection as a random experiment
• This traditional view is called a frequentist interpretation
• Learning amounts to building a model based on available data and a priori knowledge of the problem
• This doesn’t always make sense
  – An economist predicts an 80% chance of recession in 3 months
  – There is no random experiment
• Probability in this case is really a measure of subjective belief
• This is known as the Bayesian interpretation of probabilities
• There is a raging debate among statisticians as to which approach is better
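To contrast the two interpretations numerically, the sketch below assumes a repeated yes/no event with a Binomial likelihood and a Beta(a, b) prior; that conjugate model is an illustrative assumption, not the course's method, and the function names are hypothetical. The frequentist estimate is the observed relative frequency, while the Bayesian posterior mean blends the data with the a priori belief encoded by a and b.

```python
from fractions import Fraction


def frequentist_estimate(successes: int, trials: int) -> Fraction:
    """Relative frequency: the observed fraction of successes."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    return Fraction(successes, trials)


def bayesian_posterior_mean(successes: int, trials: int, a: int = 1, b: int = 1) -> Fraction:
    """Posterior mean under a Beta(a, b) prior on the success probability.

    With a Binomial likelihood the posterior is Beta(a + successes, b + trials - successes),
    whose mean is (a + successes) / (a + b + trials).
    """
    if trials < 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(a + successes, a + b + trials)


if __name__ == "__main__":
    print(frequentist_estimate(3, 3))
    print(bayesian_posterior_mean(3, 3))
    print(bayesian_posterior_mean(3, 3, a=10, b=10))
```

For example, after 3 successes in 3 trials the relative frequency is 1, whereas a uniform Beta(1, 1) prior gives a posterior mean of 4/5 and a stronger Beta(10, 10) prior gives 13/23. As more trials accumulate, the influence of the prior shrinks and the two estimates converge.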
