Data Engineer Our Data Science with Significant Statistics, to Enrich Success by Enhancing Trust and Value of Society

Arnold Goodman, Cofounder: Interface Symposia plus Statistical Analysis and Data Mining: The ASA Data Science Journal


SLIDE 1

Did Not “Jack” Good Use His Statistics to Enrich Success of Alan Turing A. I. to Break the German’s Enigma Code, without High Commanders of Germany Learning Code Was Broken and Fixing It? 1

DATA ENGINEER TRUST AND VALUE Why Not Enhance Processes by Our Products, Models by Our Meanings, Algorithms by Our Actions, Data by Our Descriptions and Possibilities by Our Probabilities of Consequences?

INFORMATION WEEK OF FEBRUARY 8, 1993

STEIN SHRINKAGE, SUPERVISED LEARNING, OR AN APPROPRIATE TECHNIQUE MAY BE USED TO MOVE ALGORITHMIC MODEL PREDICTION AND DATA MODEL PREDICTION CLOSER TO EACH OTHER. OTHERS ARE CHALLENGED TO EMPLOY MY ASSOCIATION, EVALUATE ITS RESULTS AS WELL AS DOCUMENT THEIR SITUATIONS, PROBLEMS, IMPROVEMENTS, FINDINGS AND CONCLUSIONS.

Why Not Balance Our Data Science Deduction by Inference, Efficiency by Effectiveness, Findings by Likely Conclusions, Precision by Accuracy, and Speed by Far Improved Quality?

Data Engineering with Statistics to Iteratively Associate Leo Breiman’s Algorithmic Model Prediction with a Likely-Good Data Model Relationship

STEIN SHRINKAGE, SUPERVISED LEARNING, OR AN APPROPRIATE TECHNIQUE MAY BE USED TO MOVE ALGORITHMIC MODEL PREDICTION AND DATA MODEL PREDICTION EVER CLOSER TO EACH OTHER. OTHERS ARE CHALLENGED TO EMPLOY MY ASSOCIATION, EVALUATE ITS RESULTS AS WELL AS DOCUMENT THEIR SITUATIONS, PROBLEMS, IMPROVEMENTS, FINDINGS AND CONCLUSIONS.

Communications of the ACM 58:6, 46, June 2015


InformationWeek, Before 62, February 8, 1993

DATA ENGINEER OUR DATA SCIENCE

SLIDE 2

Data Science Evidence for Data Engineering with Statistics

  • Why Not “Ask Watson or Siri: Artificial Intelligence Is as Elusive as Ever?” 2
  • To Improve “The State of A. I … We Need … More Predictive Models … That Can ‘Routinely Make Predictions’.” 3
  • A. I. has “Thrill of Discovery” and “Science” -- Needs “Data Analysis of Statistics” and “Power of Engineering”. 4
  • The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age – Evaluate A. I. Impacts. 5
  • “Discussed the Problem of Overfitting … in Deep Neural Networks (and) … Techniques to Prevent Overfitting”. 6
  • “95% of Tasks Do Not Require Deep Learning. … In 90% of Cases Generalized Linear Regression Will Do.” 7
  • Does Computing Research Association’s “Incentivizing Quality and Impact (Value)” Not Make a Key Significant Point?
  • Does “The Code Issue” Not Explain the Essential Pros and Cons of Both Codes and Coders in Plain Language? 8
  • “Controlled Experimentation (Is the) … Only Way to Establish Cause and Effect (, Causation and Causality).” 9
  • “Turning Uncertainty into Breakthrough Opportunities” Is the Subtitle of This Insightful and Inspiring Book. 10
  • The “Great Thing … about Data Mining/Data Science Community (Is) … Machine Learners, Statisticians, … .” 11
  • “Statisticians David J. Hand (2nd) and John Elder (5th) within ‘Videolectures’, in Their Number of Viewings.” 12
  • “Hiring Data Scientists”: “Statistics, Machine Learning &/or Data Mining” 13
  • Drucker: “Innovation in Any … Area Tends to Originate Outside the Area.” (Drucker Institute MONDAY*, 7/6/15)
  • Big Data Is a “Major Engineering and Mathematical Challenge”, and So Is a Balanced Analysis in Data Science. 14
  • Stanford University Has Institute for Computational and Mathematical Engineering to Do Its Data Engineering.
  • “Do Not Know: How Algorithms Perform Quantitatively, Good Parameter Settings, Processes Underlying Data, Algorithm (Re)Evaluation/Comparison”, “Matching Patterns to Reality”, “Data Problem”, and “Data Science”. 15

  • PhD Topics: “Pattern(s) in Evolving Data”, “Version Control under Uncertainty” and “Ranking with ER Graphs” 16
SLIDE 3

Data Engineering with Statistics to Enhance Trust and Value

  • Is Computing Not: Deduction, Efficiency, Learning Some Potential Findings, Precision and Speed?
  • Is Statistics Not: Inference, Effectiveness, Likely Conclusions from Findings, Accuracy and Quality?
  • Why Not Data Engineer Our Data Science to Balance Its Deduction by Inference, Efficiency by Effectiveness, Findings by Likely Conclusions, Precision by Accuracy, and Speed by Quality?
  • Did Symposia on the Interface of Computing and Statistics Not Define Data Science to be the Interface with Big Data and Data Science Conceived, Born and Even Matured in Them? 17 + Refs
  • Does Data Science Not Function with: Processes, Models, Algorithms, Data and Possibilities?
  • Does Data Engineering Not Enable: Products, Meanings, Actions, Descriptions and Probabilities?
  • Why Not Data Engineer Trust and Value by Enhancing Processes with Products, Models with Meanings, Algorithms with Actions, Data with Descriptions, and Possibilities with Probabilities?

  • Founding Fathers Engineered Improbable Improvisation of United States (7/3/15 “Charlie Rose”).
  • If Mathematics Yields No Solutions, Data Engineering Iterates and Simulates to Far Smarter Data.
  • My 1960s Data Engineering: Iterative Weighted Least Squares (1st Regression Mixed Model and 1st Hierarchical Model), (1st Text) Relationship Analytics, Experimental and Simulation Analysis, and Model to Guide Negotiators of $1B Incentive Prime Contract to Put Man on Moon. 18 + Refs

  • My Data Engineering: 2012 INFORMS Conference on Business Analytics and Operations Research.
SLIDE 4

  • Trevor Hastie, Rob Tibshirani and Jerry Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction 19
  • Leo Breiman, “Statistical Modeling: The Two Cultures” 20
  • William Cleveland, “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics” 21
  • Padhraic Smyth and I Enriched Computing, Statistics, and Bioinformatics (with Own Day) by Interface ’01. 22
  • Computer Scientists Who Data Engineer with Statistics Include Tom Dietterich, Pedro Domingos, Usama Fayyad, Chandrika Kamath, Vipin Kumar, Zoran Obradovic, Peter Norvig, Padhraic Smyth and Mohammed Zaki.
  • Most Statisticians Who Data Engineer with Statistics Include John Chambers, Dick De Veaux, Bill DuMouchel, Brad Efron, John Elder, Jim Goodnight, David Hand, Alan Izenman, Michael Jordan, Jon Kettenring, David Madigan, Richard Olshen, Art Owen, Daryl Pregibon, Stuart Russell, John Sall, Dan Steinberg, Hal Stern, Joe Verducci, Ed Wegman and Lee Wilkinson – I May Have Missed Some.

2001 Is a Data Engineering and Statistics Year, yet Computing Now Rules Market

“We Must Sit Loosely in the Saddle of the Data.” * “Vaguely Correct Is Better Than Precisely Wrong.” *

Gene Wiley Created 1976 Cartoon for Me.

John Tukey Was the Pioneer of Data Mining, Data Science, and Now Data Engineering.

With Harry Press, “Power Spectral Methods”, NATO Flight Test, 1956

Exploratory Data Analysis, 1977

* Heard by Me During 1958/59 Stanford Statistics Seminar

“The Inevitable Collision Between Statistics and Computation”, 1963 IBM Scientific Computing Sym.

SLIDE 5

Associate Algorithmic Prediction with Likely-Good Data Relationship

  • Leo Breiman Defined Both the Algorithmic Modeling (Computing) Culture and Data Modeling (Statistics) Culture, for Predicting a Process (System) Output Vector Y from Its Input Vector X.
  • An Algorithmic Model Predicts Y, while a Data Model Predicts Y Based on a Linear Relationship.
  • Charles Stein Observed to Me in 1960: “When There Are Two Things That You Know Are True and They Contradict Each Other, You Are About to Learn Something.” – Challenge Is to Learn!
  • If We Desire to Engineer the Value of Y by Varying the Value of X in Addition to Predicting Y, May Associating an Algorithmic Prediction with a Relationship of Y to X Not Be Significant?
  • Might Algorithmic and Data Predictions Becoming Sufficiently Close to Essentially Associate an Algorithmic Prediction with a Likely-Good Data Relationship Not Be Significant Progress?
  • My Complete System Analysis Iteratively Integrates Experimental and Simulation Analysis. 18
  • Why Not Data Engineer Predictions, Performing Specified Iterations to Associate Algorithmic Prediction with a Sufficiently-Close Data Relationship, Which Is Likely Good, by Modifying It?
  • May It Not Likely Yield Integration of Deduction with Inference, Efficiency with Effectiveness, Findings with Likely Conclusions, Precision with Accuracy, and Speed with Quality?
  • Others Are Challenged to Employ My Proposed Association and to Evaluate Their Results as Well as to Document Their Situations, Problems, Improvements, Findings and Conclusions.
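As a minimal sketch of the comparison proposed on this slide, the Python below fits a least-squares line (standing in for a data model) and a k-nearest-neighbour average (standing in for an algorithmic model) to the same toy data, then measures how far apart the two sets of predictions are. The toy data, the choice of k-nearest neighbours, and the names fit_line, knn_predict and rms_gap are all assumptions made for illustration; none of them come from the presentation.

```python
# Hypothetical illustration: a "data model" (least-squares line) versus an
# "algorithmic model" (k-nearest-neighbour average) on the same toy process.

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def knn_predict(xs, ys, x0, k=3):
    """Average y over the k training points nearest to x0."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:k]
    return sum(y for _, y in nearest) / k

def rms_gap(p, q):
    """Root-mean-square distance between two prediction vectors."""
    return (sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)) ** 0.5

# Toy process (assumed): roughly linear output with a small alternating wobble.
xs = [float(i) for i in range(10)]
ys = [2.0 + 0.5 * x + (0.3 if i % 2 else -0.3) for i, x in enumerate(xs)]

a, b = fit_line(xs, ys)                               # data model: Y = a + b*X
data_pred = [a + b * x for x in xs]                   # data-model predictions
algo_pred = [knn_predict(xs, ys, x) for x in xs]      # algorithmic predictions
gap = rms_gap(algo_pred, data_pred)
print(f"intercept={a:.3f} slope={b:.3f} rms gap={gap:.3f}")
```

When the gap is small, the fitted slope can be read as one candidate relationship of Y to X to associate with the algorithmic prediction; when it is large, the two models disagree and the relationship is a candidate for modification.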

SLIDE 6

Iterating Model Predictions: Data → Algorithmic → Data …

Adapting Complete Process (System) Analysis to Iteratively Combine Data Model and Algorithmic Model Predictions

[Diagram: Process (System), with a Comparison of Process Data Model versus Process Algorithmic Model Predictions]

OTHERS ARE CHALLENGED TO EMPLOY MY ASSOCIATION, EVALUATE ITS RESULTS AS WELL AS DOCUMENT THEIR SITUATIONS, PROBLEMS, IMPROVEMENTS, FINDINGS AND CONCLUSIONS.

Data Engineering with Statistics to Iteratively Associate the Algorithmic Prediction with Its Likely-Good Data Relationship

STEIN SHRINKAGE, SUPERVISED LEARNING, OR AN APPROPRIATE TECHNIQUE MAY BE USED TO MOVE ALGORITHMIC MODEL PREDICTION AND DATA MODEL PREDICTION EVER CLOSER TO EACH OTHER.
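One way to picture the Stein shrinkage option is the positive-part James-Stein estimator, which pulls a vector of estimates toward a chosen target. The sketch below shrinks hypothetical algorithmic predictions toward hypothetical data-model predictions. It assumes that the target is treated as fixed, that the noise variance sigma2 is known, and that there are at least three predictions; the numbers and the function name james_stein_toward are invented for illustration and are not the author's procedure.

```python
# Hypothetical illustration of one shrinkage step, not the author's procedure.

def james_stein_toward(z, target, sigma2):
    """Positive-part James-Stein shrinkage of z toward a fixed target.

    Assumptions of this sketch: z has at least 3 components, each with
    known noise variance sigma2, and the target is treated as fixed.
    """
    p = len(z)
    if p < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 components")
    dist2 = sum((zi - ti) ** 2 for zi, ti in zip(z, target))
    if dist2 == 0.0:
        return list(z)  # already at the target, nothing to shrink
    factor = max(0.0, 1.0 - (p - 2) * sigma2 / dist2)
    return [ti + factor * (zi - ti) for zi, ti in zip(z, target)]

def rms_gap(p, q):
    """Root-mean-square distance between two prediction vectors."""
    return (sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)) ** 0.5

# Assumed predictions for the same five inputs.
algo_pred = [2.4, 2.1, 3.6, 3.2, 4.5]   # algorithmic model
data_pred = [2.0, 2.5, 3.0, 3.5, 4.0]   # data model

gap_before = rms_gap(algo_pred, data_pred)
shrunk = james_stein_toward(algo_pred, data_pred, sigma2=0.05)
gap_after = rms_gap(shrunk, data_pred)
print(f"gap before={gap_before:.3f} after={gap_after:.3f}")
```

An iteration in the spirit of Data → Algorithmic → Data might refit the data model to the shrunk predictions and repeat until the gap falls below a chosen tolerance; that stopping rule is likewise an assumption rather than something the slides specify.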

SLIDE 7
  • 1. Andrew Hodges, Alan Turing: The Enigma, 1993 -- With a New Preface, 2014
  • 2. Joab Jackson, “Ask Watson or Siri: Artificial Intelligence Is as Elusive as Ever”, PC World, October 20, 2014
  • 3. Marina Krakovsky, “last byte”, Communications of the ACM 57:9, September 2014
  • 4. Tiffany Trader, “50 Computer Science Schools That Are Changing the World”, HPCwire, November 11, 2014
  • 5. Robert Wachter, The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age, 2015
  • 6. Nikhil Buduma, “Data Science 101: Preventing Overfitting in Neural Networks”, KDnuggets, April 22, 2015
  • 7. Kamil Bartocha, “The Inconvenient Truth About Data Science”, KDnuggets, May 6, 2015
  • 8. Paul Ford and Josh Tyrangiel, “The Issue of Code”, SPECIAL DOUBLE ISSUE, June 15 – June 28, 2015 | Bloomberg.com
  • 9. Thomas H. Davenport, big data @ work: Dispelling the Myths, Uncovering the Opportunities, 2014
  • 10. Ram Charan, The Attacker’s Advantage: Turning Uncertainty into Breakthrough Opportunities, 2015
  • 11. Gregory Piatetsky, “Interview: Pedro Domingos, … KDD 2014 Data Mining/Data Science Innovation Award”, KDnuggets, August 15, 2014
  • 12. Grant Marshall, “Most Viewed Data Mining Talks at Videolectures”, KDnuggets, September 18, 2014
  • 13. Bart Baesens, Richard Weber and Cristian Bravo, “Hiring Data Scientists: What to Look for”, KDnuggets, September 18, 2014
  • 14. Gregory Piatetsky, “Berkeley Michael Jordan says Big Data Is Transformative, Not a Delusion”, KDnuggets, November 7, 2014
  • 15. Albrecht Zimmermann, “The Data Problem in Data Mining”, SIGKDD Explorations 16:2, December 2014
  • 16. Anisoara Nica, et al., “New Research Directions in Knowledge Discovery and Allied Spheres”, SIGKDD Explorations 16:2, December 2014
  • 17. Arnold Goodman, “Evolution of Symposia on the Interface … Defines Data Science to be the Interface”, WIREs Comput Stat, 2014
  • 18. Arnold Goodman, “Data Modeling and Analysis for Users: A Guide to the Perplexed”, AFIPS Conference Proceedings 41, 1972
  • 19. Trevor Hastie, Rob Tibshirani and Jerry Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2001
  • 20. Leo Breiman, “Statistical Modeling: The Two Cultures”, Stat Sci 16:3, 2001
  • 21. William Cleveland, “Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics”, Int Stat Rev 69: 21-26, 2001
  • 22. Edward J. Wegman, Amy Braverman, Arnold Goodman and Padhraic Smyth, Editors, Interface ’01 – Computing Science and Statistics, 2002

SLIDE 8

Are Not Accuracy, Bias, Convention, Dependability, Economy, Faith, Generosity, Habit, Inconvenience, … Driven by Minds and Hearts?

“Be the Master of Your Will and the Slave of Your Conscience.” (Chassidic)

Goethe: “Knowing Is Not Enough; We Must Apply. Willing Is Not Enough; We Must Do.”

Own Page in National Institute of Medicine Healthcare Reports

ENGINEER COLLABORATION CHECKLIST

Fortune, Before 42, July 22, 2002

Time, Behind “Letters”, April 28, 2003

Communications of the ACM 58:6, 46, June 2015

Why Do We Not Agree on Uncertainty, Thinking, Perspectives, Learning and Dependability, to Improve Accuracy of Findings and Conclusions?

ENGINEER TRUST AND VALUE
