
How Computers Discover How Computers Discover: A Mini-Review of Algorithmic Meta-Discovery
Filip Železný, ČVUT Prague, School of Electrical Engineering, Dept. of Cybernetics


1. "Discover how to discover best"
How Computers Discover How Computers Discover: A Mini-Review of Algorithmic Meta-Discovery
Filip Železný, ČVUT Prague, School of Electrical Engineering, Dept. of Cybernetics
The Gerstner Laboratory for Intelligent Decision Making and Control

2. Introduction
:: Traditional scientific discovery: a human forms a hypothesis explaining observations of some natural phenomenon.
:: Computer-based scientific discovery: the hypothesis is formed by a computer, usually employing machine learning algorithms.

3. Automated Discovery
:: Computer programs constructing hypotheses from data:
− Machine Learning
− Data Mining
− Knowledge Discovery in Databases
:: Highlight: the Robot Scientist project (UK)
− The robot develops predicate-logic hypotheses in functional genomics
− Designs optimal experiments to validate the hypotheses
− Physically carries out the experiments
− King et al., Nature vol. 427, 2004

4. Meta-Discovery
:: Viewing computer-based scientific discovery itself as an empirical phenomenon
:: Inferring hypotheses about it

5. Phase Transitions
:: Originally: runtime statistics of problem-solving algorithms on randomly generated problem instances. Example: propositional-logic SATisfiability.
[Figure 1: The NP-complete logic satisfiability problem, solved by Davis-Putnam search. Two panels plot the average number of backtracks against the #clauses/#variables ratio, for soluble and for insoluble instances.]
:: Left of the transition: under-constrained (many solutions). Right of it: over-constrained (small search space). The hardest problems lie on the transition between the two. (A toy reproduction follows below.)
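To make the easy-hard-easy pattern concrete, here is a minimal sketch. It assumes a naive DPLL-style solver without unit propagation and a standard random 3-SAT generator, so it will not reproduce the study's exact Davis-Putnam numbers, only the qualitative peak near #clauses/#variables ≈ 4.3.

```python
# Minimal sketch: average backtracks of a naive DPLL solver on random 3-SAT,
# swept over the #clauses/#variables ratio. Expect a peak near ~4.3.
import random

def random_3sat(n_vars, n_clauses, rng):
    """Each clause: 3 distinct variables, each negated with probability 1/2."""
    return [tuple(v if rng.random() < 0.5 else -v
                  for v in rng.sample(range(1, n_vars + 1), 3))
            for _ in range(n_clauses)]

def dpll(clauses, assignment, stats):
    """Naive DPLL: simplify under the partial assignment, then branch."""
    simplified = []
    for clause in clauses:
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue                          # clause already satisfied
        rest = [lit for lit in clause if abs(lit) not in assignment]
        if not rest:
            return False                      # empty clause: conflict
        simplified.append(rest)
    if not simplified:
        return True                           # all clauses satisfied
    var = abs(simplified[0][0])               # naive branching choice
    for value in (True, False):
        assignment[var] = value
        if dpll(simplified, assignment, stats):
            return True
        stats["backtracks"] += 1              # this branch failed
        del assignment[var]
    return False

rng = random.Random(0)
n_vars = 20
for ratio in (1, 2, 3, 4, 4.3, 5, 6, 8, 10):
    total = 0
    for _ in range(20):                       # 20 random instances per ratio
        stats = {"backtracks": 0}
        dpll(random_3sat(n_vars, int(ratio * n_vars), rng), {}, stats)
        total += stats["backtracks"]
    print(f"#clauses/#vars = {ratio:>4}: avg backtracks = {total / 20:8.1f}")
```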

6. Phase Transitions in Learning?
:: "Inductive Logic Programming" (ILP): first-order-logic representation of data and hypotheses.
:: Example: biochemistry. Predicting mutagenic activity from compound structure.
:: Example hypothesis:
active(A) ← atm(A, B, c, 10, C) ∧ atm(A, D, c, 10, C) ∧ bond(A, B, D, 1)
:: Verifying the rule for given examples (chemical compounds) ≡ a SAT problem (sketched below).
:: Empirical studies (Serra et al., IJCAI 01; Botta et al., JMLR 4:2003): ILP systems tend to generate hypotheses in the phase-transition region.
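To illustrate why verifying such a rule against a compound is a combinatorial, SAT-like problem, a minimal sketch follows; the toy compound data and the reduction to backtracking search over variable bindings are illustrative assumptions, not the datasets or provers of the cited studies.

```python
# Minimal sketch (hypothetical toy data): checking whether the clause
#   active(A) <- atm(A,B,c,10,C), atm(A,D,c,10,C), bond(A,B,D,1)
# covers a compound means finding a satisfying binding of its variables,
# a combinatorial search over the compound's atoms and bonds.

# Toy compound facts: atm(mol, atom, element, type, charge), bond(mol, a1, a2, kind)
atms = [("m1", "a1", "c", 10, 0.2), ("m1", "a2", "c", 10, 0.2), ("m1", "a3", "o", 40, -0.4)]
bonds = [("m1", "a1", "a2", 1), ("m1", "a2", "a3", 2)]
facts = {"atm": atms, "bond": bonds}

body = [("atm", ("A", "B", "c", 10, "C")),
        ("atm", ("A", "D", "c", 10, "C")),
        ("bond", ("A", "B", "D", 1))]

def unify(pattern, fact, binding):
    """Extend binding so pattern matches fact, or return None."""
    binding = dict(binding)
    for p, f in zip(pattern, fact):
        if isinstance(p, str) and p.isupper():      # an uppercase string is a variable
            if binding.setdefault(p, f) != f:
                return None                         # clashes with an earlier binding
        elif p != f:
            return None                             # constant mismatch
    return binding

def covers(literals, binding):
    """Backtracking search for a binding satisfying all body literals."""
    if not literals:
        return binding
    pred, pattern = literals[0]
    for fact in facts[pred]:
        extended = unify(pattern, fact, binding)
        if extended is not None:
            result = covers(literals[1:], extended)
            if result is not None:
                return result
    return None

print(covers(body, {"A": "m1"}))   # a satisfying binding if m1 is covered, else None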

7. Heavy-Tailed Runtime Distributions
:: What goes on in the PT region? Model the runtime distributions.
:: P(not achieving a solution in time t):
− normal: decays exponentially with t
− heavy-tailed: decays by a power law (may have infinite moments, e.g. an infinite mean)
[Figure: the survival function 1 − F(x) plotted against # backtracks ~ CPU time, both axes on log scale.]
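A minimal numerical sketch of the distinction, using synthetic distributions rather than the measured RTDs: an exponential tail collapses quickly, while a Pareto tail with shape α < 1 (hence an infinite mean) decays by a power law, which would appear as a straight line on a log-log plot like the one above.

```python
# Minimal sketch with synthetic data (not the study's measurements):
# compare the survival function 1 - F(t) of an exponential ("normal")
# runtime distribution with a Pareto (heavy-tailed) one.
import random

rng = random.Random(0)
n = 100_000

# "Normal" runtimes: exponential with mean 5000 backtracks.
exp_samples = [rng.expovariate(1 / 5000) for _ in range(n)]

# Heavy-tailed runtimes: Pareto with shape alpha < 1 (infinite mean),
# sampled by inverse transform: X = x_min * U**(-1/alpha).
alpha, x_min = 0.8, 1000
par_samples = [x_min * (1 - rng.random()) ** (-1 / alpha) for _ in range(n)]

def survival(samples, t):
    """Empirical P(runtime > t), i.e. 1 - F(t)."""
    return sum(1 for s in samples if s > t) / len(samples)

print(f"{'t':>8} {'exponential':>12} {'pareto':>12}")
for t in (1e3, 5e3, 2e4, 5e4, 2e5):
    print(f"{t:8.0f} {survival(exp_samples, t):12.2e} {survival(par_samples, t):12.2e}")
```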

8. Heavy-Tailed Runtime Distributions
:: HT distributions: a "statistical curiosity" of the early 20th century:
− V. Pareto: income distributions
− B. Mandelbrot: fractal phenomena in nature
:: Empirical finding (Gomes et al., J Autom Reasoning 2001): important combinatorial problems/algorithms exhibit heavy-tailed RTDs. The survey covered randomized algorithms and/or randomized problem instances.
:: In hypothesis learning (Zelezny et al., ILP 2002): heavy-tailed RTDs manifest themselves in ILP.
− Not only a consequence of the involved hypothesis checking (= SAT)
− HT RTDs also appear in terms of the number of hypotheses searched

9. Restarted Randomized Search
:: HT RTDs have intriguing consequences.
− f(t)∆t / (1 − F(t)) = the probability of finding a solution in the next ∆t, given that none was found up to time t.
− For a heavy-tailed RTD this decreases with t.
− The longer you search, the lower your chances...
:: So it makes sense to restart the search every now and then?!
:: Indeed:
− Non-restarted search with runtime cdf F(t): infinite mean, but F(γ) > 0 for some γ > 0.
− Search restarted each time the cut-off time γ is reached: the runtime cdf over N restarts is F_γ(N) = 1 − (1 − F(γ))^N, which decays exponentially ⇒ finite mean. (A simulation follows below.)
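The simulation below sketches this effect under an assumed Pareto runtime distribution with α < 1 (infinite mean); it illustrates the restart argument, not the ILP experiment itself.

```python
# Minimal sketch (synthetic heavy-tailed runtimes, not an actual ILP run):
# restarting a search whose runtime is Pareto-distributed with infinite mean
# yields a finite expected total runtime.
import random

rng = random.Random(1)
alpha, x_min = 0.8, 1.0            # Pareto shape < 1 => infinite mean

def one_run():
    """Runtime of one non-restarted randomized search (Pareto-distributed)."""
    return x_min * (1 - rng.random()) ** (-1 / alpha)

def restarted(cutoff):
    """Total runtime when restarting whenever the cutoff time gamma is reached."""
    total = 0.0
    while True:
        t = one_run()
        if t <= cutoff:
            return total + t       # this run finished within the cutoff
        total += cutoff            # abandon the run, pay gamma, restart

trials = 20_000
sample_mean = sum(one_run() for _ in range(trials)) / trials
print(f"non-restarted sample mean: {sample_mean:10.2f}  (unstable: true mean is infinite)")
for cutoff in (2, 5, 10, 100, 1000):
    mean = sum(restarted(cutoff) for _ in range(trials)) / trials
    print(f"restarted, gamma = {cutoff:>5}: {mean:10.2f}")
```

Note that the restarted means stay finite and that some intermediate cutoff does best, foreshadowing the cutoff-tuning results on the next slide.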

10. Restarted Randomized Search in ILP
:: Expected runtime of an ILP algorithm with restart cut-off time γ to find a hypothesis of a given quality: on a log scale, performance gains of orders of magnitude.
[Figure: cost (log scale, 1 to 200,000) as a function of score (1 to 20) and cutoff (log scale, 1 to 65,536).]
:: Large empirical study (Zelezny et al., ILP 2004):
− 100-200 Condor-cluster PCs, UW Madison
− SGI Altix supercomputer, CTU Prague
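For completeness, the standard expected-runtime derivation behind such cutoff curves (assuming independent restarts; not spelled out on the slide): the number of failed runs before a success is geometric with success probability F(γ), each failed run costs γ, and the final run is distributed as T conditioned on T ≤ γ, so

E[T_γ] = γ · (1 − F(γ)) / F(γ) + (1/F(γ)) ∫₀^γ t f(t) dt,

which is finite whenever F(γ) > 0, and whose minimum over γ marks the optimal cutoff seen in the plot.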

11. Occam's Razor: Empirical Assessment
:: William of Ockham, 14th-century English logician: "Entities should not be multiplied beyond necessity."
:: Traditional machine-learning interpretation: "If several hypotheses explain the data with roughly the same accuracy, keep the simplest."
:: Reasons:
1. Evident: ease of human interpretation
2. Postulated: predictive ability (theory does not give a clue)
:: Thanks to automated discovery, Reason 2 can be empirically tested.

12. Occam's Razor: Empirical Assessment
:: Some seminal empirical studies (Holte, Mach Learn 1993) apparently support the simplicity bias, but there is a misinterpretation here.
:: The detrimental effect on predictive accuracy comes from too many hypotheses being tested, rather than from too complex hypotheses being tested; the relation between hypothesis-space size and average hypothesis complexity is only incidental.
:: Domingos (Data Mining & Knowl Disc 1999) reviews empirical evidence against Reason 2 for Occam's razor. Successes of:
− Ensemble learning (combining numerous complex hypotheses)
− Support vector machines (transforming data into high-dimensional spaces)
− Excessive search leading to simple yet inaccurate hypotheses (Quinlan et al., IJCAI 1995)

13. Computerized Meta-Learning
:: So far: hypotheses about machine discovery formed by humans.
:: Now shifting to Meta-Learning: "learn how to learn best".

14. Meta-Learning Achievements
:: Traditional approaches: see the Mach Learn special issue on Meta-Learning, 54:2004. Examples:
− Meta-hypothesize about which learning algorithm is best for given data (see the sketch after this slide)
− Predict a range of parameters (e.g. kernel width for SVMs) given meta-data
:: Unorthodox approaches (Maloberti, Sebag: Mach Learn 55(2):2004):
− Detect the position of a problem w.r.t. the phase-transition region
− Use it to determine the best learning algorithm
:: Other: Bensusan (ECML 1998) meta-learns how much pruning should be used.
− Pruning ≈ simplifying hypotheses at some sacrifice of accuracy
− Motivated by Occam's razor (title: "God does not always shave with Occam's razor")
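As a concrete, if toy, picture of the traditional setup referenced above, here is a minimal sketch; the meta-features, the meta-dataset, and the algorithm names are entirely hypothetical, and 1-nearest-neighbour stands in for whatever meta-learner a real system would use.

```python
# Minimal sketch (synthetic meta-data, hypothetical algorithm names):
# each past dataset is described by meta-features; the meta-learner
# predicts which base learning algorithm will perform best on a new one.
import math

# (meta-features: n_examples, n_attributes, class_entropy) -> best algorithm
meta_data = [
    ((100,    5, 0.9), "decision_tree"),
    ((200,   10, 1.0), "decision_tree"),
    ((300,    8, 0.8), "decision_tree"),
    ((5000, 200, 0.4), "svm"),
    ((6000, 300, 0.5), "svm"),
    ((8000, 500, 0.3), "svm"),
]

def normalize(features):
    """Log-scale the counts so no single meta-feature dominates the distance."""
    n, d, h = features
    return (math.log(n), math.log(d), h)

def predict_best_algorithm(features):
    """1-nearest-neighbour over past meta-data in normalized meta-feature space."""
    query = normalize(features)
    def dist(entry):
        point = normalize(entry[0])
        return sum((a - b) ** 2 for a, b in zip(query, point))
    return min(meta_data, key=dist)[1]

# A new, unseen dataset: large and high-dimensional -> expect "svm".
print(predict_best_algorithm((7000, 400, 0.45)))
```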

15. Speculations
:: Given that meta-learning is useful, would meta-meta-learning be?
:: And meta-...-meta-learning, repeated n times?
[Figure: a recursively nested tower of pictures, each containing the law s = ½gt² and the caption "Simple rules work best."]
:: etc. What if n is infinite? (Much like Lisp/Prolog meta-interpretation towers.)
