 
Mathematical Models as Research Data
Why do we need precise and well-written information about mathematical models, and what can we do?
Michael Kohlhase
Professur für Wissensrepräsentation und -verarbeitung
Informatik, FAU Erlangen-Nürnberg
http://kwarc.info
13 August 2018, Math Models and Math Software as Research Data (M3SRD)
1 Introduction
Mathematical Modeling and Simulation
◮ Definition 1.1. Mathematical Modeling and Simulation (MMS) as a research method:
  1. fix an object and properties of interest (e.g. electron distribution in an electronic device)
  2. determine the quantities and physical laws involved (e.g. the electrostatic potential and the Poisson equation)
  3. solve the equations symbolically or numerically for given boundary conditions (complex software stacks)
  4. publish 1./2./3. in a paper and 3. in a data store (software on GitHub/GitLab)
MMS has been established as a primary scientific research method alongside the classical methods of experiment and theory.
◮ Research in MMS is characterized by mathematical models, scientific software, and numerical data from computations (input, output, parameters) (see [KT16])
MMS faces a reproducibility crisis: success and proliferation put strains on the quality of models, software, and data.
◮ Idea/Vision: Treat all three kinds of artefacts above as “Research Data”
◮ represent all aspects explicitly → establish machine support for …
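As an illustration of step 3 in Definition 1.1, here is a minimal sketch that solves a one-dimensional Poisson problem numerically. The normalised form -u''(x) = f(x), the unit interval, the Dirichlet boundary values, and the function name `solve_poisson_1d` are assumptions made for this example; they are not taken from the talk, and real MMS software stacks are far more involved.

```python
def solve_poisson_1d(f, left, right, n):
    """Solve -u''(x) = f(x) on [0, 1] with u(0) = left and u(1) = right.

    Illustrative sketch: second-order central differences on n interior
    grid points (n >= 1), solved with the Thomas algorithm.
    """
    h = 1.0 / (n + 1)
    xs = [(i + 1) * h for i in range(n)]

    # Discrete equations: -u[i-1] + 2*u[i] - u[i+1] = h^2 * f(x_i).
    diag = [2.0] * n
    rhs = [h * h * f(x) for x in xs]
    # The boundary values move to the right-hand side.
    rhs[0] += left
    rhs[-1] += right

    # Forward elimination; all off-diagonal entries are -1.
    for i in range(1, n):
        w = -1.0 / diag[i - 1]
        diag[i] -= w * (-1.0)
        rhs[i] -= w * rhs[i - 1]

    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return xs, u


# Constant source term with homogeneous boundary conditions.
xs, u = solve_poisson_1d(lambda x: 1.0, left=0.0, right=0.0, n=9)
for x, value in zip(xs, u):
    print(f"x = {x:.1f}  u = {value:.6f}")
```

Even in this toy case, the result only makes sense together with information that is not in the numbers themselves: the equation, the boundary conditions, and the discretization used.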
MMS Reproducibility Crisis
◮ Models (are published in mathematical/physical papers)
  ◮ no standardization of naming, notation, constructors, ...
  ◮ how are the formulae derived from the physical laws?
  ◮ what are the side conditions/constraints under which the model is accurate?
◮ MMS Software (can only be understood wrt. the underlying models)
  ◮ what are the underlying assumptions/constraints?
  ◮ what are the admissible boundary conditions?
  ◮ where does the iteration converge (well)?
◮ Data (needs specification to become information)
  ◮ which software/model/discretization was used?
  ◮ what quantity was measured in what unit?
◮ Models are applied by people who did not develop them.
◮ Implicit knowledge about the constraints and domains of applicability is lost.
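One way to keep side conditions from getting lost is to ship them with the model in machine-checkable form. The following is a minimal sketch of that idea; the classes `Assumption` and `ModelSpec`, the parameter names, and the two example conditions are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Assumption:
    """A side condition, stated in words and as a checkable predicate."""
    description: str
    holds: Callable[[Dict[str, float]], bool]


@dataclass
class ModelSpec:
    """A model together with its units and explicit assumptions."""
    name: str
    units: Dict[str, str]
    assumptions: List[Assumption] = field(default_factory=list)

    def violated(self, params: Dict[str, float]) -> List[str]:
        """Return the descriptions of all assumptions that params break."""
        return [a.description for a in self.assumptions if not a.holds(params)]


# Hypothetical specification of an electrostatic Poisson model.
poisson_spec = ModelSpec(
    name="electrostatic Poisson model (illustrative)",
    units={"potential": "V", "charge_density": "C/m^3", "permittivity": "F/m"},
    assumptions=[
        Assumption("permittivity must be positive",
                   lambda p: p["permittivity"] > 0),
        Assumption("domain length must be positive",
                   lambda p: p["length"] > 0),
    ],
)

print(poisson_spec.violated({"permittivity": -1.0, "length": 1.0}))
```

A user who did not develop the model can then call `violated` before running a simulation instead of relying on implicit knowledge.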
State of the Art: FAIR Principles for the Data Aspect
◮ FAIR: data should be Findable, Accessible, Interoperable, and Reusable
  1. To be Findable:
     F1 (meta)data are assigned a globally unique and eternally persistent identifier.
     F2 data are described with rich metadata.
     F3 (meta)data are registered or indexed in a searchable resource.
     F4 metadata specify the data identifier.
  2. To be Accessible:
     A1 (meta)data are retrievable by their identifier using a standardized communications protocol.
     A1.1 the protocol is open, free, and universally implementable.
     A1.2 the protocol allows for an authentication and authorization procedure, where necessary.
     A2 metadata are accessible, even when the data are no longer available.
  3. To be Interoperable:
     I1 (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
     I2 (meta)data use vocabularies that follow FAIR principles.
     I3 (meta)data include qualified references to other (meta)data.
  4. To be Re-usable:
     R1 (meta)data have a plurality of accurate and relevant attributes.
     R1.1 (meta)data are released with a clear and accessible data usage license.
     R1.2 (meta)data are associated with their provenance.
     R1.3 (meta)data meet domain-relevant community standards.
Ongoing: how to implement these in repositories, protocols, and services?
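To make some of the principles above concrete, here is a minimal sketch of a metadata record for a simulation output. The class `DatasetRecord`, its field names, the identifier, and all example values are hypothetical; this is not a standard FAIR schema, only an illustration of how F1, F2, I3, R1.1, and R1.2 could show up as explicit fields.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class DatasetRecord:
    identifier: str     # F1: globally unique, persistent identifier
    description: str    # F2: descriptive metadata
    license: str        # R1.1: data usage license
    software: str       # R1.2: provenance, which software produced the data
    model: str          # R1.2: provenance, which model was solved
    quantity: str       # what was computed
    unit: str           # in what unit
    related: List[str]  # I3: references to other (meta)data


def missing_fields(record: DatasetRecord) -> List[str]:
    """Return the names of all fields that were left empty."""
    return [name for name, value in asdict(record).items()
            if value in ("", [], None)]


# Hypothetical record; every value here is made up for illustration.
record = DatasetRecord(
    identifier="example:potential-run-0001",
    description="Electrostatic potential on a 1D grid",
    license="",
    software="solver sketch, finite differences",
    model="Poisson equation with Dirichlet boundary conditions",
    quantity="electrostatic potential",
    unit="V",
    related=["example:model-spec-0001"],
)

print(json.dumps(asdict(record), indent=2))
print("missing:", missing_fields(record))
```

A repository could reject or flag records for which `missing_fields` is non-empty; which fields are mandatory is a design decision for the community standard (R1.3), not something this sketch settles.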
State of the Art in 5 Dimensions
◮ Overview: Current Systems/Formats for Models, MMS Software, and Data can be characterized along five dimensions:

  1: Coverage          2: Description      3: Formality   4: Computational                    5: Immediacy
  Domain-Independent   Continuous          Informal       Expressive                          Domain Semantics
                       Weak Formulations   Semi-Formal    Built-in special cases, e.g. PDEs   Reformulation
  Domain-Specific      Discrete            Formal         Solvable                            Dedimensionalized Equations

  → continuous trade-off between “Specification” (hh) and “Implementation” (ll)
◮ Classifying Some Systems:

  System         1      2      3      4      5
  Publications   hh     hh     hh     hh     hh
  Modelica       m      m      ll     ll     m
  MatLab         h      ll     ll     ll     ll
  FAIR @ MMS     hh-m   hh-m   hh-m   hh-m   hh-m
FAIR Principles for Models and Simulation Software?
◮ Current Systems/Formats and proposed FAIR-like treatment of Models and MMS Software
[Figure: systems and formats placed by 5-dim score over domains; labels: Publications, MaMoReD: FAIR @ MMS, FEniCS, ExaStencils, PDE, Modelica, MatLab, SBML]