statistical scientific programming
play

Statistical Scientific programming: Introduction challenges in - PowerPoint PPT Presentation

Statistical Scientific programming Olivia Quinet Statistical Scientific programming: Introduction challenges in converting R to C++ CluePoints Clinical Trials Statistical tests SMART package The R language Olivia Quinet A very short


  1. Some examples (I) Statistical Scientific programming 1 w = !is.na(d[[field]]); 2 ctr = factor(d$center[w]); Olivia Quinet 3 npat = unclass(table(ctr)); 4 v = d[w, field]; 5 y = rowsum(v, ctr); Introduction CluePoints Clinical Trials 1. Select the rows where the values of the column "field" are not missing Statistical tests SMART package 2. Get the values as factor of the column "center" for the selected rows, i.e. The R language A very short introduction to the list of centers R Some examples 3. Count the number of rows associated to the different centers, i.e. the R to C++? number of patients per center Scientific programming 4. Get the values for the column "field" for the selected rows challenges Testing 5. Sum by center the values from (4) Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  2. Some examples (I) Statistical Scientific programming 1 w = !is.na(d[[field]]); 2 ctr = factor(d$center[w]); Olivia Quinet 3 npat = unclass(table(ctr)); 4 v = d[w, field]; 5 y = rowsum(v, ctr); Introduction CluePoints Clinical Trials center xyz Statistical tests SMART package ctr01 The R language ctr02 1 A very short introduction to R Some examples ctr npat y ctr01 2 R to C++? ctr01 1 2 ctr03 3 Scientific ctr02 2 7 programming ctr05 4 challenges ctr03 5 10 Testing ctr02 ... Measure Software architecture ctr02 5 Data structure Smart pointers, Pimpl, Factories ctr02 Fail-fast/Fail-safe Numerical errors ... Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  3. Some examples (II) Statistical Scientific programming 1 xc = x - offset; 2 v = tapply(xc, ctr, mean, na.rm=T); Olivia Quinet 3 Sn = unclass(table(ctr)); 4 Sn2 = tapply(sid, ctr, function(i) sum(table(i)^2)); 5 sigma = sqrt(Sn*vc[3]^2 + Sn2*vc[2]^2 + Sn^2*vc[1]^2)/Sn; Introduction 6 p = pnorm(v, sd=sigma) CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  4. Some examples (II) Statistical Scientific programming 1 xc = x - offset; 2 v = tapply(xc, ctr, mean, na.rm=T); Olivia Quinet 3 Sn = unclass(table(ctr)); 4 Sn2 = tapply(sid, ctr, function(i) sum(table(i)^2)); 5 sigma = sqrt(Sn*vc[3]^2 + Sn2*vc[2]^2 + Sn^2*vc[1]^2)/Sn; Introduction 6 p = pnorm(v, sd=sigma) CluePoints Clinical Trials Statistical tests 1. Apply an offet to x SMART package The R language 2. Compute per center the mean of xc A very short introduction to R 3. Number of records per center Some examples R to C++? 4. Compute per center the sum of the squares of the number of values per Scientific programming patient challenges 5. Compute sigma Testing Measure Software architecture 6. Compute the p-values for each center based on a normal distribution Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  5. Some examples (II) Statistical Scientific programming 1 xc = x - offset; 2 v = tapply(xc, ctr, mean, na.rm=T); Olivia Quinet 3 Sn = unclass(table(ctr)); 4 Sn2 = tapply(sid, ctr, function(i) sum(table(i)^2)); 5 sigma = sqrt(Sn*vc[3]^2 + Sn2*vc[2]^2 + Sn^2*vc[1]^2)/Sn; Introduction 6 p = pnorm(v, sd=sigma) CluePoints Clinical Trials center subjid x Statistical tests SMART package ctr01 s01001 The R language ctr02 s02001 1 A very short introduction to R ctr Sn Sn2 sigma p ctr01 s01001 2 Some examples ctr01 1 1 0.37 0.21 ctr03 s03001 3 R to C++? ctr02 3 5 0.25 0.55 ctr05 s05001 4 Scientific ctr03 5 5 0.19 0.06 programming ctr02 s02002 ... challenges Testing ctr02 s02001 5 Measure ctr02 s02001 Software architecture Data structure ... Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  6. Some examples (III) Statistical Scientific programming 1 dd = duplicated(d$subjid); 2 v = d[[field]] Olivia Quinet 3 w = dd & c(FALSE, v[1:(length(v)-1)]==1); 4 x10 = rowsum(1-v[w], ctr[w]); 5 N10 = unclass(table(ctr[w])); Introduction 6 x10Max = rowsum(as.integer(!c(TRUE, w[1:(length(w)-1)])[w]), ctr[w]); CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  7. Some examples (III) Statistical Scientific programming 1 dd = duplicated(d$subjid); 2 v = d[[field]] Olivia Quinet 3 w = dd & c(FALSE, v[1:(length(v)-1)]==1); 4 x10 = rowsum(1-v[w], ctr[w]); 5 N10 = unclass(table(ctr[w])); Introduction 6 x10Max = rowsum(as.integer(!c(TRUE, w[1:(length(w)-1)])[w]), ctr[w]); CluePoints Clinical Trials Statistical tests 1. Create a boolean vector indicating if a subjid is duplicated or not SMART package The R language 2. Get the values of the column "field" A very short introduction to R 3. Do some wierd selection Some examples R to C++? 4. Get the number of transitions 1 → 0 per center for each patient Scientific programming 5. Get the number of potential transitions 1 → 0 per center challenges Testing 6. Get the maximum number of valid transitions 1 → 0 per center Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  8. Some examples (III) Statistical Scientific programming 1 dd = duplicated(d$subjid); 2 v = d[[field]] Olivia Quinet 3 w = dd & c(FALSE, v[1:(length(v)-1)]==1); 4 x10 = rowsum(1-v[w], ctr[w]); 5 N10 = unclass(table(ctr[w])); Introduction 6 x10Max = rowsum(as.integer(!c(TRUE, w[1:(length(w)-1)])[w]), ctr[w]); CluePoints Clinical Trials center subjid visit x Statistical tests SMART package ctr01 s01001 v01 1 The R language ctr01 s01001 v02 0 A very short introduction to ctr X10 N10 x10max R ctr01 s01001 v03 1 ctr01 2 3 3 Some examples ctr01 s01001 v04 0 ctr02 1 2 1 R to C++? ctr01 s01002 v01 1 ctr03 1 1 1 Scientific ... ctr01 s01002 v02 1 programming challenges ctr02 s02001 v03 0 Testing ctr02 s02002 v01 1 Measure ctr02 s02002 v02 1 Software architecture Data structure ctr02 s02002 v03 0 Smart pointers, Pimpl, Factories ctr03 s03001 v01 1 Fail-fast/Fail-safe ctr03 s03001 v02 0 Numerical errors Accumulators ctr03 s03001 v03 1 std::algorithms, boost, GSL, BLAS, LAPACK, . . . ... Conclusion Questions, Remarks?

  9. How to translate R code to C++ code? Statistical Scientific programming ◮ Straightforward approach: Recode each R function in C++ Olivia Quinet PRO C++ and R codes are similar CON Too many combinations of parameters/structures Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  10. How to translate R code to C++ code? Statistical Scientific programming ◮ Straightforward approach: Recode each R function in C++ Olivia Quinet PRO C++ and R codes are similar CON Too many combinations of parameters/structures Introduction CluePoints ◮ Hard: understanding what the researcher wanted to do Clinical Trials Statistical tests SMART package PRO Faster code The R language CON C++ and R codes can be very different A very short introduction to 1 line in R − → ± 30 lines in C++ R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  11. How to translate R code to C++ code? Statistical Scientific programming ◮ Straightforward approach: Recode each R function in C++ Olivia Quinet PRO C++ and R codes are similar CON Too many combinations of parameters/structures Introduction CluePoints ◮ Hard: understanding what the researcher wanted to do Clinical Trials Statistical tests SMART package PRO Faster code The R language CON C++ and R codes can be very different A very short introduction to 1 line in R − → ± 30 lines in C++ R Some examples ◮ Hardest: changing the data structure R to C++? Scientific PRO Less ressource/faster code programming challenges CON C++ and R codes are even more different Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  12. How to translate R code to C++ code? Statistical Scientific programming ◮ Straightforward approach: Recode each R function in C++ Olivia Quinet PRO C++ and R codes are similar CON Too many combinations of parameters/structures Introduction CluePoints ◮ Hard: understanding what the researcher wanted to do Clinical Trials Statistical tests SMART package PRO Faster code The R language CON C++ and R codes can be very different A very short introduction to 1 line in R − → ± 30 lines in C++ R Some examples ◮ Hardest: changing the data structure R to C++? Scientific PRO Less ressource/faster code programming challenges CON C++ and R codes are even more different Testing ◮ Recoding model fitting algorithms is a huge (tremendous) task. It’s easier Measure Software architecture to call the R function from the C++ code Data structure Smart pointers, Pimpl, Factories PRO Updates of the fitting model code Fail-fast/Fail-safe CON Added dependencies Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  13. How to translate R code to C++ code? Statistical Scientific programming ◮ Straightforward approach: Recode each R function in C++ Olivia Quinet PRO C++ and R codes are similar CON Too many combinations of parameters/structures Introduction CluePoints ◮ Hard: understanding what the researcher wanted to do Clinical Trials Statistical tests SMART package PRO Faster code The R language CON C++ and R codes can be very different A very short introduction to 1 line in R − → ± 30 lines in C++ R Some examples ◮ Hardest: changing the data structure R to C++? Scientific PRO Less ressource/faster code programming challenges CON C++ and R codes are even more different Testing ◮ Recoding model fitting algorithms is a huge (tremendous) task. It’s easier Measure Software architecture to call the R function from the C++ code Data structure Smart pointers, Pimpl, Factories PRO Updates of the fitting model code Fail-fast/Fail-safe CON Added dependencies Numerical errors Accumulators std::algorithms, boost, GSL, ◮ Beware of Numerical (in)accuracy BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  14. How to translate R code to C++ code? Statistical Scientific programming ◮ Straightforward approach: Recode each R function in C++ Olivia Quinet PRO C++ and R codes are similar CON Too many combinations of parameters/structures Introduction CluePoints ◮ Hard: understanding what the researcher wanted to do Clinical Trials Statistical tests SMART package PRO Faster code The R language CON C++ and R codes can be very different A very short introduction to 1 line in R − → ± 30 lines in C++ R Some examples ◮ Hardest: changing the data structure R to C++? Scientific PRO Less ressource/faster code programming challenges CON C++ and R codes are even more different Testing ◮ Recoding model fitting algorithms is a huge (tremendous) task. It’s easier Measure Software architecture to call the R function from the C++ code Data structure Smart pointers, Pimpl, Factories PRO Updates of the fitting model code Fail-fast/Fail-safe CON Added dependencies Numerical errors Accumulators std::algorithms, boost, GSL, ◮ Beware of Numerical (in)accuracy BLAS, LAPACK, . . . Conclusion ◮ Testing and testing and testing (no data, invalid data, NaN, Inf, . . . ) Questions, Remarks?

  15. Scientific programming challenges Statistical Scientific programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  16. Scientific programming challenges Statistical Scientific programming ◮ Requirements include low response time and memory usage, minimizing Olivia Quinet numerical errors and error propagation. Introduction ◮ Testing CluePoints Clinical Trials ◮ Software architecture Statistical tests SMART package ◮ Data structure The R language A very short introduction to ◮ Fail-fast/Fail-safe idioms R Some examples ◮ Exceptions R to C++? ◮ RAII Scientific programming challenges ◮ Pimpl idiom and smart pointers Testing Measure ◮ Factory pattern Software architecture Data structure ◮ Iterator pattern and accumulators Smart pointers, Pimpl, Factories Fail-fast/Fail-safe ◮ std::algorithms, boost, GSL, BLAS, LAPACK, . . . Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  17. Testing Statistical Scientific programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Credit xkcd Questions, Remarks?

  18. Testing Statistical Scientific programming ◮ Framework Olivia Quinet ◮ Unit testing, Integration testing, . . . Introduction ◮ Test Driven Development CluePoints Clinical Trials ◮ Behavior Driven Development to replicate the documentation specification Statistical tests SMART package ◮ Continuous Integration The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  19. Testing Statistical Scientific programming ◮ Framework Olivia Quinet ◮ Unit testing, Integration testing, . . . Introduction ◮ Test Driven Development CluePoints Clinical Trials ◮ Behavior Driven Development to replicate the documentation specification Statistical tests SMART package ◮ Continuous Integration The R language A very short introduction to ◮ Each bug must be tested R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  20. Code for Testing Statistical Scientific programming ◮ If you cannot test your code, rewrite it Olivia Quinet ◮ If you cannot test your code easily, rewrite it Introduction ◮ If you cannot test your code independently, rewrite it CluePoints Clinical Trials ◮ . . . Statistical tests SMART package The R language Tools like clang static analyzer and gcov/lcov code coverage are a great help A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  21. Measure!!! Statistical Scientific programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Credit xkcd Conclusion Questions, Remarks?

  22. Measure!!! Statistical Scientific programming ◮ Select between different data structures Olivia Quinet ◮ Select between different algorithms Introduction ◮ Use generated data CluePoints Clinical Trials ◮ Use real data Statistical tests SMART package ◮ Use data of different sizes The R language A very short introduction to ◮ . . . R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  23. Measure!!! – an example Statistical Scientific Context: Originally, an algorithm has to be applied on vectors: f ( x , y ) programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  24. Measure!!! – an example Statistical Scientific Context: Originally, an algorithm has to be applied on vectors: f ( x , y ) programming Olivia Quinet Then only on some filtered elements: f ( x , y , w ) Introduction X Y w CluePoints Clinical Trials 42.5 100 true Statistical tests SMART package ... ... true The R language A very short introduction to ... ... false R Some examples ... ... true R to C++? Scientific ... ... false programming challenges ... ... true Testing Measure ... ... true Software architecture Data structure ... ... false Smart pointers, Pimpl, Factories Fail-fast/Fail-safe ... ... ... Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  25. Measure!!! – an example Statistical Scientific Context: Originally, an algorithm has to be applied on vectors: f ( x , y ) programming Olivia Quinet Then only on some filtered elements: f ( x , y , w ) ◮ Modify the algorithm to take into account only the filtered Introduction CluePoints vectors’elements: filter algo Clinical Trials Statistical tests ◮ Create pseudo vectors with the filtered elements: filter vector SMART package The R language ◮ Create new vectors with the filtered elements: copy vector A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  26. Measure!!! – an example Statistical Scientific Context: Originally, an algorithm has to be applied on vectors: f ( x , y ) programming Olivia Quinet Then only on some filtered elements: f ( x , y , w ) ◮ Modify the algorithm to take into account only the filtered Introduction CluePoints vectors’elements: filter algo Clinical Trials Statistical tests ◮ Create pseudo vectors with the filtered elements: filter vector SMART package The R language ◮ Create new vectors with the filtered elements: copy vector A very short introduction to R Some examples R to C++? Option Timing (s) Scientific N = 10 2 N = 10 4 N = 10 6 N = 10 8 programming challenges Testing 0.0003 0.008 0.9 100 filter algo Measure Software architecture filter vector 0.0003 0.006 0.8 98 Data structure 0.0006 0.015 4.6 / Smart pointers, Pimpl, copy vector Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  27. Statistical Scientific programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  28. Software architecture & Data structure Statistical Scientific programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  29. Software architecture & Data structure Statistical Scientific Important points to consider programming Olivia Quinet ◮ Input/output data structure? Introduction ◮ Computational units ? CluePoints ◮ Simple but not too simple! Clinical Trials Statistical tests SMART package ◮ Which doors are you closing? The R language ◮ Expressiveness A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  30. Software architecture & Data structure Statistical Scientific Important points to consider programming Olivia Quinet ◮ Input/output data structure? Introduction ◮ Computational units ? CluePoints ◮ Simple but not too simple! Clinical Trials Statistical tests SMART package ◮ Which doors are you closing? The R language ◮ Expressiveness A very short introduction to R Some examples R to C++? Scientific programming challenges For this project Testing Measure ◮ Data is organized in datasets, i.e. tables in which each column represents a Software architecture Data structure particular variable or key variable, and each row corresponds to a given Smart pointers, Pimpl, Factories record. There may also be missing values. Fail-fast/Fail-safe Numerical errors ◮ Statistical tests are the computational units . Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  31. Levels of abstraction Statistical Scientific programming ◮ The most important good pratice Olivia Quinet ◮ Divide and conquer Introduction ◮ Top down design CluePoints Clinical Trials ◮ Bottom up design Statistical tests SMART package ◮ Separation of concerns The R language A very short introduction to ◮ Modularity : low coupling ← R → high cohesion Some examples ◮ Design review R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  32. Levels of abstraction – mathematical formula Statistical Scientific programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  33. Levels of abstraction – mathematical formula Statistical Scientific programming ◮ Very tempting to code one mathematical formula into one function. Olivia Quinet ◮ Decompose the formula into meaningful steps, e.g. numerator, Introduction denominator, partial sums, . . . CluePoints Clinical Trials ◮ Transform the function into a class Statistical tests SMART package ◮ Transform each step into a struct The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  34. Abstraction levels – Example Statistical Scientific Sample variance – Standard formula programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to N 1 R � s 2 x ) 2 N = ( x k − ¯ Some examples N − 1 R to C++? k =1 Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  35. Abstraction levels – Example Statistical Scientific Sample variance – Standard formula programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package N 1 � The R language s 2 ) 2 N = ( x k − ¯ × x A very short introduction to N − 1 ���� R k =1 Mean Some examples � �� � � �� � Fraction R to C++? Square of difference Scientific � �� � programming Sum challenges � �� � Testing Multiplication Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  36. Abstraction levels – Example Statistical Scientific programming 1 namespace MATH _ INTERNAL { template<typename T=double> 2 Olivia Quinet struct sample _ variance { 3 T s2{std::numeric _ limits<T>::quiet _ NaN()}; 4 5 Introduction template<typename Container> 6 CluePoints sample _ variance(const Container& X) { if(X.size()>1) s2 = frac(X)*sum(X, mean(X)); } 7 Clinical Trials 8 Statistical tests operator T() const { return s2; } 9 SMART package 10 11 template<typename Container> The R language 12 static T frac(const Container& X) { return ONE/(X.size()-ONE); } A very short introduction to 13 R 14 template<typename Container> Some examples 15 static T mean(const Container& X) { return std::accumulate(X.begin(), X.end(), ZERO)/X.size(); } R to C++? 16 17 struct square _ of _ difference { Scientific 18 const T xbar; programming 19 square _ of _ difference(const T mean) : xbar(mean) {} challenges 20 T operator()(const T x) const { return (x-xbar)*(x-xbar); } Testing 21 }; Measure 22 23 template<typename Container> Software architecture 24 static T sum(const Container& X, const T xbar) { Data structure 25 const square _ of _ difference d(xbar); Smart pointers, Pimpl, Factories 26 return std::accumulate(X.begin(), X.end(), ZERO, [d](const T s, const T x) { return s + d(x); }); Fail-fast/Fail-safe 27 } Numerical errors 28 }; Accumulators 29 } std::algorithms, boost, GSL, 30 template<typename Container> BLAS, LAPACK, . . . 31 inline typename Container::value _ type sample _ variance(const Container& X) 32 { Conclusion 33 return MATH _ INTERNAL::sample _ variance<typename Container::value _ type>(X); Questions, 34 } Remarks?

  37. Data structure Statistical Scientific programming ◮ Performance requires well thought data structure Olivia Quinet ◮ Cache usage Introduction ◮ Prefetching CluePoints Clinical Trials ◮ Lazy evaluation Statistical tests SMART package ◮ Sparse representation The R language A very short introduction to ◮ . . . R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  38. Data structure – Example Statistical Scientific Duplicate patients: comparing patient’s fingerprints programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  39. Data structure – Example Statistical Scientific programming ◮ N numerical vectors of the same length M Olivia Quinet Typical cases: N = 1000 − 40000 and M = 20 − 20000 Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  40. Data structure – Example Statistical Scientific programming ◮ N numerical vectors of the same length M Olivia Quinet Typical cases: N = 1000 − 40000 and M = 20 − 20000 Introduction ◮ N × ( N − 1) / 2 scalar products CluePoints Clinical Trials Statistical tests M SMART package � X · Y = X i × Y j The R language A very short introduction to i R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  41. Data structure – Example Statistical Scientific programming ◮ N numerical vectors of the same length M Olivia Quinet Typical cases: N = 1000 − 40000 and M = 20 − 20000 Introduction ◮ N × ( N − 1) / 2 scalar products CluePoints Clinical Trials Statistical tests M SMART package � X · Y = X i × Y j The R language A very short introduction to i R Some examples ◮ Sparse vectors!!! R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  42. Data structure – Example Statistical Scientific programming ◮ N numerical vectors of the same length M Olivia Quinet Typical cases: N = 1000 − 40000 and M = 20 − 20000 Introduction ◮ N × ( N − 1) / 2 scalar products CluePoints Clinical Trials Statistical tests M SMART package � X · Y = X i × Y j The R language A very short introduction to i R Some examples ◮ Sparse vectors!!! R to C++? ◮ Performance results Scientific programming challenges N = 841 N = 35613 Testing Measure M = 1060 M = 14304 Software architecture Data structure Memory Timing Memory Timing Smart pointers, Pimpl, Factories Fail-fast/Fail-safe R ± 1m / Numerical errors Accumulators C++ (normal vectors) 57MB 0.68s 9.8GB 29m std::algorithms, boost, GSL, BLAS, LAPACK, . . . C++ (sparse vectors) 34MB 0.45s 6.6GB 38s Conclusion Questions, Remarks?

  43. Smart pointers, Pimpl, Factory pattern Statistical Scientific programming Olivia Quinet Introduction file.h file-imp.h/file.cpp CluePoints Clinical Trials Statistical tests SMART package Public Interface The R language +class impl; Private implementation A very short introduction to +std::shared_ptr<impl> m_impl; R +operation1() +private data Some examples +operation1() R to C++? Scientific programming Derived Public Interface challenges Testing implA implB Measure Software architecture +operation1() +operation1() Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  44. Smart pointers, Pimpl, Factory pattern Statistical Scientific Pointer to implementation or Private implementation programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  45. Smart pointers, Pimpl, Factory pattern Statistical Scientific Pointer to implementation or Private implementation programming Olivia Quinet Introduction CluePoints PROS Clinical Trials Statistical tests ◮ Separate interface from implementation SMART package The R language ◮ Decrease recompilation cycles A very short introduction to R ◮ Binary compatibility of shared libraries Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  46. Smart pointers, Pimpl, Factory pattern Statistical Scientific Pointer to implementation or Private implementation programming Olivia Quinet Introduction CluePoints PROS Clinical Trials Statistical tests ◮ Separate interface from implementation SMART package The R language ◮ Decrease recompilation cycles A very short introduction to R ◮ Binary compatibility of shared libraries Some examples R to C++? Scientific programming challenges CONS Testing Measure ◮ Increase in memory usage Software architecture Data structure ◮ Increase in maintenance effort Smart pointers, Pimpl, Factories Fail-fast/Fail-safe ◮ Performance loss Numerical errors Accumulators ◮ Doesn’t work well with templates std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  47. Smart pointers, Pimpl, Factory pattern Statistical Scientific programming ◮ std::unique_ptr or std::shared_ptr ? Olivia Quinet ◮ Mutable or non mutable objects? Introduction ◮ Access to the objects, how often? CluePoints Clinical Trials ◮ Multiple inheritance, virtual inheritance (diamond problem)? Statistical tests SMART package ◮ Template member functions, template classes? The R language A very short introduction to ◮ Objects in a coherent state! R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  48. Smart pointers, Pimpl, Factory pattern : Inheritance Statistical Scientific programming API Implementation Olivia Quinet Pimpl Introduction Pimpl::impl +class impl CluePoints +std::shared_ptr<impl> m_ptr #make_pimpl(Arg&& ...arg) Clinical Trials #_get_ptr<I>() Statistical tests #_valid_ptr() SMART package The R language A very short introduction to R Test Some examples Test::impl +class impl R to C++? +error() +m_error +results() +error() Scientific #add_error() programming +results(): =0 challenges Testing CountField Measure CountField::impl BetaBinomial +class impl Software architecture BetaBinomial::impl +class impl Data structure COUNTFIELD_INTERNAL Smart pointers, Pimpl, +model() +m_model Factories +build(...) Fail-fast/Fail-safe +results() impl_no_visits impl_with_visits Numerical errors +model() Accumulators #compute() +build(...) +build(...) std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  49. Smart pointers, Pimpl, Factory pattern: Diamond problem Statistical Scientific programming API Implementation Olivia Quinet Pimpl Pimpl::impl Introduction +class impl Test CluePoints Test::impl +class impl Clinical Trials Statistical tests CNR_Base_ByCenter SMART package CNR_Base_ByCenter::impl +class impl The R language CNR_Base_BySubjid A very short introduction to CNR_Base_BySubjid::impl +class impl R Some examples CNR_Mean_ByCenter R to C++? CNR_Mean_ByCenter::impl +class impl Scientific CNR_Mean_BySubjid programming CNR_Mean_BySubjid::impl +class impl challenges CNR_Sd_ByCenter CNR_Sd_ByCenter::impl Testing +class impl Measure Software architecture Data structure CNR_ByCenter Smart pointers, Pimpl, CNR_ByCenter::impl Factories +class impl Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, CNR_BySubjid BLAS, LAPACK, . . . CNR_BySubjid::impl +class impl Conclusion Questions, Remarks?

  50. Smart pointers, Pimpl, Factory pattern: Template members Statistical Scientific programming api.hpp Olivia Quinet 1 class MyPublic : public Pimpl { Introduction 2 public: class impl; 3 CluePoints 4 Clinical Trials MyPublic(...); 5 Statistical tests 6 SMART package template<typename Tp> Tp as() const; 7 The R language 8 }; A very short introduction to R Some examples api.cpp R to C++? Scientific MyPublic::MyPublic(...) : Pimpl(impl::build(...)) {} 1 programming 2 challenges template<> 3 double MyPublic::as() const { return _ valid _ ptr() ? _ get _ ptr<impl>()->asNumber() : NaN(); } 4 Testing 5 Measure template<> 6 Software architecture std::string MyPublic::as() const { return _ valid _ ptr() ? _ get _ ptr<impl>()->asString() : std::string(); } 7 Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  51. Pimpl: use Statistical Scientific programming 1 typedef std::vector<Test> TESTS; 2 Olivia Quinet 3 // Create the tests 4 TESTS tests; Introduction 5 tests.push _ back(BetaBinomial(...)); 6 tests.push _ back(CountField(...)); CluePoints 7 tests.push _ back(CountField(...)); Clinical Trials 8 tests.push _ back(CNR _ ByCenter(...)); Statistical tests 9 ... SMART package 10 The R language 11 // Export the results 12 json _ ostream os(...); A very short introduction to R 13 print _ results(os, tests); Some examples 14 15 R to C++? 16 void print _ results(json _ ostream& os, const TESTS& tests) 17 { Scientific 18 for(const auto& test: tests) { programming 19 ... challenges 20 } Testing 21 } Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  52. Pimpl: use Statistical Scientific programming smartAnalysis Olivia Quinet Task Task Introduction List of Tests List of Tasks List of Results Manager Dispatcher CluePoints Clinical Trials Statistical tests SMART package The R language Data Reporting factory A very short introduction to R Process Test 1 Data Reporting Result 1 Some examples factory Task 1 Test 2 Result 2 R to C++? Test 3 Task 2 Typed Result 3 Scientific factories Typed programming factories Test 4 Process challenges Testing Measure Multivar factory Multivar Software architecture Task N’ factory Test N Result N" Data structure Smart pointers, Pimpl, Process Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  53. Fail-fast/Fail-safe idioms Statistical Scientific programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  54. Fail-fast/Fail-safe idioms Statistical Scientific programming ◮ Check constraints on input/output Olivia Quinet 1 double foo(const std::vector<size _ t>& l, const std::vector<double>& x, const std::vector<bool>& w) 2 { Introduction 3 CP _ ASSERT(l.size() == x.size()); CluePoints 4 CP _ ASSERT(l.size() == w.size()); Clinical Trials 5 // Rest of the code Statistical tests 6 } SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  55. Fail-fast/Fail-safe idioms Statistical Scientific programming ◮ Check constraints on input/output Olivia Quinet 1 double foo(const std::vector<size _ t>& l, const std::vector<double>& x, const std::vector<bool>& w) 2 { Introduction 3 CP _ ASSERT(l.size() == x.size()); CluePoints 4 CP _ ASSERT(l.size() == w.size()); Clinical Trials 5 // Rest of the code Statistical tests 6 } SMART package The R language ◮ Fitting of statistical models might fails A very short introduction to R Some examples 1 try { R to C++? 2 fit = vglm("cbind(a,b)~1", 3 Named("family", family), Scientific 4 Named("data", dateframe), programming Named("control", control(Named("criterion", "coef"), 5 challenges Named("stepsize", 0.5)))); 6 7 } catch(std::exception& e) { Testing // Retry with other parameters 8 Measure 9 } Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  56. Fail-fast/Fail-safe idioms Statistical Scientific programming ◮ Check constraints on input/output Olivia Quinet 1 double foo(const std::vector<size _ t>& l, const std::vector<double>& x, const std::vector<bool>& w) 2 { Introduction 3 CP _ ASSERT(l.size() == x.size()); CluePoints 4 CP _ ASSERT(l.size() == w.size()); Clinical Trials 5 // Rest of the code Statistical tests 6 } SMART package The R language ◮ Fitting of statistical models might fails A very short introduction to R Some examples 1 try { R to C++? 2 fit = vglm("cbind(a,b)~1", 3 Named("family", family), Scientific 4 Named("data", dateframe), programming Named("control", control(Named("criterion", "coef"), 5 challenges Named("stepsize", 0.5)))); 6 7 } catch(std::exception& e) { Testing // Retry with other parameters 8 Measure 9 } Software architecture Data structure Smart pointers, Pimpl, ◮ Propagate the error message Factories Fail-fast/Fail-safe ◮ Rethrow the exception Numerical errors Accumulators ◮ Store the exception as an error message inside the object std::algorithms, boost, GSL, BLAS, LAPACK, . . . ◮ . . . Conclusion Questions, Remarks?

  57. Numerical instabilities Statistical Scientific programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Credit xkcd Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  58. Numerical instabilities Statistical Scientific Test on standard deviations programming Olivia Quinet ◮ P values computed from the integration of two functions: Introduction f 1 ( x ) = pchisq ( s / x 2 ; N , left . tail ) × dgamma ( x ; scale , shape ) CluePoints Clinical Trials Statistical tests SMART package The R language f 2 ( x ) = pchisq ( s / x 2 ; N , right . tail ) × dgamma ( x ; scale , shape ) A very short introduction to R Some examples f 1 ( x ) f 2 ( x ) R to C++? Scientific 1.0 1.0 programming challenges 0.8 0.8 Testing Measure 0.6 0.6 Software architecture Data structure 0.4 0.4 Smart pointers, Pimpl, Factories Fail-fast/Fail-safe 0.2 0.2 Numerical errors Accumulators 0.0 0.0 std::algorithms, boost, GSL, BLAS, LAPACK, . . . 0.0 0.5 1.0 1.5 2.0 0.0 0.5 1.0 1.5 2.0 Conclusion Questions, Remarks?

  59. Numerical instabilities Statistical Scientific calcPsd: test on standard deviations programming Olivia Quinet ◮ f 1 ( x ) is unstable in case shape < 1 Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? f 1 ( x ) Scientific programming challenges 2.0 Testing Measure 1.5 Software architecture Data structure Smart pointers, Pimpl, 1.0 Factories Fail-fast/Fail-safe Numerical errors 0.5 Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . 0.0 Conclusion 0.0 0.5 1.0 1.5 2.0 Questions, Remarks?

  60. Numerical instabilities Statistical Scientific calcPsd: test on standard deviations programming Olivia Quinet ◮ f 1 ( x ) is unstable in case shape < 1 ◮ f 1 ( x ) can be rewritten by using the integration by parts theorem Introduction � a � a CluePoints 0 udv = [ uv ] a 0 vdu 0 − Clinical Trials Statistical tests SMART package f 1 ′ ( x ) = 2 s x 3 × dchisq ( s / x 2 ; N ) × pgamma ( x ; scale , shape , left . tail ) The R language A very short introduction to R Some examples R to C++? f 1 ( x ); f 1 ′ ( x ) Scientific programming challenges 2.0 Testing Measure 1.5 Software architecture Data structure Smart pointers, Pimpl, 1.0 Factories Fail-fast/Fail-safe Numerical errors 0.5 Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . 0.0 Conclusion 0.0 0.5 1.0 1.5 2.0 Questions, Remarks?

  61. Minimizing numerical errors Statistical Scientific programming ◮ Sample variance – Standard formula Olivia Quinet N 1 Introduction � s 2 x ) 2 N = ( x k − ¯ CluePoints N − 1 Clinical Trials k =1 Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  62. Minimizing numerical errors Statistical Scientific programming ◮ Sample variance – Standard formula Olivia Quinet N 1 Introduction � s 2 x ) 2 N = ( x k − ¯ CluePoints N − 1 Clinical Trials k =1 Statistical tests SMART package The R language ◮ Can be implemented as a 2 pass algorithm, first the mean ¯ x , and the variance A very short introduction to s 2 afterwards. R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  63. Minimizing numerical errors Statistical Scientific programming ◮ Sample variance – Standard formula Olivia Quinet N 1 Introduction � s 2 x ) 2 N = ( x k − ¯ CluePoints N − 1 Clinical Trials k =1 Statistical tests SMART package The R language ◮ Can be implemented as a 2 pass algorithm, first the mean ¯ x , and the variance A very short introduction to s 2 afterwards. R Some examples BUT the number of items N can be huge R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  64. Minimizing numerical errors Statistical Scientific programming ◮ Sample variance – Standard formula Olivia Quinet N 1 Introduction � s 2 x ) 2 N = ( x k − ¯ CluePoints N − 1 Clinical Trials k =1 Statistical tests SMART package The R language ◮ Can be implemented as a 2 pass algorithm, first the mean ¯ x , and the variance A very short introduction to s 2 afterwards. R Some examples BUT the number of items N can be huge R to C++? ◮ Have a one pass algorithm Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  65. Minimizing numerical errors Statistical Scientific programming ◮ Sample variance – Standard formula Olivia Quinet N 1 Introduction � s 2 x ) 2 N = ( x k − ¯ CluePoints N − 1 Clinical Trials k =1 Statistical tests SMART package The R language ◮ Can be implemented as a 2 pass algorithm, first the mean ¯ x , and the variance A very short introduction to s 2 afterwards. R Some examples BUT the number of items N can be huge R to C++? ◮ Have a one pass algorithm Scientific programming ◮ Compute the variance for increasing N to observe convergence. challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  66. Minimizing numerical errors Statistical Scientific programming ◮ Sample variance – One pass algorithm: Sum of squares method Olivia Quinet  � 2  � N N Introduction 1 � � s 2 x 2 CluePoints N =  N x k k −  N ( N − 1) Clinical Trials k =1 k =1 Statistical tests SMART package The R language One pass algorithm but the formula is unstable: A very short introduction to R ◮ float precision: for {10000 f , 10001 f , 10002 f }, Some examples the result is − 1 . 0666667 e+ 01 instead of 1. R to C++? Scientific ◮ double precision: for {100000000, 100000001, 100000002}, programming challenges the result is 0 instead of 1. Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  67. Minimizing numerical errors Statistical Scientific programming ◮ Sample variance – Iterative algorithm: Welford’s recursion method Olivia Quinet M k − 1 + x k − M k − 1 Introduction = M k CluePoints k Clinical Trials S k = S k − 1 + ( x k − M k − 1 )( x k − M k ) , Statistical tests SMART package The R language with M 0 = 0 and S 0 = 0, and then A very short introduction to R Some examples S N s 2 R to C++? N = N − 1 , Scientific programming challenges This stable algorithm with can be easily turned into an accumulator Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  68. Accumulators Statistical Scientific programming Olivia Quinet Start Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Loop Scientific programming challenges Testing Measure Accumulator Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . End Conclusion Questions, Remarks?

  69. Accumulators Statistical Scientific programming ◮ Separation of the operations on the elements from the iteration leads to Olivia Quinet smaller testable code. Introduction ◮ Statisticals tests involve operations (agregation, sum, average, variance, CluePoints Clinical Trials . . . ) on one or more variables based on one or more several key variables. Statistical tests E.g.: Preprocess involves taking the mean by visits or the sum by patients, SMART package The R language the count of non missing values per center, . . . A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  70. Accumulators as a OO pattern Statistical Scientific Mean of the elements of a vector programming Olivia Quinet ◮ without accumulator Introduction double sum{0}; 1 CluePoints for(const auto x: myvector) sum += x; 2 const double mean = sum / myvector.size(); Clinical Trials 3 Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  71. Accumulators as a OO pattern Statistical Scientific Mean of the elements of a vector programming Olivia Quinet ◮ without accumulator Introduction double sum{0}; 1 CluePoints for(const auto x: myvector) sum += x; 2 const double mean = sum / myvector.size(); Clinical Trials 3 Statistical tests SMART package ◮ with accumulator The R language A very short introduction to R using namespace boost::accumulators; 1 Some examples accumulator _ set<double, stats<tag::mean> > acc; 2 for(const auto x: myvector) acc(x); 3 R to C++? mean(acc); 4 Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  72. Accumulator implementation Statistical Scientific Sample variance – Welford’s recursion method programming Olivia Quinet M k − 1 + x k − M k − 1 = M k Introduction k CluePoints S k = S k − 1 + ( x k − M k − 1 )( x k − M k ) Clinical Trials Statistical tests SMART package S k s 2 = k − 1 , The R language k A very short introduction to R Some examples 1 template<typename T> class Variance { R to C++? 2 size _ t k; /**< Number of elements */ Scientific 3 T m; /**< 0th order moment, i.e. average */ programming 4 T s; /**< 1st order moment */ challenges 5 public: 6 Variance() : k(0), m(0), s(0) {} Testing 7 void operator()(const T x) { Measure 8 if(std::isnan(x)) return; Software architecture 9 ++k; Data structure 10 const T pm(m); Smart pointers, Pimpl, 11 m += (x-pm) * (ONE/k); Factories 12 s += (x-pm) * (x-m); Fail-fast/Fail-safe 13 } Numerical errors 14 T average() const noexcept { return m; } Accumulators 15 T s2() const noexcept { return k > 1 ? s / (k-1) : ZERO; } std::algorithms, boost, GSL, BLAS, LAPACK, . . . 16 }; Conclusion Questions, Remarks?

  73. std::algorithms, boost, GSL, BLAS, LAPACK, . . . Statistical Scientific programming Olivia Quinet Introduction CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Credit Big Bang Theory Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  74. std::algorithms, boost, GSL, BLAS, LAPACK, . . . Statistical Scientific programming ◮ <algorithm> Olivia Quinet ◮ boost Introduction ◮ Statistics, . . . CluePoints ◮ Logging facilities Clinical Trials Statistical tests ◮ System (command line arguments, . . . ) SMART package ◮ Thread, MPI, Serialization The R language ◮ . . . A very short introduction to R Some examples ◮ GNU Scientific Library R to C++? ◮ Optimization, minimization, . . . Scientific programming ◮ BLAS and LAPACK challenges ◮ Operations on matrices Testing Measure ◮ Numerical Recipes Software architecture Data structure ◮ Lots of algorithms Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  75. Implementing Beta-Binomial distribution with boost Statistical Scientific In probability theory and statistics, the beta-binomial distribution is a family of programming Olivia Quinet discrete probability distributions on a finite support of non-negative integers arising when the probability of success in each of a fixed or known number of Introduction Bernoulli trials is either unknown or random. CluePoints Clinical Trials Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  76. Implementing Beta-Binomial distribution with boost Statistical Scientific programming � � n B ( k + α, n − k + β ) Olivia Quinet f ( k ; n , α, β ) = B ( α, β ) k Introduction CluePoints Clinical Trials ◮ k and n are positive integers with k < = n Statistical tests SMART package ◮ α and β are strictly positive numbers The R language ◮ Binomial coefficient A very short introduction to R � � Some examples n n ! Γ( n + 1) = k !( n − k )! = R to C++? k Γ( k + 1)Γ( n − k + 1) Scientific programming ◮ Beta function challenges B ( x , y ) = Γ( x ) + Γ( y ) Testing Measure Γ( x + y ) Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  77. Implementing Beta-Binomial distribution with boost Statistical Scientific programming � � n B ( k + α, n − k + β ) Olivia Quinet f ( k ; n , α, β ) = B ( α, β ) k Introduction CluePoints ◮ Numerically fine as long as α and β are small Clinical Trials Statistical tests ◮ When α and β are not small, B ( α, β ) tends toward zero. SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  78. Implementing Beta-Binomial distribution with boost Statistical Scientific programming � � n B ( k + α, n − k + β ) Olivia Quinet f ( k ; n , α, β ) = B ( α, β ) k Introduction CluePoints ◮ Numerically fine as long as α and β are small Clinical Trials Statistical tests ◮ When α and β are not small, B ( α, β ) tends toward zero. SMART package The R language Trick: Do the calculation in the log scale: A very short introduction to R � � � � Some examples n f ( k ; n , α, β ) = exp log + log B ( k + α, n − k + β ) − log B ( α, β ) R to C++? k Scientific programming � � challenges n Testing log = l_binomial_coefficient ( n , k ) Measure k Software architecture Data structure = lgamma ( n + 1) − lgamma ( k + 1) − lgamma ( − k + n + 1) Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators log B ( x , y ) = lbeta ( x , y ) std::algorithms, boost, GSL, BLAS, LAPACK, . . . = lgamma ( x ) + lgamma ( y ) − lgamma ( x + y ) Conclusion Questions, Remarks?

  79. Minimize with GSL Statistical Scientific programming ◮ GSL is a C library Olivia Quinet ◮ Use wrapper classes Introduction CluePoints ◮ Pointer to the minimizer created/owned by GSL Clinical Trials Statistical tests SMART package ◮ Pointer to the function definition struct The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  80. Minimizers – Example Statistical Scientific programming enum class M _ TYPE { 1 NO _ GRADIENT, 2 Olivia Quinet ... 3 }; 4 Introduction 5 template<M _ TYPE> struct M _ API; /**< Template for the C API */ 6 CluePoints template<M _ TYPE> class M _ FCT; /**< Template for defining the function to minimize */ 7 Clinical Trials template<M _ TYPE> class IMinimizer; /**< Template for the minimizer */ 8 Statistical tests SMART package The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  81. Minimizers – Example Statistical Scientific programming 1 /// Specialized template defining the function to minimize (no gradient) 2 template<> Olivia Quinet 3 class M _ FCT<M _ TYPE::NO _ GRADIENT> { 4 public: Introduction friend class IMinimizer<M _ TYPE::NO _ GRADIENT>; 5 6 CluePoints /// Virtual base class for the implementation of the function to minimize 7 Clinical Trials class Base : public boost::noncopyable { 8 Statistical tests public: 9 SMART package 10 virtual ~Base() {} The R language 11 virtual double evaluate(const double _ x) = 0; 12 }; A very short introduction to R 13 Some examples 14 typedef std::unique _ ptr<Base> PTR; /**< Type of the pointer to the instance of the function to minimize */ 15 typedef gsl _ function DEF; /**< Type for the definition */ R to C++? 16 17 M _ FCT(PTR _ fct, const NUMBER _ minimum, const NUMBER _ lower, const NUMBER _ upper) : ... Scientific 18 programming 19 double evaluate(const double _ x) { return m _ fct->evaluate( _ x); } challenges 20 double get _ lowest _ bound() const { return m _ f _ lower < m _ f _ upper ? m _ lower : m _ upper; } Testing 21 double get _ lowest _ f _ bound() const { return std::min(m _ f _ lower, m _ f _ upper); } Measure 22 Software architecture 23 private: Data structure 24 ... Smart pointers, Pimpl, 25 }; Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  82. Minimizers – Example Statistical Scientific programming 1 template<M _ TYPE _ Type> class IMinimizer : public boost::noncopyable { 2 public: Olivia Quinet typedef M _ FCT< _ Type> FCT; 3 4 Introduction explicit IMinimizer(const std::string& _ type, FCT& _ fct, const double _ epsabs, const double _ epsrel) { ... } 5 ~IMinimizer() { ... } 6 CluePoints 7 Clinical Trials bool iterate(const size _ t _ maxiter) { 8 Statistical tests if(m _ can _ minimize) { 9 SMART package 10 for(size _ t iter = 0; iter< _ maxiter && next(); ++iter) { The R language 11 if(converged()) return true; 12 } A very short introduction to R 13 } Some examples 14 return false; 15 } R to C++? 16 17 std::string name() const Scientific 18 { return m _ ptr ? M _ API< _ Type>::name(m _ ptr) : std::string(); } programming 19 bool next() challenges 20 { return m _ ptr && m _ can _ minimize ? M _ API< _ Type>::iterate(m _ ptr) == 0 : false; } Testing 21 bool converged() const Measure 22 { return m _ ptr && m _ can _ minimize ? M _ API< _ Type>::converged(m _ ptr, m _ epsabs, m _ epsrel) : false; } Software architecture 23 double x() const Data structure 24 { return m _ ptr ? (m _ can _ minimize ? M _ API< _ Type>::x _ minimum(m _ ptr) : m _ fct.get _ lowest _ bound()) : 0; } Smart pointers, Pimpl, 25 double y() const Factories 26 { return m _ ptr ? (m _ can _ minimize ? M _ API< _ Type>::f _ minimum(m _ ptr) : m _ fct.get _ lowest _ f _ bound()) : 0; } Fail-fast/Fail-safe 27 private: Numerical errors 28 ... Accumulators 29 }; std::algorithms, boost, GSL, 30 typedef IMinimizer<M _ TYPE::NO _ GRADIENT> MinimizerNoGradient; BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  83. Minimizers – Example Statistical Scientific programming 1 template<M _ TYPE> struct M _ API; /// Generic template holding the API 2 Olivia Quinet 3 template<> 4 struct M _ API<M _ TYPE::NO _ GRADIENT> { Introduction typedef gsl _ min _ fminimizer* PTR; /**< Type of the pointer to the minimizer */ 5 typedef gsl _ function* DEF; /**< Type of the pointer to the definition */ 6 CluePoints 7 Clinical Trials static PTR alloc(const STRING& _ type) { return gsl _ min _ fminimizer _ alloc(type( _ type)); } 8 Statistical tests static void free(PTR _ p) { gsl _ min _ fminimizer _ free( _ p); } 9 SMART package 10 The R language 11 static const gsl _ min _ fminimizer _ type* type(const std::string& _ type); 12 static std::string name(PTR _ p) { return gsl _ min _ fminimizer _ name( _ p); } A very short introduction to R 13 Some examples 14 static bool set(PTR _ p, DEF _ fct, const double _ minimum, const double _ lower, const double _ upper) 15 { return gsl _ min _ fminimizer _ set( _ p, _ fct, _ minimum, _ lower, _ upper) != GSL _ EINVAL; } R to C++? 16 static bool set(PTR _ p, DEF _ fct, const double _ minimum, const double _ fminimum, const double _ lower, const double _ flower, const double _ upper, const double _ fupper) Scientific 17 { return gsl _ min _ fminimizer _ set _ with _ values( _ p, _ fct, _ minimum, _ fminimum, _ lower, _ flower, _ upper, _ fupper) programming != GSL _ EINVAL; } challenges 18 Testing 19 static int iterate(PTR _ p) { return gsl _ min _ fminimizer _ iterate( _ p); } Measure 20 static bool converged(PTR _ p, const double _ epsabs, const double _ epsrel) Software architecture 21 { return gsl _ min _ test _ interval(x _ lower( _ p), x _ upper( _ p), _ epsabs, _ epsrel) == GSL _ SUCCESS; } Data structure 22 Smart pointers, Pimpl, 23 static double x _ minimum(PTR _ p) { return gsl _ min _ fminimizer _ x _ minimum( _ p); } Factories 24 static double x _ upper(PTR _ p) { return gsl _ min _ fminimizer _ x _ upper( _ p); } Fail-fast/Fail-safe 25 static double x _ lower(PTR _ p) { return gsl _ min _ fminimizer _ x _ lower( _ p); } Numerical errors 26 static double f _ minimum(PTR _ p) { return gsl _ min _ fminimizer _ f _ minimum( _ p); } Accumulators 27 static double f _ upper(PTR _ p) { return gsl _ min _ fminimizer _ f _ upper( _ p); } std::algorithms, boost, GSL, 28 static double f _ lower(PTR _ p) { return gsl _ min _ fminimizer _ f _ lower( _ p); } BLAS, LAPACK, . . . 29 }; Conclusion Questions, Remarks?

  84. Minimizers – Example Statistical Scientific programming 1 class MyFctNoGradient : public gsl::MinimizerNoGradient::FCT::Base { 2 public: Olivia Quinet MyFctNoGradient(...) { ... } 3 4 Introduction double evaluate(const double _ x) override { ... } 5 6 }; CluePoints 7 Clinical Trials 8 double minimize _ my _ fct(...) Statistical tests 9 { SMART package 10 gsl::MinimizerNoGradient::FCT f(new MyFctNoGradient(...), .5*(low+hi), low, hi); The R language 11 gsl::MinimizerNoGradient minimizer("Brent", f, 0.1, 0.1); 12 minimizer.iterate(10); A very short introduction to R 13 return minimizer.x(); Some examples 14 } R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

  85. Conclusion Statistical Scientific programming ◮ Fast production code Olivia Quinet ◮ No task is impossible Introduction ◮ Seek expertise CluePoints Clinical Trials ◮ Testing: Don’t trust your code Statistical tests SMART package ◮ Have fun and keep learning The R language A very short introduction to R Some examples R to C++? Scientific programming challenges Testing Measure Software architecture Data structure Smart pointers, Pimpl, Factories Fail-fast/Fail-safe Numerical errors Accumulators std::algorithms, boost, GSL, BLAS, LAPACK, . . . Conclusion Questions, Remarks?

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend