Benchmarks and Quality Evaluation of CAS
ACA 2016 – Kassel – Germany Albert Heinle
Symbolic Computation Group David R. Cheriton School of Computer Science University of Waterloo Canada
2016–08–01
1 / 16
Outline:
◮ Correct Benchmarking of CAS – Case Studies and Dangers
◮ Challenges and Vision for Benchmarking in Computer Algebra
◮ Conclusion
2 / 16
3 / 16
You read in a paper a sentence like the following:

"We presented a new implementation of algorithm X. Our timings show that we outperform the alternative programs when using examples we found in literature as input, and we observe that our program scales well by using randomly generated objects."

What are potential problems?
◮ Are the scripts and outputs made available? Did the authors check whether the outputs were correct for the random inputs?
◮ Did the authors run the other programs on their own machine, or did they just take the timings from the other paper?
◮ Did the authors also check the scalability of the other programs?
4 / 16
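The pitfalls above can be made concrete with a small timing harness. The Python sketch below (all "tools" are hypothetical stand-ins invoked via the Python interpreter, not real CAS calls) runs every competing program on the same machine, records wall-clock time together with the raw output, and cross-checks that all outputs agree before any timing comparison is made:

```python
# Sketch of a transparent timing harness, assuming each competing tool
# can be invoked as a command-line program. The "tools" below are
# placeholder one-liners, not real computer algebra systems.
import json
import subprocess
import sys
import time

def run_and_record(name, argv):
    """Run one tool on this machine, keeping its output AND its timing."""
    start = time.perf_counter()
    proc = subprocess.run(argv, capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    return {"tool": name, "argv": argv, "seconds": elapsed,
            "output": proc.stdout.strip(), "returncode": proc.returncode}

# Stand-ins: two "tools" computing the same value, so outputs can be cross-checked.
tools = {
    "tool_a": [sys.executable, "-c", "print(sum(range(10**5)))"],
    "tool_b": [sys.executable, "-c", "print((10**5 - 1) * 10**5 // 2)"],
}

records = [run_and_record(name, argv) for name, argv in tools.items()]

# Cross-check: all tools must agree before any timing comparison is meaningful.
outputs = {r["output"] for r in records}
assert len(outputs) == 1, f"tools disagree: {outputs}"

# Persist everything so a reader can reproduce and inspect the experiment.
print(json.dumps(records, indent=2))
```

Publishing the recorded scripts, outputs, and timings alongside the paper is what makes the comparison reproducible, addressing the first bullet above.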
Consider the following Singular code:

    execute(read("singular_poly.txt"));
    // File content:
    //   ring R = 0,(x,y),dp;
    //   ideal I = *large polynomial system*;
    timer = 1;
    int t = timer;
    ideal g = yourCommand(I);
    t = timer - t;
    print(g);
    print(t);

What are potential problems?
◮ Singular sorts all input polynomials with respect to the given monomial ordering. This may assist computations, but the sorting time is not taken into account.
◮ Singular is open source, hence we know how the timer works. Is this true for every system measured this way?
5 / 16
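The first bullet can be illustrated outside Singular as well. The Python sketch below (synthetic data, not an actual Singular measurement) mimics a system that sorts its input by the monomial ordering while parsing, before the user-visible timer starts, so the reported time under-states the true cost:

```python
# Illustration of how a timer that starts after input parsing/sorting
# under-reports the true cost. The "polynomials" are synthetic: lists
# of monomial exponent tuples, purely for demonstration.
import random
import time

random.seed(0)
polys = [[(random.randrange(50), random.randrange(50)) for _ in range(200)]
         for _ in range(300)]

t_total = time.perf_counter()

# Step 1: the system sorts each polynomial w.r.t. the monomial ordering
# as part of *reading* the input -- before the user-visible timer starts.
prepared = [sorted(p, reverse=True) for p in polys]

t_inner = time.perf_counter()             # <- where "int t = timer;" effectively starts
leading_terms = [p[0] for p in prepared]  # the "computation" being measured
inner = time.perf_counter() - t_inner

total = time.perf_counter() - t_total

# The inner timer misses the preprocessing entirely.
assert inner <= total
print(f"measured: {inner:.6f}s of a true {total:.6f}s")
```

A fair benchmark should therefore time the whole invocation from the outside (wall clock around the process), not only the system's self-reported interval.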
Singular:                    | Maple:
=============================|=============================
ring R = 0,(x,y),lp;         | with(Groebner):
ideal I = x^2 + y^2, x + y;  | F := [x^2 + y^2, x + y];
print(groebner(I));          | print(Basis(F, plex(x,y)));

What are potential problems?
◮ Singular by default does not compute a reduced Gröbner basis, while Maple in its current version always does.
6 / 16
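This difference means raw output comparison across systems is unfair: for the ideal above, one system may return {x + y, 2*y^2} while another returns the reduced, monic {x + y, y^2}, and both are correct Gröbner bases of the same ideal. A minimal sketch of a fairer comparison, using a toy polynomial representation (dicts mapping exponent tuples to rational coefficients; not any system's actual data structure), normalizes each polynomial to be monic before comparing:

```python
# Sketch: normalize Groebner basis elements to be monic before comparing
# outputs of different systems. Polynomials are modeled as dicts mapping
# exponent tuples (deg_x, deg_y) to coefficients -- a toy representation.
from fractions import Fraction

def monic(poly, order=lambda m: m):
    """Divide by the leading coefficient w.r.t. the given monomial order."""
    lead = max(poly, key=order)
    lc = poly[lead]
    return {m: Fraction(c, 1) / lc for m, c in poly.items()}

def normalize_basis(basis):
    return {tuple(sorted(monic(p).items())) for p in basis}

# Output of "system A": {x + y, 2*y^2}.
basis_a = [{(1, 0): 1, (0, 1): 1}, {(0, 2): 2}]
# Output of "system B": the reduced basis {x + y, y^2}.
basis_b = [{(1, 0): 1, (0, 1): 1}, {(0, 2): 1}]

assert basis_a != basis_b                                    # naive comparison fails
assert normalize_basis(basis_a) == normalize_basis(basis_b)  # fair comparison passes
print("bases agree after normalization")
```

Note that scaling is only the easiest discrepancy: a non-reduced basis may also contain extra redundant elements, so a fully robust check needs interreduction or an ideal-membership test, which is exactly the "non-uniqueness of results" challenge discussed later.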
◮ Ad Case Study I: losing transparency.
◮ Ad Case Study II: overlooking crucial implementation details.
◮ Ad Case Study III: different facets of certain computations are overlooked.

The threat of all the above points becomes larger with the number of systems being compared.
7 / 16
8 / 16
◮ Non-uniqueness of computation results. Sometimes checking results for "equality" is a difficult problem itself. This difficulty also transfers to checking the correctness of an output.
◮ Many sub-communities with their own sets of problems.
◮ Input formats for different computer algebra systems differ a lot.
9 / 16
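The input-format problem suggests generating every system's input from one abstract problem description, which is the spirit of what SDEval does with Symbolic Data entries. The sketch below renders one problem specification into (simplified, illustrative) Singular and Maple input; the templates approximate each system's syntax and are not guaranteed to cover all cases:

```python
# Sketch: one abstract problem description rendered into input for two
# different CAS. The templates are illustrative approximations of
# Singular and Maple syntax, not complete translators.
problem = {
    "vars": ["x", "y"],
    "char": 0,
    "generators": ["x^2 + y^2", "x + y"],
    "order": "lp",  # lexicographic ordering (Singular's name for it)
}

def to_singular(p):
    return (f"ring R = {p['char']},({','.join(p['vars'])}),{p['order']};\n"
            f"ideal I = {', '.join(p['generators'])};\n"
            "print(groebner(I));")

def to_maple(p):
    return ("with(Groebner):\n"
            f"F := [{', '.join(p['generators'])}];\n"
            f"print(Basis(F, plex({', '.join(p['vars'])})));")

print(to_singular(problem))
print(to_maple(problem))
```

Keeping the problem data in one canonical form and generating system-specific input guarantees that every system is benchmarked on the same instance.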
Figure: Picture taken from http://xkcd.com/927/
10 / 16
◮ SDEval [1, 2] is a benchmarking framework tailored for the computer algebra community.
◮ Create benchmarks: using entries from the Symbolic Data database, one can create executable code for several different computer algebra systems.
◮ Run benchmarks: independent from the creation part, it provides a feasible infrastructure to run, monitor and time computations, and there are interfaces for scripts to interpret the output.

[1] http://wiki.symbolicdata.org/SDEval
[2] https://www.youtube.com/watch?v=CctmrfisZso

11 / 16
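The "run benchmarks" part can be sketched in a few lines of Python: execute a task under a wall-clock limit, capture its output, and hand the raw output to a small interpreter, which is the role template_sol.py plays in a taskfolder. The task below is a stand-in one-liner, not a real CAS invocation, and the interpreter is a toy:

```python
# Sketch of a benchmark runner: run a task with a timeout, capture its
# output, and interpret it. The task is a placeholder Python one-liner
# standing in for a CAS call; interpret() is a toy output interpreter.
import subprocess
import sys
import time

def run_task(argv, timeout_s=10.0):
    start = time.perf_counter()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout_s)
        status = "completed" if proc.returncode == 0 else "error"
        out = proc.stdout
    except subprocess.TimeoutExpired:
        status, out = "timeout", ""
    return {"status": status, "seconds": time.perf_counter() - start,
            "raw_output": out}

def interpret(raw_output):
    """Toy interpreter: treat the last printed line as 'the result'."""
    lines = [line for line in raw_output.splitlines() if line.strip()]
    return lines[-1] if lines else None

result = run_task([sys.executable, "-c", "print('basis'); print(42)"])
print(result["status"], interpret(result["raw_output"]))
```

Separating the runner from the per-system interpreter scripts is what lets one infrastructure monitor and time arbitrary computer algebra systems.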
Together with papers, authors should make so-called taskfolders available.
+ TaskFolder
| - runTasks.py            // For running the task
| - taskInfo.xml           // Saving the task in XML structure
| - machinesettings.xml    // The machine settings in XML form
| + classes                // All classes of the SDEval project
| + casSources             // Folder containing all executable files
| | + SomeProblemInstance1
| | | + ComputerAlgebraSystem1
| | | | - executablefile.sdc  // Executable code for the CAS
| | | | - template_sol.py     // Script to analyze the output of the CAS
| | | + ComputerAlgebraSystem2
| | | | - executablefile.sdc
| | | + ...
| | + SomeProblemInstance2
| | | + ...
| | + ...

Figure: Folder structure of a taskfolder
12 / 16
◮ StarExec [3] is a complete benchmarking infrastructure for the satisfiability community (SAT/SMT solvers). It is funded with 1.85 million USD by the NSF.
◮ The different kinds of computations are clearly structured and standardized by SMT-LIB.

Figure: Image taken from http://smtlib.cs.uiowa.edu/logics.shtml

[3] https://www.starexec.org/

13 / 16
◮ Different from SDEval, StarExec also provides physical computation infrastructure to perform calculations and to run benchmarks (used during conferences).
◮ StarExec does not provide the flexibility that we would need for computer algebra computations. However, we can learn a lot from their experience and maybe one day create a similar infrastructure for computer algebra.
14 / 16
15 / 16
◮ The computer algebra community needs to recognize its need for correct, reproducible, and transparent benchmarking.
◮ Several databases, like Symbolic Data, are available from different communities. We need a central way to collect and access them.
◮ With SDEval, we have a starting point for creating and running benchmarks, which can be refined in the future.
◮ At some point, we should also introduce a computational infrastructure à la StarExec.
16 / 16