LibreOffice Calc Spreadsheets on the GPU

LibreOffice Calc Spreadsheets on the GPU Michael Meeks <michael.meeks@collabora.com> mmeeks, #libreoffice-dev, irc.freenode.net “Stand at the crossroads and look; ask for the ancient paths, ask where the good way is, and walk in it, and you will find rest for your souls...” - Jeremiah 6:16

Overview
● LibreOffice ?
● A bit about:
● GPUs …
● Spreadsheets
● Internal re-factoring
● OpenCL optimisation
● new calc features
● XML / load performance
● Calc / GPU questions ?
● Questions ?

LibreOffice Project & Software
• Open Source / Free Software
• One million new unique IPs per week (that we can track)
• Double the weekly growth one year ago.
• Tens of millions of users, and growing fast.
• Hundred+ contributing coders each month
• 2500+ commits last month
• Around a thousand developers (including QA, Translators, UX etc.)
[Chart: cumulative unique IPs for updates vs. time, not counting any Linux / vendor versions; Y axis from 0 to 60,000,000]
http://www.libreoffice.org/

Advisory Board Members
This slide's layout is a victim of our success here ...

Why use the GPU ?

APUs: GPU faster than CPU
● Tons of un-used Compute Units across your APU
● Double precision is un-reasonably slower
● And precision is non-negotiable for spreadsheets: IEEE 754 required.
● Better power usage per flop.
[Chart: CPU flops vs. GPU flops, fp64 and fp32, for the Kaveri APU and a FirePro 7990; X axis: Flops, 1 to 10000, note the log scale ...]
Numbers based on a Kaveri 7850K APU & top-end discrete Graphics card.

Developers behind the calc re-work:
● Kohei Yoshida: MDDS maintainer, Heroic calc core re-factorer, Code Ninja etc.
● Markus Mohrhard: Calc maintainer, Chart2 wrestler, Unit tester par Excellence etc.
● Matus Kukan: Data Streamer, G-builder, Size optimizer ..
● Jagan Lokanatha
● Kismat Singh
● A large OpenCL team, particularly I-Jui (Ray) Sung

Spreadsheet Geometry
● An early Spreadsheet, c. 3000 BC. Aspect ratio: 8:1. Contents: "Victory against every land … who giveth all life forever …"
● Excel 2003: 64k x 256. Aspect: 256:1
● Excel 2010: 10^6 x 16k. Aspect: 16:1
● The 'Broom Handle' aspect ratio.
● 50% of spreadsheets used to make business decisions.
● Columnar data structures

Spreadsheet Core Data Storage

The joy of Object Orientation
[Diagram: ScDocument → ScTable → ScColumn → ScBaseCell. Each ScBaseCell carries: Broadcaster (8 bytes), Text width (2 bytes), Cell type (1 byte), Script type (1 byte). Subclasses: ScValueCell, ScFormulaCell, ScStringCell, ScEditCell, ScNoteCell*]

Abstraction of Cell Value Access
ScBaseCell Usage (Before)
[Diagram: ScBaseCell used directly by: ScDocument, Undo / Redo, RTF Filter, Change Tracking, Quattro Pro Filter, Content Rendering, HTML Filter, Excel Filter (xls, xlsx), External Reference, Document Iterators, CSV Filter, DIF Filter, UNO API Layer, Conditional Formatting, SYLK Filter, VBA API Layer, Chart Data Provider, DBF Filter, ODF Filter, Cell Validation, CppUnit Test]

Abstraction of Cell Value Access
ScBaseCell Usage (After)
[Diagram: the remaining users shown are ScDocument and Document Iterators]
Biggest calc core re-factor in a decade+
Dis-infecting the horrible, long-term, inherited structural problems of Calc.
Lots of new unit tests being created for the first time for the calc core.
Moved to using new 'MDDS' data structures.
2x weeks with no compile ...

Before (ScBaseCell)
Scattered pointer chasing walking cells down a column ...
[Diagram: ScDocument → ScTable → ScColumn → ScBaseCell. Each ScBaseCell carries: Broadcaster (8 bytes), Text width (2 bytes), Cell type (1 byte), Script type (1 byte). Subclasses: ScValueCell, ScFormulaCell, ScStringCell, ScEditCell, ScNoteCell*]

After (mdds::multi_type_vector)
[Diagram: ScDocument → ScTable → ScColumn. Per-column arrays: Broadcasters, Cell notes, Text widths, Script types, Cell values. Cell values are stored in typed blocks: svl::SharedString block, double block, EditTextObject block, ScFormulaCell block]

Iterating over cells (old way)
… loop down a column … and the inner loop:
    double nSum = 0.0;
    ScBaseCell* pCell = pCol->maItems[nColRow].pCell;
    ++nColRow;
    switch (pCell->GetCellType())
    {
        case CELLTYPE_VALUE:
            nSum += ((ScValueCell*)pCell)->GetValue();
            break;
        case CELLTYPE_FORMULA:
            … something worse ...
        case CELLTYPE_STRING:
        case CELLTYPE_EDIT:
            …
        case CELLTYPE_NOTE:
            …
    }

Iterating over cells (new way)
    double nSum = 0.0;
    for (size_t i = 0; i < nChunkLength; i++)
        nSum += pDoubleChunk[i];
ONO. from a vectoriser ...
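The difference between the two loops can be sketched in a few lines of C++. This is only an illustration under stated assumptions: `DoubleBlock`, `StringBlock`, `Block`, `sumChunk` and `sumColumn` are hypothetical names, and a `std::variant` stands in for the typed blocks; it is not the mdds::multi_type_vector API. The point it shows is that the type check happens once per block, so the inner loop runs over a plain array of doubles.

```cpp
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the typed blocks of a column (not the mdds API).
using DoubleBlock = std::vector<double>;
using StringBlock = std::vector<std::string>;
using Block = std::variant<DoubleBlock, StringBlock>;

// The tight inner loop from the slide: one contiguous chunk of doubles.
double sumChunk(const double* pDoubleChunk, std::size_t nChunkLength)
{
    double nSum = 0.0;
    for (std::size_t i = 0; i < nChunkLength; ++i)
        nSum += pDoubleChunk[i];
    return nSum;
}

// Walk a column block by block; the cell type is tested once per block,
// not once per cell. Non-numeric blocks are skipped.
double sumColumn(const std::vector<Block>& rColumn)
{
    double nSum = 0.0;
    for (const Block& rBlock : rColumn)
        if (const auto* pDoubles = std::get_if<DoubleBlock>(&rBlock))
            nSum += sumChunk(pDoubles->data(), pDoubles->size());
    return nSum;
}
```

A column holding a double block, a string block and another double block is summed with two calls to `sumChunk` and no per-cell `switch`.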

Shared Formula

Before
[Diagram: every ScFormulaCell owns its own ScTokenArray (… Tokens, … RPN)]

After
[Diagram: the ScFormulaCells point at one ScFormulaCellGroup, which holds a single ScTokenArray (… Tokens, … RPN)]
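The sharing shown in the two diagrams can be sketched roughly as follows. All types here (`TokenArray`, `FormulaGroup`, `FormulaCell`) and the helper `fillDown` are simplified, hypothetical stand-ins for illustration; the real ScFormulaCell, ScFormulaCellGroup and ScTokenArray carry much more state. The sketch assumes a run of cells filled down from the same formula, so one token array can serve the whole run.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the Calc classes on the slide.
struct TokenArray {
    std::vector<std::string> tokens;   // parsed formula tokens
};

struct FormulaGroup {                  // shared by a run of formula cells
    std::shared_ptr<const TokenArray> code;
    std::size_t length = 0;            // number of cells in the run
};

struct FormulaCell {
    std::shared_ptr<const FormulaGroup> group;  // one pointer per cell
};

// Create nRows formula cells that all share a single token array,
// instead of allocating one token array per cell.
std::vector<FormulaCell> fillDown(TokenArray tokens, std::size_t nRows)
{
    auto group = std::make_shared<FormulaGroup>();
    group->code = std::make_shared<TokenArray>(std::move(tokens));
    group->length = nRows;
    return std::vector<FormulaCell>(nRows, FormulaCell{group});
}
```

With this layout the per-cell cost is one shared pointer, and the tokens are stored once per group rather than once per cell.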

Memory usage
Heap memory size (MB):
    Empty document         27
    Shared formula on     259
    Shared formula off    372
Test document used: http://kohei.us/wp-content/uploads/2013/08/shared-formula-memory-test.ods

Shared string re-work
● String comparisons were slow
● Also not tractable for a GPU
● Case-insensitive equality is a hard problem: ICU & heavy lifting.
● String comparisons are used a lot in functions and Pivot Tables.
● Shared string storage is useful.
● So fix it ...

Concept
[Diagram: svl::SharedStringPool holds an original string pool and an upcased string pool; each svl::SharedString refers to entries in both]

String comparison (old way)

String comparison (new way)
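The two-pool concept can be illustrated with a small interning sketch. This is not the svl::SharedStringPool API: `StringPool`, `SharedString`, `intern`, `equalsCaseSensitive` and `equalsIgnoreCase` are hypothetical names, and the upper-casing below is ASCII-only for brevity, whereas proper case handling needs a Unicode library such as ICU. It assumes both strings being compared come from the same pool, in which case equality becomes a pointer comparison.

```cpp
#include <cctype>
#include <string>
#include <unordered_set>

// Hypothetical sketch of the two-pool idea (not the svl API).
struct SharedString {
    const std::string* original = nullptr;  // entry in the original string pool
    const std::string* upper = nullptr;     // entry in the upcased string pool
};

class StringPool {
public:
    // Intern a string: store it once per pool and hand back stable pointers.
    SharedString intern(const std::string& s)
    {
        std::string up = s;
        for (char& c : up)  // ASCII-only upcasing, for illustration only
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        // Pointers to unordered_set elements stay valid when the set grows.
        return { &*maOriginal.insert(s).first, &*maUpper.insert(up).first };
    }
private:
    std::unordered_set<std::string> maOriginal;
    std::unordered_set<std::string> maUpper;
};

// Both comparisons are a single pointer compare, with no character loop.
inline bool equalsCaseSensitive(const SharedString& a, const SharedString& b)
{
    return a.original == b.original;
}

inline bool equalsIgnoreCase(const SharedString& a, const SharedString& b)
{
    return a.upper == b.upper;
}
```

The expensive work (hashing and upper-casing) is paid once when a string enters the pool; every later comparison is constant time, which is also a form a GPU can handle.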

OpenCL / calculation ...

Why OpenCL & HSA ...
● GPU and CPU optimisation …
● Why write custom SSE2/SSE3 etc. assembly, detect the arch, and select a backend across platforms?
● Instead get OpenCL (from the APU vendor) to generate the best code ...
● Heterogeneous System Architecture rocks:
● An AMD64-like innovation:
● Shared Virtual Memory address space & pointers: GPU ↔ CPU.
● Avoid wasteful copies, fast dispatch
● Great OpenCL 2.0 support.
● Use the right Compute Unit for the job.

Auto-compile Formula → OpenCL
Formulae compiled idly / on entry in a thread … to hide latency.
Kernel generation thanks to: [logo not preserved]
    #pragma OPENCL EXTENSION cl_khr_fp64: enable
    int isNan(double a) { return isnan(a); }
    double legalize(double a, double b) { return isNan(a)?b:a;}
    double tmp0_0_fsum(__global double *tmp0_0_0) {
        double tmp = 0;
        {
            int i;
            i = 0;
            tmp = legalize(((tmp0_0_0[i])+(tmp)), tmp);
            i = 1;
            tmp = legalize(((tmp0_0_0[i])+(tmp)), tmp);
            i = 2;
            tmp = legalize(((tmp0_0_0[i])+(tmp)), tmp);
        } // to scope the int i declaration
        return tmp;
    }
    double tmp0_nop(__global double *tmp0_0_0) {
        double tmp = 0;
        int gid0 = get_global_id(0);
        tmp = tmp0_0_fsum(tmp0_0_0);
        return tmp;
    }
    __kernel void DynamicKernel_nop_fsum(__global double *result, __global double *tmp0_0_0) {
        int gid0 = get_global_id(0);
        result[gid0] = tmp0_nop(tmp0_0_0);
    }

The same formula for a longer sum …
Compiled from standard formula syntax
    __kernel void tmp0_0_0_reduction(__global double* A, __global double *result, int arrayLength, int windowSize)
    {
        double tmp, current_result =0;
        int writePos = get_group_id(1);
        int lidx = get_local_id(0);
        __local double shm_buf[256];
        int offset = 0;
        int end = windowSize;
        end = min(end, arrayLength);
        barrier(CLK_LOCAL_MEM_FENCE);
        int loop = arrayLength/512 + 1;
        for (int l=0; l<loop; l++) {
            tmp = 0;
            int loopOffset = l*512;
            if((loopOffset + lidx + offset + 256) < end) {
                tmp = legalize(((A[loopOffset + lidx + offset])+(tmp)), tmp);
                tmp = legalize(((A[loopOffset + lidx + offset + 256])+(tmp)), tmp);
            } else if ((loopOffset + lidx + offset) < end)
                tmp = legalize(((A[loopOffset + lidx + offset])+(tmp)), tmp);
            shm_buf[lidx] = tmp;
            barrier(CLK_LOCAL_MEM_FENCE);
            for (int i = 128; i >0; i/=2) {
                if (lidx < i)
                    shm_buf[lidx] = ((shm_buf[lidx])+(shm_buf[lidx + i]));
                barrier(CLK_LOCAL_MEM_FENCE);
            }
            if (lidx == 0)
                current_result =((current_result)+(shm_buf[0]));
            barrier(CLK_LOCAL_MEM_FENCE);
        }
        if (lidx == 0)
            result[writePos] = current_result;
    }
    double tmp0_0_fsum(__global double *tmp0_0_0) {
        double tmp = 0;
        int gid0 = get_global_id(0);
        tmp = ((tmp0_0_0[gid0])+(tmp));
        return tmp;
    }
    double tmp0_nop(__global double *tmp0_0_0) {
        double tmp = 0;
        int gid0 = get_global_id(0);
        tmp = tmp0_0_fsum(tmp0_0_0);
        return tmp;
    }
    __kernel void DynamicKernel_nop_fsum(__global double *result, __global double *tmp0_0_0)
    {
        int gid0 = get_global_id(0);
        result[gid0] = tmp0_nop(tmp0_0_0);
    }
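For readers without an OpenCL device to hand, the reduction pattern in that kernel can be emulated serially on the CPU. The sketch below is an assumption-laden illustration, not LibreOffice code: `reduceWindow` is a hypothetical name, a `std::vector` stands in for `__local` memory, and each loop over `lidx` plays the role of 256 work-items running between two barriers. It follows the same two phases: every work-item loads up to two elements 256 apart, then the partial sums are folded in halves (128, 64, … 1).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Same rule as legalize() in the generated kernel: if the new value is NaN,
// keep the previous one.
double legalize(double a, double b) { return std::isnan(a) ? b : a; }

// Serial emulation of one 256-wide work-group summing A[0, end).
double reduceWindow(const std::vector<double>& A, std::size_t end)
{
    const std::size_t kGroup = 256;
    if (end > A.size())
        end = A.size();
    double result = 0.0;
    // Each pass consumes 512 elements, like "loopOffset = l*512" in the kernel.
    for (std::size_t base = 0; base < end; base += 2 * kGroup) {
        std::vector<double> shm(kGroup, 0.0);  // stands in for __local shm_buf
        // Phase 1: work-item lidx loads up to two elements, 256 apart.
        for (std::size_t lidx = 0; lidx < kGroup; ++lidx) {
            double tmp = 0.0;
            if (base + lidx < end)
                tmp = legalize(A[base + lidx] + tmp, tmp);
            if (base + lidx + kGroup < end)
                tmp = legalize(A[base + lidx + kGroup] + tmp, tmp);
            shm[lidx] = tmp;
        }
        // Phase 2: tree reduction; the active range halves on every step.
        for (std::size_t i = kGroup / 2; i > 0; i /= 2)
            for (std::size_t lidx = 0; lidx < i; ++lidx)
                shm[lidx] += shm[lidx + i];
        result += shm[0];  // what work-item 0 adds to current_result
    }
    return result;
}
```

On a GPU the inner `lidx` loops run in parallel, so each 512-element pass needs eight halving steps instead of 511 sequential additions.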

Performance numbers for sample sheets.
30x → 500x faster for these samples vs. the legacy software calculation on Kaveri.
[Chart: calculation time, GPU / OpenCL vs. Software, for the sample sheets min_max_avg_r, destination-workbook, dates-worked, stock-history and ground-water; shorter is better. Yet another log plot …: milliseconds on the X axis, 1 to 100,000.]
