

  1. LibreOffice Calc Now available on your GPU Michael Meeks <michael.meeks@collabora.com> mmeeks, #libreoffice-dev, irc.freenode.net “Stand at the crossroads and look; ask for the ancient paths, ask where the good way is, and walk in it, and you will find rest for your souls...” - Jeremiah 6:16

  2. Overview ● A bit about: ● GPUs … ● Spreadsheets ● Internal re-factoring ● OpenCL optimisation ● new calc features ● XML / load performance ● Calc / GPU questions ? ● LibreOffice 4.2 : the FOSDEM release ... ● Questions ?

  3. Why use the GPU ?

  4. APUs – GPU faster than CPU (1)
● Tons of un-used Compute Units across your APU.
● Sadly double precision is slower – and precision is non-negotiable for spreadsheets: IEEE 754 doubles are required.
● Better power usage per flop.
[Chart: CPU flops vs. GPU flops, fp32 and fp64, on a log scale from 1 to 10000; numbers based on a Kaveri 7850K APU and a top-end discrete graphics card (FirePro 7990).]
(1) for some ops: things GPUs were designed for, like LiteCoin mining ...
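To see why IEEE 754 doubles are non-negotiable, here is a quick, generic illustration (not Calc code) of how fast single precision drifts when accumulating an ordinary column of values:

// Why fp32 is not enough for spreadsheets (illustration only, not Calc code):
// summing a column of 0.1s already drifts visibly in float, while an
// IEEE 754 double stays accurate to far more digits than a user will see.
#include <cstdio>

int main()
{
    const int rows = 1000000;   // one long spreadsheet column
    float  sumF = 0.0f;
    double sumD = 0.0;
    for (int i = 0; i < rows; ++i)
    {
        sumF += 0.1f;
        sumD += 0.1;
    }
    // Expected result: 100000
    std::printf("float : %.6f\n", sumF);   // visibly wrong
    std::printf("double: %.6f\n", sumD);   // correct to many digits
    return 0;
}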

  5. Developers behind the calc re-work:
● Kohei Yoshida – MDDS maintainer, heroic calc core re-factorer
● Markus Mohrhard – Calc maintainer, Chart2 wrestler, unit tester par excellence etc.
● Matus Kukan – data streamer, g-builder, size optimizer ..
● Jagan Lokanatha, Kismat Singh – code ninjas etc.
● A large OpenCL team, particularly I-Jui (Ray) Sung

  6. Spreadsheet Geometry
● An early spreadsheet, c. 3000 BC – aspect ratio 8:1; contents: "Victory against every land ... who giveth all life forever ..."
● Excel 2003 – 64k x 256 cells; aspect 256:1.
● Excel 2010 – 10^6 x 16k cells; aspect 16:1.
● The 'Broom Handle' aspect ratio.
● 50% of spreadsheets are used to make business decisions.
● Hence: columnar data structures.

  7. Spreadsheet Core Data Storage

  8. Before (ScBaseCell)
● ScDocument → ScTable → ScColumn, with one heap-allocated ScBaseCell per occupied cell.
● Every ScBaseCell carries: a Broadcaster pointer (8 bytes), text width (2 bytes), cell type (1 byte) and script type (1 byte).
● Subclasses: ScValueCell, ScFormulaCell, ScStringCell, ScEditCell, ScNoteCell*
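The diagram boils down to something like the sketch below – a simplified, illustrative reconstruction of the old layout (member and helper names are mine, not the real LibreOffice headers), showing why even a plain numeric cell paid for a vtable, a broadcaster pointer and assorted metadata:

// Minimal sketch of the pre-rework cell storage (illustrative only).
#include <vector>

class SvtBroadcaster;   // notifies dependent formula cells of changes

enum CellType { CELLTYPE_VALUE, CELLTYPE_STRING, CELLTYPE_EDIT,
                CELLTYPE_FORMULA, CELLTYPE_NOTE };

class ScBaseCell
{
public:
    virtual ~ScBaseCell() {}
    CellType GetCellType() const { return static_cast<CellType>(eType); }
protected:
    SvtBroadcaster* pBroadcaster = nullptr;        // 8 bytes on 64-bit
    unsigned short  nTextWidth   = 0;              // 2 bytes
    unsigned char   eType        = CELLTYPE_VALUE; // 1 byte
    unsigned char   nScriptType  = 0;              // 1 byte
};

class ScValueCell : public ScBaseCell   // a single double ...
{
public:
    double GetValue() const { return fValue; }
private:
    double fValue = 0.0;                // ... wrapped in all of the above
};

// The column keeps a vector of (row, cell-pointer) entries, so walking a
// column means one heap pointer chase plus a virtual type check per cell.
struct ColEntry { int nRow; ScBaseCell* pCell; };
class ScColumn  { std::vector<ColEntry> maItems; };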

  9. Abstraction of Cell Value Access – ScBaseCell usage (before): ScDocument, Undo / Redo, Change Tracking, Content Rendering, External Reference, Document Iterators, Conditional Formatting, Cell Validation, Chart Data Provider, UNO API Layer, VBA API Layer, the filters (ODF, Excel xls / xlsx, CSV, DIF, SYLK, DBF, RTF, HTML, Quattro Pro) and the CppUnit Tests.

  10. Abstraction of Cell Value Access – ScBaseCell usage (after): only ScDocument and the Document Iterators touch the cell storage directly.
● The biggest calc core re-factor in a decade+.
● Dis-infecting the horrible, long-term, inherited structural problems of Calc.
● Lots of new unit tests created for the calc core, for the first time.
● Moved to the new 'MDDS' data structures.
● 2x weeks with no compile ...

  11. Before (ScBaseCell): scattered pointer chasing while walking cells down a column ... the same ScDocument → ScTable → ScColumn → ScBaseCell picture as slide 8, with every cell dragging along its Broadcaster (8 bytes), text width (2 bytes), cell type (1 byte) and script type (1 byte), plus the ScValueCell / ScFormulaCell / ScStringCell / ScEditCell / ScNoteCell* subclasses.

  12. After (mdds::multi_type_vector): ScDocument → ScTable → ScColumn, where each column stores cell values in typed blocks – double blocks, svl::SharedString blocks, EditTextObject blocks and ScFormulaCell blocks – with parallel arrays for broadcasters, cell notes, text widths and script types.
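The container itself comes from the external mdds library; the toy sketch below is not the mdds API, just an illustration of the idea of a column as a short sequence of homogeneous, type-tagged blocks:

// Toy illustration of the multi_type_vector idea (not the real mdds API):
// numeric data sits contiguously in plain double arrays instead of one
// polymorphic heap object per cell.
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

struct NumericBlock { std::vector<double>      values; };
struct StringBlock  { std::vector<std::string> values; };
struct EmptyBlock   { std::size_t              length; };  // run of empty cells

using Block = std::variant<NumericBlock, StringBlock, EmptyBlock>;

struct ToyColumn
{
    std::vector<Block> blocks;   // e.g. 1000 doubles, 10 strings, 5000 empties

    // Summing the numeric cells touches only the double blocks, each of
    // which is a contiguous, vectorisable array.
    double sumNumeric() const
    {
        double sum = 0.0;
        for (const Block& b : blocks)
            if (const auto* num = std::get_if<NumericBlock>(&b))
                for (double v : num->values)
                    sum += v;
        return sum;
    }
};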

  13. Iterating over cells (old way) ... loop down a column, and the inner loop:

double nSum = 0.0;
ScBaseCell* pCell = pCol->maItems[nColRow].pCell;
++nColRow;
switch (pCell->GetCellType())
{
    case CELLTYPE_VALUE:
        nSum += ((ScValueCell*)pCell)->GetValue();
        break;
    case CELLTYPE_FORMULA:
        ... something worse ...
    case CELLTYPE_STRING:
    case CELLTYPE_EDIT:
        ...
    case CELLTYPE_NOTE:
        ...
}

  14. Iterating over cells (new way)

double nSum = 0.0;
for (size_t i = 0; i < nChunkLength; i++)
    nSum += pDoubleChunk[i];

... something a vectoriser can finally chew through ...

  15. Shared Formula

  16. Before: every ScFormulaCell carries its own ScTokenArray (... tokens, ... RPN) – one token array per formula cell, over and over again.

  17. After: runs of ScFormulaCells sharing the same formula hang off a single ScFormulaCellGroup, which holds one shared ScTokenArray (... tokens, ... RPN) for the whole group.

  18. Memory usage – heap memory size (MB):
● Empty document: 27 MB
● Shared formula on: 259 MB
● Shared formula off: 372 MB
Test document used: http://kohei.us/wp-content/uploads/2013/08/shared-formula-memory-test.ods

  19. Shared string re-work ● String comparisons were slow – and not tractable on a GPU. ● Case-insensitive equality is a hard problem: ICU & heavy lifting. ● Strings are compared a lot in functions and in Pivot Tables. ● Shared string storage is useful. ● So fix it ...

  20. Concept: an svl::SharedStringPool keeps an original string pool and an upcased string pool; every svl::SharedString refers to its interned original and upcased forms.

  21. String comparison (old way)

  22. String comparison (new way)
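The code on those two slides is not reproduced in this transcript; the sketch below is an invented illustration of the concept (not the real svl::SharedString / SharedStringPool API): intern each string once, pre-compute its upper-cased form, and both exact and case-insensitive equality collapse to pointer comparisons.

// Illustrative string-interning sketch (not the actual svl API).
#include <cctype>
#include <string>
#include <unordered_set>

class ToyStringPool
{
public:
    struct Interned
    {
        const std::string* original;  // unique per distinct string
        const std::string* upper;     // unique per case-folded string
    };

    Interned intern(const std::string& s)
    {
        const std::string* orig  = &*m_original.insert(s).first;
        const std::string* upper = &*m_upper.insert(toUpperAscii(s)).first;
        return { orig, upper };
    }

private:
    static std::string toUpperAscii(std::string s)
    {
        // Real Calc needs ICU / CharClass for correct Unicode case folding;
        // plain ASCII folding keeps this sketch short.
        for (char& c : s)
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        return s;
    }

    std::unordered_set<std::string> m_original;  // "original string pool"
    std::unordered_set<std::string> m_upper;     // "upcased string pool"
};

// New way: compare interned pointers.
inline bool equalsExact(const ToyStringPool::Interned& a,
                        const ToyStringPool::Interned& b)
{ return a.original == b.original; }

inline bool equalsIgnoreCase(const ToyStringPool::Interned& a,
                             const ToyStringPool::Interned& b)
{ return a.upper == b.upper; }

The old path, by contrast, had to run ICU transliteration / collation over both strings on every single comparison.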

  23. OpenCL / calculation ...

  24. Why OpenCL & HSA ... ● GPU and CPU optimisation ... ● Why write custom SSE2/SSE3 etc. assembly, detect the architecture, and select a back-end per platform? ● Instead get OpenCL (from the APU vendor) to generate the best code ... ● Heterogeneous System Architecture rocks: ● an AMD64-like innovation: ● shared virtual memory address space & pointers: CPU ↔ GPU ● avoid wasteful copies, fast dispatch ● great OpenCL 2.0 support. ● Use the right Compute Unit for the job.

  25. Auto-compile Formula → OpenCL
Formulae are compiled when idle / on entry, in a thread, to hide latency. Kernel generation thanks to: ...

#pragma OPENCL EXTENSION cl_khr_fp64: enable
int isNan(double a) { return isnan(a); }
double legalize(double a, double b) { return isNan(a) ? b : a; }
double tmp0_0_fsum(__global double *tmp0_0_0)
{
    double tmp = 0;
    {
        int i;
        i = 0;
        tmp = legalize(((tmp0_0_0[i])+(tmp)), tmp);
        i = 1;
        tmp = legalize(((tmp0_0_0[i])+(tmp)), tmp);
        i = 2;
        tmp = legalize(((tmp0_0_0[i])+(tmp)), tmp);
    } // to scope the int i declaration
    return tmp;
}
double tmp0_nop(__global double *tmp0_0_0)
{
    double tmp = 0;
    int gid0 = get_global_id(0);
    tmp = tmp0_0_fsum(tmp0_0_0);
    return tmp;
}
__kernel void DynamicKernel_nop_fsum(__global double *result,
                                     __global double *tmp0_0_0)
{
    int gid0 = get_global_id(0);
    result[gid0] = tmp0_nop(tmp0_0_0);
}

  26. The same formula for a longer sum ... compiled from standard formula syntax. The generated source now pairs the per-work-item wrapper with a parallel reduction kernel:

double tmp0_0_fsum(__global double *tmp0_0_0)
{
    double tmp = 0;
    int gid0 = get_global_id(0);
    tmp = ((tmp0_0_0[gid0])+(tmp));
    return tmp;
}
double tmp0_nop(__global double *tmp0_0_0)
{
    double tmp = 0;
    int gid0 = get_global_id(0);
    tmp = tmp0_0_fsum(tmp0_0_0);
    return tmp;
}
__kernel void DynamicKernel_nop_fsum(__global double *result,
                                     __global double *tmp0_0_0)
{
    int gid0 = get_global_id(0);
    result[gid0] = tmp0_nop(tmp0_0_0);
}

__kernel void tmp0_0_0_reduction(__global double* A, __global double *result,
                                 int arrayLength, int windowSize)
{
    double tmp, current_result = 0;
    int writePos = get_group_id(1);
    int lidx = get_local_id(0);
    __local double shm_buf[256];
    int offset = 0;
    int end = windowSize;
    end = min(end, arrayLength);
    barrier(CLK_LOCAL_MEM_FENCE);
    int loop = arrayLength/512 + 1;
    for (int l = 0; l < loop; l++)
    {
        tmp = 0;
        int loopOffset = l*512;
        if ((loopOffset + lidx + offset + 256) < end)
        {
            tmp = legalize(((A[loopOffset + lidx + offset])+(tmp)), tmp);
            tmp = legalize(((A[loopOffset + lidx + offset + 256])+(tmp)), tmp);
        }
        else if ((loopOffset + lidx + offset) < end)
            tmp = legalize(((A[loopOffset + lidx + offset])+(tmp)), tmp);
        shm_buf[lidx] = tmp;
        barrier(CLK_LOCAL_MEM_FENCE);
        for (int i = 128; i > 0; i /= 2)
        {
            if (lidx < i)
                shm_buf[lidx] = ((shm_buf[lidx])+(shm_buf[lidx + i]));
            barrier(CLK_LOCAL_MEM_FENCE);
        }
        if (lidx == 0)
            current_result = ((current_result)+(shm_buf[0]));
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if (lidx == 0)
        result[writePos] = current_result;
}
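For context, the sketch below shows a generic OpenCL host-side flow for such generated source: create a context, build the program, set the kernel arguments, enqueue and read back. It uses only the standard OpenCL C API and is an illustration, not Calc's actual formula-group dispatch code.

// Generic OpenCL host-side flow (illustration only, not Calc code).
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main()
{
    // A trivial stand-in for the dynamically generated kernel source.
    const char* src =
        "#pragma OPENCL EXTENSION cl_khr_fp64: enable\n"
        "__kernel void DynamicKernel_nop_fsum(__global double *result,\n"
        "                                     __global double *tmp0_0_0)\n"
        "{ int gid0 = get_global_id(0); result[gid0] = tmp0_0_0[gid0] + 1.0; }\n";

    cl_platform_id platform; cl_device_id device;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, nullptr);

    cl_int err;
    cl_context ctx = clCreateContext(nullptr, 1, &device, nullptr, nullptr, &err);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, &err);

    // Build the generated program; a real integration would do this in a
    // background thread and fall back to the software interpreter on failure.
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, nullptr, &err);
    if (clBuildProgram(prog, 1, &device, "", nullptr, nullptr) != CL_SUCCESS)
        return 1;
    cl_kernel kernel = clCreateKernel(prog, "DynamicKernel_nop_fsum", &err);

    // One input column, one result per row of the formula group.
    std::vector<double> input(1024, 1.0), output(1024, 0.0);
    cl_mem inBuf  = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                   input.size() * sizeof(double), input.data(), &err);
    cl_mem outBuf = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY,
                                   output.size() * sizeof(double), nullptr, &err);

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &outBuf);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &inBuf);

    size_t global = output.size();   // one work-item per formula cell
    clEnqueueNDRangeKernel(queue, kernel, 1, nullptr, &global, nullptr,
                           0, nullptr, nullptr);
    clEnqueueReadBuffer(queue, outBuf, CL_TRUE, 0,
                        output.size() * sizeof(double), output.data(),
                        0, nullptr, nullptr);

    std::printf("first result: %f\n", output[0]);

    clReleaseMemObject(inBuf); clReleaseMemObject(outBuf);
    clReleaseKernel(kernel); clReleaseProgram(prog);
    clReleaseCommandQueue(queue); clReleaseContext(ctx);
    return 0;
}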

  27. Performance numbers for sample sheets: 30x → 500x faster for these samples vs. the legacy software calculation, on Kaveri.
[Chart: GPU / OpenCL vs. software calculation times in milliseconds (yet another log plot, shorter is better) for the min_max_avg_r, destination-workbook, dates-worked, stock-history and ground-water sample sheets.]

  28. How that works in practice:

  29. Enabling Custom Calculation ● Turn on OpenCL computation: Tools → Options

  30. Enabling OpenCL goodness ● Auto-select the best OpenCL device via a micro-benchmark ● Or disable that and explicitly select a device.

  31. Big data needs Document Load optimization

  32. Parallelized Loading ... ● Desktop CPU cores are often idle. ● XML parsing: the ideal application of parallelism. ● SAX ("Sucking ... Ache ... eXperience") parsers: read and parse a tiny piece of XML, emit an event, punch it deep into the core of the app logic, return ... then parse another tiny piece of XML. ● Better APIs and implementations needed: tokenizing, namespace handling etc. ● Luckily it is easy to retro-fit threading ... ● Dozens of performance wins in XFastParser.
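The threading retro-fit is essentially a producer / consumer split: parse and tokenize on a worker thread, and let the document-building code drain a queue of compact events. The sketch below is a generic illustration of that shape (invented types; not the actual XFastParser implementation):

// Generic producer/consumer sketch of threaded SAX-style parsing
// (illustrative only - not the LibreOffice XFastParser code).
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct SaxEvent
{
    enum class Type { StartElement, Characters, EndElement, Done } type;
    int         tokenId;   // pre-tokenized element name (cheap int compare)
    std::string text;      // character data, if any
};

class EventQueue
{
public:
    void push(SaxEvent ev)
    {
        { std::lock_guard<std::mutex> lock(m_mutex); m_events.push(std::move(ev)); }
        m_cond.notify_one();
    }
    SaxEvent pop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this]{ return !m_events.empty(); });
        SaxEvent ev = std::move(m_events.front());
        m_events.pop();
        return ev;
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::queue<SaxEvent> m_events;
};

// Producer: the parser thread chews through the raw XML, tokenizes element
// names and pushes compact events instead of calling deep into the app for
// every fragment it sees.
void parseOnThread(const std::vector<char>& xml, EventQueue& queue)
{
    (void)xml;  // a real parser would tokenize the XML stream here
    queue.push({ SaxEvent::Type::StartElement, 42, "" });
    queue.push({ SaxEvent::Type::Characters,    0, "1.5" });
    queue.push({ SaxEvent::Type::EndElement,   42, "" });
    queue.push({ SaxEvent::Type::Done,          0, "" });
}

// Consumer: the document-building thread drains events, so the expensive
// core logic is no longer interleaved with raw parsing.
void buildDocument(EventQueue& queue)
{
    for (;;)
    {
        SaxEvent ev = queue.pop();
        if (ev.type == SaxEvent::Type::Done)
            break;
        // ... insert cells / rows based on ev.tokenId and ev.text ...
    }
}

int main()
{
    EventQueue queue;
    std::vector<char> xml;  // the .ods / .xlsx XML stream would go here
    std::thread producer(parseOnThread, std::cref(xml), std::ref(queue));
    buildDocument(queue);
    producer.join();
    return 0;
}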
