
LZ4, BulkIO, and offset removal performance

Jim Pivarski

Princeton University – DIANA

October 11, 2017

1 / 15



Motivation for this study

Three updates to ROOT I/O are aimed at speeding up or reducing file size for end-user analysis:

◮ new compression algorithm: LZ4 (speed)
◮ reading TBasket data directly into arrays: BulkIO (speed)
◮ removing offset data from TBranches that have a counter (size)

Focus on CMS NanoAOD in particular because

◮ it is aimed at end-users (1–2 kB/event)
◮ it is broadly intended for 30–50% of analyses (not an individual user’s ntuple)

Also including studies of LHCb (thanks, Oksana!). No ATLAS files because I can’t generate new ones or TTree::CopyTree old ones.

2 / 15


Parameters of the NanoAOD studies

◮ AWS instance with a fast SSD disk (i2.xlarge).
◮ No resource contention because I paid for exclusive access.
◮ “Writing” means a TTree::CopyTree with new TFile compression.
◮ “Reading” means filling a class made by MakeClass.
◮ “BulkIO” means filling arrays through GetEntriesSerialized.
◮ Always reading from warmed cache.
◮ Five repeated trials; standard deviations are small compared to trends.
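The trial protocol above (five repetitions, report spread vs. trend) can be sketched as a small harness. This is a hypothetical stand-in, not the author's actual benchmark script, and the workload here is stdlib compression rather than ROOT I/O:

```python
# Hypothetical sketch of the trial protocol: run a measurement five
# times and report mean and standard deviation. The workload is a
# stand-in (compressing a buffer in memory), not a ROOT read/write job.
import statistics
import time
import zlib

payload = b"event record " * 50_000   # stand-in for a TTree read/write job

def one_trial():
    start = time.perf_counter()
    zlib.compress(payload, 6)
    return time.perf_counter() - start

trials = [one_trial() for _ in range(5)]
print(f"mean = {statistics.mean(trials):.4f} s, "
      f"stdev = {statistics.stdev(trials):.4f} s")
```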

3 / 15


LZ4 doesn’t compress as well as ZLIB, LZMA
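One reason for the ratio gap can be illustrated with stdlib codecs (LZ4 itself needs a third-party package, so it is omitted here): DEFLATE-family codecs like zlib only see matches within a 32 kB window, while LZMA's far larger dictionary captures long-range redundancy. A minimal, assumption-laden demonstration:

```python
# Conceptual illustration (not ROOT): repeating a 100 kB incompressible
# chunk puts the redundancy beyond zlib's 32 kB window, but well inside
# lzma's dictionary, so lzma compresses the whole file to ~one chunk.
import lzma
import random
import zlib

random.seed(42)
chunk = random.randbytes(100_000)   # incompressible on its own
data = chunk * 3                    # redundancy at 100 kB distance

zlib_size = len(zlib.compress(data, 9))
lzma_size = len(lzma.compress(data, preset=9))
print(len(data), zlib_size, lzma_size)
```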

4 / 15


. . . same for LHCb

5 / 15


But it’s faster: levels 1–3 are as fast as writing uncompressed
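The level-vs-speed trade-off is generic to dictionary compressors, so it can be shown with zlib as a stand-in for LZ4 (level 0 is "store", i.e. effectively uncompressed):

```python
# Conceptual stand-in (zlib, not LZ4): lower compression levels trade
# ratio for speed. Level 0 stores the data verbatim; higher levels
# shrink the output at increasing CPU cost.
import time
import zlib

# compressible payload with a repeating-but-varying record structure
data = b"".join(b"pt=%05d;" % (i % 997) for i in range(100_000))

sizes = {}
for level in (0, 1, 3, 6, 9):
    start = time.perf_counter()
    out = zlib.compress(data, level)
    dt = time.perf_counter() - start
    sizes[level] = len(out)
    print(f"level {level}: {len(out):>8} bytes in {dt * 1e3:7.2f} ms")
```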

6 / 15


. . . same for LHCb

7 / 15


More importantly: reading is as fast as uncompressed

8 / 15


And BulkIO reading is super-fast: serious penalty for LZMA
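The shape of the BulkIO win can be sketched without ROOT: decode a whole basket's payload in one call instead of once per entry. This is a conceptual illustration, not the GetEntriesSerialized API; the basket layout below (a packed big-endian float array) is a simplification:

```python
# Conceptual illustration (not the ROOT API): BulkIO-style reading
# deserializes an entire basket in one call rather than per entry.
# Both paths yield identical values; the bulk path makes 1 struct
# call instead of n, which is where the speedup comes from.
import struct

n = 1000
values = [i * 0.5 for i in range(n)]     # exactly representable in float32
basket = struct.pack(f">{n}f", *values)  # big-endian, packed payload

# per-entry path (GetEntry-style): one unpack per entry
per_entry = [struct.unpack_from(">f", basket, 4 * i)[0] for i in range(n)]

# bulk path (BulkIO-style): one unpack for the whole basket
bulk = list(struct.unpack(f">{n}f", basket))

print(per_entry == bulk)   # the two paths agree
```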

9 / 15


Speed vs. size trade-offs

◮ write speed vs. size
◮ read speed vs. size
◮ BulkIO speed vs. size

10 / 15


Removing unnecessary offsets

TBranches for variable-sized data contain offsets indicating where each entry starts.

◮ This is unnecessary for branches with counters (e.g. "Muon.pt[nMuons]/F").
◮ A fix is in progress (PR #1003) to optionally not write these offsets.
◮ May also write counts, instead of offsets, since repeated values might be more compressible.

My study pre-dated (inspired) this PR; I constructed a copy of NanoAOD without offsets by putting all muon data into a flat TTree, all jet data into a flat TTree, etc.
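The redundancy being removed can be sketched in a few lines: a jagged branch is a flat content array plus per-event counts, and offsets are just the running sum of the counts, so storing both is unnecessary. The numbers below are made up for illustration:

```python
# Sketch of the idea: counts fully determine offsets (cumulative sum),
# and offsets plus flat content fully determine the jagged structure.
from itertools import accumulate

counts = [2, 0, 3]                         # e.g. nMuons per event
content = [54.1, 23.8, 31.0, 20.2, 19.9]   # e.g. flat Muon.pt values (made up)

offsets = list(accumulate(counts, initial=0))
events = [content[offsets[i]:offsets[i + 1]] for i in range(len(counts))]

print(offsets)   # [0, 2, 2, 5]
print(events)    # [[54.1, 23.8], [], [31.0, 20.2, 19.9]]
```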

11 / 15


After compression, this saves 8–18%

12 / 15


And it closes the LZ4/LZMA gap to a factor of 1.5×

13 / 15



Do offsets vs. counts matter? Yes for LZ4.

Synthetic test: I generated Poisson-random counts and integrated them to make offsets, then compressed both with ZLIB and LZ4.
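A partial re-creation of that synthetic test using only the stdlib (lz4 is a third-party package, so only the ZLIB half is shown; the Poisson rate and sample size are guesses, not the study's exact parameters):

```python
# Re-creation of the synthetic test, zlib half only: Poisson-random
# counts vs. their running-sum offsets, both packed as 4-byte
# big-endian integers. Small repeated counts compress far better than
# monotonically increasing offsets.
import math
import random
import struct
import zlib
from itertools import accumulate

random.seed(0)

def poisson(lam):
    # Knuth's method; adequate for small lambda
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

counts = [poisson(2.0) for _ in range(100_000)]
offsets = list(accumulate(counts))   # integrated counts

counts_raw = struct.pack(f">{len(counts)}i", *counts)
offsets_raw = struct.pack(f">{len(offsets)}i", *offsets)

counts_z = len(zlib.compress(counts_raw, 9))
offsets_z = len(zlib.compress(offsets_raw, 9))
print(f"counts: {counts_z} bytes, offsets: {offsets_z} bytes")
```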

14 / 15


Conclusions

◮ LZ4 is as fast as uncompressed data for traditional GetEntry jobs.
◮ BulkIO is an order of magnitude faster than GetEntry, especially with LZ4.
◮ Unnecessary offsets add ∼10% to file size; may be removed.
◮ Counts compress better than offsets, especially for LZ4.

15 / 15