How to make a petabyte ROOT file: proposal for managing data with columnar granularity
Jim Pivarski
Princeton University – DIANA
October 11, 2017
Motivation: start by stating the obvious

ROOT's selective reading is very important.
13116 ATLAS MC, 1717 ATLAS data, 2151 CMS MiniAOD, 675+ CMS NanoAOD, 560 LHCb
² Implementation-dependent, but common. A “WHERE” selection may be implemented with a stencil.
◮ Subclass of TFile initializes itself by getting data from a “controlling” database.
◮ Reference counts for objects referenced by TKeys (including TBaskets and user objects).
◮ Bulk data, the contents of TKeys, are in a “warehouse” database (object store).
◮ REST APIs for flexibility; TBaskets fetched by HTTP GET, may be web-cached.
◮ Methods for deriving new TTrees from old TTrees:
    ◮ share common TBranch data by default;
    ◮ “soft skim” by stencil (event list/event bitmap); “hard skim” only if re-basketization is required;
    ◮ save all provenance and use git-like versioning to determine if two branches are identical.
◮ No user-facing partition boundaries: a huge dataset appears as one TTree.
◮ Users work in a shared TFile: home TDirectories; permissions managed by the database.
◮ compute nodes use this same interface to communicate with storage;
◮ but a scheduler attempts to maximize shared cache locality on the compute nodes.
auto file = TFile::Open("rootdb://data.cern/cms");
file->Get("home/username")->cd();
file->Get("derived_data")->Draw("x >> hist");
file->Get("hist")->Fit("gaus");
[Architecture diagram: the user's laptop talks HTTP/REST to the compute nodes and to the control db; compute nodes share a cache in front of the warehouse db, with dispatch coordinated via Zookeeper. Compute nodes get TBasket data, perform the calculation, and save to "hist" in the db; jobs are preferentially sent to compute nodes that already have the TBaskets in cache.]