SLIDE 1

The detector-clocks service

A case study in determining thread-safe service access patterns

Kyle J. Knoepfel 17 December 2019 LArSoft coordination meeting

SLIDE 2

Services

  • The SciSoft team has been working toward making LArSoft code thread-safe.
  • Services are problematic due to widespread use of non-const mutable data.
    – DetectorClocks and DetectorProperties suffer from this malady.
  • In this talk, I will present:
    – A pattern that can be adopted for both services to make them thread-safe.
    – My work toward that end for the DetectorClocks service.
    – A proposal for adopting the pattern.

12/17/19 Kyle J. Knoepfel | LArSoft coordination meeting 2

SLIDE 3

Thread-unsafe approach

[Diagram: job-level data, run-level data, and event-level data.]

  • Monolithic data structures are often chosen for managing mutable data corresponding to different processing granularities.
  • This is true for various LArSoft facilities (e.g. DetectorClocks and DetectorProperties).
  • It is inherently thread-unsafe as it often relies on the notion of “current”, which is ill-defined in multi-threaded environments.
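Why “current” is ill-defined can be seen without real threads. The following is a minimal sketch, not LArSoft code: MonolithicClocks, prepare_event, and current_trigger_time are hypothetical names, the trigger-time formula is a stand-in, and the two "threads" are simulated by interleaving calls by hand so that the outcome is deterministic.

```cpp
#include <cassert>

// Hypothetical monolithic provider: one mutable slot for "the current event".
struct MonolithicClocks {
  int current_event{0};
  double current_trigger_time{0.0};

  // Called at the start of each event; overwrites the shared slot.
  void prepare_event(int event_number)
  {
    current_event = event_number;
    current_trigger_time = 100.0 * event_number; // stand-in for a real lookup
  }
};

// Replays one possible interleaving by hand (no real threads):
// "thread 1" prepares event 1, "thread 2" prepares event 2, and only
// then does "thread 1" read the trigger time it believes is its own.
double trigger_time_seen_by_thread1()
{
  MonolithicClocks clocks;
  clocks.prepare_event(1); // thread 1 starts event 1
  clocks.prepare_event(2); // thread 2 starts event 2
  assert(clocks.current_event == 2); // "current" now means thread 2's event
  return clocks.current_trigger_time; // thread 1 reads event 2's value
}
```

In this replay thread 1 expected the value for event 1 but observes the one for event 2. With real threads the two prepare_event calls could also overlap, which is a data race.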

SLIDE 4

Thread-unsafe approach

[Diagram: job-level, run-level, and event-level data, with Thread 1 and Thread 2. Step: create service.]

SLIDE 5

Thread-unsafe approach

[Diagram, as above. Step: begin Run 1.]

SLIDE 6

Thread-unsafe approach

[Diagram, as above. Step: process Event 1.]

SLIDE 7

Thread-unsafe approach

[Diagram, as above. Steps: process Event 1 and process Event 2 at the same time. Two sets of event-level data overlap in the same place: data race.]

SLIDE 8

Thread-unsafe approach

[Diagram: processing Event 1 and Event 2 at the same time causes a data race on the event-level data.]

  • To solve this problem for the DetectorClocks provider/service, I have adopted the “persistent data structure” approach.
    – Data structures broken up according to the processing steps required.
    – In what follows, all boxes represent immutable objects.
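The core idea of a persistent data structure is that an "update" builds a new object and leaves every earlier version untouched. A minimal sketch of that idea, assuming a hypothetical ClockSettings record whose fields are invented for illustration:

```cpp
#include <cassert>
#include <string>

// Hypothetical immutable record: every member is const after construction.
struct ClockSettings {
  std::string const mode;
  int const run;
  bool const good_run;

  // "Updating" returns a new object; *this is left untouched.
  ClockSettings for_run(int new_run, bool new_good_run) const
  {
    return ClockSettings{mode, new_run, new_good_run};
  }
};

// Builds two versions and checks that the first survives the "update".
bool old_version_survives()
{
  ClockSettings const run1{"beam", 1, true};
  ClockSettings const run2 = run1.for_run(2, false);
  assert(run1.run == 1); // the earlier version is unchanged
  return run1.good_run && run2.run == 2 && !run2.good_run && run2.mode == "beam";
}
```

Because nothing is ever modified in place, a thread that still holds run1 cannot be affected by another thread creating run2.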

SLIDE 9

Persistent data structure approach

[Diagram: Thread 1 and Thread 2; no data objects yet.]

SLIDE 10

Persistent data structure approach

[Diagram. Step: create service. Objects shown: job-level data.]

SLIDE 11

Persistent data structure approach

[Diagram. Step: begin Run 1. Objects shown: job-level data and run-level data, linked by “uses” and “creates” arrows.]

SLIDE 12

Persistent data structure approach

[Diagram. Step: process Event 1. Objects shown: job-level data, run-level data, and event-level data, linked by “uses” and “creates” arrows.]

SLIDE 13

Persistent data structure approach

[Diagram. Steps: process Event 1 and process Event 2. Objects shown: job-level data, run-level data, and two event-level data objects.]

SLIDE 14

Persistent data structure approach

[Diagram. Steps: process Event 2 and finish Event 1. Objects shown: job-level data, run-level data, and one event-level data object.]

SLIDE 15

Persistent data structure approach

[Diagram. Steps: process Event 2 and begin Run 2. Objects shown: job-level data, two run-level data objects, and one event-level data object.]

SLIDE 16

Persistent data structure approach

[Diagram. Steps: process Event 2 and process Event 3. Objects shown: job-level data, two run-level data objects, and two event-level data objects.]

SLIDE 17

Persistent data structure approach

[Diagram: job-level data, two run-level data objects, and two event-level data objects, linked by “uses” and “creates” arrows.]

  • Why does this work?
    – All objects are immutable.
    – Object construction/destruction happens on one thread.
    – Object of one processing level refers to the object directly above it (via pointer or reference).
    – Assuming data corresponding to each processing level is small, extra overhead is minimal with respect to the thread-unsafe option.
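The layering described above can be sketched with three small types. This is an illustration under assumptions, not the DetectorClocks implementation: JobData, RunData, EventData and their members are hypothetical, and std::shared_ptr to const is used for the upward links (the slide only says "pointer or reference") so that a parent object stays alive as long as any child still refers to it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

struct JobData {
  std::string const mode;
};

struct RunData {
  std::shared_ptr<JobData const> const job; // refers to the level above
  int const number;
};

struct EventData {
  std::shared_ptr<RunData const> const run; // refers to the level above
  int const number;

  // Reads walk up the chain; nothing is ever modified.
  std::string label() const
  {
    return run->job->mode + "/run" + std::to_string(run->number) + "/event" +
           std::to_string(number);
  }
};

// Mirrors the diagram sequence: Event 2 keeps using Run 1's data even
// after Run 2 has begun and Event 3 has been created.
std::pair<std::string, std::string> demo_labels()
{
  std::shared_ptr<JobData const> const job = std::make_shared<JobData>(JobData{"beam"});
  // Only the driver holds a "latest run" handle; events hold their own.
  std::shared_ptr<RunData const> latest_run = std::make_shared<RunData>(RunData{job, 1});
  EventData const event2{latest_run, 2};
  latest_run = std::make_shared<RunData>(RunData{job, 2}); // begin Run 2
  EventData const event3{latest_run, 3};
  assert(event2.run->number == 1); // Event 2 still sees Run 1
  return {event2.label(), event3.label()};
}
```

The only non-const variable is the driver's local handle to the latest run. Each event object captures the run it was created for, so it never needs to ask which run is "current".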

SLIDE 18

Persistent data structure approach

  • Downsides to this approach
    – May require caching of data across threads. Not so much an issue for DetectorClocks.
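For data that is expensive to build, caching across threads could look roughly like the sketch below. It is one possible design, not something the talk prescribes: RunDataCache and its members are hypothetical, the good_run computation is a stand-in, and a std::mutex guards only the lookup table while the cached objects themselves stay immutable.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <mutex>

struct RunData {
  int const run;
  bool const good_run;
};

// Hypothetical cache: builds the immutable RunData for a run once and
// hands out shared read-only handles to every caller that asks for it.
class RunDataCache {
public:
  std::shared_ptr<RunData const> get(int run)
  {
    std::lock_guard<std::mutex> lock{mutex_}; // guards the map only
    auto& slot = cache_[run];
    if (!slot) {
      slot = std::make_shared<RunData>(RunData{run, run % 2 == 1}); // stand-in computation
      ++builds_;
    }
    return slot;
  }

  int builds() const
  {
    std::lock_guard<std::mutex> lock{mutex_};
    return builds_;
  }

private:
  mutable std::mutex mutex_;
  std::map<int, std::shared_ptr<RunData const>> cache_;
  int builds_{0};
};

// Three lookups for two distinct runs should construct only two objects.
int builds_after_three_lookups()
{
  RunDataCache cache;
  auto const a = cache.get(1);
  auto const b = cache.get(1); // same run: reuses the cached object
  auto const c = cache.get(2);
  assert(a == b && a != c);
  return cache.builds();
}
```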

SLIDE 19

Example: Thread-unsafe code

    class ClockService {
    public:
      ClockService(ParameterSet const& pset, ActivityRegistry& reg);
      string const& mode() const noexcept { return mode_; }
      bool goodRun() const noexcept { return goodRun_; }
      Clock const* clock() const noexcept { return clock_.get(); }

    private:
      void prepareRun(Run const& r);
      void prepareEvent(Event const& e, ScheduleID);

      string const mode_;
      bool goodRun_{false};                    // Updated per run
      unique_ptr<Clock const> clock_{nullptr}; // Updated per event
    };

SLIDE 20

Example: Thread-unsafe code

    ClockService::ClockService(ParameterSet const& pset, ActivityRegistry& reg)
      : mode_{pset.get<string>("mode")}
    {
      reg.sPreProcessRun.watch(this, &ClockService::prepareRun);
      reg.sPreProcessEvent.watch(this, &ClockService::prepareEvent);
    }

SLIDE 21

Example: Thread-unsafe code

    void ClockService::prepareRun(Run const& r)
    {
      goodRun_ = clock_is_valid_for(r);
    }

    void ClockService::prepareEvent(Event const& e, ScheduleID)
    {
      clock_ = get_clock(mode_, goodRun_, e);
    }

SLIDE 22

Example: Thread-unsafe code

Not everything is const. :(

SLIDE 23

Example: Thread-safe code (using persistent data structures)

    class ClockService {
    public:
      ClockService(ParameterSet const& pset) : mode_{pset.get<string>("mode")} {}
      string const& mode() const noexcept { return mode_; }

      class RunData;
      class EventData;
      RunData DataForRun(Run const& r) const;

    private:
      string const mode_;
    };

SLIDE 24

Example: Thread-safe code (using persistent data structures)

    class ClockService::RunData {
    public:
      RunData(string const& mode, Run const& r)
        : mode_{mode}
        , goodRun_{clock_is_valid_for(r)}
      {}
      string const& mode() const noexcept { return mode_; }
      bool goodRun() const noexcept { return goodRun_; }
      EventData DataForEvent(Event const& e) const;

    private:
      string const& mode_;
      bool const goodRun_;
    };

SLIDE 25

Example: Thread-safe code (using persistent data structures)

    class ClockService::EventData {
    public:
      EventData(RunData const& runData, Event const& e)
        : runData_{runData}
        , clock_{get_clock(runData.mode(), runData.goodRun(), e)}
      {}
      RunData const& runData() const noexcept { return runData_; }
      Clock const* clock() const noexcept { return clock_.get(); }

    private:
      RunData const& runData_; // By reference to avoid large memory use
      unique_ptr<Clock const> const clock_;
    };

SLIDE 26

Example: Thread-safe code (using persistent data structures)

Everything is const. :)
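The slides declare DataForRun and DataForEvent but do not show their bodies or a call site. The sketch below fills those in under stated assumptions so the chain can be compiled on its own: Run, Event, Clock, clock_is_valid_for, and get_clock are hypothetical stand-ins with made-up behavior, the service takes a plain string instead of a ParameterSet, and the clock is held by value rather than through unique_ptr to keep the example short.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for the framework and clock types on the slides.
struct Run { int number; };
struct Event { int number; };
struct Clock { double time; };

bool clock_is_valid_for(Run const& r) { return r.number > 0; }

Clock get_clock(std::string const& mode, bool goodRun, Event const& e)
{
  double const scale = (mode == "fast") ? 0.5 : 1.0;
  return Clock{goodRun ? scale * e.number : -1.0};
}

class ClockService {
public:
  explicit ClockService(std::string mode) : mode_(std::move(mode)) {}
  class RunData;
  class EventData;
  RunData DataForRun(Run const& r) const;

private:
  std::string const mode_;
};

class ClockService::RunData {
public:
  RunData(std::string const& mode, Run const& r)
    : mode_(mode), goodRun_(clock_is_valid_for(r)) {}
  std::string const& mode() const noexcept { return mode_; }
  bool goodRun() const noexcept { return goodRun_; }
  EventData DataForEvent(Event const& e) const;

private:
  std::string const& mode_; // refers to the string owned by the service
  bool const goodRun_;
};

class ClockService::EventData {
public:
  EventData(RunData const& runData, Event const& e)
    : runData_(runData), clock_(get_clock(runData.mode(), runData.goodRun(), e)) {}
  RunData const& runData() const noexcept { return runData_; }
  Clock const& clock() const noexcept { return clock_; }

private:
  RunData const& runData_;
  Clock const clock_; // by value here; the slides use unique_ptr<Clock const>
};

// Assumed bodies: each level simply constructs the next one from itself.
ClockService::RunData ClockService::DataForRun(Run const& r) const
{
  return RunData(mode_, r);
}

ClockService::EventData ClockService::RunData::DataForEvent(Event const& e) const
{
  return EventData(*this, e);
}

// Call chain: service -> run data -> event data -> clock.
double demo_event_time()
{
  ClockService const service{"fast"};
  auto const runData = service.DataForRun(Run{1});
  auto const eventData = runData.DataForEvent(Event{4});
  assert(&eventData.runData() == &runData); // the event refers to its run
  return eventData.clock().time;
}
```

Because the lower levels hold references, the caller must keep the service alive longer than runData, and runData longer than eventData. The local variables in demo_event_time satisfy this by construction order.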

SLIDE 27

But what does DetectorClocks look like?

  • As only events within a subrun can be processed concurrently at the moment, only event-level data must be protected.
  • The majority of the DetectorClocks interface still exists, but there is an extra layer in between called DetectorClocksData.

SLIDE 28

But what does DetectorClocks look like?

Old interface

    using detinfo::DetectorClocksService;

    MyProducer::MyProducer(ParameterSet const& pset)
    {
      ServiceHandle<DetectorClocksService const> clocks;
      double beam_time = clocks->BeamGateTime();
    }

    void MyProducer::produce(art::Event& e)
    {
      ServiceHandle<DetectorClocksService const> clocks;
      double beam_time = clocks->BeamGateTime();
    }

SLIDE 29

But what does DetectorClocks look like?

New interface

    using detinfo::DetectorClocksService;

    MyProducer::MyProducer(ParameterSet const& pset)
    {
      ServiceHandle<DetectorClocksService const> clocks;
      auto const clockData = clocks->GlobalData();
      double beam_time = clockData.BeamGateTime();
    }

    void MyProducer::produce(art::Event& e)
    {
      ServiceHandle<DetectorClocksService const> clocks;
      auto const clockData = clocks->DataForEvent(e);
      double beam_time = clockData.BeamGateTime();
    }
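The difference between global and per-event access can be tried outside the framework with stand-ins. Everything in this sketch is hypothetical except the method names GlobalData, DataForEvent, and BeamGateTime, which are taken from the slide: ClocksService and ClocksData are mock types, the service is passed by reference instead of through a ServiceHandle, and the numbers are invented.

```cpp
#include <cassert>

// Hypothetical stand-ins; not the real detinfo or art types.
struct Event { int number; };

class ClocksData {
public:
  explicit ClocksData(double beamGateTime) : beamGateTime_(beamGateTime) {}
  double BeamGateTime() const noexcept { return beamGateTime_; }

private:
  double beamGateTime_;
};

class ClocksService {
public:
  // Job-level values, available before any event is seen.
  ClocksData GlobalData() const { return ClocksData(10.0); }
  // Values that may depend on the event being processed.
  ClocksData DataForEvent(Event const& e) const { return ClocksData(10.0 + e.number); }
};

class MyProducer {
public:
  explicit MyProducer(ClocksService const& clocks)
    : clocks_(clocks), defaultBeamTime_(clocks.GlobalData().BeamGateTime()) {}

  // Each call gets its own ClocksData, so concurrent calls for different
  // events share no mutable state.
  double produce(Event const& e) const
  {
    auto const clockData = clocks_.DataForEvent(e);
    return clockData.BeamGateTime() - defaultBeamTime_;
  }

private:
  ClocksService const& clocks_;
  double const defaultBeamTime_; // global value cached at construction
};

// Offset of an event's beam-gate time from the cached global value.
double demo_offset()
{
  ClocksService const clocks{};
  MyProducer const producer{clocks};
  assert(producer.produce(Event{0}) == 0.0);
  return producer.produce(Event{3});
}
```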

SLIDE 30

Consequences of this change

  • Code using the DetectorClocks service must know if event-level data or global data is needed.
    – There are cases in the code where global-level data is cached by a module, and then used along with event-level detector-clocks values.
  • Framework-agnostic code that creates a DetectorClocks service handle must be adjusted to receive the correct information.
  • Sounds like a big change (it is!), but there are upsides to it:
    – I’ve implemented the majority of the changes on feature branches.
    – There are no new run-time dependencies; the dependence on DetectorClocks is just more explicit, and thus clearer.
    – In some cases, dependence on DetectorProperties was removed.
    – This gets us closer to multi-threaded execution of LArSoft facilities.
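For framework-agnostic code, "receiving the correct information" can be as simple as taking the clock data as a parameter instead of creating a service handle internally. A minimal sketch under assumptions: ClocksData here is a hypothetical two-field struct and ticks_to_times is an invented algorithm, neither of which is part of LArSoft.

```cpp
#include <cassert>
#include <vector>

// Hypothetical data object handed to the algorithm by its caller.
struct ClocksData {
  double tick_period_us;    // microseconds per tick (illustrative)
  double trigger_offset_us; // time of tick 0 relative to the trigger
};

// Framework-agnostic algorithm: no service handle is created here. The
// caller decides whether global or event-level data is appropriate and
// passes it in, which makes the dependence explicit.
std::vector<double> ticks_to_times(ClocksData const& clockData,
                                   std::vector<int> const& ticks)
{
  std::vector<double> times;
  times.reserve(ticks.size());
  for (int const tick : ticks) {
    times.push_back(clockData.trigger_offset_us + tick * clockData.tick_period_us);
  }
  return times;
}

// Converts three ticks and returns the last converted time.
double demo_last_time()
{
  ClocksData const clockData{0.5, -100.0};
  auto const times = ticks_to_times(clockData, {0, 200, 400});
  assert(times.size() == 3);
  return times.back();
}
```

A function written this way can be unit-tested with a hand-built ClocksData and called concurrently for different events, since it touches no shared state.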

SLIDE 31

Proposal

  • Proposal: The “persistent data structures” approach should be adopted for the DetectorClocks and DetectorProperties providers and services.
  • Current status
    – All LArSoft repositories have feature/team_for_mt branches using the new interface.
    – I am working on adjusting experiment repositories’ use of DetectorClocksService.
    – I suggest merging the feature/team_for_mt branches after the move to GitHub. This will allow the design to solidify, possibly enabling me to adjust the DetectorProperties interface before then.
    – FYI: Due to the large number of changes, I have applied clang-format to those files that required adjustment.
  • I will present the list of breaking changes once the feature branches are ready to go to GitHub.