
The IceCube data pipeline: from the South Pole to publication

Jakob van Santen (jakob.van.santen@desy.de), PyData Berlin, 2016-05-21


  1. The IceCube data pipeline: from the South Pole to publication Jakob van Santen jakob.van.santen@desy.de PyData Berlin, 2016-05-21

  2. Deutsches Elektronen-Synchrotron (DESY), Zeuthen: Helmholtz research institute with ~200 scientists, postdocs, and students studying high-energy astrophysics with gamma rays and neutrinos. Jakob van Santen - The IceCube data pipeline - jakob.van.santen@desy.de

  3. Outline (IceCube South Pole Neutrino Observatory): What’s a neutrino? What are we trying to learn? Why look for them at the South Pole? How does IceCube find neutrinos?

  4. What’s a neutrino? Charged (electromagnetic interactions) vs. neutral (weak interactions only) and 2.5e6 times less massive.

  5. Sources of neutrinos: radioactive decay, nuclear reactors, the Sun, man-made particle accelerators, cosmic accelerators. Energy scale, toward higher energy: ~10^6 eV, ~10^9 eV, ~10^15 eV. Images: Wikipedia, chemistryviews.com, N. Svoboda, CERN.

  6. Outline (IceCube South Pole Neutrino Observatory): What’s a neutrino? What are we trying to learn? Why look for them at the South Pole? How does IceCube find neutrinos?

  7. Cosmic rays: something accelerates nuclei to macroscopic energies… but we don’t know what, or where! Neutrinos can point back to the cosmic accelerators!
     [Figure: cosmic-ray energy spectrum, E^2.6 F(E) in GeV^1.6 m^-2 s^-1 sr^-1 vs. E in eV from 10^13 to 10^20, with the Knee, 2nd Knee, and Ankle marked and annotations at 5 TeV, 20 TeV, and 1 Joule. Data: Grigorov, JACEE, MGU, Tien-Shan, Tibet07, Tibet-III, Akeno, CASA-MIA, HEGRA, Fly’s Eye, Kascade, Kascade Grande, IceTop-73, HiRes 1, HiRes 2, Telescope Array, Auger, IceCube-59. Sources: PRD 86:010001 (2013); Amenomori et al., ICRC 2011; Abbasi et al., ApJ 746, 33, 2012.]

  8. Outline (IceCube South Pole Neutrino Observatory): What’s a neutrino? What are we trying to learn? Why look for them at the South Pole? How does IceCube find neutrinos?

  9. South Pole Station: 90 deg South, 2835 m above sea level (image: NASA). ~2800 m of pure, clear ice (image: USAF).

  10. South Pole Station, with the IceCube Lab and the main station labeled. Photo: Haley Buffman/NSF

  11. IceCube: a cubic-kilometer neutrino telescope buried in ice. Labeled parts: IceCube Lab (data center); Digital Optical Module (single-pixel camera).

  12. Outline (IceCube South Pole Neutrino Observatory): What’s a neutrino? What are we trying to learn? Why look for them at the South Pole? How does IceCube find neutrinos?

  13. IceCube data pipeline
      [Diagram: Data acquisition → Feature calculation & event selection → Analysis → Science!, with Simulation as a second input. Region labels: South Pole (real time), offline.]
      Challenges:
      ‣ Getting data out of the South Pole
      ‣ Generating simulated data
      ‣ Allowing non-expert users to configure & extend data pipeline for many distinct science topics
      ‣ Distributing data to analyzers

  14. A neutrino event in IceCube (event display): color ⇔ time, size ⇔ light intensity. Labels: neutrino, interaction, muon.

  15. Raw data
      ‣ 1 neutrino for every 1 million penetrating muons
      ‣ ~10 high-energy neutrino events per year
      ‣ Need features to select them!
      [Figure: 10 milliseconds of raw data]
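
A back-of-the-envelope sketch can turn the ratios above into yearly counts. The trigger rate of 3000 events/s is borrowed from the filtering slide later in the talk, and treating every triggered event as a penetrating muon is a simplifying assumption made here, not a statement from the slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def yearly_counts(trigger_rate_hz, neutrino_fraction):
    """Return (all triggered events, neutrino events) per year."""
    total = trigger_rate_hz * SECONDS_PER_YEAR
    return total, total * neutrino_fraction


# Assumption: 3000 events/s, essentially all of them penetrating muons.
total, neutrinos = yearly_counts(3000, 1e-6)

# ~10 high-energy neutrino events per year, as a fraction of everything recorded.
high_energy_fraction = 10.0 / total

print("events per year:      %.2e" % total)
print("neutrinos per year:   %.2e" % neutrinos)
print("high-energy fraction: %.1e" % high_energy_fraction)
```

Under these assumptions the selection has to reject all but roughly one event in ten billion, which is why good features matter so much.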

  16. Feature calculation & data selection

  17. IceTray: IceCube’s processing framework
      ‣ Core written in ~20k lines of C++
      ‣ User interface exposed via boost::python
      ‣ Two main components:
        • I3Frame: container for event data
        • I3Module: manipulates I3Frames
      ‣ Data storage in files
      Images: boost.org

  18. I3Frames
      I3Frame: dictionary of [immutable] C++ objects related to a single event

          In [1]: from icecube import icetray, dataio, dataclasses
          In [2]: f = dataio.I3File('hese.i3.bz2')
          In [3]: print(f.pop_frame(icetray.I3Frame.DAQ))
          [ I3Frame (DAQ):
            'CalibrationErrata' [DAQ] ==> I3Vector<OMKey> (137)
            'FilterMask' [DAQ] ==> I3Map<string, I3FilterResult> (749)
            'I3Geometry' [Geometry] ==> I3Geometry (401222)
            'I3TriggerHierarchy' [DAQ] ==> I3Tree<I3Trigger> (616)
            'OfflinePulses' [DAQ] ==> I3Map<OMKey, vector<I3RecoPulse> > (52917)
            'PoleCascadeLinefit' [DAQ] ==> I3Particle (150)
            'PoleMuonLlhFit' [DAQ] ==> I3Particle (150)
            'PoleMuonLlhFitFitParams' [DAQ] ==> I3LogLikelihoodFitParams (68)
            'PoleToIParams' [DAQ] ==> I3TensorOfInertiaFitParams (78)
          ]

      Flexible! Schema can change from event to event.
      An “I3” file is a stream of serialized I3Frames. boost::serialization provides load/save, object versioning, etc.
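
The frame idea can be illustrated without IceTray installed. The sketch below is a toy stand-in, not the real `icetray.I3Frame`: the class name `ToyFrame`, the write-once rule for keys, and the example values are all assumptions chosen to mirror the "dictionary of immutable objects" description above.

```python
class ToyFrame(object):
    """Toy event container; NOT the real icetray.I3Frame."""

    def __init__(self, stop):
        self.stop = stop          # stream tag, e.g. "DAQ"
        self._objects = {}

    def __setitem__(self, key, value):
        # Assumption for this sketch: a key can be written only once.
        if key in self._objects:
            raise KeyError("frame already contains %r" % key)
        self._objects[key] = value

    def __getitem__(self, key):
        return self._objects[key]

    def __delitem__(self, key):
        del self._objects[key]

    def keys(self):
        return sorted(self._objects)


# Two events in the same stream with different schemas.
first = ToyFrame("DAQ")
first["OfflinePulses"] = (1.2, 3.4, 5.6)
first["PoleMuonLlhFit"] = {"zenith": 0.3}

second = ToyFrame("DAQ")
second["OfflinePulses"] = (7.8,)

for frame in (first, second):
    print(frame.stop, frame.keys())
```

Because each frame carries its own keys, a reader has to ask each event what it contains, which is the flexibility (and the cost) the slide points at.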

  19. I3Modules
      ‣ I3Module: single-purpose processing stage
      ‣ User (physicist) configures module chain in Python
      ‣ An I3Module can:
        • Add new objects to the frame
        • Remove objects from the frame
        • Drop the frame
      [Diagram: frames passed from one I3Module to the next.]

          from I3Tray import I3Tray
          from icecube import icetray, dataio  # I3Reader and I3Writer
          from icecube import VHESelfVeto      # assumed home of HomogenizedQTot

          tray = I3Tray()
          tray.Add("I3Reader", filenamelist=["foo.i3"])  # a list of input files
          tray.Add('HomogenizedQTot', Output='HomogenizedQTot', Pulses='OfflinePulsesHLC')
          tray.Add("I3Writer", filename="bar.i3")
          tray.Execute()
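
The three things a module can do (add, remove, drop) map naturally onto chained Python generators. This is a minimal sketch of the pattern using plain dicts as frames; the function names and the `run_chain` helper are hypothetical, and the real framework pushes frames between modules rather than using generators.

```python
def counter(frames, key="Count"):
    """Add a running event number to every frame."""
    for number, frame in enumerate(frames):
        frame[key] = number
        yield frame


def strip(frames, key):
    """Remove an object from every frame that has it."""
    for frame in frames:
        frame.pop(key, None)
        yield frame


def select(frames, predicate):
    """Drop frames that fail the predicate."""
    for frame in frames:
        if predicate(frame):
            yield frame


def run_chain(source, *stages):
    """Thread frames through the stages in order."""
    stream = iter(source)
    for stage, kwargs in stages:
        stream = stage(stream, **kwargs)
    return list(stream)


events = [
    {"charge": 7000.0, "raw": "waveforms"},
    {"charge": 120.0, "raw": "waveforms"},
]
output = run_chain(
    events,
    (counter, {"key": "EventNumber"}),
    (strip, {"key": "raw"}),
    (select, {"predicate": lambda frame: frame["charge"] > 6000}),
)
print(output)
```

Each stage only knows about the frame it is handed, so stages can be reordered or swapped without touching the others.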

  20. User-defined I3Modules

          from icecube import icetray

          class Counter(icetray.I3ConditionalModule):
              def __init__(self, context):
                  super(Counter, self).__init__(context)
                  self.AddParameter("Key", "Name of counter to put in the frame", "Count")

              def Configure(self):
                  # Called once before processing: read parameters, set up state
                  self.key = self.GetParameter("Key")
                  self.counter = 0

              def Physics(self, frame):
                  # Called for each frame: store the count, then pass the frame on
                  frame[self.key] = icetray.I3Int(self.counter)
                  self.counter += 1
                  self.PushFrame(frame)

          tray.Add(Counter, Name="CountCount")

      Prototype rapidly in Python, rewrite in C++ as needed.

  21. Filtering at the South Pole
      ‣ IceCube Lab: 3000 events/s, 1 TB/day
      ‣ Satellite relay: 300 events/s, 100 GB/day
      ‣ IceCube Data Warehouse (Madison, WI): 4 PB and counting
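
The rates on this slide imply an average event size and a reduction factor, worked out below. The sketch assumes decimal units (1 TB = 10^12 bytes) and that the slide pairs 3000 events/s with 1 TB/day at the IceCube Lab and 300 events/s with 100 GB/day over the satellite link.

```python
SECONDS_PER_DAY = 24 * 3600


def bytes_per_event(bytes_per_day, events_per_second):
    """Average event size implied by a daily volume and an event rate."""
    return bytes_per_day / float(events_per_second * SECONDS_PER_DAY)


at_pole = bytes_per_event(1e12, 3000)         # 1 TB/day at 3000 events/s
via_satellite = bytes_per_event(100e9, 300)   # 100 GB/day at 300 events/s
rate_reduction = 3000 / 300.0

print("bytes/event at the Pole:  %.0f" % at_pole)
print("bytes/event via satellite: %.0f" % via_satellite)
print("rate reduction factor:     %.0f" % rate_reduction)
```

With these numbers the average event size is the same on both sides of the filter, so the savings come from sending fewer events rather than smaller ones.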

  22. Aside: grid computing
      ‣ Simulation requires tens of millions of CPU and GPU hours
      ‣ Opportunistic computing on academic grids in US and Europe with HTCondor glide-ins, custom Python middleware
        • Some Linux flavor (usually Red Hat variant)
        • Software provisioned on CVMFS (HTTP-based read-only filesystem)

  23. Data formats for analysis
      ‣ I3Frame: flexible, but inefficient for partial reads
      ‣ Analysis development means reading the same data over and over again → tabular formats
      ‣ tableio: framework for turning irregular event data into table rows
      [Diagram: Event data (I3Frame) → specific coercion for each object (I3ParticleConverter, I3FilterMaskConverter) → abstract table row (I3TableRow) → format-specific backend (HDF5, ROOT) → pytables, pandas, h5py, etc.]
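
The converter-then-backend layering can be sketched with the standard library alone. Everything below is illustrative: the converter functions, the `CONVERTERS` registry, and the use of CSV in place of HDF5 or ROOT are assumptions, and none of it is the actual tableio API.

```python
import csv
import io


# Hypothetical converters: each maps one kind of object to flat columns.
def particle_columns(particle):
    return {"zenith": particle["zenith"], "energy": particle["energy"]}


def filter_mask_columns(mask):
    return {name: int(passed) for name, passed in mask.items()}


CONVERTERS = {
    "PoleMuonLlhFit": particle_columns,
    "FilterMask": filter_mask_columns,
}


def to_rows(frames, key):
    """Yield one flat row per frame that contains `key`."""
    convert = CONVERTERS[key]
    for index, frame in enumerate(frames):
        if key in frame:              # the schema may differ per event
            row = {"event": index}
            row.update(convert(frame[key]))
            yield row


def write_csv(rows, stream):
    """Format-specific backend; CSV stands in for HDF5 or ROOT here."""
    rows = list(rows)
    if not rows:
        return
    writer = csv.DictWriter(stream, fieldnames=sorted(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


frames = [
    {"PoleMuonLlhFit": {"zenith": 0.3, "energy": 1000.0}, "FilterMask": {"MuonFilter": True}},
    {"FilterMask": {"MuonFilter": False}},
    {"PoleMuonLlhFit": {"zenith": 1.1, "energy": 250.0}},
]

out = io.StringIO()
write_csv(to_rows(frames, "PoleMuonLlhFit"), out)
print(out.getvalue())
```

Keeping the per-object coercion separate from the output format means a new file format needs only a new writer, and a new object type needs only a new converter.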

  24. Analysis

  25. Histogramming: most IceCube analyses use binned data.
      Pro:
      ‣ Predicted mean in each bin is straightforward to calculate with Monte Carlo
      ‣ Statistics are easy to understand
      Con:
      ‣ Have to choose how to bin
      [Figure: events per year (10^0 to 10^9) vs. number of collected photons (10^1 to 10^4) for penetrating muons and atmos. neutrinos; annotations: Pre-selection, Blindness.]
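
A short sketch of why binning makes the statistics easy: weighted simulation gives a predicted mean per bin, and the observed counts are scored with a Poisson likelihood. The bin edges, simulated values, and weights below are made-up illustration values, not IceCube data, and the helper names are hypothetical.

```python
import bisect
import math


def fill(values, edges, weights=None):
    """Sum weights per bin; values outside the edges are ignored."""
    counts = [0.0] * (len(edges) - 1)
    if weights is None:
        weights = [1.0] * len(values)
    for value, weight in zip(values, weights):
        index = bisect.bisect_right(edges, value) - 1
        if 0 <= index < len(counts):
            counts[index] += weight
    return counts


def poisson_log_likelihood(observed, expected):
    """Sum of log Poisson probabilities over all bins."""
    total = 0.0
    for n, mu in zip(observed, expected):
        total += n * math.log(mu) - mu - math.lgamma(n + 1)
    return total


edges = [10, 100, 1000, 10000]                    # number of collected photons

# Monte Carlo: each simulated event carries a weight (expected events per year).
simulated = [20, 50, 300, 400, 2000]
sim_weights = [2.0, 1.5, 0.4, 0.6, 0.1]
expected = fill(simulated, edges, sim_weights)    # predicted mean per bin

observed = fill([15, 30, 80, 500], edges)         # unweighted data counts
llh = poisson_log_likelihood(observed, expected)

print("expected:", expected)
print("observed:", observed)
print("log-likelihood: %.4f" % llh)
```

The choice of `edges` is the part that has no automatic answer, which is the "Con" noted above.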

  26. dashi: histograms that do more (https://github.com/emiddell/dashi)
      numpy.histogramdd()-backed histogram objects with built-in
      ‣ summary statistics
      ‣ manipulation methods: add, multiply, slice, project, etc.
      ‣ storage in hdf5 datasets

          import dashi
          import tables
          from numpy import linspace

          # create & fill 3d histogram (get_3d_data() is a placeholder for your data)
          h = dashi.histogram.histogram(3, (linspace(0, 1, 101),)*3)
          h.fill(get_3d_data())
          # project out dimension 1
          h.project([0,2])
          # plot a 1-d slice
          h[1,1,:].line(differential=True)
          # store for later
          with tables.open_file('foo.hdf5', 'a') as hdf:
              dashi.histsave(h, hdf, '/', 'my_histogram')
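
For readers without dashi installed, the same operations can be approximated directly on the array that `numpy.histogramdd()` returns. This is a sketch of the underlying idea, not of dashi's implementation; in particular, reading "differential" as division by the bin width is an assumption, and the random sample stands in for real data.

```python
import numpy as np

rng = np.random.RandomState(42)
sample = rng.uniform(0, 1, size=(1000, 3))      # stand-in for real 3-d data

edges = (np.linspace(0, 1, 101),) * 3
counts, _ = np.histogramdd(sample, bins=edges)  # shape (100, 100, 100)

# Project out dimension 1 by summing over it.
projected = counts.sum(axis=1)

# Take a 1-d slice at fixed bins in dimensions 0 and 1.
line = counts[1, 1, :]

# Assumed meaning of "differential": counts per unit of the remaining axis.
differential = line / np.diff(edges[2])

print(projected.shape, line.shape, counts.sum())
```

A histogram object that bundles the counts with their edges saves having to carry `edges` around by hand for every such operation.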

  27. Example: discovering astrophysical neutrinos
      Simple event selection based on 2 features:
      ‣ > 6000 photon hits
      ‣ hit pattern starts inside detector volume
      28 events survived in 2 years of data.
      [Diagram: veto region around the detector; a μ from a ν interaction that starts inside passes (✓), a μ entering through the veto is rejected (✘).]
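
The two-feature cut can be written down in a few lines. In this sketch an event is a list of hypothetical `(time, module, photons)` hits, and "starts inside" is approximated by checking how much of the earliest light lands on a set of veto modules; the thresholds `early_photons` and `max_veto_photons` are illustrative assumptions, not values from the talk.

```python
def total_photons(hits):
    return sum(photons for _, _, photons in hits)


def starts_inside(hits, veto_modules, early_photons=250, max_veto_photons=3):
    """True if little of the earliest light was seen on the veto modules."""
    seen = 0
    in_veto = 0
    for _, module, photons in sorted(hits):     # earliest hits first
        if seen >= early_photons:
            break
        seen += photons
        if module in veto_modules:
            in_veto += photons
    return in_veto < max_veto_photons


def passes_selection(hits, veto_modules, min_photons=6000):
    return total_photons(hits) > min_photons and starts_inside(hits, veto_modules)


veto_modules = {1, 2}
contained = [(0.0, 50, 200), (5.0, 51, 3000), (9.0, 52, 4000), (12.0, 1, 10)]
incoming = [(0.0, 1, 300), (4.0, 50, 7000)]
dim = [(0.0, 50, 100)]

for name, hits in [("contained", contained), ("incoming", incoming), ("dim", dim)]:
    print(name, passes_selection(hits, veto_modules))
```

Late light on the veto modules does not reject an event here, since a particle produced inside the detector may still exit through the outer layer.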

  28. Analysis
      Q: What is the chance that the data is a fluctuation of the background?
      Bin data in observable space (energy, zenith angle), compare counts to predicted mean in each bin.
      A: < 5e-7 (discovery!) → doi:10.1126/science.1242856
      [Figure: IceCube Preliminary; declination (degrees, -80 to 80) vs. deposited EM-equivalent energy in detector (TeV, 10^2 to 10^3), with showers and tracks marked separately.]
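
The question on this slide has a simple single-bin analogue: the probability that a Poisson background alone produces at least as many events as observed. The sketch below computes that tail probability. It is a deliberate simplification of the binned comparison described above, and the background mean of 10 events is an illustrative placeholder, not a number from the talk.

```python
import math


def poisson_tail(n_observed, background):
    """P(N >= n_observed) for a Poisson distribution with mean `background`."""
    if n_observed <= 0:
        return 1.0
    # Sum the upper tail directly to avoid cancellation in 1 - CDF.
    total = 0.0
    k = n_observed
    while True:
        term = math.exp(k * math.log(background) - background - math.lgamma(k + 1))
        total += term
        # Past the mean the terms shrink; stop once they no longer matter.
        if k > background and term <= 1e-18 * total:
            break
        k += 1
    return total


# Illustrative only: 28 observed events over a hypothetical background of 10.
p_value = poisson_tail(28, 10.0)
print("p-value: %.2e" % p_value)
```

A single counting bin ignores where the events fall in energy and direction, so it should not be expected to reproduce the figure quoted on the slide.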
