

SLIDE 1

Cloud-based data analysis: GPU-accelerated Coherent X-ray Imaging & HERCULES school perspective

Vincent Favre-Nicolin ESRF, X-ray NanoProbe HERCULES director

HERCULES

European School

Neutron & Synchrotron radiation for science

SLIDE 2

COHERENT IMAGING SOFTWARE: PYNX

Page 2 l PaNOSC Kickoff meeting l Vincent FAVRE-NICOLIN

A 30-100x increase in coherent flux is expected after the ESRF upgrade. This creates a need for software that is:

  • Robust:
    • algorithms
    • standard experimental protocols
  • Fast (online/live analysis):
    • computer clusters, GPU
  • Evolutive: a flexible toolbox
  • Simple for users:
    • online (during the experiment) and offline (back in the home laboratory)
    • can run in a notebook (cloud-based software)

http://ftp.esrf.fr/pub/scisoft/PyNX/doc/
https://software.pan-data.eu/software/102/pynx

2019/01/15

SLIDE 3

GPU VS CPU COST EFFICIENCY (AMAZON)


                         GPU (V100), OpenCL (clFFT)   GPU (V100), CUDA   Xeon E5-2686 (4 cores), FFTW
2D FFT (16x1024x1024)    2.14 ms                      1.09 ms            38 ms
3D FFT (128**3)          0.28 ms                      0.12 ms            4 ms
3D FFT (256**3)          4.2 ms                       0.7 ms             60 ms
3D FFT (512**3)          46 ms                        5.54 ms            550 ms
Amazon price/hour        3 €                          3 €                0.4 €
Cost per 10**6 2D FFT    0.11 €                       0.06 €             0.26 €
Cost per 10**6 3D FFT    38 €                         4.6 €              61 €

  • GPUs are two orders of magnitude faster than a CPU (4 cores)
  • The price per FFT is ~1 order of magnitude lower on GPU
  • GPU memory is limited to 48 GB max

NB: timings do not include data transfer to the GPU (this assumes long on-GPU computations). The 3D 512**3 FFT on the V100 runs at 3.3 Tflop/s.

Notes: the Xeon E5-2686 was tested on the Amazon V100 machine (4 cores = 8 vCPUs). The 256**3 and 512**3 3D FFTs are 10-20% faster on the ESRF scisoft14 machine (Xeon Gold 6134). FFTW was used with FFTW_MEASURE.
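The CPU-side timings in the table above can be approximated with a short script; a minimal sketch using NumPy's FFT instead of FFTW (so absolute numbers will differ from the benchmark):

```python
import time
import numpy as np

def time_fft(shape, n_repeat=5):
    """Time an n-D complex FFT of the given shape; returns seconds per transform."""
    a = (np.random.rand(*shape) + 1j * np.random.rand(*shape)).astype(np.complex64)
    np.fft.fftn(a)  # warm-up call (avoids measuring one-time setup costs)
    t0 = time.perf_counter()
    for _ in range(n_repeat):
        np.fft.fftn(a)
    return (time.perf_counter() - t0) / n_repeat

# Sizes reduced here compared to the benchmark table, to keep the run short
for shape in [(128, 128, 128), (16, 256, 256)]:
    print(f"FFT {shape}: {1e3 * time_fft(shape):.2f} ms")
```

The GPU columns of the table would use the same pattern with clFFT or cuFFT calls in place of `np.fft.fftn`.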


SLIDE 4

You can use PyNX (without any GPU knowledge) through:

  • Python API with operators
  • Command-line scripts
  • Notebooks

For:

  • Coherent Diffraction Imaging (CDI)
  • Ptychography (near and far field)
  • Small-angle and Bragg geometry

A simple installation script is provided, but a GPU workstation with CUDA and/or OpenCL is required.

DATA ANALYSIS WITH PYNX (GPU-BASED)


pynx-id16apty.py ptychomotors=mot_pos.txt,-x,y probe=focus,120e-6x120e-6,0.1
    h5meta=meta.h5 h5data=data..nxs
    algorithm=analysis,ML**200,DM**300,probe=1,nbprobe=3
    saveplot=object_phase save=all defocus=250e-6
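The pynx-id16apty.py invocation above follows a key=value convention, with comma-separated values and **-separated cycle counts in the algorithm chain. A hypothetical parser sketch (not PyNX's actual code) showing how such arguments decompose:

```python
def parse_params(argv):
    """Split 'key=value' command-line arguments into a dict; bare words map to True."""
    params = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        params[key] = value if sep else True
    return params

def parse_algorithm(spec):
    """Split an algorithm chain like 'ML**200,DM**300' into (name, n_cycles) pairs."""
    steps = []
    for step in spec.split(","):
        name, sep, n = step.partition("**")
        steps.append((name, int(n) if sep else 1))
    return steps

argv = ["h5meta=meta.h5", "algorithm=ML**200,DM**300", "save=all"]
p = parse_params(argv)
print(parse_algorithm(p["algorithm"]))  # -> [('ML', 200), ('DM', 300)]
```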


SLIDE 5

EXAMPLE NOTEBOOK: PTYCHOGRAPHY


SLIDE 6

CXI: COHERENT X-RAY IMAGING FORMAT


The CXI file format aims to create a data format with the following requirements:

  1. Simple – both writing and reading should be made simple.
  2. Flexible – users should be able to easily extend it.
  3. Fast – it should be efficient so as not to become a bottleneck.
  4. Extendable – new features should be easily added without breaking compatibility with previous versions.
  5. Unambiguous – it should be possible to interpret the files without using external information.
  6. Compatible – the format should be as compatible as possible with existing formats.

Based on the HDF5 format. Now with a NeXus implementation (NXcxi_ptycho).

http://cxidb.org/cxi.html
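CXI files are HDF5 files organised as a conventional group hierarchy. A minimal sketch of such a layout as nested Python dicts, with a helper that flattens it into HDF5-style paths; the group names follow the cxidb.org convention, but this is an illustration, not a validated CXI writer:

```python
# Illustrative CXI-style layout (group names based on the cxidb.org convention;
# the dataset values here are placeholders, not real data).
cxi_layout = {
    "entry_1": {
        "instrument_1": {
            "source_1": {"energy": "…photon energy…"},
            "detector_1": {"distance": "…sample-detector distance…",
                           "data": "…2D frames…"},
        },
        "data_1": {"data": "…link to detector data…"},
    },
}

def flatten(tree, prefix=""):
    """Flatten a nested dict into (hdf5_path, value) pairs."""
    out = []
    for name, node in tree.items():
        path = f"{prefix}/{name}"
        if isinstance(node, dict):
            out.extend(flatten(node, path))
        else:
            out.append((path, node))
    return out

for path, value in flatten(cxi_layout):
    print(path)
```

In practice each path would be written as an HDF5 dataset (e.g. with h5py), which is what makes the format simple, fast and self-describing.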

SLIDE 7

EXAMPLE 1: 3D CDI


Test               K20m    Titan X   Titan V   DGX-V100 (1 GPU)
3D CDI, 512**3     235 s   127 s     39 s      32 s
Time for 30 runs   2 h     1 h       20 mn     16 mn

  • id10 dataset: Crystal Growth and Design, 14, 4183 (2017)
  • Data acquisition time: 3h

EBS Outlook:

  • id10 plans for 1k**3 and even 2k**3 datasets:
    • size ×2 = memory ×8 = computation time ×10
  • Faster data acquisition
  • Expect a 10x to 50x increase in data analysis needs
  • GPUs with large amounts of memory are essential: a 2k**3 dataset cannot fit in current-generation GPUs, requiring multi-GPU FFT (slow)
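The memory scaling quoted above (size ×2 ⇒ memory ×8) follows directly from the cubic volume; a quick estimate for complex64 arrays (8 bytes per voxel — the ~5 GB figure for a 512**3 reconstruction then corresponds to several such arrays being held at once):

```python
def cdi_array_gib(n, bytes_per_voxel=8):
    """Memory (GiB) of one n**3 array of complex64 voxels (8 bytes each)."""
    return n**3 * bytes_per_voxel / 2**30

for n in (512, 1024, 2048):
    print(f"{n}**3: {cdi_array_gib(n):.0f} GiB per array")
# 512**3 -> 1 GiB, 1024**3 -> 8 GiB, 2048**3 -> 64 GiB
```

The 64 GiB for a single 2k**3 array already exceeds the 48 GB maximum GPU memory mentioned earlier, which is why multi-GPU FFT becomes necessary.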

Notes:

  • A 512**3 dataset needs 5 GB
  • Recipe used for the solution: 1000 cycles (800 RAAR + 200 ER)
  • 30 runs are necessary to be confident in the solution; they can be distributed.


SLIDE 8

3D PTYCHO-TOMO, NEAR FIELD


Test                                      K20m      Titan X   Titan V    DGX-V100 (1 GPU)
2D ptycho, 17 frames 2k*2k                65 mn     36 mn     13 mn      10.5 mn
Time for 720 projections (extrapolated)   33 days   18 days   6.6 days   5.2 days

  • id16A dataset, courtesy of P. Cloetens & J. da Silva
  • Reconstruction with ptypy took ~20h (10 cores) per projection
  • Extrapolation to a ptycho-tomo dataset with 720 angles
  • Data acquisition time for 700-800 angles: ~14h

EBS Outlook:

  • Faster data acquisition (10x)
  • Up to 2000 projections, and 4k frames
  • 50 to 100x increase in data analysis

Notes:

  • Recipe used for solution: 4000 DM + 300 ML, 3 probe modes
  • Times do not include tomographic reconstruction & unwrapping
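The extrapolated times in the table above are simply the per-projection time scaled by the number of angles; a one-line check against the table's last row:

```python
def tomo_days(minutes_per_projection, n_projections):
    """Total ptycho-tomo reconstruction time in days."""
    return minutes_per_projection * n_projections / (60 * 24)

# Per-projection times (minutes) from the table, extrapolated to 720 angles
for gpu, mn in [("K20m", 65), ("Titan X", 36), ("Titan V", 13), ("DGX-V100", 10.5)]:
    print(f"{gpu}: {tomo_days(mn, 720):.1f} days")
# 65 mn -> 32.5 days, 10.5 mn -> 5.25 days, matching the ~33 / 5.2 days quoted
```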


SLIDE 9

(COHERENT) IMAGING DATA ANALYSIS

  • Analysis algorithms/workflows exist
  • Data often needs some tuning of algorithms or data corrections
  • GPUs are needed
  • Computing resources @facility:
    • will allow online analysis to follow the experiment and tune parameters / understand samples
    • won't allow analysis of all acquired datasets
  • Users need to continue data analysis in their home laboratory as seamlessly as possible:
    • avoid relying on facility scientists post-experiment
    • avoid overly complicated software deployment
    • hardware (virtual or not) solutions should be accessible
    • reduce the time to publication, and increase the publication rate, after the experiment
  • What about paid-for DaaS? (private companies)
  • Can we have a DaaS marketplace?


SLIDE 10

CLOUD & GPU-BASED DATA ANALYSIS

  • GPUs are needed for fast data analysis (cloud: tested with Amazon EC2 machines)
  • It is easy to provide virtual images with all the requirements
  • Notebooks:
    • GPU analysis is possible
    • GPU memory is not released by kernels (persistent GPU context) => an issue for multi-user machines
  • A client/GPU-server approach? At least for large datasets => broadcast jobs to a GPU cluster
  • A reliable/stable API is needed for 2D and 3D data display & manipulation
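One common workaround for the persistent-GPU-context issue is to run each GPU job in a short-lived child process, so that all GPU memory is freed when the process exits. A generic sketch of the pattern; `run_job` here is a placeholder, and a real version would create the GPU context and run the reconstruction inside it:

```python
from multiprocessing import get_context

def run_job(args):
    """Placeholder for a GPU job: in a real setup this would create the GPU
    context, run the reconstruction, and return (small) results. All GPU
    memory is released when the child process exits."""
    data, n_cycles = args
    return sum(data) * n_cycles  # stand-in for a reconstruction result

def run_in_subprocess(args):
    # 'spawn' gives the child a clean interpreter state (no inherited context)
    ctx = get_context("spawn")
    with ctx.Pool(1) as pool:
        return pool.apply(run_job, (args,))

if __name__ == "__main__":
    print(run_in_subprocess(([1, 2, 3], 100)))  # -> 600
```

The trade-off is the per-job startup cost (process spawn plus GPU context creation), which is why this suits batch jobs better than interactive notebook cells.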


SLIDE 11

3D IMAGING ANALYSIS: SEGMENTATION


Segmentation of coccoliths from 3D coherent diffraction imaging data.
⇒ Takes much longer than data acquisition & 3D reconstruction!
⇒ Automated data processing through known algorithms works fine in the cloud (notebooks), but parts requiring heavy user manipulation are still challenging (no solution, or no stable API)

A Gibaud, T. Beuvier, Y. Chushkin et al. – in press

Coccoliths: CaCO3 shells around phytoplankton, responsible for storing >50% of human-produced CO2 (>500 Gt over the past 200 years)

SLIDE 12

CLOUD-BASED TOOLS VALIDATION


Moving (user) communities to new tools:

  • “Doing better than ImageJ is hard”
  • “My users only want TIFF files, they can’t use HDF5”
  • “It took us 10 years to create our scripts”
  • Workflows often have lots of options and need tuning to specific parameters
  • User communities need to be involved & convinced early
  • Validate tools
  • If all PaN facilities could use the same portal, that would be great!

SLIDE 13

HERCULES SCHOOL

  • Annual school, in ~March
  • Training PhD students and post-docs since 1991 to use neutron & synchrotron radiation, with a wide range of applications (from biology to condensed-matter physics)
  • 70-80 participants every year (>2000 since 1991)
  • 5-week school, with 35-40% hands-on:
    • practicals on instruments & beamlines
    • tutorials with data analysis
    • 1 week (2 this year!) with groups at European partner institutions
  • 19 HERCULES Specialised Courses (HSC):
    • 5 days, focused on a single topic
    • also with 2 days of practicals & tutorials
  • Other schools:
    • Brazil in 2010, Taiwan in 2015, SESAME in 2019…
    • Calipsoplus 1-week regional schools (Solaris, …)


SLIDE 14

HERCULES CLOUD ANALYSIS TRAINING

Evangelise:

  • the use of cloud-based data analysis
  • what the FAIR principles mean
  • what Open Data is
  • what resources are available
  • the use of cloud-based solutions for data analysis

The schedule is already packed, but we can promote this through lectures, tutorials and practicals.


SLIDE 15

MOTIVATING TUTORS & LECTURERS

The main difficulty for data-analysis evolution (workstation -> cloud):

  • scientists are already overwhelmed by running their beamline or instrument, and need to help users after their experiment too!
  • existing workflows ‘work’

⇒ It is difficult to find extra time to develop e-learning documents/examples unless it fits an immediate purpose.

However:

  • cloud-based solutions should improve how easily users can analyse data in their own lab
  • e-learning examples should be identical to real experimental workflows (or code/notebooks, …)
