SLIDE 1

www.csiro.au

Accelerating Science Platforms for Machine Learning, Big Data, and Earth System Science

John Taylor, John Zic, Jose Alvarez, Oliver Obst, George Opletal, Maciej Golebiewski, Amanda Barnard, Emlyn Jones, Josh Bowden

August 2015

SLIDE 2

About CSIRO

  • 62% of our people hold university degrees: 2,000 doctorates and 500 masters.
  • In partnership with universities, we develop 650 postgraduate research students.
  • Top 1% of global research institutions in 14 of 22 research fields.
  • Top 0.1% in 4 research fields.

[Map: CSIRO sites across Australia, including Canberra (7 sites), Brisbane (6), Sydney (5), Melbourne (5), Perth (3), Adelaide (2), Werribee (2), and sites at Darwin, Alice Springs, Geraldton, Atherton, Townsville, Rockhampton, Toowoomba, Gatton, Myall Vale, Narrabri, Mopra, Parkes, Griffith, Belmont, Geelong, Hobart, Sandy Bay, Wodonga, Newcastle, Armidale, Murchison, Cairns, Irymple, and Bribie Island]

People: 5,000 | Locations: 58 | Flagships: 9 | Budget: $1.3B+

SLIDE 3

CSIRO Computational and Simulation Sciences/IMT

  • 2009: CSIRO Bragg cluster launch, the first of its kind in Australia.
  • 2013: Bragg upgrade to 384 Kepler K20M GPUs.
  • November 2014: #154 on the TOP500 list, #11 on the Green500 list.
  • November 2015: #298 on the TOP500 list, #24 on the Green500 list.

SLIDE 4

ACCELERATORS SURGE IN WORLD’S TOP SUPERCOMPUTERS

[Chart: Top500 number of accelerated supercomputers, 2013–2015]

  • 100+ accelerated systems are now on the Top500 list.
  • 1/3 of total FLOPS are powered by accelerators.
  • NVIDIA Tesla GPUs sweep 23 of 24 new accelerated supercomputers.
  • Tesla supercomputers have grown at 50% CAGR over the past five years.

Source: NVIDIA, TOP500 List

SLIDE 5

CSIRO Bragg GPU Cluster

TOP500 and Green500 Rankings

[Chart: TOP500 rank and Green500 rank of the Bragg cluster, November 2010 to November 2015]

CSIRO Computational and Simulation Sciences/IMT

SLIDE 6

Section 1: ConvNets on Bragg

Jose Alverez

SLIDE 7

Simplifying ConvNets via Filter Compositions

  • Key properties of the network:
  • Low-rank filter restrictions during training.
  • Larger receptive fields.
  • Deeper models (more non-linear layers).
  • Additional parameter sharing.
  • Reduced parameter redundancy.
  • Overall, a substantial reduction in the number of parameters (see the sketch below).


*Alvarez and Petersson, Simplifying ConvNets for End-to-End Learning. To appear
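As a concrete illustration of filter composition, here is a minimal, hypothetical PyTorch sketch (not the authors' implementation; the layer sizes are assumptions) that replaces a dense k×k convolution with a k×1 and a 1×k convolution through a low-rank channel bottleneck, adding an extra non-linearity in between:

    import torch
    import torch.nn as nn

    class LowRankConv(nn.Module):
        """Compose a k x 1 and a 1 x k convolution through `rank` channels.

        A dense k x k layer has k*k*in_ch*out_ch weights; this composition
        has rank*(k*in_ch + k*out_ch), a large saving when rank is small.
        """
        def __init__(self, in_ch: int, out_ch: int, k: int = 3, rank: int = 16):
            super().__init__()
            self.vertical = nn.Conv2d(in_ch, rank, (k, 1), padding=(k // 2, 0))
            self.horizontal = nn.Conv2d(rank, out_ch, (1, k), padding=(0, k // 2))
            self.act = nn.ReLU(inplace=True)  # extra non-linear layer => deeper model

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.act(self.horizontal(self.act(self.vertical(x))))

    x = torch.randn(1, 64, 32, 32)
    print(LowRankConv(64, 128, k=3, rank=16)(x).shape)  # torch.Size([1, 128, 32, 32])

For in_ch=64, out_ch=128, k=3 and rank=16, the dense layer has 73,728 weights while the composition has about 9,200, illustrating the kind of parameter reduction the slide claims.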

SLIDE 8

Quantitative Results: ImageNet

  • ImageNet dataset:
  • 1.2 million training images and 50,000 validation images, split into 1,000 categories.
  • Between 5,000 and 30,000 training images per class.
  • Accuracy reported as Top-1 using a single centered crop (see the scoring sketch after the table).
  • No data augmentation for training.


NETWORK          PARAMETERS   CONV. LAYERS   TOP-1 ACCURACY
AlexNet OWT Bn   61M          5              57.9%
B-NET (VGG-B)    133M         10             62.5%
OURS*            15M          16             66.6%

*Alvarez and Petersson, Simplifying ConvNets for End-to-End Learning. To appear
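The protocol above (single centred crop, Top-1, no augmentation) can be scored with a short routine. This is a hedged sketch; the 256/224 resize-and-crop sizes are common defaults assumed here, not stated in the talk:

    import torch
    from torchvision import transforms

    # Single centred crop, no test-time augmentation (sizes are assumed).
    # Build the validation DataLoader with eval_tf as the dataset transform.
    eval_tf = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
    ])

    @torch.no_grad()
    def top1_accuracy(model: torch.nn.Module, loader, device: str = "cuda") -> float:
        """Fraction of samples whose highest-scoring class matches the label."""
        model.eval()
        correct, total = 0, 0
        for images, labels in loader:
            preds = model(images.to(device)).argmax(dim=1).cpu()
            correct += (preds == labels).sum().item()
            total += labels.numel()
        return correct / total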

SLIDE 9

Quantitative Results: Places2

  • Places2 dataset:
  • 10+ million images split across 401 unique scene categories.
  • Between 5,000 and 30,000 training images per class, and 20,000 validation images.
  • Accuracy reported as Top-1 using a single centered crop.
  • No data augmentation for training.


NETWORK          PARAMETERS   CONV. LAYERS   TOP-1 ACCURACY
AlexNet OWT Bn   58.6M        5              44.5%
B-NET (VGG-B)    130M         10             44.0%
OURS*            10.2M        16             47.4%

*Alvarez and Petersson, Simplifying ConvNets for End-to-End Learning. To appear

SLIDE 10

Timings


SLIDE 11

Section 2: Simulated Nanostructure Assembly (SNAP)

George Opletal, Maciej Golebiewski, Amanda Barnard

SLIDE 12
SNAP – Introduction

  • Traditional atomistic molecular dynamics (MD) modelling of nanoparticle self-assembly is computationally prohibitive.
  • However, in many cases the interactions between nanoparticles are dominated by surface electrostatic forces, so internal bonding can be neglected.
  • We approximate a many-atom nanoparticle by a coarse-grained surface point mesh model.
  • We developed the Simulated Nanostructure Assembly with Protoparticles (SNAP) package.

[Figure: atomistic nanoparticle vs surface mesh representation]

E. Osawa, D. Ho, Nanodiamond and its application to drug delivery, J. Med. Allied Sci. 2(2), 2012, 31–40.

SLIDE 13

SNAP package

  • Generator: designs particles, initial configuration, and potentials.
  • Simulator: usually an NVT simulation quenched to produce a particle aggregate (see the step sketch at the end of this slide).
  • Analyser: analysis of the final configuration and dynamical evolution of the particle assembly.

[Chart: interfacial probability vs time (ps) for (100)|(100) and (111)|(111) facet pairs]

  • SNAP is installed on the CSIRO Bragg GPU cluster.
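For intuition about the Simulator stage, here is a minimal sketch of a single NVT integration step with a crude velocity-rescaling thermostat; quenching amounts to lowering T_target over the course of the run. This is an illustrative toy, not the SNAP integrator:

    import numpy as np

    def nvt_step(pos, vel, forces, mass, dt, T_target, kB=1.0):
        """One velocity-Verlet step, then rescale velocities to T_target.

        pos, vel : (N, 3) arrays; forces : callable pos -> (N, 3) array.
        """
        f = forces(pos)
        vel = vel + 0.5 * dt * f / mass
        pos = pos + dt * vel
        vel = vel + 0.5 * dt * forces(pos) / mass
        # Instantaneous temperature from the kinetic energy (3N degrees of freedom).
        T_now = mass * np.sum(vel**2) / (3.0 * len(pos) * kB)
        vel *= np.sqrt(T_target / T_now)   # crude thermostat; quench by lowering T_target
        return pos, vel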
SLIDE 14
SNAP – Simulator: Modelling Interactions

  • Interactions between pairs of nanoparticle facets in different orientations are calculated via ab-initio methods.
  • Binding energy curves are then fitted to Morse potentials, with parameters for each facet-pair combination; the parameters are then distributed over a facet's points (see the fitting sketch below).
  • Morse parameters can incorporate functionalised surfaces (hydroxylation, hydrogenation, etc.).
  • User-defined nanoparticles are held together by a harmonic potential.

[Figure: clean, hydrogen-passivated, and hydroxyl-functionalized nanoparticle surfaces]
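A sketch of the fitting step: given a binding-energy curve E(r) for one facet pair, fit the Morse form V(r) = D_e(1 - exp(-a(r - r_e)))^2 - D_e with SciPy. The data here are synthetic placeholders standing in for an ab-initio curve:

    import numpy as np
    from scipy.optimize import curve_fit

    def morse(r, D_e, a, r_e):
        """Morse potential: well depth D_e, width parameter a, equilibrium r_e."""
        return D_e * (1.0 - np.exp(-a * (r - r_e))) ** 2 - D_e

    # Synthetic stand-in for an ab-initio binding-energy curve (Angstrom, eV).
    r = np.linspace(3.0, 12.0, 40)
    E = morse(r, 0.8, 1.1, 4.2) + np.random.normal(0.0, 0.01, r.size)

    (D_e, a, r_e), _ = curve_fit(morse, r, E, p0=(1.0, 1.0, 4.0))
    print(f"D_e={D_e:.3f} eV  a={a:.3f} 1/A  r_e={r_e:.3f} A")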

SLIDE 15
SNAP – Simulator: Acceleration via Parallelization

[Chart: steps/sec vs number of nanodiamonds (1,000–100,000): CUDA-MPI code on 9 GPUs versus serial CPU code]

SLIDE 16
SNAP – Analyser

Reads in output from the Simulator and performs a variety of analyses, including:

  • Interfacial probabilities (which facets align and which are free, pointing into voids)
  • Pore size distributions (the range of void sizes in the aggregate)
  • Particle distribution functions (information on short-, medium-, and long-range ordering; see the sketch below)
  • Fractal dimension (probes self-similarity at different size scales; useful for characterization of aggregates)
  • Visualization via POV-Ray or VMD

Often the analysis is dynamical (as a function of time).

[Charts: interfacial probability vs time (ps) for (100)|(100) and (110)|(110) facet pairs; void locations where a 3.2 nm particle could fit]
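As one example of the particle-distribution analysis, here is a minimal NumPy sketch of a radial (pair) distribution function g(r) under the minimum-image convention. The (N, 3) position array and cubic box edge are assumed inputs, not SNAP's actual data format:

    import numpy as np

    def rdf(pos, box, n_bins=100):
        """g(r) for N particles in a cubic periodic box of edge `box`."""
        n = len(pos)
        r_max = box / 2.0
        delta = pos[:, None, :] - pos[None, :, :]
        delta -= box * np.round(delta / box)          # minimum-image convention
        dist = np.sqrt((delta**2).sum(-1))[np.triu_indices(n, k=1)]
        hist, edges = np.histogram(dist, bins=n_bins, range=(0.0, r_max))
        shell_vol = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
        ideal = shell_vol * (n / box**3) * n / 2.0    # expected pair counts, ideal gas
        return 0.5 * (edges[1:] + edges[:-1]), hist / ideal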

SLIDE 17
Vast experimental parameter space

[Figure: cube, octahedron, and rhombic dodecahedron particle geometries]

Dimensions include particle geometry, particle size, particle density, composition, surface functionalization, and facet binding energies (e.g. the 100-100 facet binding energy); see the enumeration sketch below.
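The size of the space is easy to appreciate with a quick enumeration. The value lists below are hypothetical placeholders, not the actual study design:

    from itertools import product

    geometries = ["cube", "octahedron", "rhombic dodecahedron"]
    sizes_A    = [22, 27, 32]                  # particle size (Angstrom)
    surfaces   = ["clean", "hydrogenated", "hydroxylated"]
    densities  = [1e19, 2e19]                  # particles per cm^3

    runs = list(product(geometries, sizes_A, surfaces, densities))
    print(len(runs), "simulations for even this coarse grid")   # 54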

SLIDE 18

A few points in parameter space…

CUBE (100 facets)

CSIRO Bragg GPU cluster (6 GPUs over 15 hours): 5,832 particles × 570 points, 664 Å cell, 150,000 × 1 fs steps

SLIDE 19

A few points in parameter space…

OCTAHEDRON (111 facets)

CSIRO Bragg GPU cluster (6 GPUs over 15 hours): 5,832 particles × 544 points, 664 Å cell, 150,000 × 1 fs steps


SLIDE 21
Larger, more complex simulations using the Bragg GPU cluster

  • Size distribution: 22 Å (20%), 27 Å (50%), 32 Å (30%)
  • Experimental density: 2×10^19 particles/cm^3
  • Facet interaction energies from DFT
  • Clean facets
  • 46,656 particles (about 25 million surface interaction points)
  • 0.132 µm cell length (consistent with the density; see the check below)
  • 0.15 ns simulation time (150,000 steps at 1 fs)

6 GPUs over 130 hours

[Figure legend: red = (100), blue = (111), green = (110)]
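The quoted cell length follows directly from the particle count and the experimental number density, L = (N/rho)^(1/3); a quick check:

    # Cubic cell edge from N particles at number density rho.
    N = 46656                           # particles
    rho = 2e19                          # particles per cm^3
    L_um = (N / rho) ** (1 / 3) * 1e4   # cm -> micrometres
    print(f"{L_um:.3f} um")             # ~0.132 um, matching the slide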

SLIDE 22

Applications – Nanodiamonds: Polydisperse Aggregates

[Charts comparing all-32 Å and mixed-size aggregates: interfacial probability by nanoparticle facet (22 Å, 27 Å, and 32 Å particles; (100), (111), and (110) facets); pore size distribution (cm^3/g·Å) vs pore diameter (Å); number of nanoparticles vs number of q6·q6 interactions]

  • The largest (111) facets dominate the interactions.
  • Mixtures produce larger pore sizes.
  • Mixtures are more “random”.

SLIDE 23

GPU-Accelerated Molecular Dynamics: Porous Cages and Mega-clusters

“We performed the largest self-assembly simulation of organic cages.”

Evans et al., Journal of Physical Chemistry C, 2015, DOI: 10.1021/jp512944r

  • 424,000 atoms
  • 47,000 bonds
  • 786,000 angles
  • 126,000 dihedrals
  • 2 million molecular dynamics steps
  • Pairwise interactions
  • Long-range coulombic interactions
  • Periodic boundary conditions

Wall time reduced from 100 to 15 hours using GPUs

SLIDE 24

Section 3: Big Data Analytics

John Zic, Emlyn Jones, Josh Bowden

SLIDE 25


Pulsar data from CSIRO's Parkes telescope

SLIDE 26
PPTA-HPC progress to date

  • The opportunity: providing external collaborators with access to internationally significant science data plus the compute to process it = “Science as a Service”.
  • DAP pulsar repository; compute on the Bragg cluster.

SLIDE 27

Eigenvalue decomposition using MAGMA

More information: josh.bowden@csiro.au

[Chart: speedup of MAGMA magma_dsyevdx_2stage() and MAGMA MIC magma_dsyevd() over 16-core Sandy Bridge MKL dsyevr() (R's eigen()), for problem sizes N = 5,000–45,000; configurations: 3 × K20, 2 × K20, 1 × K20, and MIC (Xeon Phi 7120)]

The functionality is being incorporated into an R package used for predictive genomic modelling from large sequencing datasets.
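MAGMA itself is a C/Fortran API, so the GPU path is not reproduced here. As a hedged stand-in, this sketch times the CPU-side LAPACK drivers the chart uses as its baseline (the dsyevd-style divide-and-conquer solver vs the dsyevr-style RRR solver behind R's eigen()) via SciPy; N is kept modest so it finishes in seconds:

    import time
    import numpy as np
    from scipy.linalg import eigh

    n = 4000
    A = np.random.rand(n, n)
    A = (A + A.T) / 2.0                 # symmetric test matrix

    for driver in ("evd", "evr"):       # LAPACK dsyevd / dsyevr
        t0 = time.perf_counter()
        w, v = eigh(A, driver=driver)   # full eigendecomposition
        print(f"{driver}: {time.perf_counter() - t0:.2f} s")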

SLIDE 28

Quantifying Uncertainty in Soil Carbon Dynamics

[Figure: soil carbon (t/ha) vs year, 1980–2010, for three fields: (A) Field 1, wheat–wheat rotation; (B) Field 2, wheat–fallow; (C) Field 3, wheat–pasture]

SLIDE 29

Quantifying Sediment Loads to the Great Barrier Reef

SLIDE 30

Instrument Design

CSIRO Computational and Simulation Sciences

“We’ve started to use the GPU cluster to speed up modelling of nuclear analysers such as CSIRO’s air cargo scanner. The speed is up to 5,000 to 10,000 times that of a normal desktop computer if we use most of the cluster. With this performance increase, simulations that normally take hours can be run interactively in real time. We expect this interactivity to significantly benefit the design and optimisation of new nuclear instruments.”

SLIDE 31

www.csiro.au

Data61
John A. Taylor
t  +61 2 6216 7077
e  John.Taylor@data61.csiro.au
w  www.csiro.au

Thank you