eScience in the Netherlands
Rob van Nieuwpoort R.vanNieuwpoort@esciencecenter.nl
We work demand-driven
Career paths
Career tracks: Research, Technical, Managerial
Scales: …, 13, 12, 11, 10
Roles: eScience research engineer, eScience coordinator, eScience specialist, eScience researcher, eScience manager, eScience top researcher, eScience top specialist
– Hiring people – Internal communication – Project kickoffs
– coordinators – Web sites, demos
– Courses, hackathons, sprints, … – Switch disciplines
NLeSC eScience competences applied in research
Data integration, database optimization, structured & unstructured data, real-time data
Statistics, machine learning, visualization, text mining
Distributed & accelerated computing, efficient algorithms
Optimized data handling, Big Data analytics, Efficient computing
Distributed computing
eStep
Accelerated computing, Low power computing, Orchestrated computing, High-performance computing, Natural language processing, Machine learning, Information visualization, Scientific visualization, Information retrieval, Computer vision, Handling sensor data, Linked data, Information integration, Databases, Data assimilation
NLeSC projects eStep
Tailor Generalize Develop Adopt
eStep
project-specific software
discipline-specific software
enhanced science
e-infrastructure
generic libraries, tools, and algorithms
NLeSC projects
technology in eStep:
– State-of-the-art / best-of-breed? – Generic and overarching? – Match with our expertise areas? – Includes externally developed software
– Sufficiently generic – Modular – High quality – Must be taken into account from the start
Travis CI
Test and Deploy with Confidence. Easily sync your GitHub projects with Travis CI and you’ll be testing your code in minutes!
We run a Jenkins CI instance locally. Used for private repositories and repositories requiring HPC middleware.
Docker: open platform for building, shipping and running distributed applications.
deploy
software
eScience software
general audience
and knowledge you need, in one place
developers, PIs
Checklist available
Osmium Semanticizer xtas EDAL NLTK CommonSense AHN2 viewer
Cities poorly protected against heat
Three persons in elderly house died due to heat
End of 16 day heat wave
Heat protection plan abandoned
Elderly use heat info call desk massively
Courtesy Bert Holtslag
for assessing the green vegetation coverage and the soil moisture content
reflectivity and thermal characteristics of buildings and streets, the abundance of vegetation
Novel hourly forecasting system for human thermal comfort in urban areas on street level
Courtesy Bert Holtslag
simulations, data assimilation, data integration, multi-scale pattern recognition, geographic information systems, databases, …
MapReduce, Oracle, MySQL, Postgres, MonetDB, ElasticSearch, DataVault, JSON, Spark, …
Courtesy Willem Bouten
standing sitting floating
X-flapping gliding flapping flight
Heave (vertical), surge (forward), sway (sideward)
Courtesy Willem Bouten
MODIS satellite image of dust. Courtesy Willem Bouten
– To be downhearted – Clenching fists – My heart fills with joy – My blood is boiling
– Shift in experienced emotions – Shift in embodiment of emotions
– Establishing corpus & standardizing text – Establishing emotional and bodily vocabularies – Emotion mining – Visualizing results
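The four steps above (standardizing text, establishing vocabularies, emotion mining, visualizing results) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabularies `EMOTION_WORDS` and `BODY_WORDS`, the emotion labels, and the function names are hypothetical and are not taken from the project itself.

```python
import re
from collections import Counter

# Hypothetical mini-vocabularies (assumption); real ones would be curated by experts.
EMOTION_WORDS = {"joy": "joy", "downhearted": "sadness", "boiling": "anger"}
BODY_WORDS = {"heart", "blood", "fists"}


def normalize(text):
    """Lowercase and tokenize: a simple stand-in for text standardization."""
    return re.findall(r"[a-z]+", text.lower())


def mine(sentences):
    """Count emotion labels, and the body terms that co-occur with them."""
    emotions = Counter()
    embodied = Counter()
    for sentence in sentences:
        tokens = normalize(sentence)
        labels = {EMOTION_WORDS[t] for t in tokens if t in EMOTION_WORDS}
        parts = {t for t in tokens if t in BODY_WORDS}
        emotions.update(labels)
        embodied.update((label, part) for label in labels for part in parts)
    return emotions, embodied


corpus = [
    "To be downhearted",
    "Clenching fists",
    "My heart fills with joy",
    "My blood is boiling",
]
emotions, embodied = mine(corpus)

# Text-only "visualization": one bar per emotion label.
for label, count in sorted(emotions.items()):
    print(f"{label:8s} {'#' * count}")
```

A dictionary lookup like this misses expressions such as "clenching fists", where the emotion is implied by the body term alone; handling those would need a richer vocabulary or a trained model.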
Source: Nummenmaa et al., 2013
Courtesy Lars Ridder
scientific visualization, information retrieval, computer vision
Twiqs.nl, D3, ExtJS, Cesium, Leaflet, OpenLayers, GeoExt, X3Dom, X3DomExt, Mapnik, CommonSense, …
Arithmetic Intensity
O(1): SpMV, BLAS1,2, Stencils (PDEs), Lattice Methods
O(log(N)): FFTs
O(N): Dense Linear Algebra (BLAS3), Particle Methods
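Arithmetic intensity is the ratio of floating-point operations to bytes moved to and from memory. As a rough worked example, the sketch below compares a BLAS1 vector update with a BLAS3 matrix multiply. It assumes 8-byte doubles and ideal memory traffic (every operand read once, every result written once); the function names are illustrative only.

```python
BYTES_PER_DOUBLE = 8  # assumption: double-precision data


def intensity_axpy(n):
    """y = a*x + y (BLAS1): 2n flops; reads x and y, writes y."""
    flops = 2 * n
    bytes_moved = 3 * n * BYTES_PER_DOUBLE
    return flops / bytes_moved


def intensity_matmul(n):
    """C = A*B for n x n matrices (BLAS3): 2n^3 flops; reads A and B, writes C."""
    flops = 2 * n ** 3
    bytes_moved = 3 * n * n * BYTES_PER_DOUBLE
    return flops / bytes_moved


for n in (1_000, 2_000, 4_000):
    print(n, intensity_axpy(n), intensity_matmul(n))
```

The vector update stays at 1/12 flop per byte regardless of problem size, which is O(1), while the matrix multiply grows as n/12, which is O(N).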
– Need completely different optimizations, algorithms
problem for many radio astronomy observations
– Lightning, Vehicles, airplanes, satellites, electrical equipment, GSM, FM Radio, fences, reflection of wind turbines, …
– Complete dataset available – Good overview / statistics / model – Can spend compute cycles
– Image-based transient detection (LOFAR/AARTFAAC) – Pulsar searching (WSRT/Apertif)
– Data rates simply too high to store
– Only very little loss in quality
– GPUs, Xeon Phi, … – Astronomy, ocean modeling, digital forensics, radar systems, high-energy physics
– Generate many codes at run-time, select most efficient
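The selection half of this idea can be shown with a small auto-tuner: time each candidate on a sample input and keep the fastest. The sketch below is a simplification. It picks among three hand-written Python variants rather than generating code at run-time, and all function names are hypothetical.

```python
import time


def sum_builtin(values):
    return sum(values)


def sum_loop(values):
    total = 0
    for v in values:
        total += v
    return total


def sum_chunked(values, chunk=1024):
    # Sum fixed-size slices, then sum the partial results.
    return sum(sum(values[i:i + chunk]) for i in range(0, len(values), chunk))


def autotune(variants, sample, repeats=3):
    """Time every variant on the sample input; return (best_name, timings)."""
    timings = {}
    for name, func in variants.items():
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            func(sample)
            best = min(best, time.perf_counter() - start)
        timings[name] = best  # keep the fastest of the repeats
    return min(timings, key=timings.get), timings


variants = {"builtin": sum_builtin, "loop": sum_loop, "chunked": sum_chunked}
sample = list(range(100_000))
best_name, timings = autotune(variants, sample)
result = variants[best_name](sample)
print("selected:", best_name)
```

Which variant wins depends on the machine and the input size, which is the reason for measuring at run-time instead of fixing the choice in advance.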
Pulsar B1919+21 in the constellation Vulpecula. Pulse profile created with folding and the LOFAR software telescope.
Background picture courtesy European Southern Observatory.
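Folding means averaging a time series modulo a trial period, so that a weak periodic pulse adds up while noise averages out. The sketch below folds synthetic data. The sampling interval, period, pulse shape, and noise level are made-up values for illustration; they are not LOFAR data or the parameters of B1919+21.

```python
import random


def fold(samples, dt, period, n_bins):
    """Average samples into n_bins phase bins for a trial period (same units as dt)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, value in enumerate(samples):
        phase = (i * dt / period) % 1.0
        b = min(int(phase * n_bins), n_bins - 1)
        sums[b] += value
        counts[b] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Synthetic signal (assumption): a pulse of height 1 between phase 0.3 and 0.4,
# buried in Gaussian noise with a standard deviation of 2.
random.seed(0)
dt, period = 0.001, 0.1  # hypothetical values, in seconds
samples = []
for i in range(20_000):
    phase = (i * dt / period) % 1.0
    pulse = 1.0 if 0.3 <= phase < 0.4 else 0.0
    samples.append(pulse + random.gauss(0.0, 2.0))

profile = fold(samples, dt, period, n_bins=10)
peak_bin = max(range(len(profile)), key=profile.__getitem__)
print("peak phase bin:", peak_bin)
```

A single pulse is invisible at this noise level; only after averaging 2,000 samples per phase bin does the pulse stand out in the folded profile.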
Figure: performance compared to CPU and power usage compared to CPU, for Xeon Phi, NVIDIA GTX Titan GPU, and AMD HD7990 GPU.
– POP has a very large codebase written in Fortran 90
– Call graph obtained using gprof
– 3 kernels optimized for the GPU, 20% improvement
Henk Dijkstra, Ben van Werkhoven, et al.
Towards 2 km resolution. Better than space filling curves for distributed runs (36% improvement). Courtesy Jason Maassen
Several NLeSC applications require access to distributed compute and storage resources. Xenon provides a simple API for this, allowing rapid development of such applications. Middleware independent: portable & reusable.
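To illustrate what middleware independence means in practice, the sketch below hides the way a job is run behind a small interface, so that application code does not change when the backend does. This is not Xenon's actual API. The `Scheduler`, `LocalScheduler`, and `RecordingScheduler` classes are hypothetical names invented for this example.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Hypothetical middleware-neutral interface (assumption, not Xenon's API)."""

    @abstractmethod
    def submit(self, argv):
        """Run a job given as an argument list and return its standard output."""


class LocalScheduler(Scheduler):
    """Runs the job as a local subprocess."""

    def submit(self, argv):
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        return done.stdout


class RecordingScheduler(Scheduler):
    """Stand-in for a remote backend such as a batch queue: only records jobs."""

    def __init__(self):
        self.jobs = []

    def submit(self, argv):
        self.jobs.append(list(argv))
        return ""


def run_analysis(scheduler):
    """Application code depends only on the Scheduler interface."""
    return scheduler.submit([sys.executable, "-c", "print(6 * 7)"])


local_output = run_analysis(LocalScheduler())
recorder = RecordingScheduler()
run_analysis(recorder)
print(local_output.strip(), len(recorder.jobs))
```

Swapping `LocalScheduler` for another implementation leaves `run_analysis` untouched, which is the portability and reuse that the slide refers to.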
– Magma: eMetabolomics mass spectrometry analysis – SIMCity: decision support for urban social economic complexity
– Named entity recognition, sentiment detection, document clustering, topic modeling
Generalized software from NLeSC portfolio
eStep
Externally developed software
you are here ASDI
Generalized software from NLeSC portfolio
eStep
Externally developed software
you are here DTEC
Essential Skills in Data-Intensive Research: Enabling your Research
25-29 January 2016, SURF Academy, Utrecht (for Life Sciences)
NLeSC, SURFsara, SURFnet and DTL
– 5-day workshop, with both hands-on and taught components, for 1st year PhD students without programming background
– Days 1-2: Dealing with Data (Data Carpentry)
– Days 3-4: Software and Programming (Software Carpentry)
– Day 5: Introducing the e-Infrastructure
– Future courses requested with other domain focus
Based on the Software Carpentry (software-carpentry.org) model (NLeSC & SURFsara have trained course leaders)
Dealing with Data
– Data Stewardship and FAIR best practice
– From Excel to databases
– SQL practical
– R practical
Computation & Automation
– Using the Shell
– Introduction to programming (Python)
– GitHub and version control
– Unit testing
– Debugging
– Documentation
Introduction to the e-Infrastructure
– SURF to introduce the national e-infrastructure
– Real world examples of e-infrastructure enhanced research from NLeSC
– Version Control and Unit Testing for Scientific Software – Shell, Git, Scientific Python – Testing and Continuous Integration with Python – From Excel to a Database – Data Management in the Ocean, Weather and Climate Sciences – Visualizing Your Data on the Web Using D3 – Working With Data on the Web – Intermediate/Advanced R Lessons – Programming with GAP
– Ecology
Analysis and Visualization in R, Data Analysis and Visualization in Python – Genomics
processing, Data analysis in R, Data visualization in R – Social sciences
– Biology – Geospatial data
– General frontend guidelines: https://github.com/bendc/frontend-guidelines – AngularJS: https://github.com/johnpapa/angular-styleguide – Airbnb JavaScript Style Guide: https://github.com/airbnb/javascript
– PEP8: https://www.python.org/dev/peps/pep-0008/
– Code Conventions for the Java Programming Language (Oracle)
– https://github.com/ripienaar/free-for-dev#code-quality – http://shields.io/
Article about good development practices: The Joel Test: 12 Steps to Better Code.
– Jasmine, a behavior-driven development framework for testing JavaScript code. – Karma, runs tests in web browsers with code coverage. – PhantomJS, headless web browser for CI servers.
– unittest, nose and pytest.
– testthat
– To interact with web-browsers use Selenium. – Sauce Labs hosts a matrix of web-browsers and Operating Systems for testing. – AngularJS applications can be tested with Protractor.
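As a minimal illustration of the Python option listed above, the sketch below tests a small function with the standard-library `unittest` module. The function `celsius_to_kelvin` and the test names are hypothetical examples, not code from an NLeSC project.

```python
import unittest


def celsius_to_kelvin(celsius):
    """Convert a temperature, rejecting values below absolute zero."""
    if celsius < -273.15:
        raise ValueError("temperature below absolute zero")
    return celsius + 273.15


class CelsiusToKelvinTest(unittest.TestCase):
    def test_freezing_point(self):
        self.assertAlmostEqual(celsius_to_kelvin(0.0), 273.15)

    def test_below_absolute_zero_is_rejected(self):
        with self.assertRaises(ValueError):
            celsius_to_kelvin(-300.0)


# Run the tests programmatically; on a CI server a test runner would do this.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CelsiusToKelvinTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

pytest can also collect and run `unittest.TestCase` classes like this one, so a project can start with the standard library and switch runners later.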
– Source code comments – API documentation – Installation and usage documentation
to many formats
GitHub Flow
formulated in a readable way
– Use packaging that is well known and appropriate for user community: pypi, npm, maven, docker
NLeSC project portfolio
– Fortran, Python, vBrowser, TwiNL, XNAT, … – No support, not disseminated
as well as external software that NLeSC extends and improves
– Potree, OpenDA, ElasticSearch – Support for project partners only – Contribute improvements back to community
– Xenon, Magnesium, Osmium, xtas, esiBayes, Aether, … – Full support for project partners, limited support for Dutch scientific community, best effort for international community
including NLeSC