eScience in the Netherlands, Rob van Nieuwpoort (PowerPoint presentation)



  1. eScience in the Netherlands
     Rob van Nieuwpoort, R.vanNieuwpoort@esciencecenter.nl

  2. We work demand-driven

  3. 35

  4. Career paths
     Tracks: Managerial / Technical / Research
     • Scale …
     • Scale 13: eScience manager / eScience top specialist / eScience top researcher
     • Scale 12: eScience coordinator / eScience specialist / eScience researcher
     • Scale 11: eScience research engineer
     • Scale 10: eScience research engineer

  5. Lessons learnt
     • Demand-driven: start from the science
     • Collaboration, not competition (connected projects, calls)
     • Good is good enough
     • Generalization (10% ring-fenced)
     • Communication, communication, communication
       – Hiring people
       – Internal communication
       – Project kickoffs
     • IP, workplace, generalization, co-authorships
       – Coordinators
       – Web sites, demos
     • Keep challenging the RSEs
       – Courses, hackathons, sprints, …
       – Switch disciplines

  6. eStep: the eScience technology platform
     A coherent set of technologies to tackle the grand challenges in eScience

  7. Cross-cutting basic skills
     • Code quality and best practices
     • Integration of software
     • Scaling of software
     • Analytics and statistics
     • Visualization

  8. NLeSC eScience competences applied in research (diagram centred on eStep)
     1. Optimized data handling: data integration, database optimization, structured & unstructured data, real-time data
        – Handling sensor data, linked data, information integration, databases, data assimilation
     2. Big data analytics: statistics, machine learning, visualization, text mining
        – Natural language processing, machine learning, information visualization, scientific visualization, information retrieval, computer vision
     3. Efficient computing: distributed & accelerated computing, efficient algorithms
        – Distributed computing, accelerated computing, low power computing, orchestrated computing, high-performance computing

  9. Key areas of expertise are used in many projects, and projects often combine a number of different competences and technologies.

  10. eStep Goals
     • Prevent fragmentation and duplication
     • Promote the exchange and re-use of best practices
     • Represent NLeSC’s expertise and knowledge base
     • Improve the eScience state of the art with a fundamental eScience research line

  11. eStep and NLeSC projects (diagram): Adopt, Tailor, Develop, Generalize

  12. Software layers (top to bottom): enhanced science; project-specific software (NLeSC projects); discipline-specific software; overarching software: generic libraries, tools, and algorithms (eStep); e-infrastructure
     • Main criteria for integrating technology in eStep:
       – State-of-the-art / best-of-breed?
       – Generic and overarching?
       – Match with our expertise areas?
       – Includes externally developed software
     • Open platform!

  13. Our sustainability approach
     • Prevent duplication, fragmentation
     • Build something that is worth sustaining!
       – Sufficiently generic
       – Modular
       – High quality
       – Must be taken into account from the start
     • Enforce software engineering guidelines and best practices
     • Educate partners with Software Carpentry and Data Carpentry
     • Open source / open access, open standards, unless…
     • Community coding
     • Standardization for software and data formats
     • eStep is an open platform

  14. A Common Workflow @ NLeSC
     • GitHub
     • Travis CI: "Test and Deploy with Confidence. Easily sync your GitHub projects with Travis CI and you’ll be testing your code in minutes!"
     • Jenkins: we run a Jenkins CI instance locally, used for private repositories and repositories requiring HPC middleware
     • Deploy: open platform for building, shipping and running distributed applications

  15. technology.esciencecenter.nl
     • Non-technical; targets a general audience

  16. estep.esciencecenter.nl
     • All eScience software and knowledge you need, in one place
     • Technical; targets developers and PIs

  17. Knowledge base
     • knowledge.esciencecenter.nl
     • Training and education
     • Best practices
     • Tutorials
     • White papers
     • Training resources
     • Software development checklist available

  18. More info on eStep
     • technology.esciencecenter.nl
     • estep.esciencecenter.nl
     • R.vanNieuwpoort@esciencecenter.nl

  19. Logo Bingo: CommonSense, xtas, Osmium, NLTK, EDAL, Semanticizer, AHN2 viewer

  20. Optimized data handling

  21. Summer in the city example: human thermal comfort
     • Three residents of an old people's home died due to the heat
     • Elderly people make heavy use of the heat information call desk
     • Heat protection plan abandoned
     • Cities poorly protected against heat
     • End of 16-day heat wave
     Courtesy Bert Holtslag

  22. Summer in the city example
     Novel hourly forecasting system for human thermal comfort in urban areas at street level
     • Kilometer scale: elevation (AHN2) and land-use data (Kadaster), imagery for assessing green vegetation coverage and soil moisture content
     • Street scale: sky view factor, building height to street width ratio, reflectivity and thermal characteristics of buildings and streets, abundance of vegetation
     • Network of weather stations and crowdsourcing: wunderground.com
     • Special measuring campaigns
     • Social media?
     • Combine with fine-grained models
     Courtesy Bert Holtslag
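The street-scale inputs above include the sky view factor (SVF) and the building height to street width ratio (H/W). For an idealized, symmetric, infinitely long street canyon the two are linked by a standard geometric relation: seen from the middle of the street, each wall top sits at elevation angle β = arctan(2H/W), and SVF = cos β. The Python sketch below assumes that idealized geometry; the function name and example geometries are hypothetical, and the slides do not say how the forecasting system itself computes SVF.

```python
import math


def canyon_sky_view_factor(height_m: float, width_m: float) -> float:
    """Sky view factor at the centre of the floor of an idealized street canyon.

    Assumes a symmetric, infinitely long canyon: walls of equal height H on
    both sides of a street of width W. Seen from mid-street, each wall top
    sits at elevation angle beta = atan(H / (W / 2)), and SVF = cos(beta).
    """
    if height_m < 0 or width_m <= 0:
        raise ValueError("height must be >= 0 and width must be > 0")
    beta = math.atan(2.0 * height_m / width_m)
    return math.cos(beta)


# Hypothetical street geometries (height, width) in metres.
for h, w in [(0.0, 20.0), (10.0, 20.0), (20.0, 20.0), (30.0, 10.0)]:
    print(f"H/W = {h / w:.1f}: SVF = {canyon_sky_view_factor(h, w):.3f}")
```

An open street (H = 0) sees the whole sky (SVF = 1); for H/W = 1 the formula gives SVF = 1/√5 ≈ 0.447.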

  23. Via Appia

  24. Optimized Data Handling Technology
     • Distributed sensor networks, multi-model and multi-scale simulations, data assimilation, data integration, multi-scale pattern recognition, geographic information systems, databases, …
     • Xenon, NetCDF, HDF5, ROOT, XNAT, OpenDA, Hadoop, MapReduce, Oracle, MySQL, Postgres, MonetDB, ElasticSearch, DataVault, JSON, Spark, …
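Data assimilation, listed above, combines model output with observations, weighting each by its uncertainty. Its smallest instance is the scalar Kalman (optimal interpolation) update sketched below in Python. This is a textbook illustration with made-up numbers, assuming unbiased, independent Gaussian errors; it is not OpenDA's API or any NLeSC pipeline, and the function name is hypothetical.

```python
def assimilate(forecast: float, forecast_var: float,
               observation: float, obs_var: float) -> tuple[float, float]:
    """Scalar Kalman (optimal interpolation) update.

    Assumes unbiased, mutually independent errors with the given variances.
    Returns the analysis value and its error variance.
    """
    if forecast_var < 0 or obs_var < 0 or forecast_var + obs_var == 0:
        raise ValueError("variances must be non-negative and not both zero")
    gain = forecast_var / (forecast_var + obs_var)  # Kalman gain K
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var


# Hypothetical numbers: the model forecasts 28.0 degC with error variance 4.0,
# a weather station reports 30.0 degC with error variance 1.0.
value, variance = assimilate(28.0, 4.0, 30.0, 1.0)
print(f"analysis: {value:.2f} degC, variance {variance:.2f}")
```

With these numbers the gain is 0.8, so the analysis (29.6 °C) sits closer to the more accurate observation, and its variance (0.8) is smaller than that of either input.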

  25. Big Data Analytics

  26. eEcology example Courtesy Willem Bouten

  27. Accelerometer and Behaviour
     • Axes: heave (vertical), surge (forward), sway (sideways)
     • Static acceleration: sitting, floating, standing
     • Dynamic acceleration: flapping flight, gliding, X-flapping
     Courtesy Willem Bouten
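The slide separates static acceleration (posture: sitting, floating, standing) from dynamic acceleration (flapping flight, gliding). A common way to split them is to treat a running mean of each axis as the static (gravity) component and the remainder as dynamic; summing the absolute dynamic acceleration over heave, surge, and sway gives overall dynamic body acceleration (ODBA). The Python sketch below assumes that approach, an arbitrary smoothing half-width, and hypothetical sample data; the eEcology project's own processing is not shown in the slides.

```python
def split_static_dynamic(signal: list[float], half_width: int) -> tuple[list[float], list[float]]:
    """Split one accelerometer axis into static and dynamic components.

    Static part: centred running mean over up to 2 * half_width + 1 samples
    (the window is truncated at the ends). Dynamic part: signal minus static.
    """
    if half_width < 0:
        raise ValueError("half_width must be >= 0")
    n = len(signal)
    static = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        static.append(sum(signal[lo:hi]) / (hi - lo))
    dynamic = [x - m for x, m in zip(signal, static)]
    return static, dynamic


def odba(axes: dict[str, list[float]], half_width: int) -> list[float]:
    """Overall dynamic body acceleration: per-sample sum of |dynamic| over axes."""
    dynamic = [split_static_dynamic(v, half_width)[1] for v in axes.values()]
    return [sum(abs(d[i]) for d in dynamic) for i in range(len(dynamic[0]))]


# Hypothetical 10-sample bursts (units of g): a bird standing still,
# then heave and surge oscillating as they might in flapping flight.
standing = {"heave": [1.0] * 10, "surge": [0.0] * 10, "sway": [0.0] * 10}
flapping = {"heave": [1.5, 0.5] * 5, "surge": [0.1, -0.1] * 5, "sway": [0.0] * 10}
print("standing mean ODBA:", sum(odba(standing, 2)) / 10)
print("flapping mean ODBA:", sum(odba(flapping, 2)) / 10)
```

A constant posture yields zero ODBA regardless of which way gravity points, which is why the static component is a posture cue and the dynamic component an activity cue.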

  28. Machine learning / annotation interface

  29. Routes and Geology Courtesy Willem Bouten

  30. Detours and Climate
     MODIS satellite image of dust
     Courtesy Willem Bouten

  31. Embodied Emotion Project
     • Mapping bodily expression of emotions
       – To be downhearted
       – Clenching fists
       – My heart fills with joy
       – My blood is boiling
     • Test case: Dutch theatre texts 1600–1830
       – Shift in experienced emotions
       – Shift in embodiment of emotions
     • Approach
       – Establishing corpus & standardizing text
       – Establishing emotional and bodily vocabularies
       – Emotion mining
       – Visualizing results
     Source: Nummenmaa et al., 2013
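The approach pairs an emotional vocabulary with a bodily vocabulary and then mines the corpus. One simple, dictionary-based way to do that mining is to count emotion and body-part terms that co-occur within a few tokens of each other. The Python sketch below assumes exactly that; the tiny English vocabularies, the window size, and the function name are hypothetical stand-ins (the real corpus is Dutch, with spelling standardized first), and the project's actual method is not described in the slides.

```python
import re
from collections import Counter

# Hypothetical vocabularies for illustration only; the project's actual
# emotional and bodily vocabularies are not reproduced here.
EMOTION_TERMS = {"joy": "joy", "anger": "anger", "sorrow": "sadness", "fear": "fear"}
BODY_TERMS = {"heart", "blood", "fist", "fists", "eyes", "cheeks"}


def mine(text: str, window: int = 5) -> Counter:
    """Count (emotion, body part) pairs occurring within `window` tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok in EMOTION_TERMS:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for other in tokens[lo:hi]:
                if other in BODY_TERMS:
                    pairs[(EMOTION_TERMS[tok], other)] += 1
    return pairs


print(mine("My heart fills with joy, yet my blood is boiling with anger."))
```

Aggregating such counts per play or per decade is one way to look for the shifts in embodiment that the test case asks about.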

  32. eMetabolomics example
     • Use reaction rules to identify compounds in mass spectrometry datasets
     • Online at http://www.emetabolomics.org/
     • Public and private versions; the private version allows bigger/longer calculations
     • Lars Ridder, Laboratory of Biochemistry, Wageningen University
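As a simplified illustration of using reaction rules on mass spectrometry data (not the eMetabolomics tool's algorithm): apply each rule's monoisotopic mass shift to a known parent compound and look for observed masses within a tolerance given in parts per million (ppm). The Python sketch assumes neutral monoisotopic masses (no adducts or ionization), a hypothetical parent mass, observed peaks, and a small hand-picked rule set; real reaction rules operate on chemical structures, not only on masses.

```python
# Monoisotopic mass shifts (Da) for a few common biotransformations.
REACTION_RULES = {
    "hydroxylation (+O)": 15.994915,
    "methylation (+CH2)": 14.015650,
    "glucuronidation (+C6H8O6)": 176.032088,
    "sulfation (+SO3)": 79.956815,
}


def match_products(parent_mass: float, observed: list[float],
                   ppm_tol: float = 5.0) -> list[tuple[str, float, float]]:
    """Return (rule, observed mass, error in ppm) for every observed mass
    that lies within ppm_tol of parent_mass + a rule's mass shift."""
    hits = []
    for name, shift in REACTION_RULES.items():
        predicted = parent_mass + shift
        for mass in observed:
            error_ppm = (mass - predicted) / predicted * 1e6
            if abs(error_ppm) <= ppm_tol:
                hits.append((name, mass, round(error_ppm, 2)))
    return hits


# Hypothetical parent mass and observed peaks (neutral masses, Da).
for hit in match_products(194.080376, [210.0753, 208.0960, 300.0]):
    print(hit)
```

A relative (ppm) tolerance is used because mass accuracy of high-resolution instruments is usually specified that way, so the allowed error grows with mass.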

  33. Courtesy Lars Ridder

  34. Courtesy Lars Ridder

  35. forecast.ewatercycle.org

  36. Big Data Analytics Technology
     • Natural language processing, machine learning, information and scientific visualization, information retrieval, computer vision
     • Matlab, R, NumPy, SciPy, scikit-learn, Pandas, Weka, Xtas, Twiqs.nl, D3, ExtJS, Cesium, Leaflet, OpenLayers, GeoExt, X3Dom, X3DomExt, Mapnik, CommonSense, …

  37. Efficient Computing

  38. Efficient Computing
     • Smart algorithms can improve performance dramatically
     • Power consumption is becoming the bottleneck
     • Legacy codes are inefficient on modern architectures
       – Need completely different optimizations and algorithms
     [Figure: arithmetic intensity of common kernels. O(1): SpMV, BLAS1,2, stencils (PDEs), lattice methods. O(log N): FFTs. O(N): dense linear algebra (BLAS3), particle methods.]
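The O(1), O(log N), and O(N) labels refer to arithmetic intensity: floating-point operations per byte moved to or from memory. A back-of-envelope calculation shows where the three classes come from. The Python sketch assumes double precision (8 bytes), ideal cache reuse (each operand moved once), and the conventional 5 N log2 N flop estimate for a complex FFT; real kernels move more data than these lower bounds.

```python
import math

BYTES = 8  # double precision


def intensity_dot(n: int) -> float:
    """BLAS1 dot product x . y: about 2n flops, reads x and y once."""
    flops = 2 * n
    bytes_moved = 2 * n * BYTES
    return flops / bytes_moved


def intensity_gemm(n: int) -> float:
    """BLAS3 C = A @ B with n x n matrices: 2n^3 flops, ideal traffic 3n^2 values."""
    flops = 2 * n**3
    bytes_moved = 3 * n * n * BYTES
    return flops / bytes_moved


def intensity_fft(n: int) -> float:
    """Complex FFT of length n: ~5 n log2 n flops, read and write n complex values."""
    flops = 5 * n * math.log2(n)
    bytes_moved = 2 * n * 2 * BYTES  # 2 doubles per complex value, in and out
    return flops / bytes_moved


for n in (1024, 4096):
    print(f"N={n}: dot {intensity_dot(n):.3f}, FFT {intensity_fft(n):.3f}, "
          f"GEMM {intensity_gemm(n):.1f} flop/byte")
```

The dot product stays at 1/8 flop/byte for any N, the FFT grows with log N, and matrix multiplication grows linearly (N/12), which is why dense linear algebra can keep modern processors busy while SpMV and stencils are limited by memory bandwidth.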

  39. Efficient Computing Example
     • Radio Frequency Interference (RFI) is a huge problem for many radio astronomy observations
     • Caused by
       – Lightning, vehicles, airplanes, satellites, electrical equipment, GSM, FM radio, fences, reflections off wind turbines, …
     • Best removed offline
       – Complete dataset available
       – Good overview / statistics / model
       – Can spend compute cycles
     • Partner: Astron

  40. Real-time RFI mitigation
     • Some pipelines need to run in real time today
       – Image-based transient detection (LOFAR/AARTFAAC)
       – Pulsar searching (WSRT/Apertif)
     • SKA will be entirely real-time
       – Data rates simply too high to store
     • Novel algorithms with linear computational complexity
       – Only a small loss in quality
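The slides do not describe the novel linear-complexity algorithms themselves. To show what linear complexity means for a flagger, here is a deliberately simple stand-in in Python: iterative sigma clipping on amplitudes, where each of a fixed number of iterations makes a constant number of passes over the data, so the total cost is O(N). The threshold factor k, the iteration count, and the test data are assumptions; production RFI flaggers work on time-frequency data and are considerably more robust.

```python
import math


def flag_rfi(samples: list[float], k: float = 3.0, iterations: int = 3) -> list[bool]:
    """Flag amplitudes above mean + k * std of the currently unflagged samples.

    Each iteration is a few linear passes (select, mean, std, flag), and the
    number of iterations is fixed, so the cost is O(iterations * N).
    """
    flags = [False] * len(samples)
    for _ in range(iterations):
        clean = [s for s, f in zip(samples, flags) if not f]
        if len(clean) < 2:
            break
        mean = sum(clean) / len(clean)
        std = math.sqrt(sum((s - mean) ** 2 for s in clean) / (len(clean) - 1))
        threshold = mean + k * std
        new_flags = [f or s > threshold for s, f in zip(samples, flags)]
        if new_flags == flags:  # converged: nothing new flagged
            break
        flags = new_flags
    return flags


# Hypothetical amplitudes: noise around 1.0 with two strong RFI spikes.
data = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.02, 0.98, 1.0] * 10
data[17], data[63] = 50.0, 40.0
print([i for i, f in enumerate(flag_rfi(data)) if f])
```

Iterating matters because strong spikes inflate the standard deviation on the first pass; recomputing the statistics without them tightens the threshold for weaker interference.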

  41. RFI mitigation on accelerators
     • Accelerator-based computing
       – GPUs, Xeon Phi, …
       – Astronomy, ocean modeling, digital forensics, radar systems, high-energy physics
     • Auto-tuning & runtime compilation
       – Generate many codes at run time, select the most efficient
     [Figure: pulsar B1919+21 in the Vulpecula nebula. Pulse profile created with folding and the LOFAR software telescope. Background picture courtesy European Southern Observatory.]
     [Figure: bar charts of performance compared to CPU and power usage compared to CPU, for Xeon Phi, NVIDIA GTX Titan GPU, and AMD HD7990 GPU.]
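Auto-tuning, as described on the slide, means generating many code variants at run time and keeping the most efficient one. The selection loop can be sketched in Python as below, assuming a hypothetical kernel generator whose only tunable parameter is a block size. In the real setting the variants would be GPU or Xeon Phi kernels compiled at run time, and the search space much larger; here the point is only the generate, benchmark, select pattern.

```python
import timeit
from typing import Callable


def make_sum_variant(block: int) -> Callable[[list[float]], float]:
    """Hypothetical kernel generator: returns a summation routine that
    processes its input in chunks of `block` elements."""
    def kernel(data: list[float]) -> float:
        total = 0.0
        for start in range(0, len(data), block):
            total += sum(data[start:start + block])
        return total
    return kernel


def autotune(candidates: dict[int, Callable[[list[float]], float]],
             data: list[float], repeats: int = 3):
    """Time every candidate on representative input and return the fastest
    (parameter, kernel) plus all timings (best of `repeats` runs of 5 calls)."""
    timings = {}
    for param, fn in candidates.items():
        timings[param] = min(timeit.repeat(lambda: fn(data), number=5, repeat=repeats))
    best = min(timings, key=timings.get)
    return best, candidates[best], timings


data = [float(i % 7) for i in range(100_000)]
variants = {b: make_sum_variant(b) for b in (64, 1024, 16384)}
best_block, best_kernel, timings = autotune(variants, data)
print(f"selected block size {best_block}; result {best_kernel(data)}")
```

Tuning on representative input at run time, rather than fixing parameters at compile time, lets the same code pick a good variant on whichever accelerator and problem size it meets.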
