Más allá de la física: el boom de la ciencia de datos. From HEP to Big Data



SLIDE 1

Beyond physics: the data science boom (Más allá de la física: el boom de la ciencia de datos)

From HEP to Big Data

  • Dr. Bárbara Millán Mejías, Booking.com, barbaramillan@gmail.com
  • Dr. Camila Rangel Smith, The Alan Turing Institute, camila.rangel.smith@gmail.com

SLIDE 2

Our journey: From Venezuela to Science to Data Science

  • Bárbara:

○ La Guaira
○ Bachelor Physics - USB
○ Master - Particles and Astroparticles, UvA (ATLAS experiment / CERN)
○ PhD - University of Zurich
○ CMS collaboration, LHC @ CERN
○ 5 years at Booking.com
    ■ Data Scientist
    ■ Product Manager, Data Science

SLIDE 3

Our journey: From Venezuela to Science to Data Science

  • Camila:

○ Mérida
○ Bachelor in Physics - ULA
○ PhD in Particle Physics at Université Paris Diderot (ATLAS experiment)
○ Postdoctoral fellow at Uppsala University (ATLAS experiment)
○ Data Scientist:
    ■ Digital Assess (2016-2018)
    ■ The Alan Turing Institute (present)

SLIDE 4

Data Scientist

High-ranking professional with the training and curiosity to make discoveries in the world of big data.

SLIDE 6

What does a data scientist do?

  • Define the questions
  • Define the data sets
  • Obtain the data
  • Clean the data
  • Exploratory data analysis
  • Statistical prediction or modeling
  • Results interpretation
  • Challenge results
  • Synthesize and write up results
  • Create reproducible code
  • Distribute results
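
A minimal sketch of three of the steps above (clean the data, explore it, fit a simple model), using only the Python standard library. The records and the meaning of the two columns are made up for illustration; they do not come from the talk.

```python
from statistics import mean

# Hypothetical raw records (x, y); None marks a missing value.
raw = [(1, 110.0), (2, 205.0), (3, None), (4, 410.0), (None, 90.0), (5, 495.0)]

def clean(records):
    """Clean the data: drop any record with a missing field."""
    return [(x, y) for x, y in records if x is not None and y is not None]

def fit_line(points):
    """Model: ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

data = clean(raw)                      # clean
y_mean = mean(y for _, y in data)      # explore
slope, intercept = fit_line(data)      # model
print(f"{len(data)} clean records, mean y = {y_mean:.1f}")
print(f"fit: y = {slope:.2f} * x + {intercept:.2f}")
```

Keeping each step in its own small function is one way to make the analysis reproducible: the same script can be rerun unchanged when new data arrives.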

SLIDE 7

What does a data scientist do?

Follows the scientific method

SLIDE 8

Techniques

  • Statistical analysis

○ Bayesian/Frequentist
○ Statistical hypothesis testing
    ■ A/B testing (e-commerce)

  • Simulations
  • Machine learning

○ Linear regressions
○ Logistic regressions
○ Visualisation

  • Time series analysis
  • Deep learning
  • Natural language processing

SLIDE 9

An example from e-commerce: Booking.com

SLIDE 10

Understanding families

SLIDE 11

Missing children

30% of the searches done by ‘Family with children’ guests do not specify the number of children.

SLIDE 12

Hypothesis: People forget to add their children

SLIDE 13

Missing kids

SLIDE 14

Role of machine learning

  • In the stay review form, users tell us if they are a family, a group, solo or a couple.
  • Build a machine learning model that guesses the traveller type using information such as location.
  • Apply the treatment only when the model says the user is most likely a family.
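
A minimal sketch of the gating idea in the last bullet: score each traveller type and apply the treatment only when "family" is the most likely one. The feature names, the weights and the function names are all hypothetical and hand-written for illustration; a real model would learn its parameters from the traveller types reported in the review form, and nothing here reflects the actual Booking.com model.

```python
import math

# Hypothetical per-type weights over made-up features (not learned, not real).
WEIGHTS = {
    "family": {"bias": -0.5, "adults": 0.4, "nights": 0.2, "beach_destination": 1.0},
    "group": {"bias": -1.0, "adults": 0.9, "nights": 0.0, "beach_destination": 0.2},
    "solo": {"bias": 2.0, "adults": -1.2, "nights": -0.1, "beach_destination": -0.3},
    "couple": {"bias": 0.5, "adults": -0.1, "nights": 0.1, "beach_destination": 0.3},
}

def predict_proba(features):
    """Softmax over linear scores: returns {traveller_type: probability}."""
    scores = {
        t: w["bias"] + sum(w[name] * value for name, value in features.items())
        for t, w in WEIGHTS.items()
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

def should_apply_treatment(features):
    """Apply the treatment only if 'family' is the most likely traveller type."""
    proba = predict_proba(features)
    return max(proba, key=proba.get) == "family"

week_at_the_beach = {"adults": 2, "nights": 7, "beach_destination": 1}
one_night_alone = {"adults": 1, "nights": 1, "beach_destination": 0}
print(should_apply_treatment(week_at_the_beach))
print(should_apply_treatment(one_night_alone))
```

Gating on the prediction keeps the change away from users who are unlikely to benefit from it, so the experiment measures the effect on the intended audience.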

SLIDE 15

A/B testing

A/B testing is jargon for a randomized controlled trial with two variants, A and B, which are the control and the treatment in the controlled experiment. The goal is to find statistically significant differences between the two.
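
One common way to check for a statistically significant difference between two conversion rates is a two-proportion z-test. The sketch below is an illustration under that assumption, not necessarily the test used at Booking.com, and the visitor and conversion counts are invented.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / n_a: conversions and visitors in the base (A).
    conv_b / n_b: conversions and visitors in the variant (B).
    Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Invented example: 10,000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("significant at the 5% level" if p_value < 0.05 else "not significant at the 5% level")
```

The significance threshold (5% here) should be fixed before the experiment starts, otherwise the result is easy to over-interpret.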

SLIDE 16

Base. Variant.

Which one performed better?

SLIDE 17

An example of academia/industry collaboration: The Alan Turing Institute

SLIDE 18
About the institute

  • UK national institute for data science and artificial intelligence.
  • Collaborates with universities, businesses and public and third sector organisations to apply research to real-world problems.
  • Breaks down disciplinary boundaries; at the Turing, computer scientists, engineers, statisticians, mathematicians, and scientists work together under one shared goal.

SLIDE 19

Safety of offshore floating facilities: Predicting the hazardous conditions faced by offshore oil and gas facilities, to inform and improve operational decision-making

○ The combination of tides and seabed shape around the continental shelf can lead to the formation of powerful ‘soliton’ waves: solitary non-linear waves that retain their shape and speed as they propagate.
○ Soliton waves can pose a hazard to offshore oil and gas facilities, particularly when loading to or unloading from a tanker.

https://www.turing.ac.uk/research/research-projects/safety-offshore-floating-facilities

SLIDE 20

○ Industry question:
    i. What will be the maximum amplitude of the wave?
○ Oceanographers at UWA have a partial differential equation solver to model soliton formation and propagation (Korteweg-de Vries equation for continuously stratified fluids).
○ At the Turing, researcher Nick Barlow (formerly of the ATLAS experiment) worked with statisticians at UWA to turn this into a probabilistic model and visualize the output.
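
For reference, a commonly quoted textbook form of the Korteweg-de Vries equation for the amplitude A(x, t) of an internal wave is sketched below, together with its solitary-wave solution. The slides do not give the equation, so the symbols are assumptions: c_0 is the linear long-wave speed, and alpha and beta are nonlinearity and dispersion coefficients set by the background stratification. The exact form used in the UWA solver may differ.

```latex
% KdV equation for the wave amplitude A(x, t) (assumed textbook form)
\frac{\partial A}{\partial t}
  + c_0 \frac{\partial A}{\partial x}
  + \alpha A \frac{\partial A}{\partial x}
  + \beta \frac{\partial^3 A}{\partial x^3} = 0

% Solitary-wave (soliton) solution with amplitude a
A(x, t) = a \,\operatorname{sech}^2\!\left(\frac{x - V t}{\Delta}\right),
\qquad V = c_0 + \frac{\alpha a}{3},
\qquad \Delta^2 = \frac{12 \beta}{\alpha a}
```

In this form larger solitons travel faster and are narrower, which is one reason the maximum amplitude is the quantity of interest.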

SLIDE 21

Combining the physics, statistics and computing for industrial impact:

  • i. Probabilistic modeling: Monte Carlo simulations
  • ii. Computationally demanding: parallel, distributed and cloud computing
  • iii. Software development: necessary for industrial uptake
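
A minimal sketch of how Monte Carlo simulation turns a deterministic model into a probabilistic one: sample the uncertain inputs, run the model once per sample, and summarize the spread of outputs. The function `toy_max_amplitude` is a hypothetical stand-in and is not the UWA solver; the input distributions, the units and the threshold are also invented for illustration.

```python
import random
import statistics

def toy_max_amplitude(tidal_forcing, stratification):
    """Hypothetical stand-in for the deterministic solver (NOT the real model)."""
    return 10.0 * tidal_forcing * stratification

def monte_carlo_amplitudes(n_samples, seed=0):
    """Sample the uncertain inputs and run the model once per sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    amplitudes = []
    for _ in range(n_samples):
        forcing = rng.gauss(1.0, 0.1)            # assumed input uncertainty
        stratification = rng.uniform(0.8, 1.2)   # assumed input uncertainty
        amplitudes.append(toy_max_amplitude(forcing, stratification))
    return amplitudes

amplitudes = sorted(monte_carlo_amplitudes(10_000))
median = statistics.median(amplitudes)
p95 = amplitudes[int(0.95 * len(amplitudes))]    # empirical 95th percentile
threshold = 12.0                                 # hypothetical safety limit
prob_exceed = sum(a > threshold for a in amplitudes) / len(amplitudes)
print(f"median = {median:.2f}, 95th percentile = {p95:.2f}")
print(f"P(amplitude > {threshold}) = {prob_exceed:.3f}")
```

Each sample is independent of the others, so when a single model run is expensive the loop can be split across many machines, which is where parallel, distributed and cloud computing come in.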

SLIDE 22

Conclusion

  • The tools you have learnt and the statistical knowledge you have can be used in many different areas.
  • Keep an eye on the technologies advancing in the world:

○ Physics
○ Computer Science
○ Governments
○ Finance
○ Business

  • Interdisciplinarity is at the heart of Data Science. Review the work done in different areas; it can inspire and drive your own study and research.

SLIDE 23

Free data science courses

  • Coursera course on Data Science: https://www.coursera.org/learn/data-scientists-tools
  • Machine learning: Andrew Ng's Machine Learning course from Stanford, available for free
  • http://datascienceacademy.com/free-data-science-courses/
  • https://www.codecademy.com/ (free coding courses)
