Math Science and Machines ... or what I wish I'd known when I was younger

SLIDE 1

Overview Science Data Acquisition Math Machines Conclusion

Math Science and Machines

... or what I wish I’d known when I was younger

Jaroslav Vážný

Masaryk University / Astronomical Institute / Gauss Algorithmic / 4comfort.cz

  • 5 December 2013

Jaroslav Vážný Practical approach

SLIDE 2

Concepts introduced in this talk

Math Science and Machines: what to study

  • Machines: What is Big Data anyway?, Data Mining, Machine Learning (examples: unsupervised, dimensionality reduction, supervised), HW/SW analogy, Computer literacy?
  • Education: MOOC = new era?
  • Math: examples, Probability
  • Science: Paradigm shift?, Reproducibility, What is Science?

SLIDE 3

What is Science?

SLIDE 4

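Why is it important?

\begin{align}
\nabla \cdot \mathbf{E} &= 0, & \nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t}, \tag{1} \\
\nabla \cdot \mathbf{B} &= 0, & \nabla \times \mathbf{B} &= \frac{1}{c^2} \frac{\partial \mathbf{E}}{\partial t}. \tag{2}
\end{align}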

SLIDE 5

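Why is it important?

\begin{align}
\nabla \cdot \mathbf{E} &= 0, & \nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t}, \tag{1} \\
\nabla \cdot \mathbf{B} &= 0, & \nabla \times \mathbf{B} &= \frac{1}{c^2} \frac{\partial \mathbf{E}}{\partial t}. \tag{2}
\end{align}

A miracle happens:

\[
c = \frac{1}{\sqrt{\mu_0 \varepsilon_0}} = 2.99792458 \times 10^{8}\ \mathrm{m\,s^{-1}}
\]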
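The relation between c and the vacuum constants can be checked numerically. This is a minimal sketch, assuming the pre-2019 SI values of the constants; the names MU_0 and EPSILON_0 belong to this example, not to the slides. Before 2019 the value of ε0 was itself defined through c, so this is a consistency check rather than an independent measurement.

```python
import math

# Assumed vacuum constants (pre-2019 SI values, not taken from the slides)
MU_0 = 4 * math.pi * 1e-7      # vacuum permeability, N A^-2
EPSILON_0 = 8.854187817e-12    # vacuum permittivity, F m^-1

# Wave speed implied by Maxwell's equations in vacuum
c = 1 / math.sqrt(MU_0 * EPSILON_0)
print(f"c = {c:.6e} m/s")
```

The result agrees with the measured speed of light, which is the "miracle": light is an electromagnetic wave.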

SLIDE 6

Reproducibility

  • http://jakevdp.github.io/blog/2013/10/26/big-data-brain-drain/
  • http://nbviewer.ipython.org/
  • http://pdos.csail.mit.edu/scigen/ ;-)

SLIDE 7

Data Avalanche?

Large Synoptic Survey Telescope

  • 20 TB per night
  • 60 PB for the raw data (after 10 years)
  • 15 PB for the catalog database
  • The total data volume after processing will be several hundred PB

CERN

  • 1 PB per day
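The nightly rate and the ten-year total are consistent with each other, as a quick back-of-the-envelope check shows. This sketch assumes about 300 usable observing nights per year, which is this example's assumption and not a figure from the slide.

```python
TB_PER_NIGHT = 20        # from the slide
YEARS = 10               # from the slide
NIGHTS_PER_YEAR = 300    # assumption: usable observing nights per year

raw_tb = TB_PER_NIGHT * NIGHTS_PER_YEAR * YEARS
raw_pb = raw_tb / 1000   # decimal units: 1 PB = 1000 TB
print(f"Raw data after {YEARS} years: {raw_pb:.0f} PB")

# At the CERN rate quoted on the slide (1 PB per day),
# the same volume accumulates in this many days:
cern_days = raw_pb / 1
print(f"CERN produces the same volume in {cern_days:.0f} days")
```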

SLIDE 8

Sloan Digital Sky Survey

Why is it important?

  • Lots of data (>10^6 objects)
  • Perfect documentation
  • Tools to access the data

Where can I learn it?

http://www.sdss3.org/

SLIDE 9

Virtual Observatory

Why is it important?

  • Uniform access to astronomy data
  • Based on Web standards
  • Many tools with VO support (TOPCAT, Aladin, TAPsh)

Where can I learn it?

http://physics.muni.cz/~vazny/wiki/index.php/Diploma_work

SLIDE 10

Probability

  • Test your intuition!
  • Roll a die. Five times in a row you got a 6. What is P(6)?
  • Monty Hall problem
  • Show examples in IPython!
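The Monty Hall problem is a good candidate for an IPython demonstration, because a simulation settles the argument quickly. This is a minimal sketch of one possible demo, not the code shown in the talk; the function names are this example's own.

```python
import random

def play_monty_hall(switch, rng):
    """Play one game; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that hides a goat and is not the player's pick
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability by repeated play."""
    rng = random.Random(seed)
    wins = sum(play_monty_hall(switch, rng) for _ in range(trials))
    return wins / trials

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay:   {stay_rate:.3f}")
print(f"switch: {switch_rate:.3f}")
```

Staying wins about one game in three and switching about two in three, which contradicts the common intuition that the two remaining doors are equally likely.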

SLIDE 11

MOOC == new era?

  • https://www.khanacademy.org/
  • https://www.coursera.org/
  • https://www.udacity.com/
  • https://www.edx.org/

SLIDE 12

What is

  • Machine Learning (Data astrology)
  • Data Mining
  • Artificial Intelligence

SLIDE 13

Supervised Machine Learning

[Diagram: Supervised Learning Model]

  • Training: text, documents, images, etc. with labels → feature vectors → machine learning algorithm → predictive model
  • Prediction: new text, document, image, etc. → feature vector → predictive model → expected label
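The supervised workflow in the diagram can be sketched in a few lines. This example uses a 1-nearest-neighbour classifier, which is this sketch's choice and is not named on the slide; the feature values and the labels "star" and "quasar" are hypothetical.

```python
import math

def train(feature_vectors, labels):
    """'Training' for 1-nearest-neighbour just stores the labelled examples."""
    return list(zip(feature_vectors, labels))

def predict(model, feature_vector):
    """Return the label of the stored example closest to the new vector."""
    _, label = min(model, key=lambda example: math.dist(example[0], feature_vector))
    return label

# Hypothetical training data: two features per object, made-up labels
training_vectors = [(0.1, 0.2), (0.2, 0.1), (0.9, 1.0), (1.1, 0.8)]
training_labels = ["star", "star", "quasar", "quasar"]

model = train(training_vectors, training_labels)   # feature vectors + labels -> model
expected_label = predict(model, (1.0, 0.9))        # new feature vector -> expected label
print(expected_label)
```

In practice a library such as scikit-learn provides the same two steps through the `fit` and `predict` methods of its estimators.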

SLIDE 14

Unsupervised Machine Learning

[Diagram: Unsupervised Learning Model]

  • Training: text, documents, images, etc. → feature vectors → machine learning algorithm → predictive model
  • Prediction: new text, document, image, etc. → feature vector → predictive model → likelihood, or cluster ID, or better representation
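The unsupervised case works without labels: the model has to find structure on its own and then returns, for example, a cluster ID. This is a minimal sketch using k-means, which is this example's choice and is not named on the slide; the data points and starting centroids are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the point (the cluster ID)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members (keep it if empty)
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroid
            for members, centroid in zip(clusters, centroids)
        ]
    return centroids

# Hypothetical unlabelled feature vectors forming two groups
points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (1.0, 1.1), (0.9, 1.0), (1.1, 0.9)]
centroids = kmeans(points, centroids=[points[0], points[3]])

cluster_id = nearest((0.95, 1.05), centroids)   # new feature vector -> cluster ID
print(centroids, cluster_id)
```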

SLIDE 15

Example of feature extraction

SLIDE 16

Example: Decision Tree

ug <= 0.663668
|   gr <= -0.191208: 1 (7.0)
|   gr > -0.191208: 3 (104.0/5.0)
ug > 0.663668
|   ri <= 0.285854: 1 (88.0/5.0)
|   ri > 0.285854
|   |   ri <= 0.314657
|   |   |   gr <= 0.692108: 2 (6.0)
|   |   |   gr > 0.692108: 1 (3.0)
|   |   ri > 0.314657: 2 (90.0/2.0)
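A decision tree is just a set of nested threshold tests, so the tree above translates directly into code. This is a sketch of that translation: the function name is this example's own, the slide does not say what the class labels 1, 2 and 3 stand for, and reading ug, gr and ri as colour indices is an assumption. The numbers in parentheses look like the usual tree-learner leaf statistics (instances reaching the leaf / misclassified instances) and are not needed for prediction.

```python
def classify(ug, gr, ri):
    """Return the class label (1, 2 or 3) predicted by the tree on the slide."""
    if ug <= 0.663668:
        if gr <= -0.191208:
            return 1
        return 3
    # ug > 0.663668
    if ri <= 0.285854:
        return 1
    if ri <= 0.314657:
        # 0.285854 < ri <= 0.314657
        if gr <= 0.692108:
            return 2
        return 1
    # ri > 0.314657
    return 2

# Hypothetical input values, chosen only to exercise one branch
print(classify(ug=0.5, gr=0.0, ri=0.1))
```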

SLIDE 17

Example: Support Vector Machine

SLIDE 18

References

  • http://ipython.org/
  • http://www.greenteapress.com/thinkstats/
  • http://www.greenteapress.com/thinkpython/
  • http://scikit-learn.org/stable/
  • http://pandas.pydata.org/
  • http://jakevdp.github.io/blog/2013/10/26/big-data-brain-drain/
  • http://www.galaxyzoo.org/
  • http://www.planethunters.org/
  • http://www.sdss3.org/

SLIDE 19

Discussion
