Reminders: Code can be found on github.com/jackel119/python102 - - PowerPoint PPT Presentation

SLIDE 1

presents

Intermediate Python

SLIDE 2

Reminders:

  • Code can be found on github.com/jackel119/python102
  • Slides on docsoc.co.uk/education
  • Today we’ll be looking at more NumPy, pandas, Matplotlib, and a little bit of machine learning/AI!

SLIDE 3

(Recap) NumPy:

  • Has a very powerful N-dimensional array object
    ○ Fast
    ○ Easy to generate
    ○ Can enforce types
    ○ Has TONS of useful methods/operations
  • Linear algebra (and matrix operations) support
  • Other useful functions as well
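A minimal NumPy sketch of the points above: generating arrays, enforcing a dtype, vectorised arithmetic, and a small linear-algebra solve. The particular arrays are arbitrary examples, not taken from the course repository.

```python
import numpy as np

# Easy to generate: a 3x3 array holding the integers 0..8
a = np.arange(9).reshape(3, 3)

# Types can be enforced with dtype
b = np.ones((3, 3), dtype=np.float64)

# Vectorised element-wise operations, no Python loops needed
c = a * 2 + b

# Linear algebra: solve the system m @ x = y
m = np.array([[2.0, 1.0], [1.0, 3.0]])
y = np.array([3.0, 5.0])
x = np.linalg.solve(m, y)
```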

SLIDE 4

Enter pandas:

  • Series object, similar to a 1-D NumPy array (actually built on top of it)
  • DataFrame object, which represents a table
    ○ Has column names (which are accessible)
    ○ Rows are accessible
    ○ Again, LOTS of features
  • Lots of other useful datatypes (dates, times, etc.)
  • Combined with NumPy, has anything and everything you will ever need for data processing
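A short pandas sketch, assuming a tiny made-up table (the names, ages and dates are invented for illustration), showing a Series, column and row access, filtering, and a date-aware accessor.

```python
import pandas as pd

# A Series is a labelled 1-D array built on top of a NumPy array
s = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

# A DataFrame is a table with named columns (illustrative values only)
df = pd.DataFrame({
    "name": ["Ann", "Ben", "Cat"],
    "age": [21, 34, 28],
    "joined": pd.to_datetime(["2018-01-05", "2018-03-12", "2018-07-30"]),
})

ages = df["age"]                 # column access by name -> Series
first_row = df.iloc[0]           # row access by position
older = df[df["age"] > 25]       # boolean filtering keeps matching rows
mean_age = df["age"].mean()      # summary statistics are built in
months = df["joined"].dt.month   # date-aware accessor
underlying = s.to_numpy()        # back to a plain NumPy array
```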

SLIDE 6

What about visualising data? We have Matplotlib!
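A minimal Matplotlib sketch that plots two curves and saves the figure to a file. The Agg backend and the file name `first_plot.png` are choices made for this example so it runs without a display; in a notebook or interactive session you would typically call `plt.show()` instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A first Matplotlib plot")
ax.legend()
fig.savefig("first_plot.png")  # hypothetical output file name
```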

SLIDE 8

NumPy, pandas, Matplotlib

  • Have an endless amount of features and functionality
    ○ If it’s something you want to do, they’ve got it
    ○ Would take FOREVER to cover everything in class
  • Almost every data/maths-related library in Python is compatible with, or built on top of, these
  • Note: this is meant to serve as an introduction to these libraries, not an end-all-be-all. There is an endless amount of tutorials and documentation on the internet, and you should all be at a point where you can make use of them if you so wish.

SLIDE 10

A more sophisticated demo, with a bit of machine learning!

SLIDE 12

Drug Use Dataset

  • We have a lot of data on 1885 people, and for each of them:
    ○ Age, gender, ethnicity, country, personality traits
    ○ Their consumption of legal substances, e.g. chocolate, nicotine, alcohol, etc.
    ○ Their consumption of illegal drugs, as well as an overall ‘severity’ score, etc.
  • Let’s explore the data!

SLIDE 13

Exploratory Data Analysis

  • What can we learn from the data?
    ○ Is there a correlation between how much someone likes chocolate and how much they drink? Or between nicotine (smoking) and coffee?
    ○ Age and drug use?
    ○ Do certain countries do more drugs?
  • Many of you have used R in a scientific context before; the concept is the same here!
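One way the questions above might be explored in pandas. The column names and values here are a small invented sample standing in for the real CSV in the repository, whose actual columns may differ; `DataFrame.corr(numeric_only=True)` requires pandas 1.5 or later.

```python
import pandas as pd

# Tiny made-up sample. With the real data you would load the CSV instead, e.g.
# df = pd.read_csv("<path to the dataset csv>")   (path is an assumption)
df = pd.DataFrame({
    "age": ["18-24", "25-34", "18-24", "35-44", "25-34", "18-24"],
    "country": ["UK", "US", "UK", "Canada", "US", "UK"],
    "chocolate": [5, 3, 6, 2, 4, 6],
    "alcohol": [4, 2, 5, 1, 3, 6],
    "severity": [3.0, 1.0, 4.0, 0.0, 2.0, 5.0],
})

# Pearson correlation between two columns
choc_vs_alcohol = df["chocolate"].corr(df["alcohol"])

# Correlation matrix over the numeric columns only
corr_matrix = df.corr(numeric_only=True)

# Average severity per country and per age bracket
by_country = df.groupby("country")["severity"].mean()
by_age = df.groupby("age")["severity"].mean()
```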

SLIDE 14

Onto the machine learning bit!

SLIDE 15

Machine Learning Demo

  • This is meant to show you what Python can do in the domain of machine learning, and how.
  • You will NOT be an expert after this, and may not understand every single thing.
  • However, you should be able to follow most of it, and appreciate the power of Python in machine learning.
  • If you want to learn more or try some yourself, there are plenty of great tutorials on the internet!

SLIDE 16

Machine Learning Intro

  • Supervised Learning:
    ○ Train a model to predict/identify things, i.e. there are ‘right or wrong’ answers. This is called labelled data.
  • Unsupervised Learning:
    ○ Given some data, have a model tell us about the structure, arrangement, etc. of the data.
  • Reinforcement Learning:
    ○ Train a model to make decisions, play games, etc.
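To make the supervised/unsupervised distinction concrete, here is a sketch using scikit-learn on six made-up 2-D points: the decision tree is given labels, while K-Means only sees the points and has to find the groups itself. Reinforcement learning needs an environment to interact with and is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Toy 2-D points forming two obvious groups (made-up data)
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])

# Supervised: we also provide the 'right answers' (labels) for each point
y = np.array([0, 0, 0, 1, 1, 1])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
supervised_pred = clf.predict([[0.1, 0.1], [5.1, 5.0]])

# Unsupervised: no labels, the model finds structure (clusters) on its own
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_ids = km.labels_
```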

SLIDE 18

Supervised Learning

  • We might be interested in:
    ○ Given all the data in our dataset apart from druguse (age, gender, personality, chocolate…), can we predict if someone is a drug user?
      ■ Or how severe their drug usage is?
    ○ What about predicting personality traits from other features (drug use, age, country, alcohol, nicotine…)?
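In code, "predict this from everything else" usually means splitting the table into a feature matrix and a target column. This sketch assumes hypothetical columns, including `druguse` as a 0/1 flag and `severity` as a score; the real dataset's columns may differ.

```python
import pandas as pd

# Made-up rows; these column names are assumptions, not the real dataset's
df = pd.DataFrame({
    "age": [21, 30, 40, 21],
    "chocolate": [5, 3, 2, 6],
    "nicotine": [1, 4, 0, 2],
    "druguse": [1, 0, 0, 1],          # hypothetical: 1 = drug user, 0 = not
    "severity": [3.5, 0.0, 0.0, 4.0],
})

# Features: everything except the things we want to predict
X = df.drop(columns=["druguse", "severity"])

# "Is this person a drug user?" -> a classification target
y_is_user = df["druguse"]

# "How severe is their drug use?" -> a regression target
y_severity = df["severity"]
```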

SLIDE 20

1.) Prepare data
2.) Create, train, and use the model

SLIDE 21

Data Preparation

  • (Machine learning/statistical) models rely on maths, and on being able to perform calculations on numbers.
  • Can you see a problem with our data right now?

SLIDE 23

Data Preparation

  • What does “Male” or “Female” mean?
  • Or “UK”, “US”?
  • The string “18-24” doesn’t mean anything either!
  • We have to encode this data numerically!

SLIDE 26

Data Encoding Approach #1

  • Change strings to numerical values, e.g.
    ○ Male = 1, Female = 0 (or vice versa)
    ○ “18-24” -> 21 (mean), same with the other ages, or we might just use 1, 2, 3, 4…
    ○ UK = 1, US = 2, Canada = 3, Other = 4
      ■ What’s wrong with this?

SLIDE 27

Data Encoding Approach #1

UK = 1, US = 2, Canada = 3, Other = 4

  • This implies US > UK and Canada > US, i.e. that there is an ordering of some sort
  • This could mislead our model
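A sketch of Approach #1 with pandas `Series.map`, using lookup tables chosen for illustration (the midpoints for brackets other than “18-24” assume brackets like “25-34” and “35-44”). The last line shows the pitfall from this slide: arithmetic on the codes treats countries as ordered quantities.

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "age": ["18-24", "25-34", "18-24", "35-44"],
    "country": ["UK", "US", "Canada", "Other"],
})

# Approach #1: replace each string with a number via a lookup table
df["gender_num"] = df["gender"].map({"Male": 1, "Female": 0})
df["age_num"] = df["age"].map({"18-24": 21, "25-34": 29.5, "35-44": 39.5})
df["country_num"] = df["country"].map({"UK": 1, "US": 2, "Canada": 3, "Other": 4})

# The pitfall: the codes behave like ordered quantities, so the
# "average" of UK (1) and Canada (3) comes out as US (2)
avg_uk_canada = (df.loc[0, "country_num"] + df.loc[2, "country_num"]) / 2
```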

SLIDE 29

Data Encoding Approach #2

“One Hot Encode”: each category gets its own 0/1 column. Traits are now independent, and there is no implied order/hierarchy.
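One-hot encoding can be done with `pandas.get_dummies`; scikit-learn's `OneHotEncoder` is an alternative when the same encoding must be reused on new data. The country values below are made up.

```python
import pandas as pd

df = pd.DataFrame({"country": ["UK", "US", "Canada", "Other", "UK"]})

# One 0/1 column per category; exactly one of them is 1 ("hot") in each row
one_hot = pd.get_dummies(df["country"], prefix="country", dtype=int)
```

Each row now has a single 1 in the column for its country, so no country is "bigger" than another.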

SLIDE 30

Data Preparation

  • We can think of any supervised learning model as trying to estimate a function f(x) = y.
    ○ x is our predictor(s), usually a vector
    ○ y is the value(s) we want to predict
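A toy illustration of estimating f from examples: we assume the hidden function is f(x) = 2x + 1, give a straight-line model only the (x, y) pairs, and recover an estimate with `numpy.polyfit`. Real data is noisy and real models are far more flexible; this only shows the idea.

```python
import numpy as np

# Pretend the unknown "true" function is f(x) = 2x + 1 (an assumption for this demo)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1

# A very simple model: fit a straight line (degree-1 polynomial) to the pairs
slope, intercept = np.polyfit(x, y, deg=1)

def f_hat(new_x):
    """Our estimate of f, learned purely from the (x, y) examples."""
    return slope * new_x + intercept
```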

SLIDE 31

Data Preparation

  • Think of a student trying to learn a course purely by doing exam papers
    ○ The first attempt is “blind”; then the student checks his/her answers against the real answers, and that is how they learn. This is called “training”.

SLIDE 32

Data Preparation

  • Think of a student trying to learn a course purely by doing exam papers
    ○ We now want to evaluate how well the student has learned. Obviously, if we use the same exam paper, the student already knows the answers to it. Since we are testing how well the student has learned the course, we would give him/her an unseen paper. This is “test data”.

SLIDE 33

Data Preparation

  • Think of a student trying to learn a course purely by doing exam papers
    ○ In other words, how well does the model we train generalize to data it hasn’t seen before?

SLIDE 36

Data Preparation

  • Therefore, we need 4 sets of data:
    ○ train_x: training
    ○ train_y: training
    ○ test_x: we make test predictions on this -> pred_y
    ○ test_y: we compare our pred_y to this to evaluate
    ○ The train:test split is usually around 80:20 or 90:10
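A common way to produce the four sets is scikit-learn's `train_test_split`. The random data below stands in for the encoded dataset; note the return order: train_x, test_x, train_y, test_y.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 people, 5 numeric features each, plus one target value
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))
y = rng.normal(size=100)

# Hold out 20% of rows as the unseen "exam paper" (an 80:20 split)
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2, random_state=42)
```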

SLIDE 37

Neural Networks

  • What you hear about on the news: Deep Learning, AlphaGo, etc.
  • “Cool”
  • Usually requires lots of computational power

SLIDE 38

Classical Models

  • Lots of different types:
    ○ Linear Regression, Logistic Regression, Decision Trees, Random Forests, Matrix Factorization, K-Means Clustering
  • “Old” machine learning
  • Not as computationally expensive, and still usually gives VERY good results!
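Many classical models in scikit-learn share the same `fit`/`predict` interface, so swapping one for another is usually a one-line change. A sketch on invented data that follows y = 3*x0 - x1 exactly:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Made-up data with an exact linear relationship
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(200, 2))
y = 3 * x[:, 0] - x[:, 1]

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Same two calls for every model: fit on data, then predict
predictions = {name: m.fit(x, y).predict(x[:5]) for name, m in models.items()}
```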

SLIDE 39

Decision Trees

  • There are algorithms that generate decision trees based on “information entropy” (what can we find out with a true/false question?).
  • In real cases, the questions are “is feature N > value?”
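A small sketch of the entropy idea behind these algorithms: the Shannon entropy of the labels, and the information gain from asking “is feature > threshold?”. The helper names and toy data are hypothetical; scikit-learn's `DecisionTreeClassifier` uses Gini impurity by default and offers `criterion="entropy"` for this measure.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a 1-D array of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(feature, labels, threshold):
    """Entropy reduction from asking 'is feature > threshold?'."""
    mask = feature > threshold
    n = len(labels)
    left, right = labels[~mask], labels[mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Hypothetical toy data: age vs. whether someone is a drug user
ages = np.array([20, 22, 25, 40, 45, 50])
is_user = np.array([1, 1, 1, 0, 0, 0])
perfect = information_gain(ages, is_user, 30)  # 1 bit: separates the classes completely
weak = information_gain(ages, is_user, 21)     # about 0.19 bits
```

A tree-building algorithm tries many (feature, threshold) questions and picks the one with the highest gain, then repeats on each side.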

SLIDE 40

Random Forests

Basically… a lot of trees that vote on what y (the prediction) should be!
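For a regression forest, the “vote” is an average: in scikit-learn, `RandomForestRegressor.predict` returns the mean of its fitted trees' predictions (the trees are exposed as `estimators_`). This sketch on made-up data checks that by hand. For classification, scikit-learn averages the trees' predicted probabilities rather than counting hard votes.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up data: y depends on the first two features plus some noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 3))
y = x[:, 0] + 2 * x[:, 1] + rng.normal(scale=0.5, size=100)

rf = RandomForestRegressor(n_estimators=25, random_state=0).fit(x, y)

# Each fitted tree makes its own prediction ("vote")...
per_tree = np.stack([tree.predict(x[:3]) for tree in rf.estimators_])
# ...and the forest's output is their average
combined = per_tree.mean(axis=0)
```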

SLIDE 41

Quick note about types used in models

  • Different machine learning libraries support different types and have different functions and arguments (the interface!).
  • Most, if not all, support input/output as NumPy arrays!
  • We will first look at scikit-learn.
  • `pip install scikit-learn` (the package is imported as `sklearn`)

SLIDE 42

Model #1

    from sklearn.ensemble import RandomForestRegressor

    rf = RandomForestRegressor()
    rf.fit(train_x, train_y)       # learn from the training data
    pred_y = rf.predict(test_x)    # predict on the unseen test inputs (test_x, not test_y)
    # Now compare pred_y and the actual test_y

SLIDE 43

Looking at and evaluating results

  • We can just look at each prediction and compare it with the actual result individually
  • Metrics like Root Mean Squared Error, Mean Absolute Error, Percentage Error, etc.
    ○ Mostly basic statistics
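RMSE and MAE are short enough to compute directly with NumPy, and `sklearn.metrics` provides the same quantities. The five actual/predicted values are invented. Percentage-style errors are left out here because they divide by the actual value, which breaks when an actual value is 0.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual vs. predicted severity scores for five test people
test_y = np.array([3.0, 0.0, 2.5, 4.0, 1.0])
pred_y = np.array([2.5, 0.5, 2.5, 3.0, 1.0])

errors = pred_y - test_y
rmse = np.sqrt(np.mean(errors ** 2))   # Root Mean Squared Error
mae = np.mean(np.abs(errors))          # Mean Absolute Error

# The same numbers via scikit-learn's metric functions
rmse_sk = np.sqrt(mean_squared_error(test_y, pred_y))
mae_sk = mean_absolute_error(test_y, pred_y)
```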

SLIDE 44

Neural Networks Explained (Very quickly!)

  • Consists of layers of “neurons”, which take in numerical vectors
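A rough NumPy sketch of what one layer of neurons computes: each neuron takes the input vector, forms a weighted sum plus a bias, and applies an activation (ReLU here). The layer sizes and random weights are arbitrary; a library like Keras also learns the weights during training, which this sketch does not do.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def dense_forward(x, weights, bias):
    """One fully connected layer: every neuron computes relu(w . x + b)."""
    return relu(weights @ x + bias)

# Illustrative shapes: 3 inputs -> hidden layer of 4 neurons -> 1 output neuron
rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])
w1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
w2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

hidden = dense_forward(x, w1, b1)
output = w2 @ hidden + b2   # final layer left linear, as in a regression network
```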

SLIDE 46

Neural Networks Explained (Very quickly!)

  • If you want to know more:
    ○ https://blog.goodaudience.com/artificial-neural-networks-explained-436fcf36e75
    ○ http://neuralnetworksanddeeplearning.com/chap1.html
    ○ Lots of good resources online!
    ○ “Optimizers”, “Loss function”, “Activation Function”, etc.

SLIDE 47

Model #2

    from keras.models import Sequential
    from keras.layers import Dense, Input

    model = Sequential()
    model.add(Input(shape=(28,)))            # each sample has 28 input features
    model.add(Dense(32, activation='relu'))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1))                      # single numeric output (regression)
    model.compile(optimizer='adam', loss='mean_squared_error')
    model.fit(train_x, train_y, epochs=100, batch_size=1)   # `epochs` replaced the old `nb_epoch`
    pred_y = model.predict(test_x).reshape(len(test_x))     # flatten (n, 1) -> (n,)

SLIDE 48

Thanks for coming!

  • Next week:
    ○ HTTP requests, web servers, scripting…
    ○ Possibly more!
  • jackel119/python102 on GitHub for code and CSV data
  • docsoc.co.uk/education for slides