
Timeseries kinds and applications: Machine Learning for Time Series Data in Python (PowerPoint presentation)



  1. Timeseries kinds and applications. Machine Learning for Time Series Data in Python. Chris Holdgraf, Fellow, Berkeley Institute for Data Science

  2. Time Series

  3. Time Series

  4. What makes a time series?
     A time series is a sequence of datapoints, each paired with a timepoint. Example datapoints: 1, 34, 12, 54, 76, 40. Their timepoints might be clock times (2:00, 2:01, 2:02, 2:03, 2:04, 2:05) or months (Jan, Feb, March, April, May, Jun).
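A minimal sketch of the idea above in pandas: the six datapoints stored as a Series indexed by timestamps. The calendar date 2020-01-01 is an arbitrary assumption, since the slide shows only clock times and month names.

```python
import pandas as pd

# Datapoints from the slide
values = [1, 34, 12, 54, 76, 40]

# Six consecutive minutes starting at 2:00 (the date itself is a placeholder)
times = pd.date_range("2020-01-01 02:00", periods=6, freq="min")
series = pd.Series(values, index=times)
print(series)

# The same values indexed by month starts instead (January through June)
monthly = pd.Series(values, index=pd.date_range("2020-01-01", periods=6, freq="MS"))
print(monthly.index.strftime("%b").tolist())
```

Either way, the values are identical; only the timepoints attached to them change.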

  5. Reading in a time series with Pandas

      import pandas as pd
      import matplotlib.pyplot as plt

      data = pd.read_csv('data.csv')
      data.head()

                 date symbol       close       volume
      0    2010-01-04   AAPL  214.009998  123432400.0
      46   2010-01-05   AAPL  214.379993  150476200.0
      92   2010-01-06   AAPL  210.969995  138040000.0
      138  2010-01-07   AAPL  210.580000  119282800.0
      184  2010-01-08   AAPL  211.980005  111902700.0
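A self-contained sketch of the same read step, using a few rows from this deck held in memory instead of the 'data.csv' file. It assumes you want dates parsed while reading (parse_dates) and shows that boolean filtering by symbol keeps the original row labels, which is one way a head() output can show non-consecutive index values.

```python
import io
import pandas as pd

# A few rows copied from the slides, standing in for 'data.csv'
csv_text = """date,symbol,close,volume
2010-01-04,AAPL,214.009998,123432400.0
2010-01-04,ABT,54.459951,10829000.0
2010-01-05,AAPL,214.379993,150476200.0
"""

# parse_dates converts the 'date' column to datetimes while reading
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Keep only Apple; the filtered rows keep their original index labels
aapl = data[data["symbol"] == "AAPL"]
print(aapl)
```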

  6. Plotting a pandas timeseries

      import matplotlib.pyplot as plt

      fig, ax = plt.subplots(figsize=(12, 6))
      data.plot('date', 'close', ax=ax)
      ax.set(title="AAPL daily closing price")

  7. A timeseries plot

  8. Why machine learning?
     We can use really big data and really complicated data.

  9. Why machine learning?
     We can predict the future and automate this process.

  10. Why combine these two?

  11. A machine learning pipeline
      The pipeline has three stages: feature extraction, model fitting, and prediction and validation.
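The three stages can be sketched end to end with NumPy and scikit-learn. Everything here is hypothetical: the data are synthetic noise series whose two classes differ only in amplitude, and the features (mean and standard deviation of each series) are one simple choice among many.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical data: 200 series of 100 samples each.
# Class 0 is quiet noise; class 1 is three times louder.
n_series, n_times = 200, 100
labels = np.repeat([0, 1], n_series // 2)
scales = np.where(labels == 0, 1.0, 3.0)
raw = rng.normal(0, 1, size=(n_series, n_times)) * scales[:, None]

# 1. Feature extraction: summarize each series with two statistics
features = np.column_stack([raw.mean(axis=1), raw.std(axis=1)])

# 2. Model fitting on a training split
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels)
model = LinearSVC(dual=False)
model.fit(X_train, y_train)

# 3. Prediction and validation on held-out series
predictions = model.predict(X_test)
accuracy = (predictions == y_test).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

Validating on series the model never saw during fitting is what makes the accuracy figure meaningful.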

  12. Let's practice!

  13. Machine learning basics

  14. Always begin by looking at your data

      array.shape
      (10, 5)

      array[:3]
      array([[ 0.735528  ,  1.00122818, -0.28315978],
             [-0.94478393,  0.18658748, -0.00241224],
             [-0.74822942, -1.46636618,  0.69835096]])

  15. Always begin by looking at your data

      df.head()

             col1      col2      col3
      0  0.735528  1.001228 -0.283160
      1 -0.944784  0.186587 -0.002412
      2 -0.748229 -1.466366  0.698351
      3  1.038589 -0.171248  0.831457
      4 -0.161904  0.003972 -0.321933

  16. Always visualize your data
      Make sure it looks the way you'd expect.

      # Using matplotlib
      fig, ax = plt.subplots()
      ax.plot(...)

      # Using pandas
      fig, ax = plt.subplots()
      df.plot(..., ax=ax)

  17. Scikit-learn
      Scikit-learn is the most popular machine learning library in Python.

      from sklearn.svm import LinearSVC

  18. Preparing data for scikit-learn
      scikit-learn expects a particular structure of data: (samples, features).
      Make sure that your data is at least two-dimensional.
      Make sure the first dimension is samples.

  19. If your data is not shaped properly
      If the axes are swapped, transpose the array with .T:

      array.T.shape
      (10, 3)

  20. If your data is not shaped properly
      If we're missing an axis, use .reshape():

      array.shape
      (10,)
      array.reshape([-1, 1]).shape
      (10, 1)

      -1 will automatically fill that axis with the remaining values.
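Both fixes from slides 19 and 20 in one runnable sketch; the arrays are hypothetical placeholders with the shapes used on the slides.

```python
import numpy as np

# Hypothetical data stored as (features, samples): 3 features x 10 samples
swapped = np.zeros((3, 10))
print(swapped.T.shape)                   # (10, 3): now (samples, features)

# A single feature stored as a 1-D array of 10 samples
one_feature = np.arange(10)
column = one_feature.reshape([-1, 1])    # -1 lets NumPy infer the 10
print(column.shape)                      # (10, 1)

# Equivalent: add a trailing axis by indexing with np.newaxis
print(one_feature[:, np.newaxis].shape)  # (10, 1)
```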

  21. Fitting a model with scikit-learn

      # Import a support vector classifier
      from sklearn.svm import LinearSVC

      # Instantiate this model
      model = LinearSVC()

      # Fit the model on some data
      model.fit(X, y)

      X should have shape (samples, features). For most scikit-learn estimators, y should be one-dimensional with shape (samples,); a column vector of shape (samples, 1) is usually accepted but triggers a DataConversionWarning.

  22. Investigating the model

      # There is one coefficient per input feature
      model.coef_
      array([[ 0.69417875, -0.5289162 ]])

  23. Predicting with a fit model

      # Generate predictions
      predictions = model.predict(X_test)
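Slides 21 to 23 combined into one runnable sketch. The data are made up: two well-separated clusters of two-feature points stand in for X and y, and X_test holds two hypothetical query points at the cluster centers. dual=False is set explicitly because scikit-learn's documentation recommends the primal formulation when samples outnumber features.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(42)

# Hypothetical training data: 2 features, two well-separated clusters
X = np.vstack([rng.normal(-2, 0.5, size=(50, 2)),
               rng.normal(2, 0.5, size=(50, 2))])
y = np.repeat([0, 1], 50)         # 1-D labels, shape (samples,)

model = LinearSVC(dual=False)
model.fit(X, y)

# Binary problem: coef_ has one row with one coefficient per feature
print(model.coef_.shape)          # (1, 2)

# Hypothetical new points at the two cluster centers
X_test = np.array([[-2.0, -2.0], [2.0, 2.0]])
predictions = model.predict(X_test)
print(predictions)                # [0 1]
```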

  24. Let's practice

  25. Combining timeseries data with machine learning

  26. Getting to know our data
      The datasets that we'll use in this course are all freely available online. There are many datasets available to download on the web; the ones we'll use come from Kaggle.

  27. The Heartbeat Acoustic Data
      Many recordings of heart sounds from different patients. Some had normally functioning hearts; others had abnormalities. The data comes in the form of audio files plus a label for each file. Can we find the "abnormal" heartbeats?

  28. Loading auditory data

      from glob import glob

      files = glob('data/heartbeat-sounds/proc/files/*.wav')
      print(files)
      ['data/heartbeat-sounds/proc/files/murmur__201101051104.wav',
       ...
       'data/heartbeat-sounds/proc/files/murmur__201101051114.wav']
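Python's glob does not guarantee any particular order (it depends on the file system), so sorting the list makes runs reproducible. A self-contained sketch that creates a temporary folder with hypothetical files named after the slide's pattern:

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical files: two recordings and one non-audio file
    for name in ["murmur__201101051114.wav", "murmur__201101051104.wav", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()

    # Match only .wav files, then sort for a deterministic order
    files = sorted(glob(os.path.join(folder, "*.wav")))
    names = [os.path.basename(f) for f in files]

print(names)
```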

  29. Reading in auditory data

      import librosa as lr

      # `load` accepts a path to an audio file
      audio, sfreq = lr.load('data/heartbeat-sounds/proc/files/murmur__201101051104.wav')
      print(sfreq)
      2205

      In this case, the sampling frequency is 2205, meaning there are 2205 samples per second. Note that lr.load resamples audio to 22050 Hz by default; pass sr=None to keep the file's native sampling rate.

  30. Inferring time from samples
      If we know the sampling rate of a timeseries, then we know the timestamp of each datapoint relative to the first datapoint.
      Note: this assumes the sampling rate is fixed and no data points are lost.
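Written as a formula, with sampling frequency f_s in samples per second, N samples in total, and the sample index n counting from 0, the timestamp of each datapoint is:

```latex
t_n = \frac{n}{f_s}, \qquad n = 0, 1, \dots, N - 1
```

For example, at f_s = 2205 Hz the sample with index n = 4410 falls at t = 4410 / 2205 = 2 s, and the last sample falls at (N - 1) / f_s.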

  31. Creating a time array (I)
      Create an array of indices, one for each sample, and divide by the sampling frequency.

      import numpy as np

      indices = np.arange(0, len(audio))
      time = indices / sfreq

  32. Creating a time array (II)
      Find the timestamp for the (N-1)th data point, where N = len(audio); since indices start at 0, this is the last sample. Then use linspace() to interpolate from zero to that time, with one point per sample.

      final_time = (len(audio) - 1) / sfreq
      time = np.linspace(0, final_time, len(audio))
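The two approaches should produce the same timestamps. A quick check on a hypothetical signal (random noise, 1.5 s long at 2205 Hz) standing in for audio loaded from a file:

```python
import numpy as np

# Hypothetical stand-in for loaded audio: 1.5 s of noise at 2205 Hz
sfreq = 2205
audio = np.random.default_rng(0).normal(size=int(1.5 * sfreq))

# Method I: sample indices divided by the sampling frequency
time_arange = np.arange(0, len(audio)) / sfreq

# Method II: one evenly spaced point per sample, from 0 to the last timestamp
final_time = (len(audio) - 1) / sfreq
time_linspace = np.linspace(0, final_time, len(audio))

print(len(time_arange), len(time_linspace))
print(np.allclose(time_arange, time_linspace))
```

Passing len(audio) as the number of points is what keeps the time array aligned with the audio samples one-to-one.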

  33. The New York Stock Exchange dataset
      This dataset consists of company stock values for 10 years. Can we detect any patterns in historical records that allow us to predict the value of companies in the future?

  34. Looking at the data

      data = pd.read_csv('path/to/data.csv')
      data.columns
      Index(['date', 'symbol', 'close', 'volume'], dtype='object')

      data.head()
               date symbol       close       volume
      0  2010-01-04   AAPL  214.009998  123432400.0
      1  2010-01-04    ABT   54.459951   10829000.0
      2  2010-01-04    AIG   29.889999    7750900.0
      3  2010-01-04   AMAT   14.300000   18615100.0
      4  2010-01-04   ARNC   16.650013   11512100.0
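The NYSE data are in long format, with one row per date and company. To compare companies over time it can help to pivot to one column per symbol. A minimal sketch, assuming a small hand-built DataFrame that uses some of the rows shown above plus the 2010-01-05 AAPL row from slide 5:

```python
import pandas as pd

# A few long-format rows copied from the slides
data = pd.DataFrame({
    "date": pd.to_datetime(["2010-01-04", "2010-01-04", "2010-01-04", "2010-01-05"]),
    "symbol": ["AAPL", "ABT", "AIG", "AAPL"],
    "close": [214.009998, 54.459951, 29.889999, 214.379993],
})

# Wide format: one row per date, one column of closing prices per company
closes = data.pivot(index="date", columns="symbol", values="close")
print(closes)
```

Dates with no row for a company (here ABT and AIG on 2010-01-05) become NaN in the wide table.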

  35. Timeseries with Pandas DataFrames
      We can investigate the type of each column with the DataFrame's dtypes attribute, or the type of a single column with its dtype attribute. Dates read from a CSV file are stored as strings, so in pandas 2.x the column has the generic object dtype:

      df['date'].dtype
      dtype('O')

  36. Converting a column to a time series
      To ensure that a column within a DataFrame is treated as a time series, use the to_datetime() function:

      df['date'] = pd.to_datetime(df['date'])
      df['date']
      0   2017-01-01
      1   2017-01-02
      2   2017-01-03
      Name: date, dtype: datetime64[ns]
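Once converted, the column supports datetime operations. A short sketch with a hypothetical three-row DataFrame (the 'value' column is made up) whose dates match the output above:

```python
import pandas as pd

# Hypothetical DataFrame whose dates start out as strings
df = pd.DataFrame({"date": ["2017-01-01", "2017-01-02", "2017-01-03"],
                   "value": [1.0, 2.0, 3.0]})

df["date"] = pd.to_datetime(df["date"])
print(df["date"].dtype)

# Datetime columns expose date parts through the .dt accessor
print(df["date"].dt.day_name().tolist())

# Using the dates as the index enables time-based selection (both ends inclusive)
series = df.set_index("date")["value"]
print(series.loc["2017-01-02":"2017-01-03"])
```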

  37. Let's practice!
