Classification and feature engineering
Machine Learning for Time Series Data in Python
Chris Holdgraf, Fellow, Berkeley Institute for Data Science

Always visualize raw data before fitting models

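Visualize your timeseries data!
    import numpy as np
    import matplotlib.pyplot as plt

    # audio is assumed here to be a 1-D array of samples; sfreq is its sampling frequency in Hz
    ixs = np.arange(audio.shape[-1])
    time = ixs / sfreq  # convert sample indices to seconds
    fig, ax = plt.subplots()
    ax.plot(time, audio)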

What features to use?
- Raw timeseries data is too noisy to use directly for classification
- We need to calculate features!
- An easy start: summarize your audio data

Calculating multiple features
    print(audio.shape)  # (n_files, time): (20, 7000)
    # Summarize each file by collapsing the time axis
    means = np.mean(audio, axis=-1)
    maxs = np.max(audio, axis=-1)
    stds = np.std(audio, axis=-1)
    print(means.shape)  # (n_files,): (20,)

Fitting a classifier with scikit-learn
- We've just collapsed a 2-D dataset (samples x time) into several 1-D features (one value per sample)
- We can combine the features and use them as the input to a model
- If we have a label for each sample, we can use scikit-learn to create and fit a classifier

Preparing your features for scikit-learn
    # Import a linear classifier
    from sklearn.svm import LinearSVC

    # Stack the 1-D features as columns so X has shape (n_files, n_features)
    X = np.column_stack([means, maxs, stds])
    # scikit-learn expects a 1-D array of labels, one per sample
    y = labels.ravel()

    model = LinearSVC()
    model.fit(X, y)

Scoring your scikit-learn model
    from sklearn.metrics import accuracy_score

    # Predict on different input data than the model was fit on
    predictions = model.predict(X_test)

    # Score our model with the fraction of correct predictions
    # Manually
    percent_score = sum(predictions == labels_test) / len(labels_test)
    # Using a sklearn scorer
    percent_score = accuracy_score(labels_test, predictions)
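
To try the whole pipeline without the course data, here is a minimal sketch on synthetic audio. The data, the loudness difference between the two classes, and the train/test split are assumptions made purely for illustration; only the feature, fitting, and scoring steps mirror the slides.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the audio (an assumption): 40 "files" of 2000 samples.
# Class 1 is simply louder than class 0.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
scale = np.where(labels == 1, 3.0, 1.0)[:, np.newaxis]
audio = rng.normal(size=(40, 2000)) * scale

# One summary statistic per file, stacked as columns: X is (n_files, 3)
X = np.column_stack([audio.mean(axis=-1), audio.max(axis=-1), audio.std(axis=-1)])

# Hold out a quarter of the files for scoring
X_train, X_test, labels_train, labels_test = train_test_split(
    X, labels, test_size=0.25, stratify=labels, random_state=0)

model = LinearSVC(random_state=0)
model.fit(X_train, labels_train)
predictions = model.predict(X_test)

# The manual fraction correct and accuracy_score compute the same quantity
manual_score = np.mean(predictions == labels_test)
sklearn_score = accuracy_score(labels_test, predictions)
print(manual_score, sklearn_score)
```

Scoring on files the model never saw during fitting is what makes the number meaningful; scoring on the training files would overstate performance.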

Let's practice!

Improving the features we use for classification

The auditory envelope
- Smooth the data to calculate the auditory envelope
- The envelope is related to the total amount of audio energy present at each moment in time

Smoothing over time
- Instead of averaging over all time, we can do a local average
- This is called smoothing your timeseries
- It removes short-term noise while retaining the general pattern
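
A small NumPy sketch of what a local average does. The signal (a slow sine plus random noise), the sampling rate, and the 50-sample window are assumptions chosen for illustration; the point is only that the smoothed trace sits much closer to the slow pattern than the noisy one does.

```python
import numpy as np

# Synthetic signal (an assumption): a slow 2 Hz sine plus fast random noise
rng = np.random.default_rng(0)
sfreq = 1000
time = np.arange(2 * sfreq) / sfreq
clean = np.sin(2 * np.pi * 2 * time)
noisy = clean + rng.normal(scale=0.5, size=time.size)

# Local average: each output sample is the mean of a 50-sample window
window_size = 50
kernel = np.ones(window_size) / window_size
smooth = np.convolve(noisy, kernel, mode="same")

# Compare against the clean signal, skipping the edges where the window is incomplete
inner = slice(window_size, -window_size)
err_noisy = np.std(noisy[inner] - clean[inner])
err_smooth = np.std(smooth[inner] - clean[inner])
print(err_noisy, err_smooth)
```

A wider window removes more noise but also blurs genuine fast changes, so the window size is a trade-off rather than a fixed rule.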

Smoothing your data

Calculating a rolling window statistic
    # audio is a pandas DataFrame
    print(audio.shape)  # (n_times, n_audio_files): (5000, 20)

    # Smooth our data by taking the rolling mean in a window of 50 samples
    window_size = 50
    windowed = audio.rolling(window=window_size)
    audio_smooth = windowed.mean()

Calculating the auditory envelope
First rectify your audio, then smooth it
    audio_rectified = audio.apply(np.abs)
    audio_envelope = audio_rectified.rolling(50).mean()
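
Why rectify first? The sketch below uses two synthetic tones (an assumption, as are the column names and sampling rate) to show that smoothing the raw waveform averages to roughly zero, while smoothing the rectified waveform tracks loudness.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in (an assumption): two 200 Hz tones, one quiet and one loud,
# stored as columns of a DataFrame with shape (n_times, n_audio_files)
sfreq = 2000
time = np.arange(sfreq) / sfreq
carrier = np.sin(2 * np.pi * 200 * time)
audio = pd.DataFrame({"quiet": 0.2 * carrier, "loud": 1.0 * carrier})

# Averaging the raw waveform cancels its positive and negative swings...
raw_mean = audio.rolling(50).mean()
# ...so rectify first, then smooth, to track how much energy is present
audio_envelope = audio.apply(np.abs).rolling(50).mean()

# The first 49 rows are NaN because the 50-sample window is not yet full;
# pandas reductions skip NaN by default
envelope_mean = audio_envelope.mean(axis=0)
print(envelope_mean)
```

The leading NaN rows matter later: a summary statistic that does not skip NaN would return NaN for every file.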

Feature engineering the envelope
    # Calculate several features of the envelope, one per sound
    envelope_mean = np.mean(audio_envelope, axis=0)
    envelope_std = np.std(audio_envelope, axis=0)
    envelope_max = np.max(audio_envelope, axis=0)

    # Create our training data for a classifier
    X = np.column_stack([envelope_mean, envelope_std, envelope_max])

Preparing our features for scikit-learn
    X = np.column_stack([envelope_mean, envelope_std, envelope_max])
    # scikit-learn expects a 1-D array of labels, one per sample
    y = labels.ravel()

Cross validation for classification
cross_val_score automates the process of:
- Splitting data into training / validation sets
- Fitting the model on training data
- Scoring it on validation data
- Repeating this process

Using cross_val_score
    from sklearn.model_selection import cross_val_score

    model = LinearSVC()
    scores = cross_val_score(model, X, y, cv=3)
    print(scores)
    # [0.60911642 0.59975305 0.61404035]
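
To see what cross_val_score is automating, the sketch below writes the split / fit / score / repeat loop by hand and compares it with the one-line call. The synthetic features and labels are assumptions; the sketch also relies on scikit-learn treating an integer cv as stratified k-fold when the estimator is a classifier.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

# Synthetic features and labels (an assumption): two noisy clusters
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 3)) + y[:, np.newaxis]

model = LinearSVC(random_state=0)

# The manual loop: split, fit, score, repeat
cv = StratifiedKFold(n_splits=3)
manual_scores = []
for train_ix, val_ix in cv.split(X, y):
    fold_model = clone(model)                  # fresh, unfitted copy for each fold
    fold_model.fit(X[train_ix], y[train_ix])   # fit on the training split
    manual_scores.append(fold_model.score(X[val_ix], y[val_ix]))  # accuracy on the validation split

# The same procedure in one call
auto_scores = cross_val_score(model, X, y, cv=3)
print(np.array(manual_scores), auto_scores)
```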

Auditory features: The Tempogram
- We can summarize more complex temporal information with timeseries-specific functions
- librosa is a great library for auditory and timeseries feature engineering
- Here we'll calculate the tempogram, which estimates the tempo of a sound over time
- We can calculate summary statistics of tempo in the same way that we can for the envelope

Computing the tempogram
    # Import librosa and calculate the tempo of a 1-D sound array
    import librosa as lr
    # Passing the audio by keyword (y=) works across librosa versions;
    # newer releases also provide this function as lr.feature.tempo
    audio_tempo = lr.beat.tempo(y=audio, sr=sfreq, hop_length=2**6, aggregate=None)

The spectrogram: spectral changes to sound over time

Fourier transforms
- Timeseries data can be described as a combination of quickly-changing things and slowly-changing things
- At each moment in time, we can describe the relative presence of fast- and slow-moving components
- The simplest way to do this is called a Fourier Transform
- This converts a single timeseries into an array that describes the timeseries as a combination of oscillations
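
A minimal NumPy sketch of this idea. The signal (a slow 5 Hz oscillation plus a weaker, faster 40 Hz one) and the sampling rate are assumptions chosen so that both components fall exactly on FFT frequency bins.

```python
import numpy as np

# Synthetic signal (an assumption): one second sampled at 1000 Hz
sfreq = 1000
time = np.arange(sfreq) / sfreq
signal = np.sin(2 * np.pi * 5 * time) + 0.5 * np.sin(2 * np.pi * 40 * time)

# FFT of a real-valued signal: one complex coefficient per frequency
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / sfreq)

# Scale the magnitudes so each value is the amplitude of that oscillation
amplitude = np.abs(spectrum) / (signal.size / 2)

# The two strongest frequencies are the two oscillations we mixed together
strongest = np.sort(freqs[np.argsort(amplitude)[-2:]])
print(strongest)
```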

A Fourier Transform (FFT)

Spectrograms: combinations of windowed Fourier transforms
- A spectrogram is a collection of windowed Fourier transforms over time
- Similar to how a rolling mean was calculated:
  1. Choose a window size and shape
  2. At a timepoint, calculate the FFT for that window
  3. Slide the window over by one
  4. Aggregate the results
- Called a Short-Time Fourier Transform (STFT)
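
The four steps above can be sketched directly in NumPy. This is an illustration, not librosa's implementation: simple_stft is a hypothetical helper, it slides the window by hop_length samples rather than by one, and it does not pad the signal edges. The test signal (50 Hz for one second, then 200 Hz) is also an assumption.

```python
import numpy as np

def simple_stft(signal, n_fft, hop_length):
    """Windowed FFTs of a 1-D signal; returns an array of shape (n_freqs, n_frames)."""
    window = np.hanning(n_fft)                              # 1. window size and shape
    starts = range(0, len(signal) - n_fft + 1, hop_length)  # 3. slide the window
    frames = [np.fft.rfft(window * signal[s:s + n_fft])     # 2. FFT of each window
              for s in starts]
    return np.array(frames).T                               # 4. aggregate the results

# Synthetic signal (an assumption): 50 Hz for the first second, 200 Hz for the second
sfreq = 1000
time = np.arange(2 * sfreq) / sfreq
signal = np.where(time < 1,
                  np.sin(2 * np.pi * 50 * time),
                  np.sin(2 * np.pi * 200 * time))

n_fft, hop_length = 128, 16
spec = np.abs(simple_stft(signal, n_fft, hop_length))
freqs = np.fft.rfftfreq(n_fft, d=1 / sfreq)

# The dominant frequency changes between the first and last window
first_peak = freqs[spec[:, 0].argmax()]
last_peak = freqs[spec[:, -1].argmax()]
print(spec.shape, first_peak, last_peak)
```

A single FFT of the whole signal would show both frequencies but not when each occurred; the windowed version keeps that timing.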

Calculating the STFT
- We can calculate the STFT with librosa
- There are several parameters we can tweak (such as window size)
- For our purposes, we'll convert into decibels, which normalizes the average values of all frequencies
- We can then visualize it with the specshow() function
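
Decibels are a logarithmic scale, so strong and weak frequency components become visible on the same color scale. The sketch below shows the basic conversion; to_decibels, ref, and floor are hypothetical names for illustration and this is not librosa's exact implementation.

```python
import numpy as np

def to_decibels(amplitude, ref=1.0, floor=1e-5):
    """Convert amplitudes to decibels relative to ref (a simplified sketch)."""
    # Clip tiny values so that log10 never sees zero
    amplitude = np.maximum(np.abs(amplitude), floor)
    return 20 * np.log10(amplitude / ref)

# Each factor of 10 in amplitude becomes a step of 20 dB
amplitudes = np.array([0.1, 1.0, 10.0, 100.0])
db = to_decibels(amplitudes)
print(db)
```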

Calculating the STFT with code
    # Import the functions we'll use for the STFT
    from librosa.core import stft, amplitude_to_db
    from librosa.display import specshow

    # Calculate our STFT
    HOP_LENGTH = 2**4
    SIZE_WINDOW = 2**7
    audio_spec = stft(audio, hop_length=HOP_LENGTH, n_fft=SIZE_WINDOW)

    # The STFT is complex-valued: take its magnitude, then convert into decibels for visualization
    spec_db = amplitude_to_db(np.abs(audio_spec))

    # Visualize
    specshow(spec_db, sr=sfreq, x_axis='time', y_axis='hz', hop_length=HOP_LENGTH)

Spectral feature engineering
- Each timeseries has a different spectral pattern.
- We can calculate these spectral patterns by analyzing the spectrogram.
- For example, spectral bandwidth and spectral centroids describe where most of the energy is at each moment in time
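
As a rough sketch of what these two features measure: the centroid is the energy-weighted mean frequency of each spectrogram column, and the bandwidth is the energy-weighted spread around that mean. The helper below is a hypothetical NumPy illustration on a toy spectrogram, not librosa's implementation.

```python
import numpy as np

def spectral_centroid_bandwidth(S, freqs):
    """Centroid and bandwidth per column of a magnitude spectrogram S (n_freqs, n_frames)."""
    S = np.asarray(S, dtype=float)
    # Normalize each column so its magnitudes act as weights that sum to 1
    weights = S / S.sum(axis=0, keepdims=True)
    # Centroid: weighted mean frequency of each column
    centroid = (freqs[:, np.newaxis] * weights).sum(axis=0)
    # Bandwidth: weighted standard deviation of frequency around the centroid
    deviation = freqs[:, np.newaxis] - centroid[np.newaxis, :]
    bandwidth = np.sqrt((weights * deviation ** 2).sum(axis=0))
    return centroid, bandwidth

# Toy spectrogram (an assumption): 5 frequency bins, 2 time frames
freqs = np.array([0.0, 100.0, 200.0, 300.0, 400.0])
S = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [0, 1],
              [0, 0]], dtype=float)

centroids, bandwidths = spectral_centroid_bandwidth(S, freqs)
print(centroids, bandwidths)
```

Both frames have the same centroid, but the second frame spreads its energy across two bins and so has a larger bandwidth, which is why the two features are useful together.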

Calculating spectral features
    # Calculate the spectral centroid and bandwidth for the spectrogram
    # (spec must hold non-negative magnitudes, not decibels)
    bandwidths = lr.feature.spectral_bandwidth(S=spec)[0]
    centroids = lr.feature.spectral_centroid(S=spec)[0]

    # Display these features on top of the spectrogram
    # (recent librosa versions draw onto an axis passed with ax=)
    fig, ax = plt.subplots()
    specshow(spec, x_axis='time', y_axis='hz', hop_length=HOP_LENGTH, ax=ax)
    ax.plot(times_spec, centroids)
    ax.fill_between(times_spec, centroids - bandwidths / 2, centroids + bandwidths / 2, alpha=0.5)

Combining spectral and temporal features in a classifier
    centroids_all = []
    bandwidths_all = []
    for spec in spectrograms:
        bandwidths = lr.feature.spectral_bandwidth(S=lr.db_to_amplitude(spec))
        centroids = lr.feature.spectral_centroid(S=lr.db_to_amplitude(spec))
        # Calculate the mean spectral bandwidth
        bandwidths_all.append(np.mean(bandwidths))
        # Calculate the mean spectral centroid
        centroids_all.append(np.mean(centroids))

    # Create our X matrix
    X = np.column_stack([means, stds, maxs, tempo_mean, tempo_max, tempo_std,
                         bandwidths_all, centroids_all])
