What is an anomaly? Alastair Rushworth, Data Scientist, DataCamp

SLIDE 1

DataCamp Anomaly Detection in R

What is an anomaly?

ANOMALY DETECTION IN R

Alastair Rushworth

Data Scientist

SLIDE 2

Defining the term anomaly

Anomaly: a data point or collection of data points that do not follow the same pattern or have the same structure as the rest of the data

SLIDE 3

Point anomaly

  • A single data point
  • Unusual when compared to the rest of the data
  • Example: a single 30 °C daily high temperature among a set of ordinary spring days

summary(temperature)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  18.00   20.45   22.45   22.30   22.98   30.00
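The transcript does not include the slide's data, so the sketch below uses a hypothetical `temperature` vector (ordinary spring days plus one 30 °C day) to show how a point anomaly stands out in a numeric summary. The `gap` measure is only an illustrative assumption, not part of the course material.

```r
# Hypothetical daily highs in Celsius (not the course's data):
# ordinary spring days plus one 30-degree day.
temperature <- c(21.5, 22.0, 20.8, 22.6, 21.9, 30.0, 22.3, 21.1, 22.8, 20.5)

# The maximum sits far from the quartiles and the median
summary(temperature)

# Distance of the maximum above the third quartile, in units of the
# interquartile range (an illustrative measure, not a formal test)
gap <- unname((max(temperature) - quantile(temperature, 0.75)) / IQR(temperature))
gap
```

A value of `gap` well above 1.5 means the maximum lies beyond the usual boxplot whisker.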

SLIDE 4

Visualizing point anomalies with a boxplot

boxplot(temperature, ylab = "Celsius")
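As a small companion to the plot, the points that a boxplot draws beyond its whiskers can also be read off numerically. The sketch below assumes a hypothetical `temperature` vector, since the slide's data is not in the transcript, and uses `plot = FALSE` so that `boxplot()` returns its statistics without drawing.

```r
# Hypothetical daily highs in Celsius (not the course's data)
temperature <- c(21.5, 22.0, 20.8, 22.6, 21.9, 30.0, 22.3, 21.1, 22.8, 20.5)

# Compute the boxplot statistics without drawing the plot
bp <- boxplot(temperature, plot = FALSE)

# Values lying beyond the whiskers (more than 1.5 box lengths from the box)
bp$out

# Positions of those values in the original vector
which(temperature %in% bp$out)
```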

SLIDE 5

Collective anomaly

  • An anomalous collection of data instances
  • Unusual when considered together
  • Example: 10 consecutive high daily temperatures
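One simple way to look for a collective anomaly like the example above is to search for long runs of values over a threshold. The sketch below is only an illustration under stated assumptions: the data, the 25 °C `threshold`, and the minimum run length of 10 are all hypothetical choices, not taken from the course.

```r
# Hypothetical daily highs: ordinary days with a 10-day warm spell in the middle
temperature <- c(rep(c(21, 22, 20, 23), times = 5),
                 c(26, 27, 28, 27, 26, 27, 29, 28, 27, 26),
                 rep(c(22, 21, 23, 20), times = 5))

threshold <- 25   # assumed cutoff for a "high" day
min_run   <- 10   # assumed minimum length of an unusual run

# Run-length encode the above-threshold indicator
runs   <- rle(temperature > threshold)
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only runs of high days that are long enough
long_hot <- runs$values & runs$lengths >= min_run
spells <- data.frame(start = starts[long_hot], end = ends[long_hot])
spells
```

No single day in the warm spell is extreme on its own scale here; it is the run of consecutive high days that is unusual.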

SLIDE 6

Let's practice!

ANOMALY DETECTION IN R

SLIDE 7

Testing the extremes with Grubbs' test

ANOMALY DETECTION IN R

Alastair Rushworth

Data Scientist

SLIDE 8

Visual assessment is not always reliable!

boxplot(temperature, ylab = "Celsius")

SLIDE 9

Grubbs' test

  • Statistical test to decide if a point is outlying
  • Assumes the data are normally distributed
  • Requires checking the normality assumption first

SLIDE 10

Checking normality with a histogram

Symmetrical & bell shaped?

hist(temperature, breaks = 6)
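The bin counts behind a histogram can be inspected directly, which helps when judging symmetry by eye is difficult. This sketch assumes a hypothetical `temperature` vector. Note that `breaks = 6` is treated by `hist()` as a suggestion, so the number of bins actually used may differ.

```r
# Hypothetical daily highs in Celsius (not the course's data)
temperature <- c(21.5, 22.0, 20.8, 22.6, 21.9, 30.0, 22.3, 21.1, 22.8, 20.5)

# Compute the histogram without drawing it
h <- hist(temperature, breaks = 6, plot = FALSE)

# Bin edges and the number of observations in each bin
h$breaks
h$counts
```

A roughly symmetric, single-peaked set of counts is consistent with the normality assumption. An isolated bin far from the rest points to a possible outlier.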

SLIDE 11

Running Grubbs' test

Use the grubbs.test() function from the outliers package:

grubbs.test(temperature)

    Grubbs test for one outlier

data:  temperature
G = 3.07610, U = 0.41065, p-value = 0.001796
alternative hypothesis: highest value 30 is an outlier
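To make the `G` value in the output less opaque, the sketch below computes the usual Grubbs statistic by hand in base R: the largest absolute deviation from the mean, divided by the sample standard deviation. It uses a hypothetical `temperature` vector, so its result will not match the slide's output.

```r
# Hypothetical daily highs in Celsius (not the course's data)
temperature <- c(21.5, 22.0, 20.8, 22.6, 21.9, 30.0, 22.3, 21.1, 22.8, 20.5)

# Absolute deviation of each point from the sample mean
dev <- abs(temperature - mean(temperature))

# Grubbs statistic: the largest deviation in units of the standard deviation
G <- max(dev) / sd(temperature)
G

# Position of the point the test would examine
suspect <- which.max(dev)
suspect
```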

SLIDE 12

Interpreting the p-value

p-value
  • Near 0: stronger evidence of an outlier
  • Near 1: weaker evidence of an outlier

grubbs.test(temperature)

    Grubbs test for one outlier

data:  temperature
G = 3.07610, U = 0.41065, p-value = 0.001796
alternative hypothesis: highest value 30 is an outlier
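A small p-value corresponds to a `G` statistic that exceeds a critical value. The sketch below works through that comparison in base R using the textbook two-sided critical value, which is built from the t distribution. The data and the significance level `alpha = 0.05` are assumptions for illustration, and the calculation is not guaranteed to reproduce the exact p-value that `grubbs.test()` reports.

```r
# Hypothetical daily highs in Celsius (not the course's data)
temperature <- c(21.5, 22.0, 20.8, 22.6, 21.9, 30.0, 22.3, 21.1, 22.8, 20.5)
n     <- length(temperature)
alpha <- 0.05   # assumed significance level

# Observed Grubbs statistic
G <- max(abs(temperature - mean(temperature))) / sd(temperature)

# Two-sided critical value from the t distribution with n - 2 degrees of freedom
t_crit <- qt(alpha / (2 * n), df = n - 2, lower.tail = FALSE)
G_crit <- (n - 1) / sqrt(n) * sqrt(t_crit^2 / (n - 2 + t_crit^2))

# TRUE means the most extreme point is flagged as an outlier at level alpha
is_outlier <- G > G_crit
c(G = G, G_crit = G_crit)
is_outlier
```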

SLIDE 13

Get the row index of an outlier

Location of the maximum:

which.max(weights)
[1] 5

Location of the minimum:

which.min(temperature)
[1] 12
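Once the index is known, it can be used to inspect the suspect value or to set it aside before re-checking the rest of the data. The sketch below assumes a hypothetical `temperature` vector. Dropping the point is shown as one possible follow-up, not as a step the slides prescribe.

```r
# Hypothetical daily highs in Celsius (not the course's data)
temperature <- c(21.5, 22.0, 20.8, 22.6, 21.9, 30.0, 22.3, 21.1, 22.8, 20.5)

# Index of the largest value (the first one, if there are ties)
idx <- which.max(temperature)
idx

# Inspect the suspect value
temperature[idx]

# Copy of the data with that point set aside
temperature_clean <- temperature[-idx]
summary(temperature_clean)
```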

SLIDE 14

Let's practice!

ANOMALY DETECTION IN R

SLIDE 15

Detecting multiple anomalies in seasonal time series

ANOMALY DETECTION IN R

Alastair Rushworth

Data Scientist

SLIDE 16

Monthly revenue data

  • Grubbs' test is not appropriate here
  • Seasonality may be present
  • There may be multiple anomalies

head(msales)

  sales month
1 6.068     1
2 5.966     2
3 6.133     3
4 6.230     4
5 6.407     5
6 6.433     6

SLIDE 17

Visualizing monthly revenue

plot(sales ~ month, data = msales, type = 'o')

SLIDE 18

Seasonal-Hybrid ESD algorithm usage

Arguments

  • x: vector of values
  • period: period of the repeating pattern
  • direction: find anomalies that are small ('neg'), large ('pos'), or both ('both')

library(AnomalyDetection)
sales_ad <- AnomalyDetectionVec(x = msales$sales, period = 12, direction = 'both')

  • Download from https://github.com/twitter/AnomalyDetection

SLIDE 19

Seasonal-Hybrid ESD algorithm output

sales_ad <- AnomalyDetectionVec(x = msales$sales, period = 12, direction = 'both')
sales_ad$anoms

  index anoms
1    14 1.561
2   108 2.156
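To give a feel for the idea behind Seasonal-Hybrid ESD, the base-R sketch below removes a seasonal component and then scores what is left with robust statistics (median and MAD). It is a simplified illustration and not the AnomalyDetection package's implementation. The data is simulated, and the fixed `cutoff` of 3 is an assumption standing in for the generalized ESD test that the algorithm uses.

```r
# Hypothetical monthly series (not the course's msales data): 10 years with a
# 12-month seasonal pattern, small noise, and two injected low values.
set.seed(42)
n     <- 120
month <- seq_len(n)
sales <- 6 + 0.5 * sin(2 * pi * month / 12) + rnorm(n, sd = 0.1)
sales[c(14, 108)] <- c(1.6, 2.2)

# Estimate the seasonal component with STL, using a period of 12
fit      <- stl(ts(sales, frequency = 12), s.window = "periodic", robust = TRUE)
seasonal <- as.numeric(fit$time.series[, "seasonal"])

# Remove the seasonal component and the median, then scale by the MAD
remainder <- sales - seasonal - median(sales)
z <- (remainder - median(remainder)) / mad(remainder)

cutoff  <- 3                        # assumed fixed cutoff (not the ESD test)
flagged <- which(abs(z) > cutoff)   # both directions, like direction = 'both'

anoms <- data.frame(index = flagged, anoms = sales[flagged])
anoms
```

Using the median and MAD rather than the mean and standard deviation keeps the scale estimate from being inflated by the anomalies themselves, which matters when more than one is present.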

SLIDE 20

Seasonal-Hybrid ESD algorithm plot

AnomalyDetectionVec(x = msales$sales, period = 12, direction = 'both', plot = TRUE)

SLIDE 21

Let's practice!

ANOMALY DETECTION IN R