SCAN STATISTICS USING FOR ECONOMICAL RESEARCH
- V. Jansons, V. Jurenoks, K. Didenko
Riga Technical University, Latvia – Bulgaria, Yundola - 2008
Traditional Statistics Methods more Appropriate for Local Investigations Taking Into Account
the city. New concepts in urban planning: fusion between
A B C B B B A
neighbourhood;
Defence Troops, Local Government, Communication Service, Medical Service, Civil Defence, Latvian Railway, Fire Fighting Department, National Guard, Latvian Air Transport, State Forest Service
Forests in Latvia cover 45% of the surface of the country. The state is the largest forest
control of approximately 50%
activities in Latvian forests must be conducted according to Latvian Forest Law.
are used rather than 2D circular zones.
Total Number of Fires in Latvia in Time (figure: time axis with marks T1 and T2)
Network Analysis of Biological Integrity in Freshwater Streams
Each sampling station monitors water parameters: bacteria; chlorine levels; pH; inorganic and organic pollutants; colour, turbidity, odour; many others.
Scalable Wireless Geo-Telemetry with Miniature Smart Sensors
Decisions
Information Retrieval Information Analysis
Data Analysis
Data Integration: Sensors, Time, Location
Data Processing: Refinement and Filtering
Signal Acquisition From Sensors
Key Forest Areas Forest Outside factors Threat Locations Forest Infected Non-infected Sample Ground Sensors Air/Space Platforms Data from sensors Benchmarking Data Processing - Compare Hot Spot Identification Module Benchmarking Module Verification Decision
detect areas of significantly high or low rates. Indicates
whether there is clustering;
distributed over space, over time or over space and time.
Shows us where it is;
phenomena clusters, to see if they are statistically
surveillance for the early detection of phenomena
detection of clusters;
which the occurrences of a phenomenon within a region are higher than outside it;
those which occurred by chance.
neighbours in search for overdensities;
A circular scanning window is placed at different coordinates, with a radius that varies from 0 to some set upper limit. For each location and size of the window, the statistical criterion (likelihood ratio) is computed, and the window with the maximum value is considered the most likely cluster.
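The window enumeration described above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `circular_zones`, the use of Euclidean distance, and the sample coordinates are assumptions made for the example. Each window centre is grown outwards one data point at a time, so that every distinct set of enclosed points up to the radius limit is produced once.

```python
import math


def circular_zones(points, centres, max_radius):
    """Yield (centre, radius, member_indices) for each distinct circular window.

    points     : list of (x, y) locations of the observations
    centres    : list of (x, y) window centres (for example grid points)
    max_radius : upper limit for the window radius
    """
    for cx, cy in centres:
        # Distances from this centre to every point, nearest first.
        dists = sorted(
            (math.hypot(x - cx, y - cy), i) for i, (x, y) in enumerate(points)
        )
        members = []
        for k, (d, i) in enumerate(dists):
            if d > max_radius:
                break
            members.append(i)
            # Points at exactly the same distance enter the window together.
            is_last = k + 1 == len(dists)
            if is_last or dists[k + 1][0] > d:
                yield (cx, cy), d, tuple(members)


# Hypothetical data: three observation points and one window centre.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
zones = list(circular_zones(points, centres=[(0.0, 0.0)], max_radius=2.0))
```

Each yielded zone can then be passed to whichever likelihood ratio is chosen, and the zone with the largest value retained as the most likely cluster.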
Grid points Circles around red point Circles around blue point
(Kulldorff, 1997; Neill and Moore, 2005)
To detect and localize
search for spatial regions where the counts are significantly higher than expected. Imagine moving a space-time window around the scan area, allowing the window size, shape, and duration to vary.
In either case, we find the regions with highest values of a likelihood ratio statistic, and compute the statistical significance of each region by randomization testing. Parametric scan statistic approaches assume some parametric model for the distribution of counts, and learn the parameters from historical data.
Null hypothesis H0: no outbreak. Alternative hypothesis H1: outbreak in region S.
$$L(S) = \frac{\Pr(\text{Data} \mid H_1(S))}{\Pr(\text{Data} \mid H_0)}$$
Significant! (p = 0.01) Maximum region score = 9.5 Not significant! (p = 0.18)
qout = 0.01 qin = 0.02
In the figure we illustrate a suspected cluster S: a region with a high level of intensity of the phenomenon, qin = 0.02. The scan statistic must answer the question: is this cluster real, or is it a “visual illusion”?
qi = qall everywhere (use maximum likelihood estimate of qall in S);
qi = qin inside region S, qi = qout elsewhere (use maximum likelihood estimates of qin and qout, subject to qin > qout).
The likelihood function is constructed according to the model selected and is maximized over all window locations and sizes. The window with the maximum likelihood is the most likely cluster (the one least likely to have occurred by chance). The likelihood ratio for this window becomes the maximum likelihood ratio.
Carlo hypothesis testing
distributed under null hypothesis H0, i.e.,
$$\bar{w} = \frac{1}{n} \sum_{i=1}^{n} w_i$$
the same function with parameters unrestricted.
heterogeneous population distribution. We want to find the zone which maximizes the LR (likelihood ratio) between likelihoods L1 and L0:
$$LR = \max_{Z} \frac{L_1(Z)}{L_0}$$
Likelihood ratio takes the following form:
Z st =
In the case of Poisson distributed process, the Likelihood ratio takes the following form:
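$$LF = \left( \frac{\left(\frac{c_{in}}{n_{in}}\right)^{c_{in}} \left(\frac{c_{out}}{n_{out}}\right)^{c_{out}}}{\left(\frac{c_{tot}}{n_{tot}}\right)^{c_{tot}}} \right) \cdot I$$
$$I = \begin{cases} 1, & \text{if } \dfrac{c_{in}}{n_{in}} > \dfrac{c_{out}}{n_{out}} \\ 0, & \text{otherwise} \end{cases}$$
where
$c_{tot}$ is the total number of cases, and $c_{in}$ and $c_{out} = c_{tot} - c_{in}$ are the numbers of cases inside and outside the window;
$n_{tot}$ is the total population, and $n_{in}$ and $n_{out} = n_{tot} - n_{in}$ are the populations inside and outside the window;
I is the indicator function.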
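A small numerical sketch of the Poisson likelihood ratio may help. It assumes the usual form in which the inside and outside rates are compared with the overall rate and the indicator sets the result to zero when the rate inside the window is not elevated. The function name `poisson_lr` and the counts in the example are hypothetical; the counts were chosen only so that the rates match the illustrative values qin = 0.02 and qout = 0.01.

```python
import math


def poisson_lr(c_in, n_in, c_tot, n_tot):
    """Poisson likelihood ratio for one window, multiplied by the indicator I.

    c_in, n_in   : cases and population inside the window
    c_tot, n_tot : total cases and total population
    """
    c_out = c_tot - c_in
    n_out = n_tot - n_in
    if n_in <= 0 or n_out <= 0:
        raise ValueError("window must contain part, but not all, of the population")
    # Indicator I: only windows with an elevated rate inside are of interest.
    if c_in / n_in <= c_out / n_out:
        return 0.0

    def c_log_rate(c, n):
        # log((c/n)**c), with the convention 0**0 = 1
        return c * math.log(c / n) if c > 0 else 0.0

    log_lr = (
        c_log_rate(c_in, n_in)
        + c_log_rate(c_out, n_out)
        - c_log_rate(c_tot, n_tot)
    )
    return math.exp(log_lr)


# Hypothetical counts: rate 0.02 inside the window, 0.01 outside.
lr_elevated = poisson_lr(c_in=20, n_in=1000, c_tot=110, n_tot=10000)
# Hypothetical counts with no excess inside the window: the indicator gives 0.
lr_flat = poisson_lr(c_in=10, n_in=1000, c_tot=110, n_tot=10000)
```

Working on the log scale before exponentiating avoids overflow in the individual power terms; for very large counts it may be preferable to return the logarithm itself.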
Scanning procedure (flowchart): (1) scanning process with a fixed window w; (2) determining the location of the factor with a high intensity level; (3) modelling the p-value with Monte Carlo to test the null hypothesis; (4) testing the null hypothesis (no cluster): if yes, there are no clusters; if no, there are clusters; (5) repeat this procedure for another window w.
least 1000) using the parameters λ0 estimated for that zone, and we obtain a distribution for LR:
$$H_0: \lambda = \lambda_0, \qquad H_1: H_0 \text{ is false}$$
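The Monte Carlo step can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure used by the authors: each zone is assumed to have a known null rate λ0, counts are simulated as independent Poisson variates, and the test statistic is the maximum over zones of the Poisson log-likelihood ratio of the observed count against λ0. All function names, the zone rates and the observed counts are hypothetical. With 999 simulations plus the observed data there are 1000 values in the reference distribution.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate (Knuth's method, adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def poisson_llr(count, lam0):
    """Log-likelihood ratio of rate = count (the MLE) against rate = lam0."""
    if count <= lam0:
        return 0.0
    return count * math.log(count / lam0) - (count - lam0)


def max_llr(counts, lam0s):
    """Scan statistic: the largest log-likelihood ratio over all zones."""
    return max(poisson_llr(c, lam) for c, lam in zip(counts, lam0s))


def monte_carlo_p_value(observed_counts, lam0s, n_sim=999, seed=0):
    """Rank the observed statistic among statistics simulated under H0."""
    rng = random.Random(seed)
    observed = max_llr(observed_counts, lam0s)
    exceed = 0
    for _ in range(n_sim):
        simulated = [poisson_draw(rng, lam) for lam in lam0s]
        if max_llr(simulated, lam0s) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Hypothetical null rates and observed counts for four zones.
lam0s = [4.0, 5.0, 3.0, 6.0]
observed_counts = [5, 4, 12, 6]
p_value = monte_carlo_p_value(observed_counts, lam0s)
```

Taking the maximum over zones inside every simulation is what accounts for the many windows examined, so the p-value refers to the most likely cluster rather than to a single pre-selected zone.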
A Bernoulli process is a discrete-time stochastic process based on Bernoulli trials. A Bernoulli trial is an experiment whose outcome is random and can be either of two possible outcomes, “success” and “failure”. Values are expressed as 0 or 1 (non-cases or cases).
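Scan similar to Poisson, visiting each event. Likelihood function:
$$L = \left(\frac{c}{n}\right)^{c} \left(\frac{n-c}{n}\right)^{n-c} \left(\frac{C-c}{N-n}\right)^{C-c} \left(\frac{(N-n)-(C-c)}{N-n}\right)^{(N-n)-(C-c)} \cdot I()$$
where C is the total number of cases, c is the number of cases in the window, n is the total number of cases and non-cases in the window, N is the total number of cases and non-cases, and I() is the indicator function.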
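The Bernoulli likelihood can be evaluated numerically as in the following sketch. It assumes the standard reading of the symbols (C total cases, c cases in the window, n cases and non-cases in the window, N cases and non-cases overall) and works on the log scale, because the likelihood itself underflows quickly. The function names and the example counts are hypothetical, and the comparison with the null likelihood in `bernoulli_log_lr` is an addition for illustration.

```python
import math


def _k_log_ratio(k, m):
    """Return log((k/m)**k), using the convention 0**0 = 1."""
    return k * math.log(k / m) if k > 0 else 0.0


def bernoulli_log_likelihood(c, n, C, N):
    """Log of the Bernoulli likelihood for one window, including the indicator.

    Returns -inf when the case rate inside the window is not higher than
    outside it, which corresponds to the indicator being zero.
    """
    out_cases = C - c
    out_total = N - n
    if n <= 0 or out_total <= 0:
        raise ValueError("window must contain part, but not all, of the events")
    if c / n <= out_cases / out_total:
        return float("-inf")
    return (
        _k_log_ratio(c, n)
        + _k_log_ratio(n - c, n)
        + _k_log_ratio(out_cases, out_total)
        + _k_log_ratio(out_total - out_cases, out_total)
    )


def bernoulli_log_lr(c, n, C, N):
    """Log-likelihood ratio against the null of one common case rate C/N."""
    null = _k_log_ratio(C, N) + _k_log_ratio(N - C, N)
    return bernoulli_log_likelihood(c, n, C, N) - null


# Hypothetical counts: 20 cases among 100 events inside the window,
# 110 cases among 1000 events overall.
llr_window = bernoulli_log_lr(c=20, n=100, C=110, N=1000)
```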
Summation of points intensities and filtration with some level L
Summation of points intensities:
Z1, Z2, Z3
Hotspot zones at level L (Connected Components of upper level set)
Intensity Region S Level L
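One way to picture hotspot zones as connected components of an upper level set is the sketch below. It assumes the intensities have already been summed onto a rectangular grid and that cells are connected through their four edge neighbours; both choices, as well as the function name `hotspot_zones` and the sample grid, are assumptions made for illustration.

```python
from collections import deque


def hotspot_zones(intensity, level):
    """Return the connected components of the cells with intensity >= level.

    intensity : list of rows, each a list of numbers (summed point intensities)
    level     : threshold L defining the upper level set
    Each zone is a sorted list of (row, col) cells.
    """
    rows, cols = len(intensity), len(intensity[0])
    seen = set()
    zones = []
    for r in range(rows):
        for c in range(cols):
            if intensity[r][c] < level or (r, c) in seen:
                continue
            # Breadth-first search over neighbouring cells above the level.
            zone = []
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                zone.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and (ny, nx) not in seen and intensity[ny][nx] >= level:
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            zones.append(sorted(zone))
    return zones


# Hypothetical intensity grid.
grid = [
    [1, 5, 6, 1, 1],
    [1, 7, 1, 1, 4],
    [1, 1, 1, 1, 5],
    [3, 1, 1, 1, 1],
]
zones_at_4 = hotspot_zones(grid, level=4)
zones_at_3 = hotspot_zones(grid, level=3)
```

Lowering the level L can only enlarge or merge zones or create new ones, which is why scanning over a decreasing sequence of levels gives a nested family of candidate hotspots.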
Spatially distributed response variables Hotspot analysis Prioritization Decision support systems
Geoinformatic spatio-temporal data from a variety of data products and data sources with agencies, academia, and industry
Masks, filters Indicators, weights Masks, filters
Geoinformatic Surveillance System
Agency Databases Thematic Databases Other Databases Homeland Security Disaster Management Public Health Ecosystem Health Other Case Studies
Statistical Processing: Hotspot Detection, Prioritization, etc.
Data Sharing, Interoperable Middleware Standard or De Facto Data Model, Data Format, Data Access Arbitrary Data Model, Data Format, Data Access Application Specific De Facto Data/Information Standard
describe urban space in terms of density of service types;
consistent clusters for investigated regions;
(Yundola)
We perform time series analysis to find the expected counts for each recent day; then compare actual to expected counts. For the standard scan statistic approach, we assume that each count is drawn from a Poisson distribution with unknown mean. For each of these regions, we compare the current counts for each location to the time series of historical counts for that location.
Expected counts Historical counts Current counts (3 day duration)
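The comparison of current with expected counts can be illustrated with a short sketch. The slides do not say which time series method is used, so a plain moving average of the historical counts stands in for it here; that choice, the function names and the sample counts are all assumptions. The current counts over a 3 day duration are compared with the expected total through the Poisson upper-tail probability.

```python
import math


def expected_daily_count(history, window=7):
    """Moving-average estimate of the expected daily count (an assumption)."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def poisson_upper_tail(count, mean):
    """P(X >= count) for X ~ Poisson(mean), with count a non-negative integer."""
    term = math.exp(-mean)  # P(X = 0)
    cdf = 0.0
    for k in range(count):
        cdf += term
        term *= mean / (k + 1)  # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - cdf)


# Hypothetical historical and current counts for one location.
history = [10, 12, 9, 11, 10, 8, 10]
current = [15, 18, 17]  # 3 day duration

expected_total = expected_daily_count(history) * len(current)
current_total = sum(current)
tail_probability = poisson_upper_tail(current_total, expected_total)
```

A small tail probability for a single location is only suggestive; in the scan setting the significance of a region would still be judged by randomization testing over all windows.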