Computation and Aggregation of Quantiles from Data Streams

John Chambers, David James, Diane Lambert, Scott Vander Wiel

(related article to appear with discussion in “Statistical Science”)

Vienna, June 17, 2006

Motivation

  • Application at Lucent Technologies: software to monitor distributed IP-based services.

  • Goal: characterize various metrics (e.g., e-mail transaction times), both locally and in aggregate, updated over time.

  • Constraints: computing at the node and the amount of data transmitted to the server.

Quantile Estimation

Metrics are often unusually distributed (long-tailed, bimodal, ...), so their quantiles must be estimated, often in the tail.
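The following illustrates why tail quantiles, rather than means, are the useful summary for such metrics. The data are simulated. The log-normal model and its parameters are assumptions that stand in for a long-tailed metric such as e-mail transaction time. They are not data from the Lucent application.

```r
# Simulated long-tailed "transaction times" (assumed log-normal; illustration only).
set.seed(2006)
times <- rlnorm(10000, meanlog = 0, sdlog = 1.5)

# The tail pulls the mean well above the typical (median) value...
c(mean = mean(times), median = median(times))

# ...so the tail itself is better described by upper quantiles.
quantile(times, probs = c(.50, .90, .95, .99))
```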

The Idea

(Approximate, Update, Aggregate)

  • Approximate the empirical distribution for each metric & node (agent).

  • Update each approximation periodically with new data at the node.

  • Aggregate the ecdfs for relevant groupings of nodes (e.g., regions).
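Here is an R sketch of the three steps. It is a simplified illustration under stated assumptions, not the authors' algorithm. A summary is assumed to consist of the quantiles on a fixed probability grid plus a count. The grid includes 0 and 1, so the minimum and maximum are kept. The ecdf is approximated by linear interpolation between those quantiles. Aggregation inverts the count-weighted mixture of these approximations. Updating is treated as aggregating the old summary with a summary of the new data. All names (`p_grid`, `summarize_batch`, `summary_cdf`, `merge_summaries`, `update_summary`) are hypothetical.

```r
# Hypothetical probability grid; 0 and 1 keep the minimum and maximum.
p_grid <- c(0, .05, .10, .25, .50, .75, .90, .95, .98, .99, 1)

# Approximate: summarize a batch by its grid quantiles and its size.
summarize_batch <- function(x, p = p_grid) {
  list(p = p, q = unname(quantile(x, probs = p)), n = length(x))
}

# Piecewise-linear cdf implied by a summary (0 below the min, 1 above the max).
summary_cdf <- function(s, x) {
  approx(s$q, s$p, xout = x, rule = 2, ties = max)$y
}

# Aggregate: count-weighted mixture of the summaries' cdfs, inverted on grid p.
merge_summaries <- function(summaries, p = p_grid) {
  n <- vapply(summaries, function(s) s$n, numeric(1))
  knots <- sort(unique(unlist(lapply(summaries, function(s) s$q))))
  G <- Reduce(`+`, Map(function(s, w) w * summary_cdf(s, knots),
                       summaries, n / sum(n)))
  q <- approx(G, knots, xout = p, rule = 2, ties = min)$y
  list(p = p, q = q, n = sum(n))
}

# Update (simplified): summarize the new data, then merge with the old summary.
update_summary <- function(s, new_data) {
  merge_summaries(list(s, summarize_batch(new_data, s$p)), s$p)
}

set.seed(1)
x <- rexp(2000)
a <- summarize_batch(x[1:1000])
b <- update_summary(a, x[1001:2000])
round(rbind(merged = b$q, exact = quantile(x, p_grid, names = FALSE)), 3)
```

Each approximate cdf is linear between its own quantiles, so the mixture is linear between the pooled knots. Evaluating the mixture only at those knots and interpolating back therefore gives its exact inverse on the grid.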

Update for each agent

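[Figure: agent update cycle (fill, update, report). Observations X1, X2, X3, ... fill the data buffer D; D updates the quantile buffer Q (levels .10, .25, .50, .75, .90, .95, .98, .99); the agent summary (levels .10, .25, .50, .75, .90, .95) is reported.]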
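A single agent's fill/update/report cycle might look like the sketch below. Several pieces are hypothetical: the buffer size, the functions `update_Q` and `run_agent`, and the probability grids. The grids use the levels shown on the slide plus 0 and 1, which keep the extremes. The update inverts a count-weighted mixture of piecewise-linear cdfs for Q and for the full buffer D. This is a simplification, not necessarily the procedure used in the study.

```r
# Hypothetical grids: Q keeps 0 and 1 (the min and max); reports use fewer levels.
q_grid      <- c(0, .10, .25, .50, .75, .90, .95, .98, .99, 1)
report_grid <- c(.10, .25, .50, .75, .90, .95)

# Update: fold a full data buffer D into the quantile buffer Q by inverting the
# count-weighted mixture of their piecewise-linear cdfs (a simplification).
update_Q <- function(Q, D) {
  D_sum <- list(q = unname(quantile(D, probs = q_grid)), n = length(D))
  if (is.null(Q)) return(D_sum)
  knots <- sort(unique(c(Q$q, D_sum$q)))
  cdf <- function(q) approx(q, q_grid, xout = knots, rule = 2, ties = max)$y
  G <- (Q$n * cdf(Q$q) + D_sum$n * cdf(D_sum$q)) / (Q$n + D_sum$n)
  list(q = approx(G, knots, xout = q_grid, rule = 2, ties = min)$y,
       n = Q$n + D_sum$n)
}

# One agent: fill D one observation at a time, update Q when D is full, report.
run_agent <- function(stream, buffer_size = 250) {
  Q <- NULL
  D <- numeric(0)
  for (x in stream) {
    D <- c(D, x)                      # fill
    if (length(D) == buffer_size) {
      Q <- update_Q(Q, D)             # update
      D <- numeric(0)
    }
  }
  if (is.null(Q)) stop("the data buffer never filled")
  # Report: interpolate Q's quantile function at the reporting levels.
  # Observations still waiting in D are not yet reflected.
  list(n = Q$n,
       quantiles = setNames(approx(q_grid, Q$q, xout = report_grid)$y,
                            report_grid))
}

set.seed(7)
stream <- rlnorm(5000)   # hypothetical transaction times
report <- run_agent(stream)
report
```

With a fixed-size buffer, the memory and work per update at the node stay bounded. The report sent upstream has only a handful of numbers, which matches the constraints listed under Motivation.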

Aggregate agent records

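[Figure: aggregation at the server (fill, update, report). Agent summaries (levels .10, .25, .50, .75, .90, .95) fill the server's data buffer D; D updates the server's quantile buffer Q, and the server summary is reported at levels .05, .10, .25, .50, .75, .90, .95, .98, .99, .995.]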
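Agent records for one grouping, such as a region, can be combined at the server in the same spirit. In this hypothetical sketch, the agent and server grids follow the levels on the slide, plus 0 and 1. The agents' records are merged directly instead of passing through a server data buffer. The names `summarize_agent` and `aggregate_agents` are illustrative only.

```r
# Hypothetical grids taken from the slide's levels, plus 0 and 1 for the extremes.
agent_grid  <- c(0, .10, .25, .50, .75, .90, .95, 1)
server_grid <- c(0, .05, .10, .25, .50, .75, .90, .95, .98, .99, .995, 1)

# What an agent reports: grid quantiles plus the number of observations.
summarize_agent <- function(x) {
  list(p = agent_grid, q = unname(quantile(x, probs = agent_grid)), n = length(x))
}

# Server: count-weighted mixture of the agents' piecewise-linear cdfs,
# inverted at the (finer) server grid.
aggregate_agents <- function(summaries, p = server_grid) {
  n <- vapply(summaries, function(s) s$n, numeric(1))
  knots <- sort(unique(unlist(lapply(summaries, function(s) s$q))))
  G <- Reduce(`+`, Map(function(s, w) {
    w * approx(s$q, s$p, xout = knots, rule = 2, ties = max)$y
  }, summaries, n / sum(n)))
  list(p = p, q = approx(G, knots, xout = p, rule = 2, ties = min)$y, n = sum(n))
}

set.seed(11)
# Three hypothetical agents in one region, with different loads and speeds.
agents <- list(rlnorm(3000, 0), rlnorm(1000, 0.5), rlnorm(500, 1))
region <- aggregate_agents(lapply(agents, summarize_agent))
round(setNames(region$q, region$p), 3)
```

Some server levels are not kept by the agents (.05, .98, .99, .995). These are interpolated between neighbouring agent quantiles, so in the tails they are only as reliable as that linear interpolation.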

Software

  • Objects represent each evolving quantile estimate: a <- seqQuants(....)

  • OOP-style functions simulate updating and aggregating: a$merge(data) (modifies a).

  • Uses R closures (the object contains functions with a shared environment for updates).
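A minimal closure-based object of this kind might look like the following. `make_quantile_tracker` is a hypothetical stand-in, not the actual `seqQuants()`, whose arguments the slide elides. Its `merge` uses an assumed update rule that inverts a count-weighted mixture of piecewise-linear cdfs.

```r
# Hypothetical constructor in the spirit of seqQuants(); not its real signature.
make_quantile_tracker <- function(p = c(0, .10, .25, .50, .75, .90, .95, 1)) {
  q <- NULL  # current quantile estimates, shared by the functions below
  n <- 0     # number of observations absorbed so far

  merge <- function(data) {
    new_q <- unname(quantile(data, probs = p))
    if (n == 0) {
      q <<- new_q
    } else {
      # Count-weighted mixture of two piecewise-linear cdfs, inverted at p.
      knots <- sort(unique(c(q, new_q)))
      cdf <- function(qq) approx(qq, p, xout = knots, rule = 2, ties = max)$y
      G <- (n * cdf(q) + length(data) * cdf(new_q)) / (n + length(data))
      q <<- approx(G, knots, xout = p, rule = 2, ties = min)$y
    }
    n <<- n + length(data)  # `<<-` updates the shared enclosing environment
    invisible(NULL)
  }

  list(merge = merge,
       quantiles = function() setNames(q, p),
       count = function() n)
}

set.seed(3)
a <- make_quantile_tracker()
for (i in 1:10) a$merge(rnorm(100))
a$count()
round(a$quantiles(), 3)
```

`merge`, `quantiles` and `count` all close over the same environment. As a result, `a$merge(data)` changes what `a$quantiles()` returns even though `a` itself is never reassigned. This is the "modifies a" behaviour described above.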

Software

  • R simplifies large-scale simulation studies with varying statistical assumptions.

  • R also helps with algorithm development in C, by calling an R tracer from C.
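The sketch below is a small simulation study of this kind. It streams data from several assumed distributions through a simplified quantile update. It then reports the error against the exact sample quantiles, in IQR units. The distributions, batch size, grid, update rule and function names (`update_q`, `stream_error`) are all assumptions for illustration.

```r
# Hypothetical grid; 0 and 1 carry the minimum and maximum.
p <- c(0, .10, .25, .50, .75, .90, .95, .99, 1)

# Simplified streaming update (assumed rule): invert the count-weighted
# mixture of the piecewise-linear cdfs of the old estimate and the new batch.
update_q <- function(q, n, batch) {
  new_q <- unname(quantile(batch, probs = p))
  if (n == 0) return(new_q)
  knots <- sort(unique(c(q, new_q)))
  cdf <- function(qq) approx(qq, p, xout = knots, rule = 2, ties = max)$y
  G <- (n * cdf(q) + length(batch) * cdf(new_q)) / (n + length(batch))
  approx(G, knots, xout = p, rule = 2, ties = min)$y
}

# Stream x in batches; return streamed minus exact quantiles, in IQR units.
stream_error <- function(x, batch_size) {
  q <- NULL
  n <- 0
  for (b in split(x, ceiling(seq_along(x) / batch_size))) {
    q <- update_q(q, n, b)
    n <- n + length(b)
  }
  (q - quantile(x, probs = p, names = FALSE)) / IQR(x)
}

# Assumed data-generating models ("varying statistical assumptions").
generators <- list(
  normal    = function(m) rnorm(m),
  lognormal = function(m) rlnorm(m, sdlog = 1.5),
  bimodal   = function(m) sample(c(rnorm(m %/% 2), rnorm(m - m %/% 2, mean = 5)))
)

set.seed(2006)
errors <- t(sapply(generators, function(g) stream_error(g(10000), batch_size = 500)))
colnames(errors) <- as.character(p)
round(errors, 3)
```

To probe other assumptions, vary `generators` or `batch_size`. The 0 and 1 columns are zero by construction, because the update carries the minimum and maximum through exactly.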


Summary

  • An example of the productive interaction between applications and research, typical of Bell Labs research (in the old days).

  • An interesting algorithmic study: estimating distributions from distributed, ongoing data.

  • The computing environment centered on R was essential for productivity.