Welcome Overview of Predictive Analytics Claudia Perlich Chief - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Welcome

slide-2
SLIDE 2

Overview of Predictive Analytics

Claudia Perlich

Chief Scientist, Dstillery

slide-3
SLIDE 3

Predictive Modeling: Algorithms that Learn from Data

slide-4
SLIDE 4
slide-5
SLIDE 5

Example: Micro Loans

Age   Income   Default
35    75K      no
68    83K      yes
43    61K      no
71    56K      yes
…     …        …

slide-6
SLIDE 6

Learning to Classify

Classification tree

Bad risk (Default) – 16 cases
Good risk (Not default) – 14 cases

[Figure: axes Balance and Age; split over balance at 50K (Balance >= 50K vs. < 50K) and split over age at 45 (Age >= 45 vs. < 45); leaves labeled Default, with probabilities 1, 4/7, and 12/13]

Probability of default = 4/7
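
As a minimal sketch of how leaf probabilities on a classification tree like this one can be obtained, the Python below applies two fixed splits (balance at 50K, age at 45, the thresholds named on the slide) and reports the fraction of defaults in each leaf. The order of the splits, the function names, and the six records are hypothetical; the slide's 30 cases are not reproduced here.

```python
from collections import defaultdict
from fractions import Fraction

def leaf(balance, age):
    """Assign a record to a leaf using the two thresholds from the slide.
    Splitting on balance first and then on age is an assumption."""
    if balance < 50_000:
        return "balance < 50K"
    if age >= 45:
        return "balance >= 50K, age >= 45"
    return "balance >= 50K, age < 45"

def leaf_default_rates(records):
    """Return {leaf name: fraction of records in that leaf that defaulted}."""
    counts = defaultdict(lambda: [0, 0])  # leaf -> [defaults, total]
    for balance, age, defaulted in records:
        entry = counts[leaf(balance, age)]
        entry[0] += int(defaulted)
        entry[1] += 1
    return {name: Fraction(d, n) for name, (d, n) in counts.items()}

# Hypothetical records: (balance, age, defaulted)
records = [
    (20_000, 30, True),
    (30_000, 50, True),
    (40_000, 40, False),
    (60_000, 35, True),
    (70_000, 40, False),
    (80_000, 60, False),
]
rates = leaf_default_rates(records)
for name, rate in rates.items():
    print(name, rate)
```

Each leaf's probability is simply a count ratio over the training cases that land in it, which is why the slide shows fractions such as 4/7.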

slide-7
SLIDE 7

Learning to Classify

Logistic Regression

Bad risk (Default) – 16 cases
Good risk (Not default) – 14 cases

[Figure: axes Balance and Age, with 50K and 45 marked]

p(+|x) = 0.48

p(+|x) =

β0 = 123, β1 = -1.3
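
Logistic regression turns a linear score into a probability through the logistic (sigmoid) function. The sketch below shows that mapping for a single feature. The function name and the example coefficients are illustrative assumptions; they are not the values fitted on the slide, whose formula did not survive extraction.

```python
import math

def logistic_probability(x, beta0, beta1):
    """p(+|x) = 1 / (1 + exp(-(beta0 + beta1 * x))) for a single feature x."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

# Illustrative coefficients only (hypothetical, not taken from the slide).
p_mid = logistic_probability(x=2.0, beta0=1.0, beta1=-0.5)  # linear score 0
p_low = logistic_probability(x=6.0, beta0=1.0, beta1=-0.5)  # linear score -2
print(p_mid, p_low)
```

A linear score of zero maps to a probability of one half, and a negative coefficient makes the probability fall as the feature grows.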

slide-8
SLIDE 8

Lending Club Data

  • Text
  • Loan Category
  • Demographic information
  • Credit Score
slide-9
SLIDE 9
slide-10
SLIDE 10

Targeted Online Display Advertising

slide-11
SLIDE 11

Shopping at one of our campaign sites (conversion)

cookies

  • 100 million URLs
  • 100 million browsers
  • 0.0001% to 1% base rate
  • Billions of auctions per day

Ad Exchange

  • Where should we advertise and at what price?
  • Does the ad have an effect?
  • What data should we pay for?
  • Attribution?
  • Who should we target for a product?
  • Which requests are fraud?

slide-12
SLIDE 12

Agnostic Data

I do not want/need to ‘understand’ who you are …

A consumer's online activity gets recorded like this:

The Non-Branded Web

Browsing History, Hashed URLs:

date1   abkcc
date2   kkllo
date3   88iok
date4   7uiol

The Branded Web

Purchases, Encoded:

date1   3012L20
date2   4199L30
…
daten   3075L50

slide-13
SLIDE 13

Model in 10 Million Dimensions

Using Naïve Bayes and Stochastic Gradient Descent Logistic Regression, we estimate statistical correlations between 10s of millions of web URLs and 1000s of branded actions.

[Figure labels: Likelihood to Convert given Visit; Passion; Aversion; non-branded websites]

p(buy|urls) =
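
To make the idea concrete, here is a minimal sketch of logistic regression trained by stochastic gradient descent over sparse, hashed URL features. It is not Dstillery's implementation: the feature dimension, learning rate, hashing scheme, and toy browsing histories are all assumptions chosen for illustration.

```python
import math
import random
import zlib

DIM = 2 ** 20  # size of the hashed feature space (assumed for this sketch)

def url_index(url):
    """Map a URL (or an already hashed token) to a weight index."""
    return zlib.crc32(url.encode("utf-8")) % DIM

def predict(weights, bias, urls):
    """p(buy | urls) under a logistic model with one weight per hashed URL."""
    score = bias + sum(weights.get(url_index(u), 0.0) for u in set(urls))
    return 1.0 / (1.0 + math.exp(-score))

def train_sgd(examples, epochs=200, lr=0.5, seed=0):
    """Plain SGD on log loss; examples are (urls, label) with label 0 or 1."""
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    examples = list(examples)
    for _ in range(epochs):
        rng.shuffle(examples)
        for urls, label in examples:
            # For log loss, the gradient w.r.t. the score is (prediction - label).
            error = predict(weights, bias, urls) - label
            bias -= lr * error
            for u in set(urls):
                i = url_index(u)
                weights[i] = weights.get(i, 0.0) - lr * error
    return weights, bias

# Toy, made-up browsing histories of hashed tokens with a buy / no-buy label.
data = [
    (["abkcc", "kkllo"], 1),
    (["abkcc", "88iok"], 1),
    (["7uiol", "88iok"], 0),
    (["7uiol", "kkllo"], 0),
]
weights, bias = train_sgd(data)
p_pos = predict(weights, bias, ["abkcc"])
p_neg = predict(weights, bias, ["7uiol"])
print(round(p_pos, 3), round(p_neg, 3))
```

Storing weights in a dictionary keeps memory proportional to the URLs actually seen, which is what makes a very high-dimensional feature space workable. The model never needs to know what a URL means, only which hashed tokens co-occur with conversions.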

slide-14
SLIDE 14

Real‐time Scoring of a Browser

[Figure labels: Ad (repeated); OBSERVATION; ENGAGEMENT; Purchase; ProspectRank Threshold; site visit with positive correlation; site visit with negative correlation]

Some prospects fall out of favor once their in-market indicators decline.

p(buy|urls) =

slide-15
SLIDE 15

Models in Our World

  • Spam Detection
  • Fraud/Fault Detection
  • Financial Trading
  • Medical Diagnosis/Quality control
  • Sentiment Analysis
  • Prioritization in General
  • CRM
  • Recommender systems
  • Advertising/Targeting
slide-16
SLIDE 16

Important Takeaways

  • The algorithm is secondary
  • The data is KEY
  • Quality control is HARD
  • Model is only as good as the modeler
  • Very difficult to really understand the data
slide-17
SLIDE 17

Panel Discussion

  • Pamela Dixon, Founder, World Privacy Forum
  • Edmund Mierzwinski, Consumer Program Director and Senior Fellow, U.S. Public Interest Research Group
  • Claudia Perlich, Chief Scientist, Dstillery
  • Stuart Pratt, President and CEO, Consumer Data Industry Association
  • Ashkan Soltani, Independent Researcher and Consultant
  • Rachel Nyswander Thomas, Executive Director of Data‐Driven Marketing Institute, and Vice President of Government Affairs, Direct Marketing Association
  • Joseph Turow, Professor, University of Pennsylvania
slide-18
SLIDE 18

Presentation

Ashkan Soltani

Independent Researcher and Consultant

slide-19
SLIDE 19

whoami

twitter: @ashk4n
ashkan.soltani@gmail.com
independent researcher & consultant

slide-20
SLIDE 20

today: alternative scoring

  • methodology
  • findings
  • data sources

slide-21
SLIDE 21
slide-22
SLIDE 22

methodology

slide-23
SLIDE 23

user‐agent

slide-24
SLIDE 24

older findings: orbitz

slide-25
SLIDE 25

findings: orbitz

Some sites, for example, gave discounts based on whether or not a person was using a mobile device. A person searching for hotels from the Web browser of an iPhone or Android phone on travel sites Orbitz and CheapTickets would see discounts of as much as 50% off the list price, Orbitz said. Both sites are run by Orbitz Worldwide Inc., which in fact markets the differences as "mobile steals." Orbitz says the deals are also available on the iPad if a person installs the Orbitz app.

slide-26
SLIDE 26

findings: gogo inflight

User‐Agent: Desktop   $12.95
User‐Agent: iPhone    $7.95
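
One way such a comparison could be automated is to request the same page while presenting different User-Agent headers and then compare the prices that come back. The sketch below is an assumption about how that might look, not the procedure used in the study: the URL, the User-Agent strings, and the stub `fake_fetch_price` (which stands in for a real fetch-and-parse step and simply returns the two prices from the slide) are all hypothetical.

```python
import urllib.request

def build_request(url, user_agent):
    """Build (but do not send) a request that presents the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def price_differences(fetch_price, url, user_agents):
    """Call fetch_price once per User-Agent.

    fetch_price(request) -> float is supplied by the caller.
    Returns (all prices, prices that differ from the first entry)."""
    prices = {
        name: fetch_price(build_request(url, ua))
        for name, ua in user_agents.items()
    }
    baseline = next(iter(prices.values()))
    differing = {name: p for name, p in prices.items() if p != baseline}
    return prices, differing

# Hypothetical stub: no network access, just echoes the slide's two prices.
def fake_fetch_price(request):
    # urllib stores header names capitalized as "User-agent".
    return 7.95 if "iPhone" in request.get_header("User-agent") else 12.95

agents = {
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "iphone": "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X)",
}
prices, differing = price_differences(
    fake_fetch_price, "https://example.com/wifi", agents
)
print(prices, differing)
```

Passing the fetcher in as a parameter keeps the comparison logic testable without touching a live site.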

slide-27
SLIDE 27

location

slide-28
SLIDE 28

findings: staples

slide-29
SLIDE 29

findings: staples

slide-30
SLIDE 30

findings: more geography

Home Depot's website offered price variations that appeared to be based on the nearest brick‐and‐mortar store as well. A 250‐foot spool of electrical wiring fell into six pricing groups, including $70.80 in Ashtabula, Ohio; $72.45 in Erie, Pa.; $75.98 in Olean, N.Y.; and $77.87 in Monticello, N.Y.

Location also seemed to be important for some international companies. The Journal saw Rosetta Stone, which sells software for learning languages, offering discounts of as much as 20% for people who bought multiple levels of its German lessons from certain locations in the U.S. or Canada, but not others from the U.K. or Argentina.

slide-31
SLIDE 31

findings: discover

In the tests, Discover, for instance, showed a prominent offer for the company's new "it" card to computers connecting from cities including Denver, Kansas City, Mo., and Dallas, Texas. Computers connecting from Scranton, Penn., Kingsport, Tenn., and Los Angeles didn't see the same offer. A Discover spokeswoman said that the company was testing the card, but that for competitive reasons, it wouldn't comment further on its "acquisition strategy" for new customers.
slide-32
SLIDE 32

findings: staples

In the Journal's examination of Staples' online pricing, the weighted average income among ZIP Codes that mostly received discount prices was roughly $59,900, based on Internal Revenue Service data. ZIP Codes that saw generally high prices had a lower weighted average income, $48,700.

higher income = lower price
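
The comparison above rests on a weighted average, where each ZIP Code's income counts in proportion to a weight rather than equally. The sketch below shows the arithmetic. The excerpt does not say what the weights were, so weighting by the number of tax returns is an assumption, and the two ZIP Code records are made up.

```python
def weighted_average(values, weights):
    """Return sum(value * weight) / sum(weight)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical ZIP Codes: (average income, number of tax returns used as weight)
zips = [(40_000, 1_000), (60_000, 3_000)]
incomes = [income for income, _ in zips]
returns = [count for _, count in zips]
avg = weighted_average(incomes, returns)
print(avg)  # 55000.0
```

The result sits closer to 60,000 than the simple mean of 50,000 would, because the larger ZIP Code carries three times the weight.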

slide-33
SLIDE 33

profiles*

slide-34
SLIDE 34

findings: nextag / shoplet

slide-35
SLIDE 35

findings: nextag / shoplet

slide-36
SLIDE 36

findings: capital one

Capital One was showing different users different cards first: either those for "excellent credit" or "average credit."

slide-37
SLIDE 37

findings: capital one

slide-38
SLIDE 38

data sources

slide-39
SLIDE 39

data sources

slide-40
SLIDE 40

data sources

slide-41
SLIDE 41

data sources

slide-42
SLIDE 42

data sources

slide-43
SLIDE 43

conclusion

slide-44
SLIDE 44

conclusion: staples

As a final test, the Journal ordered two separate Swingline staplers from Staples.com, from two nearby ZIP Codes—one costing $14.29 and the other one $15.79. The staplers arrived the same day. They appear to be indistinguishable from one another and do an equally thorough job of stapling.

slide-45
SLIDE 45

Panel Discussion

  • Pamela Dixon, Founder, World Privacy Forum
  • Edmund Mierzwinski, Consumer Program Director and Senior Fellow, U.S. Public Interest Research Group
  • Claudia Perlich, Chief Scientist, Dstillery
  • Stuart Pratt, President and CEO, Consumer Data Industry Association
  • Ashkan Soltani, Independent Researcher and Consultant
  • Rachel Nyswander Thomas, Executive Director of Data‐Driven Marketing Institute, and Vice President of Government Affairs, Direct Marketing Association
  • Joseph Turow, Professor, University of Pennsylvania