
Welcome Overview of Predictive Analytics Claudia Perlich Chief - PowerPoint PPT Presentation



  1. Welcome

  2. Overview of Predictive Analytics Claudia Perlich Chief Scientist, Dstillery

  3. Predictive Modeling: Algorithms that Learn from Data

  4. Example: Micro Loans

       Age   Income   Default
       35    75K      no
       68    83K      yes
       43    61K      no
       71    56K      yes
       …     …        …

  5. Learning to Classify: Classification tree. [Figure: scatter plot of Age against Balance, with a classification tree that splits over balance (>= 50K vs. < 50K) and over age (< 45 vs. >= 45). Leaf default probabilities shown: 12/13, 4/7, and 1. Legend: bad risk (default), 16 cases; good risk (not default), 14 cases.]
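A leaf probability such as "Prob. = 4/7" is the share of default cases among the cases that land in that leaf. The following is a minimal sketch of that calculation, using only the four rows visible on the micro-loans slide and the age-45 split from the figure. The function name `leaf_default_rates` is hypothetical, and the sketch splits on age alone rather than reproducing the full tree from the slide.

```python
from fractions import Fraction

# The four rows visible on slide 4: (age, income in $K, defaulted).
loans = [(35, 75, False), (68, 83, True), (43, 61, False), (71, 56, True)]


def leaf_default_rates(rows, age_threshold):
    """Split rows on age and return the fraction of defaults in each leaf."""
    leaves = {"younger": [], "older": []}
    for age, _income, defaulted in rows:
        key = "older" if age >= age_threshold else "younger"
        leaves[key].append(defaulted)
    # A leaf's probability is defaults / cases in that leaf.
    return {k: Fraction(sum(v), len(v)) for k, v in leaves.items() if v}


rates = leaf_default_rates(loans, age_threshold=45)
print(rates)
```

With only four rows the split looks perfect; on the 30 cases in the slide's figure the leaves are mixed, which is why fractions such as 12/13 and 4/7 appear there.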

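  6. Learning to Classify: Logistic Regression. p(+|x) = … with β0 = 123 and β1 = -1.3. [Figure: scatter plot of Age against Balance, axes marked at Age 45 and Balance 50K, with an example point at p(+|x) = 0.48. Legend: bad risk (default), 16 cases; good risk (not default), 14 cases.]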
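The formula on this slide did not survive extraction, so the sketch below assumes the standard single-feature logistic form, p(+|x) = 1 / (1 + exp(-(β0 + β1·x))), with the two coefficients shown on the slide. Which feature x stands for is not recoverable from the transcript. The names `logistic` and `p_default` are hypothetical.

```python
import math

BETA_0 = 123.0   # intercept shown on the slide
BETA_1 = -1.3    # slope shown on the slide


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def p_default(x):
    """Assumed form: p(+|x) = logistic(beta0 + beta1 * x)."""
    return logistic(BETA_0 + BETA_1 * x)


# The decision boundary (p = 0.5) sits where beta0 + beta1 * x = 0.
boundary = -BETA_0 / BETA_1
print(boundary, p_default(boundary))
```

Because β1 is negative, the predicted probability falls as x grows under this assumed form.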

  7. Lending Club Data • Text • Loan Category • Demographic information • Credit Score

  8. Targeted Online Display Advertising

  9. Diagram labels: 100 million browsers (cookies); 100 million URLs; shopping at one of our campaign sites (conversion); 0.0001% to 1% base rate; billions of auctions per day; ad exchange. Questions posed: Who should we target for a product? Does the ad have an effect? Where should we advertise and at what price? What data should we pay for? Attribution? Which requests are fraud?

  10. Agnostic Data. A consumer's online activity gets recorded like this. The Branded Web, purchases (encoded): date 1: 3012L, 20; date 2: 4199L, 30; …; date n: 3075L, 50. The Non-Branded Web, browsing history (hashed URLs): date 1: abkcc; date 2: kkllo; date 3: 88iok; date 4: 7uiol; … I do not want/need to 'understand' who you are …

  11. Model in 10 Million Dimensions. Using Naïve Bayes and Stochastic Gradient Descent Logistic Regression, we estimate statistical correlations between 10s of millions of web URLs and 1000s of branded actions. [Figure: likelihood to convert given visits to non-branded websites, on a scale from Passion to Aversion.] p(buy|urls) =
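As a rough illustration of how naïve Bayes can turn hashed URLs into p(buy|urls), here is a minimal sketch. The hashes reuse the placeholder strings from slide 10, while the purchase labels, the smoothing constant `alpha`, and the function names are invented for illustration. It also simplifies by scoring only the URLs a browser visited and ignoring absent ones; this is not presented as Dstillery's implementation.

```python
import math
from collections import Counter

# Hypothetical training data: (set of hashed URLs visited, purchased?).
browsers = [
    ({"abkcc", "kkllo"}, True),
    ({"abkcc", "88iok"}, True),
    ({"7uiol", "88iok"}, False),
    ({"7uiol"}, False),
    ({"kkllo", "7uiol"}, False),
]


def train(data, alpha=1.0):
    """Return prior log-odds and a Laplace-smoothed log-likelihood ratio per URL."""
    n_pos = sum(1 for _, bought in data if bought)
    n_neg = len(data) - n_pos
    pos, neg = Counter(), Counter()
    for urls, bought in data:
        (pos if bought else neg).update(urls)
    weights = {}
    for url in set(pos) | set(neg):
        p_pos = (pos[url] + alpha) / (n_pos + 2 * alpha)
        p_neg = (neg[url] + alpha) / (n_neg + 2 * alpha)
        weights[url] = math.log(p_pos / p_neg)
    return math.log(n_pos / n_neg), weights


def p_buy(urls, prior, weights):
    """Combine the prior with the weights of visited URLs; unseen URLs add 0."""
    z = prior + sum(weights.get(u, 0.0) for u in urls)
    return 1.0 / (1.0 + math.exp(-z))


prior, weights = train(browsers)
print(sorted(weights.items(), key=lambda kv: kv[1]))
```

A positive weight corresponds to the "passion" end of the slide's scale and a negative weight to the "aversion" end. The model never needs to know what the hashed URLs actually are.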

  12. Real-time Scoring of a Browser. [Figure: timeline of a browser's engagement and observation periods ending in a purchase, with ads shown while the score is above the ProspectRank threshold. Markers distinguish site visits with positive correlation from site visits with negative correlation.] Some prospects fall out of favor once their in-market indicators decline. p(buy|urls) =
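One way to picture the threshold behavior is a running score that rises with positively correlated site visits and falls with negatively correlated ones, with ads shown only while the score is at or above the threshold. This is a minimal sketch under that assumption; the visit weights, the threshold value, and the function name `targeting_timeline` are all hypothetical, and the slide does not say how ProspectRank is actually computed.

```python
def targeting_timeline(visit_weights, threshold):
    """Return, after each visit, whether the browser is at or above the threshold."""
    score = 0.0
    timeline = []
    for weight in visit_weights:
        score += weight          # positive visits raise the score, negative ones lower it
        timeline.append(score >= threshold)
    return timeline


# Hypothetical sequence: three positive visits followed by two negative ones.
visits = [0.5, 0.75, 0.5, -1.0, -0.5]
timeline = targeting_timeline(visits, threshold=1.0)
print(timeline)
```

The browser becomes a target after the second visit and drops out again once the negative visits pull the score back under the threshold.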

  13. Models in Our World • Spam Detection • Fraud/Fault Detection • Financial Trading • Medical Diagnosis/Quality control • Sentiment Analysis • Prioritization in General • CRM • Recommender systems • Advertising/Targeting

  14. Important Takeaways • The algorithm is secondary • The data is KEY • Quality control is HARD • Model is only as good as the modeler • Very difficult to really understand the data

  15. Panel Discussion • Pamela Dixon, Founder, World Privacy Forum • Edmund Mierzwinski, Consumer Program Director and Senior Fellow, U.S. Public Interest Research Group • Claudia Perlich, Chief Scientist, Dstillery • Stuart Pratt, President and CEO, Consumer Data Industry Association • Ashkan Soltani, Independent Researcher and Consultant • Rachel Nyswander Thomas, Executive Director of Data-Driven Marketing Institute, and Vice President of Government Affairs, Direct Marketing Association • Joseph Turow, Professor, University of Pennsylvania

  16. Presentation Ashkan Soltani Independent Researcher and Consultant

  17. whoami twitter: @ashk4n ashkan.soltani@gmail.com independent researcher & consultant

  18. today: alternative scoring • methodology • findings • data sources

  19. methodology

  20. user-agent

  21. older findings: orbitz

  22. findings: orbitz Some sites, for example, gave discounts based on whether or not a person was using a mobile device. A person searching for hotels from the Web browser of an iPhone or Android phone on travel sites Orbitz and CheapTickets would see discounts of as much as 50% off the list price, Orbitz said. Both sites are run by Orbitz Worldwide Inc., which in fact markets the differences as "mobile steals." Orbitz says the deals are also available on the iPad if a person installs the Orbitz app.

  23. findings: gogo inflight User-Agent: Desktop $12.95; User-Agent: iPhone $7.95
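The user-agent comparison can be approximated by requesting the same page under different `User-Agent` headers and comparing the prices returned. The sketch below only prepares the requests and does the arithmetic on the two prices from slide 23; it sends nothing. The URL, the User-Agent strings, and the function names are illustrative assumptions, not details from the talk.

```python
import urllib.request

# Illustrative User-Agent strings (assumptions, not taken from the slides).
USER_AGENTS = {
    "desktop": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "iphone": "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X)",
}


def build_requests(url):
    """Prepare one request per device profile; nothing is sent here."""
    return {
        name: urllib.request.Request(url, headers={"User-Agent": ua})
        for name, ua in USER_AGENTS.items()
    }


def price_gap(prices):
    """Absolute and relative gap between the desktop and iPhone prices."""
    gap = prices["desktop"] - prices["iphone"]
    return gap, gap / prices["desktop"]


requests = build_requests("https://example.com/pricing")  # hypothetical URL
gap, share = price_gap({"desktop": 12.95, "iphone": 7.95})  # prices from slide 23
print(f"iPhone price is ${gap:.2f} lower ({share:.0%} of the desktop price)")
```

In a real test, each prepared request would be sent with `urllib.request.urlopen` and the price parsed out of the response, keeping everything else about the two requests identical.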

  24. location

  25. findings: staples

  26. findings: staples

  27. findings: more geography Home Depot's website offered price variations that appeared to be based on the nearest brick-and-mortar store as well. A 250-foot spool of electrical wiring fell into six pricing groups, including $70.80 in Ashtabula, Ohio; $72.45 in Erie, Pa.; $75.98 in Olean, N.Y.; and $77.87 in Monticello, N.Y. Location also seemed to be important for some international companies. The Journal saw Rosetta Stone, which sells software for learning languages, offering discounts of as much as 20% for people who bought multiple levels of its German lessons from certain locations in the U.S. or Canada, but not others from the U.K. or Argentina.

  28. findings: discover In the tests, Discover, for instance, showed a prominent offer for the company's new "it" card to computers connecting from cities including Denver, Kansas City, Mo., and Dallas, Texas. Computers connecting from Scranton, Penn., Kingsport, Tenn., and Los Angeles didn't see the same offer. A Discover spokeswoman said that the company was testing the card, but that for competitive reasons, it wouldn't comment further on its "acquisition strategy" for new customers.

  29. findings: staples higher income = lower price In the Journal's examination of Staples' online pricing, the weighted average income among ZIP Codes that mostly received discount prices was roughly $59,900, based on Internal Revenue Service data. ZIP Codes that saw generally high prices had a lower weighted average income, $48,700.
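A weighted average income gives larger ZIP Codes more influence than smaller ones. The sketch below shows the calculation on invented numbers. The slide does not say what the Journal used as weights, so weighting by a per-ZIP count is an assumption, as are the sample values and the function name.

```python
def weighted_average(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical ZIP Codes: average income and a count used as the weight.
incomes = [50_000, 70_000]
counts = [3_000, 1_000]
avg_income = weighted_average(incomes, counts)
print(avg_income)
```

The unweighted mean of these two incomes would be 60,000. The weighted figure is lower because the lower-income ZIP Code carries three times the weight.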

  30. profiles*

  31. findings: nextag / shoplet

  32. findings: nextag / shoplet

  33. findings: capital one Capital One was showing different users different cards first— either those for "excellent credit" or "average credit."

  34. findings: capital one

  35. data sources

  36. data sources

  37. data sources

  38. data sources

  39. data sources

  40. conclusion

  41. conclusion: staples As a final test, the Journal ordered two separate Swingline staplers from Staples.com, from two nearby ZIP Codes—one costing $14.29 and the other one $15.79. The staplers arrived the same day. They appear to be indistinguishable from one another and do an equally thorough job of stapling.

  42. Panel Discussion • Pamela Dixon, Founder, World Privacy Forum • Edmund Mierzwinski, Consumer Program Director and Senior Fellow, U.S. Public Interest Research Group • Claudia Perlich, Chief Scientist, Dstillery • Stuart Pratt, President and CEO, Consumer Data Industry Association • Ashkan Soltani, Independent Researcher and Consultant • Rachel Nyswander Thomas, Executive Director of Data-Driven Marketing Institute, and Vice President of Government Affairs, Direct Marketing Association • Joseph Turow, Professor, University of Pennsylvania
