Data Visualizations of HYIP Dataset




slide-1
SLIDE 1

Data Visualizations of HYIP Dataset

Jie Han

Quantifying the World April 23, 2012

slide-2
SLIDE 2
slide-3
SLIDE 3

Financial Cryptography 2012

http://fc12.ifca.ai/pre-proceedings/paper_27.pdf

This could be you!!!

slide-4
SLIDE 4

Overview

  • 1. What's an HYIP?
  • 2. Dataset
  • 3. Processes
  • 4. R graph examples
  • 5. Google Chart examples
  • 6. Some helpful hints
slide-5
SLIDE 5

High Yield Investment Programs (HYIPs)

  • Also known as Ponzi or pyramid schemes
  • Promise high returns on investment
  • Pay existing investors with revenue from new investors
  • Unsustainable in the long run
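
Why paying old investors out of new deposits cannot last can be seen in a toy cash-flow model. The sketch below is only an illustration: the function name `days_until_collapse`, the deposit size, the recruitment rate, and the interest rates are all made-up assumptions, not values from the HYIP dataset.

```python
def days_until_collapse(daily_rate, new_investors_per_day, deposit=100.0, max_days=3650):
    """Toy model: interest is paid only out of deposits, never out of real profit.

    daily_rate            -- promised daily return on principal (0.03 means 3%)
    new_investors_per_day -- function mapping a day number to a count of new investors
    Returns the first day on which the interest owed exceeds the cash on hand,
    or None if that never happens within max_days.
    """
    cash = 0.0
    total_principal = 0.0
    for day in range(1, max_days + 1):
        inflow = new_investors_per_day(day) * deposit
        cash += inflow
        total_principal += inflow
        owed = total_principal * daily_rate  # interest due to every investor so far
        if owed > cash:
            return day
        cash -= owed
    return None


def constant_recruitment(day):
    """Hypothetical recruitment pattern: ten new investors every day."""
    return 10


print(days_until_collapse(0.03, constant_recruitment))
print(days_until_collapse(0.07, constant_recruitment))
```

Even with steady recruitment, the interest owed grows with the total principal while the daily inflow stays flat, so the scheme eventually runs out of cash, and it does so sooner at the higher promised rate.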
slide-6
SLIDE 6

Why are HYIPs a problem?

  • Advertised as legitimate investments
  • Sophisticated online ecosystem in support of the schemes

slide-7
SLIDE 7

HYIP Website

slide-8
SLIDE 8

HYIP Aggregator Websites

slide-9
SLIDE 9

HYIP Variables

slide-10
SLIDE 10

HYIP Lifetime

Typical life cycle of an HYIP:

slide-11
SLIDE 11

About the Data

  • Collection started 11/17/2010 and is still running
  • Collected data from nine "aggregator" websites
  • Total observations: 141k+
  • Total HYIPs observed: 1,576+
slide-12
SLIDE 12

Process

  • Data collection (Python, crontab, mongoDB)
  • Preliminary analysis (Python, R)
  • Continue data collection, work on parsing all aggregators (Python)
  • Look at what we have, decide on what we want (R)
  • Difficulties in analyzing data -> create interactive data visualizations (Python, Google Charts, JS, HTML)
  • Use new tools to look for patterns (browser & eyes)
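
The parsing step might look roughly like the sketch below. The listing format, the `LISTING` pattern, the `parse_listing` helper, and the field names are all hypothetical; each real aggregator page would need its own pattern. The returned dictionary is shaped so that it could be stored as a MongoDB document (for example with pymongo's `Collection.insert_one`), and a crontab entry could run such a script on a schedule.

```python
import re
from datetime import datetime, timezone

# Hypothetical one-line listing format, invented for this example.
LISTING = re.compile(
    r"(?P<name>[\w.-]+)\s*\|\s*status:\s*(?P<status>\w+)\s*\|\s*"
    r"rate:\s*(?P<rate>\d+(?:\.\d+)?)%\s*\|\s*term:\s*(?P<term>\d+)\s*days?"
)


def parse_listing(line, aggregator, observed_at=None):
    """Turn one listing line into an observation record, or None if it does not match."""
    match = LISTING.search(line)
    if match is None:
        # Unparseable lines are skipped rather than crashing the whole crawl.
        return None
    return {
        "hyip": match.group("name").lower(),
        "aggregator": aggregator,
        "status": match.group("status").lower(),
        "daily_rate_pct": float(match.group("rate")),
        "term_days": int(match.group("term")),
        "observed_at": observed_at or datetime.now(timezone.utc),
    }


record = parse_listing(
    "Example-Fund.com | status: Paying | rate: 2.5% | term: 30 days",
    aggregator="aggregator-a",
)
print(record)
```

Returning `None` for lines that do not match keeps one malformed page from stopping a scheduled run, which matters when the collector is left running unattended for months.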

slide-13
SLIDE 13

How an R Chart Gets Generated

  • Data collection (Python)
  • Parse data & insert into db (Python, mongoDB)
  • Fetch & manipulate data (Python, mongoDB, R)
  • Output a .pdf image to server
  • New user input (HTML forms)
  • User interacts with data in browser

Stages labeled in the diagram: front end, back end, background scripts

slide-14
SLIDE 14

How Can We Trust Aggregator Data?

CDF of Standard Deviations of HYIP Lifetimes

  • Aggregators agree 80% of the time
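
One way to build such a plot is to compute, for each HYIP, the standard deviation of the lifetimes reported by the different aggregators, and then take the empirical CDF of those values. The sketch below assumes population standard deviation and uses made-up lifetimes; the slide does not say which estimator or agreement threshold was used, and the helper names are hypothetical.

```python
from bisect import bisect_right
from statistics import pstdev


def lifetime_spread(lifetimes_by_hyip):
    """Std dev of the reported lifetime (in days) for each HYIP seen by 2+ aggregators."""
    return {
        hyip: pstdev(days)
        for hyip, days in lifetimes_by_hyip.items()
        if len(days) >= 2
    }


def empirical_cdf(values):
    """Return sorted (x, F(x)) pairs, where F(x) is the share of values <= x."""
    ordered = sorted(values)
    n = len(ordered)
    return [(x, bisect_right(ordered, x) / n) for x in sorted(set(ordered))]


# Made-up example: lifetimes in days as reported by different aggregators.
reported = {
    "hyip-a": [10, 10, 10],  # all aggregators agree
    "hyip-b": [20, 24],      # aggregators disagree by a few days
    "hyip-c": [5],           # seen by one aggregator only, so no spread
}
spread = lifetime_spread(reported)
print(empirical_cdf(spread.values()))
```

A standard deviation of zero means every aggregator reported the same lifetime, so the height of the CDF at zero gives the share of HYIPs on which the aggregators fully agree.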
slide-15
SLIDE 15

How Long Do HYIPs Last Before Collapsing?

Survival function of HYIP Lifetimes

  • Most HYIPs collapse within a few weeks
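
A survival function gives, for each time t, the estimated share of HYIPs still alive after t days. Because data collection was still running, some HYIPs had not yet collapsed when observed, so the sketch below uses a Kaplan-Meier estimator, which handles such censored observations. This is an assumption: the slide does not name the estimator, and the data here is made up.

```python
def kaplan_meier(observations):
    """Kaplan-Meier survival estimate.

    observations -- list of (days, collapsed) pairs; collapsed=False means the
                    HYIP was still running when last observed (censored).
    Returns (t, S(t)) pairs at each observed collapse time.
    """
    event_times = sorted({days for days, collapsed in observations if collapsed})
    survival = 1.0
    curve = []
    for t in event_times:
        # HYIPs observed for at least t days were still at risk at time t.
        at_risk = sum(1 for days, _ in observations if days >= t)
        events = sum(1 for days, collapsed in observations if collapsed and days == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


# Made-up example: three collapses and one HYIP still running after 14 days.
sample = [(7, True), (14, True), (14, False), (30, True)]
for t, s in kaplan_meier(sample):
    print(t, round(s, 3))
```

When no observation is censored, this reduces to the plain empirical survival function, that is, the fraction of lifetimes longer than t.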
slide-16
SLIDE 16

What Factors Lead to Collapse?

Factors that lead to shorter HYIP lifespans:

  • Higher advertised rates of return
  • Shorter mandatory investment terms
slide-17
SLIDE 17

R vs. Google Charts

R

  • Useful if familiar with the dataset
  • Good at presenting aggregate summaries
  • Large learning curve, especially when you want to do something specific
  • More customizable
  • Most analysis techniques are available

Google Charts

  • Anyone can view & interact with the data
  • See a complete data distribution
  • Learning curve isn't bad
  • Not as customizable
  • Have to wait for updates for more functionality, or write your own

slide-18
SLIDE 18

How a Google Chart Gets Generated

  • Data collection (Python)
  • Parse data & insert into db (Python, mongoDB)
  • Fetch & manipulate data (Python, mongoDB, R)
  • Write JS & HTML page (Python, JS, HTML, CSS)
  • New user input (HTML forms)
  • User interacts with data in browser

Stages labeled in the diagram: background scripts, back end, front end
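
The "write JS & HTML page" step can be sketched as a Python function that fills an HTML template with chart data. The example below is an assumption about how such a page could be produced, not the code used for the talk: `render_scatter_page`, the template, and the data points are hypothetical, and it targets the current Google Charts loader (`loader.js`), which may differ from what was available in 2012.

```python
import json

# Literal braces in the JS are doubled so that str.format leaves them alone.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <script src="https://www.gstatic.com/charts/loader.js"></script>
  <script>
    google.charts.load('current', {{packages: ['corechart']}});
    google.charts.setOnLoadCallback(function () {{
      var data = google.visualization.arrayToDataTable({rows});
      var options = {{title: {title}, hAxis: {{title: {x_label}}}, vAxis: {{title: {y_label}}}}};
      var chart = new google.visualization.ScatterChart(document.getElementById('chart'));
      chart.draw(data, options);
    }});
  </script>
</head>
<body><div id="chart" style="width: 900px; height: 500px;"></div></body>
</html>
"""


def render_scatter_page(title, x_label, y_label, points):
    """Return an HTML page that draws a Google Charts scatter plot of (x, y) points."""
    rows = [[x_label, y_label]] + [[x, y] for x, y in points]
    return PAGE_TEMPLATE.format(
        rows=json.dumps(rows),        # JSON arrays are valid JS array literals
        title=json.dumps(title),      # json.dumps also quotes and escapes strings
        x_label=json.dumps(x_label),
        y_label=json.dumps(y_label),
    )


# Made-up points purely for illustration.
html = render_scatter_page(
    "Rate vs. lifetime", "Daily rate (%)", "Lifetime (days)", [(1.5, 120), (5.0, 20)]
)
with open("scatter.html", "w", encoding="utf-8") as handle:
    handle.write(html)
```

Serializing the data with `json.dumps` rather than hand-building JavaScript strings avoids quoting mistakes when HYIP names or labels contain unusual characters.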

slide-19
SLIDE 19

Distribution of HYIPs Around the World

Link

slide-20
SLIDE 20

Motion Charts

Link

slide-21
SLIDE 21

Variable Changes Over Time

cherryshares.com, aggregator rating Link

slide-22
SLIDE 22

Relationships Between Two Variables

Link

slide-23
SLIDE 23

Multi-Dimensional Scatterplot

Link

slide-24
SLIDE 24

Multi-Dimensional Scatterplot

Link

slide-25
SLIDE 25

General Programming Tips

  • Spend time on data quality
  • Organize your code, variable names, and files
  • Keep records of working examples
  • Plan out your code to maximize pattern capture
  • Error-catching, browser consoles, and regexes are friends
  • Test out chunks of code before putting them together
  • Google Tables take a while to load for large datasets
  • Google Charts Playground allows you to test code in their environment

slide-26
SLIDE 26

Future Work

  • Create an interactive web-based visualization for our dataset - some examples I made
  • Link scams together
  • Explore larger dataset
slide-27
SLIDE 27

Thanks!