Data Visualizations of HYIP Dataset
Quantifying the World
April 23, 2012
Jie Han
Financial Cryptography 2012
http://fc12.ifca.ai/pre-proceedings/paper_27.pdf
This could be you!!!
Overview
- 1. What's an HYIP?
- 2. Dataset
- 3. Processes
- 4. R graph examples
- 5. Google Chart examples
- 6. Some helpful hints
High Yield Investment Programs (HYIPs)
- Also known as Ponzi or pyramid schemes
- Promise high returns on investment
- Pay existing investors with revenue from new investors
- Unsustainable in the long run
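A toy simulation can make the "unsustainable" point concrete. The sketch below is not from the talk: the function name, the flat stream of new deposits, and the rule that principal is never returned are all illustrative assumptions. Each day the scheme owes interest on everything deposited so far and can pay it only out of cash brought in by new investors.

```python
def days_until_collapse(daily_rate, new_deposits_per_day, max_days=10_000):
    """Return the first day on which promised interest exceeds available cash.

    Toy model (assumptions, not from the talk): a fixed amount of new money
    arrives every day, the scheme owes `daily_rate` interest on all principal
    deposited so far, and principal is never returned.
    """
    cash = 0.0       # money the operator actually holds
    principal = 0.0  # total deposited by investors so far
    for day in range(1, max_days + 1):
        cash += new_deposits_per_day
        principal += new_deposits_per_day
        owed = principal * daily_rate
        if owed > cash:
            return day  # cannot cover today's payout
        cash -= owed
    return None  # did not collapse within max_days


if __name__ == "__main__":
    for rate in (0.01, 0.02, 0.05):
        print(rate, days_until_collapse(rate, 100.0))
```

In this model the obligations grow with every deposit while the inflow stays flat, so a higher promised rate brings the collapse day forward.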
Why are HYIPs a problem?
- Advertised as legitimate investments
- Sophisticated online ecosystem in support of the schemes
HYIP Website
HYIP Aggregator Websites
HYIP Variables
HYIP Lifetime
Typical life cycle of an HYIP:
About the Data
- Collection started 11/17/2010 and is still running
- Collected data from nine "aggregator" websites
- Total observations: 141k+
- Total HYIPs observed: 1,576+
Process
- Data collection (Python, crontab, mongoDB)
- Preliminary analysis (Python, R)
- Continue data collection, work on parsing all aggregators (Python)
- Look at what we have, decide on what we want (R)
- Difficulties in analyzing data -> create interactive data visualizations (Python, Google Charts, JS, HTML)
- Use new tools to look for patterns (browser & eyes)
How an R Chart Gets Generated
- Data collection (Python)
- Parse data & insert into db (Python, mongoDB)
- Fetch & manipulate data (Python, mongoDB, R)
- Output a .pdf image to server
- New user input (HTML forms)
- Users interact with data in browser
Diagram layers: Front End, Back End, Background scripts
How Can We Trust Aggregator Data?
CDF of Standard Deviations of HYIP Lifetimes
- Aggregators agree 80% of the time
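One way to build such a plot is sketched below. It assumes each HYIP has one reported lifetime (in days) per aggregator that listed it; the input layout, the function names, and the choice of the population standard deviation are assumptions, not details taken from the talk.

```python
from bisect import bisect_right
from statistics import pstdev


def lifetime_stdevs(lifetimes_by_hyip):
    """Standard deviation of reported lifetimes for each HYIP.

    HYIPs seen on fewer than two aggregators are skipped, since there is
    nothing to compare.
    """
    return {
        name: pstdev(days)
        for name, days in lifetimes_by_hyip.items()
        if len(days) >= 2
    }


def empirical_cdf(values):
    """Return (value, fraction of observations <= value) for each distinct value."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, bisect_right(ordered, v) / n) for v in sorted(set(ordered))]


if __name__ == "__main__":
    # Made-up lifetimes (days) as reported by different aggregators.
    reported = {
        "hyip-a": [30, 30, 31],
        "hyip-b": [12, 12],
        "hyip-c": [45, 60, 52],
        "hyip-d": [7],  # only one aggregator, so it is skipped
    }
    stdevs = lifetime_stdevs(reported)
    for value, fraction in empirical_cdf(stdevs.values()):
        print(f"stdev <= {value:.2f} days: {fraction:.0%} of HYIPs")
```

Reading the curve at a small standard deviation gives the share of HYIPs on which the aggregators roughly agree.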
How Long Do HYIPs Last Before Collapsing?
Survival function of HYIP Lifetimes
- Most HYIPs collapse within a few weeks
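The talk does not say how the survival function was estimated. Because data collection was still running, some HYIPs had not yet collapsed when observed, so the sketch below swaps in the standard Kaplan-Meier estimator, which handles such right-censored observations. The function name and the input layout are assumptions.

```python
def kaplan_meier(durations, collapsed):
    """Kaplan-Meier survival estimate as a list of (t, S(t)) pairs.

    durations[i] is the observed lifetime in days; collapsed[i] is False when
    the HYIP was still running at the end of observation (right-censored).
    One pair is returned per distinct collapse time.
    """
    event_times = sorted({t for t, c in zip(durations, collapsed) if c})
    survival = 1.0
    curve = []
    for t in event_times:
        # HYIPs still under observation just before time t
        at_risk = sum(1 for d in durations if d >= t)
        # HYIPs that collapsed exactly at time t
        deaths = sum(1 for d, c in zip(durations, collapsed) if c and d == t)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Made-up observations: lifetime in days, and whether collapse was seen.
    days = [10, 20, 20, 35, 50]
    seen_collapse = [True, True, False, True, False]
    for t, s in kaplan_meier(days, seen_collapse):
        print(f"day {t}: estimated survival {s:.2f}")
```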
What Factors Lead to Collapse?
Factors that lead to shorter HYIP lifespans:
- Higher advertised rates of return
- Shorter mandatory investment terms
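A first look at such a factor can be as simple as splitting the HYIPs at a threshold and comparing median lifetimes, as in the sketch below. This is a descriptive comparison only, not the analysis behind the slide, and it does not control for other variables. The field names and the records are made up for illustration.

```python
from statistics import median


def median_lifetime_by_group(records, key, threshold):
    """Median lifetime for records at or below vs. above `threshold` on `key`."""
    low = [r["lifetime_days"] for r in records if r[key] <= threshold]
    high = [r["lifetime_days"] for r in records if r[key] > threshold]
    return {
        "low": median(low) if low else None,
        "high": median(high) if high else None,
    }


if __name__ == "__main__":
    # Hypothetical records; real field names in the dataset may differ.
    records = [
        {"daily_rate_pct": 1.0, "lifetime_days": 120},
        {"daily_rate_pct": 1.5, "lifetime_days": 90},
        {"daily_rate_pct": 3.0, "lifetime_days": 20},
        {"daily_rate_pct": 4.0, "lifetime_days": 30},
        {"daily_rate_pct": 5.0, "lifetime_days": 14},
    ]
    print(median_lifetime_by_group(records, "daily_rate_pct", 2.0))
```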
R vs. Google Charts
R
- Useful if familiar with the dataset
- Good at presenting aggregate summaries
- Steep learning curve, especially when you want to do something specific
- More customizable
- Most analysis techniques are available
Google Charts
- Anyone can view & interact with the data
- See a complete data distribution
- Learning curve isn't bad
- Not as customizable
- Have to wait for updates for more functionality, or write your own
How a Google Chart Gets Generated
- Data collection (Python)
- Parse data & insert into db (Python, mongoDB)
- Fetch & manipulate data (Python, mongoDB, R)
- Write JS & HTML page (Python, JS, HTML, CSS)
- New user input (HTML forms)
- Users interact with data in browser
Diagram layers: Background scripts, Back End, Front End
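The "Write JS & HTML page" step can be sketched as a Python function that fills an HTML template with the data rows and lets Google Charts draw them in the browser. The function name, template, and sample points are assumptions. The loader URL and the `google.charts.load` call follow the currently documented Google Charts loader, which may differ from what was available at the time of the talk.

```python
import html
import json

PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>{title}</title>
  <script src="https://www.gstatic.com/charts/loader.js"></script>
  <script>
    google.charts.load('current', {{packages: ['corechart']}});
    google.charts.setOnLoadCallback(drawChart);
    function drawChart() {{
      var data = google.visualization.arrayToDataTable({rows});
      var options = {options};
      var chart = new google.visualization.ScatterChart(
          document.getElementById('chart'));
      chart.draw(data, options);
    }}
  </script>
</head>
<body>
  <div id="chart" style="width: 900px; height: 500px;"></div>
</body>
</html>
"""


def render_scatter_page(title, x_label, y_label, points):
    """Return an HTML page that draws `points` as a Google Charts scatter plot."""
    # First row is the header row expected by arrayToDataTable.
    rows = [[x_label, y_label]] + [list(p) for p in points]
    options = {
        "title": title,
        "hAxis": {"title": x_label},
        "vAxis": {"title": y_label},
    }
    return PAGE_TEMPLATE.format(
        title=html.escape(title),
        rows=json.dumps(rows),
        options=json.dumps(options),
    )


if __name__ == "__main__":
    # Made-up points: (advertised daily rate in %, lifetime in days).
    page = render_scatter_page(
        "Rate vs. lifetime", "Daily rate (%)", "Lifetime (days)",
        [(1.0, 120), (2.5, 40), (5.0, 14)],
    )
    with open("scatter.html", "w", encoding="utf-8") as fh:
        fh.write(page)
```

Serializing the rows with `json.dumps` keeps the Python side responsible for the data and the JavaScript side responsible only for drawing.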
Distribution of HYIPs Around the World
Motion Charts
Variable Changes Over Time
- Example: cherryshares.com, aggregator rating
Relationships Between Two Variables
Multi-Dimensional Scatterplot
General Programming Tips
- Spend time on data quality
- Organize your code, variable names, and files
- Keep records of working examples
- Plan out your code to maximize pattern capture
- Error-catching, browser consoles, and regexes are friends
- Test out chunks of code before putting them together
- Google Tables take a while to load for large datasets
- Google Charts Playground allows you to test code in their environment
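As an illustration of the regex and error-catching tips, the sketch below parses a plan description into numeric fields and sets aside anything it cannot parse instead of crashing. The listing format ("2.5% daily for 30 days"), the field names, and the function names are hypothetical; real aggregator pages differ.

```python
import re

# Hypothetical listing format, e.g. "2.5% daily for 30 days".
PLAN_PATTERN = re.compile(
    r"(?P<rate>\d+(?:\.\d+)?)\s*%\s*daily\s+for\s+(?P<days>\d+)\s+days?",
    re.IGNORECASE,
)


def parse_plan(text):
    """Extract the daily rate and term from one plan description."""
    match = PLAN_PATTERN.search(text)
    if match is None:
        raise ValueError(f"unrecognized plan description: {text!r}")
    return {
        "daily_rate_pct": float(match.group("rate")),
        "term_days": int(match.group("days")),
    }


def parse_all(texts):
    """Parse every description, collecting failures for later inspection."""
    parsed, rejected = [], []
    for text in texts:
        try:
            parsed.append(parse_plan(text))
        except ValueError:
            rejected.append(text)
    return parsed, rejected


if __name__ == "__main__":
    good, bad = parse_all(["2.5% daily for 30 days", "contact us for rates"])
    print(good)
    print(bad)
```

Keeping the rejected strings makes it easy to spot new page formats and extend the pattern, which ties back to spending time on data quality.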
Future Work
- Create an interactive web-based visualization for our dataset (some examples I made)
- Link scams together
- Explore larger dataset