Research paper Project proposal Austin Wilson Roberto Campos - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Research paper Project proposal

Austin Wilson Roberto Campos Isaac Shah

slide-2
SLIDE 2

Economic impact of epidemics and pandemics

slide-3
SLIDE 3

Market losses!

https://www.europarl.europa.eu/RegData/etudes/BRIE/2020/646195/EPRS_BRI(2020)646195_EN.pdf

Market losses from a pandemic could be up to $500 billion Lower-middle income countries are impacted more than high income countries

slide-4
SLIDE 4

Industries affected

The healthcare industry sees a huge spike in costs when a pandemic occurs. The insurance industry is also affected, because more people go to the doctor.

slide-5
SLIDE 5

Industries affected

Agricultural industry is adversely impacted.

  • In developed countries the agriculture industry is incentivized to prioritize spending on infectious disease prevention.
  • In less developed countries agricultural companies are not incentivized to spend on reducing infectious disease.
  • An infectious disease outbreak may originate in some of these less developed countries, the result being travel and trade isolation.

slide-6
SLIDE 6

Travel industry

  • People do not want to travel to places where the disease is running rampant
  • People don’t want to be on planes or ships where they think there might be an outbreak
  • Estimated $2.8 billion loss to Mexican travel industry from H1N1
slide-7
SLIDE 7

Time Series Data Mining by Philippe Esling

  • Data representation: how can a time series be represented, and what is its shape?
  • Similarity measurement: how do we compare two time series objects?
  • Indexing method: how can we speed up query time for big data?
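
The similarity question above can be made concrete with a small sketch. The slide does not name a specific measure, so the two shown here (Euclidean distance and dynamic time warping) are assumptions chosen for illustration, and the sample series are made up.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance; only defined for equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Dynamic time warping cost, using |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Made-up example: the same shape, shifted by one time step.
series_a = [0, 1, 2, 3, 2, 1, 0]
series_b = [0, 0, 1, 2, 3, 2, 1]

print(euclidean_distance(series_a, series_b))  # penalizes the shift at every point
print(dtw_distance(series_a, series_b))        # warping absorbs most of the shift
```

Euclidean distance is cheap but sensitive to time shifts; warping tolerates them at quadratic cost, which is one reason the indexing question on this slide matters for large collections.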
slide-8
SLIDE 8

Clustering

  • Whole series clustering tries to maximize the distance between different clusters while minimizing the variance within each cluster
  • We can also use subsequence clustering, where we try to split a single time series into different clusters
  • Classification is similar to whole series clustering, except that we are given sets of time series and a label for each set; the task is to train a classifier to label new time series
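
As a rough illustration of the classification task described above, here is a minimal one-nearest-neighbour sketch. The choice of 1-NN with Euclidean distance is an assumption (the slide does not name a classifier), and the training series and the labels "rising" and "falling" are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify_1nn(train, query):
    """Return the label of the training series closest to the query.

    train is a list of (series, label) pairs.
    """
    _, label = min(train, key=lambda pair: euclidean(pair[0], query))
    return label


# Hypothetical labelled training set.
training_set = [
    ([1, 2, 3, 4, 5], "rising"),
    ([2, 3, 4, 5, 6], "rising"),
    ([5, 4, 3, 2, 1], "falling"),
    ([6, 5, 4, 3, 2], "falling"),
]

print(classify_1nn(training_set, [1, 2, 4, 4, 6]))
print(classify_1nn(training_set, [7, 5, 3, 2, 2]))
```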

slide-9
SLIDE 9

Segmentation

  • Create an accurate approximation while reducing the dimensionality of the time series
  • Keep the essential features and drop redundant or uninformative features

slide-10
SLIDE 10

Piecewise linear approximation

  • One of the most successful approaches of segmentation over the years
  • Try to split the time series up into segments
  • Fit individual polynomial or linear curves to each segment
  • Sliding windows

○ Keep growing a window until it exceeds an error threshold

  • Top-down

○ Recursively partition a data set until some stopping criterion is met

  • Bottom-up

○ Start from the finest segments and iteratively merge segments
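
Two of the strategies listed above, the sliding window and bottom-up merging, can be sketched as follows. This is a minimal illustration under stated assumptions: the error measure (sum of squared residuals of a least-squares line), the `max_error` threshold, and the sample series are all choices made for this example. Top-down partitioning is not shown.

```python
def fit_error(values, start, end):
    """Sum of squared residuals of a least-squares line over values[start:end]."""
    ys = values[start:end]
    n = len(ys)
    if n < 3:
        return 0.0  # one or two points are always fit exactly by a line
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def sliding_window(values, max_error):
    """Grow each window until adding one more point would exceed max_error."""
    segments = []
    start = 0
    while start < len(values):
        end = start + 1
        while end < len(values) and fit_error(values, start, end + 1) <= max_error:
            end += 1
        segments.append((start, end))  # half-open interval [start, end)
        start = end
    return segments


def bottom_up(values, max_error):
    """Start from two-point segments and repeatedly merge the cheapest adjacent pair."""
    segments = [(i, min(i + 2, len(values))) for i in range(0, len(values), 2)]
    while len(segments) > 1:
        costs = [
            fit_error(values, segments[i][0], segments[i + 1][1])
            for i in range(len(segments) - 1)
        ]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break  # every possible merge would exceed the threshold
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
    return segments


# Made-up series: a rise, a fall, then a flat stretch.
series = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0, 0, 0, 0]
print(sliding_window(series, max_error=0.5))
print(bottom_up(series, max_error=0.5))
```

Both functions return half-open index ranges that cover the series without gaps. The sliding window works online, one point at a time, while bottom-up needs the whole series up front.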

slide-11
SLIDE 11

Data-adaptive vs non-data-adaptive vs model-based

  • Data-adaptive: parameters are modified based on the values of consecutive segments
  • Non-data-adaptive: parameters of the transformation remain the same for every series
  • Model-based: assume the time series has been produced by an underlying model and find the parameters of the model
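
To illustrate the non-data-adaptive case, here is a small sketch of Piecewise Aggregate Approximation (PAA), which averages fixed, equal-width frames regardless of the values in the series. PAA is not named on the slide; it is used here as an assumed example of this category, and the input values are made up.

```python
def paa(values, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame.

    Frame boundaries depend only on the series length and n_segments,
    never on the data values, which is what makes this non-data-adaptive.
    """
    n = len(values)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(values)")
    reduced = []
    for k in range(n_segments):
        start = k * n // n_segments
        end = (k + 1) * n // n_segments
        frame = values[start:end]
        reduced.append(sum(frame) / len(frame))
    return reduced


# Eight points reduced to four frame means.
print(paa([1, 3, 2, 4, 10, 12, 11, 13], 4))  # [2.0, 3.0, 11.0, 12.0]
```

A data-adaptive method would instead place the frame boundaries where the series changes, as the segmentation approaches on the previous slide do.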

slide-12
SLIDE 12

Data

COVID - Detailed Novel Corona Virus 2019 Dataset
COVID - South Korea https://www.kaggle.com/kimjihoo/coronavirusdataset
Stocks https://www.kaggle.com/borismarjanovic/price-volume-data-for-all-us-stocks-etfs

slide-13
SLIDE 13

Task Perform + Task Division

Research: Isaac S.
Study past events and find datasets that we can use to analyze the initial problems in past situations. Locate when the problem first began, when the situation plateaued, and when the situation returned to normal.

Tableau: Isaac
Visualization will be done before choosing which models to work with. Try to find visible trends. Seek patterns and similarities between events. Try to map each case on a US map and find if there is a correlation between the cases and stock market performance.

Modeling: Roberto, Austin
Explore which types of models can be used to solve each problem. For example, should we use linear regression or logistic regression, and can we find which variables are important? Is a fully connected neural network a useful method for the problem we are currently analyzing? Should we use a CNN to identify important features? Can we use an SVM to categorize the different events from the past and categorize the current event, COVID-19?

Data Pre-Processing: Roberto, Austin
Data pre-processing will play an important role. We have to analyze the types of data we will be inputting into each model. Different models require different processes.
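
One possible way to locate the three phases named in the research task is sketched below. It treats the deepest drawdown in a price series as the event, which is an assumption made for this example rather than a method the proposal commits to. The function name `event_phases` and the price values are hypothetical, and prices are assumed to be positive.

```python
def event_phases(prices):
    """Return (onset, trough, recovery) indices for the deepest drawdown.

    onset:    index of the running peak just before the largest drop
    trough:   index of the lowest point of that drop
    recovery: first index after the trough where the price is back at or
              above the onset price, or None if it never recovers
    """
    peak_idx = 0
    best = (0.0, 0, 0)  # (drawdown fraction, onset index, trough index)
    for i, price in enumerate(prices):
        if price > prices[peak_idx]:
            peak_idx = i
        drawdown = (prices[peak_idx] - price) / prices[peak_idx]
        if drawdown > best[0]:
            best = (drawdown, peak_idx, i)
    _, onset, trough = best
    recovery = next(
        (i for i in range(trough + 1, len(prices)) if prices[i] >= prices[onset]),
        None,
    )
    return onset, trough, recovery


# Made-up daily closing prices for illustration only.
prices = [100, 102, 101, 95, 80, 78, 85, 93, 103, 104]
print(event_phases(prices))  # (1, 5, 8)
```

The same indices could then be compared across past events, or lined up against case counts, before any model is chosen.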

slide-14
SLIDE 14

Tools

Jupyter
Interactive notebook to visually present our models in detail.

Python
Our language of choice to pre-process data and create ML models. We are interested in using an ANN or CNN for our model. We will also consider simple linear or log-linear models.

R
Used in support of Python, as R is a great statistical tool that provides statistical inference. It can help us show mathematically whether there is a correlation among the variables in the question we seek to answer.

Tableau
A versatile visualization tool that creates robust custom graphs.

slide-15
SLIDE 15

Progress + Experience

Initial design/case study/prototype/experiments
With the combined expertise of the team, we will be able to find and analyze data that can help us answer our problem statement. Once the data is gathered, quick visualizations will be rendered to gain further insights. All three members of the team have extensive knowledge of Tableau. Models can be easily prototyped with the Sklearn and TensorFlow libraries. Two members of the team have experience using these libraries and have access to consulting outside of the classroom.

Progress milestones: what will be completed by weeks 11 and 14
By weeks 11 and 14, the team will have developed visual aids and prototype models, and will begin refining them and addressing specific details, for example increasing the accuracy of our model.

Experience
  • Modeling with Sklearn and TensorFlow
  • Modeling with R
  • Data pre-processing
  • Tableau
  • Google Colab for big data

slide-16
SLIDE 16

sources

Economic impact of epidemics/pandemics

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6491983/

Time Series Data Mining

https://www.researchgate.net/publication/261722458_Time-Series_Data_Mining

slide-17
SLIDE 17

Thanks!