A Look back on STOR 390 4/27/17 Where did this course come from? - - PowerPoint PPT Presentation


slide-1
SLIDE 1

A Look back on STOR 390

4/27/17

slide-2
SLIDE 2

Where did this course come from?

Data@Carolina grant
Iain Carmichael, Brendan Brown, Varun Goel, Dylan Glotzer, Marshall Markham, Shankar Bhamidi and many, many more: https://idc9.github.io/stor390/course_info/acknowledgments.html

slide-3
SLIDE 3

Outline

What you learned (and what you didn’t)
Why it’s important
Broader perspective on data science

slide-4
SLIDE 4

What skills you learned

Programming in R
Working with data
Statistical modeling
Effective communication

slide-5
SLIDE 5

You learned how to program in R

Loops
If/else
Boolean logic
Data types
  • vectors, lists, strings, tibbles…

slide-6
SLIDE 6

You can use RStudio

R, R Markdown, Shiny
Reports, data analysis, dashboards, interactive visualizations, resumes, blog posts, websites
https://shiny.rstudio.com/gallery/
http://rmarkdown.rstudio.com/gallery.html

slide-7
SLIDE 7

You can work with tidy data

Visualization
  • ggplot, shiny
Data munging/manipulation/transformation
  • dplyr: select, mutate, group_by
  • joins: filtering, mutating, etc.
Loading data
  • read_csv
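A minimal sketch of the dplyr verbs listed above, assuming the dplyr package is installed. The `rides` table and its column names are made up for illustration and are not from the course data sets.

```r
library(dplyr)

# Hypothetical toy data; the table and column names are assumptions.
rides <- tibble(
  station = c("A", "A", "B", "B"),
  minutes = c(10, 20, 30, 50),
  member  = c(TRUE, FALSE, TRUE, TRUE)
)

summary_tbl <- rides %>%
  select(station, minutes) %>%       # keep only the columns we need
  mutate(hours = minutes / 60) %>%   # add a derived column
  group_by(station) %>%              # one group per station
  summarise(
    mean_minutes = mean(minutes),    # average ride length per station
    total_hours  = sum(hours)        # total ride time per station
  )

summary_tbl
```

Each verb takes a table and returns a table, which is why the steps chain together with the pipe.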

slide-8
SLIDE 8

You can work with text data

Regular expressions
  • str_match, str_extract
Natural language processing
  • tidytext
  • unnest tokens, document term matrix, tf-idf
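A small sketch of the tokenize, count, tf-idf workflow, assuming the dplyr and tidytext packages are installed. The two-document corpus is invented for illustration.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus, made up for illustration.
docs <- tibble(
  doc  = c("d1", "d2"),
  text = c("the cat sat", "the dog ran")
)

tf_idf_tbl <- docs %>%
  unnest_tokens(word, text) %>%   # one row per (document, word)
  count(doc, word) %>%            # term counts per document
  bind_tf_idf(word, doc, n)       # adds tf, idf and tf_idf columns

tf_idf_tbl
```

Because "the" appears in every document, its idf is log(1) = 0, so its tf-idf is 0. Words that appear in only one document get a positive score.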

slide-9
SLIDE 9

You have spent some time working with data

data.gov
Biodiversity in North Carolina
MOMA
IMDB
Bike Sharing
iPhone moment tracking
Beauty and the Beast
Harry Potter
Final projects

slide-10
SLIDE 10

You know how to acquire data for yourself

Web scraping
  • rvest, SelectorGadget
APIs
  • geocoding with Google Maps
  • Twitter

slide-11
SLIDE 11

You have seen different types of analyses

Exploratory
Inferential
Predictive

slide-12
SLIDE 12

You can do statistical modeling/machine learning

Linear regression
Classification
  • KNN, Nearest Centroid, SVM
Clustering
  • K-means
Model selection/tuning
  • cross-validation
Feature engineering
  • factors, interactions, polynomial terms
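A rough sketch of tuning KNN with cross-validation, using the built-in `iris` data and `knn()` from the `class` package. The fold count, candidate values of k, and seed are arbitrary choices for illustration, not the course's settings. For simplicity the features are standardized once on the full data; a stricter analysis would standardize inside each fold.

```r
library(class)  # provides knn(); ships with standard R installations

set.seed(390)   # arbitrary seed so the fold assignment is reproducible

x <- scale(iris[, 1:4])   # standardize features so distances are comparable
y <- iris$Species

n_folds <- 5
# Randomly assign each observation to one of the folds.
fold_id <- sample(rep(1:n_folds, length.out = nrow(x)))

candidate_k <- c(1, 3, 5, 7, 9)

# For each candidate k, average the misclassification rate over the folds.
cv_error <- sapply(candidate_k, function(k) {
  fold_errors <- sapply(1:n_folds, function(f) {
    held_out <- fold_id == f
    pred <- knn(train = x[!held_out, ], test = x[held_out, ],
                cl = y[!held_out], k = k)
    mean(pred != y[held_out])   # error rate on the held-out fold
  })
  mean(fold_errors)
})

# Pick the k with the lowest cross-validated error.
best_k <- candidate_k[which.min(cv_error)]
```

The same loop works for any tuning parameter: hold out a fold, fit on the rest, score on the held-out fold, then average.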

slide-13
SLIDE 13

You have learned about effective communication

General principles/advice
  • focus on message
  • adapt to the audience
Effective visual communication
  • static plots (ggplot), dynamic plots (Shiny)
Literate programming
  • R Markdown

slide-14
SLIDE 14

You have done a full data analysis

Ask a question
Acquire data
Analyze some data
Communicate results

slide-15
SLIDE 15

Higher level skills

Programming
Ability to acquire data
Identify problems that can be solved with data
Classify data problems
Communication

slide-16
SLIDE 16

What you did not learn

More advanced

  • programming
  • statistics

Lots of experience

slide-17
SLIDE 17

Be aware you know enough to be dangerous

Very easy to make bad but convincing data-driven arguments
Just because an algorithm says something does not imply it is meaningful/correct

slide-18
SLIDE 18

Inference is hard

Lots of great, existing statistics courses teach you inference
Experience
Critical thinking

slide-19
SLIDE 19

Why these skills are important

Better understanding of

  • data
  • science
  • technology

See potential opportunities
Empower you to do ______

slide-20
SLIDE 20

Understand strengths and limitations of data, science and technology

What is easy?
What is hard?
What can go wrong?

slide-21
SLIDE 21

Look for potential opportunities

Data can get at a lot of problems
Basic understanding can go a long way

slide-22
SLIDE 22

The ability to work with data empowers you to do _______ better

Whatever it is you are interested in
  • medicine, sports, business, law, literature, “artificial intelligence”

slide-23
SLIDE 23

Broader takeaways

Teach yourself
Skepticism
Yak-shaving
Problem solving
Trade-offs

slide-24
SLIDE 24

Teach yourself

MOOCs
  • Coursera, edX, Udacity
Textbooks
Stack Exchange

slide-25
SLIDE 25

Problem solving

Break up a problem into smaller sub-problems
Details

slide-26
SLIDE 26

–Mike Tyson

“Everyone has a plan until they get punched in the mouth.”

slide-27
SLIDE 27

Problem solving

Break up a problem into smaller sub-problems
Details
Adapt
Persistence

slide-28
SLIDE 28

Be unafraid of Yak Shaving

Yak Shaving (noun)
Any apparently useless activity which, by allowing you to overcome intermediate difficulties, allows you to solve a larger problem.

https://en.wiktionary.org/wiki/yak_shaving

slide-29
SLIDE 29

–Mark Twain

“There are three kinds of lies: lies, damned lies, and statistics.”

slide-30
SLIDE 30

Be skeptical

Where did the data come from?
  • biases, is it representative?
Does the argument hold merit?
  • where might it have gone wrong?

slide-31
SLIDE 31

–Milton Friedman

“There ain’t no such thing as a free lunch.”

slide-32
SLIDE 32

There are always trade-offs

Time spent writing vs. quality
More rigorous analysis vs. time/resources
The best model depends on the data
Just because you can doesn’t mean you should

slide-33
SLIDE 33

Started the course with a quote from George Box

slide-34
SLIDE 34

–George Box

“All models are wrong but some models are useful.”

slide-35
SLIDE 35

Box quote summarizes data science

Optimism/tenacity

  • Maybe we can solve this problem?

Skepticism

  • Why should I believe your solution?

Science + engineering

slide-36
SLIDE 36

Thanks!

What could we do to make this course better?
Stay in touch!