Can I add this class? Lulu Liang (ll882) is handling the waiting - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Can I add this class?

Lulu Liang (ll882) is handling the waiting list. We expect all majors and minors to be able to enroll.

slide-2
SLIDE 2

INFO 2950: Intro to Data Science

  • Prof. David Mimno
slide-3
SLIDE 3

Thank you for your interest, but...

This class is required for InfoSci majors and minors. If you do not need it, please consider other options.

slide-4
SLIDE 4

Where to find things

  • Course website: http://mimno.infosci.cornell.edu/info2950
  • Question answering: https://campuswire.com/c/G7E579AA4 (code 3402)

  • Assignments: CMS (enrollment will sync every 24 hrs)
slide-5
SLIDE 5

Textbooks

  • VanderPlas, Python Data Science Handbook
  • James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning

Both are free, links from course website

slide-6
SLIDE 6

The wheat is stored...
The information is stored...
The data is stored...

slide-7
SLIDE 7
slide-8
SLIDE 8

Statistics (20th century version)

  • Experiments are designed
  • Computation is hard
  • Data is expensive
  • Goal is causation

Wikipedia, Fisher; Gosset

slide-9
SLIDE 9

Data Science (21st century)

  • Observations are gathered opportunistically
  • Computation is cheap
  • Data is abundant
  • Goal is prediction

linksys.com

slide-10
SLIDE 10

Drew Conway's Venn diagram

http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram

slide-11
SLIDE 11

Data science pattern

  • 1. Map real-world entities to a computational representation
  • 2. Perform mathematical operations on those representations
  • 3. Interpret results of those operations
slide-12
SLIDE 12

Data science pattern

  • 1. Map real-world entities to a computational representation
  • 2. Perform mathematical operations on those representations
  • 3. Interpret results of those operations
  • 4. [go to step 1]
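
A minimal sketch of the pattern, not taken from the slides: sentences stand in for the real-world entities, word counts are the representation, cosine similarity is the mathematical operation, and a threshold supplies the interpretation. The function names and the 0.5 threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def to_vector(text):
    """Step 1: map an entity (a sentence) to a representation (word counts)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Step 2: perform a mathematical operation on two representations."""
    # A Counter returns 0 for missing words, so unshared words add nothing.
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def interpret(score, threshold=0.5):
    """Step 3: interpret the result (the threshold is an arbitrary assumption)."""
    return "similar" if score >= threshold else "different"


wheat = to_vector("The wheat is stored")
data = to_vector("The data is stored")
score = cosine_similarity(wheat, data)
print(score, interpret(score))  # 0.75 similar
```

Step 4 is where the choices get questioned: whether word counts are the right representation, and whether 0.5 is a meaningful cutoff, are decisions to revisit on the next pass.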
slide-13
SLIDE 13

Math questions

  • What representations are good for supporting mathematical operations?
  • How can we create accurate mathematical models of real-world events?
  • How can we convince ourselves and others that this isn't just randomness?
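
One common way to approach the randomness question is a permutation test. This is a sketch under stated assumptions, not a method the slides name: shuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed. The function name, the shuffle count, and the example numbers are all made up for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate how often random relabeling matches the observed gap in means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Hypothetical measurements for two groups.
p = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
print("estimated p-value:", p)
```

A small value means random relabeling rarely reproduces the gap. It says nothing about whether the data is reliable or the question is the right one.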

slide-14
SLIDE 14

The math is the easy part

  • Is the data reliable and complete?
  • Are we answering the right question?
  • How can we balance between what is useful and what is easily available?
  • Will anyone believe that we have the right answer? Should they?

Wikipedia "Town hall meeting"

slide-15
SLIDE 15

Live experiment! Find a study group

https://forms.gle/NCZ6CSMB6qiiasfUA

slide-16
SLIDE 16

Where to find things

  • Course website: http://mimno.infosci.cornell.edu/info2950
  • Question answering: https://campuswire.com/c/G7E579AA4 (code 3402)

  • Assignments: CMS (enrollment will sync every 24 hrs)
slide-17
SLIDE 17

Weekly pattern

  • Monday: Mimno office hours, 1:30-3:30, Gates 205
  • Tuesday: Presentation of new material
  • Wednesday
  • Thursday: Presentation of new material; Homework due 11:59pm
  • Friday: Lab sessions: practice and discuss

slide-18
SLIDE 18

For Friday: Install Python 3

  • Anaconda is the easiest, most reliable installation: https://anaconda.com/download
  • NO PYTHON 2.
      ○ To check: type print "hello" with no (parentheses). You should get an error.

We will work in notebooks, scripts, and the command line (>>>)
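
The same check can be run as a short script. This is a sketch, and the function names are made up for illustration. It reports the interpreter version and confirms that the old-style print statement is rejected, which is the error the manual check should produce.

```python
import sys


def running_python3():
    """True when the interpreter is Python 3 or later."""
    return sys.version_info[0] >= 3


def print_statement_rejected():
    """True when print "hello" (no parentheses) fails to compile."""
    try:
        compile('print "hello"', "<check>", "exec")
    except SyntaxError:
        return True
    return False


print("Python version: " + sys.version.split()[0])
print("Python 3 or later: " + str(running_python3()))
print("Old-style print rejected: " + str(print_statement_rejected()))
```

Save it as a file and run it from the command line, or paste it into a notebook cell. Both checks should report True on a correct installation.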

slide-19
SLIDE 19

RIP Python 2

Wikipedia, "Headstone"

slide-20
SLIDE 20

How to do well in this class

  • Show up
  • Don't just read, test yourself
  • Start early
  • Snacks!
  • Healthy sleep

slide-21
SLIDE 21

Can I add this class?

Lulu Liang (ll882) is handling the waiting list. We expect all majors and minors to be able to enroll.