Web Scraping With Python: WEB SCRAPING IN PYTHON, Thomas - - PowerPoint PPT Presentation


SLIDE 1

Web Scraping With Python

WEB SCRAPING IN PYTHON

Thomas Laetsch

Data Scientist, NYU

SLIDE 2

Business Savvy

What are businesses looking for?
Comparing prices
Satisfaction of customers
Generating potential leads
...and much more!

SLIDE 3

It's Personal

What could you do?
Search for your favorite memes on your favorite sites.
Automatically look through classified ads for your favorite gadgets.
Scrape social site content looking for hot topics.
Scrape cooking blogs looking for particular recipes, or recipe reviews.
...and much more!

SLIDE 4

About My Work

SLIDE 5

Pipe Dream

SLIDE 6

Pipe Dream: Setup

Setup
Understand what we want to do.
Find sources to help us do it.

SLIDE 7

Pipe Dream: Acquisition

Acquisition
Read in the raw data from online.
Format these data to be usable.

SLIDE 8

Pipe Dream: Processing

Processing
Many options!

SLIDE 9

How do you do?

Our Focus
Acquisition! (Using scrapy via Python)

SLIDE 10

Are you in?

SLIDE 11

HyperText Markup Language

SLIDE 12

The main example

SLIDE 13

HTML tags

<html> ... </html>
<body> ... </body>
<div> ... </div>
<p> ... </p>
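To see how these paired tags nest, here is a minimal sketch that walks a small page with Python's standard-library html.parser and records each opening tag with its depth. The page content and the class name TagDepthRecorder are assumptions made up for illustration; the course itself uses scrapy rather than this parser.

```python
from html.parser import HTMLParser

# Hypothetical page, invented for this illustration.
html = """
<html>
  <body>
    <div>
      <p>Hello World!</p>
    </div>
    <p>Goodbye!</p>
  </body>
</html>
"""


class TagDepthRecorder(HTMLParser):
    """Record every opening tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.seen = []  # list of (depth, tag name) pairs

    def handle_starttag(self, tag, attrs):
        self.seen.append((self.depth, tag))
        self.depth += 1  # children of this tag sit one level deeper

    def handle_endtag(self, tag):
        self.depth -= 1  # the closing tag ends this level


recorder = TagDepthRecorder()
recorder.feed(html)

# Print the tags indented by depth, which mirrors the nesting.
for depth, tag in recorder.seen:
    print("  " * depth + "<" + tag + ">")
```

The indentation in the printed output shows which tags are wrapped inside which: html contains body, and body contains both a div and a p.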

SLIDE 14

The HTML tree

SLIDE 15

The HTML tree: Example 1

SLIDE 16

The HTML tree: Example 2

SLIDE 17

Introduction to HTML Outro

SLIDE 18

HTML Tags and Attributes

SLIDE 19

Do we have to?

Information within HTML tags can be valuable
Extract link URLs
Easier way to select elements

SLIDE 20

Tag, you're it!

We've seen tag names such as html, div, and p. The attribute name is followed by = followed by information assigned to that attribute, usually quoted text.
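As a rough illustration of the name = "value" pattern, the sketch below feeds one tag to the standard-library html.parser, which hands back each attribute as a (name, value) pair. The tag, its attribute values, and the class name AttributeCollector are assumptions invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical tag, invented for this illustration.
snippet = '<div id="unique-id" class="some class">div element contents</div>'


class AttributeCollector(HTMLParser):
    """Collect each opening tag with its attributes as a dictionary."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (attribute name, attribute value) pairs.
        self.found.append((tag, dict(attrs)))


collector = AttributeCollector()
collector.feed(snippet)
print(collector.found)
```

The quotes around each value are part of the HTML syntax only; the parser returns the text between them.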

SLIDE 21

Let's "div"vy up the tag

id attribute should be unique
class attribute doesn't need to be unique
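One way to check this convention on a page is to count how often each id and class value occurs. The sketch below does that with the standard library; the three div elements and the class name IdClassCounter are assumptions made up for the example.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical markup: three divs sharing one class, each with its own id.
html = """
<div id="header" class="section">...</div>
<div id="main" class="section">...</div>
<div id="footer" class="section">...</div>
"""


class IdClassCounter(HTMLParser):
    """Count occurrences of every id value and every class name."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "id" in attrs:
            self.ids[attrs["id"]] += 1
        # A class attribute may hold several space-separated class names.
        for name in (attrs.get("class") or "").split():
            self.classes[name] += 1


counter = IdClassCounter()
counter.feed(html)

# Any id seen more than once breaks the "id should be unique" convention.
duplicate_ids = [value for value, n in counter.ids.items() if n > 1]
print("duplicate ids:", duplicate_ids)
print("class usage:", dict(counter.classes))
```

Here no id repeats, while the single class name is shared by all three elements.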

SLIDE 22

"a" be linkin'

a tags are for hyperlinks
href attribute tells what link to go to
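Extracting link URLs then amounts to reading the href attribute of every a tag. A minimal sketch with the standard library follows; the paragraph, both URLs, and the class name LinkCollector are placeholders invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical paragraph containing two hyperlinks.
html = (
    '<p>Visit <a href="https://www.example.com">this page</a> '
    'or read <a href="/about">more about us</a>.</p>'
)


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href":
                self.links.append(value)


collector = LinkCollector()
collector.feed(html)
print(collector.links)
```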

SLIDE 23

Tag Traction

SLIDE 24

Et Tu, Attributes?

SLIDE 25

Crash Course X

SLIDE 26

Another Slasher Video?

xpath = '/html/body/div[2]'

Simple XPath:
Single forward-slash / used to move forward one generation.
Tag names between slashes give direction to which element(s).
Brackets [] after a tag name tell us which of the selected siblings to choose.
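The sketch below tries this path on a tiny page. To stay dependency-free it uses the standard library's xml.etree.ElementTree instead of scrapy; that module supports only a subset of XPath and expects paths relative to the element you search from, so '/html/body/div[2]' is written as './body/div[2]' starting at the html root. The page content is an assumption, and it is deliberately well-formed XML because ElementTree cannot parse loose HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical page, written as well-formed XML so ElementTree can parse it.
html = """
<html>
  <body>
    <div><p>first div</p></div>
    <div><p>second div</p></div>
  </body>
</html>
"""

root = ET.fromstring(html)  # root is the <html> element

# Equivalent of xpath = '/html/body/div[2]', relative to <html>:
#   /body   moves one generation down from html to body
#   /div    moves one more generation down to the div children
#   [2]     picks the second of those div siblings (counting starts at 1)
matches = root.findall("./body/div[2]")

texts = [div.find("p").text for div in matches]
print(texts)
```

Note that the bracketed index counts from 1, unlike Python list indexing.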

SLIDE 27

Another Slasher Video?

xpath = '/html/body/div[2]'

SLIDE 28

Slasher Double Feature?

Direct to all table elements within the entire HTML code:

xpath = '//table'

Direct to all table elements which are descendants of the 2nd div child of the body element:

xpath = '/html/body/div[2]//table'
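Both double-slash patterns can be tried out with a short sketch. As an assumption made to avoid extra dependencies, it uses the standard library's xml.etree.ElementTree, whose limited XPath support needs paths relative to the root element, so '//table' becomes './/table' and '/html/body/div[2]//table' becomes './body/div[2]//table'. The page and its cell labels A, B, C are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with three tables: one in the first div of body,
# two in the second div (one of them nested a level deeper).
html = """
<html>
  <body>
    <div>
      <table><tr><td>A</td></tr></table>
    </div>
    <div>
      <section>
        <table><tr><td>B</td></tr></table>
      </section>
      <table><tr><td>C</td></tr></table>
    </div>
  </body>
</html>
"""

root = ET.fromstring(html)  # root is the <html> element


def cell_labels(tables):
    """Return the text of the first cell of each table."""
    return [table.find("./tr/td").text for table in tables]


# '//table': every table, at any depth in the document.
all_tables = cell_labels(root.findall(".//table"))

# '/html/body/div[2]//table': tables at any depth below the 2nd div of body.
second_div_tables = cell_labels(root.findall("./body/div[2]//table"))

print(all_tables)
print(second_div_tables)
```

The double slash skips over any number of generations, which is why table B is found even though it is not a direct child of the second div.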

SLIDE 29

Ex(path)celent