NTTS 2015 - Session 6A Big data sources: web scraping and smart meters

SLIDE 1

www.statistik.at | We move information

Automatic price collection on the internet (web scraping)

Ingolf Boettcher

Brussels, 10 March 2015

NTTS 2015 - Session 6A – Big data sources: web scraping and smart meters

SLIDE 2

www.statistik.at Slide 2 | 10.03.2015

Web scraping

There is a huge amount of data on the internet:

<HTML> <HEAD> <TITLE> DATA </TITLE> </HEAD> </HTML>

How can we best collect/scrape/harvest data from there for statistical purposes?

SLIDE 3

Web scraping

Internet data collection

Minimum goal for (price) statistics:

Turn website content into a spreadsheet
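As a rough illustration of this minimum goal, the sketch below turns an HTML table into CSV text that a spreadsheet program can open. It uses only the Python standard library. The sample markup, the product names and prices, and the names `TableParser` and `html_table_to_csv` are invented for the example and do not come from the presentation.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, or None
        self._cell = None   # text pieces of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table cells found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page fragment; a real shop page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price (EUR)</th></tr>
  <tr><td>Milk 1 l</td><td>1.09</td></tr>
  <tr><td>Bread 500 g</td><td>2.49</td></tr>
</table>
"""

CSV_TEXT = html_table_to_csv(SAMPLE_HTML)
print(CSV_TEXT)
```

Real product pages rarely present prices in one clean table, so this is only the simplest case of the website-to-spreadsheet step.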

SLIDE 4

Web scraping

Internet data collection options:

  • 1. Manual price collection
  • 2. Develop an API / web scraper

    2.1 by writing custom computer code
    2.2 by using point-and-click web tools
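To give a sense of what option 2.1 involves, here is a minimal sketch of custom scraper code using only the Python standard library. It assumes, purely for illustration, that a shop marks prices up as `<span class="price">€ 1,09</span>` with a decimal comma and a dot as the thousands separator. The markup, the sample products, and the names `PriceParser`, `parse_price` and `extract_prices` are hypothetical.

```python
from decimal import Decimal
from html.parser import HTMLParser


def parse_price(text):
    """Convert a price string such as '€ 1.299,00' to a Decimal."""
    cleaned = text.replace("€", "").replace("EUR", "").strip()
    # Assumed format: '.' separates thousands, ',' is the decimal mark.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


class PriceParser(HTMLParser):
    """Collects the content of every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        # The tag name and class value are site-specific assumptions.
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(parse_price(data))

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False


def extract_prices(html):
    """Return all prices found in `html` as a list of Decimal values."""
    parser = PriceParser()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical page fragment standing in for a downloaded product page.
SAMPLE_HTML = (
    '<div class="product"><h2>Milk 1 l</h2>'
    '<span class="price">€ 1,09</span></div>'
    '<div class="product"><h2>Laptop</h2>'
    '<span class="price">€ 1.299,00</span></div>'
)

PRICES = extract_prices(SAMPLE_HTML)
print(PRICES)
```

The tag and class names are hard-coded for one site layout. If the shop changes its markup, the code has to be changed too, which is the kind of upkeep a custom scraper needs.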

SLIDE 5

Web scraping

Reasons for not writing your own web scraper

An IT developer is needed, therefore:

  • Expensive
  • Inflexible
  • Even maintenance cannot be handled by CPI staff

SLIDE 6

Web scraping

Reasons to use point-and-click web tools for web scraping

No IT developer is needed, therefore:

  • Cheap
  • Flexible
  • No programming skills required

SLIDE 7

Web scraping

What point-and-click web scraping with import.io looks like:

  • a web platform for structuring and extracting data from websites

SLIDE 8

Web scraping with import.io

SLIDE 9

Web scraping

Point-and-click web scraping on a web-based platform offers solutions to:

  • extract data by point-and-click
  • record actions on a website
  • crawl all the data of a webpage

More issues to be considered:

  • Legality of crawling websites
  • Internal IT Security
  • Training of staff
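
One technical check that touches on the first of these issues is a site's robots.txt file, which states which paths the operator asks crawlers to avoid. It does not settle the legal question, but a collector can at least respect it. The sketch below uses `urllib.robotparser` from the Python standard library. The robots.txt content, the user agent name `price-collector`, the host `shop.example` and the helper `is_allowed` are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real one is served at <site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if `robots_txt` lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ALLOWED = is_allowed(
    ROBOTS_TXT, "price-collector", "https://shop.example/products/milk"
)
BLOCKED = is_allowed(
    ROBOTS_TXT, "price-collector", "https://shop.example/checkout/cart"
)
print(ALLOWED, BLOCKED)
```

A website's terms of use and the applicable national law would still need to be reviewed separately.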

SLIDE 10

Contact: Ingolf Boettcher
Guglgasse 13, 1110 Wien
Tel: +43 (1) 71128-7917
Fax: +43 (1) 7180718
Ingolf.boettcher@statistik.gv.at

Automatic price collection on the internet (web scraping)