
NTTS 2015 - Session 6A Big data sources: web scraping and smart meters



  1. Automatic price collection on the internet (web scraping). Ingolf Boettcher, Brussels, 10 March 2015. NTTS 2015 - Session 6A - Big data sources: web scraping and smart meters. www.statistik.at, Wir bewegen Informationen ("We move information")

  2. Web scraping: There is a huge amount of data on the internet. How can we best collect/scrape/harvest data from there for statistical purposes? [Slide graphic: <HTML> <HEAD> <TITLE> DATA </TITLE> </HEAD> </HTML>]

  3. Web scraping: Internet data collection. Minimum goal for (price) statistics: turn website content into a spreadsheet.
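As an illustration of the minimum goal on slide 3, the following is a minimal sketch of turning website content into spreadsheet-ready CSV with custom code. It is not the method shown in the presentation, which favours point-and-click tools. The class `PriceTableParser`, the function `html_table_to_csv`, and the sample HTML are hypothetical, and the sketch assumes prices are published in a plain HTML table, which many shop pages are not.

```python
import csv
import io
from html.parser import HTMLParser


class PriceTableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = PriceTableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Hypothetical page fragment; a real price page would first have to be downloaded.
SAMPLE_HTML = """
<table>
  <tr><th>Product</th><th>Price (EUR)</th></tr>
  <tr><td>Milk 1 l</td><td>1.09</td></tr>
  <tr><td>Bread 500 g</td><td>2.49</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML))
```

The CSV text can be saved to a file and opened in any spreadsheet program. Real pages usually need site-specific handling, which is the maintenance burden the later slides refer to.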

  4. Web scraping: Internet data collection. Options:
     1. Manual price collection
     2. Develop an API / web scraper
        2.1 by writing custom computer code
        2.2 by using point-and-click web tools

  5. Web scraping: Reasons for not writing our own web scraper. An IT developer is needed, therefore:
     • Expensive
     • Inflexible
     • Even maintenance cannot be handled by CPI staff

  6. Web scraping: Reasons to use point-and-click web tools for web scraping. No IT developer is needed, therefore:
     • Cheap
     • Flexible
     • No programming skills required

  7. Web scraping: What point-and-click web scraping with import.io looks like:
     • A web platform that makes it possible to structure and extract data from websites

  8. Web scraping with import.io

  9. Web scraping: Point-and-click web scraping on a web-based platform offers solutions to:
     • extract data by point-and-click
     • record actions on a website
     • crawl all the data of a webpage
     More issues to be considered:
     • Legality of crawling websites
     • Internal IT security
     • Training of staff
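One technical check that touches the crawling issue on slide 9 is reading a site's robots.txt before collecting prices. The sketch below is an assumption added for illustration, not something taken from the presentation: robots.txt only states the site operator's crawling preferences and does not by itself settle whether scraping is legal. The function `allowed_urls`, the user agent `price-collector`, the host `shop.example`, and the sample rules are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one is served at <site>/robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Allow: /products/
"""


def allowed_urls(robots_txt, urls, user_agent="price-collector"):
    """Return the URLs that the given robots.txt text permits for `user_agent`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    candidates = [
        "https://shop.example/products/milk",
        "https://shop.example/checkout/cart",
    ]
    print(allowed_urls(SAMPLE_ROBOTS_TXT, candidates))
```

A check like this could run before each collection round, with disallowed pages left to manual price collection or to an arrangement with the retailer.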

  10. Automatic price collection on the internet (web scraping). Contact: Ingolf Boettcher, Guglgasse 13, 1110 Wien, Tel: +43 (1) 71128-7917, Fax: +43 (1) 7180718, Ingolf.boettcher@statistik.gv.at
