Web Scraping with Python


Web Scraping With Python. WEB SCRAPING IN PYTHON. Thomas Laetsch, Data Scientist, NYU. Business Savvy: What are businesses looking for? Comparing prices. Satisfaction of customers. Generating potential leads. ... and much more!


  1. Web Scraping With Python. WEB SCRAPING IN PYTHON. Thomas Laetsch, Data Scientist, NYU

  2. Business Savvy: What are businesses looking for? Comparing prices. Satisfaction of customers. Generating potential leads. ... and much more!

  3. It's Personal: What could you do? Search for your favorite memes on your favorite sites. Automatically look through classified ads for your favorite gadgets. Scrape social site content looking for hot topics. Scrape cooking blogs looking for particular recipes, or recipe reviews. ... and much more!

  4. About My Work

  5. Pipe Dream

  6. Pipe Dream: Setup. Understand what we want to do. Find sources to help us do it.

  7. Pipe Dream: Acquisition. Read in the raw data from online. Format these data to be usable.

  8. Pipe Dream: Processing. Many options!

  9. How do you do? Our Focus: Acquisition! (Using scrapy via python)

  10. Are you in?

  11. HyperText Markup Language. Thomas Laetsch, Data Scientist, NYU

  12. The main example

  13. HTML tags: <html> ... </html>, <body> ... </body>, <div> ... </div>, <p> ... </p>
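
The nesting of the tags on slide 13 is what gives an HTML document its tree shape. A minimal sketch of that idea follows, using Python's standard-library `html.parser` rather than scrapy so it runs without extra installs. The sample page, the `TagOutline` class, and the `outline_tags` helper are hypothetical and not taken from the slides.

```python
from html.parser import HTMLParser

# Hypothetical page, invented for illustration; it only uses the tags
# named on the slide (html, body, div, p).
HTML = """
<html>
  <body>
    <div>
      <p>First paragraph.</p>
      <p>Second paragraph.</p>
    </div>
    <p>A paragraph outside the div.</p>
  </body>
</html>
"""


class TagOutline(HTMLParser):
    """Record each opening tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []  # list of (depth, tag_name) pairs

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1  # children of this tag sit one level deeper

    def handle_endtag(self, tag):
        self.depth -= 1  # closing tag: step back up to the parent level


def outline_tags(html_text):
    """Return the (depth, tag) outline of an HTML string."""
    parser = TagOutline()
    parser.feed(html_text)
    parser.close()
    return parser.outline


outline = outline_tags(HTML)
for depth, tag in outline:
    print("  " * depth + tag)  # indentation mirrors the tree structure
```

The sketch assumes every opening tag has a matching closing tag; real pages with void elements such as `<br>` would need extra handling.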

  14. The HTML tree

  15. The HTML tree: Example 1

  16. The HTML tree: Example 2

  17. Introduction to HTML Outro

  18. HTML Tags and Attributes. Thomas Laetsch, Data Scientist, NYU

  19. Do we have to? Information within HTML tags can be valuable. Extract link URLs. Easier way to select elements.

  20. Tag, you're it! We've seen tag names such as html, div, and p. The attribute name is followed by = followed by information assigned to that attribute, usually quoted text.

  21. Let's "div"vy up the tag: id attribute should be unique. class attribute doesn't need to be unique.

  22. "a" be linkin': a tags are for hyperlinks. href attribute tells what link to go to.
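
To make slides 19 to 22 concrete, here is a rough sketch that reads `id`, `class`, and `href` attributes out of a small page. It uses the standard-library `html.parser` as a stand-in for scrapy; the sample markup, the attribute values, and the `AttributeCollector` class are all assumptions made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical markup: one unique id, a class shared by several elements,
# and two hyperlinks carrying href attributes.
HTML = """
<html>
  <body>
    <div id="unique-id" class="some-class">
      <p class="some-class">Read the <a href="https://example.com/docs">docs</a>.</p>
    </div>
    <div class="some-class other-class">
      <a href="/contact">Contact</a>
    </div>
  </body>
</html>
"""


class AttributeCollector(HTMLParser):
    """Collect link URLs, ids, and class values from opening tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # href values of <a> tags
        self.ids = []      # id values (expected to be unique per page)
        self.classes = []  # (tag, class value) pairs; classes may repeat

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as (name, value) pairs
        if tag == "a" and "href" in attributes:
            self.links.append(attributes["href"])
        if "id" in attributes:
            self.ids.append(attributes["id"])
        if "class" in attributes:
            self.classes.append((tag, attributes["class"]))


collector = AttributeCollector()
collector.feed(HTML)
collector.close()

print(collector.links)
print(collector.ids)
print(collector.classes)
```

Note that the same class value appears on more than one element while the id appears once, which matches the uniqueness convention on slide 21.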

  23. Tag Traction

  24. Et Tu, Attributes?

  25. Crash Course X. Thomas Laetsch, Data Scientist, NYU

  26. Another Slasher Video? xpath = '/html/body/div[2]' Simple XPath: A single forward-slash / is used to move forward one generation. Tag names between slashes give direction to which element(s). Brackets [] after a tag name tell us which of the selected siblings to choose.

  27. Another Slasher Video? xpath = '/html/body/div[2]'
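
The path on slides 26 and 27 can be tried out with a short sketch. The deck works with scrapy; the version below instead uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, needs well-formed markup, and does not accept absolute paths. For that reason the slide's `/html/body/div[2]` is written as `./body/div[2]` relative to the `<html>` root. The sample page is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page: body has three div children with a
# p element sitting between the second and third div.
HTML = """<html>
  <body>
    <div><p>first div</p></div>
    <div><p>second div</p></div>
    <p>not a div</p>
    <div><p>third div</p></div>
  </body>
</html>"""

root = ET.fromstring(HTML)  # root is the <html> element

# Each / steps down one generation: html -> body -> div.
# [2] picks the second div among body's div children (counting starts at 1).
selected = root.findall("./body/div[2]")

texts = [div.find("p").text for div in selected]
print(texts)
```

Dropping the `[2]` selects every `div` child of `body`, which shows that the brackets only narrow down the siblings already chosen by the tag name.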

  28. Slasher Double Feature? Direct to all table elements within the entire HTML code: xpath = '//table' Direct to all table elements which are descendants of the 2nd div child of the body element: xpath = '/html/body/div[2]//table'
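
A small sketch of the double forward-slash on slide 28, again using the standard library's `xml.etree.ElementTree` as an assumed stand-in for scrapy. Because ElementTree paths are relative to the element they are called on, `//table` becomes `.//table` and `/html/body/div[2]//table` becomes `./body/div[2]//table`. The sample page and the cell labels A, B, and C are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one table in the first div, and two tables in the
# second div (one of them nested a level deeper).
HTML = """<html>
  <body>
    <div>
      <table><tr><td>A</td></tr></table>
    </div>
    <div>
      <table><tr><td>B</td></tr></table>
      <div>
        <table><tr><td>C</td></tr></table>
      </div>
    </div>
  </body>
</html>"""

root = ET.fromstring(HTML)  # root is the <html> element

# // looks through all future generations, not just direct children.
all_tables = root.findall(".//table")

# Only tables that descend from the second div child of body.
second_div_tables = root.findall("./body/div[2]//table")

all_labels = [table.findtext("tr/td") for table in all_tables]
second_div_labels = [table.findtext("tr/td") for table in second_div_tables]

print(all_labels)
print(second_div_labels)
```

With scrapy installed, the slide's absolute expressions should be usable unchanged through `Selector(text=html).xpath(...)`; that route is not exercised here.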

  29. Ex(path)cellent
