XPath Navigation

XPath Navigation, Web Scraping in Python, Thomas Laetsch



  1. XPath Navigation. Web Scraping in Python. Thomas Laetsch, Data Scientist, NYU

  2. Slashes and Brackets
     - Single forward slash / looks forward one generation
     - Double forward slash // looks forward all future generations
     - Square brackets [] help narrow in on specific elements

  3. To Bracket or not to Bracket
     xpath = '/html/body'
     xpath = '/html[1]/body[1]'
     Both give the same selection.

  4. A Body of P
     xpath = '/html/body/p'

  5. The Birds and the Ps
     xpath = '/html/body/div/p'
     xpath = '/html/body/div/p[2]'

  6. Double Slashing the Brackets
     xpath = '//p'
     xpath = '//p[1]'

  7. The Wildcard
     xpath = '/html/body/*'
     The asterisk * is the "wildcard".

  8. Xposé

  9. Off the Beaten XPath. Thomas Laetsch, Data Scientist, NYU

  10. (At)tribute
      @ represents "attribute"
      - @class
      - @id
      - @href

  11. Brackets and Attributes

  12. Brackets and Attributes
      xpath = '//p[@class="class-1"]'

  13. Brackets and Attributes
      xpath = '//*[@id="uid"]'

  14. Brackets and Attributes
      xpath = '//div[@id="uid"]/p[2]'

  15. Content with Contains
      XPath contains notation: contains(@attri-name, "string-expr")

  16. Contain This
      xpath = '//*[contains(@class,"class-1")]'

  17. Contain This
      xpath = '//*[@class="class-1"]'

  18. Get Classy
      xpath = '/html/body/div/p[2]'

  19. Get Classy
      xpath = '/html/body/div/p[2]/@class'

  20. End of the Path

  21. Introduction to the scrapy Selector. Thomas Laetsch, Data Scientist, NYU

  22. Setting up a Selector
          from scrapy import Selector

          html = '''
          <html>
            <body>
              <div class="hello datacamp">
                <p>Hello World!</p>
              </div>
              <p>Enjoy DataCamp!</p>
            </body>
          </html>
          '''

          sel = Selector(text=html)
      - Created a scrapy Selector object using a string with the html code
      - The selector sel has selected the entire html document

  23. Selecting Selectors
      - We can use the xpath call within a Selector to create new Selectors of specific pieces of the html code
      - The return is a SelectorList of Selector objects
          sel.xpath("//p")
          # outputs the SelectorList:
          [<Selector xpath='//p' data='<p>Hello World!</p>'>,
           <Selector xpath='//p' data='<p>Enjoy DataCamp!</p>'>]

  24. Extracting Data from a SelectorList
      Use the extract() method:
          >>> sel.xpath("//p")
          out: [<Selector xpath='//p' data='<p>Hello World!</p>'>,
                <Selector xpath='//p' data='<p>Enjoy DataCamp!</p>'>]
          >>> sel.xpath("//p").extract()
          out: ['<p>Hello World!</p>',
                '<p>Enjoy DataCamp!</p>']
      We can use extract_first() to get the first element of the list:
          >>> sel.xpath("//p").extract_first()
          out: '<p>Hello World!</p>'

  25. Extracting Data from a Selector
          ps = sel.xpath('//p')
          second_p = ps[1]
          second_p.extract()
          out: '<p>Enjoy DataCamp!</p>'

  26. Select This Course!

  27. "Inspecting the HTML". Thomas Laetsch, PhD, Data Scientist, NYU

  28. "Source" = HTML Code

  29. Inspecting Elements

  30. HTML text to Selector
          from scrapy import Selector
          import requests

          url = 'https://www.datacamp.com/courses/all'
          html = requests.get(url).content
          sel = Selector(text=html)

  31. You Know Our Secrets
