getpatent: Scraping patent data into Stata

Demetris Christodoulou (Sydney), Le Ma (UTS), Hadi Mostafavi (Sydney)
Methodological and Empirical Advances in Financial Analysis (MEAFA)
September 27, 2016

Outline
1. Problem question
2. The HTML source code
3. Scraping source code into Stata

Problem question

Create database of patent attributes

To enable research in innovation activity and the generation of intangible assets, we require detailed data on the outcome of the innovation process, the most observable and measurable being the number of patents and their quality measures.

Although patent data is public and freely searchable, regional patent offices restrict access, and their data is limited to basic bibliographic information, e.g. identifiers, date, title, classification, applicants and inventors. Their free data does not include information on patent citations, legal claims, legal status, etc.

The EPO (Europe) provides free raw patent data in XML format. The WIPO (World) allows downloads of up to 10,000 records. The SIPO (China) requires domestic account registration. The exception is the USPTO, which provides all data in tab-delimited format.

There is also the issue of non-standardisation when working across multiple sources.

Google Patent Search

Google Patent Search consolidates 87 million patent publications from 17 patent offices around the world: the US, Europe, Japan, China, South Korea, WIPO, Russia, Germany, the United Kingdom, Canada, France, Spain, Belgium, Denmark, Finland, Luxembourg and the Netherlands.

This data is free, and even though Google does not like its website being mined, efficient and careful code can scrape this information into a database.
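One way to read "careful" is that the scraper spaces out its requests. The sketch below is in Python and purely illustrative: it enforces a minimum pause between successive page fetches. The class name `Throttle` and the two-second interval are assumptions made for this example and are not taken from getpatent.

```python
import time


class Throttle:
    """Enforce a minimum delay between successive requests (illustrative helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock               # injectable so the logic can be checked offline
        self._sleep = sleep
        self._last = None                 # time at which the previous request was released

    def wait(self):
        """Pause if the previous request was too recent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay


if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)
    for page in ["page-1", "page-2", "page-3"]:  # placeholders, nothing is downloaded
        paused = throttle.wait()
        print(f"{page}: paused {paused:.1f}s before the request")
```

Calling `wait()` immediately before each download keeps the request rate bounded however fast the parsing step runs.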

Google Patent Search

Google provides this data from several locations. The US servers are indexed at https://patents.google.com. The US-based data is then mirrored onto local services, e.g. in Australia at https://www.google.com.au/patents, in Greece at https://www.google.gr/patents, and so on.

There are two advantages in working with local servers: (1) they speak your language, and (2) they give information for the 'cooperative' classification scheme.

The US server contains the more widely recognised standard for international patent classification and, importantly for us, it applies a more consistent structure in its source code, making it easier to scrape.
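To make the two kinds of address concrete, the following Python sketch builds a page URL for a publication number on either the US server or a local mirror. The base addresses come from the slide. The path layout after each base (`/patent/<number>` on the US server, `/<number>` on a mirror), the function name `patent_url` and the sample number are assumptions for illustration and should be checked against the live site.

```python
# Base addresses quoted on the slide.
US_SERVER = "https://patents.google.com"
LOCAL_MIRRORS = {
    "au": "https://www.google.com.au/patents",
    "gr": "https://www.google.gr/patents",
}


def patent_url(publication_number, mirror=None):
    """Return a page URL for one patent (hypothetical helper).

    The path appended to each base address is an assumption,
    not something stated in the presentation.
    """
    # Normalise "US 1,234,567" to "US1234567".
    number = publication_number.replace(" ", "").replace(",", "").upper()
    if mirror is None:
        return f"{US_SERVER}/patent/{number}"      # assumed US-server layout
    return f"{LOCAL_MIRRORS[mirror]}/{number}"     # assumed mirror layout


if __name__ == "__main__":
    sample = "US 1,234,567"  # made-up number, for illustration only
    print(patent_url(sample))
    print(patent_url(sample, mirror="au"))
```

Keeping the base addresses in one table means that switching between the US server and a mirror is a one-argument change.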

The HTML source code
