ropensci / patentsview


An R client to the PatentsView API

Home Page: https://docs.ropensci.org/patentsview

License: Other

Makefile 2.08% R 97.84% CSS 0.08%

patents uspto patentsview-api r rstats patentsview r-package peer-reviewed

patentsview's Introduction

patentsview

Installation

You can get the stable version from CRAN:

install.packages("patentsview")

Or the development version from GitHub:

if (!"devtools" %in% rownames(installed.packages())) 
  install.packages("devtools")

devtools::install_github("ropensci/patentsview")

Basic usage

The PatentsView API provides an interface to a disambiguated version of USPTO data. The patentsview R package provides one main function, search_pv(), to make it easy to interact with the API:

library(patentsview)

search_pv(query = '{"_gte":{"patent_date":"2007-01-01"}}')
#> $data
#> #### A list with a single data frame on a patent level:
#> 
#> List of 1
#>  $ patents:'data.frame': 25 obs. of  3 variables:
#>   ..$ patent_id    : chr [1:25] "10000000" ...
#>   ..$ patent_number: chr [1:25] "10000000" ...
#>   ..$ patent_title : chr [1:25] "Coherent LADAR using intra-pixel quadrature "..
#> 
#> $query_results
#> #### Distinct entity counts across all downloadable pages of output:
#> 
#> total_patent_count = 100,000

Learning more

Head over to the package’s webpage for more info, including:

A getting started vignette for first-time users. The package was also introduced in an rOpenSci blog post.
An in-depth tutorial on writing queries
A list of basic examples
Two examples of data applications (e.g., a brief analysis of the top assignees in the field of databases)

patentsview's People

crew102 ladzzzz123 mustberuss firefoxxy8 tlcaputi zeonium orgpatentroot avahanhan

patentsview's Issues

Need help for master thesis - R - patent applications and citations per firm per year

Hi,

I am currently working on my master thesis and came across this package for R, which could potentially save me a lot of hours. I have a list of companies collected from the VentureXpert database. For these companies I need the yearly (2010, 2011, ..., 2019) number of patent applications, granted patents, and, if possible, the total number of forward patent citations (for that company in that year).

Currently, there are 336 firms with 10 firm-years (i = 336, t=10).

As you may understand this will be used to help create my panel data, and as I am measuring innovative performance, the above will be different dependent variables I am looking into.

Firm 1 Year A Patents_1A
Firm 1 Year B Patents_1B
.
.
.
Firm X Year Y Patents_XY

So, my question is if anyone could help me with the code for collecting this data with this package? It seems my alternative is to manually go to the USPTO database and search up every firm every year. I do not know how to web scrape.

Thanks in advance!
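As a starting point for the firm-year panel described above, here is a minimal sketch in base R that builds one query string per firm and year. The firm names and year range are placeholders, and the JSON is written by hand in the same style as the queries shown elsewhere on this page; the field names assignee_organization and patent_year are taken from other examples here and are assumed to still be valid for the endpoint being queried.

```r
# Hypothetical inputs: replace with the real firm list and year range.
firms <- c("Firm A", "Firm B")
years <- 2010:2012

# One row per firm-year, which is the panel skeleton described above.
panel <- expand.grid(firm = firms, year = years, stringsAsFactors = FALSE)

# Hand-written JSON query: the assignee name contains the firm name
# AND the patent year equals the given year.
panel$query <- sprintf(
  '{"_and":[{"_contains":{"assignee_organization":"%s"}},{"_eq":{"patent_year":%d}}]}',
  panel$firm, panel$year
)

print(panel$query[1])
```

Each string in panel$query could then be passed as the query argument of search_pv(), with the count read from the query_results part of the return value. Firm names containing double quotes would need escaping first, and name matching on assignee_organization is only approximate, so the resulting counts should be spot-checked.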

xheader_er_or_status(resp) : Gateway Timeout (HTTP 504)

Hello,

Currently I am trying to use the patentsview R package to download some data, but I'm having trouble with the API. Here is an MWE which produces an Error in xheader_er_or_status(resp) : Gateway Timeout (HTTP 504):

data <- patentsview::search_pv(
  query = patentsview::qry_funs$contains(assignee_organization = "M System"),
  fields = c(
    "app_number",
    "app_country",
    "app_date",
    "assignee_country",
    "assignee_id",
    "assignee_organization",
    "cited_patent_number",
    "citedby_patent_number",
    "forprior_country",
    "forprior_date",
    "forprior_docnumber",
    "inventor_first_name",
    "inventor_last_name",
    "ipc_class"
  ),
  sort = c("app_date" = "asc"),
  endpoint = "patents",
  all_pages = TRUE
)

Is this the expected behaviour or a bug?
How can I download the data if it is the expected behaviour? Is there some kind of batch mode?
Can you fix the bug, if it is a bug?
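The issue does not say what causes the timeout. One workaround that may help, offered here only as an assumption, is to request fewer fields per call and join the pieces afterwards. The sketch below splits a field list into small batches in base R; using patent_number as the join key and a batch size of 3 are illustrative choices, not package defaults.

```r
# A subset of the fields from the example above.
fields <- c(
  "app_number", "app_date", "assignee_organization",
  "cited_patent_number", "citedby_patent_number",
  "inventor_last_name", "ipc_class"
)

key <- "patent_number"  # assumed join key, added to every batch
chunk_size <- 3         # illustrative batch size

# Assign each field to a batch, then prepend the key to every batch.
groups <- ceiling(seq_along(fields) / chunk_size)
batches <- unname(lapply(split(fields, groups), function(chunk) c(key, chunk)))

print(batches)
```

Each element of batches could be passed as the fields argument of a separate search_pv() call with the same query, and the results combined on the key afterwards.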

Add retrieve_linked_data to pkgdown reference index configuration

👋 @crew102! Happy New Year! We see an error via https://ropensci.r-universe.dev/ui#builds when trying to build docs for this package, and via pkgdown::check_pkgdown().

Limit on number of returned patents

Seems like the maximum number of patents returned is 100k; any way around this for larger queries?

e.g.,

## Query result for page 10 w/ 10k patents per page
pvObj <- search_pv(
  query    = '{"_gte":{"patent_date":"2007-01-01"}}',
  per_page = 10000,
  page     = 10
)

nrow(pvObj$data$patents)
# 10000

## Query result for page 11
pvObj <- search_pv(
  query    = '{"_gte":{"patent_date":"2007-01-01"}}',
  per_page = 10000,
  page     = 11
)

nrow(pvObj$data$patents)
# NULL
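One possible way around the cap, sketched here under the assumption that the limit applies per query rather than per user, is to split the date range into windows and run one query per window. The helper make_date_windows() below is hypothetical (not part of the package) and uses base R only; the query strings are written by hand in the same JSON style as the example above.

```r
# Hypothetical helper: split [start, end] into consecutive windows.
make_date_windows <- function(start, end, by = "month") {
  starts <- seq(as.Date(start), as.Date(end), by = by)
  # Each window ends the day before the next one starts.
  ends <- c(starts[-1] - 1, as.Date(end))
  data.frame(from = starts, to = ends)
}

windows <- make_date_windows("2007-01-01", "2007-03-31")

# One query per window: patent_date between `from` and `to`, inclusive.
queries <- sprintf(
  '{"_and":[{"_gte":{"patent_date":"%s"}},{"_lte":{"patent_date":"%s"}}]}',
  format(windows$from), format(windows$to)
)

print(queries)
```

Each window must itself stay under the cap for this to work, so the window width may need tuning for busy periods.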

Regarding API

I am a beginner in R. I followed the documentation that you have provided, but I find it difficult to unnest the data. Can you please give a simple example of how to unnest the data? I want to see the result in tabular format.

with_qfuns should respect calling environment

with_qfuns doesn't find objects in the calling environment, only objects in qry_funs or the base environment.

library(patentsview)

create_query <- function(date_of_patent) {
  with_qfuns(
   and(
     gte(patent_date = date_of_patent),
     text_phrase(patent_abstract = c("computer program")),
     or(
       eq(inventor_last_name = "ihaka"),
       eq(inventor_first_name = "chris")
     )
   )
  )
}

create_query("2007-01-01")
#> Error in gte(patent_date = date_of_patent): object 'date_of_patent' not found

Created on 2020-03-24 by the reprex package (v0.3.0)

This alternate implementation of with_qfuns solves it:

with_qfuns <- function(code, envir = parent.frame()) {
  eval(substitute(code), qry_funs, envir)
}

create_query("2007-01-01")
#> {"_and":[{"_gte":{"patent_date":"2007-01-01"}},{"_text_phrase":{"patent_abstract":"computer program"}},{"_or":[{"_eq":{"inventor_last_name":"ihaka"}},{"_eq":{"inventor_first_name":"chris"}}]}]}
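The effect of the third argument to eval() can be reproduced without the package. The sketch below uses a toy list (toy_funs) as a stand-in for qry_funs; all names in it are hypothetical, and it illustrates base R's lookup rules rather than the package's actual original implementation, which is not shown in this issue.

```r
# Toy stand-in for qry_funs: a list holding one query-building function.
toy_funs <- list(
  gte = function(...) {
    args <- list(...)
    sprintf('{"_gte":{"%s":"%s"}}', names(args)[1], args[[1]])
  }
)

# Without an explicit enclosure, names missing from toy_funs are looked up
# starting from this wrapper's own frame, not from the caller's frame.
with_toy_default <- function(code) {
  eval(substitute(code), toy_funs)
}

# Passing the caller's frame as the enclosure makes its locals visible.
with_toy_caller <- function(code, envir = parent.frame()) {
  eval(substitute(code), toy_funs, envir)
}

make_default <- function(cutoff_date) {
  with_toy_default(gte(patent_date = cutoff_date))
}
make_caller <- function(cutoff_date) {
  with_toy_caller(gte(patent_date = cutoff_date))
}

# The first call fails to find `cutoff_date`; the second succeeds.
default_result <- tryCatch(
  make_default("2007-01-01"),
  error = function(e) conditionMessage(e)
)
caller_result <- make_caller("2007-01-01")

print(default_result)
print(caller_result)
```

When eval() is given a list as its environment, the third argument (enclos) decides where lookup continues for names not in the list, which is why forwarding parent.frame() from the wrapper resolves the error.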

Getting issue when search query contains the ampersand character (&)

Hi, thanks for the useful package.

I have an issue when my query contains & when searching for assignees.

result <- search_pv(
    query = with_qfuns(
        begins(assignee_organization = "Johnson & Johnson")
    ) ,
    fields = "assignee_organization",
    endpoint = "assignees",
    page = 1,
    per_page = 200
)

The error is: Error: 'q' param is not valid json

Simply removing it doesn't work as then it doesn't return the correct results.
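A plausible cause, which is an assumption and not confirmed in this issue, is that the query is sent in a URL query string without percent-encoding, so the server treats & as a parameter separator and receives truncated JSON. The base R sketch below illustrates that failure mode and the effect of encoding; the URL is a placeholder, not the real endpoint.

```r
# The JSON query from the issue, written out by hand.
q <- '{"_begins":{"assignee_organization":"Johnson & Johnson"}}'

# Hypothetical endpoint URL, used only to illustrate the parsing problem.
raw_url <- paste0("https://example.invalid/assignees/query?q=", q)

# Query strings are split on "&", so the raw JSON is cut in two.
query_string <- sub("^[^?]*\\?", "", raw_url)
raw_params <- strsplit(query_string, "&", fixed = TRUE)[[1]]

# Percent-encoding reserved characters turns "&" into "%26".
encoded <- utils::URLencode(q, reserved = TRUE)
decoded <- utils::URLdecode(encoded)

print(raw_params)
print(encoded)
```

If this is indeed the cause, the first element of raw_params is what the server would see as the q parameter, which is not valid JSON and would match the reported error.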

Unexposed fields?

addme.txt
Hi, I'm a first time issue creator who's hoping I'm doing this correctly.

Someone posted in the patentsview api forum that the examiner_id is not present. It's a moderated list so my reply doesn't show up yet but I think the problem is here. The api does return the field in a browser request or in R doing a get on the endpoint. It's always coming back as null which is an api issue but shouldn't it be accessible in this package? I got the same error the poster got with the stable and dev packages.

I wrote a perl script to scrape and compare the api's endpoint pages to fieldsdf.csv. It shows that the examiner fields are missing in fieldsdf.csv for all 7 endpoints. I've attached the fields my script thinks are missing. I could fork or pull something if this is an issue.

github and R novice

api fixed the locations endpoint

PatentsView/PatentsView-API#29 has been fixed so cpc_sequence can be added back to the locations endpoint. I changed fieldsdf.R in my fork and ran it but it says a pr won't automatically merge so I didn't submit it. It should probably start out as an issue first anyway.

Upcoming api endpoint url changes

Hi,
In July the patentsview team announced that they will be moving the API to AWS Beanstalk, effective September 1, 2020. I participated in the pilot, where they removed /api from the endpoint urls. I thought it was only for the pilot but I just found out the change will be permanent. I asked if there will be redirects or if the old urls will return 404s but I haven't heard back yet. I thought I would mention it here, sorry for the short notice!

NULL data when changing the year in the example query

This is an example from the manual.

> search_pv(query = '{"_gt":{"patent_year":2007}}')
$data
#### A list with a single data frame on a patent level:

List of 1
 $ patents:'data.frame':	25 obs. of  3 variables:
  ..$ patent_id    : chr [1:25] "10000000" ...
  ..$ patent_number: chr [1:25] "10000000" ...
  ..$ patent_title : chr [1:25] "Coherent LADAR using intra-pixel quadrature detection" ...

$query_results
#### Distinct entity counts across all downloadable pages of output:

total_patent_count = 100,000

This is what I get when I switch 2007 to 2008.

> search_pv(query = '{"_gt":{"patent_year":2008}}')
$data
#### A list with a single data frame on a patent level:

List of 1
 $ patents: NULL

$query_results
#### Distinct entity counts across all downloadable pages of output:

total_patent_count = 0

I'm using the dev version of the package.

underlying api change to https

The patentsview api has gone https! They've added a redirect but post parameters would get lost. See https://www.patentsview.org/community/forum/7/topic/150. Thought you'd like to know.

Documentation website returns 404

Hello,

The getting started, writing queries, examples and top assignees pages of https://docs.ropensci.org/patentsview/index.html are returning a 404 error.

Is there any possibility to get those pages back or find them somewhere else?

Update r object docs with new groups added

Docs fail to build

Fails with:

Quitting from lines 20-44 (citation-networks.Rmd) 
Error : Query Internal Error

See https://dev.ropensci.org/blue/organizations/jenkins/patentsview/detail/patentsview/116/pipeline (or click on the red checkmark behind your last commit on github)

If your vignettes require special keys, you could consider to precompute your vignettes.

Handling the api's 400 and 500 errors for the locations endpoint

Possible ways to workaround the underlying api issue PatentsView/PatentsView-API#24

Field validation is done in the api's executeQuery before the database is queried. It throws the 400 error since cpc_sequence is not present in entitySpecs for the location endpoint. It's present for 3 other endpoints (patents, assignees*, inventors*) and other cpc fields are present for the location endpoint. Until this is resolved cpc_sequence could be temporarily removed from fieldsdf here for the locations endpoint or return a custom error message if the field is specified in a locations query. get_fields("locations") should not return it. No locations query containing it can do anything but receive a 400 error if the api is called. The rest of the fieldsdf fields are present so no other 400 errors would be thrown by the api. Perhaps a PR in PatentsView-API is in order to correct this though it's the lesser of the two issues.
As suggested by @crew102 in the above issue, react to a 500 error being thrown by the api and return a helpful error message if one or more troublesome fields are present. I had initially thought that unmapping the fields would be the fastest/easiest fix, but that was before I figured out how many troublesome fields there are (identified in the above issue).

* as a side or potentially separate issue, cpc_sequence is not specified on the assignees or inventors endpoint web pages but can be returned on a query to those endpoints. I'll try scraping the api's entitySpecs and comparing it to fieldsdf.csv to see if there are more undocumented fields on other endpoints. get_fields() may be under-reporting!

Simple Query does not work anymore

Hello Team,

The following style of query used to work perfectly fine for me, but stopped working recently. What is wrong?

query <- with_qfuns(
  and(
    gte(patent_date = "2020-01-01"),
    lte(patent_date = "2022-12-31"),
    begins(cpc_subgroup_id = "H02S")
  )
)

fields <- c("patent_number", "patent_date", "assignee_organization")

pv_out <- search_pv(
  query = query,
  endpoint = "patents",
  fields = fields,
  all_pages = TRUE
)

df <- unnest_pv_data(pv_out$data, "patent_number")

Thanks so much!

API requests result in errors for some field lists

Travis job failures

I noticed that the Travis job failed after my PR due to the failure of two tests.

test-unnest-pv-data.R:10: error:
assignee_id cannot act as a primary key because it is not a unique identifier.

Try using assignee_id instead.

The test appears to be reasonable, ie it should work as written. Could the assert that's failing in unnest-pv-data.R be prefaced with an if(ok_pk != pk)? I did that locally and the test passed. Or is the test itself coded improperly?

test-cast-pv-data.R:10: error: cast_pv_data casts data types as expected
could not find function "fun_list"

It looks like the api's web pages added forprior_sequence and lawyer_sequence as "int" not "integer" on all seven endpoints. As a local hack I added "int" = as.integer, in get_cast_fun in cast-pv-data.R and the test passed. Possibly better to handle the int to integer in fieldsdf.R and/or throw an error when a type isn't recognized?
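The second suggestion above (accept "int" as an alias and fail loudly on unknown types) could look roughly like the sketch below. It is a standalone illustration in base R, not the package's get_cast_fun; the function name, the lookup table, and the set of type labels are assumptions.

```r
# Assumed mapping from API type labels to R casting functions.
cast_funs <- list(
  string  = as.character,
  date    = as.Date,
  float   = as.numeric,
  integer = as.integer,
  int     = as.integer  # alias seen on the api's web pages
)

# Hypothetical lookup that errors on a type it does not recognize.
get_cast_fun_sketch <- function(type) {
  fun <- cast_funs[[type]]
  if (is.null(fun)) {
    stop("Unrecognized field type: ", type, call. = FALSE)
  }
  fun
}

print(get_cast_fun_sketch("int")("42"))
```

Failing on an unrecognized type would surface a change on the API side as a clear message, in contrast to the "could not find function" error shown above.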

Not knowing the right way to fix either problem I didn't push either change.

Add cast_pv_data() function

Need function to automate process of converting the data returned by the API (which stores all vars as strings) to correct data types.
Function should use field data types found in fieldsdf
Suggested name is cast_pv_data()

tests are getting skipped on appveyor

need to export NOT_CRAN=true

Unexposed Fields - Coinventors

I believe this issue is similar to this one but for a different group:

When I run this I get the list of inventor and coinventor fields

get_fields(endpoint = "inventors",
groups = c("coinventors", "inventors"))

But when I actually pass it into the search_pv function it throws this error:

Error: Bad field(s): coinventor_city, coinventor_country, coinventor_first_name, coinventor_first_seen_date, coinventor_id, coinventor_last_name, coinventor_last_seen_date, coinventor_lastknown_city, coinventor_lastknown_country, coinventor_lastknown_latitude, coinventor_lastknown_location_id, coinventor_lastknown_longitude, coinventor_lastknown_state, coinventor_latitude, coinventor_location_id, coinventor_longitude, coinventor_num_patents_for_inventor, coinventor_total_num_patents, inventor_key_id

Add peer-review badge to the README?

@crew102 there's now a badge for packages that have been peer-reviewed. The patentsview one could be added to the README via

[![](http://badges.ropensci.org/112_status.svg)](https://github.com/ropensci/onboarding/issues/112)

In the future these badges will even be added before and during review, they will be automatically updated based on the review progress.