patentsview's Introduction

patentsview

An R client to the PatentsView API

Installation

You can get the stable version from CRAN:

install.packages("patentsview")

Or the development version from GitHub:

if (!"devtools" %in% rownames(installed.packages())) 
  install.packages("devtools")

devtools::install_github("ropensci/patentsview")

Basic usage

The PatentsView API provides an interface to a disambiguated version of USPTO data. The patentsview R package provides one main function, search_pv(), to make it easy to interact with the API:

library(patentsview)

search_pv(query = '{"_gte":{"patent_date":"2007-01-01"}}')
#> $data
#> #### A list with a single data frame on a patent level:
#> 
#> List of 1
#>  $ patents:'data.frame': 25 obs. of  3 variables:
#>   ..$ patent_id    : chr [1:25] "10000000" ...
#>   ..$ patent_number: chr [1:25] "10000000" ...
#>   ..$ patent_title : chr [1:25] "Coherent LADAR using intra-pixel quadrature "..
#> 
#> $query_results
#> #### Distinct entity counts across all downloadable pages of output:
#> 
#> total_patent_count = 100,000

Learning more

Head over to the package’s webpage for more info.

patentsview's People

Contributors

crew102, jeroen, karthik, mustberuss

patentsview's Issues

Need help for master thesis - R - patent applications and citations per firm per year

Hi,

I am currently working on my master thesis and came across this package for R, which could potentially save me a lot of hours. I have a list of companies collected from the VentureXpert database. For these companies I need the yearly (2010, 2011, ... , 2019) number of patent applications, the number of granted patents, and if possible the total number of forward patent citations (for that company that year).

Currently, there are 336 firms with 10 firm-years (i = 336, t=10).

As you may understand this will be used to help create my panel data, and as I am measuring innovative performance, the above will be different dependent variables I am looking into.

Firm 1 Year A Patents_1A
Firm 1 Year B Patents_1B
.
.
.
Firm X Year Y Patents_XY

So, my question is if anyone could help me with the code for collecting this data with this package? It seems my alternative is to manually go to the USPTO database and search up every firm every year. I do not know how to web scrape.

Thanks in advance!

xheader_er_or_status(resp) : Gateway Timeout (HTTP 504)

Hello,

currently I am trying to use the patentsview R-package to download some data, but I'm having trouble with the API. Here is a MWE which produces an Error in xheader_er_or_status(resp) : Gateway Timeout (HTTP 504):

data <- patentsview::search_pv(
  query = patentsview::qry_funs$contains(assignee_organization = "M System"),
  fields = c(
    "app_number",
    "app_country",
    "app_date",
    "assignee_country",
    "assignee_id",
    "assignee_organization",
    "cited_patent_number",
    "citedby_patent_number",
    "forprior_country",
    "forprior_date",
    "forprior_docnumber",
    "inventor_first_name",
    "inventor_last_name",
    "ipc_class"
  ),
  sort = c("app_date" = "asc"),
  endpoint = "patents",
  all_pages = TRUE
)

Is this the expected behaviour or a bug?
How can I download the data if it is the expected behaviour? Is there some kind of batch mode?
Can you fix the bug, if it is a bug?

Limit on number of returned patents

Seems like the maximum number of patents returned is 100k; any way around this for larger queries?

e.g.,

## Query result for page 10 w/ 10k patents per page
pvObj <- search_pv(
  query    = '{"_gte":{"patent_date":"2007-01-01"}}',
  per_page = 10000,
  page     = 10
)

nrow(pvObj$data$patents)
# 10000

## Query result for page 11
pvObj <- search_pv(
  query    = '{"_gte":{"patent_date":"2007-01-01"}}',
  per_page = 10000,
  page     = 11
)

nrow(pvObj$data$patents)
# NULL
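
One possible workaround, sketched here under assumptions rather than taken from the thread, is to split the date range into smaller windows and run one query per window, so that each result set has a chance of staying under the cap. The helper names `make_windows()` and `window_query()` are hypothetical and not part of the package. The sketch only builds the query strings, and whether a given window really stays under 100k has to be checked against the `total_patent_count` that comes back.

```r
# Hypothetical helper: split [from, to] into consecutive, non-overlapping windows.
make_windows <- function(from, to, by = "month") {
  starts <- seq(as.Date(from), as.Date(to), by = by)
  # Each window ends the day before the next one starts; the last ends at `to`.
  ends <- c(starts[-1] - 1, as.Date(to))
  data.frame(start = starts, end = ends)
}

# Hypothetical helper: one JSON query string per window, using the same
# operator style as the query shown above (_gte, plus _lte and _and).
window_query <- function(start, end) {
  sprintf(
    '{"_and":[{"_gte":{"patent_date":"%s"}},{"_lte":{"patent_date":"%s"}}]}',
    format(start), format(end)
  )
}

windows <- make_windows("2007-01-01", "2007-03-15")
queries <- window_query(windows$start, windows$end)
print(queries)
```

Each element of `queries` could then be passed as the `query` argument of `search_pv()` and the per-window results combined afterwards.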

Regarding API

I am a beginner in R. I followed the documentation that you have provided, but I find it difficult to unnest the data. Can you please give a simple example of how to unnest the data? I want to see the result in tabular format.
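
As a rough illustration of what "unnesting" means here, the sketch below uses mock data in base R only. It assumes the nested shape is a data frame with one row per patent and a list column holding one small data frame per patent; the values are made up. The package's own `unnest_pv_data()` function, used in another issue on this page, is the intended tool for real results.

```r
# Mock nested result: one row per patent, plus a list column of inventors.
patents <- data.frame(
  patent_number = c("10000000", "10000001"),
  stringsAsFactors = FALSE
)
patents$inventors <- list(
  data.frame(inventor_last_name = c("Smith", "Jones"), stringsAsFactors = FALSE),
  data.frame(inventor_last_name = "Lee", stringsAsFactors = FALSE)
)

# Flatten: repeat the patent key next to each nested row, then stack the pieces.
pieces <- Map(
  function(key, df) data.frame(patent_number = key, df, stringsAsFactors = FALSE),
  patents$patent_number,
  patents$inventors
)
flat <- do.call(rbind, pieces)
rownames(flat) <- NULL

print(flat)
```

The result is an ordinary table with one row per patent-inventor pair, which can be joined back to other tables through `patent_number`.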

with_qfuns should respect calling environment

with_qfuns doesn't find objects in the calling environment, only objects in qry_funs or the base environment.

library(patentsview)

create_query <- function(date_of_patent) {
  with_qfuns(
   and(
     gte(patent_date = date_of_patent),
     text_phrase(patent_abstract = c("computer program")),
     or(
       eq(inventor_last_name = "ihaka"),
       eq(inventor_first_name = "chris")
     )
   )
  )
}

create_query("2007-01-01")
#> Error in gte(patent_date = date_of_patent): object 'date_of_patent' not found

Created on 2020-03-24 by the reprex package (v0.3.0)

This alternate implementation of with_qfuns solves it:

with_qfuns <- function(code, envir = parent.frame()) {
  eval(substitute(code), qry_funs, envir)
}

create_query("2007-01-01")
#> {"_and":[{"_gte":{"patent_date":"2007-01-01"}},{"_text_phrase":{"patent_abstract":"computer program"}},{"_or":[{"_eq":{"inventor_last_name":"ihaka"}},{"_eq":{"inventor_first_name":"chris"}}]}]}
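
The difference can be reproduced without the package. In the sketch below, `funs` is a hypothetical stand-in for `qry_funs`, and `with_funs_v1()` is an assumed reconstruction of the failing behaviour, not the package's actual source. The point it illustrates is that the third argument of `eval()` (`enclos`) decides where names are looked up once they are not found in the list.

```r
# Hypothetical stand-in for qry_funs: a plain list of functions.
funs <- list(
  gte = function(...) list(op = "_gte", args = list(...))
)

# Assumed failing version: enclos defaults to this function's own frame,
# so variables local to the *caller* are never searched.
with_funs_v1 <- function(code) {
  eval(substitute(code), funs)
}

# Version from the issue: the caller's frame is passed as the enclosure.
with_funs_v2 <- function(code, envir = parent.frame()) {
  eval(substitute(code), funs, envir)
}

make_query_v1 <- function(date_of_patent_local) {
  with_funs_v1(gte(patent_date = date_of_patent_local))
}
make_query_v2 <- function(date_of_patent_local) {
  with_funs_v2(gte(patent_date = date_of_patent_local))
}

# v1 fails to find the caller's local variable; capture the error instead of stopping.
res1 <- tryCatch(make_query_v1("2007-01-01"), error = function(e) e)
# v2 resolves it through the caller's frame.
res2 <- make_query_v2("2007-01-01")

print(inherits(res1, "error"))
print(res2$args$patent_date)
```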

Getting issue when search query contains the ampersand character (&)

Hi, thanks for the useful package.

I have an issue when my query contains & when searching for assignees.

result <- search_pv(
    query = with_qfuns(
        begins(assignee_organization = "Johnson & Johnson")
    ) ,
    fields = "assignee_organization",
    endpoint = "assignees",
    page = 1,
    per_page = 200
)

The error is: Error: 'q' param is not valid json

Simply removing it doesn't work as then it doesn't return the correct results.
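
One plausible explanation, not confirmed in this thread, is that the & ends up unencoded in the request URL, where it acts as a separator between query-string parameters and cuts the JSON in the q parameter short. The base R sketch below illustrates the effect and shows percent-encoding with `utils::URLencode()`. The host name is a placeholder, and how the package itself builds its URLs is not shown here.

```r
# JSON query equivalent to begins(assignee_organization = "Johnson & Johnson").
query <- '{"_begins":{"assignee_organization":"Johnson & Johnson"}}'

# Placeholder URL (example.org is not the real API host).
raw_url <- paste0("https://example.org/assignees/query?q=", query)

# A server splits the query string on "&", so the q value is truncated.
query_string <- sub("^[^?]*\\?", "", raw_url)
naive_parts <- strsplit(query_string, "&", fixed = TRUE)[[1]]
print(naive_parts)

# Percent-encoding reserved characters turns "&" into "%26" and keeps q intact.
encoded <- utils::URLencode(query, reserved = TRUE)
print(encoded)

# Decoding recovers the original JSON unchanged.
decoded <- utils::URLdecode(encoded)
```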

Unexposed fields?

addme.txt
Hi, I'm a first time issue creator who's hoping I'm doing this correctly.

Someone posted in the patentsview api forum that the examiner_id is not present. It's a moderated list so my reply doesn't show up yet but I think the problem is here. The api does return the field in a browser request or in R doing a get on the endpoint. It's always coming back as null which is an api issue but shouldn't it be accessible in this package? I got the same error the poster got with the stable and dev packages.

I wrote a perl script to scrape and compare the api's endpoint pages to fieldsdf.csv. It shows that the examiner fields are missing in fieldsdf.csv for all 7 endpoints. I've attached the fields my script thinks are missing. I could fork or pull something if this is an issue.

github and R novice

Upcoming api endpoint url changes

Hi,
In July the patentsview team announced that they will be moving the API to AWS Beanstalk, effective September 1, 2020. I participated in the pilot, where they removed /api from the endpoint urls. I thought it was only for the pilot but I just found out the change will be permanent. I asked if there will be redirects or if the old urls will return 404s but I haven't heard back yet. I thought I would mention it here, sorry for the short notice!

NULL data when changing the year in the example query

This is an example from the manual.

> search_pv(query = '{"_gt":{"patent_year":2007}}')
$data
#### A list with a single data frame on a patent level:

List of 1
 $ patents:'data.frame':	25 obs. of  3 variables:
  ..$ patent_id    : chr [1:25] "10000000" ...
  ..$ patent_number: chr [1:25] "10000000" ...
  ..$ patent_title : chr [1:25] "Coherent LADAR using intra-pixel quadrature detection" ...

$query_results
#### Distinct entity counts across all downloadable pages of output:

total_patent_count = 100,000

This is what I get when I switch 2007 to 2008.

> search_pv(query = '{"_gt":{"patent_year":2008}}')
$data
#### A list with a single data frame on a patent level:

List of 1
 $ patents: NULL

$query_results
#### Distinct entity counts across all downloadable pages of output:

total_patent_count = 0

I'm using the dev version of the package.

Handling the api's 400 and 500 errors for the locations endpoint

Possible ways to workaround the underlying api issue PatentsView/PatentsView-API#24

  1. Field validation is done in the api's executeQuery before the database is queried. It throws the 400 error since cpc_sequence is not present in entitySpecs for the location endpoint. It's present for 3 other endpoints (patents, assignees*, inventors*) and other cpc fields are present for the location endpoint. Until this is resolved cpc_sequence could be temporarily removed from fieldsdf here for the locations endpoint or return a custom error message if the field is specified in a locations query. get_fields("locations") should not return it. No locations query containing it can do anything but receive a 400 error if the api is called. The rest of the fieldsdf fields are present so no other 400 errors would be thrown by the api. Perhaps a PR in PatentsView-API is in order to correct this though it's the lesser of the two issues.

  2. As suggested by @crew102 in the above issue, react to a 500 error being thrown by the api and return a helpful error message if one or more troublesome fields are present. I had initially thought that unmapping the fields would be the fastest/easiest fix but that was before I figured out how many troublesome fields there are (identified in the above issue).

* as a side or potentially separate issue, cpc_sequence is not specified on the assignees or inventors endpoint web pages but can be returned on a query to those endpoints. I'll try scraping the api's entitySpecs and comparing it to fieldsdf.csv to see if there are more undocumented fields on other endpoints. get_fields() may be under-reporting!

Simple Query does not work anymore

Hello Team,

The following style of query used to work perfectly fine for me, but stopped working recently. What is wrong?

query <-
  with_qfuns(
    and(
      gte(patent_date = "2020-01-01"),
      lte(patent_date = "2022-12-31"),
      begins(cpc_subgroup_id = "H02S")
    )
  )

fields <- c("patent_number", "patent_date", "assignee_organization")

pv_out <- search_pv(
  query = query,
  endpoint = "patents",
  fields = fields,
  all_pages = TRUE
)

df <- unnest_pv_data(pv_out$data, "patent_number")

Thanks so much!

Travis job failures

I noticed that the Travis job failed after my PR due to the failure of two tests.

  1. test-unnest-pv-data.R:10: error:
    assignee_id cannot act as a primary key because it is not a unique identifier.

Try using assignee_id instead.

The test appears to be reasonable, ie it should work as written. Could the assert that's failing in unnest-pv-data.R be prefaced with an if(ok_pk != pk)? I did that locally and the test passed. Or is the test itself coded improperly?

  2. test-cast-pv-data.R:10: error: cast_pv_data casts data types as expected
    could not find function "fun_list"

It looks like the api's web pages added forprior_sequence and lawyer_sequence as "int" not "integer" on all seven endpoints. As a local hack I added "int" = as.integer, in get_cast_fun in cast-pv-data.R and the test passed. Possibly better to handle the int to integer in fieldsdf.R and/or throw an error when a type isn't recognized?

Not knowing the right way to fix either problem I didn't push either change.

Add cast_pv_data() function

  • Need function to automate process of converting the data returned by the API (which stores all vars as strings) to correct data types.
  • Function should use field data types found in fieldsdf
  • Suggested name is cast_pv_data()
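
A minimal sketch of the idea follows, using mock data rather than the package's real fieldsdf. The table `fields`, the lookup `cast_funs` and the helper `cast_columns()` are all hypothetical names chosen for illustration. The sketch maps both "int" and "integer" to `as.integer()` and stops on an unrecognised type, which are the two options raised in the Travis issue above.

```r
# Mock stand-in for fieldsdf: field names and their declared data types.
fields <- data.frame(
  field = c("patent_number", "patent_year", "patent_date", "forprior_sequence"),
  data_type = c("string", "integer", "date", "int"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from declared type to a casting function.
cast_funs <- list(
  string = as.character,
  integer = as.integer,
  int = as.integer,
  date = as.Date
)

# Hypothetical helper: cast every column that has a known field type.
cast_columns <- function(df, fields, cast_funs) {
  for (col in intersect(names(df), fields$field)) {
    type <- fields$data_type[match(col, fields$field)]
    fun <- cast_funs[[type]]
    if (is.null(fun)) {
      stop("Unrecognised data type '", type, "' for field '", col, "'")
    }
    df[[col]] <- fun(df[[col]])
  }
  df
}

# Mock API output: every variable arrives as a string.
raw <- data.frame(
  patent_number = "10000000",
  patent_year = "2018",
  patent_date = "2018-06-19",
  forprior_sequence = "0",
  stringsAsFactors = FALSE
)

typed <- cast_columns(raw, fields, cast_funs)
str(typed)
```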

Unexposed Fields - Coinventors

I believe this issue is similar to this one but for a different group:

#7

When I run this I get the list of inventor and coinventor fields

get_fields(endpoint = "inventors",
groups = c("coinventors", "inventors"))

But when I actually pass it into the search_pv function it throws this error:

Error: Bad field(s): coinventor_city, coinventor_country, coinventor_first_name, coinventor_first_seen_date, coinventor_id, coinventor_last_name, coinventor_last_seen_date, coinventor_lastknown_city, coinventor_lastknown_country, coinventor_lastknown_latitude, coinventor_lastknown_location_id, coinventor_lastknown_longitude, coinventor_lastknown_state, coinventor_latitude, coinventor_location_id, coinventor_longitude, coinventor_num_patents_for_inventor, coinventor_total_num_patents, inventor_key_id

Add peer-review badge to the README?

@crew102 there's now a badge for packages that have been peer-reviewed. The patentsview one could be added to the README via

[![](http://badges.ropensci.org/112_status.svg)](https://github.com/ropensci/onboarding/issues/112)

In the future these badges will even be added before and during review, they will be automatically updated based on the review progress.
