patent_client's Introduction

Summary

A powerful library for accessing intellectual property data, featuring:

  • 🍰 Ease of use: All sources use a simple, unified API inspired by the Django ORM.
  • 🐼 Pandas Integration: Results are easily castable to Pandas DataFrames and Series.
  • 🚀 Performance: Data is fetched with the httpx library (native HTTP/2 and asyncio support), cached with the hishel library for super-fast repeat queries, and extracted with yankee.
  • 🌐 Async/Await Support: All APIs (optionally!) support the async/await syntax.
  • 🔮 Pydantic v2 Support: All models retrieved are Pydantic v2 models with all the goodness that comes with them!

Docs, including a fulsome Getting Started and User Guide, are available on Read the Docs. The Examples folder includes examples of using patent_client for many common IP tasks.

โญ New in v5 โญ

Version 5 brings a new and more reliable way to provide synchronous and asynchronous access to the various APIs. As in version 3, you can just from patent_client import [Model] and get a synchronous version of the model, with no asynchronous methods or functionality at all. Alternatively, from patent_client._async import [Model] gives you an asynchronous version of the model.

Version 5 also brings support for the USPTO's new Open Data Portal, a system currently in beta that is scheduled to replace the current Patent Examination Data System in late 2024.

Coverage

  • Free software: Apache Software License 2.0

Installation

pip install patent_client

If you only want access to USPTO resources, you're done! However, additional setup is necessary to access EPO Inpadoc and EPO Register resources. See the Docs.

Quick Start

To use the project:

# Import the model classes you need
>>> from patent_client import Inpadoc, Assignment, USApplication

# Fetch US Applications
>>> app = USApplication.objects.get('15710770')
>>> app.patent_title
'Camera Assembly with Concave-Shaped Front Face'

# Fetch from USPTO Assignments
>>> assignments = Assignment.objects.filter(assignee='Google')
>>> len(assignments) > 23000
True
>>> assignment = Assignment.objects.get('47086-788')
>>> assignment.conveyance_text
'ASSIGNMENT OF ASSIGNORS INTEREST'

# Fetch from INPADOC
>>> pub = Inpadoc.objects.get('EP3082535A1')
>>> pub.biblio.title
'AUTOMATIC FLUID DISPENSER'

Async Quick Start

To use async with Patent Client, just import the classes you need from the async module. All methods and iterators that access data or create a network request are asynchronous.

from patent_client._async import USApplication

# Run this inside a coroutine (or a notebook / asyncio REPL that allows top-level await)
apps = list()
async for app in USApplication.objects.filter(first_named_applicant="Google"):
    apps.append(app)

app = await USApplication.objects.aget("16123456")

Documentation

Docs, including a fulsome Getting Started, are available on Read the Docs.

Development

To run all the tests, run:

pytest

A developer guide is provided in the Documentation. Pull requests welcome!

Related projects

patent_client's People

Contributors

dependabot[bot], diederikmath, grimmer0125, hoffmabc, ianvdl, jwcook, kenneththompson, parkerhancock

patent_client's Issues

issue with running the demonstration code

Hello,

First, thank you for building this API.

Second, I was testing the following demonstration code and it's giving me a weird error. I believe the error occurs when the patent_client module tries to interact with the European Patent Office's (EPO) Open Patent Services (OPS) API.

Demonstration code:

from patent_client import USApplication

google_apps = USApplication.objects.filter(first_named_applicant='Google LLC')

print(len(google_apps) > 1000)

Error:

  File "C:\Users\ravit\PycharmProjects\patentApplication\simple_patent_app_retriever.py", line 1, in <module>
    from patent_client import USApplication
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\patent_client\__init__.py", line 44, in <module>
    from patent_client.epo.ops.published.model import Inpadoc # isort:skip
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\patent_client\epo\ops\__init__.py", line 2, in <module>
    from .legal.model import Legal
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\patent_client\epo\ops\legal\__init__.py", line 3, in <module>
    generate_legal_code_db()
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\patent_client\epo\ops\legal\national_codes.py", line 30, in generate_legal_code_db
    path = get_spreadsheet()
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\patent_client\epo\ops\legal\national_codes.py", line 50, in get_spreadsheet
    response = session.get(url)
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\patent_client\epo\ops\session.py", line 22, in request
    response = super(OpsSession, self).request(*args, **kwargs)
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\requests_cache\session.py", line 115, in request
    return super().request(method, url, *args, **kwargs)
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\requests\sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\requests_cache\session.py", line 147, in send
    response = self._send_and_cache(request, actions, **kwargs)
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\requests_cache\session.py", line 193, in _send_and_cache
    self.cache.save_response(response, actions.cache_key, actions.expires)
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\requests_cache\policy\actions.py", line 112, in expires
    return get_expiration_datetime(self.expire_after)
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\requests_cache\policy\actions.py", line 170, in get_expiration_datetime
    raise ValueError(f'Invalid HTTP date: {expire_after}')
ValueError: Invalid HTTP date: 3 days

Update on timing

My team and I are hoping to use patent_client for our school project and are wondering if you have a time estimate as to when this package will be fully updated and ready for use. Thank you!

Exception when a PatFT query matches only one patent

This is copied from #45 (comment), in case you did not notice it there.

I tried the same query on PatFT, and it returns only one search result. In that case the site automatically redirects to the patent page. When the query is run from a Python script, the response is a small HTML page containing the words "single document" and the patent URL, rather than the expected results page. The following line therefore throws an exception:

            self.num_results = int(soup.find_all("i")[1].find_all("strong")[-1].text)

I wrote a workaround, which is not pretty and is meant only as a reference: grimmer0125@8943ca9#diff-b6dbafaca666db361cb7b2b2266719b822e2dcecd53671c5f29dcf92a6590799R158
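
One way to guard the count parsing, sketched under assumptions: the stub page is recognised only by the phrase "single document" described above, and count_results and parse_count are hypothetical names, not part of patent_client. The normal parser is passed in as a callable so the sketch does not depend on the real page layout.

```python
from typing import Callable


def count_results(html: str, parse_count: Callable[[str], int]) -> int:
    """Return 1 for the single-document stub page, else defer to the normal parser."""
    # Assumption: the redirect stub can be identified by this phrase alone.
    if "single document" in html.lower():
        return 1
    return parse_count(html)


# Usage with stand-in inputs (both HTML strings are invented for illustration).
single = count_results("<html>Single Document ...</html>", parse_count=lambda _: 0)
many = count_results("<html>results page</html>", parse_count=lambda _: 42)
print(single, many)
```

The same guard could instead follow the URL found in the stub page and fetch that one patent directly; which is preferable depends on how the manager uses num_results.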

XML error while finding assignment objectives - macOS M1 - VS Code Python 3.10

# The assigned apps is either a single value, or a list of values if more than one property was assigned

assigned_apps = Assignment.objects.filter(assignee="Tesla Motors").explode('properties').values_list('appl_id', flat=True).to_list()

XMLSyntaxError: Opening and ending tag mismatch: HR line 55 and body, line 55, column 189 (, line 55)

Traceback (most recent call last):

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/IPython/core/interactiveshell.py:3433 in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  Cell In[9], line 2
    assigned_apps = Assignment.objects.filter(assignee="Alphabet Inc").explode('properties').values_list('appl_id', flat=True).to_list()

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/patent_client/util/base/collections.py:31 in to_list
    return ListManager(self)

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/patent_client/util/base/collections.py:149 in __iter__
    for row in super(ValuesListManager, self).iter():

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/patent_client/util/base/collections.py:132 in __iter__
    for item in self.manager:

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/patent_client/util/base/collections.py:88 in __iter__
    for row in self.iterable:

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/patent_client/util/base/manager.py:60 in __iter__
    for item in self.get_results():

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/patent_client/uspto/assignment/manager.py:50 in _get_results
    num_pages = math.ceil(len(self) / self.page_size)

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/patent_client/uspto/assignment/manager.py:100 in __len__
    self.get_page(0)

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/patent_client/uspto/assignment/manager.py:117 in get_page
    result = self.schema.load(text)

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/yankee/base/deserializer.py:101 in load
    pre_obj = self.pre_load(obj)

  File /opt/homebrew/Caskroom/miniforge/base/lib/python3.10/site-packages/patent_client/uspto/assignment/schema.py:134 in pre_load
    tree = ET.fromstring(text.encode())

  File src/lxml/etree.pyx:3254 in lxml.etree.fromstring

  File src/lxml/parser.pxi:1913 in lxml.etree._parseMemoryDocument

  File src/lxml/parser.pxi:1800 in lxml.etree._parseDoc

  File src/lxml/parser.pxi:1141 in lxml.etree._BaseParser._parseDoc

  File src/lxml/parser.pxi:615 in lxml.etree._ParserContext._handleParseResultDoc

  File src/lxml/parser.pxi:725 in lxml.etree._handleParseResult

  File src/lxml/parser.pxi:654 in lxml.etree._raiseParseError

  File :55

XMLSyntaxError: Opening and ending tag mismatch: HR line 55 and body, line 55, column 189

TypeError: unsupported type for timedelta days component: str

Hi Parker, thanks for fixing the previous issue. However, I am now getting a TypeError when I try to run:

app = USApplication.objects.get('15710770')

Here's the error:

Traceback (most recent call last):
  File "C:\Users\ravit\PycharmProjects\patentApplication\simple_patent_app_retriever.py", line 1, in <module>
    from patent_client import USApplication
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\patent_client\__init__.py", line 41, in <module>
    session = PatentClientSession()
  File "C:\Users\ravit\AppData\Local\Programs\Python\Python39\lib\site-packages\patent_client\session.py", line 15, in __init__
    expire_after=datetime.timedelta(days=SETTINGS.CACHE.MAX_AGE),
TypeError: unsupported type for timedelta days component: str

I think the datetime.timedelta() call in session.py expects a number for the days argument, but SETTINGS.CACHE.MAX_AGE is being passed to it as a string.
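
A defensive conversion along these lines might avoid the error. This is only a sketch: coerce_max_age is a hypothetical helper, not something patent_client provides, and it assumes the setting may arrive as a number, a numeric string, or a string such as '3 days' (the value shown in the earlier "Invalid HTTP date" report).

```python
import datetime
import re


def coerce_max_age(value) -> datetime.timedelta:
    """Turn a number, a numeric string, or a string like '3 days' into a timedelta."""
    if isinstance(value, datetime.timedelta):
        return value
    if isinstance(value, (int, float)):
        return datetime.timedelta(days=value)
    # Accept "3", "3 day", or "3 days", with optional surrounding whitespace.
    match = re.fullmatch(r"\s*(\d+)\s*(?:days?)?\s*", str(value))
    if match is None:
        raise ValueError(f"Cannot interpret cache max age: {value!r}")
    return datetime.timedelta(days=int(match.group(1)))


print(coerce_max_age("3"), coerce_max_age("3 days"), coerce_max_age(3))
```

Raising on anything unrecognised, rather than silently falling back to a default, keeps a misconfigured setting visible.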

version 2.0 no longer supports PtabTrial?

The docs need to be updated to reflect that "PtabTrial" is no longer used:

>>> import patent_client
>>> patent_client.__version__
'2.0.1'
>>> from patent_client import PtabTrial
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'PtabTrial' from 'patent_client'

Looks like PtabProceeding can replace PtabTrial in the old documentation and everything works fine.
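
For code that has to run against both old and new releases, a small fallback lookup is one option. A sketch only: first_available is a hypothetical helper, and the demonstration uses the standard-library json module so that it runs without patent_client installed.

```python
import importlib


def first_available(module_name: str, candidates: list[str]):
    """Return the first attribute in `candidates` that the module actually exposes."""
    module = importlib.import_module(module_name)
    for name in candidates:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"None of {candidates} found in {module_name!r}")


# Demonstration against the standard library: the first name does not exist, the second does.
dumps = first_available("json", ["no_such_name", "dumps"])
print(dumps({"ok": True}))

# With patent_client installed, the equivalent call would be:
# PtabModel = first_available("patent_client", ["PtabProceeding", "PtabTrial"])
```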

Missing dependency: attrs

Hi

I had to pip install attrs to get import patent_client working after pip install patent_client. So I'm pretty sure that this dependency is missing in setup.py.

time data '202307' does not match format '%Y%m%d'

When I import the package, it raises the following error:
time data '202307' does not match format '%Y%m%d'

from patent_client import Inpadoc, Assignment, USApplication, PatentBiblio

ValueError Traceback (most recent call last)
Cell In[25], line 1
----> 1 import patent_client

File c:\Users\.conda\envs\patent_data\lib\site-packages\patent_client\__init__.py:51
46 import yankee
48 yankee.use_model = True
---> 51 from patent_client.epo.ops.published.model import Inpadoc # isort:skip
53 # from patent_client.usitc.model import ITCAttachment
54 # from patent_client.usitc.model import ITCDocument
55 # from patent_client.usitc.model import ITCInvestigation
56 from patent_client.uspto.assignment.model import Assignment # isort:skip

File c:\Users\.conda\envs\patent_data\lib\site-packages\patent_client\epo\ops\__init__.py:2
1 from .family.model import Family
----> 2 from .legal.model import Legal
3 from .published.model import Images
4 from .published.model import Inpadoc

File c:\Users\.conda\envs\patent_data\lib\site-packages\patent_client\epo\ops\legal\__init__.py:3
1 from .national_codes import generate_legal_code_db
----> 3 generate_legal_code_db()

File c:\Users\.conda\envs\patent_data\lib\site-packages\patent_client\epo\ops\legal\national_codes.py:25, in generate_legal_code_db()
24 def generate_legal_code_db():
---> 25 current = has_current_spreadsheet()
26 if current:
27 logger.debug("Legal Code Database is Current - skipping database creation")

File c:\Users\.conda\envs\patent_data\lib\site-packages\patent_client\epo\ops\legal\national_codes.py:41, in has_current_spreadsheet()
39 fname = cur.execute("SELECT * FROM meta").fetchone()[0]
40 date_string = re.search(r"legal_code_descriptions_(\d+).xlsx", fname).group(1)
---> 41 date = datetime.datetime.strptime(date_string, "%Y%m%d").date()
42 age = datetime.datetime.now().date() - date
43 logger.debug(f"Legal Code Database is {age} days old")

File c:\Users\.conda\envs\patent_data\lib\_strptime.py:568, in _strptime_datetime(cls, data_string, format)
565 def _strptime_datetime(cls, data_string, format="%a %b %d %H:%M:%S %Y"):
566 """Return a class cls instance based on the input string and the
567 format string."""
--> 568 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
569 tzname, gmtoff = tt[-2:]
570 args = tt[:6] + (fraction,)

File c:\Users\.conda\envs\patent_data\lib\_strptime.py:349, in _strptime(data_string, format)
347 found = format_regex.match(data_string)
348 if not found:
--> 349 raise ValueError("time data %r does not match format %r" %
350 (data_string, format))
351 if len(data_string) != found.end():
352 raise ValueError("unconverted data remains: %s" %
353 data_string[found.end():])

ValueError: time data '202307' does not match format '%Y%m%d'

I have reinstalled my Conda and Python, but this issue cannot be solved.
Could you give me some hints on how to handle this issue?
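
The traceback suggests the spreadsheet file name carried a six-digit stamp (YYYYMM) where the code expected eight digits (YYYYMMDD). A tolerant parser could look something like this; parse_date_stamp is a hypothetical name, and treating a six-digit stamp as the first day of the month is an assumption.

```python
import datetime

# Choose the format by length instead of trying formats in turn:
# "%Y%m%d" would happily misread a stamp such as "202311" as 2023-01-01.
FORMATS_BY_LENGTH = {8: "%Y%m%d", 6: "%Y%m"}


def parse_date_stamp(date_string: str) -> datetime.date:
    """Parse a YYYYMMDD or YYYYMM stamp into a date."""
    fmt = FORMATS_BY_LENGTH.get(len(date_string))
    if fmt is None:
        raise ValueError(f"Unrecognised date stamp: {date_string!r}")
    return datetime.datetime.strptime(date_string, fmt).date()


print(parse_date_stamp("202307"), parse_date_stamp("20230715"))
```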

Broken CI/CD Pipeline

I'm on a fresh conda environment with Python 3.11.3. Running pip install patent_client works perfectly and installs everything needed.
When running "import patent_client" in Python, I get the following error:

No module named 'patent_client'

But when running the same line in a Jupyter notebook I get:


IncompleteRead Traceback (most recent call last)
File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/urllib3/response.py:710, in HTTPResponse._error_catcher(self)
709 try:
--> 710 yield
712 except SocketTimeout as e:
713 # FIXME: Ideally we'd like to include the url in the ReadTimeoutError but
714 # there is yet no clean way to get at it from this context.

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/urllib3/response.py:835, in HTTPResponse._raw_read(self, amt)
825 if (
826 self.enforce_content_length
827 and self.length_remaining is not None
(...)
833 # raised during streaming, so all calls with incorrect
834 # Content-Length are caught.
--> 835 raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
837 if data:

IncompleteRead: IncompleteRead(575718 bytes read, -287859 more expected)

The above exception was the direct cause of the following exception:

ProtocolError Traceback (most recent call last)
File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests/models.py:816, in Response.iter_content.<locals>.generate()
815 try:
--> 816 yield from self.raw.stream(chunk_size, decode_content=True)
817 except ProtocolError as e:

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/urllib3/response.py:940, in HTTPResponse.stream(self, amt, decode_content)
939 while not is_fp_closed(self._fp) or len(self._decoded_buffer) > 0:
--> 940 data = self.read(amt=amt, decode_content=decode_content)
942 if data:

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/urllib3/response.py:911, in HTTPResponse.read(self, amt, decode_content, cache_content)
907 while len(self._decoded_buffer) < amt and data:
908 # TODO make sure to initially read enough data to get past the headers
909 # For example, the GZ file header takes 10 bytes, we don't want to read
910 # it one byte at a time
--> 911 data = self._raw_read(amt)
912 decoded_data = self._decode(data, decode_content, flush_decoder)

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/urllib3/response.py:813, in HTTPResponse._raw_read(self, amt)
811 fp_closed = getattr(self._fp, "closed", False)
--> 813 with self._error_catcher():
814 data = self._fp_read(amt) if not fp_closed else b""

File ~/anaconda3/envs/me2drugs/lib/python3.11/contextlib.py:155, in _GeneratorContextManager.__exit__(self, typ, value, traceback)
154 try:
--> 155 self.gen.throw(typ, value, traceback)
156 except StopIteration as exc:
157 # Suppress StopIteration unless it's the same exception that
158 # was passed to throw(). This prevents a StopIteration
159 # raised inside the "with" statement from being suppressed.

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/urllib3/response.py:727, in HTTPResponse._error_catcher(self)
725 except (HTTPException, OSError) as e:
726 # This includes IncompleteRead.
--> 727 raise ProtocolError(f"Connection broken: {e!r}", e) from e
729 # If no exception is thrown, we should avoid cleaning up
730 # unnecessarily.

ProtocolError: ('Connection broken: IncompleteRead(575718 bytes read, -287859 more expected)', IncompleteRead(575718 bytes read, -287859 more expected))

During handling of the above exception, another exception occurred:

ChunkedEncodingError Traceback (most recent call last)
Cell In[21], line 1
----> 1 import patent_client

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/patent_client/__init__.py:51
46 import yankee
48 yankee.use_model = True
---> 51 from patent_client.epo.ops.published.model import Inpadoc # isort:skip
53 # from patent_client.usitc.model import ITCAttachment
54 # from patent_client.usitc.model import ITCDocument
55 # from patent_client.usitc.model import ITCInvestigation
56 from patent_client.uspto.assignment.model import Assignment # isort:skip

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/patent_client/epo/ops/__init__.py:2
1 from .family.model import Family
----> 2 from .legal.model import Legal
3 from .published.model import Images
4 from .published.model import Inpadoc

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/patent_client/epo/ops/legal/__init__.py:3
1 from .national_codes import generate_legal_code_db
----> 3 generate_legal_code_db()

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/patent_client/epo/ops/legal/national_codes.py:31, in generate_legal_code_db()
29 else:
30 logger.debug("Legal Code Database is out of date - creating legal code database")
---> 31 path = get_spreadsheet()
32 create_code_database(path)

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/patent_client/epo/ops/legal/national_codes.py:62, in get_spreadsheet()
60 if out_path.exists():
61 return out_path
---> 62 response = session.get(excel_url, stream=True)
63 with out_path.open("wb") as f:
64 for chunk in response.iter_content(chunk_size=8192):

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests/sessions.py:602, in Session.get(self, url, **kwargs)
594 r"""Sends a GET request. Returns :class:Response object.
595
596 :param url: URL for the new :class:Request object.
597 :param **kwargs: Optional arguments that request takes.
598 :rtype: requests.Response
599 """
601 kwargs.setdefault("allow_redirects", True)
--> 602 return self.request("GET", url, **kwargs)

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/patent_client/epo/ops/session.py:34, in OpsSession.request(self, *args, **kwargs)
33 def request(self, *args, **kwargs):
---> 34 response = super(OpsSession, self).request(*args, **kwargs)
35 if response.status_code in (403, 400):
36 auth_response = self.get_token()

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests_cache/session.py:115, in CacheMixin.request(self, method, url, expire_after, *args, **kwargs)
112 kwargs['headers']['Cache-Control'] = f'max-age={get_expiration_seconds(expire_after)}'
114 with patch_form_boundary(**kwargs):
--> 115 return super().request(method, url, *args, **kwargs)

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests/sessions.py:589, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
584 send_kwargs = {
585 "timeout": timeout,
586 "allow_redirects": allow_redirects,
587 }
588 send_kwargs.update(settings)
--> 589 resp = self.send(prep, **send_kwargs)
591 return resp

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests_cache/session.py:147, in CacheMixin.send(self, request, expire_after, **kwargs)
145 # If the response is expired or missing, or the cache is disabled, then fetch a new response
146 if cached_response is None:
--> 147 response = self._send_and_cache(request, actions, **kwargs)
148 elif is_expired and self.stale_if_error:
149 response = self._resend_and_ignore(request, actions, cached_response, **kwargs)

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests_cache/session.py:193, in CacheMixin._send_and_cache(self, request, actions, cached_response, **kwargs)
190 actions.update_from_response(response)
192 if self._is_cacheable(response, actions):
--> 193 self.cache.save_response(response, actions.cache_key, actions.expires)
194 elif cached_response and response.status_code == 304:
195 return self._update_revalidated_response(actions, response, cached_response)

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests_cache/backends/base.py:101, in BaseCache.save_response(self, response, cache_key, expires)
93 """Save a response to the cache
94
95 Args:
(...)
98 expires: Absolute expiration time for this response
99 """
100 cache_key = cache_key or self.create_key(response.request)
--> 101 cached_response = CachedResponse.from_response(response, expires=expires)
102 cached_response = redact_response(cached_response, self.ignored_parameters)
103 self.responses[cache_key] = cached_response

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests_cache/models/response.py:67, in CachedResponse.from_response(cls, original_response, expires, **kwargs)
65 # Store request, raw response, and next response (if it's a redirect response)
66 obj.request = CachedRequest.from_request(original_response.request)
---> 67 obj.raw = CachedHTTPResponse.from_response(original_response)
68 obj._next = (
69 CachedRequest.from_request(original_response.next) if original_response.next else None
70 )
72 # Store response body, which will have been read & decoded by requests.Response by now

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests_cache/models/raw_response.py:59, in CachedHTTPResponse.from_response(cls, original_response)
57 kwargs['body'] = body
58 raw._fp = BytesIO(body)
---> 59 original_response.content # This property reads, decodes, and stores response content
61 # After reading, reset file pointer on original raw response
62 raw._fp = BytesIO(body)

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests/models.py:899, in Response.content(self)
897 self._content = None
898 else:
--> 899 self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
901 self._content_consumed = True
902 # don't need to release the connection; that's been handled by urllib3
903 # since we exhausted the data.

File ~/anaconda3/envs/me2drugs/lib/python3.11/site-packages/requests/models.py:818, in Response.iter_content.<locals>.generate()
816 yield from self.raw.stream(chunk_size, decode_content=True)
817 except ProtocolError as e:
--> 818 raise ChunkedEncodingError(e)
819 except DecodeError as e:
820 raise ContentDecodingError(e)

ChunkedEncodingError: ('Connection broken: IncompleteRead(575718 bytes read, -287859 more expected)', IncompleteRead(575718 bytes read, -287859 more expected))

Environment variables for EPO not being found by v4

I'm not sure what I'm doing wrong. I've got environment variables set and active and the library is not picking these up for the session. Is there something else that needs to be done to get the client to authenticate to EPO properly in v4?
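
A quick first check is whether the Python process itself can see the variables, since variables set in one shell or IDE profile are not always inherited by another. A minimal diagnostic sketch; visible_env_names is a hypothetical helper, and matching on the substring "EPO" is an assumption about how the variables are named.

```python
import os
from typing import Mapping


def visible_env_names(substring: str = "EPO", environ: Mapping[str, str] = os.environ) -> list[str]:
    """List environment variable names containing `substring` (case-insensitive)."""
    needle = substring.upper()
    return sorted(name for name in environ if needle in name.upper())


# Prints names only, never values, so secrets do not end up in logs or screenshots.
print(visible_env_names())
```

An empty list printed from the same interpreter that runs patent_client would point at the environment rather than at the library.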

make `patent_client` work behind a proxy

If I do e.g.

from patent_client import Inpadoc
Inpadoc.objects.get("US2022323947A1")

when not sitting behind a proxy it works, but when I do sit behind a proxy I get

ConnectionError: HTTPConnectionPool(host='ops.epo.org', port=80): Max retries exceeded with url: /3.2/rest-services/published-data/publication/docdb/US2022323947A1/biblio (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x000001FB78C201F0>: Failed to establish a new connection: [Errno 11002] getaddrinfo failed'))

Is there a simple way to make it work?

(Excuse me if there is an "obvious" solution, but I am rather new to Python.)
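
The traceback shows requests/urllib3 making the connection, and requests reads proxy settings from the HTTP_PROXY and HTTPS_PROXY environment variables by default. Setting them before the first request may therefore be enough. A sketch, assuming the library does not override that default; the proxy address is a placeholder and proxy_settings is a hypothetical helper.

```python
import os


def proxy_settings(proxy_url: str) -> dict[str, str]:
    """Build the environment variables that requests consults for proxies."""
    return {"HTTP_PROXY": proxy_url, "HTTPS_PROXY": proxy_url}


# Placeholder address: replace it with your organisation's proxy.
os.environ.update(proxy_settings("http://proxy.example.com:8080"))

# After this point, import patent_client and run the query as usual.
```

The variables can equally be set in the shell before starting Python, which avoids touching the script at all.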

more EPO samples

Great project! Could you provide some more EPO samples? I'm having a hard time going from the docs to code because, for example, doc_number doesn't seem to be a property on the Inpadoc object even though the docs seem to say so.

EP Register Support

This issue is tracking the work to implement the EP Register endpoints on patent_client. At least for the moment, @xi2pi has volunteered to make a run at this.

The endpoints we'd want to include are:

  • Register Search w/o Constituents - register/search
  • Register Retrieval Service - register/{type}/{format}/{number}/biblio
  • Register Events Service - register/{type}/{format}/{number}/events
  • Register Procedural Steps - register/{type}/{format}/{number}/procedural-steps

These can be played around with at (https://developers.epo.org/ops-v3-2/apis). They are also documented in the EPO OPS documentation at pages 103-112.
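
As a rough illustration of how the retrieval paths above compose into request URLs, here is a small sketch. register_url is a hypothetical helper, and the base URL is taken from the host and version that appear in the URLs quoted elsewhere in this thread rather than from the OPS documentation.

```python
# Assumption: host and version as they appear in URLs quoted in this thread.
BASE_URL = "http://ops.epo.org/3.2/rest-services"


def register_url(ref_type: str, fmt: str, number: str, constituent: str = "biblio") -> str:
    """Compose register/{type}/{format}/{number}/{constituent} under the base URL."""
    return f"{BASE_URL}/register/{ref_type}/{fmt}/{number}/{constituent}"


print(register_url("publication", "epodoc", "EP.3221665.A1", "procedural-steps"))
```

The search endpoint (register/search) takes a query rather than a document number, so it would need its own builder.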

A (broken) example of use is in the documentation:

>>> from patent_client import Epo 
>>> pub = Epo.objects.get("EP3221665A1") 
>>> pub.status[0] 
{'description': 'Request for examination was made', 'code': '15', 'date': '20170825'}
>>> pub.title 
'INERTIAL CAROUSEL POSITIONING'
>>> pub.procedural_steps[0] 
http://ops.epo.org/3.2/rest-services/register/publication/epodoc/EP.3221665.A1/procedural-steps {}
{'phase': 'undefined', 'description': 'Renewal fee payment - 03', 'date': '20171113', 'code': 'RFEE'}

This issue will track notes and thoughts on implementation details. For starters, let's do the EPO Search, which should look a lot like the Inpadoc search. That search is implemented in 4 files:

  • API Wrapper - src/patent_client/epo/ops/published/api.py
  • Manager - src/patent_client/epo/ops/published/manager.py
  • Model - src/patent_client/epo/ops/published/model/search.py
  • Schema - src/patent_client/epo/ops/published/schema/search.py

I would suggest this order for development:

  1. Write an API Wrapper at src/patent_client/epo/ops/register/api.py that just returns raw XML data. Don't worry about processing it, just get it to return the right XML data from the EPO endpoint. Use the PublishedSearchAPI in the published/api.py file as an example.

  2. Write a Schema at src/patent_client/epo/ops/register/schema.py that converts the raw XML data into a Python dictionary with the data attributes you want. Use the SearchSchema class and InpadocSchema classes in published/schema/search.py as examples.

  3. Write a Model at src/patent_client/epo/ops/register/model.py to hold the data the Schema produced. Use the Search and Inpadoc classes in published/model/search.py as examples.

  4. Write a Manager at src/patent_client/epo/ops/register/manager.py that will serve as the manager for the EP Register resource. Use the SearchManager class in published/manager.py as an example.
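
The four layers above can be sketched, very roughly, with the standard library alone. Everything here is hypothetical: the class names, the sample XML, and the field names are invented for illustration and do not reflect patent_client's actual base classes or the real EPO response format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Made-up XML standing in for a real register/search response.
SAMPLE_XML = """
<results>
  <document number="EP0000001" title="EXAMPLE TITLE ONE"/>
  <document number="EP0000002" title="EXAMPLE TITLE TWO"/>
</results>
"""


class FakeRegisterApi:
    """Step 1: API wrapper that only returns raw XML (no network call here)."""

    def search(self, query: str) -> str:
        return SAMPLE_XML


class RegisterSchema:
    """Step 2: schema that turns raw XML into plain dictionaries."""

    def load(self, xml_text: str) -> list[dict]:
        root = ET.fromstring(xml_text.strip())
        return [dict(doc.attrib) for doc in root.iter("document")]


@dataclass
class RegisterResult:
    """Step 3: model that holds the fields the schema produced."""

    number: str
    title: str


class RegisterManager:
    """Step 4: manager that ties the API, schema, and model together."""

    def __init__(self, api: FakeRegisterApi, schema: RegisterSchema):
        self.api = api
        self.schema = schema

    def filter(self, query: str) -> list[RegisterResult]:
        raw = self.api.search(query)
        return [RegisterResult(**row) for row in self.schema.load(raw)]


manager = RegisterManager(FakeRegisterApi(), RegisterSchema())
results = manager.filter("applicant=example")
print([r.number for r in results])
```

Keeping the API wrapper free of parsing, as step 1 suggests, means the schema can be tested against saved XML files without any network access.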

And please let me know if you have any questions or concerns!

Patent Public Search Support in view of PatFT/AppFT Deprecation

Wanted to make sure you knew about this:

In October 2022, other searching platforms (i.e. Pub EAST, Pub WEST, Pat/FT and App/FT) will be retired and replaced with Patent Public Search.

under "What is the timeline for Patent Public Search?" on https://ppubs.uspto.gov/pubwebapp/static/pages/faq.html

I found this page listing the search fields, https://ppubs.uspto.gov/pubwebapp/static/pages/searchable-indexes.html, which would map onto SEARCH_FIELDS in settings.py. PN is still the patent number, but the issue date is now PD, and so on. It also looks like the response comes back as JSON.

I guess that's a roundabout feature request to have fulltext use ppubs when patft goes away. I'm not sure I'm Pythonic enough to take this on myself, but I could help if needed.

Thanks

to_pandas - TypeError

Thank you for providing such a helpful package. I have encountered an issue while using the to_pandas function to convert the results to a DataFrame. Specifically, I am receiving a TypeError message with the following error: ForeignPriority.__init__() missing 1 required positional argument: 'country_name' or ParserError: year 0 is out of range: 0000-12-30T00:00:00Z

I have attempted to filter the results with more conditions, which has worked for smaller results, but I am concerned that a longer result may still cause this issue.

I was wondering if there is a more efficient way to handle this issue or if you have any suggestions on how to approach this problem.

Thank you again for your assistance.

apps = USApplication.objects.filter(first_named_applicant=f'{Firmname}')

df = apps.to_pandas()
df.to_csv("./output.csv")

TypeError Traceback (most recent call last)
Cell In[53], line 1
----> 1 df = google_apps.to_pandas()
2 df.to_csv("./google3.csv")

File c:\Users.conda\envs\patent_data\lib\site-packages\yankee\data\collection.py:50, in Collection.to_pandas(self, annotate)
47 import pandas as pd
49 list_of_series = list()
---> 50 for i in iter(self):
51 try:
52 series = i.to_pandas()

File c:\Users.conda\envs\patent_data\lib\site-packages\patent_client\util\base\manager.py:63, in Manager.iter(self)
62 def iter(self) -> Iterator[ModelType]:
---> 63 for item in self._get_results():
64 yield item

File c:\Users.conda\envs\patent_data\lib\site-packages\patent_client\uspto\peds\manager.py:58, in USApplicationManager._get_results(self)
56 for item in page_data["docs"]:
57 if not self.config.limit or counter < self.config.limit:
---> 58 yield self.schema.load(item)
59 counter += 1
60 page_num += 1

File c:\Users.conda\envs\patent_data\lib\site-packages\yankee\json\schema\schema.py:9, in Schema.load(self, obj)
7 def load(self, obj):
8 if self.name is not None or isinstance(obj, dict):
----> 9 return super().load(obj)
10 elif isinstance(obj, str):
11 return super().load(json.loads(obj))

File c:\Users.conda\envs\patent_data\lib\site-packages\yankee\base\deserializer.py:55, in Deserializer.load(self, obj)
53 def load(self, obj):
54 self.raw = obj
---> 55 return self._load_func(obj)

File c:\Users.conda\envs\patent_data\lib\site-packages\toolz\functoolz.py:489, in Compose.call(self, *args, **kwargs)
487 ret = self.first(*args, **kwargs)
488 for f in self.funcs:
--> 489 ret = f(ret)
490 return ret

File c:\Users.conda\envs\patent_data\lib\site-packages\yankee\base\schema.py:67, in Schema.deserialize(self, obj)
65 obj = self.accessor(obj)
66 for key, field in self.fields.items():
---> 67 value = field.load(obj)
68 # If there is no value, don't include anything in the output dictionary
69 if not is_valid(value):

File c:\Users.conda\envs\patent_data\lib\site-packages\yankee\base\fields.py:238, in List.load(self, obj)
236 return ListCollection()
237 obj_gen = (self.item_schema.load(i) for i in obj)
--> 238 return ListCollection(o for o in obj_gen if is_valid(o))

File c:\Users.conda\envs\patent_data\lib\site-packages\yankee\base\fields.py:238, in (.0)
236 return ListCollection()
237 obj_gen = (self.item_schema.load(i) for i in obj)
--> 238 return ListCollection(o for o in obj_gen if is_valid(o))

File c:\Users.conda\envs\patent_data\lib\site-packages\yankee\base\fields.py:237, in (.0)
235 if not obj:
236 return ListCollection()
--> 237 obj_gen = (self.item_schema.load(i) for i in obj)
238 return ListCollection(o for o in obj_gen if is_valid(o))

File c:\Users.conda\envs\patent_data\lib\site-packages\yankee\json\schema\schema.py:9, in Schema.load(self, obj)
7 def load(self, obj):
8 if self.name is not None or isinstance(obj, dict):
----> 9 return super().load(obj)
10 elif isinstance(obj, str):
11 return super().load(json.loads(obj))

File c:\Users.conda\envs\patent_data\lib\site-packages\yankee\base\deserializer.py:55, in Deserializer.load(self, obj)
53 def load(self, obj):
54 self.raw = obj
---> 55 return self._load_func(obj)

File c:\Users.conda\envs\patent_data\lib\site-packages\toolz\functoolz.py:489, in Compose.call(self, *args, **kwargs)
487 ret = self.first(*args, **kwargs)
488 for f in self.funcs:
--> 489 ret = f(ret)
490 return ret

File c:\Users.conda\envs\patent_data\lib\site-packages\yankee\base\schema.py:80, in Schema.load_model(self, obj)
79 def load_model(self, obj):
---> 80 return self.model(**obj)

TypeError: ForeignPriority.__init__() missing 1 required positional argument: 'country_name'
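
Both errors in this report come from records with missing or placeholder values. The sketch below reproduces the two failure modes in isolation and shows one way a model could tolerate them. It assumes dataclass-style models; `StrictPriority`, `TolerantPriority`, `parse_date`, and the `application_number` field are hypothetical and are not the library's actual classes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StrictPriority:
    # Every field is required, as in the failing constructor call.
    country_name: str
    application_number: str  # hypothetical field


@dataclass
class TolerantPriority:
    # Same shape, but missing values fall back to None.
    country_name: Optional[str] = None
    application_number: Optional[str] = None  # hypothetical field


def parse_date(text: str) -> Optional[datetime]:
    """Return None for unparseable placeholder dates such as year 0000."""
    try:
        return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None


record = {"application_number": "12345"}  # no country_name, as in the traceback

try:
    StrictPriority(**record)
    strict_failed = False
except TypeError:
    strict_failed = True

tolerant = TolerantPriority(**record)
placeholder = parse_date("0000-12-30T00:00:00Z")
valid = parse_date("2010-11-30T00:00:00Z")
print(strict_failed, tolerant, placeholder, valid)
```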

Assignment API; PriorOwnerName; Not found

I was looking at the Assignment API.

On the USPTO assignment website I see a "PriorOwnerName" field. I think we have "Collect all applications ever assigned out of the company", but I was unable to find a way to get the PriorOwnerName for a specific patent.

USPTO Assignment API: https://patent-client.readthedocs.io/en/latest/user_guide/assignments.html

Please advise if there is any way.

Error while using example function

Example:

from patent_client import Inpadoc, Assignment, USApplication, Patent
# Import the model classes you need
# Fetch US Patents with the word "tennis" in their title issued in 2010
pats = Patent.objects.filter(title="tennis", issue_date="2010-01-01->2010-12-31")
len(pats) > 10

Error:

Traceback (most recent call last):
  File "/home/volk/patents_search/test.py", line 5, in <module>
    len(pats) > 10
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/uspto/fulltext/manager.py", line 61, in __len__
    self.get_page(0)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/uspto/fulltext/manager.py", line 153, in get_page
    results = self.parse_page(response_text)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/uspto/fulltext/manager.py", line 169, in parse_page
    result = self.__schema__.load(tree)
  File "/home/volk/.local/lib/python3.10/site-packages/yankee/base/deserializer.py", line 106, in load
    return self.post_load(loaded_obj)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/util/schema_mixin.py", line 27, in post_load
    return self.__model__(**obj)
TypeError: Patent.__init__() missing 1 required positional argument: 'publication_number'

Example:

pub = Inpadoc.objects.get('EP3082535A1')
print(pub.biblio.title)

Error:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 960, in json
    return complexjson.loads(self.content.decode(encoding), **kwargs)
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/volk/patents_search/test.py", line 24, in <module>
    pub = Inpadoc.objects.get('EP3082535A1')
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/published/manager.py", line 47, in get
    result = PublishedApi.biblio.get_biblio(number, doc_type, format)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/published/api.py", line 47, in get_biblio
    return cls.get_constituents(number, doc_type, format, constituents="biblio")
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/published/api.py", line 40, in get_constituents
    response = session.get(url)
  File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/session.py", line 24, in request
    auth_response = self.get_token()
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/session.py", line 37, in get_token
    data = response.json()
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 968, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Pandas returns - AttributeError: 'NoneType' object has no attribute 'find'

First, this is a great project, thank you! I have a small question. When I run the following code, I get thrown an error: AttributeError: 'NoneType' object has no attribute 'find'

company_name = '3M Company'

pd.DataFrame.from_records(
    (USApplication.objects
        .filter(first_named_applicant=company_name)
        .values('app_filing_date', 'patent_number', 'patent_title')[:]
    )
)

If I were to change the line to
.values('app_filing_date', 'patent_number', 'patent_title')[:5]
Everything comes out great. I suspect there's some data that's rough or nonexistent, but just wanted to see if you've run into this before.
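
Since `[:5]` works and `[:]` fails, a record further along is probably missing a value. A generic sketch for counting how many records load before the first failure is shown here; `first_failure` is a hypothetical helper, not part of patent_client, and `fake_results` is a stand-in for the real query.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_failure(make_iter: Callable[[], Iterable]) -> Optional[Tuple[int, Exception]]:
    """Consume an iterable; return (records loaded so far, exception) or None."""
    count = 0
    try:
        for _ in make_iter():
            count += 1
    except Exception as exc:  # broad on purpose: this is a diagnostic helper
        return count, exc
    return None


def fake_results():
    # Stand-in for the real result set; the third row has no title.
    rows = [{"title": "a"}, {"title": "b"}, {"title": None}, {"title": "d"}]
    for row in rows:
        yield row["title"].find("x")  # raises AttributeError on None


failure = first_failure(fake_results)
print(failure)
```

The returned count gives an offset to inspect, which may help narrow a bug report down to a single application.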

"yankee.json.schema"

It's a pleasure to greet you again.

An error also occurred while running the example.

  • error -
    File C:\Python311\Lib\site-packages\patent_client\util\json.py:2
    1 from yankee.json import Schema as BaseSchema
    ----> 2 from yankee.json.schema.fields import List as BaseListField
    4 from .schema_mixin import ListManagerMixin
    5 from .schema_mixin import PatentSchemaMixin

ModuleNotFoundError: No module named 'yankee.json.schema.fields'
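
The import fails inside patent_client's own code, which suggests that the installed yankee release does not match the one patent_client expects. That is an assumption, not something confirmed in this report. A small sketch for listing the installed versions of both packages with the standard library follows; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


versions = {name: installed_version(name) for name in ("patent_client", "yankee")}
print(versions)
```

Including this output in a bug report makes a version mismatch easier to confirm or rule out.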

How to call values tags in other classes in USApplication (e.g., us_references in Patent)

Hi everyone,

I may not understand how to choose the right values tag, especially when calling tags from other classes (e.g., PublicSearch, Patent).

I've referenced this case to get the US references for corporate patents, but all I could get is 'None'.

apps = (USApplication.objects.filter(first_named_applicant="Google").order_by("-appl_filing_date").limit(20).values("priority_date", "patent.us_refs"))

https://patent-client.readthedocs.io/en/latest/user_guide/introduction.html

Another problem is that I am getting an error when using the following code:

PublicSearch.objects.filter(applicant_name='Tesla Motors').limit(20).values('appl_id').to_list()

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Question; is there any way to use EPO to download CMS objects such as IDS pdfs from US cases

I've looked through the documentation, and even the pull request that includes EPO Register search functionality. However, I was unable to identify how I might access the Global Dossier for US applications through the EPO Register. When I search the EPO Register via Chrome, I can see the entire list of CMS objects from the US file wrapper for my cases of interest. Is there any way of accessing this list of CMS objects and executing a download via the API? Now that PEDS blocks downloads, it would be great if there were an alternative way to grab PDFs from the file wrappers of US cases ...

Thanks very much!

Biblio data not pulling for certain patents

Thanks very much for this library, it is great to pull data from the EPO! For some reason there are certain patents that error out when I'm trying to pull biblio data for them. Am I just not using the tool correctly, or is this maybe a known issue? Below shows my code and error message. Here is a link to the actual patent on the EPO website... https://worldwide.espacenet.com/patent/search/family/030117972/publication/JP2005533465A?q=JP2005533465A.

Thanks in advance for any help!

from patent_client import Patent, PublishedApplication, Inpadoc
Inpadoc.objects.get('JP2005533465A').biblio

Gives me this error...

`---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in
----> 1 Inpadoc.objects.get('JP2005533465A').biblio

/usr/local/lib/python3.7/site-packages/patent_client/util/related.py in get(self)
16 filter_obj = {k: getattr(self, v) for (k, v) in mapping.items()}
17 logger.debug(f'Fetching related {klass} using filter {filter_obj}')
---> 18 return resolve(klass.objects.get(**filter_obj), attribute)
19 except AttributeError:
20 return None

/usr/local/lib/python3.7/site-packages/patent_client/util/manager.py in get(self, *args, **kwargs)
175 if len(manager) > 1:
176 raise ValueError("More than one document found!")
--> 177 return manager[0] # type: ignore
178
179 # Manager Functions

/usr/local/lib/python3.7/site-packages/patent_client/util/manager.py in getitem(self, key)
125 mger = mger.limit(stop - start)
126 return mger
--> 127 return self.offset(key).first()
128
129 def filter(self, *args, **kwargs) -> Manager[ModelType]:

/usr/local/lib/python3.7/site-packages/patent_client/util/manager.py in first(self)
185 def first(self) -> ModelType:
186 """Get the first object in the manager"""
--> 187 return next(iter(self))
188
189 def all(self) -> Manager[ModelType]:

/usr/local/lib/python3.7/site-packages/patent_client/util/manager.py in iter(self)
93 yield item
94 # Yield out of the iterator, expanding the cache as you go.
---> 95 for item in self.result_iterator:
96 self.cache.append(item)
97 yield item

/usr/local/lib/python3.7/site-packages/patent_client/epo/inpadoc/manager.py in _get_results(self)
94 break
95 if counter >= offset:
---> 96 yield self.schema.load(item)
97
98 @property

/usr/local/lib/python3.7/site-packages/marshmallow/schema.py in load(self, data, many, partial, unknown)
726 """
727 return self._do_load(
--> 728 data, many=many, partial=partial, unknown=unknown, postprocess=True
729 )
730

/usr/local/lib/python3.7/site-packages/marshmallow/schema.py in _do_load(self, data, many, partial, unknown, postprocess)
848 try:
849 processed_data = self._invoke_load_processors(
--> 850 PRE_LOAD, data, many=many, original_data=data, partial=partial
851 )
852 except ValidationError as err:

/usr/local/lib/python3.7/site-packages/marshmallow/schema.py in _invoke_load_processors(self, tag, data, many, original_data, partial)
1099 many=many,
1100 original_data=original_data,
-> 1101 partial=partial,
1102 )
1103 return data

/usr/local/lib/python3.7/site-packages/marshmallow/schema.py in _invoke_processors(self, tag, pass_many, data, many, original_data, **kwargs)
1225 data = processor(data, original_data, many=many, **kwargs)
1226 else:
-> 1227 data = processor(data, many=many, **kwargs)
1228 return data
1229

/usr/local/lib/python3.7/site-packages/patent_client/epo/inpadoc/schema.py in pre_load(self, data, *args, **kwargs)
189 data["title"] = titles
190 else:
--> 191 data["title"] = titles["#text"]
192 return data
193

TypeError: 'NoneType' object is not subscriptable`
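
The last frame indexes `titles["#text"]` when `titles` is None. A minimal sketch of a more defensive extraction follows, assuming the title data can arrive as None, a single dict with a "#text" key, or a list of such dicts; `extract_title` is a hypothetical helper and not the library's code.

```python
from typing import Any, Optional


def extract_title(titles: Any) -> Optional[str]:
    """Return the first available title text, or None when there is none."""
    if titles is None:
        return None
    if isinstance(titles, dict):
        return titles.get("#text")
    if isinstance(titles, list):
        for entry in titles:
            text = extract_title(entry)
            if text:
                return text
        return None
    return str(titles)


print(extract_title(None))
print(extract_title({"#text": "Modular table tennis game"}))
print(extract_title([{"@lang": "ja"}, {"@lang": "en", "#text": "Example"}]))
```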

demonstration code does not work

On the web page https://pypi.org/project/patent-client/, the quick start example reads:

Import the model classes you need

from patent_client import Inpadoc, Assignment, USApplication, Patent

Fetch US Patents with the word "tennis" in their title issued in 2010

pats = Patent.objects.filter(title="tennis", issue_date="2010-01-01->2010-12-31")
len(pats) > 10
True

When I execute this I get:

pats = Patent.objects.filter(title="tennis", issue_date="2010-01-01->2010-12-31")
len(pats)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mclark/.local/lib/python3.10/site-packages/patent_client/uspto/fulltext/manager.py", line 61, in __len__
    self.get_page(0)
  File "/home/mclark/.local/lib/python3.10/site-packages/patent_client/uspto/fulltext/manager.py", line 153, in get_page
    results = self.parse_page(response_text)
  File "/home/mclark/.local/lib/python3.10/site-packages/patent_client/uspto/fulltext/manager.py", line 169, in parse_page
    result = self.__schema__.load(tree)
  File "/home/mclark/.local/lib/python3.10/site-packages/yankee/base/deserializer.py", line 106, in load
    return self.post_load(loaded_obj)
  File "/home/mclark/.local/lib/python3.10/site-packages/patent_client/util/schema_mixin.py", line 27, in post_load
    return self.__model__(**obj)
TypeError: Patent.__init__() missing 1 required positional argument: 'publication_number'
len(pats) > 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mclark/.local/lib/python3.10/site-packages/patent_client/uspto/fulltext/manager.py", line 61, in __len__
    self.get_page(0)
  File "/home/mclark/.local/lib/python3.10/site-packages/patent_client/uspto/fulltext/manager.py", line 153, in get_page
    results = self.parse_page(response_text)
  File "/home/mclark/.local/lib/python3.10/site-packages/patent_client/uspto/fulltext/manager.py", line 169, in parse_page
    result = self.__schema__.load(tree)
  File "/home/mclark/.local/lib/python3.10/site-packages/yankee/base/deserializer.py", line 106, in load
    return self.post_load(loaded_obj)
  File "/home/mclark/.local/lib/python3.10/site-packages/patent_client/util/schema_mixin.py", line 27, in post_load
    return self.__model__(**obj)
TypeError: Patent.__init__() missing 1 required positional argument: 'publication_number'

'OpsController' object has no attribute '_key_generator' Error

AttributeError: 'OpsController' object has no attribute '_key_generator' when calling Inpadoc.objects.get() in version 4.1.8

Traceback (most recent call last):
File
...
return Inpadoc.objects.get(number)
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/patent_client/util/manager.py", line 159, in get
return run_sync(self.aget(*args, **kwargs))
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/patent_client/util/asyncio_util.py", line 12, in run_sync
return loop.run_until_complete(coroutine)
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/nest_asyncio.py", line 99, in run_until_complete
return f.result()
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/futures.py", line 201, in result
raise self._exception.with_traceback(self._exception_tb)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/asyncio/tasks.py", line 232, in __step
result = coro.send(None)
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/patent_client/epo/ops/published/manager.py", line 62, in aget
result = await PublishedAsyncApi.biblio.get_biblio(number, doc_type, format)
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/patent_client/epo/ops/published/api.py", line 35, in get_biblio
text = await cls.get_constituents(number, doc_type, format, constituents="biblio")
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/patent_client/epo/ops/published/api.py", line 30, in get_constituents
response = await asession.get(url)
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1757, in get
return await self.request(
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1530, in request
return await self.send(request, auth=auth, follow_redirects=follow_redirects)
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1617, in send
response = await self._send_handling_auth(
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1645, in _send_handling_auth
response = await self._send_handling_redirects(
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1682, in _send_handling_redirects
response = await self._send_single_request(request)
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/httpx/_client.py", line 1719, in _send_single_request
response = await transport.handle_async_request(request)
File "/Users/zhenglongwu/Documents/GitHub/OxValueAI/.venv/lib/python3.10/site-packages/hishel/_async/_transports.py", line 97, in handle_async_request
key = self._controller._key_generator(httpcore_request)
AttributeError: 'OpsController' object has no attribute '_key_generator'

'cp949' codec can't decode

Hello.
An error occurred while using patent_client.

In Jupyter:
from patent_client import USApplication

~~
File c:\Python38\Lib\site-packages\patent_client\epo\ops\number_service\errors.py:36, in build_error_dir()
34 error_text = (base_dir / "errors.txt").read_text()
35 errors = [get_errors(err) for err in error_text.split("\n")]
---> 36 message_text = (base_dir / "messages.txt").read_text()
37 messages = [get_messages(err) for err in message_text.split("\n")]
38 error_objs = [NumberServiceError(**e) for e in errors + messages if e]

File c:\Python38\Lib\pathlib.py:1059, in Path.read_text(self, encoding, errors)
1057 encoding = io.text_encoding(encoding)
1058 with self.open(mode='r', encoding=encoding, errors=errors) as f:
-> 1059 return f.read()

UnicodeDecodeError: 'cp949' codec can't decode byte 0xe2 in position 102: illegal multibyte sequence

Is this an easily solved problem?
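
`Path.read_text()` without an `encoding` argument uses the platform's locale encoding (cp949 here), so a UTF-8 file containing a byte such as 0xe2 can fail to decode. A minimal sketch of reading with an explicit encoding follows; the temporary file and its contents are made up for illustration, and whether patent_client's bundled files are UTF-8 is an assumption based on the error message.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "messages.txt"
    # An en dash encodes to the bytes e2 80 93 in UTF-8.
    path.write_bytes("Invalid number \u2013 check format\n".encode("utf-8"))

    raw = path.read_bytes()
    # Passing the encoding explicitly makes the result independent of the locale.
    text = path.read_text(encoding="utf-8")

print(text)
```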

"There is a USPTO problem"

Did the PTO change or disable the API? I get "There is a USPTO problem" when using USApplication

from patent_client import USApplication
app = USApplication.objects.get('15710770')
.....manager.py", line 149, in is_online
raise NotAvailableException("There is a USPTO problem")

Edit: looks like it's a USPTO issue:

https://ped.uspto.gov/peds/
https://www.uspto.gov/blog/ebiz/

Patent Examination Data System (PEDS) and PEDS API down (out-of-service)
The USPTO is experiencing an issue with PEDS and is working diligently to resolve the issue

Exception when calling GlobalDossierApplication.objects.get() on certain applications: "GlobalDossierApplication.__init__() got an unexpected keyword argument 'applicant_names'"

For App Ser. No. 17193105, there is a single terminal disclaimer in the file wrapper. The identifier of the PDF is L512M7SNGREENX2

EDIT: When I call: app = GlobalDossierApplication.objects.get('16740760', type="application", office="US")

I am getting the attached exception. This is happening across applications and not just the 1 that initially was reproducing this error. Please let me know what else I can provide to help troubleshoot.

Thanks very much!

"GlobalDossierApplication.__init__() got an unexpected keyword argument 'applicant_names'"

KeyError: 'inventors'

While accessing application number 18306211, the Python app crashes. Below is the stack trace:

Traceback (most recent call last):
File "/home/amit/development/python/venv/lib/python3.11/site-packages/django/core/handlers/exception.py", line 55, in inner
response = get_response(request)
^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/django/core/handlers/base.py", line 197, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/django/views/decorators/csrf.py", line 56, in wrapper_view
return view_func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/django/views/generic/base.py", line 104, in view
return self.dispatch(request, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/rest_framework/views.py", line 509, in dispatch
response = self.handle_exception(exc)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/rest_framework/views.py", line 469, in handle_exception
self.raise_uncaught_exception(exc)
File "/home/amit/development/python/venv/lib/python3.11/site-packages/rest_framework/views.py", line 480, in raise_uncaught_exception
raise exc
File "/home/amit/development/python/venv/lib/python3.11/site-packages/rest_framework/views.py", line 506, in dispatch
response = handler(request, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/rest_framework/decorators.py", line 50, in handler
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/drf/api/views.py", line 22, in getData
app = USApplication.objects.get(application_number)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/patent_client/util/manager.py", line 159, in get
return run_sync(self.aget(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/patent_client/util/asyncio_util.py", line 5, in run_sync
return asyncio.run(coroutine)
^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/nest_asyncio.py", line 31, in run
return loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/nest_asyncio.py", line 99, in run_until_complete
return f.result()
^^^^^^^^^^
File "/home/amit/anaconda3/lib/python3.11/asyncio/futures.py", line 203, in result
raise self._exception.with_traceback(self._exception_tb)
File "/home/amit/anaconda3/lib/python3.11/asyncio/tasks.py", line 267, in __step
result = coro.send(None)
^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/patent_client/util/manager.py", line 151, in aget
length = await mger.alen()
^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/patent_client/uspto/peds/manager.py", line 32, in alen
max_length = (await api.create_query(**self.get_query_params())).num_found
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/patent_client/uspto/peds/api.py", line 105, in create_query
return PedsPage.model_validate(response.json())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/pydantic/main.py", line 503, in model_validate
return cls.pydantic_validator.validate_python(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/amit/development/python/venv/lib/python3.11/site-packages/patent_client/uspto/peds/model.py", line 323, in collect_related_fields
for inventor in values["inventors"]:
~~~~~~^^^^^^^^^^^^^
KeyError: 'inventors'
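
The failing line indexes `values["inventors"]` directly, so a response without that key raises KeyError. A minimal sketch of a tolerant lookup follows; `related_inventors` is a hypothetical helper, and the shape of each inventor entry is assumed.

```python
from typing import Any, Dict, List


def related_inventors(values: Dict[str, Any]) -> List[Any]:
    """Return the inventor entries, or an empty list when the key is absent or null."""
    return values.get("inventors") or []


print(related_inventors({"inventors": [{"name": "A. Inventor"}]}))
print(related_inventors({}))
print(related_inventors({"inventors": None}))
```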

USPTO repeated results?

Hey Parker, man you've been busy! I tried one of the uspto examples and am not sure if I'm doing something wrong or if there's something up in the new code. It looks like the results are the same patent repeated, using v3.2.1. I tried looking at the code but didn't see anything obvious. Thanks again for taking this on!

# Fetch US Patents with the word "tennis" in their title issued in 2010
>>> pats = PatentBiblio.objects.filter(title="tennis", issue_date="2010-01-01->2010-12-31")
>>> pats[0]
PublicationBiblio(publication_number=7841958, publication_date=2010-11-30, patent_title=Modular table tennis game)
>>> pats[1]
PublicationBiblio(publication_number=7841958, publication_date=2010-11-30, patent_title=Modular table tennis game)
>>> pats[1] == pats[2]
True

environmental variable bug in session.py for inpadoc

I think the following if statement means that the environment variables are never read, so the code always falls back to reading ApiKey/Secret from the file system. It looks like it should be os.environ.get("EPO_KEY", True):

CLIENT_SETTINGS = SETTINGS["EpoOpenPatentServices"]
if os.environ.get("EPO_KEY", False):
    KEY = os.environ["EPO_KEY"]
    SECRET = os.environ["EPO_SECRET"]
else:
    KEY = CLIENT_SETTINGS["ApiKey"]
    SECRET = CLIENT_SETTINGS["Secret"]
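
One way to check this branch in isolation is to run the same condition against plain dictionaries. In the sketch below, `pick_credentials` and both mappings are hypothetical stand-ins for `os.environ` and the settings file. Note that `.get("EPO_KEY", False)` returns the variable's value when it is set, so in this sketch a non-empty EPO_KEY does select the environment branch.

```python
from typing import Mapping, Tuple


def pick_credentials(env: Mapping[str, str], file_settings: Mapping[str, str]) -> Tuple[str, str]:
    """Mirror the if/else above, with the inputs passed in explicitly."""
    if env.get("EPO_KEY", False):
        return env["EPO_KEY"], env["EPO_SECRET"]
    return file_settings["ApiKey"], file_settings["Secret"]


file_settings = {"ApiKey": "file-key", "Secret": "file-secret"}

from_env = pick_credentials({"EPO_KEY": "env-key", "EPO_SECRET": "env-secret"}, file_settings)
from_file = pick_credentials({}, file_settings)
print(from_env, from_file)
```

If the environment values are still ignored in practice, it may be worth checking that the variables are exported in the same process that imports the library.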

Can't import module: Connection broken: IncompleteRead

On Ubuntu Linux 22.04 LTS, using Python 3.10 and patent_client 3.2.4, and a config file with an EPO key/secret, I cannot import patent_client:

chgans@chgans-laptop:/tmp$ python --version
Python 3.10.6
chgans@chgans-laptop:/tmp$ python -mvenv venv
chgans@chgans-laptop:/tmp$ . venv/bin/activate
(venv) chgans@chgans-laptop:/tmp$ pip install patent_client
Collecting patent_client
  Using cached patent_client-3.2.4-py3-none-any.whl (1.5 MB)
[...]
Installing collected packages: snowballstemmer, ply, appdirs, zipp, urllib3, ujson, toolz, sphinxcontrib-serializinghtml, sphinxcontrib-qthelp, sphinxcontrib-jsmath, sphinxcontrib-htmlhelp, sphinxcontrib-devhelp, sphinxcontrib-applehelp, six, PyYAML, PyPDF2, pyparsing, Pygments, packaging, MarkupSafe, lxml, inflection, imagesize, idna, exceptiongroup, et-xmlfile, docutils, decorator, cssselect, colorlog, charset-normalizer, certifi, babel, attrs, alabaster, url-normalize, requests, python-dateutil, openpyxl, jsonpath-ng, Jinja2, cattrs, sphinx, requests-cache, sphinx-copybutton, yankee, patent_client
Successfully installed Jinja2-3.1.2 MarkupSafe-2.1.2 PyPDF2-2.12.1 PyYAML-6.0 Pygments-2.15.1 alabaster-0.7.13 appdirs-1.4.4 attrs-23.1.0 babel-2.12.1 cattrs-22.2.0 certifi-2023.5.7 charset-normalizer-3.1.0 colorlog-6.7.0 cssselect-1.2.0 decorator-5.1.1 docutils-0.20.1 et-xmlfile-1.1.0 exceptiongroup-1.1.1 idna-3.4 imagesize-1.4.1 inflection-0.5.1 jsonpath-ng-1.5.3 lxml-4.9.2 openpyxl-3.1.2 packaging-23.1 patent_client-3.2.4 ply-3.11 pyparsing-3.0.9 python-dateutil-2.8.2 requests-2.31.0 requests-cache-0.9.8 six-1.16.0 snowballstemmer-2.2.0 sphinx-7.0.1 sphinx-copybutton-0.5.2 sphinxcontrib-applehelp-1.0.4 sphinxcontrib-devhelp-1.0.2 sphinxcontrib-htmlhelp-2.0.1 sphinxcontrib-jsmath-1.0.1 sphinxcontrib-qthelp-1.0.3 sphinxcontrib-serializinghtml-1.1.5 toolz-0.12.0 ujson-5.7.0 url-normalize-1.4.3 urllib3-2.0.2 yankee-0.1.40 zipp-3.15.0
(venv) chgans@chgans-laptop:/tmp$ python -c "import patent_client"
Traceback (most recent call last):
  File "/tmp/venv/lib/python3.10/site-packages/urllib3/response.py", line 705, in _error_catcher
    yield
  File "/tmp/venv/lib/python3.10/site-packages/urllib3/response.py", line 830, in _raw_read
    raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
urllib3.exceptions.IncompleteRead: IncompleteRead(574192 bytes read, -287096 more expected)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/venv/lib/python3.10/site-packages/requests/models.py", line 816, in generate
    yield from self.raw.stream(chunk_size, decode_content=True)
  File "/tmp/venv/lib/python3.10/site-packages/urllib3/response.py", line 935, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "/tmp/venv/lib/python3.10/site-packages/urllib3/response.py", line 906, in read
    data = self._raw_read(amt)
  File "/tmp/venv/lib/python3.10/site-packages/urllib3/response.py", line 808, in _raw_read
    with self._error_catcher():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/tmp/venv/lib/python3.10/site-packages/urllib3/response.py", line 722, in _error_catcher
    raise ProtocolError(f"Connection broken: {e!r}", e) from e
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(574192 bytes read, -287096 more expected)', IncompleteRead(574192 bytes read, -287096 more expected))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/venv/lib/python3.10/site-packages/patent_client/__init__.py", line 51, in <module>
    from patent_client.epo.ops.published.model import Inpadoc  # isort:skip
  File "/tmp/venv/lib/python3.10/site-packages/patent_client/epo/ops/__init__.py", line 2, in <module>
    from .legal.model import Legal
  File "/tmp/venv/lib/python3.10/site-packages/patent_client/epo/ops/legal/__init__.py", line 3, in <module>
    generate_legal_code_db()
  File "/tmp/venv/lib/python3.10/site-packages/patent_client/epo/ops/legal/national_codes.py", line 31, in generate_legal_code_db
    path = get_spreadsheet()
  File "/tmp/venv/lib/python3.10/site-packages/patent_client/epo/ops/legal/national_codes.py", line 62, in get_spreadsheet
    response = session.get(excel_url, stream=True)
  File "/tmp/venv/lib/python3.10/site-packages/requests/sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "/tmp/venv/lib/python3.10/site-packages/patent_client/epo/ops/session.py", line 34, in request
    response = super(OpsSession, self).request(*args, **kwargs)
  File "/tmp/venv/lib/python3.10/site-packages/requests_cache/session.py", line 115, in request
    return super().request(method, url, *args, **kwargs)
  File "/tmp/venv/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/tmp/venv/lib/python3.10/site-packages/requests_cache/session.py", line 147, in send
    response = self._send_and_cache(request, actions, **kwargs)
  File "/tmp/venv/lib/python3.10/site-packages/requests_cache/session.py", line 193, in _send_and_cache
    self.cache.save_response(response, actions.cache_key, actions.expires)
  File "/tmp/venv/lib/python3.10/site-packages/requests_cache/backends/base.py", line 101, in save_response
    cached_response = CachedResponse.from_response(response, expires=expires)
  File "/tmp/venv/lib/python3.10/site-packages/requests_cache/models/response.py", line 67, in from_response
    obj.raw = CachedHTTPResponse.from_response(original_response)
  File "/tmp/venv/lib/python3.10/site-packages/requests_cache/models/raw_response.py", line 59, in from_response
    original_response.content  # This property reads, decodes, and stores response content
  File "/tmp/venv/lib/python3.10/site-packages/requests/models.py", line 899, in content
    self._content = b"".join(self.iter_content(CONTENT_CHUNK_SIZE)) or b""
  File "/tmp/venv/lib/python3.10/site-packages/requests/models.py", line 818, in generate
    raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(574192 bytes read, -287096 more expected)', IncompleteRead(574192 bytes read, -287096 more expected))
(venv) chgans@chgans-laptop:/tmp$ grep -v API ~/.patent_client_config.yaml
DEFAULT:
    BASE_DIR: ~/.patent_client
    LOG_FILE: patent_client.log
    LOG_LEVEL: INFO

CACHE:
    PATH: requests_cache.sqlite
    MAX_AGE: "3 days"

EPO:
ITC:
    USERNAME:
    PASSWORD:

I have tried multiple times this afternoon and always get this crash.

Query limit

Hi and thanks for the package.
I am trying to query the API with the multiprocessing package, and I noticed that at a certain point (i.e. after a certain number of searches) the processes stop making progress. Is there a query limit? If so, how many patents can be searched?
Is it possible to search more patents at once?

Thank you
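
Whether the remote service enforces a limit is not established in this thread. If it does throttle heavy parallel use (an assumption), one common mitigation is to cap the number of workers and space out requests. The sketch below uses threads rather than processes, since the work is I/O-bound; `Throttle` and `run_throttled` are hypothetical helpers, not part of patent_client, and `fn` stands in for whatever query function you call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Throttle:
    """Enforce a minimum interval between calls made from several threads."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free time slot under the lock, then sleep outside it.
        with self._lock:
            now = time.monotonic()
            start = max(now, self._next_allowed)
            self._next_allowed = start + self.min_interval
        time.sleep(max(0.0, start - now))


def run_throttled(fn, items, max_workers=4, min_interval=0.5):
    """Apply fn to every item with bounded concurrency and spaced-out calls."""
    throttle = Throttle(min_interval)

    def task(item):
        throttle.wait()
        return fn(item)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of the results
        return list(pool.map(task, items))
```

The worker count and interval are placeholders to tune; they are not documented limits of any API.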

Naive questions

Hi!

Thank you for writing such a great library! I have several questions, though. Most of them may already be answered in the docs, but I couldn't find my way, so forgive me if there is already a quick answer.

  1. I can't see the filters.
    In the documentation it is mentioned:
    image

But when I call that, I don't get any fields, and the source code only says "this call returns fields". So which fields can be used for filtering?

  2. I can't see a way to perform full-text search
    In my line of work, I would like to search for a string of text across title, claims and description. This may be related to question 1, but I couldn't work it out.

For example, USApplication.objects.filter(patent_title='Integrated docking system for intelligent devices') matches nothing, even though there is a patent with exactly that title: https://patents.google.com/patent/US11651258B2/en?q=(Integrated+Docking+System+for+Intelligent+Devices)&oq=Integrated+Docking+System+for+Intelligent+Devices

  3. How does one iterate over a PublicSearch object?

results: patent_client.uspto.public_search.manager.PublicSearchManager = PublicSearch.objects.filter(query='"6103599".PN. OR @APD=20210101')

Whatever I do on results gives a JSONEncording Error.

TL;DR: I would love to learn how to use this tool for full-text search, where I give it words and it returns patents that contain those words in the title, claims and description.
Thank you for your replies!

Error while using example functions

from patent_client import Inpadoc, Assignment, USApplication, Patent
results = Inpadoc.objects.filter(cql_query='pa="Google LLC"')
len(results) > 100

error

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 960, in json
    return complexjson.loads(self.content.decode(encoding), **kwargs)
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/volk/patents_search/test.py", line 21, in <module>
    print(len(results) > 100)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/published/manager.py", line 19, in __len__
    page = self._get_search_results_range(1, 100)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/published/manager.py", line 16, in _get_search_results_range
    return PublishedApi.search.search(query, start, end)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/published/api.py", line 104, in search
    response = session.get(base_url, params={"Range": range, "q": query})
  File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/session.py", line 24, in request
    auth_response = self.get_token()
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/session.py", line 37, in get_token
    data = response.json()
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 968, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

shell returned 1
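
The second traceback shows `response.json()` failing inside `get_token` at "line 1 column 1 (char 0)", which means the token endpoint's body was not JSON at all (for example an empty body or an HTML error page). The underlying cause is not established in this report. As a sketch only, a helper like the hypothetical `parse_token_response` below (not part of patent_client) would check the status first and report what actually came back:

```python
import json


class TokenError(RuntimeError):
    """Raised when the token endpoint's reply cannot be used."""


def parse_token_response(status_code, body):
    """Return the parsed JSON payload, or raise TokenError with context."""
    if status_code != 200:
        # Include the start of the body so the real server message is visible
        raise TokenError(f"token request failed with HTTP {status_code}: {body[:200]!r}")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise TokenError(f"token endpoint returned a non-JSON body: {body[:200]!r}") from exc
```

With a check like this, a rejected or malformed token request would surface the HTTP status and the start of the body instead of a bare JSONDecodeError.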

Example

from patent_client import USApplication
apps = (USApplication.objects
        # Step 1 - Query the Data
        .filter(first_named_applicant="Google")
        .order_by("-appl_filing_date")
        .limit(20)
        # Step 2 - Reshape the Data
        .values("appl_id", "app_filing_date", "patent_title", "patent_number", "issue_date", "inventors.0.name")
        # Step 3 - Process the data
        .to_pandas()
        .assign(time_to_issue=lambda df: df.issue_date - df.app_filing_date)
        .mean()
        .time_to_issue
        )

Error

  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/util/base/manager.py", line 60, in __iter__
    for item in self._get_results():
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/uspto/peds/manager.py", line 51, in _get_results
    num_pages = math.ceil(len(self) / self.page_size)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/uspto/peds/manager.py", line 43, in __len__
    max_length = self.get_page(0)["numFound"] - self.config.offset
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/uspto/peds/manager.py", line 71, in get_page
    raise HttpException(
patent_client.uspto.peds.manager.HttpException: 404
<!doctype html><html lang="en"><head><title>HTTP Status 404 ? Not Found</title><style type="text/css">body {font-family:Tahoma,Arial,sans-serif;} h1, h2, h3, b {color:white;background-color:#525D76;} h1 {font-size:22px;} h2 {font-size:16px;} h3 {font-size:14px;} p {font-size:12px;} a {color:black;} .line {height:1px;background-color:#525D76;border:none;}</style></head><body><h1>HTTP Status 404 ? Not Found</h1><hr class="line" /><p><b>Type</b> Status Report</p><p><b>Message</b> The requested resource [&#47;api&#47;error&#47;error] is not available</p><p><b>Description</b> The origin server did not find a current representation for the target resource or is not willing to disclose that one exists.</p><hr class="line" /><h3>Apache Tomcat/9.0.62</h3></body></html>
{'Content-Type': 'text/html;charset=ISO-8859-1', 'Content-Length': '770', 'Connection': 'keep-alive', 'Date': 'Wed, 02 Nov 2022 15:18:26 GMT', 'Server-Timing': 'intid;desc=142a3b6ca7f7147b', 'Strict-Transport-Security': 'max-age=31536000;includeSubDomains', 'X-Frame-Options': 'DENY', 'X-Content-Type-Options': 'nosniff', 'X-XSS-Protection': '1; mode=block', 'Cache-Control': 'no-cache, no-store, max-age=0, must-revalidate', 'Pragma': 'no-cache', 'Expires': '0', 'Content-Language': 'en', 'X-Cache': 'Error from cloudfront', 'Via': '1.1 07d5d44815808d5d5a6f43984a987698.cloudfront.net (CloudFront)', 'X-Amz-Cf-Pop': 'OSL50-P1', 'X-Amz-Cf-Id': 'vyCVnH5adlizgEws5EXyWWVqj42cubLE1t6XLWvFheWSedSv9p3u0A=='}
{"qf": "appEarlyPubNumber applId appLocation appType appStatus_txt appConfrNumber appCustNumber appGrpArtNumber appCls appSubCls appEntityStatus_txt patentNumber patentTitle primaryInventor firstNamedApplicant appExamName appExamPrefrdName appAttrDockNumber appPCTNumber appIntlPubNumber wipoEarlyPubNumber pctAppType firstInventorFile appClsSubCls rankAndInventorsList", "fl": "*", "fq": [], "searchText": "firstNamedApplicant:(Google)", "sort": "applFilingDate desc", "facet": "false", "mm": "100%", "start": 0}

Development/Use-case

Hi @parkerhancock, this is clearly the best and most up to date resource to access the sometimes cumbersome USPTO data. We spoke about trademarks a while ago. I am now getting quite interested in the future development of this project in an attempt to study corporate innovation and its relationship with company success.

A few things to discuss:

  1. It seems that the current scope of your project is to tease out individual patents, for example to get a corpus of patents applied for and received, both nationally and internationally, for, say, "microsoft".
  2. A research question I have is how one can create three rectangular datasets for patent applications, grants, and assignments, as needed for up-to-date machine learning/big data analyses.
  3. It seems that it would be best to work with the weekly XML files for applications and grants and the daily XML file for assignments, and to keep the data up to date by parsing the files and uploading the parsed data to some database.
  4. I also see weekly action files, which makes me wonder how they mediate the relationship between application and grant data. Is it some sort of intermediary dataset that keeps you up to date before an application is granted? If so, I guess this is also an interesting dataset to keep tabs on.
  5. I have seen you comment on the @patentpy repository that your solution could potentially replace theirs; the developer there has a solution for grants but not yet for applications.
  6. Is this something that your library provides, and if not, where can one find software that has streamlined the above rectangular dataset creation process?
  7. Given your expertise in this area, do you think it is worth also learning more about the USPTO API, or should one's focus be on the bulk files?
  8. I have noted that there are some additional datasets out there, like the maintenance fee and classification datasets. Do you think there are smart ways to merge this information into these three (application, grant, assignment) datasets?
  9. Furthermore, I have seen some APIs somewhat separate from the original APIs, like the patent examination system. Would you know what that is all about? I also think patentsview can do some of the above, but isn't the problem that it is too slow to update compared with the fast weekly files above?
  10. Lastly, do you think I have missed anything? Are there other datasets that I should consider or find, like citation data?

It's great to see someone as interested in this field as I am. I am looking forward to any response or guidance you might give.

Edit: @mustberuss Hi Russ, it would be great to have you on this exchange too, given your expertise.
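
To make the "rectangular dataset" idea in points 2 and 3 concrete, here is a minimal sketch of flattening XML records into rows and writing them as CSV. The XML below is a toy with a made-up schema: the element names (`grants`, `grant`, `number`, `title`, `date`) are assumptions for illustration and do not match the real USPTO bulk files, whose structure would need its own mapping.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Toy input with a made-up schema; real bulk files use different element names.
SAMPLE = """<grants>
  <grant><number>1</number><title>Widget</title><date>2022-01-04</date></grant>
  <grant><number>2</number><title>Gadget</title></grant>
</grants>"""

FIELDS = ["number", "title", "date"]


def records_to_rows(xml_text, record_tag, fields):
    """Flatten each <record_tag> element into a dict with one key per field."""
    root = ET.fromstring(xml_text)
    return [
        # Missing child elements become empty strings so every row has all columns
        {field: record.findtext(field, default="") for field in fields}
        for record in root.iter(record_tag)
    ]


def rows_to_csv(rows, fields):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = records_to_rows(SAMPLE, "grant", FIELDS)
csv_text = rows_to_csv(rows, FIELDS)
```

The same rows could be loaded into a database table instead of CSV; the point is only that each record becomes one fixed-width row.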

ModuleNotFoundError: No module named 'requests_cache'

Hi, I think there may be a newly introduced bug as I upgraded patent_client from 3.2.8 to 3.1.0 and got the following error:

    from patent_client import Patent
  File "/Users/sanjeevan/miniforge3/envs/solve/lib/python3.10/site-packages/patent_client/__init__.py", line 57, in <module>
    from patent_client.uspto.assignment.model import Assignment  # isort:skip
  File "/Users/sanjeevan/miniforge3/envs/solve/lib/python3.10/site-packages/patent_client/uspto/__init__.py", line 1, in <module>
    from .assignment import Assignment
  File "/Users/sanjeevan/miniforge3/envs/solve/lib/python3.10/site-packages/patent_client/uspto/assignment/__init__.py", line 1, in <module>
    from .model import Assignment
  File "/Users/sanjeevan/miniforge3/envs/solve/lib/python3.10/site-packages/patent_client/uspto/assignment/model.py", line 9, in <module>
    from patent_client import session
  File "/Users/sanjeevan/miniforge3/envs/solve/lib/python3.10/site-packages/patent_client/session.py", line 4, in <module>
    import requests_cache

For now it works when I specify 3.2.8

JSON Decode error when accessing EPO bibliographic data endpoint

Investigate issue mentioned in #74 related to the Inpadoc endpoint

Example:

pub = Inpadoc.objects.get('EP3082535A1')
print(pub.biblio.title)

Error:

Traceback (most recent call last):
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 960, in json
    return complexjson.loads(self.content.decode(encoding), **kwargs)
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/volk/patents_search/test.py", line 24, in <module>
    pub = Inpadoc.objects.get('EP3082535A1')
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/published/manager.py", line 47, in get
    result = PublishedApi.biblio.get_biblio(number, doc_type, format)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/published/api.py", line 47, in get_biblio
    return cls.get_constituents(number, doc_type, format, constituents="biblio")
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/published/api.py", line 40, in get_constituents
    response = session.get(url)
  File "/usr/lib/python3.10/site-packages/requests/sessions.py", line 600, in get
    return self.request("GET", url, **kwargs)
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/session.py", line 24, in request
    auth_response = self.get_token()
  File "/home/volk/.local/lib/python3.10/site-packages/patent_client/epo/ops/session.py", line 37, in get_token
    data = response.json()
  File "/usr/lib/python3.10/site-packages/requests/models.py", line 968, in json
    raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

HTTPError on importing Patent

Hi, on occasion I get the following error when importing Patent from patent_client

line 5, in <module>
	2023-11-27T12:23:56.772+00:00	from patent_client import Patent
	2023-11-27T12:23:56.772+00:00	File "/usr/local/lib/python3.10/site-packages/patent_client/__init__.py", line 51, in <module>
	2023-11-27T12:23:56.772+00:00	from patent_client.epo.ops.published.model import Inpadoc # isort:skip
	2023-11-27T12:23:56.772+00:00	File "/usr/local/lib/python3.10/site-packages/patent_client/epo/ops/__init__.py", line 2, in <module>
	2023-11-27T12:23:56.772+00:00	from .legal.model import Legal
	2023-11-27T12:23:56.772+00:00	File "/usr/local/lib/python3.10/site-packages/patent_client/epo/ops/legal/__init__.py", line 3, in <module>
	2023-11-27T12:23:56.772+00:00	generate_legal_code_db()
	2023-11-27T12:23:56.772+00:00	File "/usr/local/lib/python3.10/site-packages/patent_client/epo/ops/legal/national_codes.py", line 31, in generate_legal_code_db
	2023-11-27T12:23:56.772+00:00	path = get_spreadsheet()
	2023-11-27T12:23:56.772+00:00	File "/usr/local/lib/python3.10/site-packages/patent_client/epo/ops/legal/national_codes.py", line 55, in get_spreadsheet
	2023-11-27T12:23:56.772+00:00	response.raise_for_status()
	2023-11-27T12:23:56.772+00:00	File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
	2023-11-27T12:23:56.772+00:00	raise HTTPError(http_error_msg, response=self)
	2023-11-27T12:23:56.772+00:00	requests.exceptions.HTTPError: 503 Server Error: Service Unavailable for url: https://www.epo.org/searching-for-patents/data/coverage/weekly.htm

I believe it's because generate_legal_code_db() is run at import time, in the package's __init__.py:

https://github.com/parkerhancock/patent_client/blob/master/patent_client/epo/ops/legal/__init__.py#L2C1-L3C25

Is there any way to avoid this?
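
One general way to keep a network download from running at import time is to defer it until the data is first needed and cache the result. The sketch below illustrates that pattern only; `download_code_table`, `get_code_table`, and `lookup` are hypothetical stand-ins and are not the library's actual functions or its fix.

```python
import functools

calls = []  # records each time the expensive setup runs (for demonstration only)


def download_code_table():
    """Hypothetical stand-in for a network download done during setup."""
    calls.append("download")
    return {"EP": "European Patent Office"}


@functools.lru_cache(maxsize=None)
def get_code_table():
    """Build the table on first use and reuse the cached result afterwards."""
    return download_code_table()


def lookup(code):
    """Look up a code; the download happens on the first call only."""
    return get_code_table().get(code, "unknown")
```

With this layout, importing the module does no network I/O, so a temporary 503 from the remote server would only affect code paths that actually need the table.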
