First, this is a great project, thank you! I have a small question. When I run the f


Pandas returns - AttributeError: 'NoneType' object has no attribute 'find' about patent_client HOT 6 CLOSED

parkerhancock commented on May 28, 2024

Pandas returns - AttributeError: 'NoneType' object has no attribute 'find'

from patent_client.

Comments (6)

johnisanerd commented on May 28, 2024

I am pretty new to pandas, for what it's worth!


johnisanerd commented on May 28, 2024

Also, just to give you a better idea of what's going on:

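from patent_client import USApplication
import pandas as pd

company_name = '3M Company'  # defined but unused; the filter below hard-codes 'Microsoft'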


pd.DataFrame.from_records(
    (USApplication.objects
        .filter(first_named_applicant='Microsoft')
        .values('appl_id', 'patent_number', 'patent_title')[0:10]
    )
)

Works great!

and this:

company_name = '3M Company'

pd.DataFrame.from_records(
    (USApplication.objects
        .filter(first_named_applicant='Microsoft')
        .values('appl_id', 'patent_number', 'patent_title')[0:1000]
    )
)

Fails.


parkerhancock commented on May 28, 2024

Thanks for reporting the issue! I'll take a look at it. I suspect it's an issue with how I implemented slicing in the manager.

In the meantime, pd.DataFrame.from_records can accept a generator as input, not just a list, so the first example should work fine without specifying a slice. That is:

company_name = '3M Company'

pd.DataFrame.from_records(
    (USApplication.objects
        .filter(first_named_applicant=company_name)
        .values('app_filing_date', 'patent_number', 'patent_title')
    )
)
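As a rough illustration of the generator point, the sketch below feeds a generator to pd.DataFrame.from_records and caps the row count with itertools.islice instead of slicing the manager. The fake_records generator and its field values are made-up stand-ins for what .values(...) yields; no patent_client call is made, so this only shows the pandas side.

```python
from itertools import islice

import pandas as pd


def fake_records(n):
    """Hypothetical stand-in for the rows a .values(...) query would yield."""
    for i in range(n):
        yield {
            "appl_id": str(10000000 + i),
            "patent_number": None,
            "patent_title": "Title {}".format(i),
        }


# islice stops after 3 rows, so the remaining rows are never generated.
df = pd.DataFrame.from_records(islice(fake_records(1000), 3))
print(df.shape)
```

If the real query object behaves like an ordinary iterable, the same islice wrapper should be a way to limit rows without touching the manager's slicing code.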


parkerhancock commented on May 28, 2024

Ah! One other thing. I know why your example that asks for the first 10 records works, but the first 1000 does not.

There's an issue with the USPTO's Patent Examination Data System API (which supports USApplication). The ordinary JSON API only returns 20 results, and although it has a pagination system, it's broken (it returns sets of 20, but paginates in sets of 25).
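To see why that pagination mismatch would lose data, here is a small arithmetic sketch. It assumes one reading of the description: each page starts 25 records after the previous one but carries only 20 records. The function name and offsets are illustrative and not taken from the API itself.

```python
RETURNED_PER_PAGE = 20  # records actually delivered per page (per the comment)
PAGE_STEP = 25          # how far the start offset advances per page (per the comment)


def skipped_indices(total):
    """Return the record indices no page would deliver, under the assumed reading."""
    delivered = set()
    for start in range(0, total, PAGE_STEP):
        delivered.update(range(start, min(start + RETURNED_PER_PAGE, total)))
    return [i for i in range(total) if i not in delivered]


print(skipped_indices(50))  # [20, 21, 22, 23, 24, 45, 46, 47, 48, 49]
```

Under that assumption, five records out of every 25 are unreachable through pagination, which would explain why paging cannot be relied on for larger result sets.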

If your query returns fewer than 20 results, it just parses the JSON and returns a result. Easy peasy. If your query returns more than 20 results, it has to request a bulk file in XML, download that file, and then parse the data out of the XML, which is why USApplication can be slow for large queries. (It does cache the bulk file, so subsequent identical queries execute quickly.)
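The two retrieval paths described here can be sketched as a simple dispatch on result count. This is only an illustration: choose_fetch_strategy is a hypothetical helper, not a patent_client function, and treating exactly 20 results as a JSON case is an assumption, since the comment only covers fewer than and more than 20.

```python
JSON_RESULT_LIMIT = 20  # the JSON API's cap described in the comment


def choose_fetch_strategy(num_records):
    """Pick a retrieval path for a request covering num_records records.

    Hypothetical helper; the boundary case of exactly 20 is an assumption.
    """
    if num_records <= JSON_RESULT_LIMIT:
        return "json"      # one request, parse the JSON response directly
    return "bulk_xml"      # request, download, and parse the bulk XML file


print(choose_fetch_strategy(10))    # json
print(choose_fetch_strategy(1000))  # bulk_xml
```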

The issue is something with how the XML is parsed. When you make a big request (e.g. the first 1000 records), something in the XML parser is failing. This is something I do need to fix.
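For context on the error message itself, here is a minimal sketch of how chained lookups produce "'NoneType' object has no attribute 'find'" when an expected element is absent. It uses the standard library's xml.etree.ElementTree and a made-up document; whether patent_client's parser uses ElementTree or fails in exactly this way is an assumption.

```python
import xml.etree.ElementTree as ET

# Made-up document: the second record lacks the <title> wrapper element.
XML = """
<records>
  <record><title><text>First</text></title></record>
  <record></record>
</records>
"""


def title_text(record):
    """Return the nested title text, or None when any level is missing."""
    title = record.find("title")
    if title is None:  # find() returns None for a missing child
        return None
    text = title.find("text")
    return None if text is None else text.text


root = ET.fromstring(XML)
records = root.findall("record")

try:
    # Unguarded chaining fails on the record without <title>.
    records[1].find("title").find("text")
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)

print([title_text(r) for r in records])  # ['First', None]
```

Checking each intermediate result against None keeps one incomplete record from aborting a large parse.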


parkerhancock commented on May 28, 2024

Version 0.4.2 should fix the issue. I tried it with your examples above, and it worked great! Turns out it was a busted XML parser, not the slicing.

I hate that XML parser. If the USPTO ever fixes the pagination issue, I'm switching to the JSON API immediately and dropping the XML parser altogether. Too many moving parts to go wrong, especially when the JSON is just so easy to deal with.

Let me know if you still have problems, and I'll take another look. Travis CI is testing the new code now, and I'll deploy to PyPI as soon as it comes back green.

Thanks for reporting the issue!


johnisanerd commented on May 28, 2024

Thanks so much for checking into this @parkerhancock !

I upgraded and reran the above code, and it now throws this error:


---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-3-d89bd37079ad> in <module>()
----> 1 from patent_client import USApplication, Assignment
      2 import pandas as pd
      3 
      4 pd.DataFrame.from_records(
      5     (USApplication.objects

~/anaconda3/lib/python3.6/site-packages/patent_client/__init__.py in <module>()
     28 SETTINGS = json.load(open(SETTINGS_FILE))
     29 
---> 30 from patent_client.epo_ops.models import Inpadoc, Epo  # isort:skip
     31 from patent_client.uspto_assignments import Assignment  # isort:skip
     32 from patent_client.uspto_exam_data.main import USApplication  # isort:skip

~/anaconda3/lib/python3.6/site-packages/patent_client/epo_ops/__init__.py in <module>()
      4 CACHE_DIR.mkdir(exist_ok=True)
      5 TEST_DIR = TEST_BASE / "epo"
----> 6 TEST_DIR.mkdir(exist_ok=True)

~/anaconda3/lib/python3.6/pathlib.py in mkdir(self, mode, parents, exist_ok)
   1244             self._raise_closed()
   1245         try:
-> 1246             self._accessor.mkdir(self, mode)
   1247         except FileNotFoundError:
   1248             if not parents or self.parent == self:

~/anaconda3/lib/python3.6/pathlib.py in wrapped(pathobj, *args)
    385         @functools.wraps(strfunc)
    386         def wrapped(pathobj, *args):
--> 387             return strfunc(str(pathobj), *args)
    388         return staticmethod(wrapped)
    389 

FileNotFoundError: [Errno 2] No such file or directory: '/Users/johncole/anaconda3/lib/python3.6/tests/fixtures/epo'

Curious. Since it's looking under "tests", maybe something was left out of the build? I completely uninstalled the package with pip and reinstalled it, and that's when it started to throw this error.
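The traceback is consistent with how pathlib behaves: Path.mkdir(exist_ok=True) only tolerates the target already existing and still raises FileNotFoundError when a parent directory is missing. The sketch below reproduces that in a temporary directory and shows that parents=True avoids it. The directory names mirror the path in the error; whether this matches the fix eventually applied in patent_client is not known from this thread.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Mirrors the failing layout: "tests/fixtures" does not exist yet.
    test_dir = Path(tmp) / "tests" / "fixtures" / "epo"

    try:
        test_dir.mkdir(exist_ok=True)  # same call shape as in the traceback
        raised = False
    except FileNotFoundError:
        raised = True

    # parents=True creates the missing intermediate directories as well.
    test_dir.mkdir(parents=True, exist_ok=True)
    created = test_dir.is_dir()

print(raised, created)  # True True
```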
