Comments (6)
I am pretty new to pandas, for what it's worth!
from patent_client.
Also, just to give you a better idea of what's going on:
company_name = '3M Company'
pd.DataFrame.from_records(
    (USApplication.objects
        .filter(first_named_applicant='Microsoft')
        .values('appl_id', 'patent_number', 'patent_title')[0:10]
    )
)
Works great!
and this:
company_name = '3M Company'
pd.DataFrame.from_records(
    (USApplication.objects
        .filter(first_named_applicant='Microsoft')
        .values('appl_id', 'patent_number', 'patent_title')[0:1000]
    )
)
Fails.
Thanks for reporting the issue! I'll take a look at it. I suspect it's an issue with how I implemented slicing in the manager.
In the meantime, pd.DataFrame.from_records can accept a generator as input, not just a list. So the first example should work fine without specifying a slice, that is:
company_name = '3M Company'
pd.DataFrame.from_records(
    (USApplication.objects
        .filter(first_named_applicant=company_name)
        .values('app_filing_date', 'patent_number', 'patent_title')
    )
)
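For anyone who wants to see the generator behaviour in isolation, here is a minimal sketch that needs no network access. `fake_records` is a hypothetical stand-in for the lazy manager, not part of patent_client; it just yields one dict per record. `itertools.islice` is one way to cap the row count without slicing the manager itself.

```python
from itertools import islice

import pandas as pd


def fake_records(total):
    """Hypothetical stand-in for a lazy manager: yields one dict per record."""
    for n in range(1, total + 1):
        yield {"appl_id": str(n), "patent_title": f"Title {n}"}


# from_records consumes the generator directly; no list or slice is needed.
df_all = pd.DataFrame.from_records(fake_records(30))

# islice caps the number of rows pulled from the generator.
df_head = pd.DataFrame.from_records(islice(fake_records(30), 10))
```

`from_records` also has an `nrows` argument that limits how many rows are read when the input is an iterator.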
Ah! One other thing. I know why your example that asks for the first 10 records works, but the first 1000 does not.
There's an issue with the USPTO's Patent Examination Data System API (which supports USApplication). The ordinary JSON API only returns 20 results, and although it has a pagination system, it's broken (it returns sets of 20, but paginates in sets of 25).
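To make the mismatch concrete, here is a small sketch of one possible reading of it: each page starts 25 records after the previous one but only carries 20. That interpretation, and the names `PAGE_STRIDE` and `reachable_indices`, are assumptions for illustration only; the real API may misbehave differently.

```python
PAGE_SIZE_RETURNED = 20  # records actually present in each response
PAGE_STRIDE = 25         # how far the offset advances per page (assumed)


def reachable_indices(total, pages):
    """Record indices that naive page-by-page fetching would see."""
    seen = []
    for page in range(pages):
        start = page * PAGE_STRIDE
        seen.extend(range(start, min(start + PAGE_SIZE_RETURNED, total)))
    return seen


seen = reachable_indices(total=60, pages=3)
# Records that fall in the gap between what a page returns and where
# the next page starts are never delivered.
missing = sorted(set(range(60)) - set(seen))
print(missing)
```

Under this reading, five records are lost after every page, so paging through the JSON API cannot return a complete result set.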
If your query returns fewer than 20 results, it just parses the JSON and returns a result - easy peasy. If your query returns more than 20 results, it has to request a bulk file in XML, download that bulk file, and then parse the data out of the XML, which is why USApplication can be slow for large queries. (It does cache the bulk file, so subsequent identical queries execute quickly.)
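The two-path behaviour described above can be sketched roughly as follows. This is not the library's actual code: `count_results`, `fetch_json`, `fetch_bulk_xml`, and the in-memory `FAKE_COUNTS` backend are all hypothetical, and `functools.lru_cache` stands in for the on-disk cache of the bulk file.

```python
from functools import lru_cache

JSON_LIMIT = 20  # result cap of the JSON endpoint, per the description above
FAKE_COUNTS = {"small": 7, "large": 1000}  # hypothetical query -> hit count
calls = {"json": 0, "bulk": 0}  # counts how often each path is used


def count_results(query):
    return FAKE_COUNTS[query]


def fetch_json(query):
    """Fast path: parse the JSON response directly."""
    calls["json"] += 1
    return [{"appl_id": i} for i in range(count_results(query))]


@lru_cache(maxsize=None)
def fetch_bulk_xml(query):
    """Slow path: request, download, and parse a bulk file (simulated)."""
    calls["bulk"] += 1
    return tuple({"appl_id": i} for i in range(count_results(query)))


def fetch(query):
    # Treating exactly 20 hits as the fast path is an assumption.
    if count_results(query) <= JSON_LIMIT:
        return fetch_json(query)
    return list(fetch_bulk_xml(query))


small = fetch("small")
large_first = fetch("large")
large_again = fetch("large")  # served from the cache, no second download
```

The cache is why a repeated large query is quick even though the first run is slow.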
The issue is something with how the XML is parsed. When you make a big request (e.g. the first 1000 records), something in the XML parser is failing. This is something I do need to fix.
Version 0.4.2 should fix the issue. I tried it with your examples above, and it worked great! Turns out it was a busted XML parser, not the slicing.
I hate that XML parser. If the USPTO ever fixes the pagination issue, I'm switching to that immediately and dropping it altogether. Too many moving parts to go wrong. Especially when the JSON is just so easy to deal with.
Let me know if you still have problems, and I'll take another look. Travis CI is testing the new code now, and I'll deploy to PyPI as soon as it comes back green.
Thanks for reporting the issue!
Thanks so much for checking into this @parkerhancock !
I upgraded and reran the above code, and it throws this error:
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-3-d89bd37079ad> in <module>()
----> 1 from patent_client import USApplication, Assignment
2 import pandas as pd
3
4 pd.DataFrame.from_records(
5 (USApplication.objects
~/anaconda3/lib/python3.6/site-packages/patent_client/__init__.py in <module>()
28 SETTINGS = json.load(open(SETTINGS_FILE))
29
---> 30 from patent_client.epo_ops.models import Inpadoc, Epo # isort:skip
31 from patent_client.uspto_assignments import Assignment # isort:skip
32 from patent_client.uspto_exam_data.main import USApplication # isort:skip
~/anaconda3/lib/python3.6/site-packages/patent_client/epo_ops/__init__.py in <module>()
4 CACHE_DIR.mkdir(exist_ok=True)
5 TEST_DIR = TEST_BASE / "epo"
----> 6 TEST_DIR.mkdir(exist_ok=True)
~/anaconda3/lib/python3.6/pathlib.py in mkdir(self, mode, parents, exist_ok)
1244 self._raise_closed()
1245 try:
-> 1246 self._accessor.mkdir(self, mode)
1247 except FileNotFoundError:
1248 if not parents or self.parent == self:
~/anaconda3/lib/python3.6/pathlib.py in wrapped(pathobj, *args)
385 @functools.wraps(strfunc)
386 def wrapped(pathobj, *args):
--> 387 return strfunc(str(pathobj), *args)
388 return staticmethod(wrapped)
389
FileNotFoundError: [Errno 2] No such file or directory: '/Users/johncole/anaconda3/lib/python3.6/tests/fixtures/epo'
Curious: since it's looking under "tests", maybe something was left out of the build? I completely uninstalled the library with pip and then reinstalled it, and that's when it started throwing this error.
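For reference, the traceback matches how `Path.mkdir` behaves in general: `exist_ok=True` only tolerates the target directory already existing, and a missing parent still raises `FileNotFoundError` unless `parents=True` is passed. A minimal reproduction in a temporary directory (the path below only mirrors the one in the traceback; whether the package should create it at all is a separate question):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # None of the intermediate directories exist yet.
    target = Path(tmp) / "tests" / "fixtures" / "epo"

    try:
        target.mkdir(exist_ok=True)  # parent "fixtures" is missing
        failed = False
    except FileNotFoundError:
        failed = True

    # parents=True creates the missing ancestors as well.
    target.mkdir(parents=True, exist_ok=True)
    created = target.is_dir()
```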