Comments (9)
If you want a smaller response from the server, you'd have to use the (alpha) v2 API.
from pyinaturalist.
Right, the v2 API would help with this. I have an open issue to add support for that (#155), which I haven't gotten around to yet, but plan to... soonish? Meanwhile, you could use plain python requests to try that out, if you want.
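To give an idea of what that could look like, here is a minimal standard-library sketch. The v2 observations endpoint is real, but the exact `fields` syntax and the field names chosen here are assumptions to verify against the v2 API docs:

```python
from urllib.parse import urlencode

# Sketch: query the (alpha) v2 API directly, requesting only the fields needed.
# The dotted `fields` values below are assumptions -- check the v2 API docs
# for the exact syntax currently supported.
BASE_URL = "https://api.inaturalist.org/v2/observations"
params = {
    "user_id": "my_username",  # placeholder username
    "per_page": 200,
    "fields": "id,observed_on,taxon.id,taxon.name,taxon.rank",
}
url = f"{BASE_URL}?{urlencode(params)}"
print(url)
# Then fetch with any HTTP client, e.g.:
#   import json, urllib.request
#   results = json.load(urllib.request.urlopen(url))["results"]
```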
For repeated downloads, pyinaturalist has some built-in caching that helps with this (briefly mentioned in the docs here). Let me know if you'd like help changing the settings for that.
For minimizing disk usage, what you're already doing (removing the info you don't need and compressing it) is probably the easiest option. Another option would be to use a more space-efficient format like parquet. Or even SQLite, or pretty much anything other than JSON, would likely be an improvement; the downsides are that the files are no longer human-readable/editable, and it adds a couple extra steps to read and write observation data. I have a separate library here that helps with this kind of thing, and I could give some examples if needed: https://github.com/pyinat/pyinaturalist-convert
Thanks a lot for giving tons of great info!
And sorry about the discussion threads.
Most GitHub repositories I've seen don't use them... so I always forget they exist until somebody reminds me, like you did.
Not sure if this deserves a new issue, since it is not a bug.
This is the only "related" subject I found... although it is not really related either.
I am downloading all my user occurrences as JSON files, which I store for later processing. I have the feeling they are much bigger than I need, basically because of the large "identifications" section inside each "result".
For example, if I only download one result (per_page=1), I get a JSON file of 912 lines.
The whole single "result" (one observation item) takes 904 of those lines.
The "taxon" section inside "result" takes 59 lines.
The "identifications" section inside "result" takes 616 lines, despite having only one identification in this example! (So the "identifications" section takes an even larger share of the JSON file size when there are several identifications per observation item.)
For what I need, I am OK with downloading just the stuff inside the "taxon" section.
Is it possible to somehow avoid downloading the "identifications" section, especially when I do a get_observations(user_id='my_username', page='all') request?
That would reduce my JSON downloads to less than 1/3 of their current size (which is important, since I need to do this weekly for several users at my institution).
Thanks a lot
@abubelinha
@abubelinha Yeah, full observation responses are fairly verbose, mainly because they include all the information you see on the observation pages on inaturalist.org. The bulk of it, as you noticed, is from the identifications and full taxonomy details for each identification.
Are you mainly concerned about network bandwidth, or disk space?
Are you mainly concerned about network bandwidth, or disk space?
Well, mainly about disk space (I rsync this folder 2-4 times a day between home and work).
But also script time, since I plan to repeat the same downloads regularly.
Of course, I can reprocess the JSON once it is on my disk and remove that part (I am also compressing the JSON).
But if you know how to skip identifications in the API response, the whole script would take less time and save bandwidth (and energy).
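For the local reprocessing step, a small sketch of what that trimming could look like (the keep-list here is just an example):

```python
import json

# Sketch: keep only the parts of each result that are actually needed,
# dropping the bulky "identifications" section before saving to disk.
KEEP_KEYS = {"id", "observed_on", "taxon"}  # adjust to what you need

def trim_results(response):
    """Return response['results'] with each observation reduced to KEEP_KEYS."""
    return [
        {key: value for key, value in obs.items() if key in KEEP_KEYS}
        for obs in response["results"]
    ]

# Tiny fake response for illustration
response = {"results": [{
    "id": 1,
    "observed_on": "2023-01-01",
    "taxon": {"name": "Quercus robur", "rank": "species"},
    "identifications": [{"taxon": {"name": "Quercus robur"}}],
}]}
trimmed = trim_results(response)
print(json.dumps(trimmed))  # identifications are gone
```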
I don't see any iNaturalist API options to get a "summarized" API return (other than using only_id, which would return only the IDs and nothing else).
As you are the expert, I preferred to ask here, just in case pyinaturalist already had an option for this.
Oh, you mean the fields parameter!
Yes, that seems to be exactly what I need.
I guess you will include that option in pyinaturalist when the v2 API becomes stable, won't you?
Thank you so much.
P.S., you're always welcome to create new issues, or discussion threads for more open-ended questions. Usually those are easier for me to catch up on than comments on closed issues.
Continued in #155
Related Issues (20)
- Add undocumented GET /taxa/lifelist_metadata endpoint
- Fix type annotations in API docs
- Add lifelist metadata to response in ObservationController.life_list()
- ImportError: cannot import name 'RequestRate' from 'pyrate_limiter'
- Put long param sections in dropdowns
- AnnotationController.create() - allow adding annotations by label instead of ID
- Drop support for python 3.7
- binder down?
- possible issue with some endpoints
- TimeoutError: The write operation timed out
- Using Dry Run throws a key error
- Adding 'Notes' to observations
- A big shoutout!
- Checking if the token provided is valid
- Observations to/from Pandas DataFrame
- GUANOMetadata Support in Audio files
- Create/update observations with Observation objects
- Error `WARNING Parameters missing or invalid:1/1 cannot come before 1/1"` using pyinaturalist
- HTTP 429 Rate Limit error on reading observations
- Feature request: support for font-awesome icons inplace of emojis for user interface