Comments (9)

jwidness commented on June 5, 2024

If you want a smaller response from the server, you'd have to use the (alpha) v2 api.

from pyinaturalist.

JWCook commented on June 5, 2024

Right, the v2 API would help with this. I have an open issue to add support for that (#155), which I haven't gotten around to yet, but plan to... soonish? Meanwhile you could use plain python requests to try that out, if you want.
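As a rough sketch of what trying out v2 by hand could look like: the snippet below only builds a request URL that asks for a handful of fields. The base URL, the comma-separated form of the `fields` value, and the `per_page` default are assumptions to check against the v2 API documentation, and `build_v2_observations_url` is a hypothetical helper, not part of pyinaturalist.

```python
from urllib.parse import urlencode

# Assumed base URL of the (alpha) v2 observations endpoint
API_V2_OBSERVATIONS = "https://api.inaturalist.org/v2/observations"


def build_v2_observations_url(user_id, fields, per_page=200, page=1):
    """Build a v2 observations URL that requests only the listed fields.

    Hypothetical helper: the comma-separated `fields` syntax is an assumption.
    """
    params = {
        "user_id": user_id,
        "fields": ",".join(fields),
        "per_page": per_page,
        "page": page,
    }
    return f"{API_V2_OBSERVATIONS}?{urlencode(params)}"


url = build_v2_observations_url("my_username", ["id", "observed_on", "taxon"])
print(url)
```

The resulting URL can then be fetched with plain requests, for example `requests.get(url, timeout=30).json()`.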

For repeated downloads, pyinaturalist has some built-in caching that helps with this (briefly mentioned in the docs here). Let me know if you'd like help changing the settings for that.

For minimizing disk usage, what you're already doing (removing the info you don't need and compressing it) is probably the easiest option. Another option would be to use a more space-efficient format like parquet. Or even SQLite, or pretty much anything other than JSON, would likely be an improvement; the downsides are that the files are no longer human-readable/editable, and it adds a couple extra steps to read and write observation data. I have a separate library here that helps with this kind of thing, and I could give some examples if needed: https://github.com/pyinat/pyinaturalist-convert
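To illustrate the SQLite option without any extra libraries, here is a minimal sketch using only the standard library. The table layout, the `save_observations` helper, and the sample record are hypothetical; it assumes each observation dict has an `id`, an optional `observed_on`, and a `taxon` dict with a `name`.

```python
import json
import sqlite3


def save_observations(conn, observations):
    """Store a few columns per observation; keep the full taxon dict as JSON text."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS observations ("
        "id INTEGER PRIMARY KEY, observed_on TEXT, taxon_name TEXT, taxon_json TEXT)"
    )
    rows = []
    for obs in observations:
        taxon = obs.get("taxon") or {}
        rows.append(
            (obs["id"], obs.get("observed_on"), taxon.get("name"), json.dumps(taxon))
        )
    with conn:  # commit the batch, or roll back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO observations VALUES (?, ?, ?, ?)", rows
        )


# Hypothetical sample record, shaped like one item of an observations response
sample = [
    {
        "id": 1,
        "observed_on": "2024-06-05",
        "taxon": {"id": 10, "name": "Quercus robur"},
        "identifications": [{"id": 99}],
    }
]

conn = sqlite3.connect(":memory:")  # use a file path instead to persist the data
save_observations(conn, sample)
stored = conn.execute(
    "SELECT id, observed_on, taxon_name FROM observations"
).fetchall()
print(stored)
```

Because the primary key is the observation id, re-running a weekly download updates existing rows instead of duplicating them.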

abubelinha commented on June 5, 2024

Thanks a lot for giving tons of great info!
And sorry about the discussion threads.
Most GitHub repositories I've seen don't use them, so I always forget that they exist until somebody reminds me, as you did.

abubelinha commented on June 5, 2024

Not sure if this would deserve a new issue because this is not a bug.
This is the only "related" subject I found ... although it is not related either.

I am downloading all my user occurrences as JSON files, which I store for later processing. I have the feeling they are much bigger than I need, mainly because of the size of the "identifications" section inside each "result".

For example, if I download only 1 result (per_page=1), I get a JSON file of 912 lines.
The single whole "result" (one observation item) takes 904 lines itself in this case.
The "taxon" section inside "result" takes 59 lines.
The "identifications" section inside "result" takes 616 lines, despite having only one identification in this example! (So the "identifications" section takes up an even larger share of the JSON file when there are several identifications per observation.)

For what I need, I am OK with downloading just the stuff inside the "taxon" section.
Is it possible to somehow avoid downloading the "identifications" section, especially when I do a get_observations(user_id='my_username', page='all') request?
That would reduce my JSON downloads to less than 1/3 of their current size
(which is important since I need to do this weekly for several users of my institution).
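By line count, the figures above put "identifications" at 616 of 912 lines, roughly 68%, which leaves a little under 1/3 for everything else. A quick way to check the same thing by serialized size on a real response is sketched below; `section_share` and the sample record are hypothetical and only meant as an illustration.

```python
import json


def section_share(observation, key):
    """Fraction of an observation's serialized size taken up by one top-level key."""
    total = len(json.dumps(observation))
    part = len(json.dumps({key: observation.get(key)}))
    return part / total


# Hypothetical, heavily shortened record: each identification repeats taxon details
sample = {
    "id": 1,
    "taxon": {"id": 10, "name": "Quercus robur"},
    "identifications": [
        {
            "id": 99,
            "user": {"login": "someone"},
            "taxon": {"id": 10, "name": "Quercus robur", "ancestor_ids": [1, 2, 3]},
        }
    ],
}

for key in ("taxon", "identifications"):
    print(key, round(section_share(sample, key), 2))

# Share of lines reported in the post for the real response
print(round(616 / 912, 2))
```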

Thanks a lot
@abubelinha

JWCook commented on June 5, 2024

@abubelinha Yeah, full observation responses are fairly verbose, mainly because they include all the information you see on the observation pages on inaturalist.org. The bulk of it, as you noticed, is from the identifications and full taxonomy details for each identification.

Are you mainly concerned about network bandwidth, or disk space?

abubelinha commented on June 5, 2024

> Are you mainly concerned about network bandwidth, or disk space?

Well, mainly about disk space (I rsync this folder 2-4 times a day between home and work).
But also about script time, since I plan to repeat the same downloads regularly.
Of course I can reprocess the JSON once it is on my disk and remove that part (I am also compressing the JSON).
But if you know how to omit identifications from the API response, the whole script would take less time and save bandwidth (and energy).
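For reference, the post-processing step described above (drop the part that is not needed, then compress) can be sketched as follows. `strip_and_compress` is a hypothetical helper, and it assumes the response is a dict with a "results" list, as in the downloads discussed here.

```python
import gzip
import json
import os
import tempfile


def strip_and_compress(response, path, drop_keys=("identifications",)):
    """Remove unwanted keys from each result and write the rest as gzipped JSON."""
    slim = dict(response)
    slim["results"] = [
        {k: v for k, v in obs.items() if k not in drop_keys}
        for obs in response.get("results", [])
    ]
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(slim, f)
    return slim


# Hypothetical, heavily shortened response
sample = {
    "total_results": 1,
    "results": [
        {
            "id": 1,
            "taxon": {"id": 10, "name": "Quercus robur"},
            "identifications": [{"id": 99, "taxon": {"id": 10}}],
        }
    ],
}

path = os.path.join(tempfile.mkdtemp(), "observations.json.gz")
slim = strip_and_compress(sample, path)

# Read the file back to confirm the round trip
with gzip.open(path, "rt", encoding="utf-8") as f:
    restored = json.load(f)
print(sorted(restored["results"][0]))
```

This still downloads the full response first, so it saves disk space but not bandwidth or script time.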

I don't see any iNaturalist API option to get a "summarized" response (other than only_id, which would return the ids and nothing else).

As you are the expert I preferred to ask here, just in case pyinaturalist already had an option for this.

abubelinha commented on June 5, 2024

Oh, you mean the fields parameter!
Yes, that seems to be exactly what I need.
I guess you will include that option in pyinaturalist when the v2 API becomes stable, won't you?
Thank you so much.

JWCook commented on June 5, 2024

P.S., you're always welcome to create new issues, or discussion threads for more open-ended questions. Usually those are easier for me to catch up on than comments on closed issues.

JWCook commented on June 5, 2024

Continued in #155
