Add GET /observations/observers endpoint from v1 API about pyinaturalist HOT 20 CLOSED

pyinat commented on June 12, 2024 1

Add GET /observations/observers endpoint from v1 API

from pyinaturalist.

Comments (20)

willkuhn commented on June 12, 2024 1

One issue submitted, one to go.


willkuhn commented on June 12, 2024 1

OK, I think it's finally good to go! The tests appear to have passed. Let me know if I'm wrong.


JWCook commented on June 12, 2024

That would be great, thanks! Let me know if you run into any issues with that.

After taking a quick look, one thing I noticed about those endpoints is that both have parameters identical to GET /observations, except they don't list pagination parameters. It appears that the usual parameters (page, per_page, order_by, etc.) work just fine with those, though, so that seems to be an error in the docs.
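To make the point about pagination concrete, here is a minimal sketch that builds request URLs for the two endpoints with the usual pagination parameters attached. The helper name `build_url` is hypothetical, `taxon_id=1` is just a placeholder filter, and the base URL is assumed to be the public v1 API root. Nothing is sent over the network.

```python
from urllib.parse import urlencode

API_BASE = "https://api.inaturalist.org/v1"  # assumed v1 API root


def build_url(endpoint, **params):
    """Return the full URL for an endpoint, with params sorted for stable output."""
    query = urlencode(sorted(params.items()))
    return f"{API_BASE}/{endpoint}?{query}"


# Same pagination parameters as GET /observations, applied to both endpoints
observers_url = build_url("observations/observers", taxon_id=1, page=2, per_page=30)
identifiers_url = build_url("observations/identifiers", taxon_id=1, page=1, per_page=30)
print(observers_url)
print(identifiers_url)
```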


willkuhn commented on June 12, 2024

Happy holidays!

I was able to modify get_observations() and get_all_observations() to serve the GET /observations/observers and GET /observations/identifiers endpoints. Both endpoints do allow the pagination parameters; however, there appear to be bugs in each endpoint that might need to be worked out before we can implement these new functions here.

Here are the bugs that I've encountered so far:

/observers returns pages of results that seem to overlap and that are mostly duplicates (a bug documented for the /projects endpoint here and here). If we could find the "end" of the records, we could just filter out duplicates, but I can't figure out how to get to the last page. It goes way past the expected number of pages, n_pages = ceil(total_records / per_page).
order_by works for /observers with the options observation_count (default) and species_count. The default seems to work as expected, but the latter returns {"error":"Error","status":500} when you request a page after the ~500th record.
/identifiers returns a maximum of 500 records for (apparently) any query, even if total_results is much higher.
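For reference, the expected page count is the total number of records divided by the page size, rounded up. The sketch below shows that calculation along with the duplicate filtering mentioned in the first item. The function names are hypothetical, and `user_id` is an assumed record key rather than something confirmed in this thread.

```python
import math


def expected_page_count(total_results, per_page):
    """Number of pages needed to hold total_results records."""
    return math.ceil(total_results / per_page)


def dedupe_by_key(records, key="user_id"):
    """Keep the first record seen for each key value, preserving order."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


# Two made-up pages that overlap on user_id 2
overlapping_pages = [
    [{"user_id": 1}, {"user_id": 2}],
    [{"user_id": 2}, {"user_id": 3}],
]
flat = [record for page in overlapping_pages for record in page]
page_count = expected_page_count(1234, 30)
unique_records = dedupe_by_key(flat)
print(page_count, unique_records)
```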

I haven't pushed my updates yet because they don't really work due to these bugs. What do you think we should do here @JWCook ?


JWCook commented on June 12, 2024

Thanks for the detailed report! I should have some time to take a look at this later this week.

Since these are the same endpoints used by inaturalist.org, there must be a combination of parameters that works (like per_page=30 in the similar issue with /projects), at least as a temporary workaround.

It would also be worth creating an issue on iNaturalistAPI for this so they're aware. Would you be willing to do that?


willkuhn commented on June 12, 2024

Yes, it would make sense that there's some parameter combo that works for the website backend...we just need to find it.

I will open an issue on GitHub. Is it better to open two issues (one for each endpoint) or one bigger one, given that they aren't necessarily connected?


JWCook commented on June 12, 2024

One issue per endpoint would be good. Thanks!


JWCook commented on June 12, 2024

Well this is an interesting one. This is different from inaturalist/iNaturalistAPI#227; in that one, the offset was wrong, and each page would return a certain number of unique and non-unique results. With these endpoints, I see that each page contains only unique results until ~500 results, like you noted, and then doesn't return any at all. The issue with Elasticsearch sharding also doesn't seem to apply here.
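Given that behavior (unique results up to roughly 500, then empty pages), a client can simply page until a request comes back empty. This is a minimal sketch of that loop, not the pyinaturalist implementation: `fetch_page` is a hypothetical callable standing in for the real request, and the fake below only imitates the 500-record cutoff described in this thread.

```python
def fetch_all(fetch_page, per_page=30, max_pages=100):
    """Collect results page by page until a page comes back empty.

    fetch_page is a hypothetical callable: fetch_page(page, per_page) -> list.
    max_pages is a safety cap so a misbehaving endpoint cannot loop forever.
    """
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, per_page)
        if not batch:
            break
        results.extend(batch)
    return results


def fake_fetch_page(page, per_page):
    """Stand-in for the endpoint: serves 500 records, then nothing."""
    start = (page - 1) * per_page
    return list(range(start, min(start + per_page, 500)))


all_results = fetch_all(fake_fetch_page, per_page=30)
print(len(all_results))
```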

The relevant code for the /observers endpoint is here: https://github.com/inaturalist/iNaturalistAPI/blob/main/lib/controllers/v1/observations_controller.js#L955-L1089
It looks like this works differently from the other endpoints, and contains two subqueries that behave differently depending on order_by. This may be a bit trickier to debug.

I think the case you found that causes a HTTP 500 server error may be the most useful here, as that is likely to produce some error logs that the iNat devs would have access to.


JWCook commented on June 12, 2024

Another thing I noticed is that the observer 'leaderboards' for a given species only show 500 results, for example: Common Milkweed in North America.
So it's possible that the limit of 500 results is the intended behavior. Even so, this behavior isn't documented, and the HTTP 500 error is unexpected, so it's still worth creating an issue.

Since your use case is "how many users have made 10+ observations in project X?", 500 results may actually be enough to do what you want. Even for some of the most commonly observed species (like the milkweed example above), the 500th observer has fewer than 10 observations of that species.
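As a rough sketch of that use case: count the observers at or above a threshold, and check whether the last returned observer is already below it, in which case the 500-result cap has not hidden anyone. The response shape here (a `results` list with `observation_count` per entry, sorted by the default descending observation count) is an assumption, and the sample data is made up.

```python
def count_active_observers(response, min_observations=10):
    """Count observers with at least min_observations observations."""
    return sum(
        1 for r in response["results"] if r["observation_count"] >= min_observations
    )


def count_is_complete(response, min_observations=10):
    """True if the last (lowest-ranked) observer is already below the threshold.

    Assumes results are sorted by observation_count, highest first.
    """
    results = response["results"]
    return not results or results[-1]["observation_count"] < min_observations


# Made-up sample in the assumed response shape
sample_response = {
    "total_results": 4,
    "results": [
        {"user_id": 1, "observation_count": 42},
        {"user_id": 2, "observation_count": 10},
        {"user_id": 3, "observation_count": 9},
        {"user_id": 4, "observation_count": 1},
    ],
}
active = count_active_observers(sample_response)
complete = count_is_complete(sample_response)
print(active, complete)
```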


JWCook commented on June 12, 2024

@willkuhn Until that issue is answered, would you like to go ahead and commit what you have? You can just add a note in the docstring that it will currently return no more than 500 results, and then set the page size to 500 if not specified:

def get_observers(..., **params):
    params.setdefault('per_page', 500)
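The effect of `setdefault` here is that an explicit `per_page` from the caller wins, and 500 is used only when none is given. A small standalone sketch of just that step, with a hypothetical helper name and a constant that reflects the limit observed in this thread rather than a documented API value:

```python
MAX_OBSERVER_RESULTS = 500  # limit observed in this thread, not a documented constant


def prepare_observer_params(**params):
    """Return a copy of params with per_page defaulting to the observed maximum."""
    prepared = dict(params)
    prepared.setdefault("per_page", MAX_OBSERVER_RESULTS)
    return prepared


default_params = prepare_observer_params(taxon_id=1)
explicit_params = prepare_observer_params(taxon_id=1, per_page=30)
print(default_params, explicit_params)
```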


willkuhn commented on June 12, 2024

@JWCook I've got that ready. Do you want me to add it as a patch again like before?


JWCook commented on June 12, 2024

Yes, go ahead and submit a pull request for it. Thanks!


willkuhn commented on June 12, 2024

Well, I think it's done. The coding I can do but the gitting confuses the hell out of me.


JWCook commented on June 12, 2024

Yeah, git can have quite the learning curve. I'm happy to help if there are specific features or tasks in git that you'd like to learn. Or at least point you to some good resources; there are plenty of good ones out there, but also a lot of bad ones that can make it difficult to sort through them all. Are you using a git interface within an IDE (like PyCharm), a standalone UI (like GitKraken), or the git command line (my personal favorite)?

Atlassian has some tutorials that are relatively clear and straightforward. The most useful ones are in the Collaborating and Advanced Tips sections.


JWCook commented on June 12, 2024

@willkuhn If your changes are done, you can submit a pull request by going to Pull Requests (from your fork) -> New Pull Request. Then select niconoe/pyinaturalist (dev) as the base repository, and the feature branch from your fork as the 'head repository'.


JWCook commented on June 12, 2024

If you would like some practice with the git command line, this is a good opportunity. I've pushed some more changes to dev since you started, so there is a minor merge conflict. This is a really common situation and will demonstrate several git concepts at once.

This can be fixed with either a merge or a rebase. There's a good explanation of the differences here. When dealing with your own feature branch, rebasing is usually the best option.

Rebase commands

First, add the upstream (base) repo as another 'remote' so you can get my recent changes. You can name this whatever you want, but here I'll just name it 'upstream':

git remote add upstream https://github.com/niconoe/pyinaturalist.git

Then, update your dev branch with mine:

git checkout dev       # Switch to your local dev branch
git pull upstream dev  # Pull in the changes from my dev branch
git push origin dev    # (optional) Push those changes back to your remote dev branch (in your fork on GitHub)

You can run those commands again anytime you want to pull the latest upstream changes.

Now you're ready to rebase:

git checkout patch-1
git rebase dev

Git will now start with the current dev branch, and then stick your changes on top of it, one commit at a time. The Atlassian docs linked above include a diagram of this (rebasing onto a master branch instead of dev).

Fixing a merge conflict

The rebase process will pause when it gets to a merge conflict and show you this message:

You are currently rebasing branch 'patch-1' on '67916d8'.
  (fix conflicts and run "git rebase --continue")
  (use "git rebase --skip" to skip this patch)
  (use "git rebase --abort" to check out the original branch)

Unmerged paths:
  (use "git restore --staged <file>..." to unstage)
  (use "git add <file>..." to mark resolution)
	both modified:   test/test_node_api.py

Translation: we both modified the same part of test/test_node_api.py, and git doesn't know how to automatically merge it.
Run git diff to see the relevant lines:

diff --cc test/test_node_api.py
index 2c04856,bdeda49..0000000
mode 100644,100644..100755
--- a/test/test_node_api.py
+++ b/test/test_node_api.py
@@@ -15,11 -17,8 +17,12 @@@ from pyinaturalist.node_api import 
      get_controlled_terms,
      get_geojson_observations,
      get_observation,
++<<<<<<< HEAD
 +    get_observation_histogram,
++=======
+     get_observation_identifiers,
+     get_observation_observers,
++>>>>>>> appeasing black
      get_observation_species_counts,
      get_observations,
      get_places_autocomplete,

Translation: I added one import (above the === line), and you added two imports (below the line).
This is the easiest kind of merge conflict to resolve, since we just want to keep all three lines without modifying any of them. So open up test/test_node_api.py and remove the lines added by git (the lines with <<<, ===, >>>):

    get_observation,
    get_observation_histogram,
    get_observation_identifiers,
    get_observation_observers,
    get_observation_species_counts,

(Note that there is a way to do that part automatically, but it's actually more complicated, not less!).
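To illustrate what "keep both sides" means mechanically, here is a small sketch that drops the marker lines from a conflicted snippet and leaves everything else untouched. The function name is hypothetical, and this is only appropriate when, as here, you want every line from both sides; it is not a general conflict resolver.

```python
def keep_both_sides(text):
    """Drop git conflict marker lines, keeping the content from both sides."""
    markers = ("<<<<<<<", "=======", ">>>>>>>")
    kept = [line for line in text.splitlines() if not line.startswith(markers)]
    return "\n".join(kept) + "\n"


# The conflicted imports as they appear in the working file
conflicted = """\
    get_observation,
<<<<<<< HEAD
    get_observation_histogram,
=======
    get_observation_identifiers,
    get_observation_observers,
>>>>>>> appeasing black
    get_observation_species_counts,
"""
resolved = keep_both_sides(conflicted)
print(resolved)
```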

Finish rebasing

Almost done! Now you can add your change and finish the rebase:

git add .
git rebase --continue

And finally push your changes back to GitHub:

git push --force

--force is needed there because rebase will rewrite your existing commits instead of adding new ones, so you need to explicitly tell git that you want to overwrite your previous commits.
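A toy model may help show why the commits count as rewritten. In this sketch a commit ID depends on its parent as well as its change, so replaying the same changes on a different base yields different IDs. This is a deliberate simplification with hypothetical names: real git hashes also cover the tree, author, timestamps, and message.

```python
import hashlib


def commit_id(parent_id, change):
    """Toy stand-in for a commit hash: depends on the parent and the change."""
    return hashlib.sha1(f"{parent_id}\n{change}".encode()).hexdigest()[:7]


def replay(base_id, changes):
    """Apply changes one at a time on top of base_id, returning the new IDs."""
    ids = []
    parent = base_id
    for change in changes:
        parent = commit_id(parent, change)
        ids.append(parent)
    return ids


changes = ["add observers endpoint", "add identifiers endpoint"]
before = replay("old-dev", changes)  # feature branch as originally pushed
after = replay("new-dev", changes)   # same changes replayed onto the updated dev
print(before, after)
```

The remote branch still points at the `before` history, which is not an ancestor of `after`, so a plain push is rejected and the overwrite has to be requested explicitly.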

Let me know if any of that needs more explanation.


willkuhn commented on June 12, 2024

@JWCook thank you so much for taking the time to write up that helpful guide! I really appreciate that! I found the conflict that you pointed out and I believe it's resolved and PRed. Please let me know if that worked or not.


JWCook commented on June 12, 2024

@willkuhn Great! I don't think the PR got submitted, though. Can you try again? It will show up here after being submitted: https://github.com/niconoe/pyinaturalist/pulls


willkuhn commented on June 12, 2024

PR done, but I forgot to run the unit tests locally after rebasing (I think that's the right word), so there are some failed tests.

Palm to forehead

Working on that...


JWCook commented on June 12, 2024

Great! It's almost ready to merge. Just added a couple comments on your PR.

