Giter VIP home page Giter VIP logo

Comments (10)

tpoisot avatar tpoisot commented on August 16, 2024

Starting with a 0-length array (which was the case at some point) is obviously a terrible idea for performance reasons, so this will not happen.

While I do get your point about the semantics, I think we can have some flexibility for now - the package works, the documentation is updated, the code has been reviewed a few times. If we want to change all of that, with all that implies for both the user-facing code and the internals, there needs to be a discussion prior to opening issues, or at least suggestions. Not only about read! (which I don't think would make sense for most users), but about how to rename the number of effective records and the total number of records.

from gbif.jl.

rafaqz avatar rafaqz commented on August 16, 2024

As I mentioned in read! I was hoping to use GBIF.jl in teaching examples, and for documentation examples in GeoData.jl, e.g. for extract. It's the perfect package to use to demonstrate getting some point observations, and extracting variables for them. But code examples cant really contain base julia methods with non-standard semantics.

I was hoping to make some improvements to the package, and willing to do all of the work, and wouldn't do anything that reduces the performance.

from gbif.jl.

tpoisot avatar tpoisot commented on August 16, 2024

Not that I am unwilling to entertain the idea - but I need a concrete proposal, and I need this before we introduce read! (which I think is not a user-friendly name, but we'll discuss this later).

I am really grateful that you want to make improvements to the package, but as it's developed and heavily used in the lab, I want to make sure that we discuss things before writing code, so as not to waste your time writing it and our time reviewing it.

If you could reply with a proposed fix coupled to each of your issues, we can discuss this before moving on to a PR.

from gbif.jl.

rafaqz avatar rafaqz commented on August 16, 2024

Methods to modify:

length: feels like it should be the number of records downloaded already
size: is defined in base as (length,) for an array. We should do the same here.
view(::GBIFRecords): probably shouldn't be defined here. In base it returns a zero dimensional SubArray

There are two main ways to do this.

  1. Use everything as is, and just change some names, so
  • size (the length of the array) becomes e.g. total_occurrences .
  • length (the number of fetched records) stays as length
  • view becomes e.g records which calls view(o.occuurences, length(o)).
  • Array interface methods like Array/vec/collect etc could all call records.
  • getindex is a little odd... does it index o.occurrences or records(o) ? I'm not sure.

Generally in this case array interface methods like length have to be manipulated to give the illusion of a shorter array of GBIFRecord than we actually have.

  1. Swap how the occurrences array works so it starts small and grows with new data.
  • The occurrences vector is initialized with the length of the first request. Each successive request uses push!. This isn't slow compared to how long a HTTP request takes.
  • GBIFRecords is then <: AbstractArray and can be a direct forwarding of the whole interface to o.occurrences. It can just be indexed directly as none of the indices are undef. So view isn't needed at all, but you can use view on it in the normal way to take views of indices.
  • total_occurrences is then a field we store in the GBIFRecords object, to track how many there are in total, and how meany are left to download.

In this case the array interface makes sense for GBIFRecords as-is - it's just a vector or GBIFRecord, with some metadata - the original request and the total_occurrences field.

from gbif.jl.

rafaqz avatar rafaqz commented on August 16, 2024

Any comments on this? It would be good to clarify this before the upcoming course. We really should be using GBIF.jl in it.

from gbif.jl.

rafaqz avatar rafaqz commented on August 16, 2024

For a real world demonstration, this will be in the GeoData.jl docs and used in teaching:

# A basic species distribution modelling workflow
using GeoData, GBIF, CSV, Plots 

# Load occurrences for the Mountain Pygmy Possum
t = taxon("Burramys parvus")
records = occurrences(t, :limit => 300)

# Get the rest of the occurrences, we need to do this manually with a loop.
# Note: GBIF.jl uses non-standard semantics for `size`. This is comparing 
# the occurrances already downloaded with the total occurrances.
while length(records) < size(records)
    occurrences!(records)
end

# Extract the longitude/latitude value to a Vector of Tuple
coords = map(r -> (r.longitude, r.latitude), records)

# Get BioClim layers and subset to south-east australia
A = GeoStack(WorldClim{BioClim}, (1, 3, 7, 12))
SE_aus = A[X=Between(138, 155), Y=Between(-40, -25), Band=1]

# Plot BioClim predictors and scatter occurrence points on all subplots
p = plot(SE_aus);
foreach(i -> scatter!(p, coords; subplot=i, legend=:none), 1:4)
display(p)

# Extract predictor variables and write to CSV
predictors = extract(SE_aus, coords)
CSV.write("burramys_parvus_predictors.csv", predictors)

From my perspective it would be best to instead of the while loop do something like:

read!(occurences)

But if you were to use a while loop or anything similar, it would be something like:

while length(records) < total_occurences(records)
    occurrences!(records)
end

So that the non-standard meaning of size doesn't have to be explained. view(records) would give an error, for the same reason.

from gbif.jl.

tpoisot avatar tpoisot commented on August 16, 2024

What about count for total_occurrences?

from gbif.jl.

tpoisot avatar tpoisot commented on August 16, 2024

Re. read!, I don't think it's a good idea, because we have a number of cases where we do something at the end of every page of request.

from gbif.jl.

tpoisot avatar tpoisot commented on August 16, 2024

And as a third point, I don't think the Array should grow. We know its size at the beginning, and back when it did grew with each page, the requests were getting slower and slower.

from gbif.jl.

tpoisot avatar tpoisot commented on August 16, 2024

So I'd vote for option 1, with maybe count instead of total_occurrences - go ahead with a PR?

from gbif.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.