Comments (10)
Starting with a 0-length array (which was the case at some point) is obviously a terrible idea for performance reasons, so this will not happen.
While I do get your point about the semantics, I think we can have some flexibility for now - the package works, the documentation is updated, the code has been reviewed a few times. If we want to change all of that, with all that implies for both the user-facing code and the internals, there needs to be a discussion prior to opening issues, or at least suggestions. Not only about read!
(which I don't think would make sense for most users), but about how to rename the number of effective records and the total number of records.
from gbif.jl.
As I mentioned in read!
I was hoping to use GBIF.jl in teaching examples, and for documentation examples in GeoData.jl, e.g. for extract
. It's the perfect package to use to demonstrate getting some point observations, and extracting variables for them. But code examples cant really contain base julia methods with non-standard semantics.
I was hoping to make some improvements to the package, and willing to do all of the work, and wouldn't do anything that reduces the performance.
from gbif.jl.
Not that I am unwilling to entertain the idea - but I need a concrete proposal, and I need this before we introduce read!
(which I think is not a user-friendly name, but we'll discuss this later).
I am really grateful that you want to make improvements to the package, but as it's developed and heavily used in the lab, I want to make sure that we discuss things before writing code, so as not to waste your time writing it and our time reviewing it.
If you could reply with a proposed fix coupled to each of your issues, we can discuss this before moving on to a PR.
from gbif.jl.
Methods to modify:
length
: feels like it should be the number of records downloaded already
size
: is defined in base as (length,)
for an array. We should do the same here.
view(::GBIFRecords
): probably shouldn't be defined here. In base it returns a zero dimensional SubArray
There are two main ways to do this.
- Use everything as is, and just change some names, so
size
(the length of the array) becomes e.g.total_occurrences
.length
(the number of fetched records) stays aslength
view
becomes e.grecords
which callsview(o.occuurences, length(o))
.- Array interface methods like
Array
/vec
/collect
etc could all callrecords
. getindex
is a little odd... does it indexo.occurrences
orrecords(o)
? I'm not sure.
Generally in this case array interface methods like length
have to be manipulated to give the illusion of a shorter array of GBIFRecord
than we actually have.
- Swap how the occurrences array works so it starts small and grows with new data.
- The occurrences vector is initialized with the length of the first request. Each successive request uses
push!
. This isn't slow compared to how long a HTTP request takes. GBIFRecords
is then<: AbstractArray
and can be a direct forwarding of the whole interface too.occurrences
. It can just be indexed directly as none of the indices areundef
. Soview
isn't needed at all, but you can useview
on it in the normal way to take views of indices.total_occurrences
is then a field we store in theGBIFRecords
object, to track how many there are in total, and how meany are left to download.
In this case the array interface makes sense for GBIFRecords
as-is - it's just a vector or GBIFRecord
, with some metadata - the original request and the total_occurrences
field.
from gbif.jl.
Any comments on this? It would be good to clarify this before the upcoming course. We really should be using GBIF.jl in it.
from gbif.jl.
For a real world demonstration, this will be in the GeoData.jl docs and used in teaching:
# A basic species distribution modelling workflow
using GeoData, GBIF, CSV, Plots
# Load occurrences for the Mountain Pygmy Possum
t = taxon("Burramys parvus")
records = occurrences(t, :limit => 300)
# Get the rest of the occurrences, we need to do this manually with a loop.
# Note: GBIF.jl uses non-standard semantics for `size`. This is comparing
# the occurrances already downloaded with the total occurrances.
while length(records) < size(records)
occurrences!(records)
end
# Extract the longitude/latitude value to a Vector of Tuple
coords = map(r -> (r.longitude, r.latitude), records)
# Get BioClim layers and subset to south-east australia
A = GeoStack(WorldClim{BioClim}, (1, 3, 7, 12))
SE_aus = A[X=Between(138, 155), Y=Between(-40, -25), Band=1]
# Plot BioClim predictors and scatter occurrence points on all subplots
p = plot(SE_aus);
foreach(i -> scatter!(p, coords; subplot=i, legend=:none), 1:4)
display(p)
# Extract predictor variables and write to CSV
predictors = extract(SE_aus, coords)
CSV.write("burramys_parvus_predictors.csv", predictors)
From my perspective it would be best to instead of the while
loop do something like:
read!(occurences)
But if you were to use a while loop or anything similar, it would be something like:
while length(records) < total_occurences(records)
occurrences!(records)
end
So that the non-standard meaning of size
doesn't have to be explained. view(records)
would give an error, for the same reason.
from gbif.jl.
What about count
for total_occurrences
?
from gbif.jl.
Re. read!
, I don't think it's a good idea, because we have a number of cases where we do something at the end of every page of request.
from gbif.jl.
And as a third point, I don't think the Array should grow. We know its size at the beginning, and back when it did grew with each page, the requests were getting slower and slower.
from gbif.jl.
So I'd vote for option 1, with maybe count
instead of total_occurrences
- go ahead with a PR?
from gbif.jl.
Related Issues (20)
- Custom pretty-printing
- Query.jl HOT 3
- update! should apply the qa/qc filters if there are any
- Use missing type when 1.0 is released HOT 3
- Replace Requests by HTTP
- working on 1.0? HOT 2
- Update travis file after changes in Documenter.jl HOT 4
- Update documentation before release
- Paging after applying filters causes a bug in the number of records HOT 2
- The dictionary when filtering occurrences must be typed `{Any,Any}`
- The taxon function has no method to pass a taxon ID directly HOT 6
- Tests for filtering sometimes fail HOT 2
- Day of month out of range
- The genericName key is not always present in taxa
- Taxon fails but maybe it should return nothing?
- Occurrences get slower for large queries
- TagBot trigger issue HOT 3
- query could be a Dict rather than Vector{Pair} HOT 2
- Test
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gbif.jl.