Previously, apply_model_to_raster would do just that: apply a model to one raster.</p

Hey <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

Cleanup apply_model_to_raster naming and API about bfg-nets HOT 8 CLOSED

nsfabina commented on August 22, 2024

Cleanup apply_model_to_raster naming and API

from bfg-nets.

Comments (8)

pgbrodrick commented on August 22, 2024

Hey @nsfabina,

If we rename, it should be 'apply_model_to_site' or something of the like. The list of feature files is not a list of different sites, but rather a list of different feature files per site. Previously, it wasn't possible to do an application this way, despite the fact that we supported that style of input data, hence the change.

Does this shift your thoughts?

from bfg-nets.

nsfabina commented on August 22, 2024

It depends. You say this is applying the model to a "site". Does the code actually care whether this is all the same site, or could I simply pack all of my feature files into the same "site" even though they're global?

If the notion of "site" does matter, then this should be reflected in the name and hopefully have some sort of check to be sure it's not being misused within the function.

If it doesn't matter, then I think the important point is whether there's any sort of efficiency gain from passing all the rasters at once, and being clear about naming.

The motivation for this is that I'm handling my application flow with file locks for large sets of rasters. I realize I can just pass in a list containing a single raster (and this is how I've modified my code in the short term), but the naming and API still matters.

from bfg-nets.

nsfabina commented on August 22, 2024

I suppose in the long-term we can streamline more of this by following the same pattern for DataContainer, Experiment, and Reporter, where with have an Applicator(?) that handles the data and experiment states, calculates any of the information that's necessary for applying models to rasters, and then can be efficient about applying the model to arbitrary rasters.

from bfg-nets.

pgbrodrick commented on August 22, 2024

I think you might be slightly missing the point here, let me be more clear.

I am using the term 'site' in the same way I did for training data - any region (which could be the entire globe, whatever you want) that you want to build data or apply data too. A site can be composed of multiple raster files, each of which may have multiple bands.

IE, when you generate your training set you pass in a list of lists:
[[site_a_file_1, site_a_file_2,...,site_a_file_n],[site_b_file_1,site_b_file_2,...,site_b_file_n],...,[site_q_file_1,site_q_file_2,site_q_file_n]]

The update is not a matter of efficiency, but rather a way in which the multiple files associated with a single site can be passed in together.

So your case (single raster), passing a single-element list is appropriate.

I generally agree that the term 'raster' in the function has the potential to be confusing. With the above in mind, do you think the term 'site' helps, or is there something better?

Your last comment goes to our ongoing discussion about whether it is appropriate to move the application to a different module or not. As far as I can tell, that is the only change you're proposing, as the application does all of the things you mention already, being passed only a data_container and a model. Correct me if I'm wrong.

from bfg-nets.

nsfabina commented on August 22, 2024

I'm not missing the following, we're on the same page about the following intentions:

site is identical to the list of lists format from data builds, where the list of lists is a list of sites
the function currently allows the user to pass in multiple rasters as a single site
a user can pass in a list with a single raster

Here's where we're not on the same page:

Why are sites important for this function? From your answer, apparently there's nothing special about a site and there are no restrictions on how the user defines a site for passing in data. If so, then structuring this function around the idea of a site is unnecessary and lacks clarity. It gives the user the impression that there's something special about a site that they should understand and they'll need to understand why they're doing that, when this could simply refer to a list of rasters and everyone knows what that is and how to use the function.
Is there an efficiency gain from passing in multiple rasters at the same time? From your answer, it sounds like there is no efficiency gain and it simply provides an interface to pass multiple files at the same time. If so, it's more common that functions provide an interface to do an operation once, and leave it to the user to loop through if they're applying that multiple times, since that simplifies the API; e.g., it's more common to pass a file path as a string than to pass a list of file paths as strings. At the very least, we should have a function that allows a user to apply the model to a single raster and, if the functions are named clearly and appropriately, we can have an additional function that does that looping for the user and prints out a progress bar, I suppose... but again, these two should share code where possible to reduce the maintenance cost.

To summarize, I think the action items are still the same from the first post in this issue.

from bfg-nets.

nsfabina commented on August 22, 2024

(Note that "site" matters in data builds and I'm supportive of it there, where there are concepts and characteristics that are shared between files from a single site, like a boundary file, and the code will fail or the outcome will be incorrect if this is misused. It's just that it doesn't really do anything here.)

from bfg-nets.

pgbrodrick commented on August 22, 2024

There is no efficiency gain from running multiple sites, nor are multiple sites currently supported.

I disagree, however, that the concept of a site is not relevant. It is how we have previously defined a group of rasters, and separated out the idea of applications to multiple areas and applications using multiple files. That is absolutely relevant here. Renaming the file to 'apply_model_to_rasters' is potentially confusing to a user who only has one raster per site, as it implies you are applying to multiple sites. I find no reason to support the ability to pass in a single raster, as single rasters are never something that should be considered, only raster sets (ie, a site). Since we already have a name for this, why not keep using it? I would imagine consistency is our best mechanism to deter confusion.

In short, I'd rather have a user say 'I don't know what a site is, let me look it up' than 'This name implies a particular thing that it doesn't do' (e.g., apply to multiple sites simultaneously).

from bfg-nets.

nsfabina commented on August 22, 2024

I'm not sure if the coffee hasn't kicked in yet today, but we may need to video chat about this if you still find this controversial because the miscommunication is making it difficult to resolve this issue.

There is no efficiency gain from running multiple sites, nor are multiple sites currently supported.

I make no mention of multiple sites, only multiple rasters.

Renaming the file to 'apply_model_to_rasters' (...)

I have not suggested renaming the file to 'apply_model_to_rasters'. I have suggested renaming the function called 'apply_model_to_raster" to 'apply_model_to_rasters' because it applies the model to multiple rasters. This should not be controversial because this is what the function actually does.

Renaming the file to 'apply_model_to_rasters' is potentially confusing to a user who only has one raster per site (...)

The reason I started this issue is because the function is currently named in a confusing way. I am applying my model to single rasters, one-by-one. The function is called apply_model_to_rasters. Yet, for some reason, I need to pass a list of rasters to the function? Moreover, if we add a function called apply_model_to_raster that simply applies the model to a single raster, that's even clearer and we have two use cases clearly covered.

'apply_model_to_rasters' is confusing (...) as it implies you are applying to multiple sites

Incorrect, it implies that you are applying the model to multiple rasters. I would agree with you had I suggested renaming the function apply_model_to_multiple_sites or apply_model_to_sites, which is not what I suggested.

I find no reason to support the ability to pass in a single raster, as single rasters are never something that should be considered, only raster sets (ie, a site).

Again, I have already provided a use case for passing in a single raster and I don't think this is all that unusual.

To quote from my comment above: "The motivation for this is that I'm handling my application flow with file locks for large sets of rasters. I realize I can just pass in a list containing a single raster (and this is how I've modified my code in the short term), but the naming and API still matters."

Since we already have a name for this, why not keep using it? I would imagine consistency is our best mechanism to deter confusion.

As scientists, we love creating simple terms to describe complex patterns or processes. This is not a complex object: it's a list of raster file paths. How does "list of raster file paths" confuse anyone?

I'm not sure what your definition of site is any longer, since you state that the sites could represent "even the entire globe". I don't want to have to know how you define sites if it doesn't matter for the purpose of the function, as I want to be able to pass in rasters however I see fit for my needs.

from bfg-nets.

Cleanup apply_model_to_raster naming and API about bfg-nets HOT 8 CLOSED

Comments (8)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent