
dorado's Introduction

dorado - Lagrangian particle routing


Particle routing on Lidar-derived bathymetry

dorado is a Python package for simulating passive Lagrangian particle transport over flow-fields from any 2D shallow-water hydrodynamic model using a weighted random walk methodology.

For user guides and detailed examples, refer to the documentation.

Example Uses:

Particles on an Unsteady ANUGA Flow Field of the Wax Lake Delta


Particles on a DeltaRCM Simulated Delta


Installation:

dorado supports Python 2.7 as well as Python 3.5+. For the full distribution, including examples, clone this repository using git clone and run python setup.py install from the cloned directory. To test this "full" installation, first install pytest via pip install pytest, then run pytest from the cloned directory to verify that your installed distribution passes all of the unit tests.

For a lightweight distribution including just the core functionality, use pip to install via PyPI:

pip install pydorado

Installation using conda via conda-forge is also supported:

conda install -c conda-forge pydorado

For additional installation options and instructions, refer to the documentation.

Contributing

We welcome contributions to the dorado project. Please open an issue or a pull request if there is functionality you would like to see or propose. Refer to our contributing guide for more information.

Citing

If you use this package and wish to cite it, please use the Journal of Open Source Software article.

Funding Acknowledgments

This work was supported in part by NSF EAR-1719670, the NSF GRFP under grant DGE-1610403 and the NASA Earth Venture Suborbital (EVS) award 17-EVS3-17_1-0009 in support of the DELTA-X project.

dorado's People

Contributors

elbeejay, kbarnhart, paolapassah2o, wrightky


dorado's Issues

Documentation: "publications" section

Per suggestion from @wrightky in #39:

Now that we've got a few dorado papers out, any interest in adding a "publications" section (either to the README or the docs) that points people to the relevant literature?

Opening this issue so we don't forget to add this to the documentation at some point.

JOSS Review: "Particle" class name

This was a comment made by @gassmoeller here:

Your main class that handles all particles is called "Particle". When I looked into the code I expected this to be a class for one single particle. In terms of nomenclature, maybe something like ParticleHandler, or ParticleManager seems more appropriate (I do not insist on those names, I only suggested them, because in my own particle implementation I chose the name ParticleHandler). At least I would consider using "Particles" to make clear that it is a class that holds more than one particle. I realize this may be a disruptive change, and I do not know how much backward compatibility you have to consider, but I would like to hear your opinion on the topic.

Documentation: Math rendering

Looks like the math rendering isn't working as it should; this should be a simple fix, changing the LaTeX engine for Sphinx in conf.py (hopefully).


Speed-Up "Exact" Particle Generator

The nested while-for loop structure in the "exact" method of particle generation is slow and could be fixed relatively easily (link to section of code).

Instead of a loop structure: since both the total number of particles to be seeded and the total number of seed locations are known, the number of particles to place at each location can be computed by division, turning the assignment into a list comprehension rather than a loop. The modulo (%) operator then identifies the "remainder" particles the list comprehension can't handle, which can be assigned in a short loop.

This change should be purely a speed-up and shouldn't change how the code behaves, so presumably all unit tests would continue to pass and existing workflows would not be impacted.
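The proposed divmod-based assignment can be sketched as follows; seed_particles and its arguments are hypothetical names for illustration, not dorado's actual API:

```python
def seed_particles(seed_locations, num_particles):
    """Assign num_particles across seed_locations without a nested loop."""
    # Base count per location and leftover in one step.
    per_loc, remainder = divmod(num_particles, len(seed_locations))
    # Every location gets the same base count via list comprehension.
    particles = [loc for loc in seed_locations for _ in range(per_loc)]
    # The "remainder" particles are assigned one per location in a short loop.
    for loc in seed_locations[:remainder]:
        particles.append(loc)
    return particles
```

For example, seeding 7 particles over 3 locations places 2 at each location and one extra at the first.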

JOSS Review - Warnings with parallel example

Related to openjournals/joss-reviews#2585

I get the attached warnings when running examples/parallel_routing.py. May just be a minor version issue on my machine, but I just wanted to know what you think about them.

My package versions:
python 3.8.2 (default install on ubuntu 20.04)
numpy 1.17.4
matplotlib 3.1.2
scipy 1.3.3
future 0.18.2
tqdm 4.49.0

log.txt

JOSS Review: General Comments

From @dbuscombe-usgs's JOSS review link to comment

openjournals/joss-reviews/issues/2585

What is the motivation for maintaining both pip and conda versions? Seems like picking one and ensuring support would be easier. Also, I suggest you recommend users make use of a virtual environment (venv) or conda environment to make installation, maintenance and compatibility with other projects easier.

The demo on https://passah2o.github.io/dorado/quickstart/index.html#demo-1-using-the-high-level-api is missing:
“import matplotlib.pyplot as plt”
Also, please provide some higher level explanation for what this example is actually doing and what this is a simulation of.

Are there any plans to add or change functionality in the future? If so, a roadmap would be a good addition. It would also be a good way for others to see how they could contribute.

Instead of using the future package, presumably to make things backwards-compatible, I would suggest catering ONLY to python 3 and later. This is a new program so backwards-compatibility is certainly not expected and probably not necessary.

Build failing on Ubuntu (Python 3.8+)

The last two build attempts have failed for ubuntu-latest 3.8 and 3.9 (all other tests are passing). Specifically, we're getting an assertion error for three of the tests in tests/test_examplecases.py::TestRCM. Here's the failure message:

=========================== short test summary info ============================
FAILED tests/test_examplecases.py::TestRCM::test_few_steps_RCM - assert [0, 700759344...472986.069204] == [0, 700759344...473384.876724]
  At index 1 diff: 7007593448.337233 != 7007593448.337235
  Full diff:
  - [0, 7007593448.337235, 10439964733.462337, 13698473384.876724]
  ?                     ^           ^^^^ ^^^^         ^^ ^ ^^ ^
  + [0, 7007593448.337233, 10439964528.877066, 13698472986.069204]
  ?                     ^          +++++ ^^ ^         ^^ ^ ^ ^ +
FAILED tests/test_examplecases.py::TestRCM::test_set_time_RCM_previousdata - assert [0, 7007593448.337233] == [0, 7007593448.337235]
  At index 1 diff: 7007593448.337233 != 7007593448.337235
  Full diff:
  - [0, 7007593448.337235]
  ?                     ^
  + [0, 7007593448.337233]
  ?                     ^
FAILED tests/test_examplecases.py::TestRCM::test_set_time_RCM - assert [0, 7007593448.337233] == [0, 7007593448.337235]
  At index 1 diff: 7007593448.337233 != 7007593448.337235
  Full diff:
  - [0, 7007593448.337235]
  ?                     ^
  + [0, 7007593448.337233]
  ?                     ^
========================= 3 failed, 88 passed in 4.55s =========================

It looks to me like our random seed isn't guaranteeing identical travel times in newer versions of Python, because the error is occurring quite far down the list of sig-figs. The difference is on the order of 10^-8. I'm thinking an acceptable fix would be to change the assertion to check whether the results are within some threshold difference of each other, maybe 10^-6? Let me know if that works for you @elbeejay , happy to work on this later.
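A threshold-based assertion along the suggested lines might look like this sketch. The values are copied from the failure log above; inside the test suite, pytest.approx(expected, rel=1e-6) would be an equivalent way to express the same comparison:

```python
import math

# Travel-time lists from the failure log above; identical except for
# floating-point drift far down the significand.
expected = [0.0, 7007593448.337235, 10439964733.462337]
observed = [0.0, 7007593448.337233, 10439964528.877066]

# Compare within a relative tolerance (the suggested ~1e-6) instead of
# exact equality; abs_tol covers the leading zero entries.
assert all(math.isclose(o, e, rel_tol=1e-6, abs_tol=1e-6)
           for o, e in zip(observed, expected))
```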

Further Speeding Up Routing Weight Calculations

From #16, @wrightky said:

So, looking at this code, I see that the main structure of the weight computation hasn't changed. We're still constructing small sub-arrays at each index, doing a few quick operations (max, cleaning, multiplications, summing), and saving the 9 resulting weights in a weight array. Surprised we hadn't tried this sooner given how much better it performs.

It looks like the key reason the runtime scales better in this model isn't anything about how this computation works, it's how many times we do it. Originally, we performed these operations locally once for every particle at every iteration. So, the runtime scaled as Np_tracer * iterations_per_particle. Now, we perform this once for every cell, so it scales with domain size (L-2) * (W-2). I bet if you checked the example cases you benchmarked, the ratio of these values should give roughly the speedup you observed.

One thing I wonder, though, is whether we could obtain even faster runtimes by modifying the structure of this computation itself, by switching from local array operations to global ones. Whatever is causing this function to take so long must be due to the fact that we're repeatedly constructing many sub-arrays in a loop and operating on them, instead of performing a few big global array operations (inside which we'd be taking advantage of all the fast numpy broadcasting stuff). In principle, if we broke up these operations (max, cleaning, multiplication, summing) into a loop over each of the D8 directions, with each being a global matrix operation, instead of a loop over each cell, we could reduce the overhead of repeatedly calling these functions. I don't know exactly how that scales, but it'd be the difference between time(np.max(small array)) * (L-2) * (W-2) and time(np.max(big array)) * 9. Does that scale better? Not sure.

So that is a potential route for further speeding up the routing weight calculation.
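The restructuring can be sketched with a toy global-slice operation. The function below just sums D8 neighborhoods, standing in for dorado's actual weight computation (max, cleaning, multiplication, summing); the point is the loop over 9 directions with one vectorized slice each, rather than a loop over every cell:

```python
import numpy as np

def d8_direction_sums(field):
    """Sum the 9 D8-neighborhood values for every interior cell using
    nine global slice-shifts instead of one small sub-array per cell."""
    L, W = field.shape
    out = np.zeros((L - 2, W - 2))
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            # One vectorized operation per D8 direction (9 total),
            # rather than (L-2)*(W-2) per-cell sub-array operations.
            out += field[1 + di:L - 1 + di, 1 + dj:W - 1 + dj]
    return out
```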

JOSS review: Improve statement of need

From @dbuscombe-usgs's JOSS review link to comment

openjournals/joss-reviews/issues/2585

The statement of need should lead with a definition of the term ‘particle tracking’. It is clear to me that this is shorthand for ‘tracking of water – including particles – in a Lagrangian framework’, but it could be clearer to the reader.
I also recommend adding the theory section in the documentation to the paper (https://passah2o.github.io/dorado/background/index.html). It is well written and answers many outstanding questions I was left with when reading the paper on its own.

JOSS Review - Running examples with data requires manual data setup?

Related to openjournals/joss-reviews#2585

I had some issues getting the examples that require data to run (e.g. example 1 - steady_anuga_particles.py). The example data seems to be in the directory with the main python files (in a subdirectory called example_data), while the example scripts (.py) are in a different directory called 'examples'. Maybe that is because I am not a Python expert, but here is what I did: during installation (python3 setup.py install) the example data seems to be installed, but the example scripts are not. Running the example scripts from where they are will raise an error (message attached), unless I manually copy the example data into the same directory as the example scripts. Did I mess up the installation process? To me it seems reasonable to have the example data in the same directory as the example scripts, and thus to either (1) also install the example scripts, (2) keep the example data in the same directory as the example scripts and install neither, or (3) install the data but not the scripts, while making sure the scripts find the data without modification after the installation.

Error message:

Traceback (most recent call last):
  File "steady_anuga_particles.py", line 8, in <module>
    data = np.load('ex_anuga_data.npz')
  File "/usr/lib/python3/dist-packages/numpy/lib/npyio.py", line 428, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'ex_anuga_data.npz'

Docs Build Hard-Set to Python 3.8

Once the matplotlib dependencies support Python 3.9, we should reset our docs build to use the latest version of Python (in the workflow yaml this would be '3.x' as opposed to '3.8').

See here for the upstream issue that once closed, should allow us to move back to '3.x'.

Log print statements

We print to stdout with reckless abandon throughout the codebase. I think we should create functionality to optionally suppress these print statements (maybe a "quiet" flag or a "verbosity" argument), and/or we should also support printing any messages out to a .log file instead. This way they won't disappear with the terminal, and can be preserved with the particle walk data for review during post-processing or when the results of the simulation are later analyzed.
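A minimal sketch of the quiet/log-file idea, assuming a hypothetical get_logger helper built on the standard logging module (not dorado's actual API):

```python
import logging

def get_logger(verbose=True, logfile=None):
    """Return a logger that prints to stdout when verbose is True,
    and optionally mirrors messages to a .log file for post-processing."""
    logger = logging.getLogger("dorado")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeat calls
    if verbose:
        logger.addHandler(logging.StreamHandler())
    if logfile is not None:
        logger.addHandler(logging.FileHandler(logfile))
    if not logger.handlers:
        logger.addHandler(logging.NullHandler())  # fully quiet
    return logger
```

Replacing bare print calls with logger.info would then let a "quiet" flag suppress output and a logfile argument preserve messages alongside the particle walk data.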

Make particle iteration limit a keyword argument

In Particles.run_iteration() we have a hard limit on the number of iterations particles can take (currently 10,000 iterations). It is simple enough to make this an optional keyword argument to the function, creating greater flexibility for the end-user in the event their use-case requires this limit to be higher or lower.
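The change can be illustrated with a placeholder; the body below is a stand-in, not dorado's actual routing logic:

```python
# Hypothetical sketch: expose the hard-coded cap as a keyword argument.
def run_iteration(start_indices, max_iter=10000):
    """Walk each particle until max_iter steps elapse.

    max_iter defaults to the current hard limit of 10,000 iterations,
    so existing calls behave exactly as before, while users can raise
    or lower it as their use-case requires.
    """
    walks = list(start_indices)
    for _ in range(max_iter):
        walks = [w + 1 for w in walks]  # stand-in for one routing step
    return walks
```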

JOSS Review: Code structure and class hierarchy

Comment from @gassmoeller here:

Your current code works for your purpose and the examples you show, but I am slightly concerned about the future maintainability of the code, because you have mixed responsibilities and an unclear class hierarchy for some of your functions. I will give some examples for your main functions and why that could become a problem later below. I would not consider this a roadblock for the publication in JOSS, but I would like to see as much as reasonable of this fixed at some point and I would like to hear your opinion on the matter:

You define particle locations either based on an initial seed location, or based on existing particle locations either from a previous run, or created by the user. But the initial seed location is expected to be handed over during construction of Particle, while existing locations are handed over to run_iteration. This is a bit confusing for a newcomer, but more importantly: It shows that you are not sure who (which function) is responsible for the particle generation. I would suggest to have one function specific for particle generation, e.g. generate_particles(x_seed, y_seed, start_xindices, start_yindices). In a previous particle system I have written we separated that into a set of different functions (Particles::Generators::regular_reference_locations, Particles::Generators::probabilistic_locations, ..., see here if you are interested in details), but that would be overkill for your application here. Still you want to keep that option instead of spreading the responsibility for generating over multiple functions or making your run_iteration function into a somehow_generate_particles_and_run_iteration function. This would become a problem once you start to add more particle generation options (e.g. probabilistic particle distribution with varying density functions, particle patterns like discs/circles/lines).
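The suggested single entry point could look roughly like this sketch; the argument names follow the comment above and are not dorado's actual signature, and the seeding strategy is a naive placeholder:

```python
# Hypothetical sketch: one function owns particle generation, so
# run_iteration only ever advances existing particles.
def generate_particles(num_particles, seed_xloc=None, seed_yloc=None,
                       previous_walk=None):
    """Return starting (x, y) index lists, either from seed locations
    or from the locations left by a previous run."""
    if previous_walk is not None:
        return previous_walk  # resume from existing particle locations
    # Naive seeding: cycle particles through the seed cells.
    xinds = [seed_xloc[i % len(seed_xloc)] for i in range(num_particles)]
    yinds = [seed_yloc[i % len(seed_yloc)] for i in range(num_particles)]
    return xinds, yinds
```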

A related comment about assigning responsibilities: I do not understand why the run_iteration function is a member of Particle, but the single_iteration function is a member of Tools. single_iteration does the core work of moving the particles, and so I would suspect it to be a member of Particle. It is true that single_iteration uses many of the Tools functions, but that does not mean it is conceptually a tool itself, and I would move it to Particle. On the other hand the functions coord2ind, ind2coord, and unstruc2grid are typical tools functions and I would expect them in the Tools class, and not as stand-alone functions in particle_track.py. More about Tools below.

Currently Tools defines a number of utility functions for the Particle class, and then Particle is derived from Tools. But this is not a typical is-a parent-child relation. Particle is not a Tools. Particle uses Tools, so instead it should import Tools, or be given an object of type Tools upon creation, or similar. This is already a problem in your code, because functions in Tools use member variables that are only read in and initialized in Particle (e.g. self.velocity in Tools.calc_travel_times). In other words, Tools cannot be used without Particle, or used as a base class for anything else. Utility functions or objects should usually not own any member variables (and in particular should not rely on member variables of derived classes), and instead should be handed all the parameters needed for the small computation they are supposed to do.

Related to the last comment: Your documentation in Tools currently states that Particle is derived from Tools and uses its functions, but if that would change in the future you would likely only update the Particle class and forget to change the documentation of Tools (not implying you are not careful, this is just what experience in programming shows). Instead this documentation should be in the Particle class, explaining why Particle is derived from Tools, this way if the relationship changes you also update the documentation. Documentation should always state what this thing does, not what others do with it.
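The composition alternative described in the last two points can be sketched as follows; class and method names are illustrative, not dorado's actual API:

```python
class Tools:
    """Stateless utilities: every input arrives as an explicit parameter,
    so Tools never depends on member variables of another class."""
    @staticmethod
    def calc_travel_time(distance, velocity):
        return distance / velocity

class Particles:
    def __init__(self, velocity):
        self.velocity = velocity
        self.tools = Tools()  # composition: has-a, not is-a

    def travel_time(self, distance):
        # Particles hands its own state to the tool explicitly.
        return self.tools.calc_travel_time(distance, self.velocity)
```

With this arrangement Tools can be reused or tested on its own, and the relationship between the classes is visible at the call site rather than hidden in an inheritance chain.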
