
dorado's Introduction

dorado - Lagrangian particle routing


Particle routing on Lidar-derived bathymetry

dorado is a Python package for simulating passive Lagrangian particle transport over flow-fields from any 2D shallow-water hydrodynamic model using a weighted random walk methodology.

For user guides and detailed examples, refer to the documentation.

Example Uses:

Particles on an Unsteady ANUGA Flow Field of the Wax Lake Delta


Particles on a DeltaRCM Simulated Delta


Installation:

dorado supports Python 2.7 as well as Python 3.5+. For the full distribution, including examples, clone this repository using git clone and run python setup.py install from the cloned directory. To test this "full" installation, first install pytest via pip install pytest, then run pytest from the cloned directory to verify that your installed distribution passes all of the unit tests.

For a lightweight distribution including just the core functionality, use pip to install via PyPI:

pip install pydorado

Installation using conda via conda-forge is also supported:

conda install -c conda-forge pydorado

For additional installation options and instructions, refer to the documentation.

Contributing

We welcome contributions to the dorado project. Please open an issue or a pull request if there is functionality you would like to see or propose. Refer to our contributing guide for more information.

Citing

If you use this package and wish to cite it, please use the Journal of Open Source Software article.

Funding Acknowledgments

This work was supported in part by NSF EAR-1719670, the NSF GRFP under grant DGE-1610403 and the NASA Earth Venture Suborbital (EVS) award 17-EVS3-17_1-0009 in support of the DELTA-X project.

dorado's People

Contributors

elbeejay, kbarnhart, paolapassah2o, wrightky


dorado's Issues

Documentation: "publications" section

Per suggestion from @wrightky in #39:

Now that we've got a few dorado papers out, any interest in adding a "publications" section (either to the README or the docs) that points people to the relevant literature?

Opening this issue so we don't forget to add this to the documentation at some point.

JOSS Review: "Particle" class name

This was a comment made by @gassmoeller here:

Your main class that handles all particles is called "Particle". When I looked into the code I expected this to be a class for one single particle. In terms of nomenclature, maybe something like ParticleHandler, or ParticleManager seems more appropriate (I do not insist on those names, I only suggested them, because in my own particle implementation I chose the name ParticleHandler). At least I would consider using "Particles" to make clear that it is a class that holds more than one particle. I realize this may be a disruptive change, and I do not know how much backward compatibility you have to consider, but I would like to hear your opinion on the topic.

Documentation: Math rendering

Looks like the math rendering isn't working as it should; this should be a simple fix, changing the LaTeX engine for Sphinx in conf.py (hopefully).


Speed-Up "Exact" Particle Generator

The nested while-for loop structure in the "exact" method of particle generation is slow and could be fixed relatively easily (link to section of code).

Instead of a loop structure: since both the total number of particles to be seeded and the total number of seed locations are known, the number of particles to place at each location can be computed by division, turning the assignment into a list comprehension rather than a loop. The modulo (%) operator then identifies the "remainder" particles the list comprehension can't handle, which can be assigned in a short loop.

This change should be purely a speed-up and shouldn't change how the code behaves, so presumably all unit tests would continue to pass and existing workflows would not be impacted.
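The proposed divmod-based assignment can be sketched as follows; seed_particles and its arguments are hypothetical names for illustration, not dorado's actual API:

```python
def seed_particles(seed_locations, num_particles):
    """Assign num_particles across seed_locations without a nested loop."""
    # Base count per location and leftover in one step.
    per_loc, remainder = divmod(num_particles, len(seed_locations))
    # Every location gets the same base count via list comprehension.
    particles = [loc for loc in seed_locations for _ in range(per_loc)]
    # The "remainder" particles are assigned one per location in a short loop.
    for loc in seed_locations[:remainder]:
        particles.append(loc)
    return particles
```

For example, seeding 7 particles over 3 locations places 2 at each location and one extra at the first.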

JOSS Review - Warnings with parallel example

Related to openjournals/joss-reviews#2585

I get the attached warnings when running examples/parallel_routing.py. May just be a minor version issue on my machine, but I just wanted to know what you think about them.

My package versions:
python 3.8.2 (default install on ubuntu 20.04)
numpy 1.17.4
matplotlib 3.1.2
scipy 1.3.3
future 0.18.2
tqdm 4.49.0

log.txt

JOSS Review: General Comments

From @dbuscombe-usgs's JOSS review link to comment

openjournals/joss-reviews/issues/2585

What is the motivation for maintaining both pip and conda versions? Seems like picking one and ensuring support would be easier. Also, I suggest you recommend users make use of a virtual environment (venv) or conda environment to make installation, maintenance and compatibility with other projects easier.

The demo on https://passah2o.github.io/dorado/quickstart/index.html#demo-1-using-the-high-level-api is missing:
“import matplotlib.pyplot as plt”
Also, please provide some higher level explanation for what this example is actually doing and what this is a simulation of.

Are there any plans to add or change functionality in the future? If so, a roadmap would be a good addition. It would also be a good way for others to see how they could contribute.

Instead of using the future package, presumably to make things backwards-compatible, I would suggest catering ONLY to python 3 and later. This is a new program so backwards-compatibility is certainly not expected and probably not necessary.

Build failing on Ubuntu (Python 3.8+)

The last two build attempts have failed for ubuntu-latest 3.8 and 3.9 (all other tests are passing). Specifically, we're getting an assertion error for three of the tests in tests/test_examplecases.py::TestRCM. Here's the failure message:

=========================== short test summary info ============================
FAILED tests/test_examplecases.py::TestRCM::test_few_steps_RCM - assert [0, 700759344...472986.069204] == [0, 700759344...473384.876724]
  At index 1 diff: 7007593448.337233 != 7007593448.337235
  Full diff:
  - [0, 7007593448.337235, 10439964733.462337, 13698473384.876724]
  ?                     ^           ^^^^ ^^^^         ^^ ^ ^^ ^
  + [0, 7007593448.337233, 10439964528.877066, 13698472986.069204]
  ?                     ^          +++++ ^^ ^         ^^ ^ ^ ^ +
FAILED tests/test_examplecases.py::TestRCM::test_set_time_RCM_previousdata - assert [0, 7007593448.337233] == [0, 7007593448.337235]
  At index 1 diff: 7007593448.337233 != 7007593448.337235
  Full diff:
  - [0, 7007593448.337235]
  ?                     ^
  + [0, 7007593448.337233]
  ?                     ^
FAILED tests/test_examplecases.py::TestRCM::test_set_time_RCM - assert [0, 7007593448.337233] == [0, 7007593448.337235]
  At index 1 diff: 7007593448.337233 != 7007593448.337235
  Full diff:
  - [0, 7007593448.337235]
  ?                     ^
  + [0, 7007593448.337233]
  ?                     ^
========================= 3 failed, 88 passed in 4.55s =========================

It looks to me like our random seed isn't guaranteeing identical travel times in newer versions of Python, because the error is occurring quite far down the list of sig-figs. The difference is on the order of 10^-8. I'm thinking an acceptable fix would be to change the assertion to check whether the results are within some threshold difference of each other, maybe 10^-6? Let me know if that works for you @elbeejay , happy to work on this later.
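A threshold-based assertion along the suggested lines might look like this sketch. The values are copied from the failure log above; inside the test suite, pytest.approx(expected, rel=1e-6) would be an equivalent way to express the same comparison:

```python
import math

# Travel-time lists from the failure log above; identical except for
# floating-point drift far down the significand.
expected = [0.0, 7007593448.337235, 10439964733.462337]
observed = [0.0, 7007593448.337233, 10439964528.877066]

# Compare within a relative tolerance (the suggested ~1e-6) instead of
# exact equality; abs_tol covers the leading zero entries.
assert all(math.isclose(o, e, rel_tol=1e-6, abs_tol=1e-6)
           for o, e in zip(observed, expected))
```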

Further Speeding Up Routing Weight Calculations

From #16, @wrightky said:

So, looking at this code, I see that the main structure of the weight computation hasn't changed. We're still constructing small sub-arrays at each index, doing a few quick operations (max, cleaning, multiplications, summing), and saving the 9 resulting weights in a weight array. Surprised we hadn't tried this sooner given how much better it performs.

It looks like the key reason the runtime scales better in this model isn't anything about how this computation works, it's how many times we do it. Originally, we performed these operations locally once for every particle at every iteration. So, the runtime scaled as Np_tracer * iterations_per_particle. Now, we perform this once for every cell, so it scales with domain size (L-2) * (W-2). I bet if you checked the example cases you benchmarked, the ratio of these values should give roughly the speedup you observed.

One thing I wonder, though, is whether we could obtain even faster runtimes by modifying the structure of this computation itself, by switching from local array operations to global ones. Whatever is causing this function to take so long must be due to the fact that we're repeatedly constructing many sub-arrays in a loop and operating on them, instead of performing a few big global array operations (inside which we'd be taking advantage of all the fast numpy broadcasting stuff). In principle, if we broke up these operations (max, cleaning, multiplication, summing) into a loop over each of the D8 directions, with each being a global matrix operation, instead of a loop over each cell, we could reduce the overhead of repeatedly calling these functions. I don't know exactly how that scales, but it'd be the difference between time(np.max(small array)) * (L-2) * (W-2) and time(np.max(big array)) * 9. Does that scale better? Not sure.

So that is a potential route for further speeding up the routing weight calculation.
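The restructuring can be sketched with a toy global-slice operation. The function below just sums D8 neighborhoods, standing in for dorado's actual weight computation (max, cleaning, multiplication, summing); the point is the loop over 9 directions with one vectorized slice each, rather than a loop over every cell:

```python
import numpy as np

def d8_direction_sums(field):
    """Sum the 9 D8-neighborhood values for every interior cell using
    nine global slice-shifts instead of one small sub-array per cell."""
    L, W = field.shape
    out = np.zeros((L - 2, W - 2))
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            # One vectorized operation per D8 direction (9 total),
            # rather than (L-2)*(W-2) per-cell sub-array operations.
            out += field[1 + di:L - 1 + di, 1 + dj:W - 1 + dj]
    return out
```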

JOSS review: Improve statement of need

From @dbuscombe-usgs's JOSS review link to comment

openjournals/joss-reviews/issues/2585

The statement of need should lead with a definition of the term ‘particle tracking’. It is clear to me that this is shorthand for ‘tracking of water – including particles – in a Lagrangian framework’, but it could be clearer to the reader.
I also recommend adding the theory section in the documentation to the paper (https://passah2o.github.io/dorado/background/index.html). It is well written and answers many outstanding questions I was left with when reading the paper on its own.

JOSS Review - Running examples with data requires manual data setup?

Related to openjournals/joss-reviews#2585

I had some issues getting the examples that require data to run (e.g. example 1 - steady_anuga_particles.py). The example data seems to be in the directory with the main python files (in a subdirectory called example_data), while the example scripts (.py) are in a different directory called 'examples'. Maybe that is because I am not a Python expert, but here is what I did: during installation (python3 setup.py install) the example data seems to be installed, but the example scripts are not. Running the example scripts from where they are will raise an error (message attached), unless I manually copy the example data into the same directory as the example scripts. Did I mess up the installation process? To me it seems reasonable to have the example data in the same directory as the example scripts, and thus to either (1) also install the example scripts, (2) keep the example data in the same directory as the example scripts and install neither, or (3) install the data but not the scripts, while making sure the scripts find the data without modification after the installation.

Error message:

Traceback (most recent call last):
  File "steady_anuga_particles.py", line 8, in <module>
    data = np.load('ex_anuga_data.npz')
  File "/usr/lib/python3/dist-packages/numpy/lib/npyio.py", line 428, in load
    fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'ex_anuga_data.npz'

Docs Build Hard-Set to Python 3.8

Once the matplotlib dependencies support Python 3.9, we should reset our docs build to use the latest version of Python (in the workflow yaml this would be '3.x' as opposed to '3.8').

See here for the upstream issue that once closed, should allow us to move back to '3.x'.

Log print statements

We print to stdout with reckless abandon throughout the codebase. I think we should create functionality to optionally suppress these print statements (maybe a "quiet" flag or a "verbosity" argument), and/or we should also support printing any messages out to a .log file instead. This way they won't disappear with the terminal, and can be preserved with the particle walk data for review during post-processing or when the results of the simulation are later analyzed.
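A minimal sketch of the quiet/log-file idea, assuming a hypothetical get_logger helper built on the standard logging module (not dorado's actual API):

```python
import logging

def get_logger(verbose=True, logfile=None):
    """Return a logger that prints to stdout when verbose is True,
    and optionally mirrors messages to a .log file for post-processing."""
    logger = logging.getLogger("dorado")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers on repeat calls
    if verbose:
        logger.addHandler(logging.StreamHandler())
    if logfile is not None:
        logger.addHandler(logging.FileHandler(logfile))
    if not logger.handlers:
        logger.addHandler(logging.NullHandler())  # fully quiet
    return logger
```

Replacing bare print calls with logger.info would then let a "quiet" flag suppress output and a logfile argument preserve messages alongside the particle walk data.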

Make particle iteration limit a keyword argument

In Particles.run_iteration() we have a hard limit on the number of iterations particles can take (currently 10,000 iterations). It is simple enough to make this an optional keyword argument to the function, creating greater flexibility for the end-user in the event their use-case requires this limit to be higher or lower.
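The change can be illustrated with a placeholder; the body below is a stand-in, not dorado's actual routing logic:

```python
# Hypothetical sketch: expose the hard-coded cap as a keyword argument.
def run_iteration(start_indices, max_iter=10000):
    """Walk each particle until max_iter steps elapse.

    max_iter defaults to the current hard limit of 10,000 iterations,
    so existing calls behave exactly as before, while users can raise
    or lower it as their use-case requires.
    """
    walks = list(start_indices)
    for _ in range(max_iter):
        walks = [w + 1 for w in walks]  # stand-in for one routing step
    return walks
```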

JOSS Review: Code structure and class hierarchy

Comment from @gassmoeller here:

Your current code works for your purpose and the examples you show, but I am slightly concerned about the future maintainability of the code, because you have mixed responsibilities and an unclear class hierarchy for some of your functions. I will give some examples for your main functions and why that could become a problem later below. I would not consider this a roadblock for the publication in JOSS, but I would like to see as much as reasonable of this fixed at some point and I would like to hear your opinion on the matter:

You define particle locations either based on an initial seed location, or based on existing particle locations either from a previous run, or created by the user. But the initial seed location is expected to be handed over during construction of Particle, while existing locations are handed over to run_iteration. This is a bit confusing for a newcomer, but more importantly: It shows that you are not sure who (which function) is responsible for the particle generation. I would suggest to have one function specific for particle generation, e.g. generate_particles(x_seed, y_seed, start_xindices, start_yindices). In a previous particle system I have written we separated that into a set of different functions (Particles::Generators::regular_reference_locations, Particles::Generators::probabilistic_locations, ..., see here if you are interested in details), but that would be overkill for your application here. Still you want to keep that option instead of spreading the responsibility for generating over multiple functions or making your run_iteration function into a somehow_generate_particles_and_run_iteration function. This would become a problem once you start to add more particle generation options (e.g. probabilistic particle distribution with varying density functions, particle patterns like discs/circles/lines).
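The suggested single entry point could look roughly like this sketch; the argument names follow the comment above and are not dorado's actual signature, and the seeding strategy is a naive placeholder:

```python
# Hypothetical sketch: one function owns particle generation, so
# run_iteration only ever advances existing particles.
def generate_particles(num_particles, seed_xloc=None, seed_yloc=None,
                       previous_walk=None):
    """Return starting (x, y) index lists, either from seed locations
    or from the locations left by a previous run."""
    if previous_walk is not None:
        return previous_walk  # resume from existing particle locations
    # Naive seeding: cycle particles through the seed cells.
    xinds = [seed_xloc[i % len(seed_xloc)] for i in range(num_particles)]
    yinds = [seed_yloc[i % len(seed_yloc)] for i in range(num_particles)]
    return xinds, yinds
```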

A related comment about assigning responsibilities: I do not understand why the run_iteration function is a member of Particle, but the single_iteration function is a member of Tools. single_iteration does the core work of moving the particles, and so I would suspect it to be a member of Particle. It is true that single_iteration uses many of the Tools functions, but that does not mean it is conceptually a tool itself, and I would move it to Particle. On the other hand the functions coord2ind, ind2coord, and unstruc2grid are typical tools functions and I would expect them in the Tools class, and not as stand-alone functions in particle_track.py. More about Tools below.

Currently Tools defines a number of utility functions for the Particle class, and then Particle is derived from Tools. But this is not a typical is-a parent-child relation. Particle is not a Tools. Particle uses Tools, so instead it should import Tools, or be given an object of type Tools upon creation, or similar. This is already a problem in your code, because functions in Tools use member variables that are only read in and initialized in Particle (e.g. self.velocity in Tools.calc_travel_times). In other words, Tools cannot be used without Particle, or used as a base class for anything else. Utility functions or objects should usually not own any member variables (and in particular should not rely on member variables of derived classes), and instead should be handed all the parameters needed for the small computation they are supposed to do.

Related to the last comment: Your documentation in Tools currently states that Particle is derived from Tools and uses its functions, but if that would change in the future you would likely only update the Particle class and forget to change the documentation of Tools (not implying you are not careful, this is just what experience in programming shows). Instead this documentation should be in the Particle class, explaining why Particle is derived from Tools, this way if the relationship changes you also update the documentation. Documentation should always state what this thing does, not what others do with it.
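The composition alternative described in the last two points can be sketched as follows; class and method names are illustrative, not dorado's actual API:

```python
class Tools:
    """Stateless utilities: every input arrives as an explicit parameter,
    so Tools never depends on member variables of another class."""
    @staticmethod
    def calc_travel_time(distance, velocity):
        return distance / velocity

class Particles:
    def __init__(self, velocity):
        self.velocity = velocity
        self.tools = Tools()  # composition: has-a, not is-a

    def travel_time(self, distance):
        # Particles hands its own state to the tool explicitly.
        return self.tools.calc_travel_time(distance, self.velocity)
```

With this arrangement Tools can be reused or tested on its own, and the relationship between the classes is visible at the call site rather than hidden in an inheritance chain.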
