
rsgpr's Issues

jpg render fails with an ugly error if the file is too big

If the file is too big to write to a single jpg, rsgpr unhelpfully fails with this error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Parameter(ParameterError { kind: DimensionMismatch, underlying: None })', src/gpr.rs:1517:42
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Either it should fail gracefully (no panic; continue with a warning such as "can't write jpg because it's too big"), or the output should be split across multiple jpg files.
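
A minimal sketch of the graceful option, assuming the image crate is used for writing (the function and variable names here are assumptions, not rsgpr's actual code):

use std::path::Path;

fn try_save_jpg(image: &image::DynamicImage, path: &Path) {
    // Warn and continue instead of unwrapping when the JPEG cannot be written.
    if let Err(err) = image.save(path) {
        eprintln!(
            "Warning: could not write {}: {err} (the image may be too large for a single jpg)",
            path.display()
        );
    }
}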

Move the filters to a separate module

Most of the filters are currently defined as methods on the GPR struct. This makes for a very long gpr.rs file and makes testing more difficult. There should probably be a filters.rs file, or a directory with sub-modules for different types of filters, instead.

Fix nix `defaultPackage` in the flake

A very common way of accessing a package from an external flake is to reference <package>.defaultPackage.<system>. For some reason, this doesn't work with the flake, so it would be nice if it were fixed.

Add shell completion

clap_complete is available and "should" be simple enough to use.

The problem I ran into is that I can't figure out how to generate completion for the whole program; with the derive approach, only subcommands can be completed, according to my very limited understanding.
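
A sketch of what this could look like with clap_complete and the derive API; the --completion flag and the Cli struct here are assumptions, not rsgpr's actual CLI:

use clap::{CommandFactory, Parser};
use clap_complete::{generate, Shell};

#[derive(Parser)]
struct Cli {
    /// Print a completion script for the given shell and exit
    #[arg(long, value_enum)]
    completion: Option<Shell>,
}

fn main() {
    let cli = Cli::parse();
    if let Some(shell) = cli.completion {
        // Generate completions for the whole top-level command.
        let mut cmd = Cli::command();
        let name = cmd.get_name().to_string();
        generate(shell, &mut cmd, name, &mut std::io::stdout());
    }
}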

Add `remove_empty_traces` filter

In cases where a Malå GPR is misconfigured, lots of empty traces appear in the data. They are pretty easy to find: sum(abs(trace)) < 100 is only true for these traces. Adding a filter to remove them would be great, as it otherwise has to be done with an extremely long remove_traces call.
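
A minimal sketch of the filter, assuming traces are the columns of an ndarray array and using the threshold of 100 mentioned above:

use ndarray::{Array2, Axis};

fn remove_empty_traces(data: &Array2<f32>, threshold: f32) -> Array2<f32> {
    // Keep only traces whose summed absolute amplitude reaches the threshold.
    let keep: Vec<usize> = data
        .axis_iter(Axis(1))
        .enumerate()
        .filter(|(_, trace)| trace.mapv(f32::abs).sum() >= threshold)
        .map(|(i, _)| i)
        .collect();
    data.select(Axis(1), &keep)
}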

Make `rsgpr` easy to install

At the moment, there is no easy way to install rsgpr and no instructions exist for it. I see three things that should be done:

  • Add an application section in the flake.nix to be able to install it without fiddling with the source code. (completed in #20)
  • Add instructions for installing it on non-nix machines from source, so it's added to the machine's PATH (completed in #22)
  • Potentially publish it to https://crates.io to make it super-simple to install.

Add CI caching for faster builds

With #26, the CI build time is now around 13 minutes. That's not extreme, but it could surely be better. Most of the time is most likely spent on compiling the dependencies, and not rsgpr itself. Therefore, assuming the dependency versions don't change, there's no good reason to recompile them every time a new action is triggered.

There are apparently good caching solutions out there. One blog showcases this nicely:
Optimizing Rust Builds for Faster GitHub Actions Pipelines

Add an example radargram to the documentation

To showcase the capability of the tool, a nice screenshot would be cool. Here's one for example:
[screenshot: example processed radargram]

This is Kroppbreen on Svalbard, collected by me in 2023 at 100 MHz. The processing steps are listed below, followed by the equivalent single command:

zero_corr_max_peak
equidistant_traces
correct_antenna_separation
normalize_horizontal_magnitudes(0.3)
dewow(5)
auto_gain(100)
abslog
correct_topography
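
For reference, the same steps written as a single command, assuming the comma-separated --steps syntax used elsewhere in these issues:

rsgpr -f <file>.rd3 --steps "zero_corr_max_peak,equidistant_traces,correct_antenna_separation,normalize_horizontal_magnitudes(0.3),dewow(5),auto_gain(100),abslog,correct_topography"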

Add an absolute + log10 filter

Linear positive/negative values are only somewhat useful for interpretation. Running a log10(abs(data)) (after making sure no zeros exist) makes for much nicer looking plots!

It could be called abslog or something.
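
A minimal sketch of the step, assuming f32 data; clamping to a small epsilon avoids taking log10 of zero:

use ndarray::Array2;

fn abslog(data: &mut Array2<f32>) {
    let epsilon = f32::EPSILON;
    data.mapv_inplace(|value| value.abs().max(epsilon).log10());
}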

Add a disclaimer that only GDAL versions supported by the GDAL crate are easy to install

The Rust crate gdal requires bindings for each separate version (X.Y, e.g. 3.8) of the GDAL library. Nonexistent bindings should be possible to generate by enabling the bindgen flag of the gdal crate, but I have never gotten this to work. Thus, I am personally constrained to using versions with pre-built bindings (<=3.7 as of April 2024; bindings for 3.8 exist but are not in a release yet). There should maybe be a disclaimer in the installation instructions saying that the most recent GDAL may require extra work, since the gdal crate is always a few months behind the GDAL library.

The fastest way (I think) to see which versions are supported is to go to the prebuilt-bindings directory in the source, filter by crate version (git tag), and see which bindings are available.
https://github.com/georust/gdal/tree/master/gdal-sys/prebuilt-bindings

The best way would be if instructions for bindgen use could be provided, but then I would need to get it to work myself!

Update the nix flake's gdal to follow mainline instead of relying on the nixos-22.05 channel

As of November 2022, there were no pre-built bindings for GDAL 3.6 and 3.7 in gdal-sys. As of February and May, respectively, GDAL 3.6 and 3.7 now have pre-built bindings, which makes installation much simpler. I could personally not get binding builds to work, so nothing before 3.6 was possible to compile in November 2022. Thus, the "hack" in flake.nix of fetching GDAL from an older channel may not be necessary anymore.

Change the default processing profile

The current default processing profile has these filters:

zero_corr_max_peak
equidistant_traces
correct_antenna_separation
normalize_horizontal_magnitudes(0.3)
unphase
kirchhoff_migration2d
normalize_horizontal_magnitudes(0.3)
dewow(5)
auto_gain(100)

There are two issues in this profile:

  • unphase is not common and may even be faulty, as there's no good theoretical basis for its functionality.
  • kirchhoff_migration2d is very useful for deep targets like glacier thickness, but may not be as useful for high-frequency radar applications like snow stratigraphy. Therefore, it should be optional (say, in another profile) or be excluded completely.

As the range of profiles widens, a --profile=PROFILE argument that allows many different profiles could be useful. It could even be combined with the suggestion in #9, where --profile=steps.txt would be the way to use an external step file!

Add coordinates for "y2" (topographically corrected data) in export

Currently, there is no easy y-coordinate for the topographically corrected data. The dimension y2 should have elevation associated with it.

The y2-coordinate should technically be possible to calculate from the x/y coordinates (elevation and depth), but I honestly don't see a trivial way to do it, so it'd be much better to just export the y2 elevation instead.

Downscaling data using `equidistant_traces` (e.g. on standstills) produces strange artefacts.

I'll add some examples later. My hunch is that the resampling scheme is wrong somehow (likely as I made it myself!). A new resampling function (made by GPT-4) could look something like:

fn interpolate(x: &Vec<f64>, y: &Vec<f64>, x_new: &Vec<f64>) -> Vec<f64> {
    let mut y_new: Vec<f64> = Vec::with_capacity(x_new.len());

    for &x_star in x_new {
        if x_star <= x[0] {
            // Linear extrapolation to the left
            y_new.push(y[0] + (x_star - x[0]) * (y[1] - y[0]) / (x[1] - x[0]));
        } else if x_star >= *x.last().unwrap() {
            // Linear extrapolation to the right
            let n = x.len();
            y_new.push(y[n - 1] + (x_star - x[n - 1]) * (y[n - 1] - y[n - 2]) / (x[n - 1] - x[n - 2]));
        } else {
            // Linear interpolation in-between
            let i = x.iter().position(|&xi| xi >= x_star).unwrap() - 1;
            y_new.push(y[i] + (x_star - x[i]) * (y[i + 1] - y[i]) / (x[i + 1] - x[i]));
        }
    }

    y_new
}

It also wrote some tests:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_interpolate() {
        let x = vec![1.0, 2.0, 3.0];
        let y = vec![1.0, 2.0, 3.0];
        let x_new = vec![1.5, 2.5, 0.0, 4.0]; // Includes values for extrapolation
        let y_new = interpolate(&x, &y, &x_new);
        assert_eq!(y_new, vec![1.5, 2.5, 0.0, 4.0]);

        let x = vec![1.0, 3.0, 5.0];
        let y = vec![1.0, 3.0, 5.0];
        let x_new = vec![2.0, 4.0, 0.0, 6.0]; // Includes values for extrapolation
        let y_new = interpolate(&x, &y, &x_new);
        assert_eq!(y_new, vec![2.0, 4.0, 0.0, 6.0]);
    }

    #[test]
    #[should_panic(expected = "x and y must have the same length")]
    fn test_interpolate_panic() {
        let x = vec![1.0, 2.0, 3.0];
        let y = vec![1.0, 2.0];
        let x_new = vec![1.5, 2.5];
        let _ = interpolate(&x, &y, &x_new);
    }
}

The only thing I'd like to add at this point is a check that x.len() == y.len() first.
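
A minimal version of that check, placed at the top of interpolate; the message matches the one expected by test_interpolate_panic:

assert_eq!(x.len(), y.len(), "x and y must have the same length");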

Allow an empty processing step list or a `pass` step

For debugging, it's nice to be able to use rsgpr just for converting the raw data into something readable by other programs. Currently, rsgpr does not accept an empty step list:

❯ rsgpr -f DAT_0111_A1.rd3 --steps ""
Unrecognized step:

First of all, the error message above is not useful and should be replaced with a descriptive error saying that no step was given. Second, there should be an easy way of passing "no steps". The easiest would be to simply allow zero steps to be provided, but this does not work at the moment either:

❯ rsgpr -f DAT_0111_A1.rd3 -o .
No steps specified. Choose a profile or what steps to run

The only current workaround is to use subset with an all-inclusive bound:

❯ rsgpr -f DAT_0111_A1.rd3 --steps "subset(0 -1)"
Processing "DAT_0111_A1.rd3"
1/1, t+0.00 s, Running step subset(0 -1).
Exporting to "DAT_0111_A1.nc"

It would be much simpler and more intuitive if that were possible without a workaround!
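
A minimal sketch of how an empty --steps value could be handled (the function name is an assumption, not rsgpr's actual parser): an empty list would simply mean "convert and export without processing".

fn parse_steps(steps_arg: &str) -> Vec<String> {
    // Split on commas and drop empty entries, so "" or "," yields no steps.
    steps_arg
        .split(',')
        .map(str::trim)
        .filter(|step| !step.is_empty())
        .map(String::from)
        .collect()
    // An empty Vec would then mean: skip processing and export the raw data,
    // instead of erroring out with "No steps specified".
}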

Improve syntax help for step arguments

Admittedly, I often mix up the placement of commas when writing step arguments:

> rsgpr -f ... --steps "subset(0, 200)"
Unrecognized step: 200)

This is quite uninformative. It would be nicer to hint at what may be wrong, for example by noting that commas delimit steps, not arguments.
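
A sketch of what a friendlier error could look like (names are assumptions, not rsgpr's actual internals); a stray ")" without a matching "(" is a good hint that the arguments were split on a comma:

fn unknown_step_error(step: &str) -> String {
    let mut msg = format!("Unrecognized step: {step}");
    // A dangling ")" usually means the user wrote "subset(0, 200)" and the
    // comma was interpreted as a step delimiter.
    if step.contains(')') && !step.contains('(') {
        msg.push_str("\nHint: commas separate steps, not arguments; write e.g. \"subset(0 200)\"");
    }
    msg
}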

`rsgpr` refuses to work if no coordinate file exists

In some cases, coordinate information may not be available. This is an issue for the following filters:

  • equidistant_traces (needs distance)
  • correct_antenna_separation (needs distance)
  • kirchhoff_migration2d (needs x/y/z)
  • correct_topography (needs x/y/z)

Generally, it is of course preferable to have coordinate information, but for short tests and some types of processing, it should not be strictly needed. Currently, without a .cor file in the directory, rsgpr panics (c.f. #7) with a useless message:

❯ rsgpr -f DAT_0122_A1.rd3 --default
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/cli.rs:190:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Technically, all filters except those listed above should be possible to run even without a coordinate file. This could work either by allowing a missing .cor file and just failing (nicely) if one of the filters above is run with an empty Vec<GPRLocation> attribute, or by having some kind of flag that describes what location information is available.

The latter would be slightly more complex, but it could allow distance-only data to be provided in the future (Kirchhoff migration and topographic correction still won't work, but at least equidistant_traces and correct_antenna_separation would). A use-case for this is when a measurement wheel or another distance-only source exists, but no 3D coordinates are available.
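
A sketch of the latter idea; the enum and its variants are assumptions, not existing rsgpr types:

enum LocationInfo {
    /// No .cor file was found; only filters that need no coordinates can run.
    None,
    /// Only along-track distance is known, e.g. from a measurement wheel.
    DistanceOnly(Vec<f64>),
    /// Full easting/northing/elevation coordinates are available.
    Full(Vec<(f64, f64, f64)>),
}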

Add functionality to read `.nc` files back into `rsgpr`

When trying to read an .nc file (the output of a previous rsgpr run), an ugly panic occurs. Following #7, this should not result in a panic but instead give a helpful error message. Also, reading an rsgpr output as an input is useful for a few reasons:

  • Read the information and processing log of an output. The easiest way right now is to use xarray in Python, which involves a few lines of code. It would be much nicer to have this built into rsgpr.
  • Run more steps on a processed file. For example, this may be useful if the user wants to store intermediate files. Then, one can run rsgpr -f ... --steps "step1" -o step1.nc followed by rsgpr -f step1.nc --steps "step2" -o step2.nc

For reference, here is what trying to retrieve the information about an output looks like now:

❯ rsgpr -f DAT_0122_A1.nc -i  
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "File found but no '.rad' file found: \"DAT_0122_A1.rad\""', src/cli.rs:190:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Add functionality to define steps in a file instead of the command line

For repeatability, it would be nice to be able to define steps in a text file instead of through the command line. This way, it would be easier to reuse the settings on different surveys.

Currently, this is the approach for running different steps; it makes for quite a long line when many filters are applied.

rsgpr -f file.rd3 --steps "zero_corr_max_peak,equidistant_traces,correct_antenna_separation,dewow(5),auto_gain(100)"

with a "step file" of the contents:

# steps.txt
zero_corr_max_peak
equidistant_traces
correct_antenna_separation
dewow(5)
auto_gain(100)

it would be nicer to be able to reference this file instead:

rsgpr -f file.rd3 --steps steps.txt

or

rsgpr -f file.rd3 --step-file steps.txt
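
A minimal sketch of reading such a step file (one step per line, ignoring blank lines and '#' comments):

fn read_step_file(path: &std::path::Path) -> std::io::Result<Vec<String>> {
    Ok(std::fs::read_to_string(path)?
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(String::from)
        .collect())
}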

Have a good release policy

I'm very lazy with releases; if something new is added or I fix something, I bump the "patch" part of the version (X in "0.1.X"). Since the tool is so early, any fix is arguably a major change, so it does not make sense to me to do a major bump (X in X.1.0) even when backward compatibility breaks. Now or in the near future, there should be a clear policy for this.

Copied from semver 2.0, these are their policies:

  1. MAJOR version when you make incompatible API changes
  2. MINOR version when you add functionality in a backward compatible manner
  3. PATCH version when you make backward compatible bug fixes

Thus, a bugfix should bump the patch, and a new feature (say, a filter) should bump the minor. If considerable changes to the functionality are made, the major should be bumped. However, a 1.0.0 release should arguably only happen when the tool is more mature (potentially deviating a bit from semver).

At this moment, I'm comfortable with bumping minor and patch, but not major. Perhaps when the tool has reached a more mature state, semver can be used.

`zero_corr_max_peak` makes strange empty samples at the end of the profile

zero_corr_max_peak leaves/appends strange values at the end of a profile when run. I assume this is because a new array is constructed with only zeros:

let mut new_data = Array2::from_elem(

... and then it may not be trimmed correctly at the end.

Here's the end of a radar profile without zero-corr:
[screenshot: end of the profile without zero-corr]

And here it is with zero_corr_max_peak:
[screenshot: end of the profile with zero_corr_max_peak]

The fix may be as simple as just trimming the data correctly. I also think there's a zero-copy way of doing this instead of instantiating an entirely new array. Finally, a test should be added to make sure this does not happen!
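
A minimal sketch of trimming trailing all-zero traces with a slice, assuming traces are the columns of the ndarray array; the proper fix may instead be to size the new array correctly when it is created:

use ndarray::{s, Array2};

fn trim_trailing_zero_traces(data: &Array2<f32>) -> Array2<f32> {
    // Find the last trace (column) containing any non-zero sample...
    let end = (0..data.ncols())
        .rev()
        .find(|&i| data.column(i).iter().any(|&v| v != 0.0))
        .map_or(0, |i| i + 1);
    // ...and drop everything after it.
    data.slice(s![.., ..end]).to_owned()
}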

Improve the error handling by avoiding panics

Right now, a few functions may panic, which is a sub-par way to report errors... Most .unwrap()s and direct indexings should be replaced with safer methods. This probably means that all GPR methods will need to return a Result instead of nothing, as they do at the moment.
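
A sketch of what that could look like; the error type and the struct here are assumptions, not rsgpr's actual definitions:

#[derive(Debug)]
enum GprError {
    MissingCoordinates(String),
    Io(std::io::Error),
}

struct Gpr {
    // data, metadata, location, ...
}

impl Gpr {
    fn dewow(&mut self, _window: usize) -> Result<(), GprError> {
        // Fallible calls inside the filter would use `?` instead of `.unwrap()`.
        Ok(())
    }
}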

`remove_traces` should allow ranges

Here is a 500 MHz radargram where equidistant_traces has been applied:
[screenshot: 500 MHz radargram after equidistant_traces]

As is evident, this filter didn't work perfectly. Without knowing the exact reason yet, I presume that the coordinate file doesn't properly show the standstill (?), so the bad data are not removed. Since there is some variability, an alternative explanation is that the instrument simply didn't want to work in that interval. This stretch should probably be removed, and rsgpr could technically do that with remove_traces. But right now, this function only takes individual indices and not ranges, so it would be a very long call.

I suggest adding ranges to remove_traces:

remove_traces(0-10 40-100 50 52 54)
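
A minimal sketch of parsing that syntax into individual indices (the function name is an assumption):

fn parse_trace_ranges(arg: &str) -> Result<Vec<usize>, std::num::ParseIntError> {
    let mut indices = Vec::new();
    for token in arg.split_whitespace() {
        if let Some((start, end)) = token.split_once('-') {
            // A token like "0-10" expands to every index in the range.
            indices.extend(start.parse::<usize>()?..=end.parse::<usize>()?);
        } else {
            indices.push(token.parse()?);
        }
    }
    Ok(indices)
}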

Fix default CRS hardcoding to WGS 84 / UTM Zone 33N

Since I develop most of this for work on Svalbard, it makes sense to default to UTM Zone 33N for my own purposes. For a general-purpose tool, however, this is not good.

Defining which UTM zone is optimal is doable but not entirely trivial. Most zones are essentially just a bin of the longitude, i.e. 179.5°W is UTM zone 1, 1°E is zone 31, and 179.5°E is zone 60. Problems arise in northern Europe, where the zones are more complex. For example, all of Spitsbergen is 33 when it should technically span 32–34, Nordaustlandet is 35 when it should be 34–36, and southwestern Norway is 32 instead of 31–32. These need to be added as exceptions. Should not be too hard to program!
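
A minimal sketch of the longitude binning; the Norway and Svalbard exceptions would still need to be special-cased on top of this:

fn utm_zone_from_longitude(longitude: f64) -> u8 {
    // Zones are 6° wide, starting at 180°W; e.g. 179.5°W -> 1, 1°E -> 31, 179.5°E -> 60.
    let bin = ((longitude + 180.0) / 6.0).floor() as i64;
    (bin.rem_euclid(60) + 1) as u8
}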

Assign default coordinates for the dimensions x, y and y2 in the exported data

All coordinates for the exported data exist with the right dimensionality (except y2; #42), but none are assigned to the dimensions by default, which confuses tools like xarray until a coordinate is manually set as the main one for each dimension. This is a minor inconvenience but should be easy to fix in the export step.

Add CI and linting

To keep a nicely formatted repo, and to make sure the latest code compiles, it's good to add CI and pre-commit hooks for linting/formatting. This way, the code stays consistent and will (most likely) run when installed.

Make `io.rs` into a module and move Malå-specific code to `mala.rs`

To ease adding support for e.g. GSSI data or rsgpr outputs (#11), I/O-related functions should be organized in a more future-proof way. At the moment, most functions in io.rs relate only to the processing of Malå `.rd3` files. Adding more input formats will become very confusing if this is not sorted out better.

Re-add `beautysh` to pre-commit once it works again.

Pre-commit has stopped working. Maybe fixable by just updating everything?

beautysh.................................................................Failed
- hook id: beautysh
- exit code: 1

Traceback (most recent call last):
  File "/home/runner/.cache/pre-commit/repo3we8pamt/py_env-python3.12/bin/beautysh", line 5, in <module>
    from beautysh import main
  File "/home/runner/.cache/pre-commit/repo3we8pamt/py_env-python3.12/lib/python3.12/site-packages/beautysh/__init__.py", line 9, in <module>
    import pkg_resources  # part of setuptools
    ^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'pkg_resources'

Pretty format YAML.......................................................Failed
- hook id: pretty-format-yaml
- exit code: 1

Traceback (most recent call last):
  File "/home/runner/.cache/pre-commit/repo9838j_so/py_env-python3.12/bin/pretty-format-yaml", line 5, in <module>
    from language_formatters_pre_commit_hooks.pretty_format_yaml import pretty_format_yaml
  File "/home/runner/.cache/pre-commit/repo9838j_so/py_env-python3.12/lib/python3.12/site-packages/language_formatters_pre_commit_hooks/__init__.py", line 2, in <module>
    import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'
