erikmannerfelt / rsgpr
Simple (-ish) Ground Penetrating Radar software
License: MIT License
If the file is too big to write to a single jpg, rsgpr unhelpfully fails with this error:
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Parameter(ParameterError { kind: DimensionMismatch, underlying: None })', src/gpr.rs:1517:42
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Either it should fail gracefully (no panic, continuing with a warning that the jpg can't be written because it's too big), or multiple jpg files should be written.
Most of the filters are currently defined as methods on the GPR struct. This makes for a very long gpr.rs file and makes testing more difficult. There should probably be a filters.rs file, or a directory with sub-modules for different types of filters, instead.
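As a sketch of what a filters.rs could look like (the function here is a simplified stand-in, not the actual rsgpr API), each filter becomes a free function operating on the raw data, which is easy to unit-test without constructing a full GPR struct:

```rust
// Hypothetical filters.rs: filters as free functions over trace data.

/// Subtract the mean of a trace (a minimal "dewow"-style filter).
pub fn remove_trace_mean(trace: &mut [f64]) {
    let mean = trace.iter().sum::<f64>() / trace.len() as f64;
    for v in trace.iter_mut() {
        *v -= mean;
    }
}
```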
A very common way of accessing the package in an external flake is to call <package>.defaultPackage.<system>. For some reason, this doesn't work with this flake, so it would be nice if it were fixed.
clap_complete is available and "should" be simple enough to use.
The problem I ran into is that I can't figure out how to generate autocompletion for the whole program; with the derive approach, only subcommands can be completed, according to my very limited understanding.
In cases where a Malå GPR is misconfigured, lots of empty traces appear in the data. They are pretty easy to find, as sum(abs(trace)) < 100 holds only for these traces. Adding a filter to remove them would be great, as it otherwise has to be done with an extremely long remove_traces filter.
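A sketch of such a filter, using the threshold from above (the function name and data layout are hypothetical, not the actual rsgpr internals):

```rust
/// Return the indices of traces whose summed absolute amplitude is below
/// `threshold` (the issue suggests 100 for misconfigured Malå data).
fn find_empty_traces(traces: &[Vec<f64>], threshold: f64) -> Vec<usize> {
    traces
        .iter()
        .enumerate()
        .filter(|(_, trace)| trace.iter().map(|v| v.abs()).sum::<f64>() < threshold)
        .map(|(i, _)| i)
        .collect()
}
```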
At the moment, there is no easy way to install rsgpr and no instructions exist for it. I see three things that should be done:
A flake.nix to be able to install it without fiddling with the source code. (completed in #20)

With #26, the CI build time is now around 13 minutes. That's not extreme, but it could surely be better. Most of the time is most likely spent on compiling the dependencies, not rsgpr itself. Therefore, assuming the dependency versions don't change, there's no good reason to recompile them every time a new action is triggered.
There are apparently good caching solutions out there. One blog showcases this nicely:
Optimizing Rust Builds for Faster GitHub Actions Pipelines
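A minimal sketch of what caching could look like in the workflow, assuming the Swatinem/rust-cache action (the job layout is illustrative, not rsgpr's actual workflow file):

```yaml
# Hypothetical excerpt of .github/workflows/ci.yml — caches the cargo
# registry and build artifacts between runs; cache keys are derived
# from Cargo.lock, so dependencies rebuild only when versions change.
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: Swatinem/rust-cache@v2
      - run: cargo build --release
      - run: cargo test
```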
To showcase the capability of the tool, a nice screenshot would be cool. Here's one for example:
This is Kroppbreen on Svalbard from 2023 at 100 MHz collected by me. The run steps are:
zero_corr_max_peak
equidistant_traces
correct_antenna_separation
normalize_horizontal_magnitudes(0.3)
dewow(5)
auto_gain(100)
abslog
correct_topography
Linear positive/negative values are only somewhat useful for interpretation. Running a log10(abs(data))
(after making sure no zeros exist) makes for much nicer looking plots!
It could be called abslog
or something.
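A sketch of what such an abslog filter could do (the zero-handling epsilon is an assumption; the real choice is open):

```rust
/// Hypothetical `abslog` filter: log10 of the absolute amplitude.
/// Zeros are clamped to a small floor first so log10 stays finite.
fn abslog(data: &mut [f64]) {
    const FLOOR: f64 = 1e-12; // assumed epsilon, not a tested value
    for v in data.iter_mut() {
        *v = v.abs().max(FLOOR).log10();
    }
}
```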
The Rust crate gdal requires bindings for each separate version (X.Y; e.g. 3.8) of the GDAL library. Nonexistent bindings should be possible to generate by enabling the bindgen flag of the gdal crate, but I have never gotten this to work. Thus, I am personally constrained to using versions with pre-built bindings (<=3.7 as of April 2024; 3.8 exists but is not in a release yet). There should maybe be a disclaimer in the installation instructions saying that the most recent GDAL may require extra work, since the gdal crate is always a few months behind the GDAL library.
The fastest way (I think) to see which versions are supported is to go to the prebuilt-bindings directory in the source, filter by crate version (git tag), and see which are available.
https://github.com/georust/gdal/tree/master/gdal-sys/prebuilt-bindings
The best way would be if instructions for bindgen use could be provided, but then I would need to get it to work myself!
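If instructions were written, they might start from something like this hypothetical Cargo.toml override (the version number is illustrative, and I have not verified that this builds):

```toml
# Hypothetical Cargo.toml excerpt — enabling the gdal crate's `bindgen`
# feature to generate bindings for a GDAL version without pre-built ones.
[dependencies]
gdal = { version = "0.16", features = ["bindgen"] }
```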
As of November 2022, there were no pre-built bindings for GDAL 3.6 and 3.7 in gdal-sys. As of February and May, respectively, GDAL 3.6 and 3.7 now have pre-built bindings, which makes installation much simpler. I could personally not get binding builds to work, so only versions before 3.6 were possible to compile in November 2022. Thus, the "hack" in flake.nix with fetching GDAL from an older channel may not be necessary any more.
The current default processing profile has these filters:
zero_corr_max_peak
equidistant_traces
correct_antenna_separation
normalize_horizontal_magnitudes(0.3)
unphase
kirchhoff_migration2d
normalize_horizontal_magnitudes(0.3)
dewow(5)
auto_gain(100)
There are two issues in this profile:
unphase is not common and may even be faulty, as there's no good theoretical basis for its functionality.
kirchhoff_migration2d is very useful for deep targets like glacier thickness, but may not be as useful for high-frequency radar like snow stratigraphy. Therefore, it should be optional (say, in another profile) or be excluded completely.
As the types of profiles start to widen, perhaps a --profile=PROFILE argument could be useful to allow for many different types of profiles. It could even be combined with the suggestion in #9, where --profile=steps.txt is the way to use an external step file!
Currently, there is no easy y-coordinate for the topographically corrected data. The dimension y2 should have elevation associated with it.
The y2-coordinate should technically be possible to calculate from the x/y coordinates (elevation and depth), but I honestly don't see a trivial way to do it, so it'd be much better to just export the y2 elevation instead.
I'll add some examples later. My hunch is that the resampling scheme is wrong somehow (likely as I made it myself!). A new resampling function (made by GPT-4) could look something like:
fn interpolate(x: &[f64], y: &[f64], x_new: &[f64]) -> Vec<f64> {
    // Check early that the inputs have matching lengths
    assert_eq!(x.len(), y.len(), "x and y must have the same length");
    let mut y_new: Vec<f64> = Vec::with_capacity(x_new.len());
    for &x_star in x_new {
        if x_star <= x[0] {
            // Linear extrapolation to the left
            y_new.push(y[0] + (x_star - x[0]) * (y[1] - y[0]) / (x[1] - x[0]));
        } else if x_star >= *x.last().unwrap() {
            // Linear extrapolation to the right
            let n = x.len();
            y_new.push(y[n - 1] + (x_star - x[n - 1]) * (y[n - 1] - y[n - 2]) / (x[n - 1] - x[n - 2]));
        } else {
            // Linear interpolation in-between
            let i = x.iter().position(|&xi| xi >= x_star).unwrap() - 1;
            y_new.push(y[i] + (x_star - x[i]) * (y[i + 1] - y[i]) / (x[i + 1] - x[i]));
        }
    }
    y_new
}
It also wrote some tests:
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_interpolate() {
        // The data describe y = x, so interpolation and extrapolation
        // should both return x_star itself.
        let x = vec![1.0, 2.0, 3.0];
        let y = vec![1.0, 2.0, 3.0];
        let x_new = vec![1.5, 2.5, 0.0, 4.0]; // Includes values for extrapolation
        let y_new = interpolate(&x, &y, &x_new);
        assert_eq!(y_new, vec![1.5, 2.5, 0.0, 4.0]);

        let x = vec![1.0, 3.0, 5.0];
        let y = vec![1.0, 3.0, 5.0];
        let x_new = vec![2.0, 4.0, 0.0, 6.0]; // Includes values for extrapolation
        let y_new = interpolate(&x, &y, &x_new);
        assert_eq!(y_new, vec![2.0, 4.0, 0.0, 6.0]);
    }

    #[test]
    #[should_panic(expected = "x and y must have the same length")]
    fn test_interpolate_panic() {
        let x = vec![1.0, 2.0, 3.0];
        let y = vec![1.0, 2.0];
        let x_new = vec![1.5, 2.5];
        let _ = interpolate(&x, &y, &x_new);
    }
}
The only thing I'd insist on at this point is that the function first checks that x.len() == y.len().
For debugging, it's nice to be able to use rsgpr just for converting the raw data into something that's readable by other programs. Currently, rsgpr does not allow running without any steps:
❯ rsgpr -f DAT_0111_A1.rd3 --steps ""
Unrecognized step:
First of all, the error message above is not useful and should be replaced with a descriptive error saying that no step was given. Second, an easy way of passing "no step" should be included. The easiest would be to simply allow providing no steps at all. This does not work either at the moment:
❯ rsgpr -f DAT_0111_A1.rd3 -o .
No steps specified. Choose a profile or what steps to run
The only current workaround is to use subset
with an all-inclusive bound:
❯ rsgpr -f DAT_0111_A1.rd3 --steps "subset(0 -1)"
Processing "DAT_0111_A1.rd3"
1/1, t+0.00 s, Running step subset(0 -1).
Exporting to "DAT_0111_A1.nc"
It would be much simpler and more intuitive if that was possible without a workaround!
Admittedly, I often mix up the placement of commas when writing step arguments:
> rsgpr -f ... --steps "subset(0, 200)"
Unrecognized step: 200)
This is quite uninformative. It would be nicer to hint at what may be wrong, for example by informing that a comma delimits different steps, while arguments are separated by spaces.
In some cases, coordinate information may not be available. This is an issue for the following filters:
equidistant_traces (needs distance)
correct_antenna_separation (needs distance)
kirchhoff_migration2d (needs x/y/z)
correct_topography (needs x/y/z)
Generally, it is of course preferable to have coordinate information, but for short tests and some types of processing, it should not be strictly needed. Currently, without a .cor file in the directory, rsgpr panics (c.f. #7) with a useless message:
❯ rsgpr -f DAT_0122_A1.rd3 --default 10:49
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/cli.rs:190:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Technically, all filters except those specified above should be possible to run even without a coordinate file. This could either work by allowing a non-existing .cor file and just failing (nicely) if one of the filters above is run with an empty Vec<GPRLocation> attribute, or by having some kind of flag that shows what type of location information is available.
The latter would be slightly more complex, but it could allow for only distances being provided in the future (Kirchhoff and topography still won't work, but at least equidistant traces and antenna separation would). A use-case for this is when a wheel or another type of distance-only coordinate exists, but no 3D coordinate is available.
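The "flag" idea could be sketched as an enum (the names and data layout here are hypothetical, not the actual rsgpr types), so each filter can check for exactly the information it needs:

```rust
/// Hypothetical flag describing what location information is available.
enum LocationInfo {
    None,
    DistanceOnly(Vec<f64>),     // e.g. from a distance wheel
    Full(Vec<(f64, f64, f64)>), // x/y/z per trace
}

impl LocationInfo {
    /// Filters like equidistant_traces only need distances.
    fn has_distance(&self) -> bool {
        !matches!(self, LocationInfo::None)
    }
    /// Filters like kirchhoff_migration2d need full x/y/z.
    fn has_xyz(&self) -> bool {
        matches!(self, LocationInfo::Full(_))
    }
}
```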
When trying to read an .nc file (the output of a previous rsgpr run), an ugly panic occurs. Following #7, this should not result in a panic but instead a helpful error message. Also, reading an rsgpr output as an input would be good for a few reasons, for example to run steps in sequence: rsgpr -f ... --steps "step1" -o step1.nc followed by rsgpr -f step1.nc --steps "step2" -o step2.nc
For reference, here is what trying to retrieve the information about an output looks like now:
❯ rsgpr -f DAT_0122_A1.nc -i
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: "File found but no '.rad' file found: \"DAT_0122_A1.rad\""', src/cli.rs:190:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
For repeatability, it would be nice to be able to define steps in a text file instead of through the command line. This way, it would be easier to reuse the settings on different surveys.
Currently, this is the approach to run different steps. It makes for quite a long line if many filters are applied.
rsgpr -f file.rd3 --steps "zero_corr_max_peak,equidistant_traces,correct_antenna_separation,dewow(5),auto_gain(100)"
with a "step file" of the contents:
# steps.txt
zero_corr_max_peak
equidistant_traces
correct_antenna_separation
dewow(5)
auto_gain(100)
it would be nicer to be able to reference this file instead:
rsgpr -f file.rd3 --steps steps.txt
or
rsgpr -f file.rd3 --step-file steps.txt
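Parsing such a step file is straightforward; a minimal sketch (the function name is hypothetical), skipping blank lines and # comments:

```rust
/// Hypothetical parser for a step file: one step per line,
/// blank lines and `#` comments ignored.
fn parse_step_file(contents: &str) -> Vec<String> {
    contents
        .lines()
        .map(|line| line.trim())
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .map(|line| line.to_string())
        .collect()
}
```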
I'm very lazy with releases; if something new is added or I fix something, I bump the "patch" part of the version (X in "0.1.X"). Since the tool is so early, any fix is arguably a major change, so it does not make sense to me to do a major bump (X in "X.1.0") even when backward compatibility breaks. In the near future, there should be a clear policy for this.
Copied from semver 2.0, these are their policies:
Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backward compatible manner
PATCH version when you make backward compatible bug fixes
Thus, a bugfix should bump the patch, and a new feature (say, a new filter) should bump the minor. If considerable changes to the functionality are made, the major should be bumped. However, a 1.0.0 should arguably only exist when the tool is more mature (potentially deviating a bit from semver).
At this moment, I'm comfortable with bumping minor and patch, but not major. Perhaps when the tool has reached a more mature state, semver can be used.
zero_corr_max_peak leaves/appends strange values at the end of a profile when run. I assume this has to do with the fact that a new array is constructed with only zeros:
Line 620 in dc73fa0
... and then it may not be trimmed correctly at the end.
Here's the end of a radar profile without zero-corr:
And here it is with zero_corr_max_peak:
The fix may be as simple as just trimming the data correctly. I also think there's a zero-copy way of doing this instead of instantiating an entirely new array. Finally, a test should be added to make sure this does not happen!
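One zero-copy-adjacent way to think about it, as a per-trace sketch (this is a simplified stand-in for the real filter, not rsgpr's implementation): drop everything before the strongest peak, then let the caller trim the whole profile to the shortest resulting trace instead of padding with zeros.

```rust
/// Hypothetical zero-corr for one trace: drop everything before the
/// strongest (absolute) peak. The output is shorter than the input, so
/// the profile must afterwards be trimmed to the shortest trace
/// rather than padded with zeros.
fn zero_corr_trace(trace: &[f64]) -> Vec<f64> {
    let peak = trace
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.abs().partial_cmp(&b.abs()).unwrap())
        .map(|(i, _)| i)
        .unwrap_or(0);
    trace[peak..].to_vec()
}
```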
Right now, a few functions may panic, which is a sub-par way to report errors... Most .unwrap()s and direct indexings should be replaced with safer methods. This probably means that all GPR methods will need to return a Result instead of nothing, as they do at the moment.
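A sketch of the kind of change this implies, using a stand-in struct (not the real GPR type): the method validates its input and returns a Result instead of unwrapping or indexing blindly.

```rust
/// `Gpr` and `dewow` here are hypothetical stand-ins for the real types.
struct Gpr {
    data: Vec<f64>,
}

impl Gpr {
    /// Returns Err instead of panicking on an invalid window.
    fn dewow(&mut self, window: usize) -> Result<(), String> {
        if window == 0 || window > self.data.len() {
            return Err(format!("invalid dewow window: {}", window));
        }
        // ... the actual filter logic would go here ...
        Ok(())
    }
}
```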
Here is a 500 MHz radargram where equidistant_traces
has been applied:
As is evident, this filter didn't work perfectly. Without knowing the exact reason yet, I presume that the coordinate file doesn't properly show the standstill (?), so the bad data are not removed. Since there is some variability, an alternative explanation is that the instrument simply acted up in that interval. This section should probably be removed, and rsgpr could technically do that with remove_traces. But right now, this function only takes individual indices and not ranges, so it would be a very long call.
I suggest to add ranges to remove_traces
:
remove_traces(0-10 40-100 50 52 54)
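Parsing the suggested syntax could look something like this (the function name is hypothetical, and error handling is left out for brevity):

```rust
/// Hypothetical parser for remove_traces arguments with ranges:
/// "0-10 40-100 50 52 54" expands to the individual trace indices.
fn parse_trace_indices(arg: &str) -> Vec<usize> {
    let mut indices = Vec::new();
    for token in arg.split_whitespace() {
        if let Some((start, end)) = token.split_once('-') {
            let (start, end): (usize, usize) =
                (start.parse().unwrap(), end.parse().unwrap());
            indices.extend(start..=end); // inclusive range, e.g. 0-10
        } else {
            indices.push(token.parse().unwrap());
        }
    }
    indices
}
```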
Since I develop most of this for work on Svalbard, it makes sense to default to UTM Zone 33N for my own purposes. For a general-purpose tool, however, this is not good.
Defining which UTM zone is optimal is doable but not entirely trivial. Most zones are essentially just a bin of the longitude, i.e. 179.5°W is UTM zone 1, 1°E is zone 31, and 179.5°E is zone 60. Problems arise in northern Europe, where the zones are more complex. For example, all of Spitsbergen is 33 when it should technically be between 32–34, Nordaustlandet is 35 when it should be 34–36, and southwestern Norway is 32 instead of 31–32. These need to be added as exceptions. It should not be too hard to program!
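A sketch of the binning with the standard Norway/Svalbard exceptions (the exact boundary constants should be double-checked against the official zone definitions):

```rust
/// Sketch of UTM zone selection from longitude/latitude (degrees),
/// including the Norway and Svalbard exceptions.
fn utm_zone(lon: f64, lat: f64) -> u8 {
    // Svalbard bands (72°N–84°N) use only zones 31/33/35/37
    if (72.0..84.0).contains(&lat) && (0.0..42.0).contains(&lon) {
        return match lon {
            l if l < 9.0 => 31,
            l if l < 21.0 => 33,
            l if l < 33.0 => 35,
            _ => 37,
        };
    }
    // Southwestern Norway: zone 32 is widened at the expense of 31
    if (56.0..64.0).contains(&lat) && (3.0..12.0).contains(&lon) {
        return 32;
    }
    // The general case is a 6° binning of the longitude
    (((lon + 180.0) / 6.0).floor() as u8 % 60) + 1
}
```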
All coordinates for the exported data exist with the right dimensionality (except y2; #42), but none are assigned to the dimensions by default, which confuses tools like xarray until the coordinates are manually assigned as the main coordinate for each dimension. This is a minor inconvenience but should be easy to fix in the export step.
To keep a nicely formatted repo, and to make sure the up-to-date code compiles, it's good to add CI and pre-commit for linting/formatting. This way, the code is consistent, and will (most likely) run when installed.
To prepare for support for e.g. GSSI data, or rsgpr outputs (#11), I/O-related functions should be organized in a more future-proof way. At the moment, most functions in io.rs relate only to the processing of Malå .rd3 files. Adding more types of data as inputs will become very confusing if this is not sorted better.
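One future-proof shape (a sketch; the trait, types, and Rd3Reader struct are hypothetical, not rsgpr's actual io.rs) is a reader trait so each input format lives in its own module behind one interface:

```rust
/// Hypothetical reader trait so each input format (Malå .rd3, GSSI,
/// rsgpr .nc output, ...) can live in its own module.
trait GprReader {
    fn extensions(&self) -> &[&str];
    fn read(&self, path: &str) -> Result<Vec<Vec<f64>>, String>;
}

struct Rd3Reader;

impl GprReader for Rd3Reader {
    fn extensions(&self) -> &[&str] {
        &["rd3"]
    }
    fn read(&self, _path: &str) -> Result<Vec<Vec<f64>>, String> {
        // Real parsing would go here; this stub only illustrates the shape.
        Ok(vec![])
    }
}
```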
Pre-commit has stopped working; the hooks fail to import pkg_resources, which is provided by setuptools and no longer installed by default in Python 3.12 environments. Maybe fixable by just updating everything?
beautysh.................................................................Failed
- hook id: beautysh
- exit code: 1
Traceback (most recent call last):
File "/home/runner/.cache/pre-commit/repo3we8pamt/py_env-python3.12/bin/beautysh", line 5, in <module>
from beautysh import main
File "/home/runner/.cache/pre-commit/repo3we8pamt/py_env-python3.12/lib/python3.12/site-packages/beautysh/__init__.py", line 9, in <module>
import pkg_resources # part of setuptools
^^^^^^^^^^^^^^^^^^^^
ModuleNotFoundError: No module named 'pkg_resources'
Pretty format YAML.......................................................Failed
- hook id: pretty-format-yaml
- exit code: 1
Traceback (most recent call last):
File "/home/runner/.cache/pre-commit/repo9838j_so/py_env-python3.12/bin/pretty-format-yaml", line 5, in <module>
from language_formatters_pre_commit_hooks.pretty_format_yaml import pretty_format_yaml
File "/home/runner/.cache/pre-commit/repo9838j_so/py_env-python3.12/lib/python3.12/site-packages/language_formatters_pre_commit_hooks/__init__.py", line 2, in <module>
import pkg_resources
ModuleNotFoundError: No module named 'pkg_resources'