
disp-s1's Introduction

DISP-S1


Surface Displacement workflows for OPERA DISP-S1 products.

Creates the science application software (SAS) using the dolphin library.

Development setup

Prerequisite installs

  1. Download source code:
git clone https://github.com/isce-framework/dolphin.git
git clone https://github.com/isce-framework/tophu.git
git clone https://github.com/opera-adt/disp-s1.git
  2. Install dependencies, either to a new environment:
mamba env create --name my-disp-env --file disp-s1/conda-env.yml
conda activate my-disp-env

or install within your existing env with mamba.

  3. Install tophu, dolphin, and disp-s1 via pip in editable mode:
python -m pip install --no-deps -e dolphin/ tophu/ disp-s1/

Setup for contributing

We use pre-commit to automatically run linting, formatting, and mypy type checking. Additionally, we follow numpydoc conventions for docstrings. To install pre-commit locally, run:

pre-commit install

This adds pre-commit hooks so that linting/formatting is done automatically. If code does not pass the checks, you will be prompted to fix it before committing. Remember to re-add any files you want to commit which have been altered by pre-commit. You can do this by re-running git add on the files.

Since we use black for formatting and flake8 for linting, it can be helpful to install these plugins into your editor so that code gets formatted and linted as you save.

Running the unit tests

After making functional changes and/or adding new tests, you should run pytest to check that everything is working as expected.

First, install the extra test dependencies:

python -m pip install --no-deps -e .[test]

Then run the tests:

pytest

Optional GPU setup

To enable GPU support (on aurora with CUDA 11.6 installed), install the following extra packages:

mamba install -c conda-forge "cudatoolkit=11.6" cupy "pynvml>=11.0"
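
A quick way to confirm the GPU is visible after installing (a minimal check, not part of the disp-s1 workflow):

import cupy as cp

# Count the CUDA devices visible to CuPy; this raises if the CUDA runtime is unavailable.
n_gpus = cp.cuda.runtime.getDeviceCount()
print(f"Found {n_gpus} CUDA device(s)")

# Run a trivial computation on the GPU to confirm cudatoolkit and cupy agree.
print(cp.arange(10).sum())  # -> 45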

Building the docker image

To build the docker image, run:

./docker/build-docker-image.sh --tag my-tag

which will print out instructions for running the image.


disp-s1's Issues

Add an amplitude pre-processor to convert saved mean/dispersion into one input per burst

After we move to storing means/dispersions in the CCSLCs, we need to convert them into single rasters for the current PS input format. The mean will use something like

$$ \mu_{total} = (\sum_{i=1}^k \mu_i N_i) / (\sum_{i=1}^k N_i) $$

with a similar idea for $\sigma^2_{total}$.

To minimize changes to the SAS, we can run a function on each CCSLC burst stack and produce one amplitude_mean.tif and one amplitude_dispersion.tif per burst.
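
A minimal sketch of the per-burst combination, assuming each CCSLC provides an amplitude mean raster $\mu_i$ and an image count $N_i$ (the function name and inputs here are placeholders, not the actual disp-s1 interface):

import numpy as np

def combine_amplitude_means(means: list[np.ndarray], counts: list[int]) -> np.ndarray:
    """Combine per-ministack amplitude means into one raster with an N-weighted average."""
    stacked = np.stack(means)                 # shape (k, rows, cols)
    n = np.asarray(counts, dtype=float)       # shape (k,)
    weighted_sum = (stacked * n[:, None, None]).sum(axis=0)
    return weighted_sum / n.sum()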

Run product creation in parallel

Right now, creating the ~20 NetCDF products plus the compressed SLCs takes about 20 minutes. It is largely I/O-bound, but can likely be sped up by at least 2x using a ThreadPoolExecutor.
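
A minimal sketch of the idea, assuming a per-product writer function (create_product is a placeholder for the actual disp-s1 product-creation call):

from concurrent.futures import ThreadPoolExecutor

def create_products_in_parallel(output_paths, create_product, max_workers=4):
    """Write each NetCDF product in its own thread; the I/O-bound work mostly releases the GIL."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(create_product, path) for path in output_paths]
        # .result() re-raises any exception from a worker thread.
        return [f.result() for f in futures]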

Create a data bounding polygon, add to `identification/bounding_polygon`

The CSLC product has a /identification/bounding_polygon dataset, which includes the WKT of the nodata boundary.

Possible implementation on an example GSLC:

gdal_calc.py -A NETCDF:t051_109451_iw3_20190329.h5:/data/VV --type Byte --co "COMPRESS=DEFLATE" --outfile not_nan_mask.tif --quiet --calc " ~numpy.isnan(A)"
gdal_polygonize.py -8 not_nan_mask.tif not_nan_polygons.shp 
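
An equivalent sketch in Python, assuming the subdataset can be opened through GDAL's NETCDF driver via rasterio (the convex-hull line relates to the first question below):

import numpy as np
import rasterio
from rasterio import features
from shapely.geometry import shape
from shapely.ops import unary_union

with rasterio.open("NETCDF:t051_109451_iw3_20190329.h5:/data/VV") as src:
    data = src.read(1)
    transform = src.transform

# Polygonize the valid-data mask, then merge the pieces into a single geometry.
valid = (~np.isnan(data)).astype("uint8")
geoms = [shape(g) for g, val in features.shapes(valid, transform=transform) if val == 1]
boundary = unary_union(geoms)
print(boundary.wkt)              # full nodata boundary
print(boundary.convex_hull.wkt)  # simplified version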

Other questions

  • Do we also want just the 4 corners? Take the convex hull?
  • How accurate do we care to make this?
  • Should we buffer some number of pixels?

Create a database of large (M>6) earthquakes in our AOI

At Fringe, it was mentioned that EGMS pulls in a list of earthquakes with M>6.0 from USGS. This will be relevant for checking how often such an event leads to a huge surface displacement (which depends on depth), as well as how the algorithm can handle changing reference dates.
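
A minimal sketch of pulling such a list from the USGS FDSN event web service (the bounding box values are placeholders for our AOI):

import requests

URL = "https://earthquake.usgs.gov/fdsnws/event/1/query"
params = {
    "format": "geojson",
    "starttime": "2016-01-01",
    "minmagnitude": 6.0,
    # Placeholder AOI bounding box (min/max longitude and latitude):
    "minlongitude": -125.0,
    "maxlongitude": -114.0,
    "minlatitude": 32.0,
    "maxlatitude": 42.0,
}
events = requests.get(URL, params=params, timeout=30).json()["features"]
for event in events:
    props = event["properties"]
    # "time" is milliseconds since the epoch.
    print(props["time"], props["mag"], props["place"])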

Add "average seasonal coherence" as an input to the interface

Charlie has given a nice easy-to-use code example to pull the Josef global coherence dataset over an area of interest

https://github.com/OPERA-Cal-Val/s1-coherence-2020/blob/main/3_Seasonal_Coherence_View.ipynb

  • Figure out where to add this in the PgeRunconfig class
  • Grab the sample data for the delivery example
  • Update the description of this in the user guide

Reference date selection

  • Do a test over Alaska to determine if this tells us the correct time to pick as a reference date

Background

For each tile, seasonal composites of the coherence at different repeat intervals and backscatter imagery were calculated. We calculated the median coherence based on all coherence estimates per tile of a given repeat interval (6, 12, 18, 24, 36, and 48) per three-month period: 1) December, January, February 2) March, April, May 3) June, July, August, and 4) September, October, November. We chose the median operation to account for outliers. In the case of the backscatter intensity products, we calculated per three-month period the average backscatter intensity in VV and VH, or HH and HV, polarization.

The decay of coherence with increasing repeat interval was modelled for each season at pixel level with the exponential model [38]:

$$ \gamma(t) = (1 - \rho_\infty)\, e^{-t/\tau} + \rho_\infty \qquad (3) $$

where $\rho_\infty$ and $\tau$ denote the long-term coherence and the rate of coherence decay with increasing repeat interval, respectively.
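
A minimal sketch of fitting that model to a pixel's median coherences with scipy (the coherence values below are illustrative placeholders, not real measurements):

import numpy as np
from scipy.optimize import curve_fit

def coherence_decay(t, rho_inf, tau):
    """gamma(t) = (1 - rho_inf) * exp(-t / tau) + rho_inf"""
    return (1.0 - rho_inf) * np.exp(-t / tau) + rho_inf

# Repeat intervals in days and example median coherences for one pixel/season.
t = np.array([6, 12, 18, 24, 36, 48], dtype=float)
gamma = np.array([0.72, 0.55, 0.45, 0.40, 0.35, 0.33])  # placeholder values

(rho_inf, tau), _ = curve_fit(coherence_decay, t, gamma, p0=(0.3, 12.0), bounds=([0.0, 1e-3], [1.0, 1e3]))
print(f"rho_inf = {rho_inf:.2f}, tau = {tau:.1f} days")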

Including masks as product layers

  1. We need to store any combined masks we used for unwrapping (e.g. nodata, plus water mask, plus pixels < coherence threshold)
  2. We may consider adding another "suggested mask" so that people can easily apply one layer to the unwrapped phase to get a usable output. We don't want to have to explain how to combine 2-4 conditions to recreate the "good pixels" to look at (see the sketch after this list).
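
A minimal sketch of how the "suggested mask" could be assembled, assuming the individual layers are already boolean/float arrays (the names and the coherence threshold are placeholders):

import numpy as np

def build_suggested_mask(nodata_mask, water_mask, coherence, coherence_threshold=0.5):
    """Return True where a pixel should probably be ignored (any one condition is enough)."""
    return nodata_mask | water_mask | (coherence < coherence_threshold)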

Changes to `product.py` for updated product layout

Create configuration to run faster on a large machine

SDS has requested another config file for us/them to test on 32- or 64-CPU machines.
We'll need to adjust the parallelism of the stages. Assuming we have 128 GB of RAM, we can try:

  • Wrapped phase: n_parallel_bursts = 27
    • Running 9 parallel bursts led to ~30 GB of RAM usage.
  • Unwrapping: (multiple options for this)
    • n_parallel_tiles: 8
    • n_parallel_jobs: 5
      (TBD: exact math on how much RAM snaphu uses for different tile shapes/sizes.)

One possible idea to aid them: make a few "standard configurations" or "base configurations" (similar to the idea here, where you just specify 'big'). We could have a 16-CPU configuration, a 64-CPU one, etc.

Skip the runconfig dtype check in `validate`

@collinss-jpl noted that SDS will often make small changes to the PGE runconfig which should not affect the test. But right now the validation fails because the string length differs:

disp_s1.validate.ComparisonError: /metadata/pge_runconfig dtypes do not match: |S10634 vs |S10630

We should remove this check and just log any differences.

Choosing a reference point and recording it

  • Pick one of the PS points.
    • Possible way: take the biggest connected component, then take the lowest-amplitude-dispersion PS pixel within it (see the sketch after this list).
  • Reference the output products to this.
  • Record it in the product here: https://github.com/opera-adt/disp-s1/blob/main/src/disp_s1/product.py#L193-L209
    • Note on this format: currently the scalar "value" is the phase that is subtracted from the image. The attrs of the dataset have the rows/cols. This would just be a list of one row/one col for a single PS point.
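
A minimal sketch of the selection described above, assuming a boolean mask of the connected components and the amplitude dispersion raster are available (the inputs and function name are placeholders):

import numpy as np
from scipy import ndimage

def choose_reference_point(conncomp_mask: np.ndarray, amp_dispersion: np.ndarray) -> tuple[int, int]:
    """Lowest-amplitude-dispersion pixel inside the biggest connected component."""
    labels, n_labels = ndimage.label(conncomp_mask)
    if n_labels == 0:
        raise ValueError("no valid connected components found")
    # Largest component by pixel count (skip the background label 0).
    sizes = np.bincount(labels.ravel())[1:]
    biggest = int(np.argmax(sizes)) + 1
    # Ignore everything outside that component, then take the minimum-dispersion pixel.
    candidates = np.where(labels == biggest, amp_dispersion, np.inf)
    row, col = np.unravel_index(np.argmin(candidates), candidates.shape)
    return int(row), int(col)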

Compute solid earth tide correction, add layer

Replace "0"s made in the unwrapping output with the original wrapped phase value

Since we don't want to preclude a better unwrapping job later, we'd rather keep the original wrapped phase where the unwrapper gave up (see the sketch after this list).

  • Make sure we reset all data originally in the interferogram (both masked for unwrapping, and badly unwrapped areas)
  • Ensure that the edges haven't drifted slightly off of the nodata value. Every frame should be NaNs around the outside, and something nonzero inside.
  • Make sure we have some clear mask layer which says "we think you should probably ignore these areas"
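
A minimal sketch of the reset step, assuming the unwrapper writes 0 where it gave up, nodata stays NaN, and both rasters hold real-valued phase in radians (array names are placeholders):

import numpy as np

def restore_wrapped_phase(unwrapped: np.ndarray, wrapped: np.ndarray) -> np.ndarray:
    """Where unwrapping produced 0 (masked or failed), fall back to the original wrapped phase."""
    out = unwrapped.copy()
    failed = (unwrapped == 0) & ~np.isnan(wrapped)
    out[failed] = wrapped[failed]
    return out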

Adding a `time` dimension to the NetCDF product

This is up for discussion: we should possibly add a /time dimension to each product to explicitly label the displacement time. We should be able to follow the CF conventions (which do specify that the name should be "time") by using h5netcdf.

  • downsides:
    • the shape becomes (1, rows, cols), so some things will complain, e.g. if you directly plot the (1, rows, cols) image (this already happens when you use rasterio to load a 1-band image)
  • upsides:
    • concatenating multiple becomes more straightforward because the time dimension is explicitly encoded
    • no one has to write a string/name parser (i.e. you can just do xr.open_mfdataset() and it should just work; see the sketch after this list)
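
A minimal sketch of writing a product with an explicit time dimension using xarray + h5netcdf (the variable name, date, and file name are placeholders, not the actual DISP-S1 layout):

import numpy as np
import pandas as pd
import xarray as xr

rows, cols = 512, 512
displacement = np.zeros((1, rows, cols), dtype="float32")  # placeholder data

ds = xr.Dataset(
    {"displacement": (("time", "y", "x"), displacement)},
    coords={"time": pd.to_datetime(["2022-07-15"])},
)
ds["time"].attrs["standard_name"] = "time"  # CF-style name for the coordinate
ds.to_netcdf("disp_example.nc", engine="h5netcdf")

# Multiple products then concatenate along time without any filename parsing:
# combined = xr.open_mfdataset("disp_*.nc", combine="by_coords")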

Create spatial baseline cubes from the CSLC input orbits

New validation checks

On top of the existing golden output validation script:

  • A check that the actual bounds of the file match the expected bounds from the frame_to_burst JSON
  • Ensure the unwrapped phase values are congruent with the wrapped phase values (see the sketch below)

(...todo: figure out what better checks to do on the results).
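
A minimal sketch of the congruence check, assuming the unwrapped and wrapped phases are available as real-valued arrays in radians (reading them from the product is omitted):

import numpy as np

def check_congruence(unwrapped: np.ndarray, wrapped: np.ndarray, atol: float = 1e-3) -> None:
    """The unwrapped phase should differ from the wrapped phase by an integer number of 2*pi cycles."""
    cycles = (unwrapped - wrapped) / (2 * np.pi)
    residual = np.abs(cycles - np.round(cycles))
    valid = ~np.isnan(residual)
    n_bad = int(np.count_nonzero(residual[valid] >= atol))
    if n_bad:
        raise ValueError(f"{n_bad} pixels are not congruent with the wrapped phase")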
