
hyp3-autorift's Introduction

HyP3 autoRIFT Plugin


The HyP3-autoRIFT plugin provides a set of workflows for feature tracking processing with the autonomous Repeat Image Feature Tracking (autoRIFT) software package. This plugin is part of the Alaska Satellite Facility's larger HyP3 (Hybrid Pluggable Processing Pipeline) system, a batch processing pipeline designed for on-demand processing of remote sensing data. For more information on HyP3, see the Background section.

Installation

  1. Ensure that conda is installed on your system (we recommend using mambaforge to reduce setup times).
  2. Clone the hyp3-autorift repository and navigate to the root directory of this project
    git clone https://github.com/ASFHyP3/hyp3-autorift.git
    cd hyp3-autorift
  3. Create and activate your Python environment
    mamba env create -f environment.yml
    mamba activate hyp3-autorift
  4. Finally, install a development version of HyP3 autoRIFT
    python -m pip install -e .
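
To verify the installation, you can run a quick check from Python; a minimal sketch that only assumes the hyp3-autorift environment created above is active:

import hyp3_autorift  # the import succeeds only if the editable install worked

print('hyp3_autorift imported successfully')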

Usage

The HyP3-autoRIFT plugin provides workflows (accessible directly in Python or via a CLI) that can be used to process SAR data or optical data using autoRIFT. HyP3-autoRIFT can process these satellite missions:

  • SAR:
    • Sentinel-1
  • Optical:
    • Sentinel-2
    • Landsat 4, 5, 7, 8, 9

To see all available workflows, run:

python -m hyp3_autorift ++help

hyp3_autorift workflow

The hyp3_autorift workflow is used to get dense feature tracking between two images using autoRIFT. You can run this workflow by selecting the hyp3_autorift process:

python -m hyp3_autorift ++process hyp3_autorift [WORKFLOW_ARGS]

or by using the hyp3_autorift console script:

hyp3_autorift [WORKFLOW_ARGS]

For example:

hyp3_autorift \
  "S2B_MSIL1C_20200612T150759_N0209_R025_T22WEB_20200612T184700" \
  "S2A_MSIL1C_20200627T150921_N0209_R025_T22WEB_20200627T170912"

This command will run autoRIFT for a pair of Sentinel-2 images.
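
The same job can also be launched from Python. A minimal sketch that simply shells out to the console script with the pair shown above (any additional [WORKFLOW_ARGS] can be appended to the argument list):

import subprocess

# Run the hyp3_autorift console script for the Sentinel-2 pair from the example above.
subprocess.run(
    [
        'hyp3_autorift',
        'S2B_MSIL1C_20200612T150759_N0209_R025_T22WEB_20200612T184700',
        'S2A_MSIL1C_20200627T150921_N0209_R025_T22WEB_20200627T170912',
    ],
    check=True,
)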

Important

Credentials are necessary to access Landsat and Sentinel-1 data. See the Credentials section for more information.

For all options available to this workflow, see the help documentation:

hyp3_autorift --help

Credentials

Depending on the mission being processed, some workflows will need you to provide credentials. Generally, credentials are provided via environment variables, but some may be provided by command-line arguments or via a .netrc file.

AWS Credentials

To process Landsat images, you must provide AWS credentials because the data is hosted by USGS in a "requester pays" bucket. To provide AWS credentials, you can either use an AWS profile specified in your ~/.aws/credentials by exporting:

export AWS_PROFILE=your-profile

or by exporting credential environment variables:

export AWS_ACCESS_KEY_ID=your-id
export AWS_SECRET_ACCESS_KEY=your-key
export AWS_SESSION_TOKEN=your-token  # optional; for when using temporary credentials

For more information, please see: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html
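
A quick way to confirm that boto3 can resolve credentials from either source; a minimal sketch using only standard boto3 calls:

import boto3

# boto3 resolves AWS_PROFILE or the AWS_* environment variables automatically.
credentials = boto3.Session().get_credentials()
if credentials is None:
    print('No AWS credentials found')
else:
    print(f'Credentials resolved via: {credentials.method}')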

NASA Earthdata Login and ESA Copernicus Data Space Ecosystem (CDSE)

To process Sentinel-1 images, you must provide Earthdata Login credentials and ESA Copernicus Data Space Ecosystem (CDSE) credentials in order to download input data.

  • If you do not already have an Earthdata account, you can sign up here.
  • If you do not already have a CDSE account, you can sign up here.

For Earthdata login and CDSE, you can provide credentials by exporting environment variables:

export EARTHDATA_USERNAME=your-edl-username
export EARTHDATA_PASSWORD=your-edl-password
export ESA_USERNAME=your-esa-username
export ESA_PASSWORD=your-esa-password

or via your ~/.netrc file which should contain lines like these two:

machine urs.earthdata.nasa.gov login your-edl-username password your-edl-password
machine dataspace.copernicus.eu login your-esa-username password your-esa-password

Tip

Your ~/.netrc file should only be readable by your user; otherwise, you'll receive a "netrc access too permissive" error. To fix, run:

chmod 0600 ~/.netrc
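
You can confirm that both entries are present, and that the permissions check passes, with Python's standard-library netrc module; a minimal sketch:

import netrc

# Parses ~/.netrc; on POSIX systems this raises NetrcParseError if the file's
# permissions are too permissive.
secrets = netrc.netrc()
for host in ('urs.earthdata.nasa.gov', 'dataspace.copernicus.eu'):
    print(host, 'entry found' if secrets.authenticators(host) else 'entry missing')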

Docker Container

The ultimate goal of this project is to create a docker container that can run autoRIFT workflows within a HyP3 deployment. To run the current version of the project's container, use this command:

docker run -it --rm \
    -e AWS_ACCESS_KEY_ID=[YOUR_KEY] \
    -e AWS_SECRET_ACCESS_KEY=[YOUR_SECRET] \
    -e EARTHDATA_USERNAME=[YOUR_USERNAME_HERE] \
    -e EARTHDATA_PASSWORD=[YOUR_PASSWORD_HERE] \
    -e ESA_USERNAME=[YOUR_USERNAME_HERE] \
    -e ESA_PASSWORD=[YOUR_PASSWORD_HERE] \
    ghcr.io/asfhyp3/hyp3-autorift:latest \
    ++process hyp3_autorift \
    [WORKFLOW_ARGS]

Tip

You can use docker run --env-file to capture all the necessary environment variables in a single file.
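
For example, a small sketch that writes any of the credential variables already set in your shell to a creds.env file you can pass to docker run --env-file (the file name is just an example):

import os

# Variables used by the container, as listed in the docker run example above.
names = [
    'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_SESSION_TOKEN',
    'EARTHDATA_USERNAME', 'EARTHDATA_PASSWORD', 'ESA_USERNAME', 'ESA_PASSWORD',
]
with open('creds.env', 'w') as env_file:
    for name in names:
        value = os.environ.get(name)
        if value is not None:
            env_file.write(f'{name}={value}\n')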

Docker Outputs

To retain hyp3_autorift output files when running via Docker, there are two recommended approaches:

  1. Use a volume mount

    Add the -w /tmp -v ${PWD}:/tmp flags after docker run; -w changes the working directory inside the container to /tmp, and -v mounts your current working directory to /tmp inside the container so that hyp3_autorift outputs are preserved locally. You can replace ${PWD} with any valid path.

  2. Copy outputs to a remote AWS S3 Bucket

    Append the --bucket and --bucket-prefix options to [WORKFLOW_ARGS] so that the final output files are uploaded to AWS S3. This also requires that AWS credentials with permission to write to the bucket are available inside the running container (include AWS_SESSION_TOKEN only if you are using temporary credentials). For example, to write outputs to a hypothetical bucket s3://hypothetical-bucket/test-run/:

    docker run -it --rm \
        -e AWS_ACCESS_KEY_ID=[YOUR_KEY] \
        -e AWS_SECRET_ACCESS_KEY=[YOUR_SECRET] \
        -e AWS_SESSION_TOKEN=[YOUR_TOKEN] \
        -e EARTHDATA_USERNAME=[YOUR_USERNAME_HERE] \
        -e EARTHDATA_PASSWORD=[YOUR_PASSWORD_HERE] \
        -e ESA_USERNAME=[YOUR_USERNAME_HERE] \
        -e ESA_PASSWORD=[YOUR_PASSWORD_HERE] \
        ghcr.io/asfhyp3/hyp3-autorift:latest \
          ++process hyp3_autorift \
          [WORKFLOW_ARGS] \
          --bucket "hypothetical-bucket" \
          --bucket-prefix "test-run"
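
    Once the job finishes, the uploaded products can be listed (or downloaded) with boto3; a minimal sketch using the hypothetical bucket and prefix from the example above:

    import boto3

    s3 = boto3.client('s3')
    response = s3.list_objects_v2(Bucket='hypothetical-bucket', Prefix='test-run/')
    for obj in response.get('Contents', []):
        print(obj['Key'], obj['Size'])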
    

Background

HyP3 is broken into two components: the cloud architecture/API that manages the processing of HyP3 workflows, and Docker container plugins that contain scientific workflows which produce new science products from a variety of data sources. (The figure illustrating the full HyP3 architecture is not reproduced here.)

Cloud Architecture

The cloud infrastructure-as-code for HyP3 can be found in the main HyP3 repository, while this repository contains a plugin that can be used for feature tracking processing with autoRIFT.

License

The HyP3-autoRIFT plugin is licensed under the BSD 3-Clause license. See the LICENSE file for more details.

Code of conduct

We strive to create a welcoming and inclusive community for all contributors to HyP3-autoRIFT. As such, all contributors to this project are expected to adhere to our code of conduct.

Please see CODE_OF_CONDUCT.md for the full code of conduct text.

Contributing

Contributions to the HyP3-autoRIFT plugin are welcome! If you would like to contribute, please submit a pull request on the GitHub repository.

Contact Us

Want to talk about HyP3-autoRIFT? We would love to hear from you!

Found a bug? Want to request a feature? Open an issue

General questions? Suggestions? Or just want to talk to the team? Chat with us on Gitter


hyp3-autorift's Issues

Questions about "--parameter-file"

Hello professor, I have some questions about the parameter file in "--parameter-file". For example, does the "0120m" in "autorift_solidearth_0120m.shp" refer to the resolution of 120 meters?

Secondly, is it suitable for application in coal mining subsidence areas?

Looking forward to your reply! Thank you.

netCDF source and references need to be updated for production

ITS_LIVE website has updated "how to cite" guidelines:

The recommended citation for the Regional Glacier and Ice Sheet Surface Velocities is:
"Velocity data generated using auto-RIFT (Gardner et al., 2018) and provided by the NASA MEaSUREs ITS_LIVE project (Gardner et al., 2019)."

Gardner, A. S., M. A. Fahnestock, and T. A. Scambos, 2019 [update to time of data download]: ITS_LIVE Regional Glacier and Ice Sheet Surface Velocities. Data archived at National Snow and Ice Data Center; doi:10.5067/6II6VW8LLWJ7.

Gardner, A. S., G. Moholdt, T. Scambos, M. Fahnstock, S. Ligtenberg, M. van den Broeke, and J. Nilsson, 2018: Increased West Antarctic and unchanged East Antarctic ice discharge over the last 7 years, Cryosphere, 12(2): 521–547, doi:10.5194/tc-12-521-2018.

Production products should have the Gardner et al. (2019) reference (NSIDC DOI) in the references attribute.

All products should have the "NASA MEaSUREs ITS_LIVE project" somewhere in the source attribute.

AutoRIFT should quickly fail if the reference and secondary scenes are the same

AutoRIFT is a pixel-tracking algorithm and so requires at least some difference between the reference and secondary scene.

Unfortunately, autoRIFT will process a pair that uses the same scene as both reference and secondary all the way through, and only fail due to NaN issues in the netCDF packaging step at the very end, as in this job:
https://hyp3-api.asf.alaska.edu/jobs/879efe81-adf2-48cb-89db-f4bd88c577e4

We should check up front that the reference and secondary scenes provided aren't identical and fail immediately.
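
A hedged sketch of such a check (illustrative only; the names are not the plugin's actual API):

def validate_pair(reference: str, secondary: str) -> None:
    # Fail fast instead of processing an identical pair all the way to the
    # netCDF packaging step.
    if reference == secondary:
        raise ValueError(
            'Reference and secondary scenes are identical; '
            'autoRIFT needs two different acquisitions.'
        )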

Sentinel-2 L1C breaks S2 metadata

With the downgrade to S-2 L1C products, URLs to the products now look like:

s3://sentinel-s2-l1c/tiles/22/W/EB/2020/6/12/0/B08.jp2

and our downloaded files look like:

S2B_MSIL1C_20200612T150759_N0209_R025_T22WEB_20200612T184700_B08.jp2

autoRIFT is expecting URLs to look like:

https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/22/W/EB/2020/9/S2B_22WEB_20200903_0_L2A/B08.tif

and downloaded files to look like ???

Blocked: this needs to be resolved upstream

grid spacing not being read from parameter files

bug
I'm pointing autoRIFT to the new shapefile that in turn points to the 120 m gridded parameter files, but they are producing 240 m grids. Tracing the issue a bit, it looks like it must happen in the GDAL call that retrieves the parameters.
For example, the parameter file http://its-live-data.jpl.nasa.gov.s3.amazonaws.com/autorift_parameters/v001/SPS_0120m_yMinChipSize.tif
has a 120 m grid spacing, but the file "SPS_0120m_yMinChipSize.tif" created by the autoRIFT workflow has 240 m grid spacing.


this line: https://github.com/ASFHyP3/hyp3-autorift/blob/develop/hyp3_autorift/io.py#L86

hyp3_autorift/io.py:86
out_path, tif, outputBounds=output_bounds, xRes=240, yRes=240, targetAlignedPixels=True, multithread=True,

needs to get xRes and yRes from the input file

** ALSO MAKE SURE THAT PIXEL CENTERS ARE NOT BEING MOVED FROM THE ORIGINAL PARAMETER FILE **
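
A hedged sketch of the fix, assuming the surrounding variables out_path, tif, and output_bounds from io.py: read the pixel size from the parameter file's geotransform rather than hard-coding 240 m, so the output grid and pixel centers match the input file.

from osgeo import gdal

# Read the parameter file's native pixel size from its geotransform.
info = gdal.Info(tif, format='json')
geotransform = info['geoTransform']
x_res, y_res = geotransform[1], abs(geotransform[5])

gdal.Warp(
    out_path, tif, outputBounds=output_bounds,
    xRes=x_res, yRes=y_res, targetAlignedPixels=True, multithread=True,
)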

__init__.py:main fails with ValueError: too many values to unpack

$ python -m hyp3_autorift ++process hyp3_autorift S1A_IW_SLC__1SSH_20170118T091036_20170118T091104_014884_01846D_01C5 S1B_IW_SLC__1SSH_20170112T090955_20170112T091023_003813_0068DC_C750
Traceback (most recent call last):
  File "/home/asjohnston/mambaforge/envs/hyp3-autorift/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/asjohnston/mambaforge/envs/hyp3-autorift/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/asjohnston/src/hyp3-autorift/src/hyp3_autorift/__main__.py", line 47, in <module>
    main()
  File "/home/asjohnston/src/hyp3-autorift/src/hyp3_autorift/__main__.py", line 38, in main
    (process_entry_point,) = {process for process in eps if process.name == args.process}
ValueError: too many values to unpack (expected 1)

Reviewing importlib.metadata.entry_points, there are multiple entry points registered with the same hyp3_autorift name:

>>> from importlib.metadata import entry_points
>>> for ep in entry_points()['console_scripts']:
...     if ep.name == 'hyp3_autorift':
...         print(ep)
... 
EntryPoint(name='hyp3_autorift', value='hyp3_autorift.__main__:main', group='console_scripts')
EntryPoint(name='hyp3_autorift', value='hyp3_autorift.process:main', group='console_scripts')
EntryPoint(name='hyp3_autorift', value='hyp3_autorift.process:main', group='console_scripts')

This was possibly introduced when migrating from setup.py to pyproject.toml.

Remove dependence on mat file

Currently, testautoRIFT_ISCE.py subprocesses out a call to topsinsar_filename.py just to load back in a dictionary of product metadata:

runCmd('topsinsar_filename.py')
conts = sio.loadmat('topsinsar_filename.mat')

This is bad and entirely unnecessary -- we could patch the vendored script to call the function directly, or push changes upstream so it packages everything it needs appropriately.
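
A heavily hedged illustration of the first option; the module path and function name below are hypothetical, and the vendored script would need to actually expose such a function:

# Hypothetical: import the metadata helper from the vendored script and call it
# directly, instead of shelling out and round-tripping through a .mat file.
from hyp3_autorift.vendored.topsinsar_filename import topsinsar_mat  # hypothetical import path

conts = topsinsar_mat()  # hypothetical function returning the product metadata dict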

Landsat 7 + 8 pairs may not be handled well

Since we only look at the reference scene to determine the platform, we may have some issues with the secondary scene:

Fortunately, in our current campaign pair list, Landsat 4/5 scenes are only paired with other 4/5 scenes, and Landsat 9 scenes are only cross-paired with Landsat 8, so we should only see issues around 7+8 pairs.

>>> import geopandas as gpd
>>> df = gpd.read_parquet('l45789.parquet')
>>> df.groupby(['ref_mission', 'sec_mission']).reference.count()
ref_mission  sec_mission
L4           L4                1058
             L5                3203
L5           L4                2988
             L5              767024
L7           L7             2003704
             L8              474339
L8           L7              416539
             L8             4103107
             L9              367178
L9           L8              112403
             L9              109936

Fix: autoRIFT pre-processing defaults to the strictest filtering, so we should allow L8 scenes to be Wallis filtered if either scene in the pair is L7, and we should check both the reference and secondary platforms when deciding whether to run the pre-processing filters in process.py (see the sketch below).
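
A hedged sketch of checking both platforms (illustrative only, not the plugin's actual code; Landsat Collection 2 scene IDs start with the sensor/platform code, e.g. LE07, LC08):

def pair_needs_legacy_filtering(reference: str, secondary: str) -> bool:
    # Apply the stricter (Wallis-style) pre-processing if *either* scene comes
    # from Landsat 4, 5, or 7, not just the reference scene.
    legacy_platforms = {'LT04', 'LT05', 'LE07'}
    return bool({reference[:4], secondary[:4]} & legacy_platforms)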

autoRIFT assumes 'glaciers' as the research application

The current autoRIFT parameter file:

aws s3 ls s3://its-live-data.jpl.nasa.gov/autorift_parameters/v001/autorift_parameters.shp

Assumes the area of interest is glaciers/land ice and links to files with search parameters optimized for that research application.

Ideally, hyp3_autorift would take a parameter where you could specify the research application and select the right shape file (or right features out of the shape file) based on that.

Note: this will require action by JPL to generate appropriate input files and specify how to query based on research application

Improve performance of crop

Cropping some of the Golden Test netCDF files takes over 10 minutes to perform, which seems ridiculously long.

Profiling the script on

LC08_L1TP_009011_20200703_20200913_02_T1_X_LC08_L1TP_009011_20200820_20200905_02_T1_G0120V02_P078_IL_ASF_OD.nc

shows 99.8% of the wallclock time is spent in the where method here:

cropped_ds = ds.where(mask, drop=True)
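
A hedged sketch of a likely faster approach, assuming mask is a 2D boolean DataArray on dims ('y', 'x'): compute the bounding box of the valid data once and slice with isel, instead of letting where(..., drop=True) scan every variable:

import numpy as np

# Bounding box of the valid-data mask.
y_idx, x_idx = np.nonzero(mask.values)

cropped_ds = ds.isel(
    y=slice(y_idx.min(), y_idx.max() + 1),
    x=slice(x_idx.min(), x_idx.max() + 1),
)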

Some strange errors reported by Hyp3-AutoRIFT

Hello, professor. When I used HyP3-autoRIFT to track the pixel offsets of the Turkish earthquake, some strange errors occurred. I used Sentinel-2 data and the command was:

hyp3_autorift \
  "S2A_MSIL1C_20230120T082241_N0509_R121_T37SCB_20230120T091424" \
  "S2A_MSIL1C_20230209T082111_N0509_R121_T37SCB_20230209T091429"

The error was reported in an attached screenshot (not legible in this capture).

NetCDF output should record which parameter file was used

ITS_LIVE requested that the parameter file version be recorded in the netCDF file.

However, the parameter file can now be passed in on the CLI, and there are likely going to be different files for different research applications.

So, we should record the full path to the file used.

Unfortunately, we subprocess out to the vendored scripts, so they should be updated upstream to (preferably) allow importing and calling the script workflow directly, or (alternatively) accept the parameter file as an input parameter.

identify a few example pairs that fail with this error

The Malaspina datacube test is rife with these failures. You can get a list of pairs that failed due to mismatched projections from the GeoDataFrame record of the test, located here:

aws s3 ls --profile hyp3 s3://enterprise-campaigns/its-live/tests/malasina.parquet

To get the list of pairs you can run:
import geopandas as gpd

df = gpd.read_parquet('malaspina.parquet')

projection_failures = df.loc[
    (df.status == 'FAILED')
    & df.mission.str.startswith('L')
    & df.look.str.startswith('Exception: The current version of geo_autoRIFT assumes the two images are in the same projection')
]

You can create an environment for these commands with the attached env yaml.

S2 names should use COG Identifier

S2 names are currently given using the ESA identifier (e.g. S2B_MSIL2A_20200903T151809_N0214_R068_T22WEB_20200903T194353); we should change to the COG identifier (e.g. S2B_22WEB_20200903_0_L2A) because that's what Mark's pair list uses.

Also, consider validating the granule names and pairs here (see the sketch after this list):

  • both are valid names
  • both are from the same mission
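
A hedged sketch of such validation for the COG-style identifiers (illustrative only; the regular expression is an assumption based on the example names above):

import re

# e.g. S2B_22WEB_20200903_0_L2A
S2_COG_PATTERN = re.compile(r'^S2[AB]_\d{2}[A-Z]{3}_\d{8}_\d+_L2A$')


def validate_s2_pair(reference: str, secondary: str) -> None:
    for name in (reference, secondary):
        if not S2_COG_PATTERN.match(name):
            raise ValueError(f'{name} is not a valid Sentinel-2 COG identifier')
    # Both names matching the pattern implies both scenes are from the same (S2) mission.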

Google is throwing HTTP 429 errors when opening Sentinel-2 scenes

I've attached a list of ~1500 HTTP links to Sentinel-2 jobs which failed due to Google throwing HTTP 429s when trying to read the data.

From the logs, it looks like GDAL is trying to open a pile of side-car files that may contain metadata, which we have no interest in and which may be causing the rate-limit issues. We are setting export GDAL_DISABLE_READDIR_ON_OPEN='EMPTY_DIR' to prevent GDAL from trying to list the entire cloud object store, but it's possible there's another GDAL environment variable we can set to prevent side-car files from being loaded.

To reproduce locally, you can use either the autorift environment (Linux only) or the Docker container, which looks like:

hyp3_autorift <S2_REF> <S2_SEC>

Essentially, what's happening in the code is just a gdal.Open on the data, so we can likely reproduce it by simply writing a Python script that opens and reads two scenes. Pseudocode:

from osgeo import gdal

ds = gdal.Open('/vsicurl/http...')
arr = ds.ReadAsArray()

making sure the GDAL environment variables are set accordingly.
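
For example, a small sketch of the relevant configuration to set before opening the scenes; CPL_VSIL_CURL_ALLOWED_EXTENSIONS is one candidate for suppressing side-car reads (an assumption, not something the plugin currently sets):

import os

# EMPTY_DIR is what the workflow already sets; the second option restricts
# vsicurl to the raster files themselves so side-car files are never probed.
os.environ['GDAL_DISABLE_READDIR_ON_OPEN'] = 'EMPTY_DIR'
os.environ['CPL_VSIL_CURL_ALLOWED_EXTENSIONS'] = '.tif,.jp2'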

important parts of the hyp3-autorift:

Problem with splicing Sentinel-2 OT results obtained using Hyp3-autorift

Hello, professors! I encountered some problems when using HyP3-autoRIFT to process Sentinel-2 data. I can quickly get the offset tracking (OT) results for each small block with HyP3-autoRIFT, but when I stitch all of the OT results together with GDAL, the situation shown in the first figure below appears; I don't understand why there are so many gaps around the edges.
I then tried cropping the edges of each block's OT result, and when I performed a three-dimensional solution, the result in the second figure appeared.
I am very confused; please help me, professors. Thank you.
[Two figures were attached to the original issue.]
