
nr-rfc-processing's Introduction

Lifecycle: Stable

Credential YAML

This YAML file is passed via the --envpth argument so that MODIS/VIIRS/Sentinel-2 data can be downloaded. It must be located inside the mounted volume so the internal processes can access the credentials.

--envpth /data/<credential>.yml
YAML
# register at https://urs.earthdata.nasa.gov/home
EARTHDATA_USER: username_without_quotes
EARTHDATA_PASS: password_without_quotes

# register at https://scihub.copernicus.eu/dhus/#/self-registration 
SENTINELSAT_USER: username_without_quotes
SENTINELSAT_PASS: password_without_quotes
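
For reference, a minimal sketch of how a process inside the container might read this file, assuming PyYAML (the repo's actual loader may differ):

    # Sketch only: load the credential YAML passed via --envpth and expose
    # the keys as environment variables for downstream download code.
    import os
    import yaml  # assumes PyYAML is installed in the container

    def load_credentials(envpth="/data/creds.yml"):
        with open(envpth, "r") as fh:
            creds = yaml.safe_load(fh)
        for key, value in creds.items():
            os.environ[key] = str(value)
        return creds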

MODIS/VIIRS Pipeline

Docker

Building Docker Image

To build the image from this directory:

docker build -t <tagname> .

Running Docker Image

To run the Docker image use the following schema:

docker run --rm -v <local store>:/data <tagname> <extra_commands>

Input Formats

ARG VALUES TYPE
envpth Linux path string
date YYYY.MM.DD string
sat modis / viirs / sentinel string
typ watersheds / basins string
days 1 / 5 / 8 int

Docker Options

OPTION EFFECT
--rm Removes container once it finishes running
-v Mount volume to Docker container

Help

The default CMD is "--help", so running the image with no extra commands lists the available commands:

docker run --rm -v <mount_point>:/data <tag_name>

FAQ

Why is there an authentication error when downloading MODIS or VIIRS data?

Answer

An HTTP 503 error during authentication is expected behaviour from the NASA servers when downloading large amounts of data. Be patient; the download will retry and continue once the service becomes available again.

Daily-Pipeline

docker run --rm -v <mount_point>:/data <tag_name> daily-pipeline --envpth /data/<creds.yml> --date <target_date: YYYY.MM.DD>

daily-pipeline kicks off the entire process chain, which performs the following per satellite (MODIS/VIIRS):

  • Build up directory structure and supporting files
  • Download raw granule files
  • Process raw granules to GTiff formats
  • Calculate snow coverage per watershed/basin
  • Build KML of watersheds/basins
  • Clean up intermediate files

Build Directory Structure

Builds necessary supporting files and directories in order for the process pipeline to properly manage file I/O.

docker run --rm -v <mount_point>:/data <tag_name> build

High Level Directory Structure:

  • /data
    • /analysis
    • /basins
    • /intermediate_kml
    • /intermediate_tif
    • /kml
    • /modis-terra
    • /output_tif
    • /plot
    • /watersheds

Download

MODIS requires 5 or 8 days of data in order to build a composite of valid observations; downloading a single day is also possible. VIIRS downloads a single day, as it is a cloud-gap-filled product.

docker run --rm -v <mount_point>:/data <tag_name> download --envpth /data/<creds.yml> --sat <modis/viirs> --date <YYYY.MM.DD> --days <1/5/8>

Output:

  • Raw granules: modis-terra/<product>/<date>/.
    • MODIS: MOD10A1.006
    • VIIRS: VNP10A1F.001

Process

MODIS requires 5 or 8 days of data in order to build a composite of valid observations. The default is 5 days.

docker run --rm -v <mount_point>:/data <tag_name> process --sat <modis/viirs> --date <YYYY.MM.DD> --days <1/5/8>

Output:

  • Clipped watershed/basin GTiff: <watershed/basin>/<name>/<satellite>/<date>/.
    • EPSG:4326 -> needed for KML
    • EPSG:3153 -> BC Albers projection GTiff

Calculate Snow Coverage

Analyze each watershed and basin to calculate the snow coverage based on the NDSI value.

docker run --rm -v <mount_point>:/data <tag_name> run-analysis --typ <watersheds/basins> --sat <modis/viirs> --date <YYYY.MM.DD>

Output:

  • SQLITE3 database: analysis/analysis.db

Database To CSV

Convert the SQLITE3 database into a CSV

docker run --rm -v <mount_point>:/data <tag_name> dbtocsv

Output:

  • CSV: analysis/analysis.csv

Build KMLs and Colour Ramped GTiffs

Build the colour-ramped GTiff and KML versions of the watersheds/basins.

docker run --rm -v <mount_point>:/data <tag_name> build-kml --date <YYYY.MM.DD> --typ <watersheds/basins> --sat <modis/viirs>

Output:

  • Colourized GTiffs: <watersheds/basins>/<name>/<satellite>/<date>/.
  • KML: kml/<date>/

Compose KMLs

Compose the built KML files into a hierarchical KML.

docker run --rm -v <mount_point>:/data <tag_name> compose-kmls --date <YYYY.MM.DD> --sat <modis/viirs>

Zip KMLs

Compress the KMLs into a single ZIP file.

docker run --rm -v <mount_point>:/data <tag_name> zip-kmls

KNOWN ISSUE: the ZIP file is larger than the original KMLs -- deprecated.

Plot

Plot all watersheds and basins as PNG plots with a mapped colour bar.

docker run --rm -v <mount_point>:/data <tag_name> plot --date <YYYY.MM.DD> --sat <modis/viirs>

Clean

Manually clean up files and directories.

TARGET EFFECT
all Non-vital directories in /data
intermediate Intermediate files in intermediate_[tif/kml]
output Output files in output_tif
downloads Raw granules in modis-terra/
watersheds All files/dirs in watersheds/
basins All files/dirs in basins/

docker run --rm -v <mount_point>:/data <tag_name> clean --target <target>

Sentinel-2 Pipeline

Docker

Building Docker Image

To build the image from this directory:

docker build -t <tagname> .

Running Docker Image

To run the Docker image use the following schema:

docker run --rm -it -v <local store>:/data <tagname> <extra_commands>

It is necessary to run the docker container in interactive mode (by including the -it option when calling docker run) as the Sentinel-2 process requires user interaction.

Process-Sentinel

Call the Sentinel-2 pipeline

docker run --rm -it -v <mount point>:/data <tag name> process-sentinel --creds /data/<creds.yml> --lat <latitude> --lng <longitude> --date <YYYY.MM.DD>

The pipeline will return a date-ordered list of up to 10 candidate products to choose from.

OPTIONAL ARGS VALUES DEFAULT
--rgb true / false false
--max-allowable-cloud int 50
--force-download true / false false
--day-tolerance int 50

Outputs are logged to a log file in /data/log/.

Argument Details:

  • Latitude and longitude are WGS84 floats (-90 <= lat <= +90, -180 <= lng <= +180).
  • --rgb true : Creates an RGB output GTiff of the selected tile
  • --max-allowable-cloud <int> : Maximum percentage of cloud cover allowed in the query. Default is 50%.
  • --force-download true : Deletes existing downloads in the target directory
  • --day-tolerance <int> : Number of days to look back from the target date within the given cloud-cover constraint. Default is 50 days.

Example 0

Using default arguments to demonstrate expected output.

docker run --rm -it -v <mount point>:/data <tag name> process-sentinel --creds /data/sat.yml --lat 49.12 --lng -126.5  --date 2021.03.18

0 : DATE: 2021-03-17 || CLOUD%:  45.588648
1 : DATE: 2021-03-14 || CLOUD%:  17.49127
2 : DATE: 2021-03-07 || CLOUD%:  39.499657
3 : DATE: 2021-02-05 || CLOUD%:  16.117952
4 : DATE: 2021-01-31 || CLOUD%:  23.281466
5 : DATE: 2021-01-28 || CLOUD%:  2.606739
Pick which product to download and process [0-5/n]:

Example 1

Calling with a high cloud tolerance to demonstrate the upper limit of the selection list and typical output.

docker run --rm -it -v <mount point>:/data <tag name> process-sentinel --creds /data/sat.yml --lat 49.73 --lng -126.5  --date 2021.03.18 --rgb true --max-allowable-cloud 90

0 : DATE: 2021-03-17 || CLOUD%:  32.746968
1 : DATE: 2021-03-14 || CLOUD%:  87.006531
2 : DATE: 2021-03-09 || CLOUD%:  45.446411
3 : DATE: 2021-03-07 || CLOUD%:  74.669036
4 : DATE: 2021-02-25 || CLOUD%:  84.000973
5 : DATE: 2021-02-22 || CLOUD%:  79.523277
6 : DATE: 2021-02-17 || CLOUD%:  56.401063
7 : DATE: 2021-02-15 || CLOUD%:  74.093911
8 : DATE: 2021-02-10 || CLOUD%:  32.880395
9 : DATE: 2021-02-07 || CLOUD%:  75.293885
Pick which product to download and process [0-9/n]: 0

Downloading: 100%|██████████| 1.09G/1.09G [03:30<00:00, 5.19MB/s]
MD5 checksumming: 100%|██████████| 1.09G/1.09G [00:04<00:00, 231MB/s]

Notice the clouds and water are masked out from the analysis.

Sentinel-2 Example 1 RGB Output

Example 2

Calling with low cloud tolerance to demonstrate limited selection.

docker run --rm -it -v <mount point>:/data <tag name> process-sentinel --creds /data/sat.yml --lat 49.12 --lng -126.5 --date 2021.03.18 --max-allowable-cloud 20

0 : DATE: 2021-03-14 || CLOUD%:  17.49127
1 : DATE: 2021-02-05 || CLOUD%:  16.117952
2 : DATE: 2021-01-28 || CLOUD%:  2.606739
Pick which product to download and process [0-2/n]:

nr-rfc-processing's People

Contributors

bcgov-devops, bnjam, derekroberts, frantarkenton, renovate-bot, repo-mountie[bot]


Forkers

jsuwala webgismd

nr-rfc-processing's Issues

Retrieve data provider secrets from env vars

Currently the script expects a yaml file from which it retrieves the environment variables required to download data.

All secrets should be passed to the scripts via environment variables.

This ticket will rework the code so that the parameters currently defined in the yaml file are defined and retrieved through environment variables.
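
A minimal sketch of the environment-variable approach, using the key names from the credential YAML described above; the helper itself is illustrative, not the repo's actual code:

    # Sketch only: read provider secrets from environment variables instead
    # of the legacy YAML file, failing loudly if any are missing.
    import os

    REQUIRED_VARS = ["EARTHDATA_USER", "EARTHDATA_PASS",
                     "SENTINELSAT_USER", "SENTINELSAT_PASS"]

    def get_secrets():
        missing = [name for name in REQUIRED_VARS if name not in os.environ]
        if missing:
            raise EnvironmentError(f"missing required environment variables: {missing}")
        return {name: os.environ[name] for name in REQUIRED_VARS}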

Integrate All Steps with Object Storage

When the script is run, we want it to be able to pick up from previous failures (see the sketch after this list).

  • Immediately after data is downloaded by the download process, it is copied to object storage
  • When the download process starts, it attempts to retrieve data from object storage first before going to the official sources
  • When processing starts, it determines what needs to be done based on what data exists in object storage
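
A rough sketch of that check-then-download pattern; the ostore helper names (exists, download, upload) are hypothetical placeholders, not the real nr-objstore-util API:

    # Sketch only: prefer local disk, then object storage, then the official
    # source, and persist to object storage immediately after downloading.
    import os

    def fetch_granule(ostore, object_name, local_path, download_from_source):
        if os.path.exists(local_path):
            return local_path                      # already on disk
        if ostore.exists(object_name):             # hypothetical helper
            ostore.download(object_name, local_path)
            return local_path
        download_from_source(local_path)           # fall back to the official source
        ostore.upload(local_path, object_name)     # persist right after download
        return local_path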

Create Object Storage Utility Module

In order to get the snowpack script running again, we need the code to be able to communicate with the object store via its API, rather than expecting the data to exist locally.

Code already exists to facilitate this task; however, in order to use it properly it should simply be declared like any other dependency.

This task will take the code from this repo (https://github.com/franTarkenton/nr-objectstore-util) and:

  • transfer to bcgov repo
  • add autobuild github action that will create the package in pypi
  • update documentation

Modify Daily pipeline so fills in missing data for a given time window

Currently the script is set up to run on a daily basis using the current date.

The issue with this approach is that data is not always available for the current date; sometimes there can be a 3-4 day lag before data becomes available.

Changes:

  • Instead of running just a specific date, the script will check what data exists in the object store bucket against the data that is currently available from the providers. It will then run the pipeline for consecutive days until all of the currently available data has been processed (see the sketch below).
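
A small sketch of that gap check; list_processed_dates is a hypothetical helper that would list the dates already present in the object store bucket:

    # Sketch only: compare the dates that should exist in a recent window
    # against the dates already processed, and return the missing ones.
    from datetime import date, timedelta

    def missing_dates(list_processed_dates, window_days=10):
        today = date.today()
        wanted = {today - timedelta(days=n) for n in range(window_days)}
        have = set(list_processed_dates())   # e.g. dates already in ostore
        return sorted(wanted - have)         # run the pipeline for these days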

Fix Viirs processing - load object store data as required

Similar to how the MODIS data is currently being handled, we need to modify how the VIIRS processing takes place:

  • move the path config to the path config module
  • use the ostore module to load the object store data on an as-needed basis.

Implement timeout on async functions

When the processing (dailypipeline) is run, it sometimes hangs. This is an example of the logs from the script running in GHA:

2023-04-26T14:18:48.3178324Z 2023-04-26 07:18:48 - 172 - hatfieldcmr.earthdata - INFO - downloading file https://n5eil01u.ecs.nsidc.org/DP4/MOST/MOD10A1.061/2023.04.21/MOD10A1.A2023111.h20v01.061.2023113034230.hdf.xml
2023-04-26T14:18:49.3100898Z 2023-04-26 07:18:49 - 186 - hatfieldcmr.earthdata - INFO - saving granule modis-terra/MOD10A1.061/2023.04.21/MOD10A1.A2023111.h20v01.061.2023113034230.hdf.xml
2023-04-26T14:18:49.3102653Z 2023-04-26 07:18:49 - 192 - hatfieldcmr.earthdata - INFO - fetched and uploaded file modis-terra/MOD10A1.061/2023.04.21/MOD10A1.A2023111.h20v01.061.2023113034230.hdf.xml from https://n5eil01u.ecs.nsidc.org/DP4/MOST/MOD10A1.061/2023.04.21/MOD10A1.A2023111.h20v01.061.2023113034230.hdf.xml
2023-04-26T20:08:12.1424245Z ##[error]The operation was canceled.
2023-04-26T20:08:12.1483953Z Post job cleanup.

Note the timestamps jump from 14:18:49 to 20:08:12, with no log messages in between suggesting that a process is stuck waiting.

Setting a 5-minute timeout would be more than reasonable for the process that is getting hung up.
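
A minimal sketch of how a five-minute bound could be applied with asyncio; fetch is a hypothetical download coroutine:

    # Sketch only: wrap a single download in a timeout so a stuck connection
    # cannot hang the whole pipeline run.
    import asyncio

    async def download_with_timeout(fetch, url, timeout_s=300):
        try:
            return await asyncio.wait_for(fetch(url), timeout=timeout_s)
        except asyncio.TimeoutError:
            # log and let the caller decide whether to retry or skip the granule
            raise RuntimeError(f"download of {url} timed out after {timeout_s}s")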

Make script idempotent

When the script runs, it should treat the object store as the source of truth. This task will:

  • break up the daily pipeline into smaller components.
    • each satellite will run as a separate pipeline,
    • each step in the satellite-specific pipeline will be broken into smaller pieces, reflected as separate steps in the github action. Each of these steps should first check ostore for data and pull it locally; if it's not available there, go to the source.
    • steps:
      • build AOIs - create the basin/watershed directories locally and extract the data from the shapefiles
      • download - download the data - convert into a two-way download and then push to ostore
      • process - process the data - push changes as data is processed. Ideally make this async
      • calculate stats - research spike; look into how this works and the best way to persist the results (we think it's getting saved to a SQL db)
      • daily_kml - drop this step for now as it's not being used
      • plot - again, create and cache at the same time

Document Steps to run Locally

There is a readme for how to run using Docker. Using both the Dockerfile and the existing readme, create docs on how to set up a local dev environment for working on this code.

Get Hatfield snowpack running again

  • quick fix, whatever is required to make this work again

  • Originally this was running using geodrive, scheduled through Jenkins. Because the files in object storage are named using mixed case, we cannot use the previous geodrive solution.

Description of Task

  • get script running through github actions
  • add logic to the code to pull the files it needs from object store when / if they are required (just in time data load)
  • refine the object store module so it can be re-used by other projects easily
  • fix the logging config to use logging.config
  • Add the github actions to complete build / run of the code

Dependencies

Epic

Pull norm data on an as needed basis

When comparing 10-year and 20-year norms, the script should pull the data from object storage on an as-needed basis. It shouldn't expect the data to simply be available on a mounted drive.

Set Objects to PUBLIC / READ when uploaded

When uploading objects to object storage, set them to public/read permissions.

This ticket will also require creating a script that sets all of the existing objects to public read.

This is required in order to properly deploy the frontend: if the objects are not set to public read, they will not show up in the view.
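
One way to do this with the minio client (already a dependency of this repo) is to apply an anonymous read-only bucket policy; this is a sketch rather than the repo's implementation, and the bucket name is a placeholder:

    # Sketch only: allow anonymous GetObject on everything in the bucket.
    import json
    from minio import Minio

    def make_bucket_public(client: Minio, bucket: str):
        policy = {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"AWS": ["*"]},
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }],
        }
        client.set_bucket_policy(bucket, json.dumps(policy))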

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

  • chore(deps): update dependency pylint to v3
  • chore(deps): update github actions all dependencies (major) (actions/checkout, docker/build-push-action, docker/login-action, ubuntu)
  • 🔐 Create all rate-limited PRs at once 🔐

Pending Status Checks

These updates await pending status checks. To force their creation now, click the checkbox below.

  • chore(deps): update all non-major dependencies (black, continuumio/miniconda3, flake8, flake8-docstrings, mambaorg/micromamba, minio, pylint, pytest, python-cmr)

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

dockerfile
Dockerfile
  • mambaorg/micromamba 1.4.9
dockerfile.hatfield
  • continuumio/miniconda3 4.9.2-alpine
  • ubuntu 20.04
github-actions
.github/workflows/pr_open.yaml
  • mathieudutour/github-tag-action v6.1
  • actions/checkout v3
  • docker/login-action v2
  • docker/build-push-action v4
  • ubuntu 22.04
  • ubuntu 22.04
  • ubuntu 22.04
.github/workflows/run_daily_pipeline_cron_mamba.yaml
  • actions/checkout v3
  • ubuntu 22.04
.github/workflows/run_pipeline_auto_fill_data.yaml
  • actions/checkout v3
  • mamba-org/setup-micromamba v1
  • ubuntu 20.04
.github/workflows/test_run_container_image.yaml
  • actions/checkout v3
  • docker/login-action v2
  • ubuntu 22.04
pip_requirements
hatfieldcmr/requirements.txt
  • scandir ==1.10.0
  • python-dotenv ==1.0.0
  • minio ==7.1.14
  • python-cmr ==0.7.0
requirements-dev.txt
  • flake8 ==6.0.0
  • black ==23.1.0
  • pytest ==7.3.1
requirements.txt
  • nr-objstore-util ==0.10.0
  • python-cmr ==0.7.0
snowpack_archive/requirements-dev.txt
  • pylint ==2.8.2
  • flake8 ==3.9.2
  • black ==21.5b1
  • flake8-docstrings ==1.6.0


Speedup Object Storage Uploads (Async/Multiprocess)

Currently the upload to object storage runs on a single thread. Upload time could be significantly reduced by making the upload process async.

See download_granules.py for an example of how to do that; it uses the multiprocess module. A similar approach could be used for syncing to object storage.
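
A minimal sketch of that idea using the standard-library multiprocessing pool (the issue refers to the multiprocess module used in download_granules.py); upload_one is a hypothetical wrapper around the object storage client:

    # Sketch only: fan the uploads out across worker processes.
    from multiprocessing import Pool

    def upload_one(path):
        ...  # push a single file to object storage (placeholder)

    def upload_all(paths, workers=8):
        with Pool(processes=workers) as pool:
            pool.map(upload_one, paths)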

Add project lifecycle badge

No Project Lifecycle Badge found in your readme!

Hello! I scanned your readme and could not find a project lifecycle badge. A project lifecycle badge will provide contributors to your project as well as other stakeholders (platform services, executive) insight into the lifecycle of your repository.

What is a Project Lifecycle Badge?

It is a simple image that neatly describes your project's stage in its lifecycle. More information can be found in the project lifecycle badges documentation.

What do I need to do?

I suggest you make a PR into your README.md and add a project lifecycle badge near the top where it is easy for your users to pick it up :). Once it is merged feel free to close this issue. I will not open up a new one :)

Processing: as new data files get generated, upload them to object storage

related to #65

The download step is now complete for both Viirs and Modis data. The next step is to rework how the data processing script step works.

This ticket will:

  1. modify the processing of the data so that immediately after new intermediate data is generated it will get copied up to object storage.
  2. The script will always check what data is in object storage first, and pull it down before generating locally.

Fix: swap downloader back to async

The async code was commented out while debugging the downloading of the satellite data. Swap this back to async using the thread executor if that is straightforward; it should speed up the downloads significantly.
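
A sketch of what the thread-executor version could look like; download_granule is a hypothetical blocking download function:

    # Sketch only: run blocking downloads concurrently with a thread pool.
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def download_all(download_granule, urls, workers=4):
        results = []
        with ThreadPoolExecutor(max_workers=workers) as executor:
            futures = {executor.submit(download_granule, url): url for url in urls}
            for fut in as_completed(futures):
                results.append(fut.result())   # re-raises download errors
        return results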

Modis / Download and process step

Sub ticket of 31...

Rework the download of modis data.

  • downloads and then uploads to object store
  • check ostore first for data, then the source
  • configure persistence to ostore after data is downloaded from the source.

relates to: #44

Detect Data Availability At Start of Pipeline

Currently the pipeline expects the remote VIIRS/MODIS data to be available when it runs. In the event that the data has not been posted, the script fails with an error message that gives no indication that the failure is due to the missing data.

This ticket would add a step to the start of the snowpack script that probes the remote data providers to verify that the data required to run the script is in fact available. If it is not, the script will fail immediately with an error message that clearly indicates why the failure took place.
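
A possible shape for that probe using python-cmr (already a dependency); the product short name and version are examples and would need to match the pipeline's actual query:

    # Sketch only: ask NASA's CMR whether any granules exist for the target
    # window before kicking off the rest of the pipeline.
    from cmr import GranuleQuery

    def data_available(date_from, date_to, short_name="MOD10A1", version="61"):
        query = GranuleQuery().short_name(short_name).version(version)
        query.temporal(date_from, date_to)
        return query.hits() > 0   # fail fast with a clear message when False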

Increase Script / Data reliability

At the moment the script is set up to run for a single date. Sometimes this doesn't work because the MODIS or VIIRS data is not available when the script runs.

This epic will implement the following:

  • when the pipeline runs it will evaluate what data we already have and the data that is available and process based on that configuration
  • Separate the steps taken in the dailypipeline so that if one step fails it doesn't impact other unrelated scripts.
  • Reconfigure scripts so that they can automatically pickup where they last got to.
    • when data is downloaded immediately push it back to object storage.
    • before downloading data check to see if the data exists in ostore first
    • general improvements to make the code more readable (use utility methods with descriptive names instead of file name / date / data hacking using string manipulation, regexes, etc.). Methods should also provide a clear description of what they do, with examples

Dependencies

Epic

Add missing topics

TL;DR

Topics greatly improve the discoverability of repos; please add the short code from the table below to the topics of your repo so that ministries can use GitHub's search to find out what repos belong to them and other visitors can find useful content (and reuse it!).

Why Topic

In short order we'll add our 800th repo. This large number clearly demonstrates the success of using GitHub and our Open Source initiative. This huge success means it's critical that we work to make our content as discoverable as possible. Through discoverability, we promote code reuse across a large decentralized organization like the Government of British Columbia as well as allow ministries to find the repos they own.

What to do

Below is a table of abbreviations, a.k.a. short codes, for each ministry; they're the ones used in all @gov.bc.ca email addresses. Please add the short codes of the ministry or organization that "owns" this repo as a topic.

add a topic

That's it, you're done!!!

How to use

Once topics are added, you can use them in GitHub's search. For example, enter something like org:bcgov topic:citz to find all the repos that belong to Citizens' Services. You can refine this search by adding key words specific to a subject you're interested in. To learn more about searching through repos check out GitHub's doc on searching.

Pro Tip 🤓

  • If your org is not in the list below, or the table contains errors, please create an issue here.

  • While you're doing this, add additional topics that would help someone searching for "something". These can be the language used javascript or R; something like opendata or data for data only repos; or any other key words that are useful.

  • Add a meaningful description to your repo. This is hugely valuable to people looking through our repositories.

  • If your application is live, add the production URL.

Ministry Short Codes

Short Code Organization Name
AEST Advanced Education, Skills & Training
AGRI Agriculture
ALC Agriculture Land Commission
AG Attorney General
MCF Children & Family Development
CITZ Citizens' Services
DBC Destination BC
EMBC Emergency Management BC
EAO Environmental Assessment Office
EDUC Education
EMPR Energy, Mines & Petroleum Resources
ENV Environment & Climate Change Strategy
FIN Finance
FLNR Forests, Lands, Natural Resource Operations & Rural Development
HLTH Health
IRR Indigenous Relations & Reconciliation
JEDC Jobs, Economic Development & Competitiveness
LBR Labour Policy & Legislation
LDB BC Liquor Distribution Branch
MMHA Mental Health & Addictions
MAH Municipal Affairs & Housing
BCPC Pension Corporation
PSA Public Service Agency
PSSG Public Safety and Solicitor General
SDPR Social Development & Poverty Reduction
TCA Tourism, Arts & Culture
TRAN Transportation & Infrastructure

NOTE See an error or omission? Please create an issue here to get it remedied.

Quick fix - adjust dates for snowpack script

The snowpack script is consistently failing due to data not being available. A longer-term fix is described in issue __.

This ticket will simply adjust the time delay so the pipeline is triggered for data that is 6 days old instead of 3. This is a quick, short-term fix to address data unavailability.
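
For illustration, the date offset itself is a one-liner; the 6-day lag value comes from this ticket:

    # Sketch only: compute the --date value for the pipeline with a 6-day lag.
    from datetime import date, timedelta

    def target_date(lag_days=6):
        return (date.today() - timedelta(days=lag_days)).strftime("%Y.%m.%d")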

Object Store Cleanup

Currently there is a lot of duplication in the object storage bucket that is being used for snowpack analysis.

Some of this could be from the original hatfield dump of the data; other duplication is due to a lack of understanding of the data structures when the original implementation of the automated runs was put into place.

Specific items to evaluate:

  • Check the code to see if we can figure out how the following folders are getting populated:

    • modis_terra
    • modis-terra
  • Evaluate the archive process for the snowpack data, i.e. the processes that populate the folder "snowpack_archive". This is where the daily runs of the comparisons between the snowpack analysis for a specific day and the historical norms are stored.

  • We suspect that the data in "snowpack_archive/modis-terra" should actually be going to just the folder "modis-terra".

Refactor / Revise how file Paths are handled

While there is some centralization of the file path configurations in the admin.constants.py module, code in the script frequently builds new paths from the constants provided in that module. Having the path-manipulation code mixed in with the business logic makes it difficult to read and understand.

This ticket would create a path library that centralizes the calculation of paths and directories. Instead of code like:

    if sat == 'modis':
        mosaic = glob(os.path.join(const.INTERMEDIATE_TIF_MODIS, date, 'modis_composite*.tif'))[0]
    elif sat == 'viirs':
        mosaic = glob(os.path.join(const.OUTPUT_TIF_VIIRS, date.split('.')[0], f'{date}.tif'))[0]
    else:
        raise ValueError(f'invalid satellite: {sat}')

Methods would be created to request the paths, example:

mosaic = snow_pathlib.get_intermediate_composite_tifs(sat='modis', date=date)

The long-term objective is to make the code easier to understand and maintain.
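
A sketch of what such a helper could look like; the function name follows the example call above, the import of the constants module is assumed from the module name mentioned earlier, and the error handling is illustrative:

    # Sketch only: centralize the composite-tif path lookup in one place.
    import os
    from glob import glob

    from admin import constants as const  # import path assumed

    def get_intermediate_composite_tifs(sat: str, date: str) -> str:
        if sat == 'modis':
            pattern = os.path.join(const.INTERMEDIATE_TIF_MODIS, date, 'modis_composite*.tif')
        elif sat == 'viirs':
            pattern = os.path.join(const.OUTPUT_TIF_VIIRS, date.split('.')[0], f'{date}.tif')
        else:
            raise ValueError(f'unknown satellite: {sat}')
        matches = glob(pattern)
        if not matches:
            raise FileNotFoundError(f'no composite tif found for {sat} on {date}')
        return matches[0]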

Improve error messages when no granules available

The script retrieves granules of modis imagery; sometimes the data is not available and no data gets downloaded. When that happens, the next process that attempts to mosaic the granules generates an error.

The code should detect this condition and provide a more informative error message like

"no granules found for the current date range... "
or
"the granule directory for the date range (directory here) doesn't contain any information"

Create Data / Process Mapping Documentation

In order to break up the various processes, understanding the relationship between the different parts of the code and the data they produce will be a big help.

This task will attempt to map each of the commands that can be sent to the run.py script to the data that it produces.

Currently this is a list of the commands that run.py supports:

  • build
  • build-kml
  • clean
  • compose-kmls
  • daily-pipeline
  • dbtocsv
  • download
  • plot
  • process
  • process-sentinel
  • run-analysis

Capture Missing Data (Result of daily pipeline failures)

The daily pipeline script runs, as the title would suggest, daily. One of the inputs to the script is the current date. When the script fails for a particular day because data is not available, it creates holes in the historical information.

In order to avoid these holes in the data, this ticket would create a script that evaluates a time window for gaps in the data and then runs the daily-pipeline for the missing days.
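
A minimal sketch of the back-fill loop; find_missing_dates could be the gap check sketched earlier, and run_daily_pipeline is a hypothetical wrapper around the existing daily-pipeline command:

    # Sketch only: run the daily pipeline for every day found to be missing
    # within the evaluation window.
    def backfill(find_missing_dates, run_daily_pipeline, window_days=14):
        for missing in find_missing_dates(window_days):
            run_daily_pipeline(missing.strftime("%Y.%m.%d"))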

Let's use common phrasing

TL;DR 🏎️

Teams are encouraged to favour modern inclusive phrasing both in their communication as well as in any source checked into their repositories. You'll find a table at the end of this text with preferred phrasing to socialize with your team.

Words Matter

We're aligning our development community to favour inclusive phrasing for common technical expressions. There is a table below that outlines the phrases that are being retired along with the preferred alternatives.

During your team scrum, technical meetings, documentation, the code you write, etc. use the inclusive phrasing from the table below. That's it - it really is that easy.

For the curious mind, the Public Service Agency (PSA) has published a guide describing how Words Matter in our daily communication. It's an insightful read and a good reminder to be curious and open-minded.

What about the master branch?

The word "master" is not inherently bad or non-inclusive. For example people get a masters degree; become a master of their craft; or master a skill. It's generally when the word "master" is used along side the word "slave" that it becomes non-inclusive.

Some teams choose to use the word main for the default branch of a repo as opposed to the more commonly used master branch. While it's not required or recommended, your team is empowered to do what works for them. If you do rename the master branch consider using main so that we have consistency among the repos within our organization.

Preferred Phrasing

Non-Inclusive Inclusive
Whitelist => Allowlist
Blacklist => Denylist
Master / Slave => Leader / Follower; Primary / Standby; etc
Grandfathered => Legacy status
Sanity check => Quick check; Confidence check; etc
Dummy value => Placeholder value; Sample value; etc

Pro Tip 🤓

This list is not comprehensive. If you're aware of other outdated nomenclature please create an issue (PR preferred) with your suggestion.

Create GHA to run Daily pipeline

Create a GHA that will:

a) build the conda environment
b) populate the repo with object storage secrets
c) create the env.yaml file required for access to the different servers that provide MODIS/VIIRS data
d) run the daily pipeline

Depends: #36

recalculate 10 year / 20 year historical norms

This has not been run since the data was originally provided by the hatfield team.

Task:

  • find and document the code that is required to regenerate the historical norm data.
  • re-run the calculations for 10y / 20y historical norms

relates: #89 - the code that calculates the historical norms is likely dependent on either the modis-terra or the modis_terra directory, and expects all the data to exist in one of those folders.

It's Been a While Since This Repository has Been Updated

This issue is a kind reminder that your repository has been inactive for 180 days. Some repositories are maintained in accordance with business requirements that infrequently change thus appearing inactive, and some repositories are inactive because they are unmaintained.

To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.

  • If this product is being actively maintained, please close this issue.
  • If this repository isn't being actively maintained anymore, please archive this repository. Also, for bonus points, please add a dormant or retired life cycle badge.

Thank you for your help ensuring effective governance of our open-source ecosystem!

Resolve Snowpack Analysis not running

  • previously adjusted the run time in ticket: ____
  • the script for some reason continues to fail
  • get to the bottom of why it is continuing to fail and resolve

Separate hatfieldcmr egg into its own package

Currently the hatfield code is embedded in this project. To simplify the installation of this code, we recommend creating another repo with the hatfield code and adding an automated build that publishes it to PyPI.

This will simplify the installation, as we will no longer have to install a local egg package.

Assess current state of code

This task is ongoing... it involves reviewing what's required to complete epic #26.

It also documents what's required to get the script running locally in WSL.

Task will describe:

  • how to run the code locally
  • tickets for work that should be completed in order to enable code to run as GHA

Add Data Persistence Step

Create Persistence Script

When the script executes in a GitHub action, all of the data generated is discarded after the script runs.

This task will add code that replicates the data created by the daily pipeline up to object storage.

The current thinking is that this would be a process run after the dailypipeline completes.

Posting Code

Hey @bsmith-hat and @jsuwala, just checking in to see if you guys can post your code here. Even if it's not working / finished, being able to see it will help us with ideas around the best way to get it deployed in our architecture. Please let me know if there is anything I can do to facilitate getting code posted to this repo. Cheers

Create Sentinel Pipeline

There is a sentinel pipeline that is not currently being used.

This ticket would identify:

  • how the sentinel data is being used
  • whether the functionality in the sentinel pipeline is useful
  • if so, how we can automate a sentinel pipeline (and whether it needs to be automated)

Update Logger / Dependencies

The last two scheduled runs for the snowpack analysis have failed.

It looks like the entire process gets hung up when trying to download files; it is not clear why this is happening.

Log messages are using the root logger config. Set up a new config that ties log messages to the files they originate from.

Also update the build environment and the dependencies used by this script to the latest versions available in conda-forge.
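
A minimal sketch of a logging.config setup that ties messages to the module they come from; the format and level shown are examples only:

    # Sketch only: replace the root-logger setup with a dictConfig so each
    # message carries the name of the module that emitted it.
    import logging.config

    LOGGING_CONFIG = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "default": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"},
        },
        "handlers": {
            "console": {"class": "logging.StreamHandler", "formatter": "default"},
        },
        "root": {"level": "INFO", "handlers": ["console"]},
    }

    logging.config.dictConfig(LOGGING_CONFIG)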
