
nr-rfc-processing's Introduction

Lifecycle: Stable

Credential YAML

This YAML file is passed via the --envpth argument so that MODIS/VIIRS/Sentinel-2 data can be downloaded. It must be located inside the mounted volume so the internal processes can access the credentials.

--envpth /data/<credential>.yml
YAML
# register at https://urs.earthdata.nasa.gov/home
EARTHDATA_USER: username_without_quotes
EARTHDATA_PASS: password_without_quotes

# register at https://scihub.copernicus.eu/dhus/#/self-registration 
SENTINELSAT_USER: username_without_quotes
SENTINELSAT_PASS: password_without_quotes
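
For reference, a minimal sketch of how a process inside the container might read this file, assuming PyYAML (the repo's actual loader may differ):

    # Sketch only: load the credential YAML passed via --envpth and expose
    # the keys as environment variables for downstream download code.
    import os
    import yaml  # assumes PyYAML is installed in the container

    def load_credentials(envpth="/data/creds.yml"):
        with open(envpth, "r") as fh:
            creds = yaml.safe_load(fh)
        for key, value in creds.items():
            os.environ[key] = str(value)
        return creds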

MODIS/VIIRS Pipeline

Docker

Building Docker Image

To build the image from this directory:

docker build -t <tagname> .

Running Docker Image

To run the Docker image use the following schema:

docker run --rm -v <local store>:/data <tagname> <extra_commands>

Input Formats

ARG VALUES TYPE
envpth Linux path string
date YYYY.MM.DD string
sat modis / viirs / sentinel string
typ watersheds / basins string
days 1 / 5 / 8 int

Docker Options

OPTION EFFECT
--rm Removes container once it finishes running
-v Mount volume to Docker container

Help

The default CMD is "--help", so running the image with no extra commands lists the available commands:

docker run --rm -v <mount_point>:/data <tag_name>

FAQ

Why is there an authentication error when downloading MODIS or VIIRS data?

Answer

An HTTP 503 error during authentication is expected behaviour from the NASA servers when downloading large amounts of data. Be patient; the download will retry and continue once the service becomes available again.

Daily-Pipeline

docker run --rm -v <mount_point>:/data <tag_name> daily-pipeline --envpth /data/<creds.yml> --date <target_date: YYYY.MM.DD>

daily-pipeline kicks off the entire process chain, which performs the following per satellite (MODIS/VIIRS):

  • Build up directory structure and supporting files
  • Download raw granule files
  • Process raw granules to GTiff formats
  • Calculate snow coverage per watershed/basin
  • Build KML of watersheds/basins
  • Clean up intermediate files

Build Directory Structure

Builds necessary supporting files and directories in order for the process pipeline to properly manage file I/O.

docker run --rm -v <mount_point>:/data <tag_name> build

High Level Directory Structure:

  • /data
    • /analysis
    • /basins
    • /intermediate_kml
    • /intermediate_tif
    • /kml
    • /modis-terra
    • /output_tif
    • /plot
    • /watersheds

Download

MODIS requires 5 or 8 days of data in order to build a composite of valid observations; downloading a single day is also possible. VIIRS downloads a single day, as it is a cloud-gap-filled product.

docker run --rm -v <mount_point>:/data <tag_name> download --envpth /data/<creds.yml> --sat <modis/viirs> --date <YYYY.MM.DD> --days <1/5/8>

Output:

  • Raw granules: modis-terra/<product>/<date>/.
    • MODIS: MOD10A1.006
    • VIIRS: VNP10A1F.001

Process

MODIS requires 5 or 8 days of data in order to build a composite of valid observations. The default is 5 days.

docker run --rm -v <mount_point>:/data <tag_name> process --sat <modis/viirs> --date <YYYY.MM.DD> --days <1/5/8>

Output:

  • Clipped watershed/basin GTiff: <watershed/basin>/<name>/<satellite>/<date>/.
    • EPSG:4326 -> needed for KML
    • EPSG:3153 -> BC Albers projection GTiff

Calculate Snow Coverage

Analyze each watershed and basin to calculate the snow coverage based on the NDSI value.

docker run --rm -v <mount_point>:/data <tag_name> run-analysis --typ <watersheds/basins> --sat <modis/viirs> --date <YYYY.MM.DD>

Output:

  • SQLITE3 database: analysis/analysis.db

Database To CSV

Convert the SQLITE3 database into a CSV

docker run --rm -v <mount_point>:/data <tag_name> dbtocsv

Output:

  • CSV: analysis/analysis.csv

Build KMLs and Colour Ramped GTiffs

Build the colour-ramped GTiff and KML versions of the watersheds/basins.

docker run --rm -v <mount_point>:/data <tag_name> build-kml --date <YYYY.MM.DD> --typ <watersheds/basins> --sat <modis/viirs>

Output:

  • Colourized GTiffs: <watersheds/basins>/<name>/<satellite>/<date>/.
  • KML: kml/<date>/

Compose KMLs

Compose the built KML files into a hierarchical KML.

docker run --rm -v <mount_point>:/data <tag_name> compose-kmls --date <YYYY.MM.DD> --sat <modis/viirs>

Zip KMLs

Compress the KMLs into a single ZIP file.

docker run --rm -v <mount_point>:/data <tag_name> zip-kmls

KNOWN ISSUE: the ZIP file is larger than the original KMLs -- deprecated.

Plot

Plot all watersheds and basins as PNG plots with a mapped colour bar.

docker run --rm -v <mount_point>:/data <tag_name> plot --date <YYYY.MM.DD> --sat <modis/viirs>

Clean

Manually clean up files and directories.

TARGET EFFECT
all Non-vital directories in /data
intermediate Intermediate files in intermediate_[tif/kml]
output Output files in output_tif
downloads Raw granules in modis-terra/
watersheds All files/dirs in watersheds/
basins All files/dirs in basins/

docker run --rm -v <mount_point>:/data <tag_name> clean --target <target>

Sentinel-2 Pipeline

Docker

Building Docker Image

To build the image from this directory:

docker build -t <tagname> .

Running Docker Image

To run the Docker image use the following schema:

docker run --rm -it -v <local store>:/data <tagname> <extra_commands>

It is necessary to run the docker container in interactive mode (by including the -it option when calling docker run) as the Sentinel-2 process requires user interaction.

Process-Sentinel

Call the Sentinel-2 pipeline

docker run --rm -it -v <mount point>:/data <tag name> process-sentinel --creds /data/<creds.yml> --lat <latitude> --lng <longitude> --date <YYYY.MM.DD>

The pipeline will return a date-ordered list of up to 10 candidate products to choose from.

OPTIONAL ARGS VALUES DEFAULT
--rgb true / false false
--max-allowable-cloud int 50
--force-download true / false false
--day-tolerance int 50

Outputs are logged to a log file in /data/log/.

Argument Details:

  • Latitude and longitude are WGS84 floats (-90 <= lat <= +90, -180 <= lng <= +180).
  • --rgb true : Creates an RGB output GTiff of the selected tile
  • --max-allowable-cloud <int> : Maximum percentage of cloud cover allowed in the query. Default is 50%.
  • --force-download true : Deletes existing downloads in the target directory
  • --day-tolerance <int> : Number of days to look back from the target date within the given cloud-cover constraint. Default is 50 days.

Example 0

Using default arguments to demonstrate expected output.

docker run --rm -it -v <mount point>:/data <tag name> process-sentinel --creds /data/sat.yml --lat 49.12 --lng -126.5  --date 2021.03.18

0 : DATE: 2021-03-17 || CLOUD%:  45.588648
1 : DATE: 2021-03-14 || CLOUD%:  17.49127
2 : DATE: 2021-03-07 || CLOUD%:  39.499657
3 : DATE: 2021-02-05 || CLOUD%:  16.117952
4 : DATE: 2021-01-31 || CLOUD%:  23.281466
5 : DATE: 2021-01-28 || CLOUD%:  2.606739
Pick which product to download and process [0-5/n]:

Example 1

Calling with a high cloud tolerance to demonstrate the upper limit of the selection list and typical output.

docker run --rm -it -v <mount point>:/data <tag name> process-sentinel --creds /data/sat.yml --lat 49.73 --lng -126.5  --date 2021.03.18 --rgb true --max-allowable-cloud 90

0 : DATE: 2021-03-17 || CLOUD%:  32.746968
1 : DATE: 2021-03-14 || CLOUD%:  87.006531
2 : DATE: 2021-03-09 || CLOUD%:  45.446411
3 : DATE: 2021-03-07 || CLOUD%:  74.669036
4 : DATE: 2021-02-25 || CLOUD%:  84.000973
5 : DATE: 2021-02-22 || CLOUD%:  79.523277
6 : DATE: 2021-02-17 || CLOUD%:  56.401063
7 : DATE: 2021-02-15 || CLOUD%:  74.093911
8 : DATE: 2021-02-10 || CLOUD%:  32.880395
9 : DATE: 2021-02-07 || CLOUD%:  75.293885
Pick which product to download and process [0-9/n]: 0

Downloading: 100%|██████████| 1.09G/1.09G [03:30<00:00, 5.19MB/s]
MD5 checksumming: 100%|██████████| 1.09G/1.09G [00:04<00:00, 231MB/s]

Notice the clouds and water are masked out from the analysis.

Sentinel-2 Example 1 RGB Output

Example 2

Calling with low cloud tolerance to demonstrate limited selection.

docker run --rm -it -v <mount point>:/data <tag name> process-sentinel --creds /data/sat.yml --lat 49.12 --lng -126.5 --date 2021.03.18 --max-allowable-cloud 20

0 : DATE: 2021-03-14 || CLOUD%:  17.49127
1 : DATE: 2021-02-05 || CLOUD%:  16.117952
2 : DATE: 2021-01-28 || CLOUD%:  2.606739
Pick which product to download and process [0-2/n]:

nr-rfc-processing's People

Contributors

bcgov-devops, bnjam, derekroberts, frantarkenton, renovate-bot, repo-mountie[bot]


Forkers

jsuwala webgismd

nr-rfc-processing's Issues

Retrieve data provider secrets from env vars

Currently the script expects a yaml file from which it retrieves the environment variables required to download data.

All secrets should be passed to the scripts via environment variables.

This ticket will rework the code so that the parameters currently defined in the yaml file are defined and retrieved through environment variables.
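
A minimal sketch of the environment-variable approach, using the key names from the credential YAML described above; the helper itself is illustrative, not the repo's actual code:

    # Sketch only: read provider secrets from environment variables instead
    # of the legacy YAML file, failing loudly if any are missing.
    import os

    REQUIRED_VARS = ["EARTHDATA_USER", "EARTHDATA_PASS",
                     "SENTINELSAT_USER", "SENTINELSAT_PASS"]

    def get_secrets():
        missing = [name for name in REQUIRED_VARS if name not in os.environ]
        if missing:
            raise EnvironmentError(f"missing required environment variables: {missing}")
        return {name: os.environ[name] for name in REQUIRED_VARS}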

Integrate All Steps with Object Storage

When the script is run, we want it to be able to pick up from previous failures (see the sketch after this list).

  • Immediately after data is downloaded by the download process, it is copied to object storage
  • When the download process starts, it attempts to retrieve data from object storage first before going to the official sources
  • When processing starts, it determines what needs to be done based on what data exists in object storage
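
A rough sketch of that check-then-download pattern; the ostore helper names (exists, download, upload) are hypothetical placeholders, not the real nr-objstore-util API:

    # Sketch only: prefer local disk, then object storage, then the official
    # source, and persist to object storage immediately after downloading.
    import os

    def fetch_granule(ostore, object_name, local_path, download_from_source):
        if os.path.exists(local_path):
            return local_path                      # already on disk
        if ostore.exists(object_name):             # hypothetical helper
            ostore.download(object_name, local_path)
            return local_path
        download_from_source(local_path)           # fall back to the official source
        ostore.upload(local_path, object_name)     # persist right after download
        return local_path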

Create Object Storage Utility Module

In order to get the snowpack script running again, we need the code to be able to communicate with the object store via its API, rather than expecting the data to exist locally.

Code already exists to facilitate this task; however, in order to use it properly it should simply be declared like any other dependency.

This task will take the code from this repo (https://github.com/franTarkenton/nr-objectstore-util) and:

  • transfer to bcgov repo
  • add autobuild github action that will create the package in pypi
  • update documentation

Modify Daily pipeline so fills in missing data for a given time window

Currently the script is set up to run on a daily basis using the current date.

The issue with this approach is that data is not always available for the current date; sometimes there can be a 3-4 day lag before data becomes available.

Changes:

  • Instead of running just a specific date, the script will check what data exists in the object store bucket against the data that is currently available from the providers. It will then run the pipeline for consecutive days until all of the currently available data has been processed (see the sketch below).
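
A small sketch of that gap check; list_processed_dates is a hypothetical helper that would list the dates already present in the object store bucket:

    # Sketch only: compare the dates that should exist in a recent window
    # against the dates already processed, and return the missing ones.
    from datetime import date, timedelta

    def missing_dates(list_processed_dates, window_days=10):
        today = date.today()
        wanted = {today - timedelta(days=n) for n in range(window_days)}
        have = set(list_processed_dates())   # e.g. dates already in ostore
        return sorted(wanted - have)         # run the pipeline for these days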

Fix Viirs processing - load object store data as required

Similar to how the MODIS data is currently being handled, we need to modify how the VIIRS processing takes place:

  • move the path config to the path config module
  • use the ostore module to load the object store data on an as-needed basis.

Implement timeout on async functions

When the processing (dailypipeline) is run, it sometimes hangs. This is an example of the logs from the script running in GHA:

2023-04-26T14:18:48.3178324Z 2023-04-26 07:18:48 - 172 - hatfieldcmr.earthdata - INFO - downloading file https://n5eil01u.ecs.nsidc.org/DP4/MOST/MOD10A1.061/2023.04.21/MOD10A1.A2023111.h20v01.061.2023113034230.hdf.xml
2023-04-26T14:18:49.3100898Z 2023-04-26 07:18:49 - 186 - hatfieldcmr.earthdata - INFO - saving granule modis-terra/MOD10A1.061/2023.04.21/MOD10A1.A2023111.h20v01.061.2023113034230.hdf.xml
2023-04-26T14:18:49.3102653Z 2023-04-26 07:18:49 - 192 - hatfieldcmr.earthdata - INFO - fetched and uploaded file modis-terra/MOD10A1.061/2023.04.21/MOD10A1.A2023111.h20v01.061.2023113034230.hdf.xml from https://n5eil01u.ecs.nsidc.org/DP4/MOST/MOD10A1.061/2023.04.21/MOD10A1.A2023111.h20v01.061.2023113034230.hdf.xml
2023-04-26T20:08:12.1424245Z ##[error]The operation was canceled.
2023-04-26T20:08:12.1483953Z Post job cleanup.

Note the timestamps jump from 14:18:49 to 20:08:12, with no log messages in between suggesting that a process is stuck waiting.

Setting a 5-minute timeout would be more than reasonable for the process that is getting hung up.
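
A minimal sketch of how a five-minute bound could be applied with asyncio; fetch is a hypothetical download coroutine:

    # Sketch only: wrap a single download in a timeout so a stuck connection
    # cannot hang the whole pipeline run.
    import asyncio

    async def download_with_timeout(fetch, url, timeout_s=300):
        try:
            return await asyncio.wait_for(fetch(url), timeout=timeout_s)
        except asyncio.TimeoutError:
            # log and let the caller decide whether to retry or skip the granule
            raise RuntimeError(f"download of {url} timed out after {timeout_s}s")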

Make script idempotent

When the script runs, it should treat the object store as the source of truth. This task will:

  • break up the daily pipeline into smaller components.
    • each satellite will run as a separate pipeline,
    • each step in the satellite-specific pipeline will be broken into smaller pieces, reflected as separate steps in the github action. Each of these steps should first check ostore for data and pull it locally; if it's not available there, go to the source.
    • steps:
      • build AOIs - create the basin/watershed directories locally and extract the data from the shapefiles
      • download - download the data - convert into a two-way download and then push to ostore
      • process - process the data - push changes as data is processed. Ideally make this async
      • calculate stats - research spike; look into how this works and the best way to persist the results (we think it's getting saved to a SQL db)
      • daily_kml - drop this step for now as it's not being used
      • plot - again, create and cache at the same time

Document Steps to run Locally

There is a readme for how to run using Docker. Using both the Dockerfile and the existing readme, create docs on how to set up a local dev environment for working on this code.

Get Hatfield snowpack running again

  • quick fix, whatever is required to make this work again

  • Originally this was running using geodrive, scheduled through Jenkins. Because the files in object storage are named using mixed case, we cannot use the previous geodrive solution.

Description of Task

  • get script running through github actions
  • add logic to the code to pull the files it needs from object store when / if they are required (just in time data load)
  • refine the object store module so it can be re-used by other projects easily
  • fix the logging config to use logging.config
  • Add the github actions to complete build / run of the code

Dependencies

Epic

Pull norm data on an as needed basis

When comparing 10-year and 20-year norms, the script should pull the data from object storage on an as-needed basis. It shouldn't expect the data to simply be available on a mounted drive.

Set Objects to PUBLIC / READ when uploaded

When uploading objects to object storage, set them to public/read permissions.

This ticket will also require creating a script that sets all of the existing objects to public read.

This is required in order to properly deploy the frontend: if the objects are not set to public read, they will not show up in the view.
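
One way to do this with the minio client (already a dependency of this repo) is to apply an anonymous read-only bucket policy; this is a sketch rather than the repo's implementation, and the bucket name is a placeholder:

    # Sketch only: allow anonymous GetObject on everything in the bucket.
    import json
    from minio import Minio

    def make_bucket_public(client: Minio, bucket: str):
        policy = {
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Principal": {"AWS": ["*"]},
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            }],
        }
        client.set_bucket_policy(bucket, json.dumps(policy))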

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Rate-Limited

These updates are currently rate-limited. Click on a checkbox below to force their creation now.

  • chore(deps): update dependency pylint to v3
  • chore(deps): update github actions all dependencies (major) (actions/checkout, docker/build-push-action, docker/login-action, ubuntu)
  • 🔐 Create all rate-limited PRs at once 🔐

Pending Status Checks

These updates await pending status checks. To force their creation now, click the checkbox below.

  • chore(deps): update all non-major dependencies (black, continuumio/miniconda3, flake8, flake8-docstrings, mambaorg/micromamba, minio, pylint, pytest, python-cmr)

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

dockerfile
Dockerfile
  • mambaorg/micromamba 1.4.9
dockerfile.hatfield
  • continuumio/miniconda3 4.9.2-alpine
  • ubuntu 20.04
github-actions
.github/workflows/pr_open.yaml
  • mathieudutour/github-tag-action v6.1
  • actions/checkout v3
  • docker/login-action v2
  • docker/build-push-action v4
  • ubuntu 22.04
  • ubuntu 22.04
  • ubuntu 22.04
.github/workflows/run_daily_pipeline_cron_mamba.yaml
  • actions/checkout v3
  • ubuntu 22.04
.github/workflows/run_pipeline_auto_fill_data.yaml
  • actions/checkout v3
  • mamba-org/setup-micromamba v1
  • ubuntu 20.04
.github/workflows/test_run_container_image.yaml
  • actions/checkout v3
  • docker/login-action v2
  • ubuntu 22.04
pip_requirements
hatfieldcmr/requirements.txt
  • scandir ==1.10.0
  • python-dotenv ==1.0.0
  • minio ==7.1.14
  • python-cmr ==0.7.0
requirements-dev.txt
  • flake8 ==6.0.0
  • black ==23.1.0
  • pytest ==7.3.1
requirements.txt
  • nr-objstore-util ==0.10.0
  • python-cmr ==0.7.0
snowpack_archive/requirements-dev.txt
  • pylint ==2.8.2
  • flake8 ==3.9.2
  • black ==21.5b1
  • flake8-docstrings ==1.6.0


Speedup Object Storage Uploads (Async/Multiprocess)

Currently the upload to object storage runs on a single thread. Upload time could be significantly reduced by making the upload process async.

See download_granules.py for an example of how to do that; it uses the multiprocess module. A similar approach could be used for syncing to object storage.
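
A minimal sketch of that idea using the standard-library multiprocessing pool (the issue refers to the multiprocess module used in download_granules.py); upload_one is a hypothetical wrapper around the object storage client:

    # Sketch only: fan the uploads out across worker processes.
    from multiprocessing import Pool

    def upload_one(path):
        ...  # push a single file to object storage (placeholder)

    def upload_all(paths, workers=8):
        with Pool(processes=workers) as pool:
            pool.map(upload_one, paths)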

Add project lifecycle badge

No Project Lifecycle Badge found in your readme!

Hello! I scanned your readme and could not find a project lifecycle badge. A project lifecycle badge will provide contributors to your project as well as other stakeholders (platform services, executive) insight into the lifecycle of your repository.

What is a Project Lifecycle Badge?

It is a simple image that neatly describes your project's stage in its lifecycle. More information can be found in the project lifecycle badges documentation.

What do I need to do?

I suggest you make a PR into your README.md and add a project lifecycle badge near the top where it is easy for your users to pick it up :). Once it is merged feel free to close this issue. I will not open up a new one :)

Processing: as new data files get generated, upload them to object storage

related to #65

The download step is now complete for both Viirs and Modis data. The next step is to rework how the data processing script step works.

This ticket will:

  1. modify the processing of the data so that immediately after new intermediate data is generated it will get copied up to object storage.
  2. The script will always check what data is in object storage first, and pull it down before generating locally.

Fix: swap downloader back to async

The async code was commented out while debugging the downloading of the satellite data. Swap this back to async using the thread executor if that is straightforward; it should speed up the downloads significantly.
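
A sketch of what the thread-executor version could look like; download_granule is a hypothetical blocking download function:

    # Sketch only: run blocking downloads concurrently with a thread pool.
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def download_all(download_granule, urls, workers=4):
        results = []
        with ThreadPoolExecutor(max_workers=workers) as executor:
            futures = {executor.submit(download_granule, url): url for url in urls}
            for fut in as_completed(futures):
                results.append(fut.result())   # re-raises download errors
        return results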

Modis / Download and process step

Sub ticket of 31...

Rework the download of modis data.

  • downloads and then uploads to object store
  • check ostore first for data, then the source
  • configure persistence to ostore after data is downloaded from the source.

relates to: #44

Detect Data Availability At Start of Pipeline

Currently the pipeline expects the remote VIIRS/MODIS data to be available when it runs. In the event that the data has not been posted, the script fails with an error message that gives no indication that the failure is due to the missing data.

This ticket would add a step to the start of the snowpack script that probes the remote data providers to verify that the data required to run the script is in fact available. If it is not, the script will fail immediately with an error message that clearly indicates why the failure took place.
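
A possible shape for that probe using python-cmr (already a dependency); the product short name and version are examples and would need to match the pipeline's actual query:

    # Sketch only: ask NASA's CMR whether any granules exist for the target
    # window before kicking off the rest of the pipeline.
    from cmr import GranuleQuery

    def data_available(date_from, date_to, short_name="MOD10A1", version="61"):
        query = GranuleQuery().short_name(short_name).version(version)
        query.temporal(date_from, date_to)
        return query.hits() > 0   # fail fast with a clear message when False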

Increase Script / Data reliability

At the moment the script is set up to run for a single date. Sometimes this doesn't work because the MODIS or VIIRS data is not available when the script runs.

This epic will implement the following:

  • when the pipeline runs it will evaluate what data we already have and the data that is available and process based on that configuration
  • Separate the steps taken in the dailypipeline so that if one step fails it doesn't impact other unrelated scripts.
  • Reconfigure scripts so that they can automatically pickup where they last got to.
    • when data is downloaded immediately push it back to object storage.
    • before downloading data check to see if the data exists in ostore first
    • general improvements to make the code more readable (use utility methods with descriptive names instead of file name / date / data hacking using string manipulation, regexes, etc.). Methods should also provide a clear description of what they do, with examples

Dependencies

Epic

Add missing topics

TL;DR

Topics greatly improve the discoverability of repos; please add the short code from the table below to the topics of your repo so that ministries can use GitHub's search to find out what repos belong to them and other visitors can find useful content (and reuse it!).

Why Topic

In short order we'll add our 800th repo. This large number clearly demonstrates the success of using GitHub and our Open Source initiative. This huge success means it's critical that we work to make our content as discoverable as possible. Through discoverability, we promote code reuse across a large decentralized organization like the Government of British Columbia as well as allow ministries to find the repos they own.

What to do

Below is a table of abbreviations, a.k.a. short codes, for each ministry; they're the ones used in all @gov.bc.ca email addresses. Please add the short codes of the ministry or organization that "owns" this repo as a topic.

add a topic

That's it, you're done!!!

How to use

Once topics are added, you can use them in GitHub's search. For example, enter something like org:bcgov topic:citz to find all the repos that belong to Citizens' Services. You can refine this search by adding key words specific to a subject you're interested in. To learn more about searching through repos check out GitHub's doc on searching.

Pro Tip 🤓

  • If your org is not in the list below, or the table contains errors, please create an issue here.

  • While you're doing this, add additional topics that would help someone searching for "something". These can be the language used javascript or R; something like opendata or data for data only repos; or any other key words that are useful.

  • Add a meaningful description to your repo. This is hugely valuable to people looking through our repositories.

  • If your application is live, add the production URL.

Ministry Short Codes

Short Code Organization Name
AEST Advanced Education, Skills & Training
AGRI Agriculture
ALC Agriculture Land Commission
AG Attorney General
MCF Children & Family Development
CITZ Citizens' Services
DBC Destination BC
EMBC Emergency Management BC
EAO Environmental Assessment Office
EDUC Education
EMPR Energy, Mines & Petroleum Resources
ENV Environment & Climate Change Strategy
FIN Finance
FLNR Forests, Lands, Natural Resource Operations & Rural Development
HLTH Health
IRR Indigenous Relations & Reconciliation
JEDC Jobs, Economic Development & Competitiveness
LBR Labour Policy & Legislation
LDB BC Liquor Distribution Branch
MMHA Mental Health & Addictions
MAH Municipal Affairs & Housing
BCPC Pension Corporation
PSA Public Service Agency
PSSG Public Safety and Solicitor General
SDPR Social Development & Poverty Reduction
TCA Tourism, Arts & Culture
TRAN Transportation & Infrastructure

NOTE See an error or omission? Please create an issue here to get it remedied.

Quick fix - adjust dates for snowpack script

The snowpack script is consistently failing due to data not being available. A longer-term fix is described in issue __.

This ticket will simply adjust the time delay so the pipeline is triggered for data that is 6 days old instead of 3. This is a quick, short-term fix to address data unavailability.
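
For illustration, the date offset itself is a one-liner; the 6-day lag value comes from this ticket:

    # Sketch only: compute the --date value for the pipeline with a 6-day lag.
    from datetime import date, timedelta

    def target_date(lag_days=6):
        return (date.today() - timedelta(days=lag_days)).strftime("%Y.%m.%d")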

Object Store Cleanup

Currently there is a lot of duplication in the object storage bucket that is being used for snowpack analysis.

Some of this could be from the original hatfield dump of the data; other duplication is due to a lack of understanding of the data structures when the original implementation of the automated runs was put into place.

Specific items to evaluate:

  • Check the code to see if we can figure out how the following folders are getting populated:

    • modis_terra
    • modis-terra
  • Evaluate the archive process for the snowpack data, i.e. the processes that populate the folder "snowpack_archive". This is where the daily runs of the comparisons between the snowpack analysis for a specific day and the historical norms are stored.

  • We suspect that the data in "snowpack_archive/modis-terra" should actually be going to just the folder "modis-terra".

Refactor / Revise how file Paths are handled

While there is some centralization of the file path configurations in the admin.constants.py module, code in the script frequently builds new paths from the constants provided in that module. Having the path-manipulation code mixed in with the business logic makes it difficult to read and understand.

This ticket would create a path library that centralizes the calculation of paths and directories. Instead of code like:

    if sat == 'modis':
        mosaic = glob(os.path.join(const.INTERMEDIATE_TIF_MODIS, date, 'modis_composite*.tif'))[0]
    elif sat == 'viirs':
        mosaic = glob(os.path.join(const.OUTPUT_TIF_VIIRS, date.split('.')[0], f'{date}.tif'))[0]
    else:
        raise ValueError(f'invalid satellite: {sat}')

Methods would be created to request the paths, example:

mosaic = snow_pathlib.get_intermediate_composite_tifs(sat='modis', date=date)

The long-term objective is to make the code easier to understand and maintain.
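
A sketch of what such a helper could look like; the function name follows the example call above, the import of the constants module is assumed from the module name mentioned earlier, and the error handling is illustrative:

    # Sketch only: centralize the composite-tif path lookup in one place.
    import os
    from glob import glob

    from admin import constants as const  # import path assumed

    def get_intermediate_composite_tifs(sat: str, date: str) -> str:
        if sat == 'modis':
            pattern = os.path.join(const.INTERMEDIATE_TIF_MODIS, date, 'modis_composite*.tif')
        elif sat == 'viirs':
            pattern = os.path.join(const.OUTPUT_TIF_VIIRS, date.split('.')[0], f'{date}.tif')
        else:
            raise ValueError(f'unknown satellite: {sat}')
        matches = glob(pattern)
        if not matches:
            raise FileNotFoundError(f'no composite tif found for {sat} on {date}')
        return matches[0]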

Improve error messages when no granules available

The script retrieves granules of modis imagery; sometimes the data is not available and no data gets downloaded. When that happens, the next process that attempts to mosaic the granules generates an error.

The code should detect this condition and provide a more informative error message like

"no granules found for the current date range... "
or
"the granule directory for the date range (directory here) doesn't contain any information"

Create Data / Process Mapping Documentation

In order to break up the various processes, understanding the relationship between the different parts of the code and the data they produce will be a big help.

This task will attempt to map each of the commands that can be sent to the run.py script to the data that it produces.

Currently this is a list of the commands that run.py supports:

  • build
  • build-kml
  • clean
  • compose-kmls
  • daily-pipeline
  • dbtocsv
  • download
  • plot
  • process
  • process-sentinel
  • run-analysis

Capture Missing Data (Result of daily pipeline failures)

The daily pipeline script runs, as the title would suggest, daily. One of the inputs to the script is the current date. When the script fails for a particular day because data is not available, it creates holes in the historical information.

In order to avoid these holes in the data, this ticket would create a script that evaluates a time window for gaps in the data and then runs the daily-pipeline for the missing days.
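
A minimal sketch of the back-fill loop; find_missing_dates could be the gap check sketched earlier, and run_daily_pipeline is a hypothetical wrapper around the existing daily-pipeline command:

    # Sketch only: run the daily pipeline for every day found to be missing
    # within the evaluation window.
    def backfill(find_missing_dates, run_daily_pipeline, window_days=14):
        for missing in find_missing_dates(window_days):
            run_daily_pipeline(missing.strftime("%Y.%m.%d"))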

Let's use common phrasing

TL;DR 🏎️

Teams are encouraged to favour modern inclusive phrasing both in their communication as well as in any source checked into their repositories. You'll find a table at the end of this text with preferred phrasing to socialize with your team.

Words Matter

We're aligning our development community to favour inclusive phrasing for common technical expressions. There is a table below that outlines the phrases that are being retired along with the preferred alternatives.

During your team scrum, technical meetings, documentation, the code you write, etc. use the inclusive phrasing from the table below. That's it - it really is that easy.

For the curious mind, the Public Service Agency (PSA) has published a guide describing how Words Matter in our daily communication. It's an insightful read and a good reminder to be curious and open-minded.

What about the master branch?

The word "master" is not inherently bad or non-inclusive. For example people get a masters degree; become a master of their craft; or master a skill. It's generally when the word "master" is used along side the word "slave" that it becomes non-inclusive.

Some teams choose to use the word main for the default branch of a repo as opposed to the more commonly used master branch. While it's not required or recommended, your team is empowered to do what works for them. If you do rename the master branch consider using main so that we have consistency among the repos within our organization.

Preferred Phrasing

Non-Inclusive Inclusive
Whitelist => Allowlist
Blacklist => Denylist
Master / Slave => Leader / Follower; Primary / Standby; etc
Grandfathered => Legacy status
Sanity check => Quick check; Confidence check; etc
Dummy value => Placeholder value; Sample value; etc

Pro Tip 🤓

This list is not comprehensive. If you're aware of other outdated nomenclature please create an issue (PR preferred) with your suggestion.

Create GHA to run Daily pipeline

Create a GHA that will:

a) build the conda environment
b) populate the repo with object storage secrets
c) create the env.yaml file required for access to the different servers that provide MODIS/VIIRS data
d) run the daily pipeline

Depends: #36

recalculate 10 year / 20 year historical norms

This has not been run since the data was originally provided by the hatfield team.

Task:

  • find and document the code that is required to regenerate the historical norm data.
  • re-run the calculations for 10y / 20y historical norms

relates: #89 - the code that calculates the historical norms is likely dependent on either the modis-terra or the modis_terra directory, and expects all the data to exist in one of those folders.

It's Been a While Since This Repository has Been Updated

This issue is a kind reminder that your repository has been inactive for 180 days. Some repositories are maintained in accordance with business requirements that infrequently change thus appearing inactive, and some repositories are inactive because they are unmaintained.

To help differentiate products that are unmaintained from products that do not require frequent maintenance, repomountie will open an issue whenever a repository has not been updated in 180 days.

  • If this product is being actively maintained, please close this issue.
  • If this repository isn't being actively maintained anymore, please archive this repository. Also, for bonus points, please add a dormant or retired life cycle badge.

Thank you for your help ensuring effective governance of our open-source ecosystem!

Resolve Snowpack Analysis not running

  • previously adjusted the run time in ticket: ____
  • the script for some reason continues to fail
  • get to the bottom of why it is continuing to fail and resolve

Separate hatfieldcmr egg into its own package

Currently the hatfield code is embedded in this project. To simplify the installation of this code, we recommend creating another repo with the hatfield code and adding an automated build that publishes it to PyPI.

This will simplify the installation, as we will no longer have to install a local egg package.

Assess current state of code

This task is ongoing... it involves reviewing what's required to complete epic #26.

It also documents what's required to get the script running locally in WSL.

Task will describe:

  • how to run the code locally
  • tickets for work that should be completed in order to enable code to run as GHA

Add Data Persistence Step

Create Persistence Script

When the script executes in a GitHub action, all of the data generated is discarded after the script runs.

This task will add code that replicates the data created by the daily pipeline up to object storage.

The current thinking is that this would be a process run after the dailypipeline completes.

Posting Code

Hey @bsmith-hat and @jsuwala, just checking in to see if you guys can post your code here. Even if it's not working / finished, being able to see it will help us with ideas around the best way to get it deployed in our architecture. Please let me know if there is anything I can do to facilitate getting code posted to this repo. Cheers

Create Sentinel Pipeline

There is a sentinel pipeline that is not currently being used.

This ticket would identify:

  • how the sentinel data is being used
  • whether the functionality in the sentinel pipeline is useful
  • if so, how we can automate a sentinel pipeline (and whether it needs to be automated)

Update Logger / Dependencies

The last two scheduled runs for the snowpack analysis have failed.

It looks like the entire process gets hung up when trying to download files; it is not clear why this is happening.

Log messages are using the root logger config. Set up a new config that ties log messages to the files they originate from.

Also update the build environment and the dependencies used by this script to the latest versions available in conda-forge.
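
A minimal sketch of a logging.config setup that ties messages to the module they come from; the format and level shown are examples only:

    # Sketch only: replace the root-logger setup with a dictConfig so each
    # message carries the name of the module that emitted it.
    import logging.config

    LOGGING_CONFIG = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "default": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"},
        },
        "handlers": {
            "console": {"class": "logging.StreamHandler", "formatter": "default"},
        },
        "root": {"level": "INFO", "handlers": ["console"]},
    }

    logging.config.dictConfig(LOGGING_CONFIG)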
