ciroh-ua / ngiab-cloudinfra

This project was forked from alabamawaterinstitute/cloudinfra


NextGen In A Box: NextGen Generation Water Modeling Framework for Community Release (Docker version)

Home Page: https://docs.ciroh.org/docs/products/nextgeninaboxDocker/

Languages: Shell 71.85%, HCL 22.31%, Dockerfile 5.84%
Topics: catchment, nexus, ngen, realizations, forcings, t-route, drought, flooding, nextgen

ngiab-cloudinfra's Introduction

NextGen In A Box (NGIAB)

Run the NextGen National Water Resources Modeling Framework locally with ease.

NGIAB provides a containerized and user-friendly solution for running the NextGen framework, allowing you to control inputs, configurations, and execution on your local machine.

Funding for this project was provided by the National Oceanic & Atmospheric Administration (NOAA), awarded to the Cooperative Institute for Research to Operations in Hydrology (CIROH) through the NOAA Cooperative Agreement with The University of Alabama (NA22NWS4320003).

CI status badges: ARM Build and push final image | x86 Build and push final image

Why NextGen In A Box?

  • Run NextGen Locally: Experiment with the framework and customize configurations on your local machine.
  • Control Over Inputs: Choose specific regions or basins for analysis and modify input data as needed.
  • Simplified Setup: Utilize Docker containers for effortless deployment, avoiding complex software installations.
  • Open Research Practices: Promote transparency and reproducibility through open-source tools like Git and GitHub.

Case Study: This repository demonstrates running the NextGen framework for the Provo River Basin, UT

Repository Contents:

  • Dockerfile for running the NextGen Framework
  • Guide script (guide.sh) for easy execution
  • README with instructions and documentation

Getting Started

Prerequisites

Windows:

  1. Install WSL: Head over to Microsoft's official documentation and follow their comprehensive guide on installing WSL: https://learn.microsoft.com/en-us/windows/wsl/install
  2. Install Docker Desktop: Begin by downloading and installing Docker Desktop from the official website: https://docs.docker.com/desktop/install/windows-install/#install-docker-desktop-on-windows
  3. Start Docker Desktop: After installation, launch the Docker Desktop application.
  4. Open WSL as Admin: Right-click on the WSL icon and select "Run as Administrator".
  5. Verify Installation: In the WSL window, type the command docker ps -a to check if Docker is running correctly. This command should display a list of Docker containers.

Mac:

  1. Install Docker Desktop: Download and install Docker Desktop for Mac from: https://docs.docker.com/desktop/install/mac-install/
  2. Start Docker Desktop: Launch the Docker Desktop application once the installation is complete.
  3. Open Terminal: Open the Terminal application on your Mac.
  4. Verify Installation: Similar to Windows, use the command docker ps -a in the Terminal to verify Docker is functioning as expected.

Linux:

  1. Install Docker: The installation process for Linux varies depending on your distribution. Refer to the official documentation for detailed instructions: https://docs.docker.com/desktop/install/linux-install/
  2. Start Docker and Verify: Follow the same steps as described for Mac to start Docker and verify its installation using the docker ps -a command in the terminal.
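On any platform, a quick way to confirm Docker is working before moving on (a minimal smoke test; hello-world is Docker's standard test image):

docker ps -a                  # should print a (possibly empty) table of containers
docker run --rm hello-world   # pulls a tiny test image and prints a confirmation message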
  • Input Data:
    • Download Sample Data: Use the provided commands to download sample data for the case study.
    • To generate your own data: Refer to the NGIAB-datapreprocessor for instructions on generating custom input data.
    • To generate your own data and run using NGIAB: Refer to the ngen-datastream repository for instructions on generating custom input data.

This section guides you through downloading and preparing the sample input data for the NextGen In A Box project.

Step 1: Create Project Directory

  • Linux/Mac: Open your terminal, go to the folder where you want the repository and the ngen-data folder to live, and run the following commands:
mkdir -p NextGen/ngen-data
cd NextGen/ngen-data
  • WSL (Right click and run as Admin): Open WSL with administrator privileges and execute:
cd /mnt/c/Users/<Folder>
mkdir -p NextGen/ngen-data
cd NextGen/ngen-data

Step 2: Download Sample Data

  • Linux/Mac/Windows WSL: Use wget to download the compressed data file:
wget --no-parent https://ciroh-ua-ngen-data.s3.us-east-2.amazonaws.com/AWI-006/AWI_16_2853886_006.tar.gz
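
If wget is not available (for example, on a stock macOS terminal), curl can be used instead; a minimal equivalent:

curl -L -O https://ciroh-ua-ngen-data.s3.us-east-2.amazonaws.com/AWI-006/AWI_16_2853886_006.tar.gz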

Step 3: Extract and Rename

  • All Platforms: Extract the downloaded file and optionally rename the folder:
tar -xf AWI_16_2853886_006.tar.gz

Optional: rename the folder:

mv AWI_16_2853886_006 my_data

You have now downloaded and prepared the sample input data in the NextGen/ngen-data directory. Remember to replace "my_data" with your preferred folder name if you choose to rename it.
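
As a quick sanity check, you can list the extracted directory. The exact contents depend on the data package, but a config folder (hydrofabric and realization files) is expected, forcings typically sit alongside it, and an outputs folder is created by the run:

ls my_data    # typically shows config/ and forcings/; outputs/ appears after a model run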

Case Study Map for the Provo River Basin, UT (AWI_16_2853886_006)

Running NGIAB

  1. Clone the Repository: Go to the folder created earlier during step #1 above
cd NextGen
git clone https://github.com/CIROH-UA/NGIAB-CloudInfra.git
cd NGIAB-CloudInfra
  2. Run the Guide Script:
./guide.sh
  3. Follow the prompts:
    • Input Data Path: Enter the path to your downloaded or generated input data directory (e.g., NextGen/ngen-data/my_data).

    • Run Mode: Choose between parallel or serial execution based on your preferences. The script pulls the corresponding image from the awiciroh Docker Hub based on the local machine's architecture (a manual pull example appears after this list):

      For Macs with Apple silicon (ARM architecture), it pulls awiciroh/ciroh-ngen-image:latest.
      For x86 machines, it pulls awiciroh/ciroh-ngen-image:latest-x86.
      

      Example NGEN run command for parallel mode:

      mpirun -n 10 /dmod/bin/ngen-parallel ./config/wb-2853886_subset.gpkg all ./config/wb-2853886_subset.gpkg all ./config/realization.json /ngen/ngen/data/partitions_10.json

      Example NGEN run command for serial mode:

      /dmod/bin/ngen-serial ./config/wb-2853886_subset.gpkg all ./config/wb-2853886_subset.gpkg all ./config/realization.json
    • Select Files (automatic): The script selects the catchment, nexus, and realization files based on the input data.

    • After the model has finished running, the script prompts the user whether they want to continue.

    • If the user selects 1, the script opens an interactive shell.

    • If the user selects 2, then the script exits.
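
If you prefer to fetch the image ahead of time (or refresh a stale local copy), the equivalent manual pulls are:

docker pull awiciroh/ciroh-ngen-image:latest        # Apple silicon / ARM machines
docker pull awiciroh/ciroh-ngen-image:latest-x86    # x86 machines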

Output:

  • Model outputs will be saved in the outputs folder within your input data directory (e.g., .../NextGen/ngen-data/my_data/outputs).

After guide.sh is finished, the user can decide to use the Tethys Platform for visualization of the outputs (nexus and catchments). The script will pull the latest image of the NGIAB Visualizer Tethys app. It will also spin up a GeoServer container in order to visualize the catchment layers (due to the size of the layer, it is served through a WMS service).

Your NGEN run command is mpirun -n 8 /dmod/bin/ngen-parallel ./config/wb-2853886_subset.gpkg all ./config/wb-2853886_subset.gpkg all ./config/realization.json /ngen/ngen/data/partitions_8.json
If your model didn't run, or encountered an error, try checking the Forcings paths in the Realizations file you selected.
Do you want to redirect command output to /dev/null? (y/N, default: n):
y
Redirecting output to /dev/null.

real    0m44.057s
user    3m59.398s
sys     0m7.809s
Finished executing command successfully.
Would you like to continue?
Select an option (type a number):
1) Interactive-Shell
2) Exit
#? 2
Have a nice day.
3 new outputs created.
Any copied files can be found here: /home/ubuntu/AWI_16_2853886_006/outputs
Visualize outputs using the Tethys Platform (https://www.tethysplatform.org/)? (y/N, default: y):

How to run NGIAB Visualizer?

If you have previous runs that you would like to use, you can also visualize them by running the ./viewOnTethys.sh script.

(base) [ubuntu@ubuntu NGIAB-CloudInfra]$ ./viewOnTethys.sh
Last used data directory path: /home/ubuntu/AWI_16_2853886_006
Do you want to use the same path? (Y/n): Y
Visualize outputs using the Tethys Platform (https://www.tethysplatform.org/)? (y/N, default: y):
y
Setup Tethys Portal image...

Output of the model guide script

The output files are copied to the outputs folder in the NextGen/ngen-data/my_data/ directory you created in the first step.

If the Tethys Platform is used to visualize the outputs after guide.sh finishes, or if the viewOnTethys.sh script is used, you can expect to see geospatial and time series visualizations of the catchments and nexus points:

Screenshots: Geospatial Visualization, Nexus Time Series, Catchments Time Series

Additional Resources:

ngiab-cloudinfra's People

Contributors

arpita0911patel, arpitapatel09, benlee0423, hariteja-jajula, hellkite500, jameshalgren, jordanlasergit, joshcu, manjilasingh, manjirigunaji, mgdenno, romer8, sepehrkrz, shahab122, zacharywills


ngiab-cloudinfra's Issues

Consider merging (for CI efficiency) with ngen-singularity

The NGEN-Singularity repository has permitted the development of an "HPC-Friendly" version of NGIAB. There could be some efficiencies obtained by merging that repository with this one.

https://github.com/CIROH-UA/Ngen-Singularity

CRITICAL
We need to make sure that we are a little over cautious with the initial testing on this to make sure that whatever we set up for automated CI (e.g., github runners, AWS, etc.) is producing artifacts that actually work on at least a few of our target environments (e.g., Pantarhei, Wukong, UAHPC, etc.)

Google Big Query - Baseflow Machine Learning Model - BYU

1. Requester Information:
This should include the name and contact information of the person making the request.

Norm Jones
Brigham Young University
[email protected]
801-422-7569

2. Project Information:
Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources. Please highlight how this project will benefit from and/or provide benefit to other resources on the shared infrastructure.

We are working on this project:

CIROH: Advancing Science to Better Characterize Drought and Groundwater-Driven Low-Flow Conditions in NOAA and USGS National-Scale Models

While most of the CIROH research is focused on dealing with extreme precipitation and flood events, low flow conditions resulting from extended dry periods can impact critical operations such as municipal water supply. The focus of our project, which is jointly funded by the USGS and NOAA, is to develop machine learning tools for accurately predicting low flow conditions in US streams. 

3. Project Description:
If your project involves developing software or scripts, briefly describe the software you plan to develop.

We are developing a Python library with a suite of tools for digitally filtering baseflow from hydrographs at stream gages. We would like to leverage Google BigQuery, which is now hosting both USGS streamflow gage data and NWM streamflow forecasts, resulting in over 100 billion records. We will analyze these data to identify periods in the streamflow records corresponding to baseflow-only conditions and then develop machine learning methods that use a combination of remote sensing and groundwater monitoring well data as input to predict baseflow at these periods. Our long-term goal is to develop baseflow prediction tools that could be integrated with the NextGen model.

4. Resource Requirements:
Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

Following our prototype project, we would like to use:

USGS stream gage data
NWM retrospective forecast data
USGS groundwater level data
Remote sensing data (GRACE grids, GLDAS grids, etc - in netCDF format)

We will label periods in the stream gage hydrographs that correspond to baseflow only (BFO) and then develop machine learning models that use well data and remote sensing data as features to predict baseflow flowrates (Q) at the BFO time periods. We would like to attempt this using the SQL and Google Colab interfaces to Google BigQuery.

Options:

  1. Cloud Provider: AWS/Azure/GCP

GCP

  2. Required Services in the Cloud:

    List of GCP Services

  • Google Compute Engine
  • Google Kubernetes Engine (GKE)
  • Google Cloud Storage
  • Google VPC
  • Google IAM
  • Google BigQuery
  • Google Cloud Functions
  • Dataflow
  • Other: please list

Google Big Query for sure. We could use some help in determining if some of these other resources are needed.

5. Timeline:
Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.

Starting in 1-2 months and then continuing over the course of our project (6/1/2023 - 6/1/2025)

6. Security and Compliance Requirements:
If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

No

7. Estimation:
Include any cost estimation or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

AWS Cost Calculator: https://calculator.aws/#/

Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator

???

Could use some guidance here.

8. Approval:
Indicate the necessary approval processes or sign-offs required for the request.

Request to use Sagemaker for Project 8: Enhancing Water Supply Forecasting for Systems Management - UA

1. Requester Information:

Savalan Naser Neisary ([email protected])
Ryan Johnson ([email protected])
Md Shahabul Alam ([email protected])

2. Project Information:

The goal of this project is to enhance NWM predictions downstream of the Great Salt Lake (GSL) watershed. We need to run machine learning models that require GPUs and cloud computing (Sagemaker and S3 buckets) with preinstalled Pytorch and Tensorflow packages.

3. Project Description:

The scripts and software we plan to develop will take different variables and NWM predictions and post-process them using ML models.

4. Resource Requirements:

  1. Development interface for Python code, which also has access to the internet, such as Sagemaker or Jupyter Notebook.

  2. GPUs (I don't know how many).

  3. S3 bucket.

  4. Preinstalled Pytorch libraries.

  5. Required Services in the Cloud:

    List of AWS Services

  • S3 – public, private, requester pay, bucket name suggestion?

5. Timeline:

We need it ASAP, and we'll use it for the next 2 years.

6. Security and Compliance Requirements:
If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

7. Estimation:
Include any cost estimation or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

AWS Cost Calculator: https://calculator.aws/#/

Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator

8. Approval:
Indicate the necessary approval processes or sign-offs required for the request.

AWS EC2 / S3 Request - University of Iowa

1. Requester Information:
Name: Ibrahim Demir
Position: Associate Professor
Institution: University of Iowa
Contact: +1 (319) 335-5780, [email protected]

2. Project Information:
Our project aims to leverage advanced AI and multimodal models to enhance the research, education, and operation capabilities for stakeholders of the hydrological domain. We will be conducting research on perception-language tasks, multimodal dialogue, image captioning, visual question answering, and long-term conversation memory. The project will handle a variety of modalities such as raster images, videos, 3D scenes and animations, and maps. The cloud credits will significantly support the research, testing, and fine-tuning of our AI models, contributing to the shared infrastructure by providing advanced analytics capabilities and enriching hydrological research.

3. Project Description:
We plan to develop software that uses AI models like Multimodal-GPT for advanced use cases and LLama-v2 for purpose-specific mini-models. The software will be used to analyze, process, and interpret hydrological data, aiding in the development of more effective and efficient hydrological solutions.

4. Resource Requirements:

  1. Cloud Provider: AWS

  2. Required Services in the Cloud:

  • EC2 – To host the server and run our AI models
  • S3 – To store our large datasets

5. Timeline:
We are planning to start the training and testing process immediately. The resources will be needed throughout the project duration, with the most computational resources needed in the first three months, and deployment and testing for various research and operational use cases during the project timeline (2 years).

6. Security and Compliance Requirements:
The project will adhere to the University of Iowa's security and compliance guidelines, including ensuring that any sensitive data is securely stored and transported. All AWS services will be configured to comply with these guidelines.

7. Estimation:
We (roughly) estimate the cost of the project to be approximately $10,000 annually for the duration of the project (2 years), based on the AWS Cost Calculator. If possible, we would like to utilize the p4d.24xlarge instance, particularly for multimodal endeavors. We expect to process less than 10 GB of data.

AWS Cost Calculator: https://calculator.aws/#/

8. Approval:
Contact PI Ibrahim Demir for questions.

NGIAB throwing error while running with AWI_002

Current behavior

Seeing the below error while trying to run NGIAB using the latest image:

At declaration of smc_profile size, soil_reservoir.n_soil_layers = 0
terminate called after throwing an instance of ‘realization::ConfigurationException’
what(): Multi BMI formulation cannot be created from config: cannot find available data provider to satisfy set of deferred provisions for nested module at index 1: {ice_fraction_xinanjiang}
./HelloNGEN.sh: line 116: 14 Aborted $run_command

Expected behavior

Run successfully

Steps to replicate behavior (include URLs)

  1. Run NGIAB using AWI_002 data.

Screenshots

parallel mode run is failing

While using the latest image in the guide.sh script, the following error occurs:

Enter the hydrofabric catchment file path: /ngen/ngen/data/config/catchments.geojson
/ngen/ngen/data/config/catchments.geojson selected
Enter the hydrofabric nexus file path: /ngen/ngen/data/config/nexus.geojson
/ngen/ngen/data/config/nexus.geojson selected
Enter the Realization file path: /ngen/ngen/data/config/awi_simplified_realization.json
/ngen/ngen/data/config/awi_simplified_realization.json selected
./HelloNGEN.sh: line 23: /dmod/bin/partitionGenerator: No such file or directory

Test issue

Short description explaining the high-level reason for the new issue.

Current behavior

This is test

Expected behavior

This is test

Steps to replicate behavior (include URLs)

This is test

Screenshots

This is test

Forcing path override

Problem: Users frequently run into problems properly setting paths within the realization file while using NGIAB.

Solution: Assuming a standard directory structure is enforced as outlined here: #17, NGIAB should override the forcing path parameter with whichever path it finds in the data directory, and do the same for the forcing file pattern. These two parameters are useful for ngen in other contexts, but they are a source of annoyance for NGIAB users.
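
A minimal sketch of what such an override could look like inside the container entrypoint, assuming jq is available in the image; the JSON keys shown are illustrative of a typical realization layout, not the confirmed schema:

REALIZATION=/ngen/ngen/data/config/realization.json
FORCING_DIR=/ngen/ngen/data/forcings
# Rewrite the forcing path to point at whatever directory is actually mounted
jq --arg dir "$FORCING_DIR" '.global.forcings.path = $dir' "$REALIZATION" > /tmp/realization.json \
  && mv /tmp/realization.json "$REALIZATION"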

Ngen build getting an BMI error

Current behavior

Ngen build getting an error.

#18 255.9 [ 68%] Building CXX object test/CMakeFiles/test_bmi_multi.dir/realizations/catchments/Bmi_Cpp_Multi_Array_Test.cpp.o
#18 258.0 gmake[2]: *** [CMakeFiles/ngen.dir/build.make:76: CMakeFiles/ngen.dir/src/NGen.cpp.o] Error 1
#18 258.0 gmake[1]: *** [CMakeFiles/Makefile2:498: CMakeFiles/ngen.dir/all] Error 2
#18 258.0 gmake[1]: *** Waiting for unfinished jobs....

Expected behavior

Build without error

Steps to replicate behavior (include URLs)

  1. Disable line 156 in the Docker.ngen file:
    156 # && ./build_sub extern/test_bmi_cpp

Recent failed action can be found here.
https://github.com/CIROH-UA/NGIAB-CloudInfra/actions/runs/6949391875

Getting the same error in github runner and locally.

Quality of Life improvements to guide.sh and HelloNGEN

During testing I've been repeatedly running the scripts, and I think they could benefit from the following (a short sketch of the first two ideas follows this list):

  • Remembering the data directory, so you don't need to re-enter it on repeated runs
  • Enabling path auto-completion when entering file paths
  • Removing .parquet files that stop ngen from running, along with output files
  • Auto-selecting the first catchment, nexus, and realization file
  • Adding the option to pull an image or use a local one (this one may be more useful for dev)
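
A minimal bash sketch of the first two ideas: caching the last-used data directory in a dotfile and enabling readline path completion with read -e (the cache file name and prompt wording are illustrative):

LAST_PATH_FILE="$HOME/.ngiab_last_data_dir"   # hypothetical cache file
if [ -f "$LAST_PATH_FILE" ]; then
    LAST_DIR=$(cat "$LAST_PATH_FILE")
    read -r -p "Use last data directory ($LAST_DIR)? (Y/n): " REUSE
    if [ "$REUSE" != "n" ]; then DATA_DIR="$LAST_DIR"; fi
fi
if [ -z "$DATA_DIR" ]; then
    # -e enables readline, so Tab completes file paths while typing
    read -e -r -p "Enter your input data directory path (use absolute path): " DATA_DIR
fi
echo "$DATA_DIR" > "$LAST_PATH_FILE"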

CIROH Apps Portal AWS Resources - BYU

Same as previously submitted : AlabamaWaterInstitute#35

  1. Requester Information:
    Dan Ames, [email protected]
    Nathan Swain, [email protected]

  2. Project Information:
    This request is for the Apps Portal project. We need resources to stand up a Kubernetes cluster on AWS to support 2 or 3 domains, such as apps.ciroh.org, where we will host web applications for data visualization and analysis.

  3. Resource Requirements:
    The following resource estimates are for a Tethys Portal Manager, a serverless application that helps us manage the App Portals. Only one instance of this will be required and will be able to manage multiple App Portals.

Tethys Portal Manager:

S3 – public, bucket name suggestion: tpm-static, <1GB per month
EBS (Amazon Elastic Block Store): 4 x 500 GB
VPC (Virtual Private Cloud): 1
DynamoDB: 1 x standard table class, 1 GB standard table size with backup
Lambda: x86 architecture, 513 MB ephemeral storage, 100,000 per month (free tier)
Static IP: 1
The following estimates are for a single App Portal. Eventually we would like to host multiple App Portals with similar resource requirements (e.g. staging and production).

App Portal:

EKS (Kubernetes Cluster): 1
EC2: 4 x c5.2xlarge
EBS (Amazon Elastic Block Store): 4 x 500 GB
VPC (Virtual Private Cloud): 1
Static IP: 1
4. Timeline:
Resources needed March 10, 2023. Web sites and apps are expected to exist in perpetuity with CIROH.

  5. Security and Compliance Requirements:
    None

  6. Budget:
    This is experimental so we don't know exactly what resources we will need. Our goal is to monitor the cost of these resources over several months to give us an idea of what the cost will be in coming years.

According to the AWS Pricing Calculator, the estimated cost for the resources described above, as of 3/10/2023, is:

Tethys Portal Manager: $106.40 /mo., $1,276.80 /yr.
1 x App Portal: $1,383.75 /mo., $16,605.00 /yr.

  7. Approval:
    TBD

USU CIROH Computer Vision Project Request

1. Requester Information:
This should include the name and contact information of the person making the request.

Name: Jeff Horsburgh
Institution: Utah State University
Position: Associate Professor
Email: [email protected]

Details: We are requesting that the following individuals be given access to USU's existing subaccount:
Jeff Horsburgh: [email protected] (project Co-PI)
Sierra Young: [email protected] (project PI)
Sajan Neupane: [email protected] (project MS student)
Razin Issa: [email protected] (project MS student)
Safran Khan: [email protected] (project PhD student)

2. Project Information:
Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources. Please highlight how this project will benefit from and/or provide benefit to other resources on the shared infrastructure.

Project Title: CIROH: Advancing Camera-Based Monitoring for Operational Hydrologic Applications
Project Summary: https://ciroh.ua.edu/research-projects/advancing-camera-based-monitoring-for-operational-hydrologic-applications/
Project Goals: The overall goal of this project is to demonstrate how operational requirements for integrating low-cost cameras and computing infrastructure into existing hydrologic monitoring networks can be met, along with evaluating the benefits of cameras for continuous monitoring and prediction.
Project Benefits: We seek to develop serverless cloud workflows/pipelines for processing imagery collected at stream gaging sites. Information extracted from images will provide real-time, hydrologically relevant information (e.g., stream width, depth, velocity) to augment available data from real-time USGS streamflow gages and will make more data and information available at gage locations for use in NextGen modeling efforts.

3. Project Description:
If your project involves developing software or scripts, briefly describe the software you plan to develop.

We plan to develop serverless cloud workflows/pipelines for processing imagery collected at stream gaging sites. Code will generally be written using Python, but will involve a variety of scripts designed to execute different parts of the image processing workflow (e.g., image capture on a datalogger, transfer to cloud storage, image preprocessing, model execution, results generation, and logging hydrologic variables to a data store).

4. Resource Requirements:
Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

Options:

  1. Cloud Provider:

For now we are testing AWS. We will eventually want to compare results across AWS and GCP to examine potential tradeoffs. We can make a separate request for access to GCP when you have that better set up.

  2. Required Services in the Cloud:

List of AWS Services we anticipate using

  • Amazon S3
    • Each image we capture is about 2 MB for now based on current image resolution
    • May experiment with different image resolutions
    • Volume Estimate: 100 MB of image data per day per site. We have two sites installed already and plan to install 1-2 more
  • AWS Lambda
    • Using AWS Lambda functions to do our image processing
    • Invoked to process each image, but could do bursts of images (multiple images processed at a time to reduce invocations)
    • Using docker containers to create lambda functions
  • AWS RDS
    • Used for recording data values extracted from images like water depth and stream width
    • Likely using some flavor of PostgreSQL
  • AWS ECR (Elastic Container Registry)
    • May require up to 3 GB
    • Using this because data transfer between ECR and lambda functions is faster
    • Using this to host the container images that define lambda functions
  • AWS CloudWatch
    • This is used for logging of activities
    • Will use this a lot in the beginning while testing, but probably used less eventually
  • AWS Sagemaker
    • This would be used for training deep learning models for image segmentation

5. Timeline:
Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.

We have already done quite a bit of prototyping using AWS on a USU account. We are ready to move parts of our workflow to the UA account and will continue our prototyping there. We anticipate using resources through the end of our project, which is a 3-year CIROH project funded starting June 1, 2023.

6. Security and Compliance Requirements:
If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

We don't currently have any specific security or compliance requirements for our development and testing. The image data we are collecting are not sensitive and have no restrictions on access or release. All of our source code and development is open, so no concerns there.

7. Estimation:
Include any cost estimation or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

We anticipate that the image storage will be relatively inexpensive in S3, and the rate of image data collected is not overly frequent (e.g., one new image per site every 15-30 minutes). We will incur additional costs with executing lambda functions and storage of the container images that define our lambda functions. We want to try doing image segmentation model training using Sagemaker, but that will be a periodic thing that we won't be doing all of the time. We anticipate that overall costs should be less than $100 - $150 per month (given that my whole AWS bill right now for everything that we are doing at USU across our projects is less than that).

8. Approval:
Indicate the necessary approval processes or sign-offs required for the request.

Please contact me with any questions.

Jeff Horsburgh: [email protected]

National Snow Model - temp EC2 request

1. Requester Information:
This should include the name and contact information of the person making the request.

Joshua Christensen
[email protected]
"Josh Christensen" on slack
8013613299

2. Project Information:
Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources.

National Snow Model - Snow Covered Area enhancement

3. Resource Requirements:
Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

Options:

  • EC2
  • S3 – public, private, requester pay, bucket name suggestion?
  • EBS (Amazon Elastic Block Store)
  • EFS
  • RDS
  • VPC (Virtual Private Cloud)
  • DynamoDB
  • ECS
  • EKS (Kubernetes Cluster)
  • Lambda
  • Others: please list

EC2 - just a small server that can continuously stream data from a NASA download into the National Snow Model S3 bucket. Preferably the Ohio region to avoid extra transfer costs.

4. Timeline:
Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.

Will be completely done by July 28, but likely earlier

5. Security and Compliance Requirements:
If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

None

6. Budget:
Include any budget constraints or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

Something small and cheap will do great. It just needs an internet connection and to be able to run something along the lines of: curl <any curl parameters as fit> | aws s3 cp - <your s3 bucket/folder> --expected-size <any max size in bytes in case the file is larger than 50GB>

7. Approval:
Indicate the necessary approval processes or sign-offs required for the request.

I have no clue, I'm an REU student. But my advisor is Ryan Johnson and I've discussed it with Arpita

Google Big Query - Proof of Concept - BYU

1. Requester Information:
This should include the name and contact information of the person making the request.

Norm Jones
Brigham Young University
[email protected]
801-422-7569

2. Project Information:
Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources. Please highlight how this project will benefit from and/or provide benefit to other resources on the shared infrastructure.

We are working on this project:

CIROH: Advancing Science to Better Characterize Drought and Groundwater-Driven Low-Flow Conditions in NOAA and USGS National-Scale Models

While most of the CIROH research is focused on dealing with extreme precipitation and flood events, low flow conditions resulting from extended dry periods can impact critical operations such as municipal water supply. The focus of our project, which is jointly funded by the USGS and NOAA, is to develop machine learning tools for accurately predicting low flow conditions in US streams. 

3. Project Description:
If your project involves developing software or scripts, briefly describe the software you plan to develop.

We are developing a Python library with a suite of tools for digitally filtering baseflow from hydrographs at stream gages. We would like to leverage Google BigQuery, which is now hosting both USGS streamflow gage data and NWM streamflow forecasts, resulting in over 100 billion records. We will analyze these data to identify periods in the streamflow records corresponding to baseflow-only conditions and then develop machine learning methods that use a combination of remote sensing and groundwater monitoring well data as input to predict baseflow at these periods. Our long-term goal is to develop baseflow prediction tools that could be integrated with the NextGen model.

4. Resource Requirements:
Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

We have participated in a Google BigQuery workshop taught by Kel Markert. In that workshop, he demonstrated how the platform is hosting USGS stream gage data and National Water Model retrospective forecast data. He showed how to use Python code to perform operations on the data and how to run SQL queries on the data. Initially, we will want to run code that processes the streamflow data to label baseflow-only periods as described above. We also want to import groundwater level data from the USGS, and then compare baseflow trends at stream gauges to water level trends in nearby monitoring wells to find correlations and see if groundwater trends are impacting baseflow nationwide.

Options:

  1. Cloud Provider: AWS/Azure/GCP

GCP

  2. Required Services in the Cloud:

    List of GCP Services

  • Google Compute Engine
  • Google Kubernetes Engine (GKE)
  • Google Cloud Storage
  • Google VPC
  • Google IAM
  • Google BigQuery
  • Google Cloud Functions
  • Dataflow
  • Other: please list

Google Big Query for sure. We could use some help in determining if some of these other resources are needed.

5. Timeline:
Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.

We would like to test and prototype this over the next 1-2 months.

6. Security and Compliance Requirements:
If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

No.

7. Estimation:
Include any cost estimation or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

???

Could use some guidance here.

AWS Cost Calculator: https://calculator.aws/#/

Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator

8. Approval:
Indicate the necessary approval processes or sign-offs required for the request.

Water Prediction Node token authentication

Requester Information:

Dylan Lee, [email protected]

Project Information:

This request is for the Water Prediction Node. The Water Prediction Node is creating an experimental catalog that will have restricted access for data that the National Water Center isn't ready to make available to the general public yet. The authentication solution has been created using AWS and will issue temporary tokens to whitelisted domains (for example, "noaa.gov" or "ua.edu") or specific whitelisted emails. The solution stores temporary tokens in a DynamoDB table, and tokens are used to grant access to data stored in a private S3 bucket that is accessible through a CloudFront distribution that triggers the token validation.

A working demo of the project has been created using cloudformation and can be found at: https://github.com/dylanlee/wpn-auth

Resource Requirements:

  • S3: private, bucket name suggestion: wpn-exp-cat. Bucket size: 0-10 Tb.
  • DynamoDB: 1 x standard table class, 1 GB standard table size with backup.
  • Lambda: x86 architecture, 513 MB ephemeral storage, 100,000 per month (free tier). The current lambdas use ~128 MB of memory or less and usually execute in under 300 ms.
  • API gateway: There will be one API with POST and OPTIONS methods enabled.
  • Cloudfront: 1 cloudfront distribution with a single origin and a default behavior that triggers the lambda authenticator when a viewer makes a request.
  • SES: needed to send tokens to whitelisted emails. SES needs to be set up manually, and I want to be able to send emails from the waternode.ciroh.org domain.

Timeline:

Ideally this would be implemented by the end of March, 2024

Security and Compliance Requirements:

None

Budget:

These resources should be tagged Water Node and will be paid for by the Water Node budget.

Approval:

The Water Node approves the cost of these resources so the main concern is any possible issues integrating the existing demo into CIROH's AWS account.

Following steps in the README.md file does not result in a successful run

Short description explaining the high-level reason for the new issue.

Current behavior

The NGEN run following the instructions in the main README.md of the main branch (commit 5e1afa1) results in failure:

NGen Framework 0.1.0
Building Nexus collection
Building Catchment collection
Config file details - Line Count: 27 | Max Line Length 46
Config Value - Param: 'forcing_file' | Value: 'BMI' | Units: '(null)'
Config Value - Param: 'surface_partitioning_scheme' | Value: 'Schaake' | Units: '(null)'
Config Value - Param: 'soil_params.depth' | Value: '2.0' | Units: 'm'
Config Value - Param: 'soil_params.b' | Value: '8.93396282196045' | Units: ''
Config Value - Param: 'soil_params.satdk' | Value: '3.19069084890877e-05' | Units: 'm s-1'
Config Value - Param: 'soil_params.satpsi' | Value: '3.98730560956446' | Units: 'm'
Config Value - Param: 'soil_params.slop' | Value: '0.057029859113015' | Units: 'm/m'
Config Value - Param: 'soil_params.smcmax' | Value: '0.401686143900526' | Units: 'm/m'
Config Value - Param: 'soil_params.wltsmc' | Value: '0.048334490431746' | Units: 'm/m'
Config Value - Param: 'soil_params.expon' | Value: '1.0' | Units: ''
Config Value - Param: 'soil_params.expon_secondary' | Value: '1.0' | Units: ''
Config Value - Param: 'refkdt' | Value: '3.72730851635058' | Units: '(null)'
Config Value - Param: 'max_gw_storage' | Value: '0.016' | Units: 'm'
Config Value - Param: 'Cgw' | Value: '0.0018' | Units: 'm h-1'
Config Value - Param: 'expon' | Value: '6.0' | Units: ''
Config Value - Param: 'gw_storage' | Value: '0.05' | Units: 'm/m'
Config Value - Param: 'alpha_fc' | Value: '0.33' | Units: '(null)'
Config Value - Param: 'soil_storage' | Value: '0.05' | Units: 'm/m'
Config Value - Param: 'K_nash' | Value: '0.03' | Units: ''
Config Value - Param: 'K_lf' | Value: '0.01' | Units: ''
Config Value - Param: 'nash_storage' | Value: '0.0,0.0' | Units: '(null)'
Config Value - Param: 'num_timesteps' | Value: '1' | Units: '(null)'
Config Value - Param: 'verbosity' | Value: '1' | Units: '(null)'
Config Value - Param: 'DEBUG' | Value: '0' | Units: '(null)'
Config Value - Param: 'giuh_ordinates' | Value: '1.00,0.00' | Units: '(null)'
Found configured GIUH ordinate values ('1.00,0.00')
Config Value - Param: '' | Value: '(null)' | Units: '(null)'
Config Value - Param: '' | Value: '(null)' | Units: '(null)'
Schaake Magic Constant calculated
All CFE config params present
GIUH ordinates string value found in config ('1.00,0.00')
Counted number of GIUH ordinates (2)
Finished function parsing CFE config
At declaration of smc_profile size, soil_reservoir.n_soil_layers = 0
terminate called after throwing an instance of 'realization::ConfigurationException'
  what():  Multi BMI formulation cannot be created from config: cannot find available data provider to satisfy set of deferred provisions for nested module at index 1: {ice_fraction_xinan}
./HelloNGEN.sh: line 76:    13 Aborted                 (core dumped) /dmod/bin/ngen-serial $n1 all $n2 all $n3

Expected behavior

An output that indicates a successful run.

Steps to replicate behavior (include URLs)

In an Ubuntu 22.04 environment on a regular laptop (16 GB RAM, AMD quad-core processor):

  1. Run the set-up commands as described in the README.md:
    $ mkdir -p NextGen/ngen-data
    $ cd NextGen/ngen-data
    $ wget --no-parent https://ciroh-ua-ngen-data.s3.us-east-2.amazonaws.com/AWI-003/AWI_03W_113060_003.tar.gz
    $ tar -xf AWI_03W_113060_003.tar.gz
    $ mv AWI_03W_113060_003 my_data
    $ cd ../..
    $ git clone https://github.com/CIROH-UA/NGIAB-CloudInfra.git
    $ cd NGIAB-CloudInfra/
    $ git checkout main
  2. Run ./guide.sh
  3. When asked: Enter your input data directory path (use absolute path):,
    Provide: /.../NextGen/ngen-data/my_data
  4. When asked: "Select an option (type a number):"
    Provide: 1 ("Run NextGen Model using local docker image");
  5. When asked again: "Select an option (type a number):"
    Provide: 1 ("Run NextGen model framework in serial mode");
  6. When asked to enter the input files (catchment, nexus, realization)
    Provide, respectively:
    /ngen/ngen/data/config/catchments.geojson;
    /ngen/ngen/data/config/nexus.geojson;
    /ngen/ngen/data/config/awi_simplified_realization.json;
  7. The output described in "Current behavior" is given.

(the run also results in error when trying to run in parallel mode - step 5)

My opinion

I was able to successfully run an up-to-date version of NGIAB using a set of input files that I downloaded on 2023-Dec-13.

So maybe the input files indicated in the instructions are outdated? Or there is some inconsistency in the indicated input files?

Generating forcing for NextGen

Hi Everyone, I am working on the cfe+noahowpmodular configuration. I am facing problems with creating forcings for my subset domain. I am using the available notebook code (https://github.com/CUAHSI/notebooks/blob/main/ngen/ngen-create-cfe-forcing.ipynb) to create forcing which works only for small datasets. It works if I want to make 1 to 10 days of forcing. I have run the code to create 6 months of forcing on high memory nodes (32 cores) but it couldn't complete even in 24 hours. I am looking for some other ways of creating forcing from netcdf files as I have 5 years of continuous simulation to run. There are around 8000 catchments for my subset domain.

Pantarhei access request to deploy SWE-ML models

1. Requester Information:
This should include the name and contact information of the person making the request.

Karnesh Jain – [email protected] - University of Alabama

2. Project Information:
Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources. Please highlight how this project will benefit from and/or provide benefit to other resources on the shared infrastructure.

Looking to find a place for the SWE-ML model to live. It will run daily and save the output to AWS S3 storage. Currently, it is run on a local machine by a user. We want it to be deployed on Pantarhei and run on a daily basis automatically (a scheduling sketch follows below).
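
A minimal sketch of the daily scheduling described above, in crontab syntax (the script path, output directory, and bucket name are hypothetical placeholders):

# Run the SWE-ML workflow every day at 06:00 and copy the results to S3
0 6 * * * /path/to/swe-ml/run_daily.sh && aws s3 sync /path/to/swe-ml/output s3://<swe-ml-bucket>/daily/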

3. Project Description:
If your project involves developing software or scripts, briefly describe the software you plan to develop.

The current version of the SWE-ML model and workflow has been developed in Python.

4. Resource Requirements:
Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

Options:

  1. HPC or VM

  2. vCPU

  3. Memory

  4. Disk Space

  • Pantarhei Access – It will be used to run the SWE-ML model on a daily basis.

5. Timeline:
Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.

We will begin using the resources right away.

6. Security and Compliance Requirements:
If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

No.

7. Approval:
Indicate the necessary approval processes or sign-offs required for the request.

Add GitHub Actions Badges to the repo.

Short description explaining the high-level reason for the new issue.

Current behavior

Expected behavior

Steps to replicate behavior (include URLs)

Screenshots

Machine Learning training resource - Pantarhei

@jmframe
Will you fill out this information with respect to the restoration of the Pantarhei cluster? This form is usually used for CIROH members to document computing resource needs -- we have already started to move forward with Pantarhei but it would help to build this back in.

Add a 'RUN' wrapper to capture all inputs a level above 'realization'

To make Ngen-based simulations more reproducible, we have been discussing a wrapper and hash system that would track exactly what inputs, configuration, and environment are used for a given canonical simulation using the ngen framework. We propose calling these individual simulations and the associated master configuration file a 'RUN'.

An initial PR with a basic demonstration is in development -- see here for prior effort in this direction.
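
A minimal shell sketch of the hashing idea (assuming the run's inputs live under a single directory; the directory and manifest names are illustrative):

RUN_DIR=/ngen/ngen/data            # directory holding config, forcings, etc. for one RUN
MANIFEST=run_manifest.sha256       # hypothetical manifest file name
# Hash every input file in a stable order, then hash the manifest itself
find "$RUN_DIR" -type f -print0 | sort -z | xargs -0 sha256sum > "$MANIFEST"
sha256sum "$MANIFEST"              # a single fingerprint identifying the whole RUN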

AWS Resource Request - HydroServer - USU

1. Requester Information:
This should include the name and contact information of the person making the request.

Name: Ken Lippold
Institution: Utah State University
Position: Software Engineer
Email: [email protected]

2. Project Information:
Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources. Please highlight how this project will benefit from and/or provide benefit to other resources on the shared infrastructure.

Project title:
CIROH: Modernized Standards and Tools for Sharing and Integrating
Real-time Hydrologic Observations Data

Project summary:
https://ciroh.ua.edu/research-projects/modernized-standards-and-tools-for-sharing-and-integrating-real-time-hydrologic-observations-data/

The purpose of this deployment is to demo and test the current version of HydroServer and receive feedback from the community to identify issues that need to be addressed and discuss ideas for additional features we may want to add.

3. Project Description:
If your project involves developing software or scripts, briefly describe the software you plan to develop.

We are developing several components related to this project which can be found here: https://github.com/hydroserver2

Specifically, the components we want to deploy are found in the hydroserver-webapp-front and hydroserver-webapp-back repositories and are the core of the HydroServer development. Our frontend HydroServer website is built in Vue 3 and will be deployed via Cloudfront/S3, and our backend HydroServer data management and SensorThings APIs are built with Django and will be deployed via Elastic Beanstalk.

4. Resource Requirements:
Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

Options:

  1. Cloud Provider: AWS

  2. Required Services in the Cloud:

  • Cloudfront: Used as a CDN for the HydroServer website and as a reverse proxy to route traffic between the website and Django APIs. We use one distribution per deployment.
  • Elastic Beanstalk: Used to host the HydroServer Django application, which includes the data management and SensorThings APIs. We use one environment per deployment.
  • S3: Used to store website static files and user uploaded photos. We use three buckets per deployment, all of which are private and publicly accessible only through Cloudfront. One bucket is used for the website static files, another is used for user resource storage, such as photos, and the third is used for hosting downloadable desktop applications such as our Streaming Data Loader tool.
  • SES: Used primarily for user email verification and password reset. We had our current account removed from sandbox mode to avoid needing to manually whitelist individual email accounts.
  • EC2: Mostly used by Elastic Beanstalk, but we also need to configure some additional security settings directly through the EC2 dashboard.

We're using a service called Timescale Cloud to manage our main database deployment since Amazon RDS doesn't support Timescale for PostgreSQL. We're planning to continue using our own account for that in the meantime, but we can discuss other options if needed; I don't know if CIROH is able to manage any services outside the ones listed.

Detailed AWS deployment instructions can be found here: https://hydroserver2.github.io/hydroserver/deployment/aws-deployment.html

This documentation is still a work in progress as we find issues and improve our deployment process.

5. Timeline:
We currently have a test instance of HydroServer deployed on a separate AWS account and have documented the deployment steps. We can deploy an instance under the CIROH account as soon as the resources are ready, but we're flexible with the timeframe since we have an existing deployment we can use in the meantime.

6. Security and Compliance Requirements:
We don't currently have any special security or compliance requirements for this testing deployment. Initially, we anticipate this deployment will be primarily used for demos and testing. I can provide additional details about our existing IAM security policies and roles if needed, but I assume they'll need to be narrower under CIROH's account.

7. Estimation:
Based on our current AWS bills, we expect the monthly cost to be under $50, depending on how much traffic the demo site gets or if we decide to create additional deployments.

8. Approval:
Jeff Horsburgh: [email protected]

AWS / Azure Resources for Model Assimilation Testbed - Calgary

Request for Cloud Resources

1. Requester Information

Name: David Casson
Title: PhD Candidate, University of Calgary
Email: [email protected]


2. Project Information

Project Title:

  • Developing and Benchmarking Data Assimilation Methods on a Standardized Testbed

Brief Description:

  • A part of the project will focus on the implementation of a SUMMA modelling framework for ensemble data assimilation. The cloud resources will be used for data preprocessing, model simulation and analysis, as well as running data assimilation experiments.

Goals:

  • Assess improvements in streamflow prediction that can be achieved through ensemble data assimilation methods of available snow information.

Benefit

  • This project will provide a basis for simulation and evaluation on the cloud resources, and will explore how model creation and analysis can build on available datasets.

3. Project Description

Software and Tools:

  • SUMMA Modelling Framework Software
  • Python custom scripting (using repositories developed by collaborators)
  • Jupyter Notebooks and Snakemake for workflow management
  • Use of NetCDF operators (NCO) and Climate Data Operators (CDO)
  • Transfer of meteorological and geospatial data from HPC datastores

Workflow:
Development will progress from meteorological data processing, data assimilation, hydrological model simulation, to forecast verification.


4. Resource Requirements

General Requirements:

  • 32 cores
  • CPU processing
  • Storage: 500 GB

Specific Capabilities:

  • Linux environment
  • Fortran compiler
  • Python installation
  • IDE (e.g., Visual Studio Code)

Cloud Provider Options: AWS, Azure, GCP
Preferred Providers: AWS or Azure
Required Services in the Cloud: Not known at this stage.


5. Timeline

Project Duration: November 2023 to August 2025
Note: An initial allocation will ideally grow as the simulations move to ensembles and larger geographic domains.


6. Security and Compliance Requirements

  • No specific security or compliance requirements known.

7. Estimation

Cost Estimation: Cost estimates are speculative at this stage due to a lack of experience with these cloud resources. It is requested to start with a modest allocation for initial learning and interaction with the resources.

8. Approval

Approval Process: TBD

GPU Cluster - Wukong and Pantarhei

Moving ticket from AlabamaWaterInstitute#34

  1. Requester Information:
    Chaopeng Shen, [email protected]

  2. Project Information:
    Phase 1:
    A. Improving the integration of ML with physically-based hydrologic and routing modeling via large-scale parameter and structure learning schemes
    Prospective phase 2 projects:
    B. Developing and benchmarking data assimilation methods on a standardized testbed
    C. Pathways to using multi-model mosaics for operational hydrologic prediction
    D. CIROH: ML-based Flexible Flood Inundation Mapping and Intercomparison Framework
    Currently, three students can use the system. At this time next year, maybe 4-5 students from PSU can be using these resources if they are available. We of course also have local resources so it's not all on the UA side.

Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources.
All projects involve training large-scale machine learning models on GPUs and CPUs. Three projects (A, B, C) have similar data flow characteristics (high-throughput GPU jobs). One project (D) has different requirements, as it will more likely be solving PDEs on the GPU.

  3. Resource Requirements:
    Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.
    A CPU core:GPU ratio >= 4:1 is preferred so we maximize GPU efficiency (the CPUs feed data and handle framework overhead). A100s are the preferred GPUs (80 GB or 40 GB; 80 GB is preferred, but they are pricey).
    I describe the growth stages in a box below.

Storage: a little hard to predict now, we can start thinking about a standard compute node with 2 TB system drive and 30 TB storage.

If not too many people are using the GPUs, a queue-less system is preferred. I understand the need for a queue as the number of users grows.

The above is based on my group's use.

Options:

EC2
S3 – public, private, requester pay, bucket name suggestion?
EBS (Amazon Elastic Block Store)
EFS
RDS
VPC (Virtual Private Cloud)
DynamoDB
ECS
EKS (Kubernetes Cluster)
Lambda
Others: please list
I provide a quote I downloaded from Lambda. They are a bit pricey -- there are cheaper vendors. I also don't mean we need to get these exact specifications; I just use it as a starting point for discussion.
https://pennstateoffice365-my.sharepoint.com/:b:/g/personal/cxs1024_psu_edu/Eb0Tf_tu6r1CsKxq2oP6Zz4BGVyszjz0OTUCyOCLopF8vg?e=RQPzGR

  4. Timeline:
    Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.
    The point is that we can grow the system gradually and adjust to demand, but the system needs to be future-proof (primarily regarding the networking and GPUs).
    In the very near term (0-3 months), our jobs will run either independently on each GPU or (more likely) with distributed data parallelism, so networking is not critical. A 4-A100 system would already help significantly, and we would appreciate NVLink between GPUs to allow some parallelism. If the 4 A100s can be made available to us, that already goes a long way. I suggest starting there, seeing how we adapt to it, and seeing where the needs are down the road.
    In the mid-term (3 months to 1 year), there is a decent likelihood we can make use of an 8-GPU node, but let's wait for confirmation. As new people come onto the project, there will also be more demand simply to run more jobs.
    In the long term (>1 year), I see a future where we will implement model parallelism (different parts of the model residing on different GPUs) across a few dozen GPUs for large jobs. This would require a fast interconnect: (i) NVLink between GPUs and (ii) a faster node-to-node connection (InfiniBand). The system can be grown gradually -- we can add nodes as demand rises -- but the networking components need to be in place. This is an emergent need and future demand is uncertain, so a "low-regret" approach is recommended: stay future-proof without investing too much at once.
    To be future-proof, an ideal system would include 8-GPU nodes with NVLink and be InfiniBand-ready so additional nodes can be connected later. The cost of NVLink and InfiniBand is not excessive, so some planning ahead allows the system to be grown over time.

  5. Security and Compliance Requirements:
    If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.
    N/A

  6. Budget:
    Include any budget constraints or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

I did not include computing costs in my project budget.

  7. Approval:
    Indicate the necessary approval processes or sign-offs required for the request.

Restarting CFE simulation within NextGen

Is your feature request related to a problem? Please describe.

The problem is to implement data assimilation (DA) within NextGen. To properly implement DA, the model must be restartable at a specific time. This means the model should output its state variables at every time step and, at the same time, be able to accept state variables as inputs.

To provide an example, assume you run the model from 2000 to 2022. Now, if you want the streamflow for the year 2023, one option is to run the model all over again from 2000 to 2023. The other option is to run it just for the year 2023. However, for that shorter simulation to be correct, the model must accept the initial state variables from where the previous run stopped (12/31/2022); otherwise, the results for 2023 will differ significantly between the two scenarios.

Describe the solution you'd like
Currently, NextGen only provides Q-out (streamflow) for each catchment in a cat-id.csv file. We need NextGen to output an additional CSV file (a human-readable file usually called a restart file) in which the state-variable values are recorded. In addition, that restart file should be accepted as an extra input file when running NextGen. In summary, NextGen needs to output state variables and also accept those state variables as input at each time step. This functionality is achievable using the BMI get/set variable features, but it needs to be wired up within NextGen.

Improve Guide.sh error handling

Ran into some difficulty when getting started because the errors were relatively unhelpful. There doesn't seem to be any significant validation of path/file inputs before commands are run, so any "file not found" errors appear a fair way into the execution and on non-descriptive line numbers.

Ideally there would be a way to check the input file paths before the overall program begins, give feedback on the validity of the inputs, and stop execution if necessary (a sketch of such a pre-flight check is given below). For example, if the input dataset is missing a forcing file, catching that before spending several minutes generating catchment data would be very useful.
On further inspection, the "errors" are being printed, but they don't stand out or stop execution, so they are easy to miss.

This is definitely not a high priority, as it's primarily a barrier to entry for inexperienced users, but the difficulty could be mentioned somewhere in the documentation, or errors in general could have terminal coloring applied.

(Screenshot attached.) As mentioned, the relevant errors don't really stand out, especially when the rest of the output is colorful.
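
A minimal sketch of the kind of pre-flight check suggested above is given below. The function name, file patterns, and color variables are illustrative only, not guide.sh's actual implementation; it simply refuses to start a run if an expected file type is missing from the data directory.

# Hypothetical pre-flight validation (illustrative, not the project's actual code).
RED='\033[0;31m'; NC='\033[0m'

validate_inputs() {
    local data_path=$1
    local missing=0
    # Check for each file type the run is expected to need.
    for pattern in catchment nexus realization forcing; do
        if ! find "$data_path" -iname "*${pattern}*" | grep -q .; then
            echo -e "${RED}ERROR: no ${pattern} file found under ${data_path}${NC}" >&2
            missing=1
        fi
    done
    return $missing
}

validate_inputs "$HOST_DATA_PATH" || { echo "Fix the input dataset and re-run."; exit 1; }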

Modify Guide.sh to be single interface for Docker or Singularity runs of NGIAB

For UI efficiency, what if we added a --singularity option to the guide.sh to trigger execution using singularity containers instead of docker?

That option would come in somewhere here:

NGIAB-CloudInfra/guide.sh

Lines 130 to 182 in 81c053c

# File discovery
echo -e "\nLooking in the provided directory gives us:"
find_files() {
    local path=$1
    local name=$2
    local color=$3
    local files=$(find "$path" -iname "*$name*.*")
    echo -e "${color}Found these $name files:${Color_Off}"
    echo "$files" || echo "No $name files found."
}
find_files "$HOST_DATA_PATH" "catchment" "$UGreen"
find_files "$HOST_DATA_PATH" "nexus" "$UGreen"
find_files "$HOST_DATA_PATH" "realization" "$UGreen"

# Detect Arch and Docker
echo -e "\nDetected ISA = $(uname -a)"
if docker --version ; then
    echo "Docker found"
else
    echo "Docker not found"
fi

if uname -a | grep arm64 || uname -a | grep aarch64 ; then
    IMAGE_NAME=awiciroh/ciroh-ngen-image:latest
else
    IMAGE_NAME=awiciroh/ciroh-ngen-image:latest-x86
fi

# Model run options
echo -e "${UYellow}Select an option (type a number): ${Color_Off}"
options=("Run NextGen Model using local docker image" "Run Nextgen using remote docker image" "Exit")
select option in "${options[@]}"; do
    case $option in
        "Run NextGen Model using local docker image")
            echo "running the model"
            break
            ;;
        "Run Nextgen using remote docker image")
            echo "pulling container and running the model"
            docker pull $IMAGE_NAME
            break
            ;;
        Exit)
            echo "Have a nice day!"
            exit 0
            ;;
        *) echo "Invalid option $REPLY, 1 to continue and 2 to exit"
            ;;
    esac
done
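
One way the flag could be wired in is sketched below. This is only an illustration under the assumption that the Docker invocation is centralized in one function; the SIF file name and the bind paths are placeholders rather than the script's actual values.

# Hypothetical sketch: choose the container runtime from a --singularity flag.
RUNTIME="docker"
for arg in "$@"; do
    [ "$arg" = "--singularity" ] && RUNTIME="singularity"
done

run_ngiab() {
    if [ "$RUNTIME" = "singularity" ]; then
        # Convert the existing Docker image to a local SIF once, then run it.
        singularity pull --force ciroh-ngen-image.sif "docker://${IMAGE_NAME}"
        singularity run --bind "${HOST_DATA_PATH}:/ngen/ngen/data" ciroh-ngen-image.sif "$@"
    else
        docker run --rm -it -v "${HOST_DATA_PATH}:/ngen/ngen/data" "${IMAGE_NAME}" "$@"
    fi
}

This would keep the rest of guide.sh (file discovery, image selection, prompts) unchanged; only the final run step branches on the selected runtime.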

CFE output files (cat-*.csv) results are all zero

Hi everyone, I have run CFE using NGIAB for the Sipsey River basin (VPU #03W) in Alabama. As I was checking the model outputs (e.g., "cat-117770.csv"), I found all values in the "cat-117770.csv" file to be zero. This is the case for all catchment CSV files. I used the "latest-x86" image as I was running CFE within NGIAB on Black Warrior. I am not sure what went wrong, as I followed the steps mentioned here: https://github.com/CIROH-UA/NGIAB-CloudInfra.

Screenshots of the results with t-route on and with t-route off are attached to the issue.

Integrated Evaluation System Prototype for Testing Research and Operational Advancements

NOTE: THESE RESOURCES ARE ALREADY IN USE.

1. Requester Information:
Matthew Denno
RTI International
[email protected]

2. Project Information:
Our project is the "Integrated Evaluation System Prototype for Testing Research and Operational Advancements". The goal of this subproject is to define and develop a prototype evaluation system and interface to support CIROH’s varied evaluation-related research objectives, including model validation, forecast verification, and forecast product evaluation across multiple scales and datatypes. The design of the system components (e.g., data management structure, evaluation methods, and user interface) itself will be a research contribution as the optimal design to support the level of flexibility and scalability needed is an open research question. Once developed, the system will be used to execute research and advance the state of practice in many areas related to hydrologic model and forecast evaluation. This subproject contributes directly to CIROH Research Theme 1 (Operational Water Prediction Systems), by supporting improvements in models and methods used for operational forecasting.

3. Project Description:
If your project involves developing software or scripts, briefly describe the software you plan to develop.
We are developing quite a lot of software for this project to support data ingest and processing, storage and retrieval, and visualization. Determining the full scope of what is needed is part of the project. To date, we have developed a Python library to support multiple evaluation use cases and are utilizing this library to ingest and analyze hydrologic data and build Jupyter Notebooks, REST API(s) and Web Applications that all work in concert to enable visualization.

4. Resource Requirements:
This project has already started and already has cloud resources allocated - no changes are necessary. We will likely explore a wide range of services offered by AWS, but currently we are hosting a JupyterHub instance in EKS that serves as the backbone of the evaluation system. We will continue to build on and experiment with adding additional components to the cluster (such as REST APIs and dashboards) to further the research and continue to engage with OWP staff (OWP staff are currently accessing the resources as users).

List of AWS Services

  • EC2
  • S3
  • EBS (Amazon Elastic Block Store)
  • EFS
  • RDS (not now, but maybe)
  • VPC (Virtual Private Cloud)
  • ECS
  • EKS (Kubernetes Cluster)

5. Timeline:
This project is a Year 1 and Year 2 project and is currently ongoing. If it continues to be funded in future years, the resource needs will likely continue.

6. Security and Compliance Requirements:
None

7. Estimation:
The monthly cost is currently running between $400 and $600, depending on the level of use in a given month.

8. Approval:
I don't know what goes here. These resources are already in use.

start and stop Github Action runner with lambda

To address the issue of GitHub Actions jobs being queued until someone manually starts the AWS VM used for the runner, we need to implement automatic start and stop of the VM using Lambda. This will ensure that, once a PR is submitted, we do not have to wait for someone to start the VM manually.
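
For illustration, the calls such an automation would need to make are sketched below as AWS CLI commands; a Lambda function would issue the equivalent EC2 API calls (for example via boto3) on a webhook or schedule trigger. The instance ID and region are placeholders, not the actual runner's values.

# Hypothetical start/stop of the self-hosted runner VM (placeholder instance ID and region).
INSTANCE_ID="i-0123456789abcdef0"
REGION="us-east-2"

# Start the runner when a PR arrives, and wait until it is up.
aws ec2 start-instances --instance-ids "$INSTANCE_ID" --region "$REGION"
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID" --region "$REGION"

# ... GitHub Actions jobs execute on the runner ...

# Stop the runner when the queue is empty to avoid paying for an idle VM.
aws ec2 stop-instances --instance-ids "$INSTANCE_ID" --region "$REGION"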

AWS Resource request to deploy SWE-ML models

1. Requester Information:
This should include the name and contact information of the person making the request.

Karnesh Jain – [email protected] - University of Alabama

2. Project Information:
Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources. Please highlight how this project will benefit from and/or provide benefit to other resources on the shared infrastructure.

We are looking for a place for the SWE-ML model to live. It will run daily and save its output to AWS S3 storage. Currently, it is run manually on a local machine by a user. We want it deployed on AWS EC2 and run automatically on a daily basis (a hypothetical scheduling sketch is shown below).
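
For illustration only, a daily run on EC2 could be as simple as a cron entry that executes the model and syncs the results to S3. The paths, script name, entry point, and bucket below are placeholders, not the project's actual names.

#!/bin/bash
# run_swe_ml.sh (placeholder script): run the model, then push outputs to S3.
# Scheduled via cron, e.g.: 0 6 * * * /home/ubuntu/swe-ml/run_swe_ml.sh >> /home/ubuntu/swe-ml/daily.log 2>&1
set -euo pipefail
cd /home/ubuntu/swe-ml
python run_model.py --date "$(date -u +%F)"                          # hypothetical SWE-ML entry point
aws s3 sync ./output/ "s3://example-swe-ml-output/$(date -u +%F)/"   # placeholder bucket name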

3. Project Description:
If your project involves developing software or scripts, briefly describe the software you plan to develop.

The current version of the SWE-ML model and workflow has been developed in Python.

4. Resource Requirements:
Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

Options:

  1. Cloud Provider: AWS/Azure/GCP

  2. Required Services in the Cloud:

    List of AWS Services

  • EC2
  • S3 – public, private, requester pay, bucket name suggestion?
  • EBS (Amazon Elastic Block Store)
  • EFS
  • RDS
  • VPC (Virtual Private Cloud)
  • DynamoDB
  • ECS
  • EKS (Kubernetes Cluster)
  • Lambda
  • Others: please list

    List of Azure Services

  • Virtual Machines
  • Azure App Service
  • Azure Kubernetes Service (AKS)
  • Azure Functions
  • Azure Batch
  • Azure Blob Storage
  • Azure File Storage
  • Azure Machine Learning
  • Azure Key Vault
  • Other: please list

    List of GCP Services

  • Google Compute Engine
  • Google Kubernetes Engine (GKE)
  • Google Cloud Storage
  • Google VPC
  • Google IAM
  • Google BigQuery
  • Google Cloud Functions
  • Dataflow
  • Other: please list

List of AWS Services

  • EC2 – It will be used to run the ML models on a daily basis.
  • SageMaker – It is needed to run the SWE-ML Jupyter notebooks.

5. Timeline:
Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.

We will begin using the resources right away.

6. Security and Compliance Requirements:
If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

No.

7. Estimation:
Include any cost estimation or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

AWS Cost Calculator: https://calculator.aws/#/

Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator

8. Approval:
Indicate the necessary approval processes or sign-offs required for the request.

Sagemaker for P1 Snow Modeling Project - UA

1. Requester Information:
This should include the name and contact information of the person making the request.

Ryan Johnson - [email protected] - University of Alabama

2. Project Information:
Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources. Please highlight how this project will benefit from and/or provide benefit to other resources on the shared infrastructure.

Looking to connect my AWS S3 storage with SageMaker for my research group working on the national-snow-model bucket. We are currently pushing/pulling data between S3 and our 2i2c workspace and would like to bypass this by integrating AWS services. We anticipate that seamless AWS integration will expedite workflow development and provide access to GPUs supporting machine-learning model R&D. As the projects enter a semi-matured state, we anticipate a more seamless transition to operational testing using AWS EC2 after our rigorous development stage (ETA: end of 2024).

3. Project Description:
If your project involves developing software or scripts, briefly describe the software you plan to develop.

We are looking to advance the snow module components within the NWM. We will be developing ML models and the respective ML workflows in Python. We will need to be able to create persistent working environments to manage package versions.

4. Resource Requirements:
Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

Options:

  1. Cloud Provider: AWS/Azure/GCP
    AWS

  2. Required Services in the Cloud:

    List of AWS Services

  • EC2 - not yet, but we plan to use this as a resource in the future
  • S3 – public is fine, and we are currently using the national-snow-model bucket. I think we are set up for 1 TB.
  • SageMaker – I do not know the options, but something on par with 2i2c (CPUs/memory) would be more than acceptable, especially with the addition of GPUs to enhance ML model development.

5. Timeline:
Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.

We could begin using the resources now and for the next 2 years.

6. Security and Compliance Requirements:
If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

No.

7. Estimation:
Include any cost estimation or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

AWS Cost Calculator: https://calculator.aws/#/

$300/month

Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator

8. Approval:
Indicate the necessary approval processes or sign-offs required for the request.

Arpita Patel, James Halgren, Steven Burian

Amazon EC2 request - Stevens Institute of Technology

1. Requester Information:
Dr. Marouane Temimi
Associate Professor
Department of Civil, Environmental, and Ocean Engineering (CEOE)
Tel: +1 201 216 5303
Email: [email protected]

2. Project Information:
The project aims to introduce an innovative, automated deep learning-based approach for near real-time satellite monitoring of river ice conditions within the northern watersheds of the United States and Canada. This technique harnesses high-resolution imagery from the VIIRS bands onboard the NOAA-20 and NPP satellites and utilizes the U-Net deep learning algorithm for the semantic segmentation of images, even under challenging conditions such as varying cloud cover and land surface variations.

3. Project Description:
The project involves the development of software and scripts in multiple programming languages, including Bash, MATLAB, Python, and the Google Earth Engine.

4. Resource Requirements:
CPUs: 30
Memory: 64 GB
Network bandwidth: at least 1 Gbps
EBS bandwidth: at least 1 Gbps
Disk size: at least 300 GB
OS: Linux (Ubuntu)
Root access required

Options:

  1. Cloud Provider: AWS

  2. Required Services in the Cloud:

    List of AWS Services

  • EC2

  • Amazon CloudWatch or Amazon Managed Grafana

    List of Azure Services

  • Virtual Machines

  • Azure File Storage

  • Azure Machine Learning

5. Timeline:
The project requires resources as soon as possible because the system is operational for Alaska, and it is currently winter, which is a critical period for river ice monitoring. The project is expected to run for a minimum of one year.

6. Security and Compliance Requirements:
No security and compliance requirements

7. Estimation:
Monthly: $624.15/month for a 1-year term or $475.96/month for a 3-year term

8. Approval:
Contact Dr. Marouane Temimi (contact info above)

Ngen now requires boost 1.79

Upstream changes to ngen now require a minimum Boost version of 1.79.

Current behavior

The current Dockerfiles use Boost 1.72 as the minimum version.

Expected behavior

The Dockerfiles need to use Boost 1.79 (a hypothetical build-argument sketch is shown below).
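
A minimal sketch of the kind of change involved, assuming the Boost version is (or could be) passed to the build as an argument. The BOOST_VERSION argument name is hypothetical; the actual Dockerfiles may hard-code the version or fetch Boost differently.

# Hypothetical: rebuild the dependency image with Boost 1.79 passed as a build argument.
docker buildx build \
  --build-arg BOOST_VERSION=1.79.0 \
  --tag awiciroh/ngen-deps:latest-x86-test \
  -f Dockerfile.ngen-deps .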

x86 build issue with latest code

While trying to build the x86 image locally on a Windows machine using the commands below, the ngen-deps and t-route images are created successfully, but the ngen image build fails with the error below.

docker buildx build --tag awiciroh/ngen-deps:latest-x86-test -f Dockerfile.ngen-deps .
docker push awiciroh/ngen-deps:latest-x86-test

docker buildx build --tag awiciroh/t-route:latest-x86-test -f Dockerfile.t-route .
docker push awiciroh/t-route:latest-x86-test

docker buildx build --tag awiciroh/ngen:latest-x86-test -f Dockerfile.ngen .
docker push awiciroh/ngen:latest-x86-test

docker buildx build --tag awiciroh/ciroh-ngen-image:latest-x86-test -f Dockerfile .
docker push awiciroh/ciroh-ngen-image:latest-x86-test

Current behavior

docker buildx build --tag awiciroh/ngen:latest-x86-test -f Dockerfile.ngen .
[+] Building 0.5s (15/16)
=> [internal] load build definition from Dockerfile.ngen 0.0s
=> => transferring dockerfile: 9.00kB 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/awiciroh/t-route:latest 0.0s
=> [internal] load metadata for docker.io/awiciroh/ngen-deps:latest 0.0s
=> FROM docker.io/awiciroh/t-route:latest 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 79B 0.0s
=> [rocky_build_ngen 1/8] FROM docker.io/awiciroh/ngen-deps:latest 0.0s
=> CACHED [rocky_init_repo 2/3] WORKDIR /ngen 0.0s
=> CACHED [rocky_init_repo 3/3] RUN cd /ngen && if [ "x$COMMIT" != "x" ]; then git clone --single-branch --branch 0.0s
=> CACHED [rocky_build_ngen 2/8] COPY --chown= --from=rocky_init_repo /ngen/ngen /ngen/ngen 0.0s
=> CACHED [rocky_build_ngen 3/8] COPY --chown= --from=awiciroh/t-route:latest /ngen/t-route/wheels /tmp/t-route-wheels 0.0s
=> CACHED [rocky_build_ngen 4/8] COPY --chown= --from=awiciroh/t-route:latest /ngen/t-route/requirements.txt /tmp/t-route-req 0.0s
=> CACHED [rocky_build_ngen 5/8] RUN if [ "ON" == "ON" ]; then chgrp -R ${USER} /usr/local/lib*/python3.* ; c 0.0s
=> CACHED [rocky_build_ngen 6/8] COPY fix_io_sub_7551590a415b89026559c1c570d4154e4746161b.patch /ngen/ngen/fix_io_sub.patch 0.0s
=> ERROR [rocky_build_ngen 7/8] RUN cd /ngen/ngen && git apply --reject --whitespace=fix fix_io_sub.patch 0.4s


[rocky_build_ngen 7/8] RUN cd /ngen/ngen && git apply --reject --whitespace=fix fix_io_sub.patch:
#0 0.392 Checking patch extern/SoilFreezeThaw/SoilFreezeThaw/forcing_code/src/bmi_aorc.c...
#0 0.392 error: while searching for:
#0 0.392 return -1;?
#0 0.392 }?
#0 0.392 int seen_non_whitespace = 0;?
#0 0.392 char c;?
#0 0.392 for (c = fgetc(fp); c != EOF; c = fgetc(fp)) {?
#0 0.392 // keep track if this line has seen any char other than space or tab?
#0 0.392 if (c != ' ' && c != '\t' && c != '\n')?
#0 0.392
#0 0.392 error: patch failed: extern/SoilFreezeThaw/SoilFreezeThaw/forcing_code/src/bmi_aorc.c:480
#0 0.392 Checking patch extern/SoilFreezeThaw/SoilFreezeThaw/forcing_code/src/bmi_pet.c...
#0 0.392 error: while searching for:
#0 0.392 return -1;?
#0 0.392 }?
#0 0.392 int seen_non_whitespace = 0;?
#0 0.392 char c;?
#0 0.392 for (c = fgetc(fp); c != EOF; c = fgetc(fp)) {?
#0 0.392 // keep track if this line has seen any char other than space or tab?
#0 0.392 if (c != ' ' && c != '\t' && c != '\n')?
#0 0.392
#0 0.392 error: patch failed: extern/SoilFreezeThaw/SoilFreezeThaw/forcing_code/src/bmi_pet.c:497
#0 0.392 Checking patch extern/cfe/cfe/forcing_code/src/bmi_aorc.c...
#0 0.393 error: while searching for:
#0 0.393 return -1;?
#0 0.393 }?
#0 0.393 int seen_non_whitespace = 0;?
#0 0.393 char c;?
#0 0.393 for (c = fgetc(fp); c != EOF; c = fgetc(fp)) {?
#0 0.393 // keep track if this line has seen any char other than space or tab?
#0 0.393 if (c != ' ' && c != '\t' && c != '\n')?
#0 0.393
#0 0.393 error: patch failed: extern/cfe/cfe/forcing_code/src/bmi_aorc.c:480
#0 0.393 Checking patch extern/cfe/cfe/forcing_code/src/bmi_pet.c...
#0 0.393 error: while searching for:
#0 0.393 return -1;?
#0 0.393 }?
#0 0.393 int seen_non_whitespace = 0;?
#0 0.393 char c;?
#0 0.393 for (c = fgetc(fp); c != EOF; c = fgetc(fp)) {?
#0 0.393 // keep track if this line has seen any char other than space or tab?
#0 0.393 if (c != ' ' && c != '\t' && c != '\n')?
#0 0.393
#0 0.393 error: patch failed: extern/cfe/cfe/forcing_code/src/bmi_pet.c:497
#0 0.393 Checking patch extern/cfe/cfe/src/bmi_cfe.c...
#0 0.394 error: while searching for:
#0 0.394 return -1;?
#0 0.394 }?
#0 0.394 int seen_non_whitespace = 0;?
#0 0.394 char c;?
#0 0.394 for (c = fgetc(fp); c != EOF; c = fgetc(fp)) {?
#0 0.394 // keep track if this line has seen any char other than space or tab?
#0 0.394 if (c != ' ' && c != '\t' && c != '\n')?
#0 0.394
#0 0.394 error: patch failed: extern/cfe/cfe/src/bmi_cfe.c:2814
#0 0.394 Checking patch extern/evapotranspiration/evapotranspiration/forcing_code/src/bmi_aorc.c...
#0 0.395 error: while searching for:
#0 0.395 return -1;?
#0 0.395 }?
#0 0.395 int seen_non_whitespace = 0;?
#0 0.395 char c;?
#0 0.395 for (c = fgetc(fp); c != EOF; c = fgetc(fp)) {?
#0 0.395 // keep track if this line has seen any char other than space or tab?
#0 0.395 if (c != ' ' && c != '\t' && c != '\n')?
#0 0.395
#0 0.395 error: patch failed: extern/evapotranspiration/evapotranspiration/forcing_code/src/bmi_aorc.c:480
#0 0.395 Checking patch extern/evapotranspiration/evapotranspiration/src/bmi_pet.c...
#0 0.396 error: while searching for:
#0 0.396 return -1;?
#0 0.396 }?
#0 0.396 int seen_non_whitespace = 0;?
#0 0.396 char c;?
#0 0.396 for (c = fgetc(fp); c != EOF; c = fgetc(fp)) {?
#0 0.396 // keep track if this line has seen any char other than space or tab?
#0 0.396 if (c != ' ' && c != '\t' && c != '\n')?
#0 0.396
#0 0.396 error: patch failed: extern/evapotranspiration/evapotranspiration/src/bmi_pet.c:510
#0 0.397 Applying patch extern/SoilFreezeThaw/SoilFreezeThaw/forcing_code/src/bmi_aorc.c with 1 reject...
#0 0.397 Rejected hunk #1.
#0 0.398 Applying patch extern/SoilFreezeThaw/SoilFreezeThaw/forcing_code/src/bmi_pet.c with 1 reject...
#0 0.398 Rejected hunk #1.
#0 0.398 Applying patch extern/cfe/cfe/forcing_code/src/bmi_aorc.c with 1 reject...
#0 0.398 Rejected hunk #1.
#0 0.398 Applying patch extern/cfe/cfe/forcing_code/src/bmi_pet.c with 1 reject...
#0 0.398 Rejected hunk #1.
#0 0.398 Applying patch extern/cfe/cfe/src/bmi_cfe.c with 1 reject...
#0 0.398 Rejected hunk #1.
#0 0.399 Applying patch extern/evapotranspiration/evapotranspiration/forcing_code/src/bmi_aorc.c with 1 reject...
#0 0.399 Rejected hunk #1.
#0 0.399 Applying patch extern/evapotranspiration/evapotranspiration/src/bmi_pet.c with 1 reject...
#0 0.399 Rejected hunk #1.


Dockerfile.ngen:98

97 | # Apply the IO fix to submodules, once they all get patched/merged, this can be dropped...
98 | >>> RUN cd ${WORKDIR}/ngen && git apply --reject --whitespace=fix
99 | >>> #patch the submodules
100 | >>> fix_io_sub.patch
101 |

ERROR: failed to solve: process "/bin/sh -c cd ${WORKDIR}/ngen && git apply --reject --whitespace=fix fix_io_sub.patch" did not complete successfully: exit code: 1

Segmentation Fault Issue while running the latest image created using four Dockerfiles (arm)

Steps to reproduce the issue:

  1. Built the images using the four Dockerfiles and tagged them as latest, as below:

docker build --platform linux/arm64 --tag awiciroh/ngen-deps:latest -f Dockerfile.ngen-deps .
docker build --platform linux/arm64 --tag awiciroh/t-route:latest -f Dockerfile.t-route .
docker build --platform linux/arm64 --tag awiciroh/ngen:latest -f Dockerfile.ngen .
docker build --platform linux/arm64 --tag awiciroh/ciroh-ngen-image:latest -f Dockerfile .

docker push awiciroh/ngen-deps:latest
docker push awiciroh/t-route:latest
docker push awiciroh/ngen:latest
docker push awiciroh/ciroh-ngen-image:latest

  2. Input data used: our sample data on the S3 bucket: $ wget --no-parent https://ciroh-ua-ngen-data.s3.us-east-2.amazonaws.com/AWI-001/AWI_03W_113060_001.tar.gz

  3. Updated guide.sh to use the latest image.

  4. While trying to run the guide.sh script on a Mac laptop, it pulls the latest image but fails with the error shown in the attached screenshot (a segmentation fault).

PR submitted using forked repos are failing

Current behavior

The current behavior of the CI pipeline is that it fails when someone submits a pull request from a forked repository, because the DockerHub secrets are not available to fork-triggered workflows.

Error message at:
https://github.com/CIROH-UA/NGIAB-CloudInfra/actions/runs/7558470899
Run echo "" | docker login --username awiciroh --password-stdin
echo "" | docker login --username awiciroh --password-stdin
shell: /usr/bin/bash --noprofile --norc -e -o pipefail {0}
env:
TAG_NAME: merge
Error: Cannot perform an interactive login from a non TTY device

Expected behavior

The CI pipeline should run successfully for pull requests submitted from forked repositories as well as from branches (a hypothetical guard for the login step is sketched below).
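
One low-risk mitigation, sketched below under the assumption that the login runs in a plain shell step (DOCKERHUB_TOKEN is a placeholder for whatever secret the workflow actually uses), is to skip the DockerHub login and push when the secret is unavailable, so fork-triggered runs can still build the images:

# Hypothetical guard for the workflow's login step.
if [ -n "${DOCKERHUB_TOKEN:-}" ]; then
    echo "${DOCKERHUB_TOKEN}" | docker login --username awiciroh --password-stdin
else
    echo "DockerHub credentials not available (fork-triggered PR); skipping login and image push."
fi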

Steps to replicate behavior (include URLs)

Submit a PR using a forked repo.

Screenshots

Temporary storage for data transfer

1. Requester Information:
This should include the name and contact information of the person making the request.
Chaopeng Shen
The Pennsylvania State University
Email: [email protected]
Office: 814-863-5844

2. Project Information:
Provide a brief description of the project and its goals. This can help the infrastructure team understand the context and purpose of the requested resources. Please highlight how this project will benefit from and/or provide benefit to other resources on the shared infrastructure.
This request is for projects at the Cooperative Institute for Research to Operations in Hydrology (CIROH), founded by NOAA and USGS. The projects' goals are to develop large-scale differentiable models for hydrological modeling. The developed models can be the potential next-generation national water model, and the generated products can also benefit other related research in this field.

3. Project Description:
If your project involves developing software or scripts, briefly describe the software you plan to develop.
This project aims to develop PyTorch-supported differentiable models.

4. Resource Requirements:
Specify the compute, storage, and network resources needed for the project. Be as specific as possible about the number of resources required, and any specific configurations or capabilities needed. This information will help the infrastructure team determine the appropriate resources to allocate.

We request at least 1.2 TB of temporary storage on S3 with read/write permission. This is to enable transfer of data from the Penn State system to the University of Alabama compute system (an illustrative transfer sketch follows).
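
Once a bucket exists, the transfer itself is straightforward; below is a hedged sketch with a placeholder bucket name and placeholder local paths.

# Upload from the Penn State system to the temporary bucket.
aws s3 sync /data/differentiable-models/ s3://example-ciroh-temp-transfer/differentiable-models/

# Download on the University of Alabama compute system.
aws s3 sync s3://example-ciroh-temp-transfer/differentiable-models/ /scratch/differentiable-models/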

Options:

  1. Cloud Provider: AWS/Azure/GCP

  2. Required Services in the Cloud:

    List of AWS Services

  • S3

5. Timeline:
Indicate the expected timeline for the project and when the resources will be needed. This information can help the infrastructure team plan and allocate resources accordingly.

These projects start in 2023; thus, we need these resources as soon as possible.

6. Security and Compliance Requirements:
If there are any specific security or compliance requirements for the project, these should be clearly stated in the request. This will help ensure that the necessary security measures are in place for the project.

7. Estimation:
Include any cost estimation or requirements for the project. This will help the infrastructure team select the most cost-effective solutions for the project.

AWS Cost Calculator: https://calculator.aws/#/

Google Cloud Pricing Calculator: https://cloud.google.com/products/calculator

8. Approval:
Indicate the necessary approval processes or sign-offs required for the request.

Benchmarking LSTM models on Canada and US basins: CAMELS_spat dataset

1. Requester Information:

Name: Raymond Spiteri

Institution: University of Saskatchewan

Position: Professor - Computer Science

Email: [email protected]

2. Project Information:

This project focuses on benchmarking Long Short-Term Memory (LSTM) models across Canadian and US basins (from CAMELS_spat, under development). Specifically, we concentrate on hydrological modeling tasks related to extreme precipitation and flood events. Leveraging the shared infrastructure, this initiative serves as a platform for executing models efficiently across diverse basins, thereby contributing to the establishment of a benchmarking standard for LSTM models.

Jesus Perez Curbelo, Postdoc at the University of Saskatchewan, will be accessing and using the resources and will work under the supervision of Raymond Spiteri and Martyn Clark. We have collaborative efforts with Chaopeng Shen in this research.

Utilizing the NeuralHydrology library, we plan to replicate published findings from the CAMELS-US dataset. Additionally, we will employ CAMELS_spat for modeling Canadian basins. The NeuralHydrology library, hosted on GitHub, provides the necessary framework for our modeling endeavors.

3. Project Description:

The project involves using and developing LSTM models in virtual environments where we will install the dependencies for the NeuralHydrology library. The software and scripts will be developed in Python, using packages such as TensorFlow, PyTorch, and Keras. The data will be stored in NetCDF and CSV files.

4. Resource Requirements:

Subject to change:

GPU: 8xA100 (when available) / 4xV100

CPU: 16 cores

Memory: 100 GB RAM

Disk Space: 200 GB HD

5. Timeline:

Starting as soon as possible and continuing until December 31, 2024

6. Security and Compliance Requirements:

Not applicable

7. Approval:

Not applicable

Support for .gpkg geopackage

Currently, NGEN_WITH_SQLITE is set to OFF, as shown below:

#16 178.5 --     NGEN_WITH_MPI: ON
#16 178.5 --     NGEN_WITH_NETCDF: ON
#16 178.5 --     NGEN_WITH_SQLITE: OFF
#16 178.5 --     NGEN_WITH_UDUNITS: ON
#16 178.5 --     NGEN_WITH_BMI_FORTRAN: ON
#16 178.5 --     NGEN_WITH_BMI_C: ON
#16 178.5 --     NGEN_WITH_PYTHON: ON
#16 178.5 --     NGEN_WITH_ROUTING: ON
#16 178.5 --     NGEN_WITH_TESTS: ON
#16 178.5 --     NGEN_QUIET: OFF

Current behavior

Once the input data package is created with a .gpkg (GeoPackage) file, we will have to update the build to add this support (a hypothetical configure sketch is shown below).

Expected behavior

NGEN_WITH_SQLITE: ON
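
A minimal sketch of how the flag would be flipped at configure time, assuming ngen's standard CMake build; the exact wiring inside Dockerfile.ngen may differ.

# Hypothetical: enable GeoPackage (SQLite) support when configuring ngen with CMake.
cmake -B cmake_build -S . \
  -DNGEN_WITH_SQLITE:BOOL=ON \
  -DNGEN_WITH_NETCDF:BOOL=ON \
  -DNGEN_WITH_ROUTING:BOOL=ON
cmake --build cmake_build --target ngen -- -j "$(nproc)"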
