
nwmurl's People

Contributors

jameshalgren, jordanlasergit, karnesh, manjirigunaji, rohansunkarapalli, sepehrkrz


nwmurl's Issues

Re-enable validation checker as an option and refactor as separate file and function.

This is a proposal to move the validation-checking capability to a separate file. Doing so would make the tqdm dependency optional, since tqdm is only needed when validation is requested.

A use case we might want to support is providing an externally generated list of files to this utility to check if the URLs have downloadable endpoints.

from concurrent.futures import ThreadPoolExecutor
from functools import partial

import requests
from tqdm import tqdm


def check_url(t, session, file):
    """HEAD-check a single URL, updating the shared progress bar."""
    filename = file.split("/")[-1]
    try:
        response = session.head(file, timeout=1)
        if response.status_code == 200:
            t.set_description(f"Found: {filename}")
            t.update(1)
            t.refresh()
            return file
        t.set_description(f"Not Found: {filename}")
        t.update(1)
        t.refresh()
        return None
    except requests.exceptions.RequestException:
        t.set_description(f"Not Found: {filename}")
        t.update(1)
        t.refresh()
        return None


def check_valid_urls(file_list, session=None):
    """Return the subset of file_list whose URLs respond to HEAD with 200."""
    if session is None:
        session = requests.Session()
    t = tqdm(range(len(file_list)))
    check_url_part = partial(check_url, t, session)
    with ThreadPoolExecutor(max_workers=10) as executor:
        valid_file_list = list(executor.map(check_url_part, file_list))
    return [f for f in valid_file_list if f is not None]
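The external-list use case could be exercised without network access by injecting a stub session. The `filter_valid_urls` helper below is a hypothetical, dependency-free rendering of the idea (no tqdm, no threads), and the stub classes and URLs are made up for illustration:

```python
class StubResponse:
    def __init__(self, status_code):
        self.status_code = status_code

class StubSession:
    """Answers 200 for URLs ending in .nc and 404 for everything else."""
    def head(self, url, timeout=None):
        return StubResponse(200 if url.endswith(".nc") else 404)

def filter_valid_urls(file_list, session):
    """Return only the URLs whose endpoints answer HEAD with status 200."""
    valid = []
    for url in file_list:
        try:
            if session.head(url, timeout=1).status_code == 200:
                valid.append(url)
        except Exception:  # a real requests session raises RequestException
            pass
    return valid

external_list = [
    "https://noaa-nwm-pds.s3.amazonaws.com/good_file.nc",      # hypothetical
    "https://noaa-nwm-pds.s3.amazonaws.com/missing_file.xyz",  # hypothetical
]
valid = filter_valid_urls(external_list, session=StubSession())
```

Accepting the session as an argument is what makes this kind of offline testing possible; in real use the session would be a configured `requests.Session`.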

Production deployment error

The GitHub workflow is unable to upload the pip package due to the following error:

Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.4/x64/bin/twine", line 5, in <module>
    from twine.__main__ import main
  File "/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/twine/__init__.py", line 43, in <module>
    __license__ = metadata["license"]
                  ~~~~~~~~^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.4/x64/lib/python3.12/site-packages/importlib_metadata/_adapters.py", line 54, in __getitem__
    raise KeyError(item)
KeyError: 'license'
Error: Process completed with exit code 1.

Add output options

Currently, the behavior is to write a file.

Options should be to
a) return a list of names
b) write a list to an arbitrary file
c) write a list to a specified file

This issue assumes the output format is a simple text file with one file name per line.
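The three options above could be collapsed into one hypothetical helper; the name `emit_output` and its signature are assumptions, not the current nwmurl API:

```python
import os
import tempfile

def emit_output(file_list, write_to_file=False, output_path=None):
    """Hypothetical output handler covering the three options in this issue.

    a) write_to_file=False             -> just return the list of names
    b) write_to_file=True, path=None   -> write the list to an arbitrary file
    c) write_to_file=True, path given  -> write the list to the specified file
    The output format is one file name per line, as assumed above.
    """
    if write_to_file:
        if output_path is None:
            # option (b): let the OS pick a unique, arbitrary file name
            fd, output_path = tempfile.mkstemp(suffix=".txt")
            os.close(fd)
        with open(output_path, "w") as f:
            f.write("\n".join(file_list) + "\n")
    return file_list
```

Always returning the list keeps option (a) the default and makes the file-writing options additive rather than mutually exclusive.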

Update Readme with URL options

The Readme would be clearer with the `urlbaseinput` options documented. These could be added below the `runinput` options.

- `urlbaseinput`: An integer selecting the overall data source. Options include:
  - 0: "",  # Empty
  - 1: "https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/",  # NOAA hosted operational output
  - 2: "https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/post-processed/WMS/",  # not functioning
  - 3: "https://storage.googleapis.com/national-water-model/",  # authenticated access to Google buckets
  - 4: "https://storage.cloud.google.com/national-water-model/",  # open-access to Google buckets
  - 5: "gs://national-water-model/",  
  - 6: "gcs://national-water-model/",
  - 7: "https://noaa-nwm-pds.s3.amazonaws.com/",  # 4 week rolling AWS short range
  - 8: "https://ciroh-nwm-zarr-copy.s3.amazonaws.com/national-water-model/",  # operational Kerchunk json archive
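Internally this selection amounts to a plain dictionary lookup; a minimal sketch (values copied from the list above, function name hypothetical):

```python
URLBASEDICT = {
    0: "",
    1: "https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/",
    2: "https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/post-processed/WMS/",
    3: "https://storage.googleapis.com/national-water-model/",
    4: "https://storage.cloud.google.com/national-water-model/",
    5: "gs://national-water-model/",
    6: "gcs://national-water-model/",
    7: "https://noaa-nwm-pds.s3.amazonaws.com/",
    8: "https://ciroh-nwm-zarr-copy.s3.amazonaws.com/national-water-model/",
}

def resolve_base_url(urlbaseinput):
    """Map an integer urlbaseinput to its base URL, rejecting unknown keys."""
    if urlbaseinput not in URLBASEDICT:
        raise ValueError(f"urlbaseinput must be one of {sorted(URLBASEDICT)}")
    return URLBASEDICT[urlbaseinput]
```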

Remove hidden files and folders

The notebook checkpoint folder as well as some of the pycache files have been committed to the repository. These should be ephemeral artifacts.

These:

├── .ipynb_checkpoints
│  ├── README-checkpoint.md
│  └── setup-checkpoint.py

and these (except for the nwmurl folder)

├── nwmurl
│  ├── .ipynb_checkpoints
│  │  └── urlgennwm-checkpoint.py
│  ├── __pycache__
│  │  └── urlgennwm.cpython-310.pyc

...should all be removed from versioning.

Update retrospective url generation function to include zarr s3

The current base URL list for retrospective files does not include the CIROH-developed zarr archive for NWM 2.1 retrospective output. Those files are here:
https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/index.html#noaa-nwm-retrospective-2-1-zarr-pds/

See here:

nwmurl/nwmurl/urlgennwm.py

Lines 164 to 169 in d2b9676

urlbasedict_retro = {
1: "https://noaa-nwm-retrospective-2-1-pds.s3.amazonaws.com/",
2: "s3://noaa-nwm-retrospective-2-1-pds/",
3: "https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/noaa-nwm-retrospective-2-1-zarr-pds/",
4: "https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/",
}

Retro 2.1 zarr headers bug

nwmurl generates the wrong file list for retro 2.1 zarr headers (.json files). The following input arguments:

start_date = "200701010000"
end_date = "200701010800"
urlbaseinput = 3
selected_var_types = [1]
selected_object_types = [1]  
write_to_file = False

file_list = nwmurl.generate_urls_retro(
    start_date,
    end_date,
    urlbaseinput,
    selected_object_types,
    selected_var_types,
    write_to_file
)

generates

['1.json',
 '1.json',
 '1.json',
 '1.json',
 '1.json',
 '1.json',
 '1.json',
 '1.json',
 '1.json']

instead of

['https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/noaa-nwm-retrospective-2-1-zarr-pds/forcing/2007/2007010100.LDASIN_DOMAIN1.json',
 'https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/noaa-nwm-retrospective-2-1-zarr-pds/forcing/2007/2007010101.LDASIN_DOMAIN1.json',
 'https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/noaa-nwm-retrospective-2-1-zarr-pds/forcing/2007/2007010102.LDASIN_DOMAIN1.json',
 'https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/noaa-nwm-retrospective-2-1-zarr-pds/forcing/2007/2007010103.LDASIN_DOMAIN1.json',
 'https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/noaa-nwm-retrospective-2-1-zarr-pds/forcing/2007/2007010104.LDASIN_DOMAIN1.json',
 'https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/noaa-nwm-retrospective-2-1-zarr-pds/forcing/2007/2007010105.LDASIN_DOMAIN1.json',
 'https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/noaa-nwm-retrospective-2-1-zarr-pds/forcing/2007/2007010106.LDASIN_DOMAIN1.json',
 'https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/noaa-nwm-retrospective-2-1-zarr-pds/forcing/2007/2007010107.LDASIN_DOMAIN1.json',
 'https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/noaa-nwm-retrospective-2-1-zarr-pds/forcing/2007/2007010108.LDASIN_DOMAIN1.json']
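The expected names above follow a regular hourly pattern; a sketch of how they could be composed (the helper and its name are hypothetical, not nwmurl internals):

```python
from datetime import datetime, timedelta

ZARR_BASE = ("https://ciroh-nwm-zarr-retrospective-data-copy.s3.amazonaws.com/"
             "noaa-nwm-retrospective-2-1-zarr-pds/")

def retro21_forcing_headers(start, end):
    """Yield hourly zarr-header (.json) URLs between two YYYYMMDDHHMM stamps."""
    t = datetime.strptime(start, "%Y%m%d%H%M")
    stop = datetime.strptime(end, "%Y%m%d%H%M")
    while t <= stop:
        # path pattern taken from the expected output above
        yield f"{ZARR_BASE}forcing/{t:%Y}/{t:%Y%m%d%H}.LDASIN_DOMAIN1.json"
        t += timedelta(hours=1)
```

With the issue's inputs (`"200701010000"` to `"200701010800"`), this yields the nine expected URLs rather than nine copies of `1.json`.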

Add tests for all potential outputs of naming

It would be helpful, both for documentation purposes and for good code practice, to introduce formal tests (i.e., pytest) for all of the expected outputs of the naming capability.

Tests should demonstrate proper output control for situations in each of the following output folders:

  • analysis_assim
  • analysis_assim_alaska
  • analysis_assim_alaska_no_da
  • analysis_assim_coastal_atlgulf
  • analysis_assim_coastal_hawaii
  • analysis_assim_coastal_pacific
  • analysis_assim_coastal_puertorico
  • analysis_assim_extend
  • analysis_assim_extend_alaska
  • analysis_assim_extend_alaska_no_da
  • analysis_assim_extend_coastal_atlgulf
  • analysis_assim_extend_coastal_pacific
  • analysis_assim_extend_no_da
  • analysis_assim_hawaii
  • analysis_assim_hawaii_no_da
  • analysis_assim_long
  • analysis_assim_long_no_da
  • analysis_assim_no_da
  • analysis_assim_puertorico
  • analysis_assim_puertorico_no_da
  • forcing_analysis_assim
  • forcing_analysis_assim_alaska
  • forcing_analysis_assim_extend
  • forcing_analysis_assim_extend_alaska
  • forcing_analysis_assim_hawaii
  • forcing_analysis_assim_puertorico
  • forcing_medium_range
  • forcing_medium_range_alaska
  • forcing_medium_range_blend
  • forcing_medium_range_blend_alaska
  • forcing_short_range
  • forcing_short_range_alaska
  • forcing_short_range_hawaii
  • forcing_short_range_puertorico
  • long_range_mem1
  • long_range_mem2
  • long_range_mem3
  • long_range_mem4
  • medium_range_alaska_mem1
  • medium_range_alaska_mem2
  • medium_range_alaska_mem3
  • medium_range_alaska_mem4
  • medium_range_alaska_mem5
  • medium_range_alaska_mem6
  • medium_range_alaska_no_da
  • medium_range_blend
  • medium_range_blend_alaska
  • medium_range_blend_coastal_atlgulf
  • medium_range_blend_coastal_pacific
  • medium_range_coastal_atlgulf_mem1
  • short_range
  • short_range_alaska
  • short_range_coastal_atlgulf
  • short_range_coastal_hawaii
  • short_range_coastal_pacific
  • short_range_coastal_puertorico
  • short_range_hawaii
  • short_range_hawaii_no_da
  • short_range_puertorico
  • short_range_puertorico_no_da
  • usgs_timeslices

Similarly, tests should demonstrate correct output for each of the different base URLs:

0: "",
1: "https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/",
2: [DO NOT USE] "https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/post-processed/WMS/",
3: "https://storage.googleapis.com/national-water-model/",
4: "https://storage.cloud.google.com/national-water-model/",
5: "gs://national-water-model/",
6: "gcs://national-water-model/",
7: "https://noaa-nwm-pds.s3.amazonaws.com/",
8: "https://ciroh-nwm-zarr-copy.s3.amazonaws.com/national-water-model/",

Finally, testing should confirm the capability to enable or disable validity checking of the generated URLs, with some documentation of the corner cases involved in checking and of the related command-line input options.
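A sketch of what such pytest tests could look like. The `generate_url` helper below is a local stand-in, not the real nwmurl function; actual tests would import the package and parametrize over every folder and base URL listed above:

```python
# Stand-in URL builder so the test shapes are concrete; real tests would
# call the actual nwmurl generation function instead.
URLBASEDICT = {
    1: "https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/",
    7: "https://noaa-nwm-pds.s3.amazonaws.com/",
}

def generate_url(urlbaseinput, folder, filename):
    return f"{URLBASEDICT[urlbaseinput]}{folder}/{filename}"

def test_output_folders_appear_in_urls():
    # the full suite would loop over every output folder in the list above
    for folder in ["analysis_assim", "short_range", "forcing_medium_range"]:
        assert f"/{folder}/" in generate_url(7, folder, "nwm.t00z.dummy.nc")

def test_base_urls_are_used_as_prefixes():
    for key, base in URLBASEDICT.items():
        assert generate_url(key, "short_range", "f.nc").startswith(base)
```

Plain `test_*` functions with bare asserts are enough for pytest discovery; no pytest import is needed until fixtures or parametrize decorators are introduced.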

Add meminput (member input) as option

Fix hard-coded input here:

meminput = 0

This also needs some documentation explaining what meminput is (the ensemble member designation).

It could also include some validation/checking: there is no ensemble (and therefore no ensemble members) for short_range, analysis_assim, or forcing.
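The suggested validation could be sketched as follows; the prefix set is an assumption based on the non-ensemble products named in this issue, and the function name is hypothetical:

```python
# Per this issue, short_range, analysis_assim, and forcing products have no
# ensemble, so a nonzero member designation for them is an input error.
NO_ENSEMBLE_PREFIXES = ("short_range", "analysis_assim", "forcing")

def validate_meminput(run_name, meminput):
    """Return meminput, rejecting nonzero members for non-ensemble runs."""
    if meminput != 0 and run_name.startswith(NO_ENSEMBLE_PREFIXES):
        raise ValueError(
            f"{run_name} has no ensemble members; meminput must be 0, "
            f"got {meminput}"
        )
    return meminput
```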

Add CICD

Packages should be deployed automatically when pull requests are approved.

retro nwm 3.0 url bug

This conf_nwmurl.json

{
  "forcing_type": "retrospective",
  "start_date": "202001200100",
  "end_date": "202001200100",
  "urlbaseinput": 4,
  "selected_object_type": [
    1
  ],
  "selected_var_types": [
    6
  ],
  "write_to_file": true
}

yields this retro_filenamelist.txt

https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/forcing/2020/2020012001.LDASIN_DOMAIN1

but I think it should be

https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/FORCING/2020/202001200100.LDASIN_DOMAIN1

so, apart from the capitalized FORCING directory, there are just two missing digits for the minutes; I'm not sure why they are required.

Wrong URLs for NWM Retrospective dataset version 3.0

nwmurl generates wrong URLs for the retro 3.0 dataset. The following input arguments:

start_date = "200801010000"
end_date = "200801010800"
urlbaseinput = 4
selected_var_types = [1,2,3]
selected_object_types = [2]  
write_to_file = False

file_list = nwmurl.generate_urls_retro(
    start_date,
    end_date,
    urlbaseinput,
    selected_object_types,
    selected_var_types,
    write_to_file
)

generates

['https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/model_output/2008/2008010100.CHRTOUT_DOMAIN1.comp',
 'https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/model_output/2008/2008010101.CHRTOUT_DOMAIN1.comp',
 'https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/model_output/2008/2008010102.CHRTOUT_DOMAIN1.comp',
 'https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/model_output/2008/2008010103.CHRTOUT_DOMAIN1.comp']

instead of

['https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/CHRTOUT/2008/200801010000.CHRTOUT_DOMAIN1',
 'https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/CHRTOUT/2008/200801010100.CHRTOUT_DOMAIN1',
 'https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/CHRTOUT/2008/200801010200.CHRTOUT_DOMAIN1',
 'https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/CHRTOUT/2008/200801010300.CHRTOUT_DOMAIN1']

The directory structure for version 3.0 is different from version 2.1, but nwmurl currently generates URLs following the 2.1 layout.
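Based on the expected output above, a sketch of the 3.0-style path construction for the CHRTOUT case (hypothetical helper; note the variable-name directory and the full 12-digit YYYYMMDDHHMM timestamp, whereas forcing files instead use the FORCING directory and LDASIN naming):

```python
from datetime import datetime, timedelta

RETRO30_BASE = "https://noaa-nwm-retrospective-3-0-pds.s3.amazonaws.com/CONUS/netcdf/"

def retro30_urls(start, end, varname, step_hours=1):
    """Yield 3.0 retro URLs: <base><VAR>/<year>/<YYYYMMDDHHMM>.<VAR>_DOMAIN1."""
    t = datetime.strptime(start, "%Y%m%d%H%M")
    stop = datetime.strptime(end, "%Y%m%d%H%M")
    while t <= stop:
        yield f"{RETRO30_BASE}{varname}/{t:%Y}/{t:%Y%m%d%H%M}.{varname}_DOMAIN1"
        t += timedelta(hours=step_hours)
```

With the issue's start date and `varname="CHRTOUT"`, the first generated URL matches the first expected URL above.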
