nsidc / antarctica_today

The "Antarctica Today" code and datasets

License: MIT License

Python 100.00%

antarctica_today's Introduction

Antarctica_Today

The "Antarctica Today" code and datasets necessary to create the database, update it, and generate plots and maps of results.

This code is maintained by:

Mike MacFerrin, original author
National Snow and Ice Data Center

Documentation

Acknowledgements

TODO

antarctica_today's People

Contributors

betolink mmacferrin

antarctica_today's Issues

Create a regression test for the database correctness

When NSIDC 0080 v2 was released and v1 was deprecated, we lost the ability to recreate the database from v1 data. So we can't validate that the v2 data generates the exact same database.

It would be impractical to commit the database to the Git repo and use it for regression testing. We'd like a regression test based on hashing a subset of the database, e.g. up to 2022, and comparing it to a known-good hash.

We would need a known-good copy of the database permanently archived somewhere (and the unit test should print a link to it when it fails) to enable deeper investigation if the hash check ever does fail.
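A minimal sketch of the hashing idea, assuming the database can be loaded as a NumPy array of daily grids with days along the last axis (as the (332, 316, 8089) shape reported in the memory-footprint issue suggests) plus a matching list of dates. `database_digest`, `check_database`, and the archive-URL argument are hypothetical names, not existing code in the repo.

```python
import hashlib
from datetime import date

import numpy as np


def database_digest(grids: np.ndarray, dates: list, cutoff: date) -> str:
    """SHA-256 over every daily grid dated on or before `cutoff` (hypothetical helper)."""
    if grids.shape[-1] != len(dates):
        raise ValueError("expected one date per grid along the last axis")
    keep = np.array([d <= cutoff for d in dates], dtype=bool)
    subset = np.ascontiguousarray(grids[..., keep])
    h = hashlib.sha256()
    # Include dtype and shape so a silent type or size change alters the hash.
    h.update(str(subset.dtype).encode())
    h.update(repr(subset.shape).encode())
    h.update(subset.tobytes())
    return h.hexdigest()


def check_database(grids, dates, cutoff, expected_sha256: str, archive_url: str) -> None:
    """Fail with a pointer to the archived known-good copy (hypothetical helper)."""
    actual = database_digest(grids, dates, cutoff)
    if actual != expected_sha256:
        raise AssertionError(
            f"Database digest {actual} != known-good {expected_sha256}; "
            f"compare against the archived copy at {archive_url}"
        )
```

Hashing the raw bytes makes the check strict: a change in dtype or byte order also changes the digest. That is what a regression test wants, but it means the expected hash has to be regenerated deliberately whenever the storage format changes.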

Start a doc page on src_baseline and baseline_datasets

Following up on this comment: #9 (comment)

Log with the `logging` module or `loguru`

Replace print calls with logs!
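A minimal sketch of the swap using the standard-library `logging` module (not `loguru`). The logger name `antarctica_today` and the helpers `configure_logging` and `process_day` are hypothetical, not taken from the codebase; in practice each module would typically call `logging.getLogger(__name__)`.

```python
import logging

# Hypothetical package-level logger; child loggers named after modules inherit its handler.
logger = logging.getLogger("antarctica_today")


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Attach one timestamped stream handler (idempotent) and set the level."""
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    logger.setLevel(level)
    return logger


def process_day(date_str: str) -> None:
    # Before: print("Processing", date_str)
    logger.info("Processing %s", date_str)
    # Detail that users can hide by leaving the level at INFO or above.
    logger.debug("Reading melt grid for %s", date_str)
```

Unlike `print`, the level check lets a cron-driven daily update run quietly while an interactive debugging session can call `configure_logging(logging.DEBUG)`.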

Memory footprint optimization

Currently, a >6GB numpy array is allocated, and if you don't have enough free memory, you're in trouble ;)

numpy.core._exceptions._ArrayMemoryError: Unable to allocate 6.32 GiB for an array with shape (332, 316, 8089) and data type int64
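The 6.32 GiB figure follows directly from the shape: 332 × 316 × 8089 = 848,633,168 elements at 8 bytes each for int64. A minimal sketch of the arithmetic, plus a helper that picks the smallest signed integer dtype for a known value range. It assumes (not confirmed here) that the stored values are small integer codes; `footprint_bytes` and `smallest_int_dtype` are hypothetical names.

```python
import numpy as np

# Shape taken from the error message: (rows, cols, days).
SHAPE = (332, 316, 8089)


def footprint_bytes(shape, dtype) -> int:
    """Bytes needed for a dense array of the given shape and dtype."""
    return int(np.prod(shape, dtype=np.int64)) * np.dtype(dtype).itemsize


def smallest_int_dtype(min_value: int, max_value: int) -> np.dtype:
    """Smallest signed integer dtype holding every value in [min_value, max_value]."""
    for candidate in (np.int8, np.int16, np.int32, np.int64):
        info = np.iinfo(candidate)
        if info.min <= min_value and max_value <= info.max:
            return np.dtype(candidate)
    raise ValueError("range does not fit in int64")


for dt in (np.int64, np.int8):
    print(np.dtype(dt).name, round(footprint_bytes(SHAPE, dt) / 2**30, 2), "GiB")
```

If the values fit in int8, the same array needs 848,633,168 bytes (about 0.79 GiB). A disk-backed `numpy.memmap` is another option when even that is too large to hold in RAM.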

More interoperable format for pickle db

A pickle file containing a 2d grid for every day in the climatology is continuously updated (forward processing) and read as input by this application.

TODO: Document what the data in the file looks like.

What other format can we use that would be more interoperable and portable? NetCDF? SQLite? I think it would be best (see #18) if the format we choose supports selective reads.
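To illustrate what "selective reads" buys, here is a minimal sketch using SQLite (one of the candidates above) via the standard-library `sqlite3` module: one row per day, with the 2D grid stored as a blob, so a date range can be read without loading the whole climatology. The table schema and the helpers `create_store`, `write_day`, and `read_range` are hypothetical; NetCDF would offer similar partial reads through its own libraries.

```python
import sqlite3

import numpy as np


def create_store(conn: sqlite3.Connection) -> None:
    # One row per day; ISO-8601 date strings sort correctly as text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS melt_grid ("
        "date TEXT PRIMARY KEY, rows INTEGER, cols INTEGER, dtype TEXT, data BLOB)"
    )


def write_day(conn: sqlite3.Connection, date: str, grid: np.ndarray) -> None:
    grid = np.ascontiguousarray(grid)
    conn.execute(
        "INSERT OR REPLACE INTO melt_grid VALUES (?, ?, ?, ?, ?)",
        (date, grid.shape[0], grid.shape[1], grid.dtype.str, grid.tobytes()),
    )


def read_range(conn: sqlite3.Connection, start: str, end: str) -> dict:
    """Load only the grids dated within [start, end]."""
    cur = conn.execute(
        "SELECT date, rows, cols, dtype, data FROM melt_grid "
        "WHERE date BETWEEN ? AND ? ORDER BY date",
        (start, end),
    )
    return {
        d: np.frombuffer(blob, dtype=np.dtype(dt)).reshape(r, c)
        for d, r, c, dt, blob in cur
    }
```

Storing the dtype string (e.g. `|i1`) alongside each blob keeps the file self-describing, which is part of what makes it more portable than a pickle.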

Consolidate various data directories under `/data`

We have a few directories containing data in the root of the repo. One is titled data/, and I feel we could combine them all into it for easier navigation.

I feel that the following dirs would make sense there:

Tb
baseline_datasets
plots
qgis

Get grant number(s) which funded Mike's work on v1.0.0

@mmacferrin will contact Ted to find this information. Once that's added to this ticket, we can update CITATION.cff.

#3 (comment)

Release v1

Need:

Initial CITATION.cff (#3)
Zenodo integration
Create release in GH

Apply operation documentation feedback comments from Mike in Slack

https://nsidc.slack.com/archives/C55HTSQQL/p1702500070331579?thread_ts=1702497864.662309&cid=C55HTSQQL

@mmacferrin:

...a couple of suggestions. For one, in Step 1, note that the .bin files based on nsidc-0001 and nsidc-0007 are actually already pre-made and included in the git repository [/data/daily_melt_bin_files/]. One does not need to download any nsidc-0001 or nsidc-0007 data, as the data derived from those is already there for the user. Only data from 2022-01-10 onward must be downloaded anew. But this brings up a slightly bigger issue: the code to generate all those .bin files from nsidc-0001 and -0007 has, I think, been lost, which hurts the reproducibility of this science. It may be worth a separate issue to reproduce that old code (it would have to be based on the v2 data now) to ensure that the old climatologies really are being reproduced completely in v2.

But that's an issue for a day with a bit more funding.

The only "one-liner" is the daily_update.py script, which is correctly documented in that operation.md file but wasn't yet working in the code (it still relied entirely on v1 data). Otherwise, that operation.md file is correct and well done. I could suggest an edit for the "What is the gap filled" data file in the "Initializing" section of Step 3; I can explain that. 🙂

Syntax error in `extract_gridded_RCM_data.py`

@mmacferrin thoughts on what this should look like? Found this with a static analysis tool. I believe this code isn't needed for our current objectives, but good to document the finding anyway :)

Antarctica_Today/src/src_baseline/extract_gridded_RCM_data.py

Line 18 in 4cc5601

netCDF4.

    File "src/src_baseline/extract_gridded_RCM_data.py", line 18
      netCDF4.
              ^
    SyntaxError: invalid syntax
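The error is an incomplete attribute access: `netCDF4.` with nothing after the dot. Whatever the line was meant to do, a check that parses every module would stop this class of error from landing again. A minimal sketch using the standard-library `ast` module; the function name `find_syntax_errors` is hypothetical, and this is not the static analysis tool that found the issue.

```python
import ast
from pathlib import Path


def find_syntax_errors(root: str) -> list:
    """Parse every .py file under `root`; return (path, line, message) per failure."""
    problems = []
    for path in sorted(Path(root).rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        try:
            ast.parse(source, filename=str(path))
        except SyntaxError as err:
            problems.append((str(path), err.lineno, err.msg))
    return problems


if __name__ == "__main__":
    for path, line, msg in find_syntax_errors("src"):
        print(f"{path}:{line}: {msg}")
```

Run as a unit test or CI step, an empty result would be the passing condition; `python -m compileall` is a built-in alternative that also reports syntax errors.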