osoceanacoustics / echoregions

Interfacing water column sonar data with annotations and labels

Home Page: https://echoregions.readthedocs.io/

License: Apache License 2.0

Python 100.00%

evr evl ecs region plotting parser

echoregions's People

Contributors

ngkavin valentina-s leewujung ctuguinay

echoregions's Issues

Update Shapely version to >=2.0.0 or unpin it

The shapely dependency (itself a dependency of regionmask) is pinned to 1.8.2 because version 2.0.0 raises an error (ValueError: Inconsistent coordinate dimensionality). regionmask==0.9.0 seems to require shapely >1.7.0. Going forward, the code should be adjusted to work with the later version, and backward compatibility should possibly be sorted out.

Multiple Line Plotting demonstration notebook.

Read several .evl files and combine the bottoms into one plot. There are some winter observations in each .evl file, so there may be some redundancy, and it may require more work to get the start and date of the transect.

Add typing.

Review design `Region2D` and other related modules

In reviewing #45 I had some comments on the current structure of Region2D and related modules.

That PR was merged so that we can move forward on the project, so this issue is a reminder that we should revisit these.

The comments are reproduced below:

review the use case of Region2D to have clarity on whether we should just parse EVR file at init, or change the input to accept both EVR and CSV/JSON that were converted previously.
convert_points:
- currently lives in evr_parser.py but I think it should just live in Region2D
- it is currently unused. From your comment above it seems that some more work is needed to smooth out the points.
get_points_from_region: this function does not currently work since there is no get_points_from_region under Regions2DPlotter
Region2DPlotter: when should it be initialized?
add docstring for Regions2DMasker.mask (from Region2D.mask)

Update tests

Region Masking Functionality

The mask function of the Regions2D object is not working currently.

The actual functionality is in the mask method of the Regions2DMasker

Working example for making "Hake" region masks (only one region) in this notebook.

Some considerations:

input could be one or many regions (based on region id's)
user wants all regions for a given label to be in the same mask
user wants regions with different labels to be layers in the mask dataset
user wants regions with different labels to be in one layer (for example they want to combine all fish regions into one layer)
provide the user with some options for how to store the pixel values of the masked regions

A few extra details:

adding regionmask version as a requirement
adding __init__.py
add examples in notebooks
add examples selecting different regions based on labels

Make output of `Regions2D.select_sonar_file` a list.

Currently, if there is just one file, the output is the name of that file as a string. When a list is expected, one can loop through the files, but if the output is only a string, one would mistakenly loop over the characters in the string. It will be simpler to always expect a list. Maybe rename to select_sonar_files?
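The pitfall described above can be sketched in a few lines. This is only an illustration, not the echoregions implementation: `as_file_list` is a hypothetical helper name, and the file names are made up.

```python
from pathlib import Path
from typing import List, Sequence, Union

PathLike = Union[str, Path]


def as_file_list(files: Union[PathLike, Sequence[PathLike]]) -> List[str]:
    """Always return a list of file names (hypothetical helper).

    A single name is wrapped in a one-element list so that callers
    can loop over the result without special-casing.
    """
    if isinstance(files, (str, Path)):
        return [str(files)]
    return [str(f) for f in files]


# Looping over a bare string iterates over its characters.
looped_by_mistake = [item for item in "a.nc"]

# Looping over the normalized output iterates over file names.
single = as_file_list("a.nc")
several = as_file_list(["a.nc", Path("b.nc")])
```

Returning a list in every case means the caller never needs an `isinstance` check of its own.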

Update requirements and fix tests for pandas >=2.0

Some existing tests fail with pandas >=2.0. Let's update requirements.txt and fix the test issues so that all tests run.

Decide on `JSON` structure from Regions2D

Add tests for masking functionality

Finding files based on metadata of the sonar files.

Currently, finding the sonar files in echoregions relies on the data in the names of the files. This is fine for our hake survey use case, but eventually we should make it work with the metadata from the sonar file (opened without specifying any group), so that it is file name independent.

Fix paths in example notebooks

Revamp testing suite with new converted file from echopype 0.7.1

potentially use fixtures where possible
ensure testing with raw files, nc, zarr files
make sure only small testing files are in github

Test and update `Lines_plotting.ipynb` notebook

Test and update Lines_plotting.ipynb notebook.

Related to #59.

Test and update notebook `Regions2D_functions.ipynb`

The Regions2D_functions.ipynb notebook is likely outdated. Need to test what does or does not work, and update the notebook.

Interpolation for Line Plot

A possible feature could be using interpolation to connect some of these dots. More specifically, the interpolation algorithm would have to generate associated timestamp and depth values.
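As a rough illustration of what such an interpolation would have to produce, the sketch below fills in evenly spaced timestamps between consecutive line points and assigns each one a linearly interpolated depth. The function name `densify_line` and the sample values are assumptions for this example, and linear interpolation is only one possible scheme.

```python
from datetime import datetime, timedelta


def densify_line(times, depths, step):
    """Linearly interpolate (time, depth) points every `step` (hypothetical helper).

    `times` must be sorted in increasing order and `step` must be positive.
    """
    if step <= timedelta(0):
        raise ValueError("step must be a positive timedelta")
    if not times:
        return [], []
    out_times, out_depths = [], []
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        d0, d1 = depths[i], depths[i + 1]
        span = (t1 - t0).total_seconds()
        t = t0
        while t < t1:
            # Fraction of the way from t0 to t1, used to weight the depths.
            frac = (t - t0).total_seconds() / span
            out_times.append(t)
            out_depths.append(d0 + frac * (d1 - d0))
            t += step
    # The last original point closes the line.
    out_times.append(times[-1])
    out_depths.append(depths[-1])
    return out_times, out_depths


start = datetime(2023, 1, 1, 0, 0, 0)
new_times, new_depths = densify_line(
    [start, start + timedelta(seconds=10)],
    [100.0, 110.0],
    timedelta(seconds=5),
)
```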

pass marker arguments with line plotting (possibly with kwargs)

Add installation instruction to README

Reorganize tests

Right now the different tests are sprinkled across 3 .py files in the tests folder without clear groupings.
Let's reorganize them (and add new tests as needed) based on the functionalities being tested, as we move forward to add test data into the repo (#25).

Masking Functionality

Regions
Lines

Single Region plotting notebook

Read 1 region from 1 .evr file. Find the overlapping sonar files, and plot the region superimposed on them.

Document current (May 2023) echoregions design

We plan to refactor echoregions to streamline reading EVR and expand to writing EVR in summer 2023. Let's document the current echoregions design as the first step.

Related to #10 and #54

Fix import redundancy

After the major cleanup of unused imports in #24, there is still some redundancy in the import statements that needs to be fixed.

Below I list the two I saw while going through the import statement issues:
https://github.com/leewujung/echoregions/blob/6734119e55ee1f6fab997a57e8467d2b3dfe1822/echoregions/__init__.py#L1-L5
Under echoregions.convert the CalibrationParser and read_* functions are also imported.
Do we want the users to invoke them at the root level, or as part of the convert subpackage?

https://github.com/leewujung/echoregions/blob/6734119e55ee1f6fab997a57e8467d2b3dfe1822/echoregions/tests/test_ev_parser.py#L5-L6
parse_time is imported separately here, but it is already imported at the root level in echoregions/__init__.py above.
Again do we want the users to invoke them at the root level?

datetime encoding for older version of xarray and recent pandas breaking changes

During testing, xarray==0.16.2 raises a nanosecond encoding error: pydata/xarray#4400

It seems it is resolved in xarray==2023.2.0. Need to identify a minimum cutoff version and set it in requirements-dev.txt

Update Line Functionality to new Sv format from echopype version 0.6.3

#57 converts only masking functionality to new sv format, not line functionality. Update that with new files and example notebooks.

Add test data into repo

Currently the test data used in the tests are not in the repo. They should be added.

Read directly from cloud.

Add functionality and examples to read directly from the cloud.

Add echoregions to conda

It'll be good to add echoregions to conda, and have that triggered by GitHub releases. This will improve our overall workflow and ML work.

add test CI

Use `pathlib` instead of `os`

Some parts of the codebase use pathlib but other parts use os for path handling. Some other parts use pure strings. Let's clean up everything to use pathlib.
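To make the cleanup concrete, here is a small before/after sketch of one path operation written both ways. The function names and the `.evr` to `.csv` example are hypothetical and are not taken from the echoregions codebase.

```python
import os
from pathlib import Path


def output_csv_path_os(evr_path: str, out_dir: str) -> str:
    """os-based version: string in, string out."""
    stem = os.path.splitext(os.path.basename(evr_path))[0]
    return os.path.join(out_dir, stem + ".csv")


def output_csv_path(evr_path, out_dir) -> Path:
    """pathlib version: accepts str or Path, returns a Path."""
    return Path(out_dir) / Path(evr_path).with_suffix(".csv").name


old_style = output_csv_path_os("data/x1.evr", "out")
new_style = output_csv_path("data/x1.evr", "out")
```

Accepting either `str` or `Path` at the boundary and converting once with `Path(...)` keeps the rest of the code free of string handling.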

Warnings that arose from PR #81

Text:

=============================== warnings summary ===============================
echoregions/tests/test_r2d.py::test_mask_no_overlap
:241: RuntimeWarning: numpy.ndarray size changed, may indicate binary incompatibility. Expected 16 from C header, got 96 from PyObject

echoregions/tests/test_r2d.py::test_mask_no_overlap
/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/pkg_resources/__init__.py:121: DeprecationWarning: pkg_resources is deprecated as an API
warnings.warn("pkg_resources is deprecated as an API", DeprecationWarning)

echoregions/tests/test_r2d.py::test_mask_no_overlap
/opt/hostedtoolcache/Python/3.11.3/x64/lib/python3.11/site-packages/pkg_resources/__init__.py:2870: DeprecationWarning: Deprecated call to pkg_resources.declare_namespace('mpl_toolkits').
Implementing implicit namespace packages (as specified in PEP 420) is preferred to pkg_resources.declare_namespace. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
declare_namespace(pkg)

test ecs format

Remove `frame.append` since it is being deprecated.

evr_parser is using df.append to append rows. Append to list and concat instead.
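A minimal sketch of the suggested pattern, assuming pandas is installed. The column names and record values are invented for illustration and do not reflect the actual evr_parser output.

```python
import pandas as pd

# Hypothetical per-region records, standing in for parsed EVR rows.
records = [
    {"region_id": 1, "region_class": "Hake"},
    {"region_id": 2, "region_class": "Unclassified"},
]

# Instead of calling df.append(...) inside the loop, collect the pieces
# in a plain Python list and concatenate once at the end.
pieces = []
for record in records:
    pieces.append(pd.DataFrame([record]))

df = pd.concat(pieces, ignore_index=True)
```

Concatenating once also avoids copying the growing frame on every iteration, which is what repeated appends did.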

Line masking problems

Making bottom masks from .evl files requires interpolating the bottom curve points to the grid of the sonar file. The choice of the interpolation scheme affects the result. We should provide an option to select different interpolation schemes, and let the user change the parameters so this works for different datasets.

Version for Hake Survey is in this notebook

Line reading demonstration notebook.

A notebook which demonstrates how to read one .evl file, plot the line on top of sonar data, and export the line into a .csv format.

clean notebooks after format change

Add Line Masking Functionality

As of right now, there is no line masking in the echoregions modules. We need line masking in order to get precise depth values to improve Hake ML biomass calculations. To resolve this issue, a function line.mask(Sv, interp options) must be created that should include the following:

filter time start/end based on Sv
basic interpolation that always occurs, using pandas DataFrame interpolation options
make mask

This implementation will repurpose the following code in the notebook created by Valentina Staneva as Hake Bottom Interpolation.
Sufficient implementation of this function will resolve this issue and those of #43, #82.
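The three steps listed above could look roughly like the sketch below. It is not the notebook code and not the eventual line.mask API: `bottom_mask`, its arguments, and the sample values are assumptions, the Sv dataset is reduced to plain `ping_time` and `depth` arrays, and line timestamps are assumed to be unique and sorted.

```python
import numpy as np
import pandas as pd


def bottom_mask(line_time, line_depth, ping_time, depth, method="time"):
    """Return a (depth, ping_time) boolean array, True above the bottom line."""
    line = pd.Series(
        np.asarray(line_depth, dtype=float),
        index=pd.DatetimeIndex(line_time),
    )
    ping_index = pd.DatetimeIndex(ping_time)

    # 1. Keep only line points inside the time span of the pings.
    in_span = (line.index >= ping_index[0]) & (line.index <= ping_index[-1])
    line = line[in_span]

    # 2. Interpolate the bottom depth onto the ping times.
    #    `method` is passed straight to pandas' Series.interpolate.
    combined = line.reindex(line.index.union(ping_index))
    bottom = combined.interpolate(method=method).reindex(ping_index).to_numpy()

    # 3. Build the mask. Pings with no bottom value (NaN) compare as False.
    return np.asarray(depth, dtype=float)[:, None] < bottom[None, :]


line_time = pd.to_datetime(["2023-01-01 00:00:00", "2023-01-01 00:00:10"])
ping_time = pd.to_datetime(
    ["2023-01-01 00:00:00", "2023-01-01 00:00:05", "2023-01-01 00:00:10"]
)
mask = bottom_mask(line_time, [100.0, 120.0], ping_time, [50.0, 105.0, 130.0])
```

Exposing `method` as a parameter is one way to let users pick the interpolation scheme per dataset.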

Provide updated (converted with echopype 0.6.3) Sv files for the line plotting notebook

Make sure this notebook is up to date with the newer Sv format.

https://github.com/OSOceanAcoustics/echoregions/blob/main/notebooks/Lines_plotting.ipynb

Echoregions functionality notebook

A notebook describing the different attributes and methods of the objects in echoregions (regions & lines).

UML diagram
regions functionality
lines functionality

Geared toward technical people who would want to modify it/extend it.

Test and update notebooks `Regions2D_plotting.ipynb` and `Regions2D_masking.ipynb`

Regions2D_masking.ipynb is built on top of Regions2D_plotting.ipynb, so test and update Regions2D_plotting.ipynb first.

Store different labels to be layers in the mask dataset

One way to achieve that is through a 3D array mask, or through stacking them as variables. One can also generate them separately and stack them. Regionmask has the option mask_3D, which is binary. One can also generate them by looping through the region types.
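The "loop through the region types and stack" option can be sketched with plain NumPy. This sketch does not use regionmask; it assumes a 2D integer mask of region ids is already available, and the function name, the ids, and the labels are all hypothetical.

```python
import numpy as np


def stack_label_layers(id_mask, region_labels):
    """Build one boolean layer per label from a 2D mask of region ids.

    `region_labels` maps region id -> label. Regions sharing a label
    end up in the same layer.
    """
    labels = sorted(set(region_labels.values()))
    layers = np.stack(
        [
            np.isin(
                id_mask,
                [rid for rid, lab in region_labels.items() if lab == label],
            )
            for label in labels
        ]
    )
    return labels, layers


# -1 marks cells that belong to no region in this made-up example.
id_mask = np.array([[1, 2], [3, -1]])
labels, layers = stack_label_layers(id_mask, {1: "Hake", 2: "Krill", 3: "Hake"})
```

The result has shape (label, row, column), so combining several labels into one layer is a logical OR over the first axis.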

Multiple Region plotting use case.

read through a folder of .evr files.
For each region in the list of regions across files
- find the corresponding sonar data files
- plot the region superimposed on the background of the sonar .sv data
- save a .png

Note the .png's will be of different size since the regions are of different time spans.

Create first release for echoregions

It'll be good in the not too distant future to create a release for echoregions that contains stable basic functionalities. We can set up the releases to publish directly to PyPI and trigger a conda build (#67).

IMHO we should do this after resolving things in #54 so things are not that confusing.

Tasks

Set up webhook for Zenodo
Do slight refactoring so as to match what is found in Scientific Python and Python Packaging
Create release on GitHub
Submit to test PyPI
Submit to official PyPI
Ensure the package can be pip installed directly from PyPI
Update pip install instructions in docs to use PyPI

Determine Earliest Working Regionmask Version

determine earliest regionmask working version
Masking function does not work with 0.7.0 (one gets unix_time instead of ping_time; need to add tests to catch this error). 0.9.0 works.

Add functionality to create masks for within-transect (good regions)

This can take several forms but it slightly depends on how people have annotated individual breakpoints:

start transect
break transect
resume transect
end transect

Usually those should be annotated with log lines, but sometimes they can be thin boxes, or within-transect boxes. One approach is to create an intermediate table with time stamps for the starting and ending times and create the mask based on it.
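The intermediate-table approach could be sketched as follows, using only the four event labels listed above. The function names, the event tuples, and the timestamps are assumptions for illustration; how the events are extracted from log lines or boxes is left out.

```python
from datetime import datetime

START_EVENTS = {"start transect", "resume transect"}
END_EVENTS = {"break transect", "end transect"}


def transect_intervals(events):
    """Turn (timestamp, label) events into a table of (start, end) pairs."""
    intervals = []
    open_start = None
    for when, label in sorted(events):
        if label in START_EVENTS and open_start is None:
            open_start = when
        elif label in END_EVENTS and open_start is not None:
            intervals.append((open_start, when))
            open_start = None
    return intervals


def within_transect(ping_times, intervals):
    """Return one boolean per ping: True if it falls inside any interval."""
    return [any(start <= t <= end for start, end in intervals) for t in ping_times]


events = [
    (datetime(2023, 1, 1, 0, 0), "start transect"),
    (datetime(2023, 1, 1, 0, 10), "break transect"),
    (datetime(2023, 1, 1, 0, 20), "resume transect"),
    (datetime(2023, 1, 1, 0, 30), "end transect"),
]
intervals = transect_intervals(events)
ping_mask = within_transect(
    [
        datetime(2023, 1, 1, 0, 5),
        datetime(2023, 1, 1, 0, 15),
        datetime(2023, 1, 1, 0, 25),
    ],
    intervals,
)
```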