
kso's People

Contributors

dependabot[bot], diewertje11, jannesgg, pilarnavarro, victor-wildlife


kso's Issues

Tutorial 4 issue: initiating the project database

๐Ÿ› Bug


To Reproduce (REQUIRED)

Project: GU
Input:

# Find project
project = p_utils.find_project(project_name=project_name.value)
# Initialise pp
pp = ProjectProcessor(project)

Output:

PermissionError                           Traceback (most recent call last)
Cell In[5], line 4
      2 project = p_utils.find_project(project_name=project_name.value)
      3 # Initialise pp
----> 4 pp = ProjectProcessor(project)

File /usr/src/app/kso/kso_utils/kso_utils/project.py:68, in ProjectProcessor.__init__(self, project)
     65 self.map_init_csv()
     67 # Create empty db and populate with local csv files data
---> 68 self.setup_db()
     70 # Mount Snic server if needed
     71 if self.project.server == "SNIC":

File /usr/src/app/kso/kso_utils/kso_utils/project.py:161, in ProjectProcessor.setup_db(self)
    155 """
    156 The function creates a database and populates it with the data from the local csv files.
    157 It also return the db connection
    158 :return: The database connection object.
    159 """
    160 # Create a new database for the project
--> 161 db_utils.create_db(self.project.db_path)
    163 # Connect to the database and add the db connection to project
    164 self.db_connection = db_utils.create_connection(self.project.db_path)

File /usr/src/app/kso/kso_utils/kso_utils/db_utils.py:283, in create_db(db_path)
    281 # Delete previous database versions if exists
    282 if os.path.exists(db_path):
--> 283     os.remove(db_path)
    285 # Get sql command for db setup
    286 sql_setup = schema.sql

PermissionError: [Errno 1] Operation not permitted: '/tmp/gu.db'

Expected behavior

Should initiate the database of the project.
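
A possible explanation (not confirmed in the issue): /tmp normally has the sticky bit set (mode 1777), so os.remove() fails with "Operation not permitted" when /tmp/gu.db is owned by a different user, even though the directory is world-writable. A small diagnostic sketch along those lines:

import os
import stat

# Check who owns the existing database file that create_db() tries to delete.
db_path = "/tmp/gu.db"
if os.path.exists(db_path):
    st = os.stat(db_path)
    print("file owner uid:", st.st_uid, "| current uid:", os.getuid())
    print("file mode:", stat.filemode(st.st_mode))
    print("/tmp mode:", stat.filemode(os.stat("/tmp").st_mode))  # drwxrwxrwt = sticky bit
    print("writable:", os.access(db_path, os.W_OK))

If the owner differs, pointing the project's db_path at a per-user location would sidestep the collision.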


Notebook 5: locale.getpreferredencoding() gets changed during training, causing the notebook to be unable to train again or run the evaluation part

When you run Notebook 5 and request the preferred encoding at the beginning, or just before the cell where you call train.run(...), you get 'UTF-8' (using the code below).

import locale
locale.getpreferredencoding()

However, when you run the same thing after the cell in which you train, it returns 'ANSI_X3.4-1968' (which is ASCII). So somewhere during the training performed by the YOLOv5 code, this default gets changed. This causes an error when reading the names in the train.txt or valid.txt file when you train again or run the validation (since these files contain Swedish letters, in the case of the template project):

Exception: train: Error loading data from /content/koster_yolov4/tutorials/ml-template-data/train.txt: 'ascii' codec can't decode byte 0xc3 in position 31: ordinal not in range(128)

This comes from line 470 in /content/koster_yolov4/yolov5/utils/dataloaders.py, where the text file is opened with open(). This open() function uses the default encoding, and ASCII cannot decode the ä.
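
A quick demonstration of that failure mode (0xc3 is the first byte of the two-byte UTF-8 sequence for "ä", 0xc3 0xa4, which the ascii codec rejects):

# The same bytes decode fine as utf-8 but fail under the ascii codec.
data = b"s\xc3\xa4cken"
print(data.decode("utf-8"))  # 'säcken'
try:
    data.decode("ascii")
except UnicodeDecodeError as err:
    print(err)  # 'ascii' codec can't decode byte 0xc3 ...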

We have not located exactly where this change of locale is made. We could not find anything in the YOLOv5 code when searching with git grep for ANSI, locale, encoding, ASCII, and coding. Only in the file utils/mws/mime.sh is something done with ASCII, but we do not think this file gets used.

Possible solutions are either to prevent this change, if we can locate where it is made, or to set it back to the correct default every time. However, we have not found a command yet that can set it back. We have tried the following:

  • locale.setlocale(locale.LC_ALL, '') (returns 'en_US.UTF-8')
  • sys.getfilesystemencoding() (returns 'utf-8')
  • locale.getlocale() (returns ('en_US', 'UTF-8'))
  • result = _locale.nl_langinfo(_locale.CODESET) (result contains 'ANSI_X3.4-1968')
  • _locale.CODESET (returns 14)

So it seems there are two different encoding settings: a system-wide one that stays at UTF-8 and is not changed, and one locale setting that does get changed. However, trying to change this back gives an error:

NotImplementedError: A UTF-8 locale is required. Got ANSI_X3.4-1968

The ways we have tried to set it back:

  • !chcp 65001
  • !vim /etc/default/locale
  • !echo $PYTHONIOENCODING
  • locale.setlocale(locale.LC_CTYPE, 'en_US.UTF-8')

The code below seems to set it back, but it does not solve the issue when training/validating, so presumably it just replaces the lookup with a fixed string rather than fixing the underlying locale.

import locale

def getpreferredencoding(do_setlocale=True):
    return "UTF-8"

locale.getpreferredencoding = getpreferredencoding

locale.getpreferredencoding()

To have the template project working for the workshop on 02-03-2023, we simply changed the names of the files so that they do not contain ä or any other Swedish letters.
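
A minimal sketch of that renaming workaround, assuming the training data lives under a local ml-template-data/ folder (path hypothetical):

import unicodedata
from pathlib import Path

def asciify(name: str) -> str:
    # NFKD decomposes "ä" into "a" plus a combining diaeresis; encoding to
    # ascii with errors="ignore" then drops the combining mark.
    return (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )

# Rename deepest paths first so parent directories are still valid.
for path in sorted(Path("ml-template-data").rglob("*"), reverse=True):
    safe = asciify(path.name)
    if safe != path.name:
        path.rename(path.with_name(safe))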

Tutorial 8 issue with optional step 2, "Download annotations as CSV file"

๐Ÿ› Bug


To Reproduce (REQUIRED)

Input:

pp.download_classications_csv(pp.processed_zoo_classifications)
# Uncomment the following line to download the aggregated classifications
pp.download_classications_csv(pp.aggregated_zoo_classifications)

Output:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[16], line 1
----> 1 pp.download_classications_csv(pp.processed_zoo_classifications)
      2 # Uncomment the following line to download the aggregated classifications
      3 # pp.download_classications_csv(pp.aggregated_zoo_classifications)

File /usr/src/app/kso/kso_utils/kso_utils/project.py:1069, in ProjectProcessor.download_classications_csv(self, class_df)
   1063 # Download the processed classifications as a csv file
   1064 csv_filename = (
   1065     self.project.Project_name
   1066     + str(datetime.date.today())
   1067     + "classifications.csv"
   1068 )
-> 1069 class_df.to_csv(csv_filename, index=False)
   1071 logging.info(f"The classications have been downloaded to {csv_filename}")

File /usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:3563, in NDFrame.to_csv(self, path_or_buf, sep, na_rep, float_format, columns, header, index, index_label, mode, encoding, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, decimal, errors, storage_options)
   3552 df = self if isinstance(self, ABCDataFrame) else self.to_frame()
   3554 formatter = DataFrameFormatter(
   3555     frame=df,
   3556     header=header,
   (...)
   3560     decimal=decimal,
   3561 )
-> 3563 return DataFrameRenderer(formatter).to_csv(
   3564     path_or_buf,
   3565     line_terminator=line_terminator,
   3566     sep=sep,
   3567     encoding=encoding,
   3568     errors=errors,
   3569     compression=compression,
   3570     quoting=quoting,
   3571     columns=columns,
   3572     index_label=index_label,
   3573     mode=mode,
   3574     chunksize=chunksize,
   3575     quotechar=quotechar,
   3576     date_format=date_format,
   3577     doublequote=doublequote,
   3578     escapechar=escapechar,
   3579     storage_options=storage_options,
   3580 )

File /usr/local/lib/python3.8/dist-packages/pandas/io/formats/format.py:1180, in DataFrameRenderer.to_csv(self, path_or_buf, encoding, sep, columns, index_label, mode, compression, quoting, quotechar, line_terminator, chunksize, date_format, doublequote, escapechar, errors, storage_options)
   1159     created_buffer = False
   1161 csv_formatter = CSVFormatter(
   1162     path_or_buf=path_or_buf,
   1163     line_terminator=line_terminator,
   (...)
   1178     formatter=self.fmt,
   1179 )
-> 1180 csv_formatter.save()
   1182 if created_buffer:
   1183     assert isinstance(path_or_buf, StringIO)

File /usr/local/lib/python3.8/dist-packages/pandas/io/formats/csvs.py:241, in CSVFormatter.save(self)
    237 """
    238 Create the writer & save.
    239 """
    240 # apply compression and byte/text conversion
--> 241 with get_handle(
    242     self.filepath_or_buffer,
    243     self.mode,
    244     encoding=self.encoding,
    245     errors=self.errors,
    246     compression=self.compression,
    247     storage_options=self.storage_options,
    248 ) as handles:
    249 
    250     # Note: self.encoding is irrelevant here
    251     self.writer = csvlib.writer(
    252         handles.handle,
    253         lineterminator=self.line_terminator,
   (...)
    258         quotechar=self.quotechar,
    259     )
    261     self._save()

File /usr/local/lib/python3.8/dist-packages/pandas/io/common.py:789, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    784 elif isinstance(handle, str):
    785     # Check whether the filename is to be opened in binary mode.
    786     # Binary mode does not support 'encoding' and 'newline'.
    787     if ioargs.encoding and "b" not in ioargs.mode:
    788         # Encoding
--> 789         handle = open(
    790             handle,
    791             ioargs.mode,
    792             encoding=ioargs.encoding,
    793             errors=errors,
    794             newline="",
    795         )
    796     else:
    797         # Binary mode
    798         handle = open(handle, ioargs.mode)

OSError: [Errno 30] Read-only file system: 'Koster_Seafloor_Obs2023-08-28classifications.csv'

Expected behavior

It should download the annotations as a CSV file, I think.
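
A possible workaround (not the project's fix): build the output path inside a directory that is known to be writable instead of the read-only working directory, e.g.:

import datetime
import os
import tempfile

# pp.processed_zoo_classifications is the dataframe from the tutorial.
csv_filename = os.path.join(
    tempfile.gettempdir(),
    f"classifications_{datetime.date.today()}.csv",
)
pp.processed_zoo_classifications.to_csv(csv_filename, index=False)
print(f"The classifications have been written to {csv_filename}")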


Improve uploading of frames from third parties to Zooniverse

In tutorial #4, we should improve the current approach to uploading your own frames (i.e. frames not retrieved from clips classified by Zooniverse volunteers).
This will help other projects that have classified their own videos or collected underwater images.

Develop a point-based segmentation approach

Integrate an ML approach to segment areas of interest from the SGU point-based photo labels. Maybe using TagLab or CVAT?

Relevant literature:
General segmentation:
https://arxiv.org/pdf/2003.06148.pdf
https://colab.research.google.com/github/roboflow-ai/notebooks/blob/main/notebooks/how-to-segment-anything-with-sam.ipynb

Segment Anything Meets Point Tracking:
Repo on a joint model of point tracking and Segmentation

Ocean-related:
On Improving the Training of Models for the Semantic Segmentation of Benthic Communities from Orthographic Imagery
Automatic Semantic Segmentation of Benthic Habitats Using Images from Towed Underwater Camera in a Complex Shallow Water Environment

@jannesgg @biomobst: from our last conversation with SGU, it's clear to me that we should have different ML model approaches and human-in-the-loop annotation labelling (whether CVAT, TagLab, ...).

Clean unused "add_project" func

The "add_project" function (in project_utils) is currently not used in any tutorial. Should we add it to tut#1?

It was triggering some pylint errors, so I have commented it out for now.


Tutorial 8 Sampling and processing Zooniverse classifications: error at pp.process_zoo_classifications()

๐Ÿ› Bug


To Reproduce (REQUIRED)

Project: Koster Seafloor observatory
choice: "No, just download the last available information"
Input:

pp.process_zoo_classifications()

Output:

NameError                                 Traceback (most recent call last)
Cell In[9], line 1
----> 1 pp.process_zoo_classifications()

File /usr/src/app/kso/kso_utils/kso_utils/project.py:777, in ProjectProcessor.process_zoo_classifications(self, test)
    773     workflow_checks = self.workflow_widget.checks
    775 # Retrieve a subset of the subjects from the workflows of interest and
    776 # populate the sql subjects table
--> 777 zoo_utils.sample_subjects_from_workflows(
    778     project=self.project,
    779     server_connection=self.server_connection,
    780     db_connection=self.db_connection,
    781     workflow_widget_checks=workflow_checks,
    782     workflows_df=self.zoo_info["workflows"],
    783     subjects_df=self.zoo_info["subjects"],
    784 )
    786 # Make sure all the classifications have existing subjects,
    787 # Flatten the classifications provided the cit. scientists
    788 self.processed_zoo_classifications = zoo_utils.process_zoo_classifications(
    789     project=self.project,
    790     db_connection=self.db_connection,
   (...)
    793     subject_type=workflow_checks["Subject type: #0"],
    794 )

File /usr/src/app/kso/kso_utils/kso_utils/zooniverse_utils.py:1265, in sample_subjects_from_workflows(project, server_connection, db_connection, workflow_widget_checks, workflows_df, subjects_df)
   1261 drop_table(conn=db_connection, table_name="subjects")
   1263 if len(subjects_series) > 0:
   1264     # Fill or re-fill subjects table
-> 1265     populate_subjects(project, server_connection, db_connection, subjects_series)
   1266 else:
   1267     logging.error("No subjects to populate database from the workflows selected.")

File /usr/src/app/kso/kso_utils/kso_utils/zooniverse_utils.py:1048, in populate_subjects(project, server_connection, db_connection, subjects)
   1045 else:
   1046     from kso_utils.koster_utils import process_koster_subjects
-> 1048     subjects = process_koster_subjects(subjects, db_connection)
   1049     # Fix weird bug where Subject_type is used instead of subject_type for the column name for some clips
   1050 #     if "Subject_type" in subjects.columns:
   1051 #         subjects["subject_type"] = subjects[
   (...)
   1077 
   1078 # Check if the Zooniverse project is the Spyfish
   1079 if project.Project_name == "Spyfish_Aotearoa":

File /usr/src/app/kso/kso_utils/kso_utils/koster_utils.py:288, in process_koster_subjects(subjects, conn)
    284 auto_subjects_df = auto_subjects(subjects, auto_date=auto_date)
    286 ## Update subjects manually uploaded
    287 # Select manually uploaded subjects
--> 288 manual_subjects_df = manual_subjects(
    289     subjects, manual_date=manual_date, auto_date=auto_date
    290 )
    292 # Include movie_ids to the metadata
    293 manual_subjects_df = get_movies_id(manual_subjects_df, conn=conn)

File /usr/src/app/kso/kso_utils/kso_utils/koster_utils.py:102, in manual_subjects(subjects_df, manual_date, auto_date)
     99 man_clips_df["subject_type"] = "clip"
    101 # Extract metadata from manually uploaded clips
--> 102 man_clips_df, man_clips_meta = extract_metadata(man_clips_df)
    104 # Process the metadata of manually uploaded clips
    105 man_clips_meta = process_manual_clips(man_clips_meta)

NameError: name 'extract_metadata' is not defined

Expected behavior

Zooniverse classifications from the workflows of interest should be visible.
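
Since the NameError means koster_utils.manual_subjects() calls extract_metadata() without importing it, one way to locate where the function is actually defined (repo path taken from the traceback):

import subprocess

# Search the kso_utils checkout for the missing function's definition.
out = subprocess.run(
    ["git", "grep", "-n", "def extract_metadata"],
    cwd="/usr/src/app/kso/kso_utils",
    capture_output=True,
    text=True,
)
print(out.stdout or "definition not found in this checkout")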


Dockerfile includes extra Python packages but it is unknown what they do

While re-creating the CI pipeline to automatically test the notebooks, it was found that the master branch of kso points to commit a306499 of kso_utils, which is on branch origin/feat/pyav-backand.
The dev branch of kso points to the dev branch of kso_utils, commit f2ac787. (I believe these commits were made to try to fix the problem with extracting frames from movies that Emil had just before the summer holidays.)
The requirements file in kso_utils differs between these two commits. This causes the error shown in the image below when the notebook tests are run in a container based on the requirements in dev (commit f2ac787).

Since the tests did work when the container was built based on commit a306499 and the requirements there, these Python packages have been added to the Dockerfile. (Temporarily! We do not want them here; they should either be removed or put in the requirements.)
They have not been added to the requirements in dev yet, since I do not know why these packages were added, what their function is, and whether we want to use them in the end.

So this issue needs to be resolved by finding out what these packages do and whether we actually use them. If we do, they should be added to the kso_utils requirements in dev and master/main. If we do not want them, we need to find out why the error in the image occurs and how we can solve it.

[image]

Set up Tutorial 9 (Run ML on new footage)

Add a workflow that runs the model over a selection of footage and finally aggregates the results by site, returning the maximum count for a given species within the given movies.

We should check that it works for the template project as well as for active projects (e.g. Spyfish)
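
A rough sketch of the aggregation step described above, with hypothetical column names (site, species, count):

import pandas as pd

detections = pd.DataFrame({
    "site": ["A", "A", "A", "B"],
    "movie": ["m1.mp4", "m1.mp4", "m2.mp4", "m3.mp4"],
    "species": ["cod", "cod", "cod", "crab"],
    "count": [3, 5, 2, 4],  # individuals detected in a single frame
})

# Maximum count of each species per site across the selected movies.
max_counts = detections.groupby(["site", "species"])["count"].max().reset_index()
print(max_counts)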

Tutorial 4 frame extraction issue

Cannot generate more than 10 frames per movie in Tutorial 4 in the KSO project; otherwise the error below occurs.

Input:

# Generate suitable frames for upload by modifying initial frames
pp.generate_custom_frames(
    input_path=input_folder.selected,
    output_path=output_folder.selected,
    backend="av",
    skip_start=120,
    skip_end=120,
    num_frames=100,
    frames_skip=None,
)

Output:

WARNING:libav.mov,mp4,m4a,3gp,3g2,mj2:st: 0 edit list: 1 Missing key frame while searching for timestamp: 0
WARNING:libav.mov,mp4,m4a,3gp,3g2,mj2:st: 0 edit list 1 Cannot find an index entry before timestamp: 0.
WARNING:libav.mov,mp4,m4a,3gp,3g2,mj2:st: 0 edit list: 1 Missing key frame while searching for timestamp: 0
WARNING:libav.mov,mp4,m4a,3gp,3g2,mj2:st: 0 edit list: 1 Missing key frame while searching for timestamp: 0
WARNING:libav.mov,mp4,m4a,3gp,3g2,mj2:st: 0 edit list 1 Cannot find an index entry before timestamp: 0.
WARNING:libav.mov,mp4,m4a,3gp,3g2,mj2:st: 0 edit list 1 Cannot find an index entry before timestamp: 0.
---------------------------------------------------------------------------
RemoteTraceback                           Traceback (most recent call last)
RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/usr/src/app/kso/kso_utils/kso_utils/widgets.py", line 1229, in extract_custom_frames
    frames_to_extract = random.sample(
  File "/usr/lib/python3.8/random.py", line 363, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
"""

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
File /usr/src/app/kso/kso_utils/kso_utils/project.py:891, in ProjectProcessor.generate_custom_frames.<locals>.on_button_clicked(b)
    880 def on_button_clicked(b):
    881     movie_files = sorted(
    882         [
    883             f
   (...)
    888         ]
    889     )
--> 891     results = g_utils.parallel_map(
    892         kso_widgets.extract_custom_frames,
    893         movie_files,
    894         args=(
    895             [output_path] * len(movie_files),
    896             [skip_start] * len(movie_files),
    897             [skip_end] * len(movie_files),
    898             [num_frames] * len(movie_files),
    899             [frames_skip] * len(movie_files),
    900             [backend] * len(movie_files),
    901         ),
    902     )
    903     if len(results) > 0:
    904         self.frames_to_upload_df = pd.concat(results)

File /usr/src/app/kso/kso_utils/kso_utils/general.py:84, in parallel_map(func, iterable, args)
     67 """
     68 The function `parallel_map` uses multiprocessing to apply a given function to each element of an
     69 iterable in parallel.
   (...)
     81 is optional and can be used to pass additional arguments to the function `func`.
     82 """
     83 with multiprocessing.Pool() as pool:
---> 84     results = pool.starmap(func, zip(iterable, *args))
     85 return results

File /usr/lib/python3.8/multiprocessing/pool.py:372, in Pool.starmap(self, func, iterable, chunksize)
    366 def starmap(self, func, iterable, chunksize=None):
    367     '''
    368     Like `map()` method but the elements of the `iterable` are expected to
    369     be iterables as well and will be unpacked as arguments. Hence
    370     `func` and (a, b) becomes func(a, b).
    371     '''
--> 372     return self._map_async(func, iterable, starmapstar, chunksize).get()

File /usr/lib/python3.8/multiprocessing/pool.py:771, in ApplyResult.get(self, timeout)
    769     return self._value
    770 else:
--> 771     raise self._value

ValueError: Sample larger than population or is negative

Expected behaviour:

In theory it should extract the specified number of frames for each movie, but it does not.
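
The traceback suggests random.sample() is asked for more frame indices than exist after skip_start/skip_end trimming. A defensive sketch (variable names hypothetical):

import random

def safe_sample(population, k):
    # Cap the sample size at the population size instead of raising ValueError.
    return random.sample(population, min(k, len(population)))

fps, skip_start, skip_end, total_seconds = 1, 120, 120, 300  # hypothetical movie
candidates = list(range(skip_start * fps, (total_seconds - skip_end) * fps))
frames_to_extract = safe_sample(candidates, 100)  # only 60 candidates available
print(len(frames_to_extract))  # 60 instead of a crash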

Tutorial 8: Shark_life project issue

Input

pp.process_zoo_classifications()

Output



KeyError                                  Traceback (most recent call last)
Cell In[76], line 1
----> 1 pp.process_zoo_classifications()

File /usr/src/app/kso/kso_utils/kso_utils/project.py:746, in ProjectProcessor.process_zoo_classifications(self, test)
    742     workflow_checks = self.workflow_widget.checks
    744 # Retrieve a subset of the subjects from the workflows of interest and
    745 # populate the sql subjects table
--> 746 selected_zoo_workflows = zoo_utils.sample_subjects_from_workflows(
    747     project=self.project,
    748     server_connection=self.server_connection,
    749     db_connection=self.db_connection,
    750     workflow_widget_checks=workflow_checks,
    751     workflows_df=self.zoo_info["workflows"],
    752     subjects_df=self.zoo_info["subjects"],
    753 )
    755 # Make sure all the classifications have existing subjects,
    756 # Flatten the classifications provided the cit. scientists
    757 self.processed_zoo_classifications = zoo_utils.process_zoo_classifications(
    758     project=self.project,
    759     db_connection=self.db_connection,
   (...)
    763     selected_zoo_workflows=selected_zoo_workflows,
    764 )

File /usr/src/app/kso/kso_utils/kso_utils/zooniverse_utils.py:1294, in sample_subjects_from_workflows(project, server_connection, db_connection, workflow_widget_checks, workflows_df, subjects_df)
   1290 drop_table(conn=db_connection, table_name="subjects")
   1292 if len(subjects_series) > 0:
   1293     # Fill or re-fill subjects table
-> 1294     populate_subjects(project, server_connection, db_connection, subjects_series)
   1295 else:
   1296     logging.error("No subjects to populate database from the workflows selected.")

File /usr/src/app/kso/kso_utils/kso_utils/zooniverse_utils.py:1147, in populate_subjects(project, server_connection, db_connection, subjects)
   1144     movies_df = movies_df.rename(columns={"id": "movie_id"})
   1146     # Reference the movienames with the id movies table
-> 1147     subjects = pd.merge(subjects, movies_df, how="left", on="filename")
   1149 if subjects["subject_type"].value_counts().idxmax() == "clip":
   1150     # Calculate the clip_end_time
   1151     subjects["clip_end_time"] = (
   1152         subjects["clip_start_time"] + subjects["clip_length"]
   1153     )

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:107, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     90 @Substitution("\nleft : DataFrame or named Series")
     91 @Appender(_merge_doc, indents=0)
     92 def merge(
   (...)
    105     validate: str | None = None,
    106 ) -> DataFrame:
--> 107     op = _MergeOperation(
    108         left,
    109         right,
    110         how=how,
    111         on=on,
    112         left_on=left_on,
    113         right_on=right_on,
    114         left_index=left_index,
    115         right_index=right_index,
    116         sort=sort,
    117         suffixes=suffixes,
    118         copy=copy,
    119         indicator=indicator,
    120         validate=validate,
    121     )
    122     return op.get_result()

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:700, in _MergeOperation.__init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    693 self._cross = cross_col
    695 # note this function has side effects
    696 (
    697     self.left_join_keys,
    698     self.right_join_keys,
    699     self.join_names,
--> 700 ) = self._get_merge_keys()
    702 # validate the merge keys dtypes. We may need to coerce
    703 # to avoid incompatible dtypes
    704 self._maybe_coerce_merge_keys()

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:1110, in _MergeOperation._get_merge_keys(self)
   1108     right_keys.append(rk)
   1109 if lk is not None:
-> 1110     left_keys.append(left._get_label_or_level_values(lk))
   1111     join_names.append(lk)
   1112 else:
   1113     # work-around for merge_asof(left_index=True)

File /usr/local/lib/python3.8/dist-packages/pandas/core/generic.py:1848, in NDFrame._get_label_or_level_values(self, key, axis)
   1846     values = self.axes[axis].get_level_values(key)._values
   1847 else:
-> 1848     raise KeyError(key)
   1850 # Check for duplicates
   1851 if values.ndim > 1:

KeyError: 'filename'
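
The KeyError means the subjects dataframe for this project has no "filename" column to merge on. A hypothetical guard for populate_subjects() that fails early with a readable message (dummy dataframes for illustration):

import pandas as pd

subjects = pd.DataFrame({"subject_id": [1, 2]})  # note: no "filename" column
movies_df = pd.DataFrame({"filename": ["a.mp4"], "movie_id": [10]})

if "filename" not in subjects.columns:
    raise KeyError(
        f"subjects lacks a 'filename' column; columns are {list(subjects.columns)}"
    )
subjects = pd.merge(subjects, movies_df, how="left", on="filename")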

Tutorial 8 Issue in Model_registry project

Input

# Save the name of the project
project = p_utils.find_project(project_name=project_name.value)

# Initiate pp
pp = ProjectProcessor(project)

Output:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[80], line 5
      2 project = p_utils.find_project(project_name=project_name.value)
      4 # Initiate pp
----> 5 pp = ProjectProcessor(project)

File /usr/src/app/kso/kso_utils/kso_utils/project.py:64, in ProjectProcessor.__init__(self, project)
     61 self.connect_to_server()
     63 # Map initial csv files
---> 64 self.map_init_csv()
     66 # Create empty db and populate with local csv files data
     67 self.setup_db()

File /usr/src/app/kso/kso_utils/kso_utils/project.py:107, in ProjectProcessor.map_init_csv(self)
    105 # Create the folder to store the csv files if not exist
    106 if not os.path.exists(self.project.csv_folder):
--> 107     Path(self.project.csv_folder).mkdir(parents=True, exist_ok=True)
    108     # Recursively add permissions to folders created
    109     [
    110         os.chmod(root, 0o777)
    111         for root, dirs, files in os.walk(self.project.csv_folder)
    112     ]

File /usr/lib/python3.8/pathlib.py:1288, in Path.mkdir(self, mode, parents, exist_ok)
   1286     self._raise_closed()
   1287 try:
-> 1288     self._accessor.mkdir(self, mode)
   1289 except FileNotFoundError:
   1290     if not parents or self.parent == self:

OSError: [Errno 38] Function not implemented: 'None'

Reminder: Uncomment the if statement in the build-and-test-container

In PR #240, we commented out the if statement in the build-and-push step so that a new image gets built for dev every time, since we want the code to be updated there now that Emil is using it to test things out. When Emil can work from master again, we need to uncomment this if statement. This issue is just a reminder for that.

Improve CI pipeline build docker and test

As described in PR #235 on the Dockerfile and the CI pipeline, the following logic is applied now:

  • Run the pipeline on "dev", "master", and PRs to "dev" and "master".
  • If any of the files related to the container has changed, then rebuild it.
  • On a PR, this new image gets the tag of the current branch; on a push, it gets the tag of what we push to (dev or master).
  • To fetch the correct image for the tests: if we're in a PR and the files changed, or we're on dev or master, then fetch the current branch's image; otherwise, the PR target's.
  • Run the tests unconditionally.

This means the dev or master Docker image only gets updated on a push, regardless of whether the tests pass or fail during that push. (That they will pass should be checked first in a PR.)

So this is how it is now. Things that can be optimized are:

Notebook 5+6: Error while importing panoptes_client

When you try to run Notebook 5 in Colab with a clean runtime, you get an error in the 2nd cell:

FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.8/dist-packages/urllib3-1.24.3.dist-info/METADATA'

This comes from import kso_utils.tutorials_utils as t_utils, in which it tries to import: from panoptes_client import Project.

If you look in the Google Colab files, it indeed does not contain this file. However, it does contain a different version:
/usr/local/lib/python3.8/dist-packages/urllib3-1.26.14.dist-info

The strange thing is that when you run the command again (import kso_utils.tutorials_utils as t_utils), there is no problem and it just runs.
I do not understand why this is the case and I cannot find it online. Does anyone know?

I have two workarounds to prevent the error from occurring:

  1. Install from the requirements.txt file of the data-management repository. The error then does not occur, so the packages in that requirements list are probably in better agreement with each other. (This also raises the question: why are there 2 different repositories and 2 different requirements files? Or is this just for historical reasons?)
  2. Import panoptes_client in tutorials_utils with a try/except, since the import succeeds the second time:

try:
    from panoptes_client import Project
except ImportError:
    from panoptes_client import Project

Rename main repo and update all links

We should come up with a new name for the repo, now that we only have one repo left. Then we should update the name everywhere:

  • update the name in the README (also in the links to all images)
  • update the name in the link of the repo / Docker image (not koster-yolo4 anymore)
  • update the name in the links in the Dockerfile
  • update the name in the Jupyter notebooks
  • use ghcr instead of Docker Hub for image builds

Notebook 5 / training models gives error: run() got an unexpected keyword argument 'batch_size'

This issue is already solved; this is just a description of the error for documentation, since the error is not very clear and it was hard to find its cause.

Description of the error
When training a model in Notebook 5, with

mlp.train_yolov5(
    exp_name.value,
    weights.artifact_path,
    epochs=epochs.value,
    batch_size=batch_size.value,
    img_size=(img_h.value, img_w.value),
)

which indirectly calls

yolov5.train()

you get the error shown in the image below.

[image]

This error complains about the batch_size argument, while the code that is printed shows that the batch_size argument exists. Also, if you go into the val.py file from the yolov5 repository, the batch_size argument exists there.

Traceback of the origin of the error and solution
This error has occurred since commit 7c0d287, where project.py was created.
The error only occurs when yolov5.train(epochs=1) (or our code) is run after the MLProjectProcessor class has been created. Without this class, the code runs properly.
In this class, t6_utils gets imported, which in turn does import yolov5_tracker.track as track.
In that code, some paths are appended to sys.path. As a result, when validation.run() is called inside yolov5.train(), it does not run the val.py from yolov5 but the val.py from the tracker.

This val.py from the tracker is never imported anywhere in the repository, so it is simply not used. Therefore the solution is to delete val.py in the tracker repository. This makes it possible to run Notebook 5 / train models without errors.
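
A small illustration of the shadowing mechanism (paths hypothetical): once the tracker's directory is prepended to sys.path, a bare "import val" resolves to the tracker's val.py rather than yolov5's.

import importlib.util
import sys

def which_module(name: str) -> str:
    # Report which file a module name would be loaded from right now.
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else "<not found>"

sys.path.insert(0, "/content/yolov5_tracker")  # roughly what importing track does
print(which_module("val"))  # the tracker's val.py now shadows yolov5/val.py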

Remaining issue
One remaining issue is that you cannot train two models in the same notebook session. For now this is mentioned in the notebook, and people are instructed to simply restart the notebook.

Enable users to download raw and aggregated Zooniverse classifications in tut#8

Researchers from Spyfish Aotearoa would like to download the processed (i.e. JSON-unnested) classifications from Zooniverse to analyse them in R.
This will be possible in tutorial https://github.com/ocean-data-factory-sweden/kso-data-management/pull/9, which is in a draft state at the moment.
Sections to be developed are in the checklist below:

  • Widget to select the date range of the classifications of interest
  • Process the classifications (i.e. unnest the JSON classifications) to have a label/species per row
  • Widget to select the columns users want to download

Dockerfile nvidia starting image

The Dockerfile (the one that serves both repositories, data-management and object-detection, combined) currently first loads the nvidia cuda devel Docker image in order to build ffmpeg from scratch.

Then it starts over from a new image, copies the final ffmpeg installation, and builds up the rest of the environment. In theory this should be possible with the runtime image, which has the advantage of being smaller. However, when trying that, the Dockerfile could not get through the builder test on GitHub since it ran out of disk space. This is the error that occurred:

[image]

The error is resolved by using the devel image a second time instead of loading the new image. We now end up with a larger final image (which is not a problem in itself), but it is not the neatest solution, so this is something we can take a look at again in the future.

Create requirements.txt file for 1 env (works on local, SNIC, Colab)

Currently there are multiple requirements files across the 3 different repos, plus 2 extra in yolov5 and yolov5_tracker, all of which we pip install on separate lines. As a result, pip cannot ensure that everything is compatible; all the requirements should therefore be installed in one line instead.

On top of that, our 3 requirements files contradict each other. The goal is to remove these contradictions and find the minimal combination of packages that makes everything work.

This should work with the same requirements file on Colab, SNIC, and locally.

Add support for YOLOv8

Description:

YOLO has been upgraded, and we risk deprecations causing issues down the line if we do not keep up. YOLOv8 is now available for both the Ultralytics version of YOLO and the tracker submodule.

Fix testing GA workflow integration

The first version of testing has been implemented but needs to be improved by testing in the same environment and better managing the images that are created.

Improve table next to frame display

In the launch_viewer function, when displaying the frames, there is a table on the side, but it is not very useful. It should show the names of the actual labels instead of the colours.

Tutorial 3 issue with producing sample clips


๐Ÿ› Bug

Seems to be something that has not been defined.

To Reproduce (REQUIRED)

pp.generate_zoo_clips(
    movie_name=pp.movie_selected,
    movie_path=pp.movie_path,
    is_example=True,
    use_gpu=gpu_available.result,
)
Output:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
File ~/.local/lib/python3.8/site-packages/ipywidgets/widgets/interaction.py:257, in interactive.update(self, *args)
    255     value = widget.get_interact_value()
    256     self.kwargs[widget._kwarg] = value
--> 257 self.result = self.f(**self.kwargs)
    258 show_inline_matplotlib_plots()
    259 if self.auto_display and self.result is not None:

File /usr/src/app/kso/kso_utils/kso_utils/widgets.py:1182, in n_random_clips(clip_length, n_clips)
   1179 def n_random_clips(clip_length, n_clips):
   1180     # Create a list of starting points for n number of clips
   1181     # movie_df is currently missing here
-> 1182     duration_movie = int(movie_df["duration"].values[0])
   1183     starting_clips = random.sample(range(0, duration_movie, clip_length), n_clips)
   1185     # Seave the outputs in a dictionary

NameError: name 'movie_df' is not defined

Number of modifications:
0


Expected behavior

The expected behaviour is to be able to extract sample clips from the video.
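
A sketch of the likely fix, assuming n_random_clips should receive the movie metadata explicitly instead of relying on a global movie_df that is never defined in widgets.py:

import random
import pandas as pd

def n_random_clips(movie_df: pd.DataFrame, clip_length: int, n_clips: int) -> dict:
    # Create a list of starting points for n clips within the movie duration.
    duration_movie = int(movie_df["duration"].values[0])
    starting_clips = random.sample(range(0, duration_movie, clip_length), n_clips)
    # Save the outputs in a dictionary.
    return {"clip_start_time": starting_clips, "clip_length": clip_length}

movie_df = pd.DataFrame({"duration": [600]})  # hypothetical 10-minute movie
print(n_random_clips(movie_df, clip_length=10, n_clips=3))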

Update view_subject function to display subject frames in tut#8 straight from Zooniverse

We should standardise the "view_subject" function from tutorial_utils. Right now, if we want to display a clip subject, we retrieve the Zooniverse https link, but if we want to display a frame, we extract the frame from the original movie (instead of getting the actual Zooniverse link to the frame). @jannes do you remember the reasoning for this approach?

I think the view_subject function should retrieve the Zooniverse link for the subject. If that subject doesn't exist, we could temporarily extract the images from the original movies, but we should somehow let the user know that the subject doesn't exist in Zooniverse anymore and that the images displayed have just been retrieved from the original movie.


Improve Docker Image

PR #240 introduces a hack to update the code in the master container. This works and is fast, but it does increase the size of the image every time it is done, since it adds an extra layer. On the other hand, since we sometimes rebuild the whole image, which resets all the extra layers, and we do not know how big one extra layer is, this may not matter at all. To drop the hack and still rebuild the master Docker image without rebuilding everything, we need some improvements in the Dockerfile, which are listed below. These improvements are good in general and can also be done while keeping the hack.
(The hack might even be quicker than the proper solution, since the ffmpeg container is big and we would need to rebuild it all the time.)

Some improvements that can be made are:

  • Cache the ffmpeg part as its own image and push it, so that we can simply pull it in on a rebuild. Right now it has to be rebuilt every time. (See how to push 2 images from 1 Dockerfile: https://docs.docker.com/engine/reference/commandline/build/#target)
  • Move the COPY from the builder to the end, so that we can reuse the layer with all the pip installations.
  • Move the COPY . ./kso to the end and, at the beginning, only copy the requirements, so that the layer with all the pip installs does not need to be rebuilt on every code change.
  • Split the RUN apt-get into two layers: one with the apt-get installs and one with the pip installs.
  • Look for another option than building the ffmpeg container ourselves, because both rebuilding it and pulling it in take more time than if we could just get it from somewhere else.

The environment variables can stay at the end; they are small and quick.

The goal is to get the COPY . ./kso as the last layer, so that when we pull and build an image, everything except this copy layer can be reused. If we make that possible, we can remove the hack (provided it is equally fast).

Notebook 5 cannot read the files from the ml-template-project correctly on a Windows computer (or in Google Colab)

When you run Notebook 5 in Google Colab from a Windows computer, the file names of the images contain Sa╠êcken instead of säcken. This is because Windows decodes the file names during unzipping with CP437 instead of UTF-8, which Linux uses automatically. You can see the difference with the code below.

b'a\xcc\x88'.decode('CP437')  # 'a╠ê'
b'a\xcc\x88'.decode('utf-8')  # 'ä' (an "a" followed by a combining diaeresis)

As a result, there is no data at all available for training the model. WandB prints a warning for this during training and evaluation, but the run does continue.
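
A hedged workaround sketch (archive name hypothetical): extract the zip with Python and undo the CP437 mangling by re-encoding each entry name. Python's zipfile decodes entries whose UTF-8 flag is unset as cp437, which is exactly the mismatch described above.

import zipfile

with zipfile.ZipFile("ml-template-data.zip") as zf:
    for info in zf.infolist():
        if not info.flag_bits & 0x800:  # name was decoded as cp437, not utf-8
            # Reverse zipfile's cp437 decoding and reinterpret the bytes as utf-8.
            info.filename = info.filename.encode("cp437").decode("utf-8", "replace")
        zf.extract(info)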

Tutorial 8 spyfish B

๐Ÿ› Bug


To Reproduce (REQUIRED)

Input:

# Save the name of the project
project = p_utils.find_project(project_name=project_name.value)

# Initiate pp
pp = ProjectProcessor(project)

Output:

FileNotFoundError                         Traceback (most recent call last)
File /usr/lib/python3.8/pathlib.py:1288, in Path.mkdir(self, mode, parents, exist_ok)
   1287 try:
-> 1288     self._accessor.mkdir(self, mode)
   1289 except FileNotFoundError:

FileNotFoundError: [Errno 2] No such file or directory: 'E:/SpyFish/BOPRC sites'

During handling of the above exception, another exception occurred:

FileNotFoundError                         Traceback (most recent call last)
File /usr/lib/python3.8/pathlib.py:1288, in Path.mkdir(self, mode, parents, exist_ok)
   1287 try:
-> 1288     self._accessor.mkdir(self, mode)
   1289 except FileNotFoundError:

FileNotFoundError: [Errno 2] No such file or directory: 'E:/SpyFish'

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
Cell In[79], line 5
      2 project = p_utils.find_project(project_name=project_name.value)
      4 # Initiate pp
----> 5 pp = ProjectProcessor(project)

File /usr/src/app/kso/kso_utils/kso_utils/project.py:64, in ProjectProcessor.__init__(self, project)
     61 self.connect_to_server()
     63 # Map initial csv files
---> 64 self.map_init_csv()
     66 # Create empty db and populate with local csv files data
     67 self.setup_db()

File /usr/src/app/kso/kso_utils/kso_utils/project.py:107, in ProjectProcessor.map_init_csv(self)
    105 # Create the folder to store the csv files if not exist
    106 if not os.path.exists(self.project.csv_folder):
--> 107     Path(self.project.csv_folder).mkdir(parents=True, exist_ok=True)
    108     # Recursively add permissions to folders created
    109     [
    110         os.chmod(root, 0o777)
    111         for root, dirs, files in os.walk(self.project.csv_folder)
    112     ]

File /usr/lib/python3.8/pathlib.py:1292, in Path.mkdir(self, mode, parents, exist_ok)
   1290     if not parents or self.parent == self:
   1291         raise
-> 1292     self.parent.mkdir(parents=True, exist_ok=True)
   1293     self.mkdir(mode, parents=False, exist_ok=exist_ok)
   1294 except OSError:
   1295     # Cannot rely on checking for EEXIST, since the operating system
   1296     # could give priority to other errors like EACCES or EROFS

File /usr/lib/python3.8/pathlib.py:1292, in Path.mkdir(self, mode, parents, exist_ok)
   1290     if not parents or self.parent == self:
   1291         raise
-> 1292     self.parent.mkdir(parents=True, exist_ok=True)
   1293     self.mkdir(mode, parents=False, exist_ok=exist_ok)
   1294 except OSError:
   1295     # Cannot rely on checking for EEXIST, since the operating system
   1296     # could give priority to other errors like EACCES or EROFS

File /usr/lib/python3.8/pathlib.py:1288, in Path.mkdir(self, mode, parents, exist_ok)
   1286     self._raise_closed()
   1287 try:
-> 1288     self._accessor.mkdir(self, mode)
   1289 except FileNotFoundError:
   1290     if not parents or self.parent == self:

OSError: [Errno 38] Function not implemented: 'E:'


Re-organise releases of stable versions

Currently (after PR #235) we are using the master branch as the 'stable' version. That is also why a new Docker image is currently created every time we push to master. (The Docker image contains the code that the users are using on SNIC; it does not only contain the requirements but also the actual code.) Because master is used as a stable version, we should not push changes to this branch too often.

However, in the future we could move towards having new stable releases with tags. In that case, we only need to build a new Docker image when we make such a release. (It is desirable not to build and push a new Docker image too often, since it is quite big.) Then we can push new changes to master more freely.

We should discuss what we think is a good way forward.

For development, we can use the dev Docker image on SNIC and our own mounted clone of the code.

Tutorial 3 upload clips issue

๐Ÿ› Bug

A clear and concise description of what the bug is.

To Reproduce (REQUIRED)

Input:

pp.upload_zoo_subjects("clip")

Output:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 1
----> 1 pp.upload_zoo_subjects("clip")

File /usr/src/app/kso/kso_utils/kso_utils/project.py:692, in ProjectProcessor.upload_zoo_subjects(self, subject_type)
    684 """
    685 This function uploads clips or frames to Zooniverse, depending on the subject_type argument
    686 
   (...)
    689 :type subject_type: str
    690 """
    691 if subject_type == "clip":
--> 692     upload_df, sitename, created_on = zoo_utils.set_zoo_clip_metadata(
    693         project=self.project,
    694         generated_clipsdf=self.generated_clips,
    695         sitesdf=self.local_sites_csv,
    696         moviesdf=self.local_movies_csv,
    697     )
    698     zoo_utils.upload_clips_to_zooniverse(
    699         project=self.project,
    700         upload_to_zoo=upload_df,
    701         sitename=sitename,
    702         created_on=created_on,
    703     )
    704     # Clean up subjects after upload

File /usr/src/app/kso/kso_utils/kso_utils/zooniverse_utils.py:1303, in set_zoo_clip_metadata(project, generated_clipsdf, sitesdf, moviesdf)
   1301 # Combine site info to the generated_clips df
   1302 if "site_id" in generated_clipsdf.columns:
-> 1303     upload_to_zoo = generated_clipsdf.merge(sitesdf, on="site_id")
   1304     sitename = upload_to_zoo["#siteName"].unique()[0]
   1305 else:

File /usr/local/lib/python3.8/dist-packages/pandas/core/frame.py:9329, in DataFrame.merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
   9310 @Substitution("")
   9311 @Appender(_merge_doc, indents=2)
   9312 def merge(
   (...)
   9325     validate: str | None = None,
   9326 ) -> DataFrame:
   9327     from pandas.core.reshape.merge import merge
-> 9329     return merge(
   9330         self,
   9331         right,
   9332         how=how,
   9333         on=on,
   9334         left_on=left_on,
   9335         right_on=right_on,
   9336         left_index=left_index,
   9337         right_index=right_index,
   9338         sort=sort,
   9339         suffixes=suffixes,
   9340         copy=copy,
   9341         indicator=indicator,
   9342         validate=validate,
   9343     )

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:107, in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     90 @Substitution("\nleft : DataFrame or named Series")
     91 @Appender(_merge_doc, indents=0)
     92 def merge(
   (...)
    105     validate: str | None = None,
    106 ) -> DataFrame:
--> 107     op = _MergeOperation(
    108         left,
    109         right,
    110         how=how,
    111         on=on,
    112         left_on=left_on,
    113         right_on=right_on,
    114         left_index=left_index,
    115         right_index=right_index,
    116         sort=sort,
    117         suffixes=suffixes,
    118         copy=copy,
    119         indicator=indicator,
    120         validate=validate,
    121     )
    122     return op.get_result()

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:704, in _MergeOperation.__init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    696 (
    697     self.left_join_keys,
    698     self.right_join_keys,
    699     self.join_names,
    700 ) = self._get_merge_keys()
    702 # validate the merge keys dtypes. We may need to coerce
    703 # to avoid incompatible dtypes
--> 704 self._maybe_coerce_merge_keys()
    706 # If argument passed to validate,
    707 # check if columns specified as unique
    708 # are in fact unique.
    709 if validate is not None:

File /usr/local/lib/python3.8/dist-packages/pandas/core/reshape/merge.py:1257, in _MergeOperation._maybe_coerce_merge_keys(self)
   1251     # unless we are merging non-string-like with string-like
   1252     elif (
   1253         inferred_left in string_types and inferred_right not in string_types
   1254     ) or (
   1255         inferred_right in string_types and inferred_left not in string_types
   1256     ):
-> 1257         raise ValueError(msg)
   1259 # datetimelikes must match exactly
   1260 elif needs_i8_conversion(lk.dtype) and not needs_i8_conversion(rk.dtype):

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Expected behavior

Uploading of the created clips; the error might be due to large file size?
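
A more likely cause than file size, judging by the ValueError: site_id is read as a string in one csv and as int64 in the other, so the merge key dtypes disagree. A sketch of normalising the key before merging (dummy dataframes):

import pandas as pd

generated_clipsdf = pd.DataFrame({"site_id": ["1", "2"]})  # object dtype
sitesdf = pd.DataFrame({"site_id": [1, 2], "#siteName": ["siteA", "siteB"]})

# Make both merge keys numeric so the dtypes match.
generated_clipsdf["site_id"] = pd.to_numeric(generated_clipsdf["site_id"])
upload_to_zoo = generated_clipsdf.merge(sitesdf, on="site_id")
print(upload_to_zoo)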

Environment

[environment screenshots]

Combine dm and ml repos

  • Unification of the Dockerfile for a single image build
  • Unification of the README files for both repos
  • Transfer notebooks from the dm repo to the ml repo
  • Change the jupyter.sh files on SNIC; we now only need one

Notebook 4+8: Generalize workflow and species selection

The selection of species and aggregation factors is done in a different order in Notebooks 4 and 8, and they always display all options, even those that do not yield any annotations. The idea is to make this the same for both notebooks and to filter the options first on whether annotations are available. This can be done on zoo_info_dict. I am working on this; see the sketch below.
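
A rough sketch of that filtering idea, with assumed keys and columns on zoo_info_dict: only offer workflows that actually appear in the downloaded classifications.

import pandas as pd

zoo_info_dict = {
    "workflows": pd.DataFrame({"workflow_id": [1, 2, 3]}),
    "classifications": pd.DataFrame({"workflow_id": [1, 1, 3]}),
}

# Keep only workflows that have at least one classification.
annotated = set(zoo_info_dict["classifications"]["workflow_id"])
options = zoo_info_dict["workflows"].query("workflow_id in @annotated")
print(options)  # workflows 1 and 3 only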

Google Colab package dependency error

While working on issue #191, Colab gives the following error during the installation of all the packages:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. google-colab 1.0.0 requires ipython==7.34.0, but you have ipython 8.11.0 which is incompatible. google-colab 1.0.0 requires pandas==1.5.3, but you have pandas 1.4.0 which is incompatible.

This comes from the code where all requirements are stated in one line. The code to reproduce this error can be found in commit 778aaadc18076834617aa53e2636db432723ce58.

Since we currently only use google-colab to clear the output, which still works, and since pip list shows our versions of the packages (8.11.0 and 1.4.0), we will ignore this error for now. But it is something to keep in mind if something does not work on Google Colab in the future.

Update the workflow names

The Zooniverse workflows from the KSO project are not very informative. We should rename them with the name of the project at the beginning (e.g. SGU_mussel_detection).
