patball1 / detectree2


154.0 154.0 38.0 155.01 MB

Python package for automatic tree crown delineation based on the Detectron2 implementation of Mask R-CNN

Home Page: https://patball1.github.io/detectree2/

License: MIT License

Shell 0.01% Makefile 0.02% Python 0.55% Jupyter Notebook 99.15% R 0.27% Dockerfile 0.01%

deep-learning detectron2 python pytorch

detectree2's Issues

tutorial.html

"If you would just like to make predictions on an orthomosaic with a pre-trained model from the model_garden, skip to part 4."

2.3 and 2.4 are missing from this document. When will they be updated?

Do I need to run this file now if I want to use the pre-trained model provided by the author for image prediction?

forest-modelling-detectree-v4.ipynb

Thanks again to the author for taking the time to answer. I wish you all the best!

Westerly geo predictions offset from imagery

Why do some predictions at the western edge of an orthomosaic become offset when reprojected into geographic coordinates?

Implementation of examples demonstrating end-to-end use of detectree2.

Tiling, training, prediction on different datasets.

Workflow dependencies

On ubuntu-latest (20.04)
Rasterio errors (fixed by updating from 1.2.10 to 1.3.0)

GDAL errors
The workflow attempts to build a GDAL wheel (latest version 3.5.1) for Python 3.10. I can't find a way to search for pre-existing GDAL wheels; there may be wheels available for older GDAL versions?
error: ‘GDT_UInt64’ undeclared (first use in this function); did you mean ‘GDT_UInt32’?
All development headers have been installed. (GDT_UInt64 was introduced in GDAL 3.5, so this error suggests the system GDAL headers are older than the Python bindings being built.)

On ubuntu-22.04
Commands fail:
sudo add-apt-repository -y ppa:ubuntugis/ppa

Add new RPN layer and preprocessing

Add tests

Create a suite of tests:

Unit testing:
Test small components of detectree2:

Regression testing:

CI using github actions:

linting / style etc.
build tests (conda environment, pip)
use pre-built wheel / tar.gz in github actions using artifacts. (mentioned in docs issue)
run prediction on CPU
run prediction on GPU (not currently supported on github actions - is it possible to set up a runner to use CSD3 resources instead?)

TypeError: 'float' object is not subscriptable during training

Hello,

I would like to include four locations for training my model.

Error

Here is the traceback:

Traceback (most recent call last):
  File "driver.py", line 75, in <module>
    trainer = MyTrainer(cfg, patience=4)
  File "/home/nieding/detectree2/detectree2/models/train.py", line 182, in __init__
    super().__init__(cfg)
  File "/home/nieding/mambaforge/envs/detectree2/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 378, in __init__
    data_loader = self.build_train_loader(cfg)
  File "/home/nieding/mambaforge/envs/detectree2/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 547, in build_train_loader
    return build_detection_train_loader(cfg)
  File "/home/nieding/mambaforge/envs/detectree2/lib/python3.8/site-packages/detectron2/config/config.py", line 207, in wrapped
    explicit_args = _get_args_from_config(from_config, *args, **kwargs)
  File "/home/nieding/mambaforge/envs/detectree2/lib/python3.8/site-packages/detectron2/config/config.py", line 245, in _get_args_from_config
    ret = from_config_func(*args, **kwargs)
  File "/home/nieding/mambaforge/envs/detectree2/lib/python3.8/site-packages/detectron2/data/build.py", line 344, in _train_loader_from_config
    dataset = get_detection_dataset_dicts(
  File "/home/nieding/mambaforge/envs/detectree2/lib/python3.8/site-packages/detectron2/data/build.py", line 241, in get_detection_dataset_dicts
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
  File "/home/nieding/mambaforge/envs/detectree2/lib/python3.8/site-packages/detectron2/data/build.py", line 241, in <listcomp>
    dataset_dicts = [DatasetCatalog.get(dataset_name) for dataset_name in names]
  File "/home/nieding/mambaforge/envs/detectree2/lib/python3.8/site-packages/detectron2/data/catalog.py", line 59, in get
    return f()
  File "/home/nieding/detectree2/detectree2/models/train.py", line 489, in <lambda>
    DatasetCatalog.register(name + "_" + d, lambda d=d: combine_dicts(train_location,
  File "/home/nieding/detectree2/detectree2/models/train.py", line 443, in combine_dicts
    tree_dicts += get_tree_dicts(d, classes=classes, classes_at=classes_at)
  File "/home/nieding/detectree2/detectree2/models/train.py", line 388, in get_tree_dicts
    px = [a[0] for a in anno["coordinates"][0]]
  File "/home/nieding/detectree2/detectree2/models/train.py", line 388, in <listcomp>
    px = [a[0] for a in anno["coordinates"][0]]
TypeError: 'float' object is not subscriptable

Code snippet

The following code describes how I register the datasets:

# register hain
train_location = '../data/Bamberg_Hain/training/training_tiles/train/'
register_train_data(train_location, 'Bamberg_Hain', 1) # registers train and val sets
print("Hain datasets registered")

# register stadtwald
train_location = '../data/Bamberg_Stadtwald/training/training_tiles/train/'
register_train_data(train_location, 'Bamberg_Stadtwald', 1) # registers train and val sets
print("Stadtwald datasets registered")

[...]

# set models
base_model = "COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml" #with api
pre_trained_model = site_folder + 'models/220723_withParacouUAV.pth'

# registered sets
trains = ("Bamberg_Hain_train", "Bamberg_Stadtwald_train", "Tretzendorf_train", "Schiefer_train")
tests = ("Bamberg_Hain_val", "Bamberg_Stadtwald_val", "Tretzendorf_val", "Schiefer_val")
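One thing worth checking in dataset tuples like these: Python silently joins adjacent string literals that are not separated by a comma, so a missing comma yields one fewer dataset and a name that was never registered. A minimal illustration using the dataset names from this issue:

```python
# Adjacent string literals are joined at compile time, so a missing comma
# merges two dataset names into one.
trains_missing_comma = (
    "Bamberg_Hain_train",
    "Bamberg_Stadtwald_train" "Tretzendorf_train",  # no comma here
    "Schiefer_train",
)
trains = ("Bamberg_Hain_train", "Bamberg_Stadtwald_train", "Tretzendorf_train", "Schiefer_train")

print(len(trains_missing_comma))  # 3
print(trains_missing_comma[1])    # Bamberg_Stadtwald_trainTretzendorf_train
print(len(trains))                # 4
```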

Update:

OK, with a fresh mind I found that the error message was accurate: an unexpected float in the annotation coordinates was the cause of the failure.
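The failing line, `px = [a[0] for a in anno["coordinates"][0]]`, expects a GeoJSON Polygon whose exterior ring is a list of `[x, y]` positions; a LineString or a malformed polygon puts bare floats there instead. A minimal sketch for finding such features before registering the data, assuming the crown annotations are stored as `.geojson` files whose features carry a `geometry` member (the helpers `find_bad_polygons` and `check_directory` are hypothetical, not part of detectree2):

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def find_bad_polygons(geojson: Dict[str, Any]) -> List[int]:
    """Return indices of features whose geometry is not a Polygon of [x, y] positions."""
    bad = []
    for i, feature in enumerate(geojson.get("features", [])):
        geom = feature.get("geometry") or {}
        coords = geom.get("coordinates")
        # The exterior ring is the first element of a Polygon's coordinates.
        ring = coords[0] if isinstance(coords, list) and coords else None
        if geom.get("type") != "Polygon" or not isinstance(ring, list) or not ring:
            bad.append(i)
            continue
        # Every vertex must itself be a position such as [x, y]; a bare float here
        # is what makes a[0] fail with "'float' object is not subscriptable".
        if not all(isinstance(pt, list) and len(pt) >= 2 for pt in ring):
            bad.append(i)
    return bad


def check_directory(directory: str) -> Dict[str, List[int]]:
    """Map each .geojson file in `directory` to the indices of its offending features."""
    report = {}
    for path in sorted(Path(directory).glob("*.geojson")):
        with open(path, encoding="utf-8") as f:
            bad = find_bad_polygons(json.load(f))
        if bad:
            report[path.name] = bad
    return report
```

Features reported this way can then be fixed or dropped before calling `register_train_data`.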

Automatically correct invalid geometries from predictions

Invalid polygon geometries can cause errors in evaluation

Investigate whether installing detectron2 is necessary

It is possible to use a minimal set of detectron2 features without installing the entire library. facebookresearch/detectron2#3909

Improve code style across codebase.

Adopt numpy docstring convention:
https://numpydoc.readthedocs.io/en/latest/format.html
static type checking with mypy
Write contribution guide.

Add tutorial / worked examples

pytorch-lightning 1.8.1 requires PyYAML>=5.4, but you have pyyaml 5.1 which is incompatible.

Hi there, thank you for the excellent work by the authors.
I encountered an error when installing detectree2 with pip.

Environment: Ubuntu 20.04 with Conda
Here is my code: pip install git+https://github.com/PatBall1/detectree2.git
ERROR: pytorch-lightning 1.8.1 requires PyYAML>=5.4, but you have pyyaml 5.1 which is incompatible.

Can somebody provide any suggestions? Thanks.
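A minimal sketch for checking the installed PyYAML version before installing detectree2, using only the standard library (`importlib.metadata`, Python 3.8+). The helper names are hypothetical, and the simple version parser ignores pre-release suffixes such as `b1`:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_release(v: str) -> Tuple[int, ...]:
    """Turn a plain release string such as '5.4.1' into (5, 4, 1)."""
    # Non-numeric parts (e.g. '0b1') are dropped, so pre-releases are not handled.
    return tuple(int(part) for part in v.split(".") if part.isdigit())


def meets_minimum(package: str, minimum: str) -> bool:
    """Return True if `package` is installed at version `minimum` or newer."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    return parse_release(installed) >= parse_release(minimum)


if __name__ == "__main__":
    # pytorch-lightning 1.8.1 requires PyYAML>=5.4 (per the pip error in this issue).
    if meets_minimum("PyYAML", "5.4"):
        print("PyYAML is new enough")
    else:
        print('PyYAML is missing or too old; try: pip install --upgrade "PyYAML>=5.4"')
```

If the check fails, upgrading PyYAML in the same environment and then re-running the detectree2 install is a reasonable first step.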

Understand poor generalisation across image scales

Models do not generalise well across image scales. Can this be fixed by adjusting the data augmentation parameters?

detectree2/detectree2/models/train.py

Lines 155 to 180 in be81a18

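import detectron2.data.transforms as T
from detectron2.data import DatasetMapper, build_detection_train_loader
from detectron2.engine import DefaultTrainer


class MyTrainer(DefaultTrainer):
    # (other MyTrainer methods omitted)

    @classmethod
    def build_train_loader(cls, cfg):
        """Build the training data loader with detectree2's augmentations.

        Args:
            cfg (CfgNode): detectron2 config describing the training datasets.

        Returns:
            torch.utils.data.DataLoader: loader yielding augmented training batches.
        """
        return build_detection_train_loader(
            cfg,
            mapper=DatasetMapper(
                cfg,
                is_train=True,
                augmentations=[
                    T.Resize((800, 800)),
                    T.RandomBrightness(0.8, 1.8),
                    T.RandomContrast(0.6, 1.3),
                    T.RandomSaturation(0.8, 1.4),
                    T.RandomRotation(angle=[90, 90], expand=False),
                    T.RandomLighting(0.7),
                    T.RandomFlip(prob=0.4, horizontal=True, vertical=False),
                    T.RandomFlip(prob=0.4, horizontal=False, vertical=True),
                ],
            ),
        )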
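The `T.Resize((800, 800))` step forces every tile to the same 800×800 pixel size, so the augmentation pipeline adds no scale variation. One commonly used alternative is multi-scale training, where the target size is sampled per image; detectron2's `T.ResizeShortestEdge(short_edge_length=..., sample_style="choice")` does this. The sampling logic, sketched in plain Python (the `jittered_size` helper is hypothetical, and whether this improves cross-scale generalisation for tree crowns is untested here):

```python
import random
from typing import Sequence, Tuple


def jittered_size(height: int, width: int, short_edges: Sequence[int],
                  rng: random.Random) -> Tuple[int, int]:
    """Sample a target short-edge length and rescale (height, width) to it.

    Follows the idea behind detectron2's ResizeShortestEdge with
    sample_style="choice": aspect ratio is kept, only the scale varies.
    """
    target = rng.choice(list(short_edges))
    scale = target / min(height, width)
    return round(height * scale), round(width * scale)


rng = random.Random(0)
for _ in range(3):
    # A square 1000 px tile is resized to 640, 800 or 960 px per sample.
    print(jittered_size(1000, 1000, (640, 800, 960), rng))
```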

Move models from `model_garden` to hugging face or equivalent to reduce bandwidth costs

Currently hitting the monthly GitHub LFS bandwidth limits

Host demo data/models using Git LFS

https://docs.github.com/en/repositories/working-with-files/managing-large-files

`setup_cfg` output directory not practical

It should only create an output directory when training; one is not needed for landscape-level predictions.
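One way to defer directory creation is to create it only when a training run asks for it. A minimal sketch; `ensure_output_dir` and its `training` flag are hypothetical and not part of detectree2's `setup_cfg`:

```python
from pathlib import Path
from typing import Optional


def ensure_output_dir(output_dir: str, training: bool) -> Optional[Path]:
    """Create `output_dir` only for training runs; prediction runs touch nothing."""
    if not training:
        return None
    path = Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return path
```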

Multiclass training/prediction

Upgrade system so that multiple classes can be predicted rather than the simple single class delineation ('tree')

deal with empty geometries in `clean_crowns`

Currently throws an error; maybe deal with this in stitch_crowns?
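A minimal sketch of dropping empty or missing geometries from a GeoJSON FeatureCollection before it reaches `clean_crowns`. The helper names are hypothetical, and GeometryCollections (which use `geometries` rather than `coordinates`) would be treated as empty. On a GeoDataFrame, a similar filter would be `crowns[~(crowns.geometry.isna() | crowns.is_empty)]`.

```python
from typing import Any, Dict


def has_position(coords: Any) -> bool:
    """True if a (possibly nested) coordinate array contains at least one position."""
    if not isinstance(coords, (list, tuple)):
        return False
    # A position is a non-empty list of numbers such as [x, y].
    if coords and all(isinstance(v, (int, float)) for v in coords):
        return True
    return any(has_position(c) for c in coords)


def drop_empty_features(collection: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a FeatureCollection without missing or empty geometries."""
    kept = [
        feature for feature in collection.get("features", [])
        if feature.get("geometry") and has_position(feature["geometry"].get("coordinates"))
    ]
    return {**collection, "features": kept}
```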

Failure to install on 64-bit Windows 10 system, and Python version 3.8.10

Collecting rasterio==1.3a3
Using cached rasterio-1.3a3.tar.gz (401 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\xxxx\AppData\Local\Programs\Python\Python38\python.exe' 'C:\Users\xxxx\AppData\Local\Programs\Python\Python38\lib\site-packages\pip\_vendor\pep517\in_process\_in_process.py' get_requires_for_build_wheel 'C:\Users\xxxx\AppData\Local\Temp\tmprw8grws6'
cwd: C:\Users\xxxx\AppData\Local\Temp\pip-install-ozyk4_bv\rasterio_97d0b87c61a740218c42e09690312378
Complete output (2 lines):
INFO:root:Building on Windows requires extra options to setup.py to locate needed GDAL files. More information is available in the README.
ERROR: A GDAL API version must be specified. Provide a path to gdal-config using a GDAL_CONFIG environment variable or use a GDAL_VERSION environment variable.

Distribute on conda-forge or pypi

To enable easy use of detectree2 we should consider distributing on conda-forge or PyPI.

Packaging on PyPI is made difficult as detectron2 is not available there.

Consider maintaining detectron2 release for detectree2.

outputs.stitch_crowns is registering the same CRS as being different

Hi there, I am trying to stitch predictions together and am getting the following error:

ValueError: Cannot determine common CRS for concatenation inputs, got ['WGS 84 / UTM zone 30N', 'WGS 84 / UTM zone 30N']. Use to_crs() to transform geometries to the same CRS before merging.

I have tried forcing the same CRS onto all of the predictions with a for loop (shown below), but the error persists. The two "different" CRSs reported are identical!

import os

import geopandas as gpd

# Specify the input and output folders (tiles_path is defined earlier in the notebook)
input_folder = os.path.join(tiles_path, "predictions_geo")
output_folder = os.path.join(tiles_path, "predictions_geo_crs32630")
os.makedirs(output_folder, exist_ok=True)

# Specify the target CRS
target_crs = "EPSG:32630"

# Loop through all GeoJSON files in the input folder
for filename in os.listdir(input_folder):
    if not filename.endswith(".geojson"):
        continue
    # Construct the full file paths for the input and output files
    input_filepath = os.path.join(input_folder, filename)
    output_filepath = os.path.join(output_folder, filename)

    # Read the file using geopandas
    gdf = gpd.read_file(input_filepath)

    # Reproject to the target CRS
    gdf_reprojected = gdf.to_crs(target_crs)

    # Write the reprojected file to the output folder
    gdf_reprojected.to_file(output_filepath, driver="GeoJSON")

Environment, dependency management and packaging repo.

We need to choose an environment management tool (virtualenv, conda, pipenv), a package dependency resolver (conda, pipenv, poetry), and a package repository (PyPI, Anaconda, etc.).

Currently pip install git+https://github.com/PatBall1/detectree2.git does not work on clusters or systems without the GDAL headers installed (pip install GDAL). We can install GDAL headers using package managers (apt, yum etc) or load modules on clusters (module load GDAL), but there is no guarantee that the location is always in the same place. A typical ubuntu install looks like:

sudo add-apt-repository ppa:ubuntugis/ppa
sudo apt update
sudo apt install libgdal-dev
sudo apt install gdal-bin
export CPLUS_INCLUDE_PATH=/usr/include/gdal
export C_INCLUDE_PATH=/usr/include/gdal
pip install GDAL

This requires users to specify the GDAL location manually. It might, however, be possible to install the development headers from source as part of the pip build process.

Alternatively, we can obtain all detectree2 dependencies with Conda (detectron2, GDAL, OpenCV, etc.) and then package for conda-forge. The environment is specified in an environment.yaml file and is easily reproducible thanks to a conda-lock file, which ensures that dependencies are transitively pinned. Packaging is done using conda-build (a meta.yaml recipe file), i.e. https://github.com/conda-forge/staged-recipes, with the conda-build tutorial at https://docs.conda.io/projects/conda-build/en/latest/user-guide/tutorials/building-conda-packages.html. The distribution (archived package, .tar.bz2 / .conda) will sit on conda-forge. More on the merits of using Conda here: https://pythonspeed.com/articles/conda-dependency-management/

As another alternative, it is possible to combine Conda with Poetry as shown here: https://stackoverflow.com/questions/70851048/does-it-make-sense-to-use-conda-poetry

Poetry makes packaging easy, and dependency management (using pyproject.toml) is faster than Conda. https://www.youtube.com/watch?v=QX_Nhu1zhlg&t=676s

Neaten packaging by either adopting Conda or investigating Poetry as an alternative.

Publish trained model

Make crs (epsg) handling automatic and consistent

pip install detectree2

When I run pip install detectree2, I get the following error:

ERROR: Could not find a version that satisfies the requirement detectron2 (from detectree2) (from versions: none)
ERROR: No matching distribution found for detectron2

Add checks to run on CPU if GPU is unavailable

Modify integration test to run on GPU if available.
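A minimal sketch of a CPU fallback: use CUDA only when PyTorch is installed and reports a GPU. The `select_device` name is hypothetical:

```python
import importlib.util


def select_device() -> str:
    """Return "cuda" if PyTorch is installed and reports a usable GPU, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch  # imported lazily so the check also works without PyTorch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(select_device())
```

In a detectron2 config the result would be assigned to `cfg.MODEL.DEVICE`, and the integration test could branch on the same value.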

Transfer learning with detectree2

Hi guys,
I am considering whether we can use a pre-trained model such as "220723_withParacouUAV.pth" as a starting point and continue training it on local datasets.
I tried this but ran into many errors that I couldn't fix. Has anyone tried this and found a better solution?

cannot import model_zoo

ModuleNotFoundError Traceback (most recent call last)
in
10 from PIL import Image
11 from pathlib import Path
---> 12 from detectron2 import model_zoo
13 from detectron2.engine import DefaultPredictor, DefaultTrainer
14 from detectron2.config import get_cfg

9 frames
/usr/local/lib/python3.7/dist-packages/detectron2/utils/tracing.py in
2 from typing import Union
3 import torch
----> 4 from torch.fx._symbolic_trace import _orig_module_call
5 from torch.fx._symbolic_trace import is_fx_tracing as is_fx_tracing_current
6

ModuleNotFoundError: No module named 'torch.fx._symbolic_trace'

github repo size has become bloated due to notebooks
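A minimal sketch of clearing notebook outputs before committing, which keeps future diffs small (existing history would still contain the old outputs). It relies only on the nbformat 4 JSON layout, where code cells have `outputs` and `execution_count`; tools such as nbstripout automate the same idea as a git filter. The helper names are hypothetical:

```python
import json
from typing import Any, Dict


def strip_outputs(notebook: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a notebook dict with code-cell outputs and counts cleared."""
    cleaned = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def strip_file(path: str) -> None:
    """Strip outputs from an .ipynb file in place."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(strip_outputs(nb), f, indent=1, ensure_ascii=False)
        f.write("\n")
```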

Sphinx documentation

Sphinx

Commenting
Type hinting
Docstrings

Unnecessary install of openCV in workflows + docker?

OpenCV is required for integration test.

flake8 reports that `tile_data_train` function is too complex (C901)

https://www.flake8rules.com/rules/C901.html

Modify CI to test detectree2 on python 3.7, 3.8, 3.9 and 3.10.

An error arises from the type-hint syntax in the get_tree_dicts function in train.py. The fix is to use the generic types predefined in the typing package.
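On Python 3.7 and 3.8, built-in generics such as `list[dict]` in annotations raise `TypeError: 'type' object is not subscriptable` when the function is defined, whereas the `typing` equivalents work on every supported version. A sketch of the pattern; the parameter types below are assumptions and the body is a stub, not detectree2's implementation:

```python
from typing import Any, Dict, List, Optional


# Works on Python 3.7+; writing `-> list[dict]` instead would raise
# TypeError: 'type' object is not subscriptable on 3.7/3.8 at definition time.
def get_tree_dicts(
    directory: str,
    classes: Optional[List[str]] = None,
    classes_at: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Stub with a hypothetical signature; returns detectron2-style record dicts."""
    dataset_dicts: List[Dict[str, Any]] = []
    # ... read tiles and annotations from `directory`, appending one dict per image ...
    return dataset_dicts
```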

Add further tests in CI.

Dataclasses for F1_score / evaluation.py

Would Record class be better implemented as a dataclass?
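A sketch of what a dataclass could look like for the evaluation counts. `MatchCounts` and its fields are hypothetical and not the existing `Record` class; the point is that `@dataclass` generates `__init__`, `__repr__` and `__eq__`, leaving only the metric logic to write:

```python
from dataclasses import dataclass


@dataclass
class MatchCounts:
    """Hypothetical container for crown-matching results."""

    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> float:
        predicted = self.true_positives + self.false_positives
        return self.true_positives / predicted if predicted else 0.0

    @property
    def recall(self) -> float:
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall; 0.0 when both are zero.
        p, r = self.precision, self.recall
        return 2 * p * r / (p + r) if (p + r) else 0.0
```

For example, `MatchCounts(8, 2, 2)` gives a precision, recall and F1 of 0.8 (up to floating-point rounding).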

Rework tiling

Tiling function needs to be broken down into smaller functions. Introduce pathlib.

Automatically update cookiecutter project tree in README.md

Adopt pathlib or os.path.join across project

Still a few places where directory paths are being composed by concatenating using '+'. This is an anti-pattern and should be eliminated.
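A small illustration of why `+` concatenation is fragile: `site_folder + 'models/...'` only works if `site_folder` happens to end in a slash, whereas `os.path.join` and `pathlib` insert the separator themselves. The folder and file names here are hypothetical:

```python
import os
from pathlib import Path

site_folder = "data/Bamberg_Hain"  # hypothetical folder, no trailing slash

# String concatenation silently produces a wrong path.
broken = site_folder + "models/model.pth"
print(broken)  # data/Bamberg_Hainmodels/model.pth

# Both alternatives insert the separator for us.
joined = os.path.join(site_folder, "models", "model.pth")
path = Path(site_folder) / "models" / "model.pth"
print(path.name, path.parent)
```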

Error downloading model_garden

The following error occurs when attempting to install:

Downloading model_garden/220723_withParacouUAV.pth (503 MB)
Error downloading object: model_garden/220723_withParacouUAV.pth (b2fc7ff): Smudge error: Error downloading model_garden/220723_withParacouUAV.pth (b2fc7ff3006b429ddbae947ead4149efb74370df7a5686de0af8573b13f62537): batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access

GPU utilisation during training.

The validation step takes up most of the training time, and the GPU appears under-utilised.

New training with early stopping fails for multiple training sets @JB/dev

Rasterio 1.3a3 unable to build wheel

I can install rasterio using conda install -c conda-forge rasterio.

The setup file does not recognize it as being installed and attempts to use rasterio 1.3a3 and cannot build the associated wheel.

Running on Windows, following the pip install git+url instructions:

Collecting rasterio==1.3a3
Using cached rasterio-1.3a3.tar.gz (401 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [2 lines of output]
INFO:root:Building on Windows requires extra options to setup.py to locate needed GDAL files. More information is available in the README.
ERROR: A GDAL API version must be specified. Provide a path to gdal-config using a GDAL_CONFIG environment variable or use a GDAL_VERSION environment variable.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

Fix test loader - why does it select just the first test

Does testing use all of the test data when this indexes only [0]?

detectree2/detectree2/models/train.py

Line 149 in 08851c8

self.cfg, self.cfg.DATASETS.TEST[0], DatasetMapper(self.cfg, True)

Tiling step - Malaysia dataset

detectree2/detectree2/preprocessing/tiling.py

Line 293 in 463e260

"dtype": "uint8", # this causes issue - comment out for the Malaysia data

CONTRIBUTING.md and sphinx docs

Write a first attempt at contributing guide

Suggestions to improve clarity of tutorial

check for missing function calls - project_to_geojson, stitch_crowns, clean_crowns
clarify how to connect with personal google drive
clarify Confidence_scores and how filtering should be applied as a post-processing step
note to avoid redownloading models as this impacts bandwidth
add illustrations of training data, outputs etc
explain the early stopping mechanism in more detail

Rename branch master to main

A procedure exists for doing this. Admin privileges are required.

`clean_crowns` sometimes outputs a pandas dataframe rather than geopandas GeoDataFrame

Not sure why, but converting the output to a GeoDataFrame after clean_crowns is sometimes necessary.

Investigate Jupyter Book

https://bempp.com/handbook/intro.html

`get_filenames` concatenates two absolute paths

The get_filenames() function in detectree2.models.train uses glob to find the files with a .png extension in the directory passed as a parameter. glob returns each file's path with the directory already prefixed. The function then uses os.path.join() to prepend the directory a second time, so the result is a non-existent path.

For instance, if the following chunk is run:

in_dir = Path("drive/Shareddrives/detectree2_Cambridge/data/Cambridge/tiles_0.25m_160_20_0_samp")
tiles_dir = in_dir / 'tiles'

files = get_filenames(str(tiles_dir) + '/')

This is the result:

[{'file_name': 'drive/Shareddrives/detectree2_Cambridge/data/Cambridge/tiles_0.25m_160_20_0_samp/drive/Shareddrives/detectree2_Cambridge/data/Cambridge/tiles_0.25m_160_20_0_samp/160_20_0CityCentre_2017_32630_712515_5789228_160_20_32630.png'},
 {'file_name': 'drive/Shareddrives/detectree2_Cambridge/data/Cambridge/tiles_0.25m_160_20_0_samp/drive/Shareddrives/detectree2_Cambridge/data/Cambridge/tiles_0.25m_160_20_0_samp/160_20_0CityCentre_2017_32630_713155_5788908_160_20_32630.png'}]

So the path is duplicated, which causes the predict_on_data() function to give NULL results because it couldn't read the png files.
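A minimal sketch of a fix, keeping the `{'file_name': ...}` output format shown above. `Path.glob` already yields paths that include the directory, so they are used as-is; this is a suggested rewrite, not the current detectree2 code:

```python
from pathlib import Path
from typing import Dict, List


def get_filenames(directory: str) -> List[Dict[str, str]]:
    """List .png tiles in `directory` as detectron2-style dicts.

    Path.glob yields paths that already start with `directory`, so they are
    not joined onto `directory` a second time.
    """
    return [{"file_name": str(p)} for p in sorted(Path(directory).glob("*.png"))]
```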

Geopackage not displaying geometry on QGIS

After stitching a set of predictions with the following lines of code, I create a GeoPackage file containing the prediction polygons. However, when loading the file into QGIS, the polygons do not appear, even though they can be visualised as a GeoDataFrame in Python. The attributes are present once loaded in QGIS, but the geometries cannot be seen on the map.

import os

from detectree2.models.outputs import clean_crowns, stitch_crowns

crowns_final = stitch_crowns(test_pred_geo_folder, 1)
crowns_final = crowns_final[crowns_final.is_valid]
crowns_final = clean_crowns(crowns_final, 0.6)
crowns_final = crowns_final[crowns_final["Confidence_score"] > 0.5]
crowns_final = crowns_final.set_geometry("geometry")
crowns_final.to_file(os.path.join(train_out_dir, "test_crowns_out.gpkg"), driver="GPKG")

test_pred_geo_folder and train_out_dir are predefined variables pointing to existing folders.