detection_nappe_hydrocarbures_imt_cefrem's People

Contributors

rob174
detection_nappe_hydrocarbures_imt_cefrem's Issues

Get the resolution of rasterio transform

When opened with

with rasterio.open(path) as file_object:
    xres = file_object.transform[0]  # a: pixel width
    yres = file_object.transform[4]  # e: pixel height (negative)

what is the unit of xres and yres ?

--> .... per px
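In short, the values are in the units of the raster's CRS per pixel: degrees per pixel for a geographic CRS, metres per pixel for a projected one (rasterio also exposes them directly as `file_object.res`). A dependency-free sketch using the transform matrix quoted later in these issues:

```python
# Sketch (pure Python, no rasterio needed): reading the resolution off a
# 3x3 affine geotransform like the one rasterio exposes as `transform`.
# The matrix is the example transform from these issues; the raster uses
# a geographic CRS, so the units are degrees per pixel (for a projected
# CRS they would be metres per pixel).
M = [
    [0.00017966305682390432, -0.0, 23.70719634263147],   # a, b, c
    [-0.0, -0.00017966305682390432, 40.38637265914473],  # d, e, f
    [0.0, 0.0, 1.0],
]

xres = M[0][0]        # a: pixel width in CRS units
yres = abs(M[1][1])   # e: pixel height (negative because y decreases downward)
print(xres, yres)
```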

Rasterio and gdal installation

To be able to read images.

For Windows users, follow these steps:

  • download the GDAL installation file GDAL-3.2.3-cp37-cp37m-win_amd64.whl and the rasterio installation file rasterio-1.2.3-cp37-cp37m-win_amd64.whl on this website
  • in a Python 3.7 conda environment, run:
  • pip install GDAL-3.2.3-cp37-cp37m-win_amd64.whl
  • pip install rasterio-1.2.3-cp37-cp37m-win_amd64.whl

User code usage

  • Generate patches with preprocessing
  • test them on the model

Question:

  • get image cache

1 image, annotation, size -> set of patches, segmentations, classifications
New database: build the patches and retrain the U-Net

Cut, train, predict

Pass as a parameter

Supply a custom dataset as an object with methods

Create a new hdf5 with augmented patches

Main ideas:

Problems:
Currently f4290b7
:

  • generating patches is too slow (especially loading large images into RAM)
  • the GPU is largely underused
  • the CPU is overloaded by preprocessing of large arrays
  • training for 100 epochs takes around 1 day 10 h (with a risk of memory errors if the RAM is also used by other programs)

Solution:
Build a balanced, augmented dataset of square patches 1000 px wide.

Problem:
We would like to compute patches on rotated versions of the source images, but rotating the full images requires too much memory.

Solution:
Inverse transformation
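One way to read "inverse transformation": sample each pixel of the desired rotated patch at its pre-image in the original array, so only the patch is ever materialized, never a rotated copy of the full image. A minimal nearest-neighbour sketch (all names are illustrative, not taken from the project code):

```python
import math

# For every pixel of the wanted rotated patch, compute its coordinates
# in the ORIGINAL image via the inverse rotation and sample there
# (nearest neighbour for simplicity). Pixels falling outside get 0.
def extract_rotated_patch(image, center, size, angle_deg):
    """Extract a size x size patch rotated by angle_deg around center=(row, col)."""
    h, w = len(image), len(image[0])
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cy, cx = center
    half = size / 2.0
    patch = []
    for py in range(size):
        row = []
        for px in range(size):
            # Patch pixel relative to the patch centre...
            dx, dy = px - half, py - half
            # ...mapped back through the INVERSE rotation into the source.
            sx = int(round(cx + cos_a * dx + sin_a * dy))
            sy = int(round(cy - sin_a * dx + cos_a * dy))
            row.append(image[sy][sx] if 0 <= sy < h and 0 <= sx < w else 0)
        patch.append(row)
    return patch
```

Memory stays proportional to the patch size, at the cost of recomputing the mapping per pixel.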

Balance the dataset

We want to provide the model with a globally equal number of patches containing seep or spill and patches containing nothing.

Problem: to provide data to the model, we iterate over a predefined set of images.

One easy but impracticable solution:
Add a file that stores the classes of each patch.
Problem: this fixes the patches:

  • we cannot make global rotation augmentations of the original image
  • the size of the patches is fixed
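For reference, the class-index idea can be sketched in a few lines; `balanced_ids` and the "object"/"empty" labels are hypothetical names, not the project's:

```python
import random

# Sketch of the "file that stores the classes of each patch" idea: a
# class index built once, then used to draw a balanced stream of patch
# ids (as many empty patches as patches containing seep/spill).
def balanced_ids(class_index, rng=None):
    """class_index: dict mapping patch id -> 'object' or 'empty'."""
    rng = rng or random.Random(0)
    with_object = [i for i, c in class_index.items() if c == "object"]
    empty = [i for i, c in class_index.items() if c == "empty"]
    n = min(len(with_object), len(empty))   # equal count per group
    sample = rng.sample(with_object, n) + rng.sample(empty, n)
    rng.shuffle(sample)
    return sample
```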

Speedup the code and prefetch

To communicate between processes, PyTorch uses pickle. It transmits data to other Python instances by pickling the data and operations. Concretely, it serializes objects into a byte string and passes it to the other process.
💣 cannot be changed ; 🔍 could maybe be optimized with tradeoffs ;
Problems:

  • 🔍 Today, multiple objects transform the data depending on the options: this allows code reusability and splits functionalities, but it complicates multiprocessing by adding multiple stateful objects to pickle.
  • 💣 The hdf5 file objects cannot be pickled (they are references to datasets of the hdf5 file)
  • 💣 It is not a good idea to pickle the data from the hdf5 file (too much memory used)
  • 🔍 We currently have only one hdf5 file, which can limit the number of parallel reads.
    One hdf5 file allows better compression, even if splitting it into two or four parts may be possible memory-wise. However, it would increase the complexity of the code because we would have to indicate in which file to find each image.
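The standard workaround for the un-picklable hdf5 handle is to pickle only the file path and reopen the file lazily in each worker process. A self-contained sketch of that pattern, with a plain text file standing in for the hdf5 file so it runs without h5py:

```python
import pickle
import tempfile

# The dataset object stores only the PATH; the open handle is dropped at
# pickling time and re-created lazily inside each process.
class LazyFileDataset:
    def __init__(self, path):
        self.path = path
        self._handle = None  # opened on first access, per process

    @property
    def handle(self):
        if self._handle is None:
            # In the real code this would be h5py.File(self.path, "r")
            self._handle = open(self.path)
        return self._handle

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_handle"] = None  # never ship an open handle across processes
        return state

# Demo with a throwaway file standing in for the hdf5 path:
with tempfile.NamedTemporaryFile("w", suffix=".h5", delete=False) as f:
    f.write("payload")
ds = LazyFileDataset(f.name)
clone = pickle.loads(pickle.dumps(ds))  # handle dropped, path kept
print(clone._handle is None)            # True: reopened only on demand
```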

Affine transformations create class artifacts

Second cause of problems

Interpolation on labels causes class shifts:

On this image there was originally probably only the brightest class, of value 2, but interpolation (due to augmentations) has introduced class 1 (the darker gray pixels).
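The hypothesis is easy to reproduce: any interpolation that averages label values can manufacture intermediate classes, which is why label channels are usually resampled with nearest-neighbour (order 0). A one-dimensional sketch:

```python
# Linearly interpolating a LABEL row averages class ids, manufacturing
# classes that were never there; nearest-neighbour sampling cannot.
labels = [2, 2, 0, 0]  # only classes 0 and 2 exist in this row

def linear_sample(row, x):
    i = int(x)
    t = x - i
    return (1 - t) * row[i] + t * row[min(i + 1, len(row) - 1)]

def nearest_sample(row, x):
    return row[min(int(round(x)), len(row) - 1)]

print(linear_sample(labels, 1.5))   # 1.0 -> spurious "class 1"
print(nearest_sample(labels, 1.5))  # 0   -> still one of the real classes
```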

image image

Extract the captured image: transform matrix problem

Make the inverse transform:

Example:
Transform matrix:
M = [[0.00017966305682390432, -0.0, 23.70719634263147],
[-0.0, -0.00017966305682390432, 40.38637265914473],
[0.0, 0.0, 1.0 ]]

Image input shape (with margins): (10600, 18441)

Cut the last row of M (as the input points have no 3rd dimension)
Point (0,0) is mapped to (0,0)
Point (10599, 18440) is mapped to ( 1.90424874e+00, -3.31298677e+00, 9.95997286e+05)
Point (10599, 0) is mapped to [1.90424874e+00, 0.00000000e+00, 2.51272574e+05]
Point (0, 18440) is mapped to [ 0.00000000e+00, -3.31298677e+00, 7.44724712e+05]
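A likely explanation for the huge third components above: the points were multiplied without the homogeneous coordinate. Applying the matrix to (col, row, 1) instead gives coordinates consistent with the transform's origin (assuming the usual rasterio convention that x comes from columns and y from rows):

```python
# Correct application of the affine transform to homogeneous points.
M = [
    [0.00017966305682390432, -0.0, 23.70719634263147],
    [-0.0, -0.00017966305682390432, 40.38637265914473],
    [0.0, 0.0, 1.0],
]

def apply_affine(M, col, row):
    # Equivalent to M @ (col, row, 1), keeping only x and y
    x = M[0][0] * col + M[0][1] * row + M[0][2]
    y = M[1][0] * col + M[1][1] * row + M[1][2]
    return x, y

print(apply_affine(M, 0, 0))          # the upper-left corner (c, f), not (0, 0)
print(apply_affine(M, 10599, 18440))  # roughly (25.61, 37.07): the opposite corner
```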

Pass dataset as parameter

Functionalities required:

ClassificationCache:

  • hdf5 file access (image and annotation)
    -> problem of point annotations
  • access to json info file
  • access to TwoWayDict

ClassificationPatch

  • access to TwoWayDict
  • hdf5 access without numpy conversion

Show the parameters and the results

Potential solutions

| Solution | 👍 | 👎 |
| --- | --- | --- |
| Tensorboard | Already ready | Curves cannot be used directly in papers (axes, curves and values are too small) |
| Dash+plotly | Graphs are easy to do | Interface to build. Potentially no regex filtering |
| Vuejs+node | Entirely personalized | Time consuming |

Confusion matrix for valid batch

html code example

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js"></script>
    <style>
        body {
            font-family: Arial, Helvetica, sans-serif;
            background-color: transparent;
            text-align: center;
        }

        table {
            overflow: hidden;
        }

        tr:hover {
            background-color: rgba(0, 0, 0, 0.5);
        }

        td,
        th {
            position: relative;
            padding: 2em;
            text-align: center;
            background-color: transparent;
        }

        td:hover::after{
            content: "";
            position: absolute;
            background-color: rgba(0, 0, 0, 0.5);
            left: 0;
            top: -5000px;
            height: 10000px;
            width: 100%;
            z-index: -1;
        }
        thead tr:last-child th:last-child, tbody th:last-child, tbody tr:last-child th {
            background-color: #ff9a03;
        }
        thead tr:last-child th div,tbody tr:last-child th:nth-child(2) div,tbody tr:last-child th:last-child div {
            font-weight: bold;
            color:white
        }
        thead tr:last-child th:last-child {
            border-top: 1px solid white;
            border-left: 1px solid white;
        }
        tbody tr:last-child th:nth-child(2) {
            border-left: 1px solid white;

        }
        tbody tr th:last-child {
            border-left: 1px solid white;
        }
        tbody tr:last-child th {
            border-top: 1px solid white;
            border-left: none;

        }
        thead th, tr th:nth-child(1),tr th:nth-child(2), tbody tr:last-child th:first-child {
            color:white;
            background-color: #0366FF;
            border: none
        }
        tbody tr:last-child th:last-child {
            border-top: 1px solid white;
            border-left: 1px solid white;
        }
    </style>
</head>

<body>
    <table cellspacing="0" cellpadding="0">
        <thead>
            
            <tr>
                <th></th>
                <th></th>
                <th colspan="3"> <div>True classes</div> </th>
                <th></th>
            </tr>
            <tr>
                <th></th>
                <th></th>
                <th>Class1</th>
                <th>Class2</th>
                <th>Class3</th>
                <th><div>Totals <br> predictions</div></th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <th rowspan="3" style="padding-left: 0;padding-right: 0;"><div style="transform: rotate(-90deg);">Predicted classes</div></th>
                <th>Class1</th>
                <td>43<br>43%</td>
                <td>28<br>28%</td>
                <td>91<br>91%</td>
                <th>102<br>10%</th>
            </tr>
            <tr>
                <th>Class2</th>
                <td>32<br>32%</td>
                <td>44<br>44%</td>
                <td>62<br>62%</td>
                <th>40<br>10%</th>
            </tr>
            <tr>
                <th>Class3</th>
                <td>46<br>46%</td>
                <td>24<br>24%</td>
                <td>35<br>35%</td>
                <th>40<br>10%</th>
            </tr>
            <tr>
                <th></th>
                <th><div>Totals <br> true</div></th>
                <th>40<br>40%</th>
                <th>40<br>40%</th>
                <th>40<br>40%</th>
                <th><div>Correct</div><div>40<br>10%</div></th>
            </tr>
        </tbody>
    </table>
    <script>
        $(document).ready(function () {
            $("table tr td").each(function () {
                let [value, percent] = $(this).html().split("<br>");
                let value_color = parseInt(percent)/100;
                $(this).css("background-color", `rgba(255, 0, 0, ${value_color})`);
                console.log('Value ' + value + " with " + percent);
            });
        });
    </script>
</body>

</html>

image

Parameters to keep track for trainings

Data:

  • range of images used
  • images excluded ⚠️ more raw than preprocessed
  • resolution distribution
  • number of source images used
  • classes available and mappings
  • distribution of max/min length of oil discharges
  • type of predicted values

Preprocessing:

  • grid / patch size
  • number of patches
  • preprocessed images or not
    • which preprocessing pipeline (parse it to json)
  • data augmentations (with parameters)

AI:

  • loss
  • metrics
  • optimizer
  • model
  • confusion matrix

Objects used:

  • make an id to log the modules used for the augmentations, the model, etc. Write the id and the name in the logs. Change the id for each version of the code: use the commit hash
  • log the commit hash

Stat on number of pixels classified

Classify the images

To classify the images we will use the efficientnetv4 model, transferring its knowledge to our classification task.
We will use this repository. It seems (there is no precise information on the repository) that the model has been pretrained on ImageNet with 1,000 classes.

Data copy

Folders to copy:

  • .[DATA]\Satellite\Sentinel1
  • .\Stage_Chihab\Cartographie_Hydrocarbures\OilSlicks*WGS84

Which format is the best for input images

[THOUGHTS]

  • Are raster images or an hdf5 file faster? Is an hdf5 file a problem for other usages?
  • Do we write a preprocessing algorithm that converts all annotations to segmentation maps and stores them with the rasters in the hdf5?

Suggestion :

  • 1 script takes the raster images as input and stores them as numpy arrays in images.hdf5
  • 1 script takes the raster image headers as input and stores them as images_infos.json
  • 1 script takes the annotations as input, creates the segmentation maps from the polygons and saves them into annotations_labels.hdf5

We will use the name of each raster image as the name of the corresponding dataset in each hdf5 file. This will allow us to access the data easily.

What is the structure of the names of raster images ?

For example, in the name S1A_IW_GRDH_1SDV_20190601T042305_20190601T042330_027481_0319CB_0EB7_NR_Cal_ML_EC_dB.data, the part 027481_0319CB_0EB7 ensures a unique name (maybe fewer fields would too).
Would it be better to use the location of the upper-left pixel (for instance) as the name of each dataset in the hdf5 file?
This value can occur twice.
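After stripping the processing suffixes (_NR_Cal_ML_EC_dB.data), the remaining underscore-separated fields follow the Sentinel-1 product naming convention, and the last three (absolute orbit, datatake id, product CRC) form the unique part. A small parsing sketch:

```python
# Splitting a Sentinel-1 product name into its fields: mission, mode,
# product type, level/class/polarisation, start/stop times, absolute
# orbit, datatake id, product CRC.
name = "S1A_IW_GRDH_1SDV_20190601T042305_20190601T042330_027481_0319CB_0EB7"

(mission, mode, product, level_class_pol,
 start, stop, orbit, datatake, crc) = name.split("_")

unique_id = "_".join((orbit, datatake, crc))
print(unique_id)  # 027481_0319CB_0EB7
```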

RGB overlay bug supposition

After printing the confusion matrix of a 100-image batch we obtain the following result:
image

But on the overlay, the network labels all images as background (no annotation)

Supposition:

As we want all patches to come from the same image, we use a different dataset than the datasetcache object. Therefore, different preprocessing operations could be applied between these datasets.

Reduce training time by storing clusters of shapes only

Observation

  • For 100 epochs with only patch augmentation, 1 day 16 h is necessary
  • We waste a lot of time opening patches that contain no interesting classes (seep, spill)

Potential solutions:

  • Extract seep and spill zones into a hdf5 cache and take that as new images
  • Resize input image so that patches are already at the correct size for the model

Rotate the image to cut the borders

Original idea: rotate the image so that the grid cut follows the axes of the real image, without the padding already included in the image

  • problem: how to get the rotation angle

--> maybe with the transform matrix of the raster
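If the raster's transform really encodes the rotation (i.e. its off-diagonal terms are non-zero), the angle can be recovered from the linear part, assuming it factors as rotation times scale with no shear; for a north-up raster (d = 0) this simply returns 0, so it only helps when the grid is actually rotated. Sketch:

```python
import math

# With a = sx*cos(t) and d = sx*sin(t), the rotation angle is atan2(d, a).
def rotation_angle_deg(M):
    a, d = M[0][0], M[1][0]
    return math.degrees(math.atan2(d, a))

# A synthetic transform rotated by 30 degrees with 10 m pixels:
t = math.radians(30.0)
M = [
    [10 * math.cos(t), -10 * math.sin(t), 500000.0],
    [10 * math.sin(t),  10 * math.cos(t), 4500000.0],
    [0.0, 0.0, 1.0],
]
print(rotation_angle_deg(M))  # ~30.0
```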

Get the annotations from the polygons

In order to extract the polygons from the shapefiles (.shp), we will use the package pyshp, imported as shapefile in Python. The QGIS python package does not work (on my computer) on Windows.

The Reader object opens the shapefile and returns an object. We can then access its metadata through the records of this object.

We can extract the points of each polygon shape by looping over the shapes of the object and reading the shape.points attribute, which gives the list of the points of the polygon.
The points seem to use geographical coordinates rather than pixel coordinates.

Example of coordinate : (25.078123755232415, 38.92436547557481)
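Creating the segmentation map from such geographic-coordinate polygons amounts to testing each pixel centre against the polygon. A dependency-free even-odd ray-casting sketch (the affine `transform` rows and all names are illustrative, not the project's code):

```python
# Turn one polygon (in geographic coordinates, as read from the
# shapefile) into a binary mask, given an affine geotransform mapping
# (col, row) -> (x, y).
def point_in_polygon(x, y, pts):
    """Even-odd rule: count edge crossings of a ray going right from (x, y)."""
    inside = False
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        if (y1 > y) != (y2 > y):
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def rasterize(polygon, transform, width, height):
    (a, b, c), (d, e, f) = transform[0], transform[1]
    mask = []
    for row in range(height):
        line = []
        for col in range(width):
            # Centre of the pixel in geographic coordinates
            x = a * (col + 0.5) + b * (row + 0.5) + c
            y = d * (col + 0.5) + e * (row + 0.5) + f
            line.append(1 if point_in_polygon(x, y, polygon) else 0)
        mask.append(line)
    return mask
```

For real data a vectorized tool such as rasterio's `features.rasterize` would be far faster; this only shows the principle.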

filtered_cache_images.hdf5 training not working

Context

Dataset with patches

  • not in margins
  • always with seep and/or spill annotation
  • augmentation factor of 100

Problem

image

➡️ Model converges

But the result is unsatisfactory

Prediction

2021-07-12_11h01min03s__027481_0319CB_0EB7_it_11562_epoch_69_rgb_overlay_pred

Compared to the reference

2021-07-12_11h01min03s__027481_0319CB_0EB7_it_11562_epoch_69_rgb_overlay_true

➡️ 2 problems:

  • Background (= class "other" = no annotation) is always confused with the seep class
  • Spill is not recognized

Diagnosis

Statistics of number of classes present on patches

  • spill_only: 50 patches
  • seep_only: 13248 patches
  • seep_spill: 9828 patches

Compared with original polygons statistics

To get them we have used:

  1. the DB Manager to execute SQL queries returning
    a. the number of rasters for each possible amount of seep
    b. the number of rasters for each possible amount of spill
  2. a python script to merge the two tables (obtained by copy-paste) and count the number of rasters for each possible amount of seep and spill
| Number of seep | Number of spill | Number of rasters | Number of seep | Number of spill | Number of rasters |
| --- | --- | --- | --- | --- | --- |
| 0 | 0 | 357 | 2 | 0 | 4 |
| 0 | 1 | 10 | 2 | 1 | 1 |
| 0 | 2 | 11 | 2 | 5 | 1 |
| 0 | 3 | 10 | 2 | 8 | 1 |
| 0 | 4 | 5 | 20 | 0 | 2 |
| 0 | 5 | 2 | 3 | 0 | 9 |
| 0 | 6 | 3 | 3 | 1 | 1 |
| 0 | 7 | 1 | 3 | 3 | 1 |
| 0 | 9 | 1 | 4 | 0 | 9 |
| 1 | 0 | 11 | 4 | 3 | 1 |
| 1 | 3 | 2 | 5 | 0 | 6 |
| 1 | 6 | 1 | 5 | 1 | 2 |
| 1 | 7 | 1 | 5 | 2 | 1 |
| 10 | 0 | 2 | 6 | 0 | 3 |
| 11 | 0 | 3 | 6 | 2 | 2 |
| 11 | 4 | 1 | 7 | 0 | 5 |
| 12 | 0 | 1 | 8 | 0 | 3 |
| 13 | 0 | 1 | 8 | 1 | 1 |
| 14 | 7 | 1 | 8 | 2 | 1 |
| 15 | 0 | 2 | 8 | 7 | 1 |
| 15 | 1 | 1 | 9 | 0 | 2 |

image
(with the (0,0) point excluded)
Interactive visualization
