
geotracknet's People

Contributors

dnguyengithub, texify[bot]


geotracknet's Issues

Using my own dataset.

Dear author, thanks for your awesome work.

However, I encountered some issues when I tried to use my own dataset.
I would like to make sure that I am doing it correctly, so I will try to verify my steps with you first.

==========
Here are the steps and changes I applied, following some of your replies on other issues.

For csv2pkl.py:

  • Modified LAT_MIN, LAT_MAX, LON_MIN, LON_MAX according to my dataset:
    LAT_MIN = 40.33, LAT_MAX = 40.77, LON_MIN = -74.30, LON_MAX = -73.50
  • Modified the file names, path names, and time ranges
  • Changed the dataset format so that when csv2pkl.py reads my dataset, it fits the expected format and column order:
    LAT, LON, SOG, COG, HEADING, ROT, NAV_STT, TIMESTAMP, MMSI, SHIPTYPE, D2C
  • Zeroed out ROT, NAV_STT, and D2C, and removed the D2C filter in the code
  • Set CARGO_TANKER_ONLY = False
  • Removed the UNIX_TIME/1000 conversion
    (my resulting constants are sketched right after this list)
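
To be concrete, a minimal sketch of the constants I ended up with in csv2pkl.py (names follow the repo; LAT_RANGE/LON_RANGE are my own shorthand):

    # My dataset's bounding box, replacing the repo defaults
    LAT_MIN, LAT_MAX = 40.33, 40.77
    LON_MIN, LON_MAX = -74.30, -73.50
    LAT_RANGE = LAT_MAX - LAT_MIN   # 0.44 degrees
    LON_RANGE = LON_MAX - LON_MIN   # 0.80 degrees
    CARGO_TANKER_ONLY = False       # keep all ship types, not just cargo/tanker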

For dataset_preprocessing.py:

  • Modified LAT_MIN, LAT_MAX, LON_MIN, LON_MAX, and the dataset/path names

After the modifications to csv2pkl.py and dataset_preprocessing.py described above,
I was able to create 3 pkl files (train, test, valid) from my dataset,
along with 3 PNG files (train, test, valid) in which I can see the filtered tracks inside the ROI.

==========

However, I am not sure what the next step is after this. Please help!

  1. What is the next step?
  2. For the next step, which files will I need to change, and what values should I set?
  3. Especially since I am using the default trained model
    chkpt/elbo-ct_2017010203_10_20_train.pkl-data_dim-602-latent_size-100-batch_size-50.zip,
    do I need to make sure my dataset has the same LAT_RANGE and LON_RANGE for the pretrained model to work? (See my arithmetic right after this list.)
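
(My own arithmetic while waiting: the checkpoint name encodes data_dim-602, and 602 = 200 + 300 + 30 + 72, which looks like LAT_BINS + LON_BINS + SOG_BINS + COG_BINS for the default ROI at 0.01° resolution. So I suspect the input dimension, and hence the pretrained weights, are tied to the training LAT/LON ranges.)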

Also, an extra question here:
does the default pretrained model work well on a new area that it was not trained on?

Thanks for your help in advance!

Validation data explanation / strategy?

Hi again,

I'm curious about the data you've fed into your model and what your validation dataset's purpose is. Are you providing examples of "valid" (normal) tracks or "invalid" (anomalous) tracks exclusively? Is there a strategy you followed, or would my suggested datasets make sense as a validation dataset?

4-hot vector with a new dataset

Dear @dnguyengithub,

Could you please clarify how to change the 4-hot vectors so that the pre-processing works for another dataset?
For example, if I were to use a dataset with:
LAT_MIN = 25.00, LAT_MAX = 30.00, LON_MIN = -89.00, LON_MAX = -84.00

From what I understood, the values for LAT_BINS and LON_BINS would be as follows:
LAT_BINS = 500; LON_BINS = 500;

What about the SOG_BINS and the COG_BINS? Do they stay 30 and 72?
SOG_BINS = 30; COG_BINS = 72
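
To check my understanding of the dimensions, here is my own sketch (assuming each feature is normalized to [0,1) before binning, as create_dense_vect seems to expect):

    # My reading of the four-hot encoding (a sketch, not the repo's exact code):
    # each normalized feature selects one bin in its own one-hot block, and the
    # four blocks are concatenated into one vector.
    LAT_BINS, LON_BINS, SOG_BINS, COG_BINS = 500, 500, 30, 72
    DATA_DIM = LAT_BINS + LON_BINS + SOG_BINS + COG_BINS  # = 1102 for my ROI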

I ran the scripts in this order:
csv2pkl.py -> dataset_preprocessing.py -> calculate_AIS_mean.py

calculate_AIS_mean.py is yielding the following error:

Traceback (most recent call last):
  File "c:/Users/User/Desktop/geotracknet/data/calculate_AIS_mean.py", line 102, in <module>
    current_sparse_matrix,_,_ = sparse_AIS_to_dense(tmp,0,0)
  File "c:/Users/User/Desktop/geotracknet/data/calculate_AIS_mean.py", line 68, in sparse_AIS_to_dense
    dense_msgs.append(create_dense_vect(msg,
  File "c:/Users/User/Desktop/geotracknet/data/calculate_AIS_mean.py", line 58, in create_dense_vect
    dense_vect[int(lat*lat_bins)] = 1.0
IndexError: index 14709 is out of bounds for axis 0 with size 1102
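
(Doing the arithmetic on the error: 14709 / LAT_BINS = 14709 / 500 ≈ 29.42, which is a raw latitude inside my ROI. So it looks like lat is not normalized to [0,1) by the time dense_vect[int(lat*lat_bins)] runs; have I missed a normalization step?)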

Looking forward to hearing from you
Thank you.

An error when running "save_logprob"

Hello everyone.
First of all, I would like to thank the author for his magnificent work.
Now let me explain my problem.
I have managed to train the model with my own dataset, but when I try to run "save_logprob" the following error occurs:

File "geotracknet.py", line 157, in <module>
     D["seq"] = np.zeros(tar[:seq_len_d,d_idx_inbatch])[1].reshape(-1,4)
ValueError: cannot reshape array of size 574 into shape(4)

It seems that the array to which we apply the reshape does not have a number of elements divisible by 4, so it cannot be grouped 4 by 4. Does anyone know what this could be due to?
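
One observation: 574 = 4 × 143 + 2, so at least one timestep in that track does not have exactly four nonzero entries, which the reshape assumes (one per four-hot block). A quick check I wrote (my own sketch with a hypothetical helper, not code from the repo):

    import numpy as np

    def check_four_hot(tar):
        """tar: [time, data_dim] array of (ideally) four-hot rows."""
        counts = (tar > 0).sum(axis=-1)   # nonzero entries per timestep
        bad = np.where(counts != 4)[0]
        if bad.size:
            print("timesteps without exactly 4 nonzero entries:", bad, counts[bad])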

I would be very grateful if someone answers me.
Best regards and thanks!

Using new dataset

Dear Dr. Nguyen

Thank you for your awesome work!

Right now, I can get results based on the data you provided. However, when I use my own dataset, there are some problems.

First of all, I listed all the setups I made:
Lat_min=50.6, Lat_max=54.6, Lon_min=-180.0, Lon_max=-174.0
Lat_bins=400, Lon_bins=600
the data I used is from 2017 Jan 1st to Jan 31st: 10 days for training, 10 days for validation, 11 days for test.

Then I ran csv2pkl, dataset_preprocessing, and calculate_AIS_mean, and got data files similar to the ones you provided, but I did not use the "Loading coastline polygon" function in dataset_preprocessing.py.

Then I started to train the model. It works, but I got two outputs under the chkpt path with different data_dim: one is 602, one is 1102. I believe 1102 should be my dataset.
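
(Checking my numbers: 400 + 600 + 30 + 72 = 1102, so the 1102 checkpoint matches my bins; 602 looks like the repo's default ROI.)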

Then I ran save_logprob, and it did not raise errors either.

However, when I run local_logprob, I cannot get any result, not even data/ct_2017010203_10_20/local_logprob-ct_2017010203_10_20_train.pkl-ct_2017010203_10_20_valid.pkl-100-missing_data-False-step-80002 or the other outputs.

So I want to know: why did this happen? Is it because I did not use enough data?

I will be so grateful if you can reply!

Regards

Yu

Question on results

Thank you for sharing your software. I have re-run the example test data, ct_201701020310_30, and I am finding 26 anomalies. The code ran to 80003 steps and had 100 missing data points.

Question:
In your paper, "GeoTrackNet: A Maritime Anomaly Detector using Probabilistic Neural Network Representation of AIS Tracks and A Contrario Detection", you report that you found only 25 anomalies. Please let me know if the software has been updated since the paper, as I get 26 anomalies.

I would like to look at the SOG and COG of the anomalous vessels. I have written code to plot each track individually, but the SOG does not look correct when I read it from the test pickle file. Is v_sog = tmp[:,2]/float(onehot_sog_bins) correct? I believe the exact SPEED_MAX may not be recoverable based on the data_preprocessing.py code. Any suggestions on how to extract SOG or COG? (What I am trying now is sketched below.)
My goal is to understand the results. Any suggestions would be appreciated. I am willing to share any findings I obtain.
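
For reference, this is what I am trying at the moment (a guess, assuming the pickle stores each feature normalized to [0,1], with SOG capped by SPEED_MAX during preprocessing):

    import numpy as np

    tmp = np.array([[0.5, 0.5, 0.4, 0.25]])  # one fake timestep: LAT, LON, SOG, COG (normalized)
    SPEED_MAX = 30.0                          # knots; the cap used in preprocessing
    v_sog = tmp[:, 2] * SPEED_MAX             # denormalize back to knots
    v_cog = tmp[:, 3] * 360.0                 # denormalize back to degrees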

Thank You

LAT_MIN, LAT_MAX, LON_MIN, LON_MAX Values

I'm sorry, I'm just confused about the values of LAT_MIN, LAT_MAX, LON_MIN, LON_MAX, and SPEED_MAX.
Where do you get them from? Are they derived from the dataset, or do you just set them as the boundaries of the region of interest?
Thank you

Suitable CUDA version and graphics card for the environment?

I am having issues when training the RNN model. I am using CUDA 11.7 and an NVIDIA GeForce RTX 3090.
Googling the error I am getting, the most similar issues say that TensorFlow 1.x (you use 1.12) may not be compatible with CUDA 11 and RTX 30-series cards.

For GPU usage, what graphics card series and CUDA toolkit version did your group use in combination with the environment described in requirements.yml?
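
(For what it's worth, the official TensorFlow 1.12 binaries were built against CUDA 9.0 and cuDNN 7, which would explain the incompatibility with CUDA 11 and RTX 30-series cards.)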

I hope someone can answer this issue as soon as possible.
Thank you!

Logits Vs Probs. Are they being mixed up?

I have run through the code (with actual data) and it seems to me that there is a mixing up of "probs" and "logits" in the code, but perhaps someone can correct me.

In runners.py we see model(cur_inputs, rnn_state, cur_mask, return_value = "probs") (currently line 93). But the only place the model is created is the create_vrnn function on line 250 of vrnn.py, and when that is called in runners.py (currently line 213), it is constructed with vrnn.ConditionalBernoulliDistribution, which is built from the logits argument. So when I run, I get a NoneType, because we are really carrying logits while trying to evaluate using probs.

To be clear about the order of events: I first ran in train mode, and that completed successfully. The problem reported above only arises later, when I run save_logprob mode.
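
To illustrate the distinction I mean (my own minimal sketch, not the repo's code):

    import tensorflow as tf

    # Logits are unbounded reals; probs are their sigmoid. A Bernoulli
    # distribution built from logits does not automatically carry "probs"
    # unless something converts them.
    logits = tf.constant([0.0, 2.0, -2.0])
    probs = tf.sigmoid(logits)   # ~[0.5, 0.88, 0.12]: what "probs" should be

    with tf.Session() as sess:
        print(sess.run(probs))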

Data_dim Error

The data dimension is statically set when calculate_AIS_mean.py (line 40) is run, whereas in flags_config.py it is derived from the ROI and resolution (~line 195). Either make data_dim fixed in flags_config.py as well, or make calculate_AIS_mean.py depend on the ROI/resolution.
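
A sketch of the second option (assumed variable names; the ROI values below are the ones that reproduce the default data_dim of 602):

    # Derive data_dim from the ROI and resolution, mirroring flags_config.py,
    # instead of hard-coding it in calculate_AIS_mean.py.
    LAT_MIN, LAT_MAX = 47.5, 49.5       # assumed default ROI
    LON_MIN, LON_MAX = -7.0, -4.0
    LAT_RESO = LON_RESO = 0.01          # degrees per grid cell
    SOG_BINS, COG_BINS = 30, 72

    LAT_BINS = int(round((LAT_MAX - LAT_MIN) / LAT_RESO))  # 200
    LON_BINS = int(round((LON_MAX - LON_MIN) / LON_RESO))  # 300
    DATA_DIM = LAT_BINS + LON_BINS + SOG_BINS + COG_BINS   # 602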

Dataset Details

Hi,

Your IEEE paper brought me here (really well written, by the way). I am new to the topic but would really love to replicate your work to gain more understanding. Let me start with the datasets:

  1. You mentioned that you used 2 datasets: the Gulf of Mexico and the Brittany coast datasets.
    1.1 The Gulf of Mexico dataset is downloaded from https://marinecadastre.gov/ais/. I have followed the link, but there are 19 zones. Where should I start in order to get the same dataset you used in the paper?
    1.2 For the Brittany dataset: I contacted G. Hajduch, [email protected], a few weeks ago for the full dataset, but have had no response. Could you assist me on this matter?

After obtaining the datasets, I may have more questions and hope you would not mind me asking you again in the future.

Thank you!

SW

Where is this CSV file?

FileNotFoundError: [Errno 2] No such file or directory: './Est-aruba_5x5deg_2018001_2018120.csv'
Est-aruba_5x5deg_2018001_2018120.csv: where is this file?

A contrario detection issues

Dear Nguyen,

I just want to make sure of something:
in save_logprob and local_logprob modes, the "test set" is the validation dataset, right?
But for contrario_detection we use the test dataset, right?

If that is true, then I think there is a bug in your code, because when I run the contrario step I get an error.

Environment setup difficulties

Are there any Docker images or conda docs for creating a suitable environment? I'm currently trying to hack apart the yaml file, and I've only attained partial success with the training portion using the docker image tensorflow/tensorflow:1.12.0-py3. Tips would be appreciated!

GPU Power required to run GeoTrackNet

Hi! I'm working on anomaly detection on AIS data and I'm trying to replicate the results from the GeoTrackNet article.
While training the VRNN on my local machine I get stuck because of an Out of Memory error. My GPU is a laptop RTX 3060 with 6144 MiB of memory.

  1. I was wondering what machine you ran the model on. What was the GPU?
  2. What part of the code should I change in order to lower the batch size? (My current guess is sketched right after this list.)
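
My guess for (2): the checkpoint names encode batch_size-50, so it is presumably a flag (hypothetical sketch; I haven't confirmed the exact line in flags_config.py):

    import tensorflow as tf

    # If flags_config.py defines the batch size like this, halving the default
    # should roughly halve activation memory; it could also be overridden on
    # the command line with --batch_size=25.
    tf.app.flags.DEFINE_integer("batch_size", 25, "Batch size.")  # default was 50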

Ship type

Hi,
Your code and research are really good, and I have tried your code on my dataset with good results.

But actually, the tracks of different ship types (cargo, fishing, passenger, etc.) are also different, right?
I just want to know your opinion. Do you think it is a good idea to add ship type to the 4-hot vector (making it a 5-hot vector)?
Or do you think we should just run the model separately for each ship type? (I sketch the 5-hot idea below.)
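
Sketching my 5-hot idea to be concrete (hypothetical, not in the repo): append a one-hot ship-type block to the existing four-hot message vector.

    import numpy as np

    TYPE_BINS = 4                            # e.g. cargo, tanker, fishing, passenger
    four_hot_vect = np.zeros(602)            # a four-hot AIS message (toy example)
    type_block = np.zeros(TYPE_BINS)
    type_block[2] = 1.0                      # e.g. index 2 = fishing
    five_hot_vect = np.concatenate([four_hot_vect, type_block])  # data_dim 606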

what do you think?

I'm really grateful that you always respond to my questions hehe :)
It's great to meet you in this repository.

Thank you.

Inputs for csv2pkl.py

Hello!

I already ran your preprocessed data through the embedding layer, save_logprob, local_logprob, and contrario_detection. I ran into some issues, but I got it to work.

Now, I am trying to run my own dataset. I believe I have to run csv2pkl.py and dataset_preprocessing.py. I have some questions about using csv2pkl.py with my data:

  1. I don't have ROT or Nav_stt. Do I need them for the analysis? The paper doesn't mention them as pillars of the analysis.
  2. What does D2C stand for? I don't have it either...
  3. Line 195: where does the CARGO_TANKER_ONLY Boolean variable come from?
  4. If I want to keep fishing vessels in the analysis (Cargo, Tanker, and Fishing), should I still execute the block of code guarded by this variable (from question 3)?
  5. Just to make sure: the purpose of the "LOADING CSV FILES" section is to load the data and assign a data type to each column?
  6. Line 244: dividing Unix time by 1000. Is that specific to your dataset, or should I also include it? (See my sketch right after this list.)
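
Regarding question 6, my understanding (a sketch with a hypothetical sample value): dividing by 1000 converts epoch milliseconds to epoch seconds, so whether you need it depends on the unit of your CSV's timestamps.

    import datetime

    t_ms = 1484015400000  # hypothetical timestamp in milliseconds
    print(datetime.datetime.utcfromtimestamp(t_ms / 1000))  # 2017-01-10 02:30:00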

Thanks in advance for any time that you could spare to help me with these questions.

VRNN Model

Hello,

I want to explore your VRNN code further. Actually, I am not familiar with sequential models like RNNs or LSTMs, but I want to learn about them.

From what I know, we usually use a timestep variable that sets how far we look back in the sequence, right? But I don't find it in your code. Could you explain it a little? If there is any diagram of the architecture of the model you developed, that would also be useful.
I am a little bit confused, because the number of timesteps is not the same for each track, right? (My current mental model is sketched below.)
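
How I currently picture variable-length tracks being handled (my own sketch, not the repo's code): pad every track in a batch to the longest length and keep a binary mask of the real timesteps, so no fixed look-back window is needed.

    import numpy as np

    lengths = [5, 3]                                     # two tracks, different durations
    max_len, data_dim = max(lengths), 602
    batch = np.zeros((max_len, len(lengths), data_dim))  # [time, batch, data_dim]
    mask = np.zeros((max_len, len(lengths)))
    for i, n in enumerate(lengths):
        mask[:n, i] = 1.0                                # 1 where the track has real data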

And I also want to know: what is the output of your VRNN model?

Thank you

Anomaly output metric

Hi again. Hopefully this is a relatively easy question. What metric (or metrics) would you recommend to quantify the output of each individual track from the a contrario (or preceding) step? I'm referring to something that could be extracted from the code relatively easily. Would this be the average log-probability per track? How might a representative value be extracted per MMSI for all tested tracks? (What I compute at the moment is sketched below.)
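
For context, this is the per-track value I compute at the moment (my own sketch with hypothetical data; I don't know if mean log-probability per timestep is what you intended):

    import numpy as np

    tracks_logprob = {123456789: np.random.randn(144)}  # hypothetical: per-timestep log-probs per MMSI

    scores = {mmsi: float(np.mean(lp)) for mmsi, lp in tracks_logprob.items()}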

ValueError while training the embedding layer

An error occurred while training the embedding layer: ValueError: Dimensions must be equal, but are 702 and 602 for 'sub' with input shapes: [?,?,702], [1,1,602].
As a beginner, I can't solve this problem. I hope for your reply!
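
(What I can read from the shapes so far: my preprocessed data is 702-dimensional, while the tensor it is subtracted from is 602 = 200 + 300 + 30 + 72, which looks like the default ROI's data_dim. So I suspect the mean file or the ROI/resolution in flags_config.py does not match my preprocessing, but I don't know where to fix it.)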
