ignnition's People

Contributors

davidpujol, hamidl, paulalmasan, rmongeca

ignnition's Issues

The process finishes without giving any error if the normalization functions are not defined.

If you add a custom normalization function to the model_description that is never defined, the program does not raise any descriptive error and simply terminates:

2020-11-19 15:44:22.543203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2985 MB memory) -> physical GPU (device: 0, name: GeForce GTX 970, pci bus id: 0000:01:00.0, compute capability: 5.2)
2020-11-19 15:44:22.545986: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x1f956720cb0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-19 15:44:22.546151: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 970, Compute Capability 5.2
Starting the training and evaluation process...
---------------------------------------------------------------------------
Number of devices: 1
Process finished with exit code 1
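The expected behaviour would be a fail-fast check. A minimal sketch of such a validation, assuming the custom functions live in the file that additional_functions_file points to (the function and variable names here are hypothetical, not IGNNITION internals):

import importlib.util

def check_normalization(func_name, module_path):
    # Load the user's additional-functions file and verify the name exists.
    spec = importlib.util.spec_from_file_location("user_funcs", module_path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    if not hasattr(mod, func_name):
        raise ValueError(f"Normalization function '{func_name}' is not defined in {module_path}")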

Errors are being reported even if the training goes well

I am getting the following errors at the start of each new epoch, even though the training itself runs without TensorFlow errors:

Epoch 3/3

 There was an unexpected error: 
The entity "link" was used in the model_description.json file but was not defined in the dataset. A list should be defined with the names (string) of each node of type link.
E.g., "link": ["n1", "n2", "n3" ...]
Please make sure that all the names used for the definition of the model are defined in your dataset. For instance, you should define a list for: 
1) A list for each of the entities defined with all its nodes of the graph
2) Each of the features used to define an entity
3) Additional lists/values used for the definition
4) The label aimed to predict
---------------------------------------------------------
100/100 [==============================] - 1s 9ms/step - val_loss: 0.5586 - val_mean_absolute_error: 0.3851 - val_mean_absolute_percentage_error: 144.1184 - sample_num: 200.0000
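For reference, the checks listed in that error correspond to a dataset sample shaped roughly as follows (a sketch; the feature and label names are illustrative, only the "link" entity example comes from the message itself):

sample = {
    "link": ["n1", "n2", "n3"],      # 1) one list per entity, naming its nodes
    "capacity": [10.0, 20.0, 15.0],  # 2) one list per feature of an entity
                                     # 3) plus any additional lists/values used
    "label": [0.1, 0.2, 0.3],        # 4) the target to predict
}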

epoch_size not an optional parameter

The documentation for the epoch_size training parameter says that if it is left blank, the entire dataset is considered as one epoch.
https://ignnition.org/doc/train_and_evaluate.html#epoch-size

However, leaving it blank produces the following error:

ValueError: When providing an infinite dataset, you must specify the number of steps to run (if you did not intend to create an infinite dataset, make sure to not call repeat() on the dataset)

This seems to be a data-generator issue; see the TF docs: https://www.tensorflow.org/api_docs/python/tf/data/Dataset#repeat
Looking at the source code, it seems that the data generator's repeat is hardcoded to True:

train_dataset = self.__input_fn_generator(filenames_train,

Right now, leaving epoch_size blank does not give the documented training behaviour.
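The underlying Keras behaviour can be reproduced in isolation; a minimal sketch with plain TensorFlow (not IGNNITION code):

import tensorflow as tf

# Toy data and model, just to exercise fit() with finite vs. infinite datasets.
data = tf.data.Dataset.from_tensor_slices((tf.zeros([8, 1]), tf.zeros([8, 1])))
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer="sgd", loss="mse")

model.fit(data.batch(4), epochs=1)                # finite: Keras infers the epoch size
infinite = data.batch(4).repeat()                 # repeat() with no count = infinite
model.fit(infinite, epochs=1, steps_per_epoch=2)  # steps_per_epoch is now mandatory
# model.fit(infinite, epochs=1)                   # raises the ValueError quoted above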

Using the shape_invariants argument

Hello,

I'm facing an issue when running ignnition with the aggregation type "ordered" (it works fine with min, max, etc.). I'm receiving this message:

ValueError: Input tensor 'ignnition_model/states_creation/actions/build_state0/concat_1:0' enters the loop with shape (1, 32), but has shape (None, 32) after one iteration. To allow the shape to vary across iterations, use the shape_invariants argument of tf.while_loop to specify a less-specific shape.

Where in the code can I change this parameter?
Thanks!
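For context, shape_invariants is an argument of tf.while_loop. A standalone sketch of its use (illustrative only; applying it to the "ordered" aggregation would mean patching the loop inside the IGNNITION source, not setting a model_description parameter):

import tensorflow as tf

state = tf.zeros([1, 32])  # starts as (1, 32) and grows every iteration

def body(i, s):
    # Concatenation changes the leading dimension, so the shape varies per step.
    return i + 1, tf.concat([s, tf.zeros([1, 32])], axis=0)

_, final = tf.while_loop(
    lambda i, s: i < 3, body, [tf.constant(0), state],
    # Declaring the varying dimension as None lets TF accept the changing shape.
    shape_invariants=[tf.TensorShape([]), tf.TensorShape([None, 32])])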

Hyperparameter optimization

Hello,

I am currently working with ignnition models, and I am interested in performing hyperparameter optimization for them. I was wondering if it is possible to do so and if there are any examples available that could help me get started.

Thank you for your time, and any help you can provide would be greatly appreciated!

Ognjen
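Pending an official answer, one common workaround is an external search loop around the public API shown elsewhere on this page. A sketch, assuming the tunable options (e.g. a learning_rate key) live in train_options.yaml; both the key name and the file-rewriting approach are assumptions:

import yaml
import ignnition

# Hypothetical external grid search: rewrite train_options.yaml per trial,
# retrain, and compare the logged validation losses afterwards.
for lr in [1e-2, 1e-3, 1e-4]:
    with open("train_options.yaml") as f:
        opts = yaml.safe_load(f)
    opts["learning_rate"] = lr  # assumed option name
    with open("train_options.yaml", "w") as f:
        yaml.safe_dump(opts, f)
    model = ignnition.create_model(model_dir="./")
    model.train_and_validate()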

HDF5 saving error

After training correctly, I am receiving the following error:

NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using save_weights.

The full traceback is the following:

Traceback (most recent call last):
  File "main_ignnition.py", line 7, in <module>
    model.train_and_validate()
  File "[...]/.pyenv/versions/miniconda3-latest/envs/py38/lib/python3.8/site-packages/ignnition/ignnition_model.py", line 678, in train_and_validate
    self.gnn_model.fit(train_dataset,
  File "[...]/.pyenv/versions/miniconda3-latest/envs/py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 1229, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "[...]/.pyenv/versions/miniconda3-latest/envs/py38/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 435, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "[...]/.pyenv/versions/miniconda3-latest/envs/py38/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 1369, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "[...]/.pyenv/versions/miniconda3-latest/envs/py38/lib/python3.8/site-packages/tensorflow/python/keras/callbacks.py", line 1433, in _save_model
    self.model.save(filepath, overwrite=True, options=self._options)
  File "[...]/.pyenv/versions/miniconda3-latest/envs/py38/lib/python3.8/site-packages/tensorflow/python/keras/engine/training.py", line 2111, in save
    save.save_model(self, filepath, overwrite, include_optimizer, save_format,
  File "[...]/.pyenv/versions/miniconda3-latest/envs/py38/lib/python3.8/site-packages/tensorflow/python/keras/saving/save.py", line 139, in save_model
    raise NotImplementedError(
NotImplementedError: Saving the model to HDF5 format requires the model to be a Functional model or a Sequential model. It does not work for subclassed models, because such models are defined via the body of a Python method, which isn't safely serializable. Consider saving to the Tensorflow SavedModel format (by setting save_format="tf") or using save_weights.

Is there a parameter that must be defined to solve this issue?
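For reference, the workarounds suggested by the TF message itself look like this in plain Keras; whether IGNNITION exposes a switch for them is exactly the open question here:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

model.save("saved_model_dir", save_format="tf")  # SavedModel handles subclassed models
model.save_weights("weights.h5")                 # weights-only HDF5 also works

# When saving through a checkpoint callback, the equivalent switch is:
ckpt = tf.keras.callbacks.ModelCheckpoint("ckpt/weights.{epoch:02d}.h5",
                                          save_weights_only=True)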

edge types when passing messages

Hello,

Is it possible to have multiple types of edges, and/or is it possible to distinguish edge types during the message-passing phase (in model_description)?

Thanks in advance.

Graphs with 1 edge generate errors

Training with graphs containing 1 edge generates the following error:

File "[...]/.pyenv/versions/miniconda3-latest/envs/py38/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError:  ConcatOp : Expected concatenating dimensions in the range [-1, 1), but got 1
	 [[{{node gnn_model/StatefulPartitionedCall/ignnition_model/message_passing/iteration_0/stage_0/MP_to_AS/message_phase/AS_to_AS/create_message_AS_to_AS/apply_nn_0/concat_2}}]] [Op:__inference_train_function_8025]

TensorFlow version is 2.5.0 and Python version is 3.8.5
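A guess at the mechanism, based on the error text: with a single edge some intermediate tensor ends up rank-1, and concatenating on axis=1 is then out of range. A standalone reproduction with plain TF (hypothetical tensors, not the framework's internals):

import tensorflow as tf

msgs = tf.constant([1.0, 2.0, 3.0])  # rank 1: the degenerate single-edge case
# tf.concat([msgs, msgs], axis=1)    # fails: axis must be in [-1, 1) for rank 1
fixed = tf.concat([msgs[tf.newaxis], msgs[tf.newaxis]], axis=1)  # rank 2 works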

"AttributeError: 'NoneType' object has no attribute 'replace' " when using convolution as aggregation

Hello!

When I try to use convolution as an aggregation function, I get the error listed at the bottom of this issue. I'm not sure if there are prerequisites that should be satisfied before using convolutions. The message-passing stage looks like this:

- stage_message_passings:
  - destination_entity: variableNode
    source_entities:
      - name: factorNode
        message:
          - type: direct_assignment
    aggregation:
      - type: convolution
    update:
      type: neural_network
      nn_name: recurrent1

You can find the whole code here, if needed:
code.zip

Best regards,

Ognjen

Epoch 1/1000
100/100 [==============================] - 8s 40ms/step - loss: 0.0266 - mean_absolute_error: 0.1001 - val_loss: 0.0027 - val_mean_absolute_error: 0.0316
Traceback (most recent call last):
  File "main.py", line 38, in <module>
    main()
  File "main.py", line 11, in main
    model.train_and_validate()
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\ignnition\ignnition_model.py", line 751, in train_and_validate
    verbose=1)
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1145, in fit
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\keras\callbacks.py", line 428, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\keras\callbacks.py", line 1344, in on_epoch_end
    self._save_model(epoch=epoch, logs=logs)
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\keras\callbacks.py", line 1406, in _save_model
    filepath, overwrite=True, options=self._options)
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\keras\engine\training.py", line 2124, in save_weights
    self._trackable_saver.save(filepath, session=session, options=options)
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\training\tracking\util.py", line 1217, in save
    file_prefix_tensor, object_graph_tensor, options)
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\training\tracking\util.py", line 1154, in _save_cached_when_graph_building
    object_graph_tensor=object_graph_tensor)
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\training\tracking\util.py", line 1120, in _gather_saveables
    feed_additions) = self._graph_view.serialize_object_graph()
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\training\tracking\graph_view.py", line 408, in serialize_object_graph
    trackable_objects, path_to_root)
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\training\tracking\graph_view.py", line 363, in _serialize_gathered_objects
    object_names[obj] = _object_prefix_from_path(path)
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\training\tracking\graph_view.py", line 64, in _object_prefix_from_path
    for trackable in path_to_root))
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\training\tracking\graph_view.py", line 64, in <genexpr>
    for trackable in path_to_root))
  File "C:\Users\OgnjenKundacina\miniconda3\envs\gnn_env\lib\site-packages\tensorflow\python\training\tracking\graph_view.py", line 57, in _escape_local_name
    return (name.replace(_ESCAPE_CHAR, _ESCAPE_CHAR + _ESCAPE_CHAR)
AttributeError: 'NoneType' object has no attribute 'replace'

Problems with making predictions using the already trained model

Hi ignnition team!

Briefly, the problem is that the model.predict() function doesn't return good results when it is called without a preceding model.train_and_validate() call.

I've trained a GNN model successfully (training and validation losses were converging to zero), and the predictions from the predict() method were aligned with the labels in the prediction set. I've used the following code in the main() method:
model = ignnition.create_model(model_dir='./')
model.computational_graph()
model.train_and_validate()
predictions = model.predict(num_predictions = 1)

Part of the train_options.yaml file:
train_dataset: ./data/train
validation_dataset: ./data/test
predict_dataset: ./data/test
load_model_path: ./weights.1000-0.00R.hdf5
additional_functions_file: ./main.py
output_path: ./

I copied the trained model parameters "weights.1000-0.00R.hdf5" from the CheckPoint directory into the root directory and called model.predict() in the following way:
model = ignnition.create_model(model_dir='./')
predictions = model.predict(num_predictions = 1)

and in this way also:
model = ignnition.create_model(model_dir='./')
model.computational_graph()
predictions = model.predict(num_predictions = 1)

but the predictions were not fitting the labels in the predict set well. I also tried calling the model.train_and_validate() function with epochs and epoch_size set to 0, but it gave the same results.

I would also note that "weights.1000-0.00R.hdf5" does not exist in any location other than the root directory, so the correct trained parameters should be loaded:

Console logs:

Processing the described model...
Creating the GNN model...
Restoring from ./weights.1000-0.00.hdf5
Starting to make the predictions...

You can find the code attached in code.zip, as well as predicted vs label plots for both working and non-working examples.

Kind regards!

Ognjen

Attachments: code.zip, working_example (plot), non_working_example (plot)

Results visualization

I am a new user of the ignnition framework and I would like to ask the following two questions:

  1. Does ignnition provide a function or another way to compare the validation loss of different models on the same validation set?
  2. Is it also possible to customize the loss-curve plot produced during training? TensorBoard doesn't seem to allow relabeling the horizontal and vertical axes.

Looking forward to hearing from the community, thanks a lot!

Including global graph information into the loss function

Dear ignnition team,

In our problem we are training a GNN for a regression task - a subset of nodes is labeled by a float value and those labels are learned using a neural network as a readout model. The whole GNN model is trained based on the MSE loss between the labels and the predictions for the mentioned subset of nodes and works well!

We would like to create a new loss function that incorporates some physical laws related to our problem. In each training step, after the predictions are generated for all of the labeled nodes, we would like to add an additional term to the MSE between the labels and the predictions. That additional term would multiply all of the predictions by some coefficients (different for every node) and sum the obtained values. So the goal would be to minimize that sum along with the MSE.

Is something like this possible to implement in the ignnition framework? I'm not even sure whether it is consistent with the logic for creating the mini-batches; I guess the requirement here would be to have all of the nodes from one training sample in the same mini-batch.
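In plain Keras terms, the proposal is loss = MSE(y, y_pred) + sum_i(c_i * y_pred_i). As a custom loss function it could look like the sketch below (the coefficient values and the flat prediction shape are assumptions; how to plug such a loss into IGNNITION's configuration is the open question):

import tensorflow as tf

coeffs = tf.constant([0.3, -1.2, 0.7])  # hypothetical per-node coefficients

def physics_informed_loss(y_true, y_pred):
    mse = tf.reduce_mean(tf.square(y_true - y_pred))
    # Additional term: coefficient-weighted sum over all node predictions.
    physics_term = tf.reduce_sum(coeffs * tf.reshape(y_pred, [-1]))
    return mse + physics_term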

Thanks!

Ognjen

Error when hidden states size is smaller than number of node features

Currently, when the specified hidden state size for a certain entity is smaller than the number of node features, model creation fails with the following error:

Errors may have originated from an input operation.
Input Source operations connected to node
ignnition_model/hidden_states/hidden_state_atom/add_zeros_to_atom/zeros:
    ignnition_model/hidden_states/hidden_state_atom/stack
    (defined at /****/ignnition/ignnition/auxilary_classes.py:155)

Function call stack:
call

This is probably because the framework pads the node features with zeros up to the hidden state size, which fails when the hidden state size is smaller than the feature count. Support for smaller hidden state sizes should be added.
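A sketch of the suspected failure mode and one possible fix, assuming the framework builds each initial state by concatenating zeros after the node features (names are illustrative):

import tensorflow as tf

features = tf.random.uniform([5, 8])   # 5 nodes, 8 features each
hidden_size = 4                        # smaller than the feature count

pad = hidden_size - features.shape[1]  # negative here (-4), so tf.zeros([5, pad])
                                       # fails, matching the zeros op in the report
if pad >= 0:
    state = tf.concat([features, tf.zeros([5, pad])], axis=1)
else:
    # Possible fix: project the features down with a trainable linear layer.
    state = tf.keras.layers.Dense(hidden_size)(features)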

Is it possible to specify a subset of nodes in a graph from which the loss function will be calculated?

Dear ignnition team,

In our problem we use a GNN to learn over graphs with two types of nodes. One type is used for inputting data into the GNN, and the other type is used for generating predictions (no node is used for both). Furthermore, these graphs are bipartite. So, ideally, we would like to label only the second type of nodes and calculate the loss function using their labels and predictions.

Is it possible to specify a subset of nodes in a graph (a node type, in our case) from which the loss function will be calculated?
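If a custom loss were supported here, the standard pattern would be masking by node type; a sketch (the mask vector and the flat shapes are assumptions about how a mini-batch is laid out):

import tensorflow as tf

def masked_mse(y_true, y_pred, mask):
    # mask: 1.0 for prediction-type nodes, 0.0 for input-only nodes.
    mask = tf.cast(mask, y_pred.dtype)
    sq_err = tf.square(y_true - y_pred) * mask
    return tf.reduce_sum(sq_err) / tf.reduce_sum(mask)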

Thanks!

Ognjen

Error: Dimensions of inputs should match

This error appears when I try to train my GNN on graph datasets in order to optimize the makespan of the RCPSP problem.
The error disappears when I trim the dataset and keep only a few graphs (4-5, not more).
I tried changing every parameter in train_options and model_description, but nothing solved my problem.

model_GNN_beta.zip

An image with the full error message is attached (thumbnail_image).
