
vitis-ai-tutorials's Issues

quantize output nodes may have a problem

When I quantized the YOLOv3 model following this tutorial, I found that the output nodes in this code block should be conv2d_59/BiasAdd,conv2d_67/BiasAdd,conv2d_75/BiasAdd rather than conv2d_59/convolution,conv2d_67/convolution,conv2d_75/convolution.

!vai_q_tensorflow quantize \
  --input_frozen_graph model_data/yolov3_voc.pb \
  --input_nodes input_1 \
  --input_shapes ?,416,416,3 \
  --output_nodes conv2d_59/convolution,conv2d_67/convolution,conv2d_75/convolution \
  --input_fn input_fn.calib_input \
  --method 1 \
  --gpu 0 \
  --calib_iter 100 
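For comparison, the invocation with the BiasAdd output nodes proposed above would look like this (assuming those really are the final nodes of this frozen graph):

!vai_q_tensorflow quantize \
  --input_frozen_graph model_data/yolov3_voc.pb \
  --input_nodes input_1 \
  --input_shapes ?,416,416,3 \
  --output_nodes conv2d_59/BiasAdd,conv2d_67/BiasAdd,conv2d_75/BiasAdd \
  --input_fn input_fn.calib_input \
  --method 1 \
  --gpu 0 \
  --calib_iter 100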

Bad accuracy with compiled elf-file for custom DPU on the MNIST tutorial

Hi everybody,
I first downloaded the Xilinx ZCU102 image and flashed it to an SD card. Then I ran the MNIST-Classification-TensorFlow tutorial with the default configuration, and I can successfully run the resulting program on my ZCU102 board.

Now I want to run the MNIST-Classification-TensorFlow example on a custom DPU, so I created a DPU (image) via the DPU-TRD Vitis flow. To use the custom DPU, I copied the BOOT.BIN and dpu.xclbin files to the BOOT partition (of the previously working Xilinx image) and dpu.xclbin to /usr/lib. My DPU is then recognized successfully, which I confirmed with the dexplorer -w command.

To run the MNIST program on my custom DPU, I took the .hwh file (from the DPU generation) and used the dlet command to generate a .dcf file. In step 6 of the tutorial I added --options "{'dcf':'<my dcf file>'}" to the vai_c_tensorflow command to get an .elf file that matches my custom DPU. I then copied the .elf file to my board and executed the MNIST program. It runs without errors and produces the following output:

Command line options:
 --image_dir :  images
 --threads   :  1
 --model     :  model_B512_LowPerformance/dpu_customcnn.elf
Pre-processing 10000 images...
Starting 1 threads...
FPS=2822.20, total frames = 10000 , time=3.5433 seconds
Correct: 980 Wrong: 9020 Accuracy: 0.098

The accuracy is far too low, so where is the problem?
When I compare the configuration of my custom DPU with the output of the ddump command, the .elf file should match my DPU exactly.

When I run the MNIST program built with the default configuration on my custom DPU, I get an error; when I run the version built for my custom DPU, I do not. So I assume the .elf file is correct for my custom DPU.

So, why is the accuracy for the 'custom' elf file on the custom DPU so bad? What am I doing wrong?
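For reference, the compile step described above was roughly the following (the file names are placeholders, not my actual paths):

# generate a .dcf describing the custom DPU from its hardware hand-off file
dlet -f ./custom_dpu.hwh

# step 6 of the tutorial, with the custom .dcf passed to the compiler
vai_c_tensorflow \
  --frozen_pb  ./build/quantize/deploy_model.pb \
  --arch       ./arch.json \
  --output_dir ./build/compile_custom \
  --net_name   customcnn \
  --options    "{'dcf':'./custom_dpu.dcf'}"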

Training layers saved in train_save.py

Hello,

it appears that the dropout layers are saved by the tutorial's train_save.py, and the quantizer seems to crash when it encounters them. It would be nice if the tutorial could be updated.
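For anyone hitting this, a minimal sketch of one possible workaround (not the tutorial's official fix): rebuild the same architecture without the Dropout layers before exporting the graph that is handed to vai_q_tensorflow. The build_model function and the file names below are hypothetical.

import tensorflow as tf
from tensorflow.keras import backend as K

K.set_learning_phase(0)                        # build the graph in inference mode
model = build_model(include_dropout=False)     # hypothetical: same net with Dropout omitted
model.load_weights('float_weights.h5')         # Dropout has no weights, so the weights still match

# export a checkpoint and graph with no dropout nodes for the quantizer
sess = K.get_session()
tf.compat.v1.train.Saver().save(sess, './float_model.ckpt')
tf.io.write_graph(sess.graph_def, '.', 'infer_graph.pb', as_text=False)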

Quantizing yolov4 error

Traceback (most recent call last):
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 501, in _import_graph_def_internal
graph._c_graph, serialized, options) # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT]; attr=U:type,allowed=[DT_FLOAT]; attr=epsilon:float,default=0.0001; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=is_training:bool,default=true>; NodeDef: {{node batch_normalization/FusedBatchNormV3}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/bin/vai_q_tensorflow", line 11, in
sys.exit(run_main())
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tensorflow_core/contrib/decent_q/python/decent_q.py", line 1061, in run_main
app.run(main=my_main, argv=[sys.argv[0]] + unparsed)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tensorflow_core/contrib/decent_q/python/decent_q.py", line 1060, in
my_main = lambda unused_args: main(unused_args, FLAGS)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tensorflow_core/contrib/decent_q/python/decent_q.py", line 676, in main
flags.skip_check, flags.dump_as_xir)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tensorflow_core/contrib/decent_q/python/decent_q.py", line 375, in quantize_frozen
check_float_graph(input_graph_def, input_fn, q_config, s_config)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tensorflow_core/contrib/decent_q/python/decent_q.py", line 275, in check_float_graph
importer.import_graph_def(input_graph_def, name='')
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 405, in import_graph_def
producer_op_list=producer_op_list)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/tensorflow_core/python/framework/importer.py", line 505, in import_graph_def_internal
raise ValueError(str(e))
ValueError: NodeDef mentions attr 'exponential_avg_factor' not in Op<name=FusedBatchNormV3; signature=x:T, scale:U, offset:U, mean:U, variance:U -> y:T, batch_mean:U, batch_variance:U, reserve_space_1:U, reserve_space_2:U, reserve_space_3:U; attr=T:type,allowed=[DT_HALF, DT
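A hedged workaround sketch, based on what this error usually means: the frozen graph was produced by a newer TensorFlow that adds the 'exponential_avg_factor' attribute to FusedBatchNormV3, which the TF 1.15 bundled with vai_q_tensorflow does not know. Re-freezing with TF 1.15 is the cleaner fix; stripping the attribute (whose default is 1.0) is another option. File names are hypothetical.

import tensorflow as tf

graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile('yolov4_frozen.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# remove the attribute that TF 1.15 does not recognize
for node in graph_def.node:
    if node.op == 'FusedBatchNormV3' and 'exponential_avg_factor' in node.attr:
        del node.attr['exponential_avg_factor']

with tf.io.gfile.GFile('yolov4_frozen_fixed.pb', 'wb') as f:
    f.write(graph_def.SerializeToString())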

TensorFlow 2 Quantization

Hi, I get the following error during quantization. I wrote the following script:

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np

from tensorflow.python.keras.utils.data_utils import get_file
from tensorflow.python.util.tf_export import keras_export

with np.load('data_capture_qpsk/frame_esn0_0-0.npz') as f:
rx_data_input_real, rx_data_input_imag = f['rx_data_map_real'], f['rx_data_map_imag']
tx_pilot_input_real, tx_pilot_input_imag = f['tx_pilot_map_real'], f['tx_pilot_map_imag']
raw_ch_est_input_real, raw_ch_est_input_imag = f['raw_ch_map_real'], f['raw_ch_map_imag']

# Concatenate real and imaginary channels

rx_data_input = np.concatenate((rx_data_input_real, rx_data_input_imag), axis=-1)
tx_pilot_input = np.concatenate((tx_pilot_input_real, tx_pilot_input_imag), axis=-1)
raw_ch_est_input = np.concatenate((raw_ch_est_input_real, raw_ch_est_input_imag), axis=-1)

inputs = [rx_data_input, tx_pilot_input, raw_ch_est_input]

from tensorflow import keras
float_model = keras.models.load_model('float/deep_rx.h5')
from tensorflow_model_optimization.quantization.keras import vitis_quantize
quantizer = vitis_quantize.VitisQuantizer(float_model)
quantized_model = quantizer.quantize_model(calib_dataset=inputs, calib_step=100, calib_batch_size=1)

Response

(vitis-ai-tensorflow2) Vitis-AI /workspace/models/ChEstModel > python3 Deep_RX_ptq.py
2022-05-04 08:28:37.972104: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/xilinx/xrt/lib:/usr/lib:/usr/lib/x86_64-linux-gnu:/usr/local/lib:/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib
2022-05-04 08:28:37.972124: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2022-05-04 08:28:39.737146: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/xilinx/xrt/lib:/usr/lib:/usr/lib/x86_64-linux-gnu:/usr/local/lib:/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib
2022-05-04 08:28:39.737213: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
2022-05-04 08:28:39.737257: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (dre-elbe-s04): /proc/driver/nvidia/version does not exist
2022-05-04 08:28:39.737915: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "Deep_RX_ptq.py", line 23, in
float_model = keras.models.load_model('float/deep_rx.h5')
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/keras/saving/save.py", line 201, in load_model
compile)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/keras/saving/hdf5_format.py", line 199, in load_model_from_hdf5
training_config, custom_objects), from_serialized=True)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/keras/saving/saving_utils.py", line 202, in compile_args_from_training_config
optimizer = optimizers.deserialize(optimizer_config)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/keras/optimizers.py", line 99, in deserialize
printable_module_name='optimizer')
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/keras/utils/generic_utils.py", line 660, in deserialize_keras_object
config, module_objects, custom_objects, printable_module_name)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow2/lib/python3.7/site-packages/keras/utils/generic_utils.py", line 561, in class_and_config_for_serialized_keras_object
.format(printable_module_name, class_name))
ValueError: Unknown optimizer: Addons>LAMB. Please ensure this object is passed to the custom_objects argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details
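A minimal sketch of two possible fixes (assumptions on my side, not verified against this exact model): the .h5 was trained with the LAMB optimizer from TensorFlow Addons, so Keras either needs that class in custom_objects, or the model can be loaded without its training configuration, which is enough for post-training quantization.

from tensorflow import keras

# Option 1: skip the training config entirely; the optimizer is not needed for PTQ
float_model = keras.models.load_model('float/deep_rx.h5', compile=False)

# Option 2 (assumes tensorflow_addons is installed in the vitis-ai-tensorflow2 env):
# import tensorflow_addons as tfa
# float_model = keras.models.load_model(
#     'float/deep_rx.h5', custom_objects={'Addons>LAMB': tfa.optimizers.LAMB})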

Error Running MNIST source 6_compile_zcu102.sh

Hello,

I am getting the following error:

(vitis-ai-tensorflow) Vitis-AI /workspace > source 6_compile_zcu102.sh

COMPILE ZCU102 STARTED..

[INFO] parse raw model : 12%|█▎ | 1/8 [00:00<00:00, 2714.76it/s]
[INFO] Namespace(inputs_shape=None, layout='NHWC', model_files=['./build/quantize/deploy_model.pb'], model_type='tensorflow', out_filename='./build/compile_zcu102/customcnn_org.xmodel', proto=None)
[INFO] tensorflow model: build/quantize/deploy_model.pb
Traceback (most recent call last):
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/bin/xnnc-run", line 33, in
sys.exit(load_entry_point('xnnc==1.3.0', 'console_scripts', 'xnnc-run')())
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/main.py", line 194, in main
normal_run(args)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/main.py", line 178, in normal_run
in_shapes=in_shapes if len(in_shapes) > 0 else None,
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/xconverter.py", line 131, in run
xmodel = CORE.make_xmodel(model_files, model_type, _layout, in_shapes)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/core.py", line 104, in make_xmodel
model_files, layout, in_shapes=in_shapes, model_type=model_t
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/translator/tensorflow_translator.py", line 97, in to_xmodel
model_name, raw_nodes, layout, in_shapes, model_fmt, model_type
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/translator/tensorflow_translator.py", line 161, in create_xmodel
xmodel = cls.__create_xmodel_from_tf1(name, layers, layout, in_shapes)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/translator/tensorflow_translator.py", line 243, in __create_xmodel_from_tf1
xmodel_name, layout, layers, const_layer_dict, super_const_dict, in_shapes
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/translator/tensorflow_translator.py", line 1847, in __generate_xmodel
), f"[ERROR] TF Conv2d requires two inputs: actual: {bottom}."
AssertionError: [ERROR] TF Conv2d requires two inputs: actual: ['images_in'].


  • VITIS_AI Compilation - Xilinx Inc.
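One thing worth checking (an assumption on my part, not a confirmed fix for this setup): with the Vitis AI 1.3 XIR compiler, vai_c_tensorflow is normally given the quantizer's quantize_eval_model.pb rather than the older deploy_model.pb, so the compile script might need something like:

vai_c_tensorflow \
  --frozen_pb  ./build/quantize/quantize_eval_model.pb \
  --arch       /opt/vitis_ai/compiler/arch/DPUCZDX8G/ZCU102/arch.json \
  --output_dir ./build/compile_zcu102 \
  --net_name   customcnn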

script doesn't reference the same file as the explanation

In the section https://github.com/Xilinx/Vitis-AI-Tutorials/tree/master/Introduction/03-Basic/Module_2, the files targeted by get_image_video_zcu104.sh are:

wget -O vitis_ai_library_r1.3.0_images.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis_ai_library_r1.3.0_images.tar.gz
wget -O vitis_ai_library_r1.3.0_video.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis_ai_library_r1.3.0_video.tar.gz

However, in the explanation the files are:

root@xilinx-zcu104-2021_1:~# wget -O vitis_ai_library_r1.4.0_images.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis_ai_library_r1.4.0_images.tar.gz
root@xilinx-zcu104-2021_1:~# wget -O vitis_ai_library_r1.4.0_video.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis_ai_library_r1.4.0_video.tar.gz 
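Presumably the script should be updated to fetch the same r1.4.0 packages as the explanation:

wget -O vitis_ai_library_r1.4.0_images.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis_ai_library_r1.4.0_images.tar.gz
wget -O vitis_ai_library_r1.4.0_video.tar.gz https://www.xilinx.com/bin/public/openDownload?filename=vitis_ai_library_r1.4.0_video.tar.gz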

14-caffe-ssd-pascal-Quantizing error

When I run ./quantize_and_compile.sh in Docker, I get this:

I0215 02:19:57.546900 259 layer_factory.hpp:77] Creating layer data
I0215 02:19:57.546921 259 net.cpp:94] Creating Layer data
I0215 02:19:57.546927 259 net.cpp:409] data -> data
I0215 02:19:57.546942 259 net.cpp:409] data -> label
I0215 02:19:57.547078 259 image_data_layer.cpp:41] Opening file ../calibration.txt
I0215 02:19:57.547338 259 image_data_layer.cpp:51] Shuffling data
I0215 02:19:57.547390 259 image_data_layer.cpp:56] A total of 1000 images.
E0215 02:19:57.547423 259 io.cpp:145] Could not open or find file /data2/datasets/VOCdevkit/VOC2007/JPEGImages/000186.jpg
F0215 02:19:57.547428 259 image_data_layer.cpp:70] Check failed: cv_img.data Could not load 000186.jpg
*** Check failure stack trace: ***
Compiling network: vgg16_ssd
[INFO] Namespace(batchsize=1, inputs_shape=None, layout='NCHW', model_files=['quantize/deploy.caffemodel'], model_type='caffe', named_inputs_shape=None, out_filename='/tmp/vgg16_ssd_org.xmodel', proto='quantize/deploy.prototxt')
[ERROR] Not found the file or directory: /workspace/SSD/VAI/VGG16-SSD/quantize/deploy.caffemodel


  • VITIS_AI Compilation - Xilinx Inc.

(vitis-ai-caffe) Vitis-AI /workspace/SSD/VAI/VGG16-SSD >

The file 000186.jpg is at /SSD/data/VOCdevkit/VOC2007/JPEGImages/000186.jpg, not at /data2/datasets/VOCdevkit/VOC2007/JPEGImages/000186.jpg, so the calibration list seems to point to the wrong path.

Another question: I want to use the KV260 arch.json, but the line arch=/opt/vitis_ai/compiler/arch/DPUCZDX8G/ZCU102/arch.json 2>&1 | tee ${output_dir}/compile.txt in quantize_and_compile.sh does not find anything; there is not even an /opt/vitis_ai directory on my Ubuntu 20.04 host.
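For reference, I suspect the arch.json files live inside the Vitis AI Docker container rather than on the host; if so, the KV260 path would be something like the one below (unverified, check your container):

# inside the Vitis AI Docker container, not on the Ubuntu host
ls /opt/vitis_ai/compiler/arch/DPUCZDX8G/                      # list the targets your image ships
# hypothetical KV260 entry to substitute into quantize_and_compile.sh if present:
# /opt/vitis_ai/compiler/arch/DPUCZDX8G/KV260/arch.json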

Could you please help me? Thank you in advance!

Error during the compilation of a neural network in Vitis AI, "Not found op in super_const_dict: name: Decoder_Section_1_UpConv_1/kernel"

I'm following this Xilinx Tutorial about the implementation of a U-Net in the ZCU104 Evaluation Board and I have come up with an error during the compilation step.

I've trained a U-Net in Matlab 2020b and exported to Keras via onnx2keras and followed the steps of the tutorial without any errors:

  • Verification that I get the same test scores in Matlab and in Vitis AI (i.e., the export is correct).
  • Transformation of the Keras model into TF checkpoint and inference graph.
  • Freezing the TF graph.
  • Quantization from 32-bit floating point to 8-bit fixed point.
  • Running the compiler (Error while parsing raw model).

The full error message is:

Traceback (most recent call last):

File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/bin/xnnc-run", line 33, in module
sys.exit(load_entry_point('xnnc==1.4.0', 'console_scripts', 'xnnc-run')())

File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/main.py", line 49, in main
runner.normal_run(args)

File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/runner.py", line 123, in normal_run
target=target,

File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/xconverter.py", line 145, in run
model_files, model_type, _layout, in_shapes, batchsize

File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/core.py", line 123, in make_xmodel
model_type=model_t,

File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/translator/tensorflow_translator.py", line 107, in to_xmodel
model_type,

File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/translator/tensorflow_translator.py", line 173, in create_xmodel
name, layers, layout, in_shapes, batchsize

File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/translator/tensorflow_translator.py", line 289, in __create_xmodel_from_tf1
batchsize,

File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/lib/python3.6/site-packages/xnnc/translator/tensorflow_translator.py", line 3192, in __generate_xmodel
), f"[ERROR] Not found op in super_const_dict: name: {weights_id}"
AssertionError: [ERROR] Not found op in super_const_dict: name: Decoder_Section_1_UpConv_1/kernel

At first, I thought the compiler might not support certain layers such as Conv2DTranspose (a way of upsampling images). The documentation says the TensorFlow version needs to be higher than 2.0 and I am using 1.15.2, but the tutorial's U-Net is built from those same layers and I compiled it without any problem, so I don't think that is the issue.

Then I decided to compare the two networks after freezing and after quantization, to see whether some information present in the tutorial's U-Net is missing from mine.

Inspection results after freezing. Op types used (my U-Net --> tutorial U-Net):

  • Const 78 --> 170
  • Identity 28 --> 127
  • BiasAdd 13 --> 23
  • LeakyRelu 12 --> ---
  • Relu -- --> 19
  • Conv2D 11 --> 19
  • FusedBatchNormV3 -- --> 18
  • Pad 10 --> ---
  • AddV2 8 --> ---
  • Sub 8 --> ---
  • StridedSlice 6 --> 12
  • Mul 4 --> 8
  • ConcatV2 2 --> 4
  • Conv2DBackpropInput 2 --> 4
  • MaxPool 2 --> 4
  • Pack 2 --> 4
  • Shape 2 --> 4
  • Placeholder 1 --> 1

There are differences between the two freezing processes as the two U-Nets are two different modified versions of the original one. However, as I see it, I don't think that LeakyRelu, Pad, AddV2 or Sub (the ones that appear in my model and not in the model of the tutorial) are related to the error.

Similarly, after quantization these are the differences. Op types used (my U-Net --> tutorial U-Net):

  • Const 78 --> 98
  • FixNeuron 48 --> 78
  • BiasAdd 13 --> 23
  • LeakyRelu 12 --> --
  • Relu -- --> 19
  • Conv2D 11 --> 19
  • Pad 10 --> --
  • AddV2 8 --> --
  • Sub 8 --> --
  • StridedSlice 6 --> 12
  • Mul 4 --> 8
  • ConcatV2 2 --> 4
  • Conv2DBackpropInput 2 --> 4
  • MaxPool 2 --> 4
  • Pack 2 --> 4
  • Shape 2 --> 4
  • Placeholder 1 --> 1

I don't know exactly where the error comes from so any kind of help would be highly appreciated.

Thanks in advance,

Jon.

Caffe model training dog vs cat

Hello, I am starting with the Vitis-AI tutorials and found some problems at the training stage.

My environment is:
Docker : vitis-ai-gpu:2.0.0.1103
GPU : RTX2080
CPU : Ryzen7 2700x
OS : Ubuntu 18.04[WSL2]

Following 01-caffe_cats_vs_dogs, in topic 6 (Python and shell scripts), I am testing the scripts at this step, but the .caffemodel isn't there.
So I trained the model myself with the provided script, but I get stuck in the training process when running it, as shown below:
[screenshot]
The log is provided in logfile_caffe_alexnetBNnoLRN.txt.

From my continued debugging of the tutorial, I think it gets stuck in the training process run by the Caffe script (link).

Can you advise how to continue debugging, or suggest a solution for training the model from this state?

Caffe model training +dog cat classification

Hi,
I am running the training procedure for the Caffe model in 01-caffe_cats_vs_dogs and am facing the issue below during training.

I0210 09:24:31.278432 2794 caffe.cpp:247] Starting Optimization
I0210 09:24:31.278439 2794 solver.cpp:341] Solving alexnetBNnoLRN m2 (as m3 but less DROP and less BN)
I0210 09:24:31.278442 2794 solver.cpp:342] Learning Rate Policy: step
I0210 09:24:31.279312 2794 solver.cpp:424] Iteration 0, Testing net (#0)
I0210 09:24:32.102056 2794 solver.cpp:523] Test net output #0: accuracy = 0.5
I0210 09:24:32.102087 2794 solver.cpp:523] Test net output #1: loss = 0.693147 (* 1 = 0.693147 loss)
I0210 09:24:32.102092 2794 solver.cpp:523] Test net output #2: top-1 = 0.5
F0210 09:24:32.151126 2794 math_functions.cu:27] Check failed: status == CUBLAS_STATUS_SUCCESS (13 vs. 0) CUBLAS_STATUS_EXECUTION_FAILED
*** Check failure stack trace: ***
@ 0x7f60598814dd google::LogMessage::Fail()
@ 0x7f6059889071 google::LogMessage::SendToLog()
@ 0x7f6059880ecd google::LogMessage::Flush()
@ 0x7f605988276a google::LogMessageFatal::~LogMessageFatal()
@ 0x7f605863c24a caffe::caffe_gpu_gemm<>()
@ 0x7f60585e248c caffe::InnerProductLayer<>::Backward_gpu()
@ 0x7f6058458be3 caffe::Net<>::BackwardFromTo()
@ 0x7f6058458d3f caffe::Net<>::Backward()
@ 0x7f60584bdc4c caffe::Solver<>::Step()
@ 0x7f60584be791 caffe::Solver<>::Solve()
@ 0x55d5cd3a35ce train()
@ 0x55d5cd39ca59 main
@ 0x7f6056c29bf7 __libc_start_main
@ 0x55d5cd39d6a8 (unknown)
Aborted (core dumped)

Elapsed time for Caffe training (s): 1077.31017

How can I solve this issue?

Problem testing Design_Tutorials/07-yolov4-tutorial's YOLOv4

Dear all,
I am trying to follow the instructions in Design_Tutorials/07-yolov4-tutorial to run the network on my board (DPUCZDX8G_ISA0_B4096_MAX_BG2). Up to step 2.4 (Model Deployment) everything seems to work fine, and I am able to evaluate the network using the tf_eval_yolov4_coco_2017.py script. The results are not good, but I get no errors.
The quantization and compilation processes finish correctly. The problem is when I try to run the network on the board. Specifically, in the program test_jpeg_yolov4, execution seems to stall when the image is given as input to the network. I have read the code that runs the network; here is the function:

// Entrance of jpeg demo
template <typename FactoryMethod, typename ProcessResult>
int main_for_jpeg_demo(int argc, char *argv[],
                       const FactoryMethod &factory_method,
                       const ProcessResult &process_result, int start_pos = 1) {
  if (argc <= 1) {
    usage_jpeg(argv[0]);
    exit(1);
  }
  auto model = factory_method();
  for (int i = start_pos; i < argc; ++i) {
    auto image_file_name = std::string{argv[i]};
    auto image = cv::imread(image_file_name);
    if (image.empty()) {
      LOG(FATAL) << "cannot load " << image_file_name << std::endl;
      abort();
    }
    auto result = model->run(image);
    image = process_result(image, result, true);
    auto out_file =
        image_file_name.substr(0, image_file_name.size() - 4) + "_result.jpg";
    cv::imwrite(out_file, image);
    LOG_IF(INFO, ENV_PARAM(DEBUG_DEMO)) << "result image write to " << out_file;
  }
  LOG_IF(INFO, ENV_PARAM(DEBUG_DEMO)) << "BYEBYE";
  return 0;
}

When it reaches result = model->run(image);, the program seems to enter an infinite loop. I waited more than 24 hours to see whether the execution would produce results, but the program never reaches the next instruction (image = process_result(image, result, true);).
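To get more visibility into where it hangs, one option I see (based on the ENV_PARAM(DEBUG_DEMO) guard in the code above; the image name below is a placeholder) is to run the demo with its debug logging sent to the console:

env DEBUG_DEMO=1 GLOG_logtostderr=1 ./test_jpeg_yolov4 yolov4 sample.jpg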

What can cause this problem? Has anyone already experienced similar problems?

Many thanks

[Keras-GoogleNet-ResNet] compile_target.sh

Hey, I'd like to ask something.
I checked out this tutorial and found something strange in compile_target.sh:

it copies a file to what looks like the same file... is that right? Please check!

CNN=miniResNet

# compile the executable for target board

# top5

cp ./src/top5_tf_main.cc ./tf_main.cc
cp ./model/dpu_${CNN}0.elf ./model/dpu${CNN}0.elf
make clean
make
mv ./${CNN} ./top5
${CNN}

# fps

cp ./src/fps_tf_main.cc ./tf_main.cc
cp ./model/dpu_${CNN}0.elf ./model/dpu${CNN}0.elf
make clean
make
mv ./${CNN} ./fps
${CNN}


Train Darknet yolov4 voc-based

Dear all.
I'm following the tutorial for object detection with a VOC-based YOLOv4 Darknet model (https://github.com/Xilinx/Vitis-AI-Tutorials/tree/master/Design_Tutorials/07-yolov4-tutorial#31-darknet-model-training-on-voc) and trying to train the net, but this time using the GTSDB dataset (German traffic signs), with the command

./darknet detector train cfg/voc.data cfg/yolov4.cfg /yolov4.weights -map -dont_show -show_imgs

Of course I edited voc.data to point to the right GTSDB files; I just forgot to rename the file.
I edited the cfg files as requested, and voc.data too.
I'm working in an Ubuntu VM. I need some hints and answers about the training process:
1) After running the train command, should I stop it manually (Ctrl-C) once I see that training has converged, or not?
2) Does training convergence in this case mean that the loss stops decreasing (or the mAP stops improving)? I used the -map parameter, but I honestly don't understand where that information is shown. Here is a piece of the output:

v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 133 Avg (IOU: 0.419995), count: 34, class_loss = 3764.068115, iou_loss = 28.210449, total_loss = 3792.278564 
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 144 Avg (IOU: 0.253230), count: 5, class_loss = 1026.847412, iou_loss = 0.271484, total_loss = 1027.118896 
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 155 Avg (IOU: 0.000000), count: 1, class_loss = 268.681885, iou_loss = 0.000000, total_loss = 268.681885 
 total_bbox = 39, rewritten_bbox = 0.000000 % 

500504: 1169.513062, 1602.209351 avg loss, 0.000013 rate, 15818.413367 seconds, 32032256 images, 2120438.711756 hours left

  3. MY MAIN PROBLEM: how can I save the weights during training?
    3.1) And where can I find those .weights files? In the voc.data file I specified "backup = ./backup".
    I've let the process run all night, but I still can't see any weight file saved during training. Maybe it is just a matter of time?

  4. In the output, which number is the current iteration?
    4.1) Is 1 iteration == 1 epoch?

Thank you for your time

Can't use custom Yolov4 model with the Vitis AI Library (no prototxt file)

Hi,

I am working on using the Vitis AI library with a custom Yolov4 model.

I have followed the steps of this tutorial (convert Darknet to TensorFlow, freeze, quantize, compile) : https://github.com/Xilinx/Vitis-Tutorials/tree/master/Machine_Learning/Design_Tutorials/07-yolov4-tutorial

I am using an Alveo U280 card, the Vitis AI Docker Image for CPU, and the TensorFlow 1 framework.

To deploy the model, I copied the folder obtained from the compilation step to the path /usr/share/vitis_ai_library/models (let "yolov4" be the name of the custom model and output folder) so that it can be read by the Vitis AI Library.

Here is the content of the folder :

[screenshot of the folder contents]

And here is the content of a standard model from Model Zoo (https://github.com/Xilinx/Vitis-AI/blob/master/models/AI-Model-Zoo/model-list/dk_yolov3_bdd_288_512_53.7G_1.3/model.yaml) :

[screenshot of the Model Zoo folder contents]

It seems that the meta.json "replaces" the model.prototxt.
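For comparison, the Model Zoo YOLO models ship a <model_name>.prototxt next to the .xmodel, and the Vitis AI Library runner appears to expect one for a custom model too. A rough sketch of its shape is below; the values are placeholders, so in practice it is probably safer to copy the prototxt of a Model Zoo yolov3/yolov4 model and adapt the class count, anchors and thresholds:

model {
  name: "yolov4"
  kernel {
    name: "yolov4"
    mean: 0.0
    mean: 0.0
    mean: 0.0
    scale: 0.00390625
    scale: 0.00390625
    scale: 0.00390625
  }
  model_type: YOLOv3
  yolo_v3_param {
    num_classes: 80
    anchorCnt: 3
    conf_threshold: 0.3
    nms_threshold: 0.45
    biases: 12
    biases: 16
    # ... one biases entry per anchor coordinate ...
    test_mAP: false
  }
}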

I then ran the example code from https://github.com/Xilinx/Vitis-AI/tree/master/demo/Vitis-AI-Library/samples/yolov4

cd /usr/share/vitis_ai_library/samples/yolov4 
./test_video_yolov4 yolov4

Here is the error message when I try to run the application.

[screenshot of the error message]

The model name is the parameter of the following line of code:

vitis::ai::YOLOv3::create(model);

Maybe I am missing an argument in the vai_c_tensorflow command when compiling the model.

vai_c_tensorflow \
		--frozen_pb  ${QUANT}/quantize_eval_model.pb \
		--arch       ${ARCH} \
		--output_dir ${COMPILE} \
		--net_name   ${MODEL_NAME} \
		--options "{'mode':'normal','save_kernel':'', 'input_shape':'1,416,416,3'}"  

I would greatly appreciate your help.
Best regards,

Luc

The compiled model failed in "overlay.load_model"

Hi all,

I followed the example in https://github.com/Xilinx/Vitis-AI-Tutorials/tree/master/Design_Tutorials/09-mnist_pyt, successfully quantized my model, and then compiled it to CNN_zcu102.xmodel.

However, when I load the xmodel with the PYNQ DPU overlay (KV260), it shows the following error. Any advice on the problem?

[screenshot of the error]

It is noteworthy that I can successfully load the xmodel compiled from the example model. The problem appears only when I switch to my model, and there are no warnings during quantization or compilation.

I attach the quantized model for your reference.

# GENETARED BY NNDCT, DO NOT EDIT!

import torch
import pytorch_nndct as py_nndct
class PoseResNet(torch.nn.Module):
def __init__(self):
super(PoseResNet, self).__init__()
self.module_0 = py_nndct.nn.Input() #PoseResNet::input_0
self.module_1 = py_nndct.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=[7, 7], stride=[2, 2], padding=[3, 3], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Conv2d[conv1]/input.2
self.module_3 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/ReLU[relu]/3604
self.module_4 = py_nndct.nn.MaxPool2d(kernel_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], ceil_mode=False) #PoseResNet::PoseResNet/MaxPool2d[maxpool]/input.4
self.module_5 = py_nndct.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[0]/Conv2d[conv1]/input.5
self.module_7 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[0]/ReLU[relu]/input.7
self.module_8 = py_nndct.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[0]/Conv2d[conv2]/input.8
self.module_10 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[0]/input.9
self.module_11 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[0]/ReLU[relu]/input.10
self.module_12 = py_nndct.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[1]/Conv2d[conv1]/input.11
self.module_14 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[1]/ReLU[relu]/input.13
self.module_15 = py_nndct.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[1]/Conv2d[conv2]/input.14
self.module_17 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[1]/input.15
self.module_18 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[1]/ReLU[relu]/input.16
self.module_19 = py_nndct.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/Conv2d[conv1]/input.17
self.module_21 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/ReLU[relu]/input.19
self.module_22 = py_nndct.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/Conv2d[conv2]/input.20
self.module_24 = py_nndct.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=[1, 1], stride=[2, 2], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/Sequential[downsample]/Conv2d[0]/input.21
self.module_26 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/input.22
self.module_27 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/ReLU[relu]/input.23
self.module_28 = py_nndct.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[1]/Conv2d[conv1]/input.24
self.module_30 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[1]/ReLU[relu]/input.26
self.module_31 = py_nndct.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[1]/Conv2d[conv2]/input.27
self.module_33 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[1]/input.28
self.module_34 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[1]/ReLU[relu]/input.29
self.module_35 = py_nndct.nn.Conv2d(in_channels=128, out_channels=256, kernel_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/Conv2d[conv1]/input.30
self.module_37 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/ReLU[relu]/input.32
self.module_38 = py_nndct.nn.Conv2d(in_channels=256, out_channels=256, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/Conv2d[conv2]/input.33
self.module_40 = py_nndct.nn.Conv2d(in_channels=128, out_channels=256, kernel_size=[1, 1], stride=[2, 2], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/Sequential[downsample]/Conv2d[0]/input.34
self.module_42 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/input.35
self.module_43 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/ReLU[relu]/input.36
self.module_44 = py_nndct.nn.Conv2d(in_channels=256, out_channels=256, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[1]/Conv2d[conv1]/input.37
self.module_46 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[1]/ReLU[relu]/input.39
self.module_47 = py_nndct.nn.Conv2d(in_channels=256, out_channels=256, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[1]/Conv2d[conv2]/input.40
self.module_49 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[1]/input.41
self.module_50 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[1]/ReLU[relu]/input.42
self.module_51 = py_nndct.nn.Conv2d(in_channels=256, out_channels=512, kernel_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/Conv2d[conv1]/input.43
self.module_53 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/ReLU[relu]/input.45
self.module_54 = py_nndct.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/Conv2d[conv2]/input.46
self.module_56 = py_nndct.nn.Conv2d(in_channels=256, out_channels=512, kernel_size=[1, 1], stride=[2, 2], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/Sequential[downsample]/Conv2d[0]/input.47
self.module_58 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/input.48
self.module_59 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/ReLU[relu]/input.49
self.module_60 = py_nndct.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[1]/Conv2d[conv1]/input.50
self.module_62 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[1]/ReLU[relu]/input.52
self.module_63 = py_nndct.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[1]/Conv2d[conv2]/input.53
self.module_65 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[1]/input.54
self.module_66 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[1]/ReLU[relu]/4125
self.module_67 = py_nndct.nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=[4, 4], stride=[2, 2], padding=[1, 1], output_padding=[0, 0], groups=1, bias=True, dilation=[1, 1]) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ConvTranspose2d[0]/input.55
self.module_69 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ReLU[2]/4151
self.module_70 = py_nndct.nn.ConvTranspose2d(in_channels=256, out_channels=256, kernel_size=[4, 4], stride=[2, 2], padding=[1, 1], output_padding=[0, 0], groups=1, bias=True, dilation=[1, 1]) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ConvTranspose2d[3]/input.57
self.module_72 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ReLU[5]/4177
self.module_73 = py_nndct.nn.ConvTranspose2d(in_channels=256, out_channels=256, kernel_size=[4, 4], stride=[2, 2], padding=[1, 1], output_padding=[0, 0], groups=1, bias=True, dilation=[1, 1]) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ConvTranspose2d[6]/input.59
self.module_75 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ReLU[8]/input.61
self.module_76 = py_nndct.nn.Conv2d(in_channels=256, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[hm_cen]/Conv2d[0]/input.62
self.module_77 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[hm_cen]/ReLU[1]/input.63
self.module_78 = py_nndct.nn.Conv2d(in_channels=64, out_channels=3, kernel_size=[1, 1], stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[hm_cen]/Conv2d[2]/4242
self.module_79 = py_nndct.nn.Conv2d(in_channels=256, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[cen_offset]/Conv2d[0]/input.64
self.module_80 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[cen_offset]/ReLU[1]/input.65
self.module_81 = py_nndct.nn.Conv2d(in_channels=64, out_channels=2, kernel_size=[1, 1], stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[cen_offset]/Conv2d[2]/4281
self.module_82 = py_nndct.nn.Conv2d(in_channels=256, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[direction]/Conv2d[0]/input.66
self.module_83 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[direction]/ReLU[1]/input.67
self.module_84 = py_nndct.nn.Conv2d(in_channels=64, out_channels=2, kernel_size=[1, 1], stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[direction]/Conv2d[2]/4320
self.module_85 = py_nndct.nn.Conv2d(in_channels=256, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[z_coor]/Conv2d[0]/input.68
self.module_86 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[z_coor]/ReLU[1]/input.69
self.module_87 = py_nndct.nn.Conv2d(in_channels=64, out_channels=1, kernel_size=[1, 1], stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[z_coor]/Conv2d[2]/4359
self.module_88 = py_nndct.nn.Conv2d(in_channels=256, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[dim]/Conv2d[0]/input.70
self.module_89 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[dim]/ReLU[1]/input
self.module_90 = py_nndct.nn.Conv2d(in_channels=64, out_channels=3, kernel_size=[1, 1], stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[dim]/Conv2d[2]/4398

def forward(self, *args):
    output_module_0 = self.module_0(input=args[0])
    output_module_0 = self.module_1(output_module_0)
    output_module_0 = self.module_3(output_module_0)
    output_module_0 = self.module_4(output_module_0)
    output_module_5 = self.module_5(output_module_0)
    output_module_5 = self.module_7(output_module_5)
    output_module_5 = self.module_8(output_module_5)
    output_module_5 = self.module_10(input=output_module_5, other=output_module_0, alpha=1)
    output_module_5 = self.module_11(output_module_5)
    output_module_12 = self.module_12(output_module_5)
    output_module_12 = self.module_14(output_module_12)
    output_module_12 = self.module_15(output_module_12)
    output_module_12 = self.module_17(input=output_module_12, other=output_module_5, alpha=1)
    output_module_12 = self.module_18(output_module_12)
    output_module_19 = self.module_19(output_module_12)
    output_module_19 = self.module_21(output_module_19)
    output_module_19 = self.module_22(output_module_19)
    output_module_24 = self.module_24(output_module_12)
    output_module_19 = self.module_26(input=output_module_19, other=output_module_24, alpha=1)
    output_module_19 = self.module_27(output_module_19)
    output_module_28 = self.module_28(output_module_19)
    output_module_28 = self.module_30(output_module_28)
    output_module_28 = self.module_31(output_module_28)
    output_module_28 = self.module_33(input=output_module_28, other=output_module_19, alpha=1)
    output_module_28 = self.module_34(output_module_28)
    output_module_35 = self.module_35(output_module_28)
    output_module_35 = self.module_37(output_module_35)
    output_module_35 = self.module_38(output_module_35)
    output_module_40 = self.module_40(output_module_28)
    output_module_35 = self.module_42(input=output_module_35, other=output_module_40, alpha=1)
    output_module_35 = self.module_43(output_module_35)
    output_module_44 = self.module_44(output_module_35)
    output_module_44 = self.module_46(output_module_44)
    output_module_44 = self.module_47(output_module_44)
    output_module_44 = self.module_49(input=output_module_44, other=output_module_35, alpha=1)
    output_module_44 = self.module_50(output_module_44)
    output_module_51 = self.module_51(output_module_44)
    output_module_51 = self.module_53(output_module_51)
    output_module_51 = self.module_54(output_module_51)
    output_module_56 = self.module_56(output_module_44)
    output_module_51 = self.module_58(input=output_module_51, other=output_module_56, alpha=1)
    output_module_51 = self.module_59(output_module_51)
    output_module_60 = self.module_60(output_module_51)
    output_module_60 = self.module_62(output_module_60)
    output_module_60 = self.module_63(output_module_60)
    output_module_60 = self.module_65(input=output_module_60, other=output_module_51, alpha=1)
    output_module_60 = self.module_66(output_module_60)
    output_module_60 = self.module_67(output_module_60)
    output_module_60 = self.module_69(output_module_60)
    output_module_60 = self.module_70(output_module_60)
    output_module_60 = self.module_72(output_module_60)
    output_module_60 = self.module_73(output_module_60)
    output_module_60 = self.module_75(output_module_60)
    output_module_76 = self.module_76(output_module_60)
    output_module_76 = self.module_77(output_module_76)
    output_module_76 = self.module_78(output_module_76)
    output_module_79 = self.module_79(output_module_60)
    output_module_79 = self.module_80(output_module_79)
    output_module_79 = self.module_81(output_module_79)
    output_module_82 = self.module_82(output_module_60)
    output_module_82 = self.module_83(output_module_82)
    output_module_82 = self.module_84(output_module_82)
    output_module_85 = self.module_85(output_module_60)
    output_module_85 = self.module_86(output_module_85)
    output_module_85 = self.module_87(output_module_85)
    output_module_88 = self.module_88(output_module_60)
    output_module_88 = self.module_89(output_module_88)
    output_module_88 = self.module_90(output_module_88)
    return output_module_76,output_module_79,output_module_82,output_module_85,output_module_88


running tutorials on ZCU102 DPU-TRD

I'm trying to run the MNIST-Classification-TensorFlow tutorial on the ZCU102.
I went through all the steps and generated the .elf file for the DPU. I've loaded a pre-compiled DPU-TRD image from https://www.xilinx.com/member/forms/download/design-license-xef.html?filename=zcu102-dpu-trd-2019-1-190809.zip which boots on my ZCU102 without issues.
The problem is that the above DPU-TRD image does not seem to contain the Python packages needed by the script Vitis-AI-Tutorials/files/build/target_zcu102/app_mt.py. Should I use a different precompiled DPU-TRD image?

root@zcu102-dpu-trd-2019:~# python3 app_mt.py -m model_dir/dpu_customcnn.elf 
Traceback (most recent call last):
  File "app_mt.py", line 17, in <module>
    import runner
ImportError: No module named 'runner'

The resnet50 example, which is part of the DPU-TRD image, works fine.

The compiled model failed to load with PYNQ "overlay.load_model"

Hi all,

I followed the example in https://github.com/Xilinx/Vitis-AI-Tutorials/tree/master/Design_Tutorials/09-mnist_pyt, successfully quantized my model, and then compiled it to CNN_zcu102.xmodel.

However, when I load the xmodel with the PYNQ DPU overlay (KV260), it shows the following error. Any advice on the problem?

[screenshot of the error]

It is noteworthy that I can successfully load the xmodel compiled from the example model. The problem appears only when I switch to my model, and there are no warnings during quantization or compilation.
In addition, I am using the same arch as in the example.

I attach the quantized model for your reference.


# GENETARED BY NNDCT, DO NOT EDIT!

import torch
import pytorch_nndct as py_nndct
class PoseResNet(torch.nn.Module):
def __init__(self):
super(PoseResNet, self).__init__()
self.module_0 = py_nndct.nn.Input() #PoseResNet::input_0
self.module_1 = py_nndct.nn.Conv2d(in_channels=3, out_channels=64, kernel_size=[7, 7], stride=[2, 2], padding=[3, 3], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Conv2d[conv1]/input.2
self.module_3 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/ReLU[relu]/3604
self.module_4 = py_nndct.nn.MaxPool2d(kernel_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], ceil_mode=False) #PoseResNet::PoseResNet/MaxPool2d[maxpool]/input.4
self.module_5 = py_nndct.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[0]/Conv2d[conv1]/input.5
self.module_7 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[0]/ReLU[relu]/input.7
self.module_8 = py_nndct.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[0]/Conv2d[conv2]/input.8
self.module_10 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[0]/input.9
self.module_11 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[0]/ReLU[relu]/input.10
self.module_12 = py_nndct.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[1]/Conv2d[conv1]/input.11
self.module_14 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[1]/ReLU[relu]/input.13
self.module_15 = py_nndct.nn.Conv2d(in_channels=64, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[1]/Conv2d[conv2]/input.14
self.module_17 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[1]/input.15
self.module_18 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer1]/BasicBlock[1]/ReLU[relu]/input.16
self.module_19 = py_nndct.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/Conv2d[conv1]/input.17
self.module_21 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/ReLU[relu]/input.19
self.module_22 = py_nndct.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/Conv2d[conv2]/input.20
self.module_24 = py_nndct.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=[1, 1], stride=[2, 2], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/Sequential[downsample]/Conv2d[0]/input.21
self.module_26 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/input.22
self.module_27 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[0]/ReLU[relu]/input.23
self.module_28 = py_nndct.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[1]/Conv2d[conv1]/input.24
self.module_30 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[1]/ReLU[relu]/input.26
self.module_31 = py_nndct.nn.Conv2d(in_channels=128, out_channels=128, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[1]/Conv2d[conv2]/input.27
self.module_33 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[1]/input.28
self.module_34 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer2]/BasicBlock[1]/ReLU[relu]/input.29
self.module_35 = py_nndct.nn.Conv2d(in_channels=128, out_channels=256, kernel_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/Conv2d[conv1]/input.30
self.module_37 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/ReLU[relu]/input.32
self.module_38 = py_nndct.nn.Conv2d(in_channels=256, out_channels=256, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/Conv2d[conv2]/input.33
self.module_40 = py_nndct.nn.Conv2d(in_channels=128, out_channels=256, kernel_size=[1, 1], stride=[2, 2], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/Sequential[downsample]/Conv2d[0]/input.34
self.module_42 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/input.35
self.module_43 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[0]/ReLU[relu]/input.36
self.module_44 = py_nndct.nn.Conv2d(in_channels=256, out_channels=256, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[1]/Conv2d[conv1]/input.37
self.module_46 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[1]/ReLU[relu]/input.39
self.module_47 = py_nndct.nn.Conv2d(in_channels=256, out_channels=256, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[1]/Conv2d[conv2]/input.40
self.module_49 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[1]/input.41
self.module_50 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer3]/BasicBlock[1]/ReLU[relu]/input.42
self.module_51 = py_nndct.nn.Conv2d(in_channels=256, out_channels=512, kernel_size=[3, 3], stride=[2, 2], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/Conv2d[conv1]/input.43
self.module_53 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/ReLU[relu]/input.45
self.module_54 = py_nndct.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/Conv2d[conv2]/input.46
self.module_56 = py_nndct.nn.Conv2d(in_channels=256, out_channels=512, kernel_size=[1, 1], stride=[2, 2], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/Sequential[downsample]/Conv2d[0]/input.47
self.module_58 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/input.48
self.module_59 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[0]/ReLU[relu]/input.49
self.module_60 = py_nndct.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[1]/Conv2d[conv1]/input.50
self.module_62 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[1]/ReLU[relu]/input.52
self.module_63 = py_nndct.nn.Conv2d(in_channels=512, out_channels=512, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[1]/Conv2d[conv2]/input.53
self.module_65 = py_nndct.nn.Add() #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[1]/input.54
self.module_66 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[layer4]/BasicBlock[1]/ReLU[relu]/4125
self.module_67 = py_nndct.nn.ConvTranspose2d(in_channels=512, out_channels=256, kernel_size=[4, 4], stride=[2, 2], padding=[1, 1], output_padding=[0, 0], groups=1, bias=True, dilation=[1, 1]) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ConvTranspose2d[0]/input.55
self.module_69 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ReLU[2]/4151
self.module_70 = py_nndct.nn.ConvTranspose2d(in_channels=256, out_channels=256, kernel_size=[4, 4], stride=[2, 2], padding=[1, 1], output_padding=[0, 0], groups=1, bias=True, dilation=[1, 1]) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ConvTranspose2d[3]/input.57
self.module_72 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ReLU[5]/4177
self.module_73 = py_nndct.nn.ConvTranspose2d(in_channels=256, out_channels=256, kernel_size=[4, 4], stride=[2, 2], padding=[1, 1], output_padding=[0, 0], groups=1, bias=True, dilation=[1, 1]) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ConvTranspose2d[6]/input.59
self.module_75 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[deconv_layers]/ReLU[8]/input.61
self.module_76 = py_nndct.nn.Conv2d(in_channels=256, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[hm_cen]/Conv2d[0]/input.62
self.module_77 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[hm_cen]/ReLU[1]/input.63
self.module_78 = py_nndct.nn.Conv2d(in_channels=64, out_channels=3, kernel_size=[1, 1], stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[hm_cen]/Conv2d[2]/4242
self.module_79 = py_nndct.nn.Conv2d(in_channels=256, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[cen_offset]/Conv2d[0]/input.64
self.module_80 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[cen_offset]/ReLU[1]/input.65
self.module_81 = py_nndct.nn.Conv2d(in_channels=64, out_channels=2, kernel_size=[1, 1], stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[cen_offset]/Conv2d[2]/4281
self.module_82 = py_nndct.nn.Conv2d(in_channels=256, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[direction]/Conv2d[0]/input.66
self.module_83 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[direction]/ReLU[1]/input.67
self.module_84 = py_nndct.nn.Conv2d(in_channels=64, out_channels=2, kernel_size=[1, 1], stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[direction]/Conv2d[2]/4320
self.module_85 = py_nndct.nn.Conv2d(in_channels=256, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[z_coor]/Conv2d[0]/input.68
self.module_86 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[z_coor]/ReLU[1]/input.69
self.module_87 = py_nndct.nn.Conv2d(in_channels=64, out_channels=1, kernel_size=[1, 1], stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[z_coor]/Conv2d[2]/4359
self.module_88 = py_nndct.nn.Conv2d(in_channels=256, out_channels=64, kernel_size=[3, 3], stride=[1, 1], padding=[1, 1], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[dim]/Conv2d[0]/input.70
self.module_89 = py_nndct.nn.ReLU(inplace=True) #PoseResNet::PoseResNet/Sequential[dim]/ReLU[1]/input
self.module_90 = py_nndct.nn.Conv2d(in_channels=64, out_channels=3, kernel_size=[1, 1], stride=[1, 1], padding=[0, 0], dilation=[1, 1], groups=1, bias=True) #PoseResNet::PoseResNet/Sequential[dim]/Conv2d[2]/4398

def forward(self, *args):
    output_module_0 = self.module_0(input=args[0])
    output_module_0 = self.module_1(output_module_0)
    output_module_0 = self.module_3(output_module_0)
    output_module_0 = self.module_4(output_module_0)
    output_module_5 = self.module_5(output_module_0)
    output_module_5 = self.module_7(output_module_5)
    output_module_5 = self.module_8(output_module_5)
    output_module_5 = self.module_10(input=output_module_5, other=output_module_0, alpha=1)
    output_module_5 = self.module_11(output_module_5)
    output_module_12 = self.module_12(output_module_5)
    output_module_12 = self.module_14(output_module_12)
    output_module_12 = self.module_15(output_module_12)
    output_module_12 = self.module_17(input=output_module_12, other=output_module_5, alpha=1)
    output_module_12 = self.module_18(output_module_12)
    output_module_19 = self.module_19(output_module_12)
    output_module_19 = self.module_21(output_module_19)
    output_module_19 = self.module_22(output_module_19)
    output_module_24 = self.module_24(output_module_12)
    output_module_19 = self.module_26(input=output_module_19, other=output_module_24, alpha=1)
    output_module_19 = self.module_27(output_module_19)
    output_module_28 = self.module_28(output_module_19)
    output_module_28 = self.module_30(output_module_28)
    output_module_28 = self.module_31(output_module_28)
    output_module_28 = self.module_33(input=output_module_28, other=output_module_19, alpha=1)
    output_module_28 = self.module_34(output_module_28)
    output_module_35 = self.module_35(output_module_28)
    output_module_35 = self.module_37(output_module_35)
    output_module_35 = self.module_38(output_module_35)
    output_module_40 = self.module_40(output_module_28)
    output_module_35 = self.module_42(input=output_module_35, other=output_module_40, alpha=1)
    output_module_35 = self.module_43(output_module_35)
    output_module_44 = self.module_44(output_module_35)
    output_module_44 = self.module_46(output_module_44)
    output_module_44 = self.module_47(output_module_44)
    output_module_44 = self.module_49(input=output_module_44, other=output_module_35, alpha=1)
    output_module_44 = self.module_50(output_module_44)
    output_module_51 = self.module_51(output_module_44)
    output_module_51 = self.module_53(output_module_51)
    output_module_51 = self.module_54(output_module_51)
    output_module_56 = self.module_56(output_module_44)
    output_module_51 = self.module_58(input=output_module_51, other=output_module_56, alpha=1)
    output_module_51 = self.module_59(output_module_51)
    output_module_60 = self.module_60(output_module_51)
    output_module_60 = self.module_62(output_module_60)
    output_module_60 = self.module_63(output_module_60)
    output_module_60 = self.module_65(input=output_module_60, other=output_module_51, alpha=1)
    output_module_60 = self.module_66(output_module_60)
    output_module_60 = self.module_67(output_module_60)
    output_module_60 = self.module_69(output_module_60)
    output_module_60 = self.module_70(output_module_60)
    output_module_60 = self.module_72(output_module_60)
    output_module_60 = self.module_73(output_module_60)
    output_module_60 = self.module_75(output_module_60)
    output_module_76 = self.module_76(output_module_60)
    output_module_76 = self.module_77(output_module_76)
    output_module_76 = self.module_78(output_module_76)
    output_module_79 = self.module_79(output_module_60)
    output_module_79 = self.module_80(output_module_79)
    output_module_79 = self.module_81(output_module_79)
    output_module_82 = self.module_82(output_module_60)
    output_module_82 = self.module_83(output_module_82)
    output_module_82 = self.module_84(output_module_82)
    output_module_85 = self.module_85(output_module_60)
    output_module_85 = self.module_86(output_module_85)
    output_module_85 = self.module_87(output_module_85)
    output_module_88 = self.module_88(output_module_60)
    output_module_88 = self.module_89(output_module_88)
    output_module_88 = self.module_90(output_module_88)
    return output_module_76,output_module_79,output_module_82,output_module_85,output_module_88

`

Running on ZCU102 board with See3CAM

I typed:

root@xilinx-zcu102-2021_2:~/Vitis-AI/demo/Vitis-AI-Library/samples/yolov4# ./test_video_yolov4 dpu_yolov4 0 -t 6
[ WARN:0] global /usr/src/debug/opencv/4.4.0-r0/git/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1

It just locks up / does nothing that I can see.

Monitor is connected via the Display port.
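
A quick sanity check that may help here (a sketch, not from the original report): confirm that the camera itself delivers frames through OpenCV/GStreamer before suspecting the DPU application. The device index 0 below matches the index passed to test_video_yolov4 above; everything else is an assumption.

import cv2

# Minimal capture test for video device 0 (the same index passed to
# test_video_yolov4). If this hangs or prints False, the problem is in the
# camera/GStreamer pipeline rather than in the YOLOv4 application itself.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
print("frame grabbed:", ok, "shape:", frame.shape if ok else None)
cap.release()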

07-yolov4-tutorial Issue

I noticed that the TensorFlow YOLOv4 example has been removed from the repo. The README instructions say to use the pre-trained weights yolov4-leaky_best.weights.7z.001, but I could not find them in this repo. Can you provide the pre-trained weights for the yolov4.cfg model? Also, why was this example removed from the Vitis AI Design Tutorials?

https://github.com/Xilinx/Vitis-AI-Tutorials/tree/c04bd9ae660b90deb4ff88c81c95642e863817fe/Design_Tutorials/07-yolov4-tutorial

Thanks.

Timing Failure in Vitis Build

Bitstream Generation fails to complete with the error:

VPL-4: design did not meet timing - Design failed to meet timing.

Error from the Vivado (v2019.2.1) build log:

ERROR: [runtcl-1] design did not meet timing - Design failed to meet timing.
Failed timing checks (paths):
{ultra96v2_mipi_i/dpu_xrt_top_1/inst/u_631818d4/m_43dd20ae/u_b2263e3b/s_189e67da_reg[0]/C --> ultra96v2_mipi_i/axi_intc_0/U0/INTC_CORE_I/INTR_DETECT_GEN[0].LVL_DETECT_GEN.hw_intr_reg[0]/D}

Xilinx vitis_ai 1.2 tutorials and docker

Hello,

could you please confirm whether running

./docker_run.sh xilinx/vitis-ai-cpu:1.2.82

rather than

:latest

which is 1.3.x, is the proper way to work with Vitis AI 1.2?

It seems that "latest" is hard-coded in some of the scripts.

Thx Gerd

Vitis tutorial from scratch

Hi,
I downloaded the Vitis IDE and the Docker image. Is there a tutorial that guides us through deploying our own model from scratch using the IDE or Docker?

Thanks.

08-tf2_flow tutorial question

Hi,

After testing this design example, I noticed that any image that does not contain a cat or a dog is misclassified as a cat. Is there a way to fix this in the model? For instance:

car001

snowby

xilinx-k26-starterkit-2021_2:~/target_kv260$ python3 app_single.py -i car001.jpg
Command line options:
 --image_dir :  images
 --image     :  car001.jpg
 --threads   :  1
 --model     :  customcnn.xmodel
Starting 1 threads...
image classified as : cat
xilinx-k26-starterkit-2021_2:~/target_kv260$ python3 app_single.py -i snowby.jpg
Command line options:
 --image_dir :  images
 --image     :  snowby.jpg
 --threads   :  1
 --model     :  customcnn.xmodel
Starting 1 threads...
image classified as : cat

Here is my modified script to classify a single image

'''
Copyright 2020 Xilinx Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
'''

from ctypes import *
from typing import List
import cv2
import numpy as np
import vart
import os
import pathlib
import xir
import threading
import time
import sys
import argparse

divider = '------------------------------------'

def preprocess_fn(image_path, fix_scale):
    '''
    Image pre-processing.
    Rearranges from BGR to RGB then normalizes to range 0:1
    and then scales by input quantization scaling factor
    input arg: path of image file
    return: numpy array
    '''
    image = cv2.imread(image_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = image * (1/255.0) * fix_scale
    image = image.astype(np.int8)
    return image


def get_child_subgraph_dpu(graph: "Graph") -> List["Subgraph"]:
    assert graph is not None, "'graph' should not be None."
    root_subgraph = graph.get_root_subgraph()
    assert (root_subgraph is not None), "Failed to get root subgraph of input Graph object."
    if root_subgraph.is_leaf:
        return []
    child_subgraphs = root_subgraph.toposort_child_subgraph()
    assert child_subgraphs is not None and len(child_subgraphs) > 0
    return [
        cs
        for cs in child_subgraphs
        if cs.has_attr("device") and cs.get_attr("device").upper() == "DPU"
    ]


def runDPU(id,start,dpu,img):
    '''get tensor'''
    inputTensors = dpu.get_input_tensors()
    outputTensors = dpu.get_output_tensors()
    input_ndim = tuple(inputTensors[0].dims)
    output_ndim = tuple(outputTensors[0].dims)

    # we can avoid output scaling if use argmax instead of softmax
    #output_fixpos = outputTensors[0].get_attr("fix_point")
    #output_scale = 1 / (2**output_fixpos)

    batchSize = input_ndim[0]
    n_of_images = len(img)
    count = 0
    write_index = start
    ids=[]
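    # up to ids_max asynchronous DPU jobs are kept in flight at once; one
    # output buffer is pre-allocated per job and results are collected in batches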
    ids_max = 50
    outputData = []
    for i in range(ids_max):
        outputData.append([np.empty(output_ndim, dtype=np.int8, order="C")])
    while count < n_of_images:
        if (count+batchSize<=n_of_images):
            runSize = batchSize
        else:
            runSize=n_of_images-count

        '''prepare batch input/output '''
        inputData = []
        inputData = [np.empty(input_ndim, dtype=np.int8, order="C")]

        '''init input image to input buffer '''
        for j in range(runSize):
            imageRun = inputData[0]
            imageRun[j, ...] = img[(count + j) % n_of_images].reshape(input_ndim[1:])
        '''run with batch '''
        job_id = dpu.execute_async(inputData,outputData[len(ids)])
        ids.append((job_id,runSize,start+count))
        count = count + runSize 
        if count<n_of_images:
            if len(ids) < ids_max-1:
                continue
        for index in range(len(ids)):
            dpu.wait(ids[index][0])
            write_index = ids[index][2]
            '''store output vectors '''
            for j in range(ids[index][1]):
                # we can avoid output scaling if use argmax instead of softmax
                # out_q[write_index] = np.argmax(outputData[0][j] * output_scale)
                out_q[write_index] = np.argmax(outputData[index][0][j])
                write_index += 1
        ids=[]


def app(image_dir, image, threads,model):

    global out_q
    out_q = [None]
    g = xir.Graph.deserialize(model)
    subgraphs = get_child_subgraph_dpu(g)
    all_dpu_runners = []
    for i in range(threads):
        all_dpu_runners.append(vart.Runner.create_runner(subgraphs[0], "run"))

    # input scaling
    input_fixpos = all_dpu_runners[0].get_input_tensors()[0].get_attr("fix_point")
    input_scale = 2**input_fixpos
    

    ''' preprocess images '''
    img = []
    path = os.path.join(image_dir,image)
    img.append(preprocess_fn(path, input_scale))

    '''run threads '''
    print('Starting',threads,'threads...')
    threadAll = []
    start=0
    for i in range(threads):
        if (i==threads-1):
            end = len(img)
        else:
            end = start+(len(img)//threads)
        in_q = img[start:end]
        t1 = threading.Thread(target=runDPU, args=(i,start,all_dpu_runners[i], in_q))
        threadAll.append(t1)
        start=end

    for x in threadAll:
        x.start()
    for x in threadAll:
        x.join()
    
    classes = ['dog','cat']
    prediction = classes[out_q[0]]
    print("image classified as : %s" % prediction)

    return



# only used if script is run as 'main' from command line
def main():

  # construct the argument parser and parse the arguments
  ap = argparse.ArgumentParser()  
  ap.add_argument('-d', '--image_dir', type=str, default='images', help='Path to folder of images. Default is images')  
  ap.add_argument('-i', '--image', type=str, default='cat.27.jpg', help='Path to  image. Default is 001.jpg')  
  ap.add_argument('-t', '--threads',   type=int, default=1,        help='Number of threads. Default is 1')
  ap.add_argument('-m', '--model',     type=str, default='customcnn.xmodel', help='Path of xmodel. Default is customcnn.xmodel')

  args = ap.parse_args()  
  
  print ('Command line options:')
  print (' --image_dir : ', args.image_dir)
  print (' --image     : ', args.image)
  print (' --threads   : ', args.threads)
  print (' --model     : ', args.model)

  app(args.image_dir,args.image,args.threads,args.model)

if __name__ == '__main__':
  main()

Thanks,
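
A possible mitigation, sketched below (not part of the script above): because the network is a two-class classifier, a bare argmax always answers "dog" or "cat", so any unrelated image is forced into one of those labels. Applying the (currently commented-out) output scale and thresholding the softmax confidence lets low-confidence images be rejected as "unknown"; the threshold value is an assumption and would need tuning.

import numpy as np

def classify_with_reject(raw_output, output_scale, threshold=0.9):
    '''
    raw_output   : int8 output vector from the DPU (outputData[index][0][j])
    output_scale : 1 / 2**fix_point of the output tensor
    Returns 'dog', 'cat' or 'unknown' when the top softmax score is below threshold.
    '''
    logits = raw_output.astype(np.float32).flatten() * output_scale
    exp = np.exp(logits - np.max(logits))   # numerically stable softmax
    probs = exp / np.sum(exp)
    classes = ['dog', 'cat']
    top = int(np.argmax(probs))
    return classes[top] if probs[top] >= threshold else 'unknown'

Note that a two-class network can still be confidently wrong on images far from its training data, so the more robust fix is to retrain with an explicit third "other/background" class.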

14-caffe-ssd-pascal not converging

I trained this model on VOC, following every step of the tutorial, and after a long time it seems the training did not converge (if that is the right term). After running the score.sh script on snapshot_iter_120000.caffemodel I am getting (end of log):

I0311 19:11:02.286180   515 net.cpp:284] Network initialization done.
I0311 19:11:02.610352   515 net.cpp:823] Ignoring source layer mbox_loss
I0311 19:11:02.610754   515 caffe.cpp:574] Running for 4952 iterations.
I0311 19:20:58.740268   515 caffe.cpp:438]     Test net output #0: detection_eval = 0.00244108

I have an RTX 3060; nvidia-smi output:

Fri Mar 11 20:59:39 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
|  0%   45C    P8    25W / 170W |   1150MiB / 12288MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1122      G   /usr/lib/xorg/Xorg                640MiB |
|    0   N/A  N/A      1465      G   /usr/bin/gnome-shell              139MiB |
|    0   N/A  N/A      3933      G   ...AAAAAAAAA= --shared-files      107MiB |
|    0   N/A  N/A    443952      C   caffe                             191MiB |
|    0   N/A  N/A   2159972      G   ...952486002011016735,131072       65MiB |
+-----------------------------------------------------------------------------+

My training stopped accidentally a little after iteration 50000, so I used the following command to resume:

caffe train -solver solver.prototxt -snapshot /workspace/SSD/workspace/Mobilenetv2-SSD/snapshots/snapshot_iter_50000.solverstate -gpu 0 2>&1 | tee SSD_train_2.log

but when I run score on snapshot 20000 the result is even worse:

I0311 21:09:44.394701   591 net.cpp:284] Network initialization done.
I0311 21:09:45.296046   591 net.cpp:823] Ignoring source layer mbox_loss
I0311 21:09:45.296661   591 caffe.cpp:574] Running for 4952 iterations.
I0311 21:21:22.749675   591 caffe.cpp:438]     Test net output #0: detection_eval = 0.000904203

How can I solve this?
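
Before changing the training setup, it may help to confirm whether the loss was actually still decreasing. A small sketch that pulls the mbox_loss values out of the caffe training log shown above (the log file name comes from the resume command; adjust as needed):

import re

# Extract mbox_loss values from a caffe-xilinx training log and print a coarse
# trend; a loss that stays near its initial value suggests training never
# converged, rather than a problem with score.sh.
losses = []
with open('SSD_train_2.log') as f:   # log name taken from the resume command above
    for line in f:
        m = re.search(r'mbox_loss = ([0-9.]+)', line)
        if m:
            losses.append(float(m.group(1)))

if losses:
    print('first 5 losses:', losses[:5])
    print('last 5 losses :', losses[-5:])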

Compile issues with /Introduction/03-Basic/Module_6

Hello, I just want to report some issues I had when trying to follow the Vitis™ 2020.2 / Vitis-AI™ 1.3 Machine Learning Tutorial for the ZCU104, more specifically the demo application in Module 6: 3.6 USB Camera Input and Multi-Threads based on the Vitis AI Library.

When I tried compiling the application by running the build_app.sh script, I encountered the following issues:

  1. The compiler could not find the glog library in my PetaLinux SDK environment, so I had to install it using sudo apt-get install libgoogle-glog-dev libgflags-dev and set up the SDK again.
  2. I needed to update the PetaLinux SDK environment set-up file to enable C++17: CXXFLAGS= (...)-std=c++17
  3. "fatal error: vart/dpu/dpu_runner_ext.hpp: No such file or directory" - I had to comment out this header file, which was nowhere to be found.
  4. "cannot find -lxnnpp-xnnpp" - I had to edit /Module_6/CMakeLists.txt to instead link to ~/petalinux_sdk_2021.1/sysroots/cortexa72-cortexa53-xilinx-linux/usr/lib/libvitis_ai_library-xnnpp.so by replacing xnnpp-xnnpp with vitis_ai_library-xnnpp.

After all these fixes I managed to compile the app and run it on my ZCU102 using a webcam!

YOLOv4 - convert_yolov4.sh

File "/home/unilincoln/anaconda3/envs/yolov4a/lib/python3.6/site-packages/tensorflow/python/keras/activations/init.py", line 22, in
from tensorflow.python.keras._impl.keras.activations import elu
ImportError: cannot import name 'elu'

Working now - I expect it was an issue with different versions.

Incompatibility Issue

I found that the .cpp files from which we generate the binary are outdated and still use the Vitis AI 1.0 API,
for example in test_video_yolov3.cpp:

#include <xilinx/ai/demo.hpp>
#include <xilinx/ai/yolov3.hpp>
#include <xilinx/ai/nnpp/yolov3.hpp>

Also, the build.sh script refers to old libraries like ldpyolov3. Can the owner update the project to the newer Vitis AI 1.2 base?

Questions about "Test net output #0: detection_eval = 0"

Hello,
I downloaded Vitis-AI-ssd/SSD and retrained the Mobilenetv2-SSD in Vitis-AI-ssd/SSD/workspace/Mobilenetv2-SSD. The log is as follows:

I0521 00:17:13.147367 1523 solver.cpp:772] Iteration 6000, Testing net (#0)
I0521 00:17:13.148303 1523 net.cpp:743] Ignoring source layer mbox_loss
I0521 00:19:13.689599 1523 solver.cpp:885] Test net output #0: detection_eval = 0
I0521 00:19:19.519392 1523 solver.cpp:270] Iteration 6000 (0.149189 iter/s, 670.289s/100 iter), loss = 3.38898, remaining 212 hours and 14 minutes
I0521 00:19:19.519454 1523 solver.cpp:291] Train net output #0: mbox_loss = 3.49844 (* 1 = 3.49844 loss)
I0521 00:19:19.519470 1523 sgd_solver.cpp:106] Iteration 6000, lr = 0.001
...........................................

why "Test net output #0: detection_eval = 0". it confuse me for almost all the day, I can't find any useful solution in google.
( I use caffe-xilinx 1.1 )

train.sh:
/workspace/caffe-xilinx/build/tools/caffe train -solver="/workspace/Vitis-AI-ssd/SSD/workspace/Mobilenetv2-SSD/solver.prototxt" \
-weights="/workspace/Vitis-AI-ssd/SSD/workspace/Mobilenetv2-SSD/pretrained.caffemodel" -gpu 0,1,2,3 2>&1 | tee train.log

quant tf.pb with 1.2.1

Hello, the following error occurred when I was quantizing the tf.pb file with Vitis AI 1.2.1.
It should be noted that I converted the PyTorch model to TensorFlow and, after the conversion, tested the output of the TF model, which was the same as the output of the PyTorch model, so there is no problem with the tf.pb file. Only the quantize_eval_model.pb file is generated under the vai_q_output folder.
(screenshots attached in the original issue)

Which .prototxt to use while generating the caffemodel?

Dear all,
I'm not sure which prototxt I should use when I attempt to convert the darknet model to a caffemodel. Having cloned the repo, I have two prototxt files in two directories:

dpu_yolov4/dpu_yolov4.prototxt
dpu_yolov4_voc/dpu_yolov4_voc.prototxt

Should I use one of these two files, or should I generate my own prototxt from my weights file and cfg?
In the second case, how do I generate it?

Many thanks ☺️

The file '/opt/vitis_ai/compiler/arch/DPUCAHX8H/U50/arch.json' not found

I got this error message when running step 6, "source 6_compile.sh".

Traceback (most recent call last):
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/bin/vai_c_tensorflow", line 186, in
compiler = VAI_TensorFlow_Frontend(args)
File "/opt/vitis_ai/conda/envs/vitis-ai-tensorflow/bin/vai_c_tensorflow", line 76, in init
with open(args.arch) as json_data:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/vitis_ai/compiler/arch/DPUCAHX8H/U50/arch.json'

It seems like the file "arch.json" was not generated when I started this Docker container.
What should I do to fix this? Thanks.

Pretrained models

I would like to run the simple inference application on my board using the pre-trained model available for the VGG-16 network. However, the pre-trained models seem to be corrupted, or I am unable to evaluate them on the host even when using the corresponding 'deploy.prototxt' file. Can you provide a link or any other resource for the pre-trained model? Thanks in advance.

Using DPU as a feature extractor

Hello,
I was wondering, is it possible to use the DPU as a feature extractor? How can I retrieve and copy the final feature vectors (the last feature map)?
I tried this in the main cc file:

feature = dpuGetOutputTensorInHWCFP32(taskResnet50, OUTPUT_NODE, FCResult, channel);
printf("features = %f\n\r", feature);

But I only obtained one value.

Any help please?
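
One way to do this with the VART Python API, sketched below along the lines of the app script earlier in these issues (the question above uses the older DNNDK C++ API; the model path and dtype here are assumptions): the raw output buffer of the DPU subgraph is itself the final feature map, so it can simply be copied out and saved instead of being passed through argmax or softmax.

import numpy as np
import vart
import xir

# Sketch: run one pre-processed image through an xmodel and keep the raw
# output tensor as a feature vector instead of classifying it.
graph = xir.Graph.deserialize('resnet50.xmodel')   # hypothetical model path
dpu_subgraphs = [s for s in graph.get_root_subgraph().toposort_child_subgraph()
                 if s.has_attr('device') and s.get_attr('device').upper() == 'DPU']
runner = vart.Runner.create_runner(dpu_subgraphs[0], 'run')

in_dims  = tuple(runner.get_input_tensors()[0].dims)
out_dims = tuple(runner.get_output_tensors()[0].dims)

image    = np.zeros(in_dims, dtype=np.int8)    # replace with a real pre-processed image
features = np.empty(out_dims, dtype=np.int8)

job = runner.execute_async([image], [features])
runner.wait(job)

# 'features' now holds the last feature map produced by the DPU for this model
print('feature tensor shape:', features.shape)
np.save('features.npy', features)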

pytorch flow VAIQ_ERROR

I was doing the quantize part of the PyTorch flow with torch==1.7 (required by my project) and Vitis AI 1.4, and came across this error.
Error info here:
[VAIQ_ERROR]: /opt/vitis_ai/conda/envs/vitis-ai-pytorch/lib/python3.6/site-packages/pytorch_nndct/nn/_kernels.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe28TypeMeta21_typeMetaDataInstanceIN3c108BFloat16EEEPKNS_6detail12TypeMetaDataEv

I have tried importing pytorch_nndct before torch, but it did not work.
Did anyone have the same problem, and could someone help me solve it? Thanks!

Core dump on AWS F1 running 02-MNIST_classification_tf example

I built everything successfully from step 0 to step 7 of the tutorial ML/02-MNIST_classification_tf.

But when I try to run the xmodel on an AWS F1 f1.2xlarge instance, I get the error below. Can you please confirm whether the U50 target is OK to run on f1.2xlarge, and if not, which target should I use?

(vitis-ai-tensorflow) Vitis-AI /workspace/build/target_u50 > /usr/bin/python3 app_mt.py -m model_dir/customcnn.xmodel
Command line options:
--image_dir : images
--threads : 1
--model : model_dir/customcnn.xmodel
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0401 08:21:48.498520 440 dpu_controller.cpp:44] Check failed: !the_factory_methods.empty()
*** Check failure stack trace: ***
Aborted (core dumped)

Yolov4 Model Compilation issue wrt change in Num_classes

We are referring to https://github.com/Xilinx/Vitis-AI-Tutorials/tree/master/Design_Tutorials/07-yolov4-tutorial.
When num_classes is changed to 16 in dpu_yolov4.prototxt, the model does not work properly.
In our case we are doing transfer learning on a model pre-trained on the COCO dataset.
We are not getting any errors or crashes, but when we deployed our model it did not work: it does not detect any classes at runtime. The model worked fine on the CPU before conversion.
