
dnnweaver2's People

Contributors

elbehery95, h-blake, hsharma35

dnnweaver2's Issues

cl_wrapper: Incompatible Module

When I try to add the cl_wrapper module, I get an Incompatible Module error that prevents me from adding it. The modules obuf, pe, banked_ram, obuf_mem_wrapper, mux_n_1, signed_adder, systolic_array, and dnnweaver2_controller are also flagged as incompatible, so I am guessing that their incompatibility is what makes the top-level module incompatible. I am using Vivado 2018.3.

Can you provide the RTL testbench?

I want to simulate the accelerator hardware first, and then modify the hardware architecture within this development environment to further improve performance.

Synthesis failed

I followed the tutorial for this project strictly, using Vivado 2018.2 on Ubuntu 18.04. According to the synthesis report, resource utilization is zero. If I proceed to implementation, an odd error occurs and the program exits.

Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.
-----------------------------------------------------------------------------------------------------------------------
| Tool Version : Vivado v.2018.2 (lin64) Build 2258646 Thu Jun 14 20:02:38 MDT 2018
| Date         : Tue Mar 12 11:33:45 2024
| Host         : gzzyyxh-System-Product-Name running 64-bit Ubuntu 18.04.6 LTS
| Command      : report_utilization -file kcu115_wrapper_utilization_synth.rpt -pb kcu115_wrapper_utilization_synth.pb
| Design       : kcu115_wrapper
| Device       : xcku115flvb2104-2
| Design State : Synthesized
-----------------------------------------------------------------------------------------------------------------------

Utilization Design Information

Table of Contents
-----------------
1. CLB Logic
1.1 Summary of Registers by Type
2. BLOCKRAM
3. ARITHMETIC
4. I/O
5. CLOCK
6. ADVANCED
7. CONFIGURATION
8. Primitives
9. Black Boxes
10. Instantiated Netlists
11. SLR Connectivity and Clocking Utilization
12. SLR Connectivity Matrix
13. SLR CLB Logic and Dedicated Block Utilization
14. SLR IO Utilization

1. CLB Logic
------------

+-------------------------+------+-------+-----------+-------+
|        Site Type        | Used | Fixed | Available | Util% |
+-------------------------+------+-------+-----------+-------+
| CLB LUTs*               |    0 |     0 |    663360 |  0.00 |
|   LUT as Logic          |    0 |     0 |    663360 |  0.00 |
|   LUT as Memory         |    0 |     0 |    293760 |  0.00 |
| CLB Registers           |    0 |     0 |   1326720 |  0.00 |
|   Register as Flip Flop |    0 |     0 |   1326720 |  0.00 |
|   Register as Latch     |    0 |     0 |   1326720 |  0.00 |
| CARRY8                  |    0 |     0 |     82920 |  0.00 |
| F7 Muxes                |    0 |     0 |    331680 |  0.00 |
| F8 Muxes                |    0 |     0 |    165840 |  0.00 |
| F9 Muxes                |    0 |     0 |     82920 |  0.00 |
+-------------------------+------+-------+-----------+-------+
* Warning! The Final LUT count, after physical optimizations and full implementation, is typically lower. Run opt_design after synthesis, if not already completed, for a more realistic count.


1.1 Summary of Registers by Type
--------------------------------

+-------+--------------+-------------+--------------+
| Total | Clock Enable | Synchronous | Asynchronous |
+-------+--------------+-------------+--------------+
| 0     |            _ |           - |            - |
| 0     |            _ |           - |          Set |
| 0     |            _ |           - |        Reset |
| 0     |            _ |         Set |            - |
| 0     |            _ |       Reset |            - |
| 0     |          Yes |           - |            - |
| 0     |          Yes |           - |          Set |
| 0     |          Yes |           - |        Reset |
| 0     |          Yes |         Set |            - |
| 0     |          Yes |       Reset |            - |
+-------+--------------+-------------+--------------+


2. BLOCKRAM
-----------

+----------------+------+-------+-----------+-------+
|    Site Type   | Used | Fixed | Available | Util% |
+----------------+------+-------+-----------+-------+
| Block RAM Tile |    0 |     0 |      2160 |  0.00 |
|   RAMB36/FIFO* |    0 |     0 |      2160 |  0.00 |
|   RAMB18       |    0 |     0 |      4320 |  0.00 |
+----------------+------+-------+-----------+-------+
* Note: Each Block RAM Tile only has one FIFO logic available and therefore can accommodate only one FIFO36E2 or one FIFO18E2. However, if a FIFO18E2 occupies a Block RAM Tile, that tile can still accommodate a RAMB18E2


3. ARITHMETIC
-------------

+-----------+------+-------+-----------+-------+
| Site Type | Used | Fixed | Available | Util% |
+-----------+------+-------+-----------+-------+
| DSPs      |    0 |     0 |      5520 |  0.00 |
+-----------+------+-------+-----------+-------+


4. I/O
------

+------------+------+-------+-----------+-------+
|  Site Type | Used | Fixed | Available | Util% |
+------------+------+-------+-----------+-------+
| Bonded IOB |    2 |     0 |       702 |  0.28 |
+------------+------+-------+-----------+-------+


5. CLOCK
--------

+----------------------+------+-------+-----------+-------+
|       Site Type      | Used | Fixed | Available | Util% |
+----------------------+------+-------+-----------+-------+
| GLOBAL CLOCK BUFFERs |    0 |     0 |      1248 |  0.00 |
|   BUFGCE             |    0 |     0 |       576 |  0.00 |
|   BUFGCE_DIV         |    0 |     0 |        96 |  0.00 |
|   BUFG_GT            |    0 |     0 |       384 |  0.00 |
|   BUFGCTRL*          |    0 |     0 |       192 |  0.00 |
| PLLE3_ADV            |    0 |     0 |        48 |  0.00 |
| MMCME3_ADV           |    0 |     0 |        24 |  0.00 |
+----------------------+------+-------+-----------+-------+
* Note: Each used BUFGCTRL counts as two global buffer resources. This table does not include global clocking resources, only buffer cell usage. See the Clock Utilization Report (report_clock_utilization) for detailed accounting of global clocking resource availability.


6. ADVANCED
-----------

+------------------+------+-------+-----------+-------+
|     Site Type    | Used | Fixed | Available | Util% |
+------------------+------+-------+-----------+-------+
| GTHE3_CHANNEL    |    0 |     0 |        64 |  0.00 |
| GTHE3_COMMON     |    0 |     0 |        16 |  0.00 |
| IBUFDS_GTE3      |    0 |     0 |        32 |  0.00 |
| OBUFDS_GTE3      |    0 |     0 |        32 |  0.00 |
| OBUFDS_GTE3_ADV  |    0 |     0 |        32 |  0.00 |
| PCIE_3_1         |    0 |     0 |         6 |  0.00 |
| SYSMONE1         |    0 |     0 |         2 |  0.00 |
| LAGUNA Registers |    0 |     0 |     34560 |  0.00 |
|   as TX_REG      |    0 |       |           |       |
|   as RX_REG      |    0 |       |           |       |
+------------------+------+-------+-----------+-------+


7. CONFIGURATION
----------------

+-------------+------+-------+-----------+-------+
|  Site Type  | Used | Fixed | Available | Util% |
+-------------+------+-------+-----------+-------+
| BSCANE2     |    0 |     0 |         8 |  0.00 |
| DNA_PORTE2  |    0 |     0 |         2 |  0.00 |
| EFUSE_USR   |    0 |     0 |         1 |  0.00 |
| FRAME_ECCE3 |    0 |     0 |         1 |  0.00 |
| ICAPE3      |    0 |     0 |         2 |  0.00 |
| MASTER_JTAG |    0 |     0 |         2 |  0.00 |
| STARTUPE3   |    0 |     0 |         1 |  0.00 |
+-------------+------+-------+-----------+-------+


8. Primitives
-------------

+----------+------+---------------------+
| Ref Name | Used | Functional Category |
+----------+------+---------------------+
| INBUF    |    2 |                 I/O |
| IBUFCTRL |    2 |              Others |
+----------+------+---------------------+


9. Black Boxes
--------------

+------------------------------+------+
|           Ref Name           | Used |
+------------------------------+------+
| kcu115_xdma_0_0              |    1 |
| kcu115_util_vector_logic_0_0 |    1 |
| kcu115_util_ds_buf_0_0       |    1 |
| kcu115_rst_ddr4_0_300M_0     |    1 |
| kcu115_rst_ddr4_0_100M_0     |    1 |
| kcu115_ddr4_0_0              |    1 |
| kcu115_cl_wrapper_0_0        |    1 |
| kcu115_axi_smc_0             |    1 |
| kcu115_auto_cc_0             |    1 |
+------------------------------+------+


10. Instantiated Netlists
-------------------------

+----------+------+
| Ref Name | Used |
+----------+------+


11. SLR Connectivity and Clocking Utilization
---------------------------------------------

+----------+-----------------+---------+-----------------+--------------+-------+-------+
|          | Total SLLs Used | (%)SLLs | BUFGs/BUFGCTRLs | BUFH/BUFHCEs | BUFRs | MMCMs |
+----------+-----------------+---------+-----------------+--------------+-------+-------+
| SLR1     |                 |         |               0 |            0 |     0 |     0 |
| ||||||-> |               0 |    0.00 |                 |              |       |       |
| SLR0     |                 |         |               0 |            0 |     0 |     0 |
+----------+-----------------+---------+-----------------+--------------+-------+-------+
| Total    |               0 |         |               0 |            0 |     0 |     0 |
+----------+-----------------+---------+-----------------+--------------+-------+-------+


12. SLR Connectivity Matrix
---------------------------

+------+------+------+
|      | SLR1 | SLR0 |
+------+------+------+
| SLR1 |    0 |    0 |
| SLR0 |    0 |    0 |
+------+------+------+


13. SLR CLB Logic and Dedicated Block Utilization
-------------------------------------------------

+-----------+------+---------+------------+-------------+---------------+-----------+-------+------+------+
| SLR Index | CLBs | (%)CLBs | Total LUTs | Memory LUTs | (%)Total LUTs | Registers | BRAMs | URAM | DSPs |
+-----------+------+---------+------------+-------------+---------------+-----------+-------+------+------+
| SLR1      |    0 |    0.00 |          0 |           0 |          0.00 |         0 |     0 |    0 |    0 |
| SLR0      |    0 |    0.00 |          0 |           0 |          0.00 |         0 |     0 |    0 |    0 |
+-----------+------+---------+------------+-------------+---------------+-----------+-------+------+------+
| Total     |    0 |         |          0 |           0 |               |         0 |     0 |    0 |    0 |
+-----------+------+---------+------------+-------------+---------------+-----------+-------+------+------+


14. SLR IO Utilization
----------------------

+-----------+-------------+---------+--------------+----------+--------------+----------+-----+
| SLR Index | Bonded IOBs | (%)IOBs | Bonded IPADs | (%)IPADs | Bonded OPADs | (%)OPADs | GTs |
+-----------+-------------+---------+--------------+----------+--------------+----------+-----+
| SLR1      |           0 |    0.00 |            0 |     0.00 |            0 |     0.00 |   0 |
| SLR0      |           0 |    0.00 |            0 |     0.00 |            0 |     0.00 |   0 |
+-----------+-------------+---------+--------------+----------+--------------+----------+-----+
| Total     |           0 |         |            0 |          |            0 |          |   0 |
+-----------+-------------+---------+--------------+----------+--------------+----------+-----+

Even after following your tutorial strictly, I am unable to get any further. Are you confident that this project will run correctly? My Vivado-generated block design differs from the one in the tutorial.

[Screenshot: 2024-03-12 11-50-27]

I have been trying for at least 7 days. Could you provide me with more information?

Demo is unavailable

The YOLOv2 demo for dnnweaver shown on the website "dnnweaver.org" appears to be unavailable. It says "page not found" when I try to open it. Could you tell me the correct address?

MMAP OSError: [Errno 22] Invalid argument

Hello,

I managed to generate the bitstream on a KCU105, and now I am trying to run the tutorial, but I get an error when the memory map is created:

"self.pci_cl_ctrl_mmap = mmap.mmap(self.pci_cl_ctrl_fd.fileno(), 32*1024, prot=mmap.PROT_READ|mmap.PROT_WRITE)"

"OSError: [Errno 22] Invalid argument"

This error seems to be related to the first argument self.pci_cl_ctrl_fd.fileno()

I tried using the os.open function instead, but I get the same error.

Do you have any idea how to solve this issue?

Thank you,

Best regards
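
A minimal diagnostic sketch for the error above, not a fix. It wraps the same mmap.mmap call in a helper that reports whether the path is a character device and whether a mapping of a given length succeeds, so that different lengths can be compared. The helper name try_mmap and the temporary file used as a stand-in are hypothetical; on a real system you would pass the path of the control device node that the tutorial opens, which this thread does not show. Whether the requested 32 KiB length is the actual cause here is an assumption that the thread does not confirm.

```python
import mmap
import os
import stat
import tempfile


def try_mmap(path, length):
    """Attempt a read/write mapping of `length` bytes; return (ok, message)."""
    fd = os.open(path, os.O_RDWR)
    try:
        mode = os.fstat(fd).st_mode
        kind = "character device" if stat.S_ISCHR(mode) else "regular file"
        try:
            mapping = mmap.mmap(fd, length,
                                prot=mmap.PROT_READ | mmap.PROT_WRITE)
        except (OSError, ValueError) as exc:
            # OSError covers EINVAL from the kernel or driver;
            # ValueError is raised by Python for oversized regular-file maps.
            return False, f"{kind}: mapping {length} bytes failed: {exc}"
        mapping.close()
        return True, f"{kind}: mapping {length} bytes succeeded"
    finally:
        os.close(fd)


# Stand-in for the device node: a 32 KiB temporary file (hypothetical).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\0" * 32 * 1024)
    demo_path = tmp.name

for size in (4 * 1024, 32 * 1024, 64 * 1024):
    print(try_mmap(demo_path, size))

os.unlink(demo_path)
```

If a smaller length maps successfully on the real device while 32 KiB does not, that would point at the size of the region exposed by the driver rather than at the file descriptor.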

Modifying c0_ddr4_ui_clk?

Q1: The README.md in the hardware folder says "We will also use the DDR1 IP to create a clock for DnnWeaver. To do this, specify 150MHz as the frequency for c0_ddr4_ui_clk by double-clicking the IP and then specifying 150 MHz in the Advanced Clocking tab." But according to Xilinx, the ratio between c0_ddr4_ui_clk and the memory interface speed must be 1:4, and the minimum memory interface speed is 625 MHz.
So how can c0_ddr4_ui_clk be set to 150 MHz?
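
The conflict described in this question can be checked with a few lines of arithmetic. The sketch below takes the 1:4 ratio and the 625 MHz minimum exactly as the poster states them (they are not verified here against Xilinx documentation), and the variable names are illustrative only.

```python
# Figures as stated in the question above (not independently verified).
UI_TO_MEM_RATIO = 4          # memory interface clock = 4 x c0_ddr4_ui_clk
MIN_MEM_CLK_MHZ = 625.0      # minimum memory interface speed
requested_ui_clk_mhz = 150.0 # value the hardware README asks for

# Memory interface speed implied by a 150 MHz UI clock.
implied_mem_clk_mhz = requested_ui_clk_mhz * UI_TO_MEM_RATIO

# Lowest UI clock that still satisfies the stated minimum.
min_ui_clk_mhz = MIN_MEM_CLK_MHZ / UI_TO_MEM_RATIO

print(f"implied memory clock: {implied_mem_clk_mhz} MHz")
print(f"minimum UI clock:     {min_ui_clk_mhz} MHz")
print("request is feasible:", implied_mem_clk_mhz >= MIN_MEM_CLK_MHZ)
```

Under those stated constraints a 150 MHz UI clock implies a 600 MHz memory clock, which is below the 625 MHz minimum, so the lowest UI clock that fits is 156.25 MHz.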

Example or Tutorial

Hi,
I'm comparatively new to Vivado and FPGA programming. I wanted to test this framework and tried to run the example dnnweaver2-tutorial-notebook. I ran into some problems and tried to figure out what was wrong. I got stuck at Part 1 Step 3 of the notebook with an IOError during the initialization of the FPGAManager: "No such file or directory: '/dev/xdma0_user'".
Well I see that this file is not there - but where does it come from or where do I generate it? Do I have to run the Vivado part described in the Hardware Readme first?

Could not find /dev/xdma0_user

Hi! I ran into an issue after installing the xdma driver to connect my Xilinx VC707 evaluation board to my host computer through PCIe. I want to use dnnweaver2 to run YOLO on my FPGA board, and dnnweaver needs the device /dev/xdma0_user to interact with the hardware. However, after I loaded the xdma driver, there is no xdma0_user in /dev.
When I run the following command in a Linux terminal:

'$lspci | grep Xilinx'

the output is as follows:
'03:00.0 Serial controller: Xilinx Corporation Device 7021'
which indicates that my VC707 board is physically connected to the host through PCIe.
However, after installing the xdma driver, I ran the following commands:
'$ cd /dev'
'$ ls | grep xdma'
and the output is as follows:
'xdma xdma0_c2h_0 xdma0_control xdma0_events_0 xdma0_events_1 xdma0_events_10 xdma0_events_11 xdma0_events_12 xdma0_events_13 xdma0_events_14 xdma0_events_15 xdma0_events_2 xdma0_events_3 xdma0_events_4 xdma0_events_5 xdma0_events_6 xdma0_events_7 xdma0_events_8 xdma0_events_9 xdma0_h2c_0'
which shows that xdma0_user does not exist, so dnnweaver cannot interact with my FPGA board correctly.
Do you know what is going wrong here? Thank you in advance!
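
The check described above can be scripted so that it runs before the notebook is started. This is a minimal sketch: the helper name missing_nodes and the REQUIRED_NODES tuple are hypothetical, and the only required node it assumes is xdma0_user, the one named in this thread. It reports which nodes are absent and does not diagnose why.

```python
import glob
import os

# Node that dnnweaver2 expects, per the error message in this thread.
REQUIRED_NODES = ("xdma0_user",)


def missing_nodes(present, required=REQUIRED_NODES):
    """Return the required device-node names absent from `present`."""
    names = {os.path.basename(p) for p in present}
    return [node for node in required if node not in names]


# Listing reported in the post above.
reported = (
    "xdma xdma0_c2h_0 xdma0_control xdma0_events_0 xdma0_events_1 "
    "xdma0_events_10 xdma0_events_11 xdma0_events_12 xdma0_events_13 "
    "xdma0_events_14 xdma0_events_15 xdma0_events_2 xdma0_events_3 "
    "xdma0_events_4 xdma0_events_5 xdma0_events_6 xdma0_events_7 "
    "xdma0_events_8 xdma0_events_9 xdma0_h2c_0"
).split()

print("missing from reported listing:", missing_nodes(reported))

# On a live system, inspect /dev directly instead.
print("missing on this machine:", missing_nodes(glob.glob("/dev/xdma*")))
```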

Porting dnnweaver to a Xilinx Zynq XC7Z020 board: I have encountered some problems

Hi Hardik Sharma,
I recently wanted to port dnnweaver to the Xilinx Zynq XC7Z020 board. According to a previous answer:
//------------------------------------------------------------------------------------//
The pci_cl_ctrl* AXI-Lite interface is used by CPU to write to registers in the FPGA.
The pci_cl_data* AXI4-Full interface is used to write from CPU to the BRAM in the FPGA.
The cl_ddr* AXI4-Full interface is used by dnnweaver accelerator on the FPGA fabric to write to/read from a shared DDR space.
You won't need to change any RTL for this. If the FPGA doesn't have enough resources, you can reduce the systolic array dimensions from the 32x32 default value to 16x16 or 8x8.
//------------------------------------------------------------------------------------//
I have a few questions:
1. pci_cl_ctrl (AXI-Lite interface): What data does the CPU send to the FPGA through this interface? Is it control instructions? From the documentation, are these the instructions of the macro-dataflow virtual machine?

2. pci_cl_data (AXI4-Full interface): Is this how the CPU sends image data to the BRAM in the FPGA? What is the format of this image data (such as YUV420)?
Can an image at 1080 resolution be recognized? How fast is the recognition?

3. cl_ddr (AXI4-Full interface): This explanation is clearer; it mainly handles communication with DDR.

4. How does the upper-layer software use pci_cl_ctrl and pci_cl_data?
What is the process? Could you give an example?

5. If FPGA resources are limited, your previous answer suggests reducing the systolic array dimensions from the 32x32 default value to 16x16 or 8x8. Can I modify the following code in cl_wrapper.v accordingly?
  // Systolic Array
    parameter integer ARRAY_N = 64, modified to 16
    parameter integer ARRAY_M = 64, modified to 16

6. My image is already in DDR. Instead of transferring images to dnnweaver through the Xilinx PCIe DMA interface, can I pass the image data to dnnweaver directly in memory? Also, what image format do you use, and how is the image laid out in memory?

Thank you in advance for your help.
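
As a generic illustration of how host software can write registers through a memory-mapped control window such as pci_cl_ctrl, the sketch below packs 32-bit little-endian values into an mmap region. It is not taken from the dnnweaver2 sources: the register offset 0x10, the helper names, and the use of an anonymous mapping in place of a real device node are all assumptions. Only the 32 KiB window size comes from the mmap call quoted in another issue on this page.

```python
import mmap
import struct

CTRL_SPACE_BYTES = 32 * 1024  # window size used in the quoted mmap call


def write_reg(region, offset, value):
    """Write a 32-bit little-endian value at a 4-byte-aligned offset."""
    if offset % 4 or not 0 <= offset <= len(region) - 4:
        raise ValueError(f"bad register offset: {offset:#x}")
    struct.pack_into("<I", region, offset, value & 0xFFFFFFFF)


def read_reg(region, offset):
    """Read back a 32-bit little-endian value from the same window."""
    if offset % 4 or not 0 <= offset <= len(region) - 4:
        raise ValueError(f"bad register offset: {offset:#x}")
    return struct.unpack_from("<I", region, offset)[0]


# Anonymous memory stands in for the device mapping (hypothetical).
region = mmap.mmap(-1, CTRL_SPACE_BYTES)

HYPOTHETICAL_START_REG = 0x10  # illustrative offset, not a real register
write_reg(region, HYPOTHETICAL_START_REG, 0xDEADBEEF)
print(hex(read_reg(region, HYPOTHETICAL_START_REG)))
```

With a real device, the anonymous mapping would be replaced by a mapping of the opened device node, and the offsets would come from the accelerator's register map.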

AttributeError: 'collections.OrderedDict' object has no attribute 'iteritems'

Hi,

When I run the code provided in the dnnweaver2-tutorial.ipynb file, I get the error below:

AttributeError: 'collections.OrderedDict' object has no attribute 'iteritems'

I changed iteritems() to items() in your code as follows:
for tname, t in graph.tensor_registry.iteritems(): ----> for tname, t in graph.tensor_registry.items():

Is this correct?
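
For reference, dict.iteritems() exists only in Python 2. In Python 3 the items() method is the replacement, and it also works on collections.OrderedDict. The sketch below uses a made-up tensor_registry with hypothetical tensor names and shapes; it does not reproduce the real graph object from the notebook.

```python
from collections import OrderedDict

# Hypothetical stand-in for graph.tensor_registry (names and shapes invented).
tensor_registry = OrderedDict([
    ("conv0/weights", (64, 3, 3, 3)),
    ("conv0/bias", (64,)),
])

# Python 3 form of the loop from the notebook.
names = []
for tname, t in tensor_registry.items():
    names.append(tname)
    print(tname, t)

# Under Python 3 the old method is absent, which is what triggers the error.
print("has iteritems:", hasattr(tensor_registry, "iteritems"))
```

Other Python 2 idioms may remain elsewhere in the code, so further errors of the same kind are possible after this change.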

Some DSP area constraints are over utilized

I am using Vivado 2019.2 and it fails to place the design. Any idea what the problem is?

[Place 30-859] Some DSP area constraints are over utilized.

18 or more DSP failed to place. The unplaced DSP are constrained as below: (listing maximum of 20 DSPs per constraint)
Area constraint: Tool:Shape
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[15].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[19].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[61].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[11].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[13].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[9].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[58].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[59].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[60].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[26].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[28].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[62].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[63].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[57].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
Tile rectangles examined:
Rect: ((0, 155), (579, 310))

  Number of DSP required by this constraint: 2368
  Number of DSP available in this constraint region: 1380
  Utilization = 171%

Area constraint: Tool:Shape
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[56].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[36].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[33].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
  ku115_i/cl_wrapper_0/inst/u_bf_wrap/sys_array/LOOP_INPUT_FORWARD[52].LOOP_OUTPUT_FORWARD[0].pe_inst/reg_inst/out_reg_reg
Tile rectangles examined:
Rect: ((0, 466), (579, 621))

  Number of DSP required by this constraint: 1731
  Number of DSP available in this constraint region: 1380
  Number of DSP blocked in this constraint region: 14
  Utilization = 125%


Also, when I tried to add the module, it was incompatible and I couldn't add it to the block diagram. So I generated an IP from it and then added that to the block diagram.
