Acknowledgment: This tutorial is based on the HLS Tutorial by Yihan Jiang & Akshay Kamath.
Connect to one of the VLSI lab machines.
Source the environment variables to run Vivado_HLS and Vivado.
$ source /home/vlsilab/xilinx/vivado_2018.2/Vivado/2018.2/settings64.sh
To avoid repeating this step every time you log in, add the command to your ~/.my-bashrc file. This file is sourced automatically every time you log in.
Install faketime due to this issue. To reach off-campus sites, we first need to set up a proxy server (either proxy1.ece.umn.edu or proxy2.ece.umn.edu, port 3128) using your UMN ID.
$ export HTTPS_PROXY=http://username:"passcode"@proxy1.ece.umn.edu:3128/
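If your passcode contains characters such as @, : or /, they generally need to be percent-encoded before being embedded in a proxy URL. The sketch below builds such a URL with Python's standard library. The helper name proxy_url and the sample passcode are hypothetical; the host and port are the ones listed above.

```python
from urllib.parse import quote


def proxy_url(username, passcode, host="proxy1.ece.umn.edu", port=3128):
    """Build an HTTP proxy URL with percent-encoded credentials.

    proxy_url is an illustrative helper, not part of any lab tooling.
    """
    user = quote(username, safe="")
    pwd = quote(passcode, safe="")
    return f"http://{user}:{pwd}@{host}:{port}/"


# Hypothetical credentials, shown only to illustrate the encoding.
print(proxy_url("username", "p@ss:word/1"))
```

The printed string can then be used as the value of HTTPS_PROXY in the export command.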
Install Miniconda to use pip: 1) download the Miniconda3 Linux 64-bit installer from this webpage to your local machine; 2) upload it to your VLSI account; and 3) install it using bash. (Katie: I've tried wget, but it did not work even after setting a proxy server. If anyone has a solution, please share.)
$ bash ~/miniconda3/miniconda.sh
Install faketime using conda from inside a bash shell.
$ conda install conda-forge::libfaketime
Launch Vivado_HLS GUI by invoking the following command. (Katie: The latest versions of Vivado/Vitis don't require faketime. I'll notify you once Vitis 2022.1 is installed on our VLSI machines.)
$ faketime -f "-4y" vivado_hls
Vivado HLS GUI should open as shown below:
Let’s design a vector addition module using C/C++ in Vivado HLS. We will first look at working with Vivado HLS in GUI mode and then in CLI mode.
- Create a new project and specify the project name vector_add.
- It is not necessary to specify the top function or the testbench now. Click Next twice.
- Change the part selection to Ultra96V2 using the part number xczu3eg-sbva484-1-e as shown below. This is the FPGA board we will be using for our lab assignments. Ultra96-V2 is an Arm-based, AMD Xilinx Zynq UltraScale+™ MPSoC development board, supporting the Pynq (Python for Zynq) framework, which makes it easier to run host applications on the board using the Python language and libraries.
- No need to change the Solution Name or the Period. We will continue with the default 100 MHz clock frequency.
- You should see a window like the one shown below.
- Now we can design our accelerator in C++ and simulate with Vivado. To do so, start by creating a new source file named top.c in your desired folder as follows:

    // top.c
    void top(int a[100], int b[100], int sum[100]) {
    #pragma HLS interface m_axi port=a depth=100 offset=slave bundle=A
    #pragma HLS interface m_axi port=b depth=100 offset=slave bundle=B
    #pragma HLS interface m_axi port=sum depth=100 offset=slave bundle=SUM
    #pragma HLS interface s_axilite register port=return
        for (int i = 0; i < 100; i++) {
            sum[i] = a[i] + b[i];
        }
    }
- Next, create a testbench named main.c as follows. The testbench does not get synthesized, so you are free to use any C/C++ construct for your testing purposes!

    // main.c
    #include <stdio.h>

    void top(int a[100], int b[100], int sum[100]);

    int main() {
        int a[100];
        int b[100];
        int c[100];

        for (int i = 0; i < 100; i++) {
            a[i] = i;
            b[i] = i * 2;
            c[i] = 0;
        }

        // Call the DUT function, i.e., your adder
        top(a, b, c);

        // Verify the results
        int pass = 1;
        for (int j = 0; j < 100; j++) {
            if (c[j] != (a[j] + b[j])) {
                pass = 0;
            }
            printf("A[%d] = %d; B[%d] = %d; Sum C[%d] = %d\n", j, a[j], j, b[j], j, c[j]);
        }

        if (pass)
            printf("Test Passed! :) \n");
        else
            printf("Test Failed :( \n");

        // Return nonzero on failure so the simulation flow can detect it
        return pass ? 0 : 1;
    }
- Let’s run C simulation for our adder module. Note: the Vivado C/C++ compiler is rather slow. We recommend using g++ to run simulations!

    $ g++ main.c top.c -o vadd
    $ ./vadd
- Now that the simulation has passed, let’s run high-level synthesis and generate the RTL for our adder. Go to Project Settings > Synthesis, and specify top (top.c) as the top function.
- Run synthesis.
- Check the console to know when the synthesis finishes.
- We can now view the performance reports and resource utilization.
- C/RTL co-simulation can also be run at this stage. Vivado HLS uses the same test bench main.c to test the generated RTL. This is left as an exercise.
- We will now export our adder “IP” for integration in Vivado.
- That’s it. We should now move to Vivado to generate the bitstream with our exported adder IP!
When working with larger designs, it may be easier to simply work on the command line.
- Use the g++ compiler for functional verification.

    $ g++ main.c top.c -o vadd
    $ ./vadd
- Use the following TCL script synth.tcl to run the synthesis and export RTL.

    # TCL commands for batch-mode HLS

    # Create project
    open_project proj

    # Set top-level design file (DUT)
    set_top top

    # Add source code files
    add_files top.c

    # Add test bench files
    add_files -tb ./main.c

    # Create design solution
    open_solution "solution2"

    # Set the FPGA board
    set_part {xczu3eg-sbva484-1-e}

    # Set the clock period
    create_clock -period 10 -name default

    ## C simulation
    # Use Makefile instead. This is even slower.
    #csim_design -O -clean

    ## C code synthesis to generate Verilog code
    csynth_design

    ## C and Verilog co-simulation
    ## This usually takes a long time so it is commented
    ## You may uncomment it if necessary
    #cosim_design

    ## export synthesized Verilog code
    #export_design -format ip_catalog

    exit
- Invoke Vivado HLS in batch mode and pass the TCL file with the -f option.
$ faketime -f "-4y" vivado_hls -f synth.tcl
- You should see something similar on your terminal.
- To view the performance reports, open proj/solution2/syn/report/top_csynth.rpt.
- You can view the log file vivado_hls.log for any warnings. To view the synthesized RTL, go to proj/solution2/syn/verilog/ (once export_design has been run, the exported RTL is also available under proj/solution2/impl/verilog/).
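The clock constraint in the script (create_clock -period 10) is given in nanoseconds, which corresponds to the 100 MHz default mentioned earlier. The sketch below shows that conversion, and how a latency reported in clock cycles translates into time. The function names and the 250-cycle figure are illustrative assumptions, not values taken from an actual report.

```python
def period_ns_to_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


def latency_us(cycles, period_ns):
    """Convert a latency in clock cycles to microseconds."""
    return cycles * period_ns / 1000.0


# create_clock -period 10 corresponds to a 100 MHz clock
print(period_ns_to_mhz(10))

# A hypothetical latency of 250 cycles at a 10 ns period
print(latency_us(250, 10))
```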
- Invoke Vivado GUI.
$ vivado
- Create a new project and name it adderProject. Make sure you select the same board you're using. If you cannot find it, there is an install board icon at the top right corner where you can install your own board.
- Add our adder IP core to Vivado. Click IP Catalog in the left column, right-click Vivado Repository, and select Add Repository.
- Select the folder that includes your HLS solution solution1. Then click the Select button. The following page should pop up.
- Expand the IPs tab. If you see the top IP with an orange icon, there is no issue for now. If the icon is grey, re-check whether the same board was chosen as in Vivado HLS.
- Now we build the block diagram. Click Create Block Design in the left column. Click the + icon at the top of the diagram. Type hls to find the adder IP. Type zynq to find the embedded controller.
- Since we specified two inputs and one output in our C code in different “bundles”, we need to initialize 3 AXI buses on the FPGA. To do so, double-click the ZYNQ icon on the block diagram. Select PS-PL Configuration. Then, select S AXI HP0 Interface through S AXI HP2 Interface under PS-PL Interface > Slave Interface > AXI HP by checking their boxes.
- Go back to the Block Diagram and click Run Connection Automation.
- You need to manually map the HLS ports to the three AXI HP buses in the IP.
- Now select All Automation in the left column. Click OK to start connection automation.
- To check correctness, click the validation (check mark) icon at the top of the page.
- The next step is to create a wrapper for the Block Design. Find the block diagram file under the design sources. Right-click the design file (whatever you named it) and select Create HDL Wrapper. Choose the Let Vivado manage wrapper and auto-update option. Click OK to start.
- Finally, click Generate Bitstream under the PROGRAM AND DEBUG division (at the lower left of the entire page). Use the default settings (for our simple example) and start the run.
- If bitstream generation is successful, you should be able to view the implemented design!
- After generating the bitstream, we need two files to run the vector addition on the FPGA: the bitstream with the .bit extension and the hardware handoff file with the .hwh extension. You can find the .bit file under adderProject/adderProject.runs/impl_1. The .hwh file is under the directory adderProject/adderProject.gen/sources_1/bd/design_1/hw_handoff.
- You can download these two files to a flash drive and put them on your own laptop for the next step. Note that these two files must have the same name except for the extension.
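Because the two files must share a base name, it can help to copy them into one folder under a common name in a single step. The following is a minimal sketch, assuming the directory layout described above. The helper name collect_overlay_files and the default base name adder are illustrative choices, and the glob patterns assume exactly one matching file in each location.

```python
import shutil
from pathlib import Path


def collect_overlay_files(project_dir, out_dir, name="adder"):
    """Copy the .bit and .hwh files to out_dir as <name>.bit and <name>.hwh.

    The glob patterns follow the paths mentioned in this tutorial:
      <project>/*.runs/impl_1/*.bit
      <project>/*.gen/sources_1/bd/*/hw_handoff/*.hwh
    """
    project_dir, out_dir = Path(project_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    patterns = {
        ".bit": "*.runs/impl_1/*.bit",
        ".hwh": "*.gen/sources_1/bd/*/hw_handoff/*.hwh",
    }
    copied = []
    for ext, pattern in patterns.items():
        matches = sorted(project_dir.glob(pattern))
        if not matches:
            raise FileNotFoundError(f"no {ext} file found under {project_dir}")
        dest = out_dir / f"{name}{ext}"
        shutil.copyfile(matches[0], dest)  # first match, renamed to the shared base name
        copied.append(dest)
    return copied


# Example call (paths are placeholders):
# collect_overlay_files("adderProject", "overlay_files", name="adder")
```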
- Check this page for setting up the Ultra96V2 board. (The SD card already has an image. No need to flash the SD card for now.)
- Connect the board with a USB cable. Use the browser on your computer to connect to the board at the IP address 192.168.3.1. Jupyter login password: xilinx.
- Upload the .bit file and the .hwh file to Jupyter. In the same folder, create a new .ipynb file for writing the script.
- Click here to access the overlay tutorial.
- Find the address offsets of the memory ports (a, b, and sum in this example). This information can be found in the xtop_hw.h file under the solution1/impl/misc/drivers/top_v1_0/src directory.
- Below is the example Python host code to control the FPGA kernel.
    import numpy as np
    import pynq

    # Load the bitstream (the matching adder.hwh must be in the same folder)
    overlay = pynq.Overlay('adder.bit')
    top_ip = overlay.top_0
    top_ip.signature

    # Allocate buffers that the FPGA can access through their physical addresses
    a_buffer = pynq.allocate((100), np.int32)
    b_buffer = pynq.allocate((100), np.int32)
    sum_buffer = pynq.allocate((100), np.int32)

    # initialize input
    for i in range(100):
        a_buffer[i] = i
        b_buffer[i] = i + 5

    aptr = a_buffer.physical_address
    bptr = b_buffer.physical_address
    sumptr = sum_buffer.physical_address

    # specify the address
    # These addresses can be found in the generated .v file: top_control_s_axi.v
    top_ip.write(0x10, aptr)
    top_ip.write(0x1c, bptr)
    top_ip.write(0x28, sumptr)

    # start the HLS kernel
    top_ip.write(0x00, 1)

    # Busy-wait while the control register still reads 1 (only the start bit set)
    isready = top_ip.read(0x00)
    while isready == 1:
        isready = top_ip.read(0x00)

    print("Array A:")
    print(a_buffer[0:10])
    print("\nArray B:")
    print(b_buffer[0:10])
    print("\nExpected Sum:")
    print((a_buffer + b_buffer)[0:10])
    print("\nFPGA returns:")
    print(sum_buffer[0:10])
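Rather than copying the offsets by hand, they can be read out of the generated driver header. The sketch below assumes that xtop_hw.h contains #define lines whose names end in _ADDR_<PORT>_DATA followed by a hexadecimal offset. The regular expression, the function name parse_port_offsets, and the sample header text are illustrative assumptions, so check them against your own generated file.

```python
import re

# Matches e.g. "#define XTOP_CONTROL_ADDR_A_DATA 0x10" and captures ("A", "0x10").
_ADDR_DEFINE = re.compile(
    r"^\s*#define\s+\w+?_ADDR_(\w+)_DATA\s+(0x[0-9a-fA-F]+)", re.MULTILINE
)


def parse_port_offsets(header_text):
    """Return {port_name: offset} for every *_ADDR_<PORT>_DATA define."""
    return {
        name.lower(): int(value, 16)
        for name, value in _ADDR_DEFINE.findall(header_text)
    }


# Illustrative excerpt only; the macro prefix in a real header may differ.
SAMPLE_HEADER = """
#define XTOP_CONTROL_ADDR_AP_CTRL  0x00
#define XTOP_CONTROL_ADDR_A_DATA   0x10
#define XTOP_CONTROL_BITS_A_DATA   64
#define XTOP_CONTROL_ADDR_B_DATA   0x1c
#define XTOP_CONTROL_ADDR_SUM_DATA 0x28
"""

offsets = parse_port_offsets(SAMPLE_HEADER)
print(offsets)  # {'a': 16, 'b': 28, 'sum': 40}
```

With the real header loaded via open(...).read(), the resulting dictionary could replace the hard-coded offsets, for example top_ip.write(offsets['a'], aptr).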