
verilog-pcie's Introduction

Verilog PCI Express Components Readme

For more information and updates: http://alexforencich.com/wiki/en/verilog/pcie/start

GitHub repository: https://github.com/alexforencich/verilog-pcie

Introduction

Collection of PCI Express-related components. Includes PCIe to AXI and AXI lite bridges and a flexible, high-performance DMA subsystem. Currently supports operation with several FPGA families from Xilinx and Intel. Includes full cocotb testbenches that utilize cocotbext-pcie and cocotbext-axi.

Example designs are included for the following FPGA boards:

  • Alpha Data ADM-PCIE-9V3 (Xilinx Virtex UltraScale+ XCVU3P)
  • BittWare 520N-MX (Intel Stratix 10 MX 1SM21CHU2F53E2VG)
  • Exablaze ExaNIC X10 (Xilinx Kintex UltraScale XCKU035)
  • Exablaze ExaNIC X25 (Xilinx Kintex UltraScale+ XCKU3P)
  • Silicom fb2CG@KU15P (Xilinx Kintex UltraScale+ XCKU15P)
  • Intel Stratix 10 DX dev kit (Intel Stratix 10 DX 1SD280PT2F55E1VG)
  • Intel Stratix 10 MX dev kit (Intel Stratix 10 MX 1SM21CHU1F53E1VG)
  • Terasic DE10-Agilex (Intel Agilex F AGFB014R24B2E2V)
  • Xilinx Alveo U50 (Xilinx Virtex UltraScale+ XCU50)
  • Xilinx Alveo U55C (Xilinx Virtex UltraScale+ XCU55C)
  • Xilinx Alveo U55N/Varium C1100 (Xilinx Virtex UltraScale+ XCU55N)
  • Xilinx Alveo U200 (Xilinx Virtex UltraScale+ XCU200)
  • Xilinx Alveo U250 (Xilinx Virtex UltraScale+ XCU250)
  • Xilinx Alveo U280 (Xilinx Virtex UltraScale+ XCU280)
  • Xilinx VCU108 (Xilinx Virtex UltraScale XCVU095)
  • Xilinx VCU118 (Xilinx Virtex UltraScale+ XCVU9P)
  • Xilinx VCU1525 (Xilinx Virtex UltraScale+ XCVU9P)
  • Xilinx ZCU106 (Xilinx Zynq UltraScale+ XCZU7EV)

Documentation

FPGA-independent PCIe

The PCIe modules use a generic, FPGA-independent interface for handling PCIe TLPs. This permits the same core logic to be used on multiple FPGA families, with interface shims to connect to the PCIe IP on each target device.

The pcie_us_if module is an adaptation shim for Xilinx 7-series, UltraScale, and UltraScale+. It handles the main datapath, configuration space parameters, MSI interrupts, and flow control.

The pcie_s10_if module is an adaptation shim for Intel Stratix 10 GX/SX/TX/MX series FPGAs that use the H-Tile or L-Tile for PCIe. It handles the main datapath, configuration space parameters, MSI interrupts, and flow control.

The pcie_ptile_if module is an adaptation shim for Intel Stratix 10 DX/Agilex series FPGAs that use the P-Tile for PCIe. It handles the main datapath, configuration space parameters, and flow control.

PCIe AXI and AXI lite master

The pcie_axi_master, pcie_axil_master, and pcie_axil_master_minimal modules provide a bridge between PCIe and AXI. These can be used to implement PCIe BARs.

The pcie_axil_master_minimal module is a minimal module providing register access; it supports only 32-bit operations.

The pcie_axi_master module is more complex, converting PCIe operations to AXI bursts. It can be used to terminate device-to-device DMA operations with reasonable performance.

The pcie_tlp_demux_bar module can be used to demultiplex PCIe operations based on the target BAR.
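As a behavioral illustration, the routing decision made by pcie_tlp_demux_bar can be sketched in a few lines of Python. This is a toy software model, not code from the repository; the dict representation and the bar_id field name are illustrative only.

```python
# Toy model of BAR-based TLP demultiplexing: each TLP carries the ID of
# the BAR it hit, and the demux steers it to the matching output port.
def demux_by_bar(tlps, port_count):
    """Sort TLP dicts onto output ports by their 'bar_id' field.

    TLPs targeting a BAR with no corresponding port are dropped,
    mirroring a demux configured with fewer ports than BARs.
    """
    ports = [[] for _ in range(port_count)]
    for tlp in tlps:
        if tlp["bar_id"] < port_count:
            ports[tlp["bar_id"]].append(tlp)
    return ports

tlps = [{"bar_id": 0, "data": 1},
        {"bar_id": 2, "data": 2},
        {"bar_id": 0, "data": 3}]
ports = demux_by_bar(tlps, 4)
```

In hardware the same decision is made per TLP on the streaming interface; a typical use is steering BAR 0 to pcie_axil_master for control registers and another BAR to pcie_axi_master.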

Flexible DMA subsystem

The split DMA interface / DMA client architecture supports highly flexible, high-performance DMA operations. DMA interface modules and DMA client modules are connected through dual-port RAMs with a high-performance segmented memory interface. The segmented memory interface is a better 'impedance match' to the PCIe hard core interface than AXI: data realignment can be done in the same clock cycle, and the absence of bursts, address decoding, arbitration, and reordering both simplifies the implementation and delivers much higher performance. The architecture is also quite flexible: because the dual-port RAMs decouple the DMA interface from the clients, different client interface types and widths can be mixed, and clients can even run in different clock domains without datapath FIFOs.

DMA system block diagram
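The single-cycle realignment property can be illustrated with a toy Python model (not code from the repository; segment count and width are example values). Because each segment has its own address port, a full-width access that starts mid-row simply advances the wrapped segments to the next row, instead of serializing the access into a burst.

```python
# Toy model of segmented memory addressing: a full-width access starting
# at any segment boundary touches every segment exactly once, each with
# its own row address, so realignment completes in a single cycle.
SEG_COUNT = 4   # example values, not taken from any specific design
SEG_BYTES = 16  # bytes per segment

def plan_access(start_seg, row):
    """Return (segment, row) pairs for one full-width access beginning
    at segment 'start_seg' of row 'row'."""
    plan = []
    for i in range(SEG_COUNT):
        seg = (start_seg + i) % SEG_COUNT
        # segments that wrap past the end of the row use the next row
        plan.append((seg, row + 1 if start_seg + i >= SEG_COUNT else row))
    return plan

# An access starting mid-row still uses each segment exactly once.
plan = plan_access(start_seg=2, row=5)
```

An AXI burst at the same unaligned address would instead require address decoding and a multi-beat transfer, which is the overhead the segmented interface avoids.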

The dma_if_pcie module connects a generic, FPGA-independent PCIe interface to the segmented memory interface.

The dma_if_axi module connects an AXI interface to the segmented memory interface.

The dma_psdpram module is a dual clock, parallel simple dual port RAM module with a segmented interface. The depth is independently adjustable from the address width, simplifying use of the segmented interface. The module also contains a parametrizable output pipeline register to improve timing.

The dma_if_mux module enables sharing the DMA interface across several DMA clients. This module handles the tags and select lines appropriately on both the descriptor and segmented memory interface for plug-and-play operation without address assignment - routing is completely determined by component connections. The module also contains a FIFO to maintain read data ordering across multiple clients. Make sure to equalize pipeline delay across all paths for maximum performance.
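The tag handling that makes this plug-and-play can be sketched as a toy Python model (not code from the repository; the tag width is an example value): the mux extends each client tag with the port index, so completions carry their own routing information back.

```python
# Toy model of mux tag extension: the port index rides in the tag's high
# bits, so responses route back to the issuing client with no address
# assignment -- routing is determined purely by component connections.
CLIENT_TAG_BITS = 4  # example width, not taken from the actual modules

def mux_tag(port, client_tag):
    """Combine a client-local tag with the mux port index."""
    assert client_tag < (1 << CLIENT_TAG_BITS)
    return (port << CLIENT_TAG_BITS) | client_tag

def demux_tag(tag):
    """Recover (port, client_tag) from a muxed tag."""
    return tag >> CLIENT_TAG_BITS, tag & ((1 << CLIENT_TAG_BITS) - 1)

t = mux_tag(port=3, client_tag=9)
back = demux_tag(t)
```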

DMA client modules connect the segmented memory interface to different internal interfaces.

The dma_client_axis_source and dma_client_axis_sink modules provide support for streaming DMA over AXI stream. The AXI stream width can be any power of two fraction of the segmented memory interface width.
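The width constraint above can be stated as a small check (a sketch, not repository code): the AXI stream width must divide the segmented interface width by a power of two, including one.

```python
# Sketch of the stated constraint: the AXI stream data width must be the
# segmented memory interface width divided by a power of two.
def is_valid_axis_width(seg_if_width, axis_width):
    if axis_width <= 0 or seg_if_width % axis_width:
        return False
    ratio = seg_if_width // axis_width
    return ratio & (ratio - 1) == 0  # power-of-two test

# For a 512-bit segmented interface, these stream widths qualify:
valid = [w for w in (32, 64, 96, 128, 256, 512)
         if is_valid_axis_width(512, w)]
```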

arbiter module

General-purpose parametrizable arbiter. Supports priority and round-robin arbitration. Supports blocking until request release or acknowledge.
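Round-robin operation can be modeled behaviorally in Python (a toy model of the scheme, not the module's implementation): the scan starts just after the previous grant, so the most recently served requester has the lowest priority on the next cycle.

```python
# Toy model of round-robin arbitration: grant rotates past the most
# recently granted requester.
def rr_arbiter(requests, last_grant):
    """Grant one asserted request bit, scanning from the position after
    'last_grant'. Returns the granted index, or None if nothing requests."""
    n = len(requests)
    for i in range(1, n + 1):
        idx = (last_grant + i) % n
        if requests[idx]:
            return idx
    return None

g1 = rr_arbiter([1, 0, 1, 1], last_grant=0)   # grants 2
g2 = rr_arbiter([1, 0, 1, 1], last_grant=g1)  # then 3
g3 = rr_arbiter([1, 0, 1, 1], last_grant=g2)  # wraps to 0
```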

axis_arb_mux module

Frame-aware AXI stream arbitrated multiplexer with parametrizable data width and port count. Supports priority and round-robin arbitration.

dma_client_axis_sink module

AXI stream sink DMA client module. Uses a segmented memory interface.

dma_client_axis_source module

AXI stream source DMA client module. Uses a segmented memory interface.

dma_if_axi module

AXI DMA interface module. Parametrizable interface width. Uses a double width segmented memory interface.

dma_if_axi_rd module

AXI DMA interface module. Parametrizable interface width. Uses a double width segmented memory interface.

dma_if_axi_wr module

AXI DMA interface module. Parametrizable interface width. Uses a double width segmented memory interface.

dma_if_desc_mux module

DMA interface descriptor mux module. Enables sharing a DMA interface module between multiple DMA client modules.

dma_if_mux module

DMA interface mux module. Enables sharing a DMA interface module between multiple DMA client modules. Wrapper for dma_if_mux_rd and dma_if_mux_wr.

dma_if_mux_rd module

DMA interface mux module. Enables sharing a DMA interface module between multiple DMA client modules. Wrapper for dma_if_desc_mux and dma_ram_demux_wr.

dma_if_mux_wr module

DMA interface mux module. Enables sharing a DMA interface module between multiple DMA client modules. Wrapper for dma_if_desc_mux and dma_ram_demux_rd.

dma_if_pcie module

PCIe DMA interface module. Parametrizable interface width. Uses a double width segmented memory interface.

dma_if_pcie_rd module

PCIe DMA interface module. Parametrizable interface width. Uses a double width segmented memory interface.

dma_if_pcie_wr module

PCIe DMA interface module. Parametrizable interface width. Uses a double width segmented memory interface.

dma_if_pcie_us module

PCIe DMA interface module for Xilinx UltraScale series FPGAs. Supports 64, 128, 256, and 512 bit datapaths. Uses a double width segmented memory interface. Wrapper for dma_if_pcie_us_rd and dma_if_pcie_us_wr.

dma_if_pcie_us_rd module

PCIe DMA interface module for Xilinx UltraScale series FPGAs. Supports 64, 128, 256, and 512 bit datapaths. Uses a double width segmented memory interface.

dma_if_pcie_us_wr module

PCIe DMA interface module for Xilinx UltraScale series FPGAs. Supports 64, 128, 256, and 512 bit datapaths. Uses a double width segmented memory interface.

dma_psdpram module

DMA RAM module. Segmented simple dual port RAM to connect a DMA interface module to a DMA client.

dma_psdpram_async module

DMA RAM module with asynchronous clocks. Segmented simple dual port RAM to connect a DMA interface module to a DMA client.

dma_ram_demux module

DMA RAM interface demultiplexer module. Wrapper for dma_ram_demux_rd and dma_ram_demux_wr.

dma_ram_demux_rd module

DMA RAM interface demultiplexer module for read operations.

dma_ram_demux_wr module

DMA RAM interface demultiplexer module for write operations.

pcie_axi_dma_desc_mux module

Descriptor multiplexer/demultiplexer for PCIe AXI DMA module. Enables sharing the PCIe AXI DMA module between multiple request sources, interleaving requests and distributing responses.

pcie_axi_master module

PCIe AXI master module. Parametrizable interface width and AXI burst length. Wrapper for pcie_axi_master_rd and pcie_axi_master_wr.

pcie_axi_master_rd module

PCIe AXI master module. Parametrizable interface width and AXI burst length.

pcie_axi_master_wr module

PCIe AXI master module. Parametrizable interface width and AXI burst length.

pcie_axil_master module

PCIe AXI lite master module. Parametrizable interface width.

pcie_axil_master_minimal module

Minimal PCIe AXI lite master module. Parametrizable interface width. Only supports aligned 32-bit operations; any other operation results in a completer abort. Only supports a 32-bit AXI lite interface.

pcie_msix module

MSI-X support module. Implements MSI-X table and pending bit array with AXI lite register interface, accepts interrupt requests on a streaming interface, and generates corresponding write request TLPs.
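The behavior described above can be sketched as a toy Python model (not repository code; the table layout and addresses are invented for illustration): a request either produces a memory write from the table entry, or sets the vector's pending bit if the vector is masked.

```python
# Toy model of MSI-X: interrupt requests look up a table entry and either
# generate a write (address/data) or set the vector's pending bit.
table = [  # hypothetical 2-entry MSI-X table
    {"addr": 0xFEE00000, "data": 0x41, "masked": False},
    {"addr": 0xFEE00000, "data": 0x42, "masked": True},
]
pending = [False] * len(table)

def msix_request(vector):
    """Return an (addr, data) write for 'vector', or None if masked."""
    entry = table[vector]
    if entry["masked"]:
        pending[vector] = True  # record in the pending bit array
        return None
    return (entry["addr"], entry["data"])

w0 = msix_request(0)  # unmasked: produces a write
w1 = msix_request(1)  # masked: pending bit set instead
```

In the actual module the table and pending bit array sit behind an AXI lite register interface and the writes are emitted as memory write request TLPs.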

pcie_ptile_cfg module

Configuration shim for Intel Stratix 10 DX/Agilex series FPGAs (P-Tile).

pcie_ptile_if module

PCIe interface shim for Intel Stratix 10 DX/Agilex series FPGAs (P-Tile). Wrapper for all Intel Stratix 10 DX/Agilex PCIe interface shims.

pcie_ptile_if_rx module

PCIe interface shim (RX) for Intel Stratix 10 DX/Agilex series FPGAs (P-Tile).

pcie_ptile_if_tx module

PCIe interface shim (TX) for Intel Stratix 10 DX/Agilex series FPGAs (P-Tile).

pcie_s10_cfg module

Configuration shim for Intel Stratix 10 GX/SX/TX/MX series FPGAs (H-Tile/L-Tile).

pcie_s10_if module

PCIe interface shim for Intel Stratix 10 GX/SX/TX/MX series FPGAs (H-Tile/L-Tile). Wrapper for all Intel Stratix 10 GX/SX/TX/MX PCIe interface shims.

pcie_s10_if_rx module

PCIe interface shim (RX) for Intel Stratix 10 GX/SX/TX/MX series FPGAs (H-Tile/L-Tile).

pcie_s10_if_tx module

PCIe interface shim (TX) for Intel Stratix 10 GX/SX/TX/MX series FPGAs (H-Tile/L-Tile).

pcie_s10_msi module

MSI shim for Intel Stratix 10 GX/SX/TX/MX series FPGAs (H-Tile/L-Tile).

pcie_tlp_demux module

PCIe TLP demultiplexer module.

pcie_tlp_demux_bar module

PCIe TLP demultiplexer module. Wrapper for pcie_tlp_demux with parametrizable BAR ID matching logic.

pcie_tlp_fifo module

PCIe TLP FIFO module.

pcie_tlp_fifo_raw module

PCIe TLP FIFO module with raw non-destriped output.

pcie_tlp_fifo_mux module

PCIe TLP FIFO + multiplexer module.

pcie_tlp_mux module

PCIe TLP multiplexer module.

pcie_us_axi_dma module

PCIe AXI DMA module for Xilinx UltraScale series FPGAs. Supports 64, 128, 256, and 512 bit datapaths. Parametrizable AXI burst length. Wrapper for pcie_us_axi_dma_rd and pcie_us_axi_dma_wr.

pcie_us_axi_dma_rd module

PCIe AXI DMA module for Xilinx UltraScale series FPGAs. Supports 64, 128, 256, and 512 bit datapaths. Parametrizable AXI burst length.

pcie_us_axi_dma_wr module

PCIe AXI DMA module for Xilinx UltraScale series FPGAs. Supports 64, 128, 256, and 512 bit datapaths. Parametrizable AXI burst length.

pcie_us_axi_master module

PCIe AXI master module for Xilinx UltraScale series FPGAs. Supports 64, 128, 256, and 512 bit datapaths. Parametrizable AXI burst length. Wrapper for pcie_us_axi_master_rd and pcie_us_axi_master_wr.

pcie_us_axi_master_rd module

PCIe AXI master module for Xilinx UltraScale series FPGAs. Supports 64, 128, 256, and 512 bit datapaths. Parametrizable AXI burst length.

pcie_us_axi_master_wr module

PCIe AXI master module for Xilinx UltraScale series FPGAs. Supports 64, 128, 256, and 512 bit datapaths. Parametrizable AXI burst length.

pcie_us_axil_master module

PCIe AXI lite master module for Xilinx UltraScale series FPGAs. Supports 64, 128, 256, and 512 bit PCIe interfaces.

pcie_us_axis_cq_demux module

Demux module for Xilinx UltraScale CQ interface. Can be used to route incoming requests based on function, BAR, and other fields. Supports 64, 128, 256, and 512 bit datapaths.

pcie_us_axis_rc_demux module

Demux module for Xilinx UltraScale RC interface. Can be used to route incoming completions based on the requester ID (function). Supports 64, 128, 256, and 512 bit datapaths.

pcie_us_cfg module

Configuration shim for Xilinx UltraScale series FPGAs.

pcie_us_if module

PCIe interface shim for Xilinx UltraScale series FPGAs. Wrapper for all Xilinx UltraScale PCIe interface shims.

pcie_us_if_cc module

PCIe interface shim (CC) for Xilinx UltraScale series FPGAs.

pcie_us_if_cq module

PCIe interface shim (CQ) for Xilinx UltraScale series FPGAs.

pcie_us_if_rc module

PCIe interface shim (RC) for Xilinx UltraScale series FPGAs.

pcie_us_if_rq module

PCIe interface shim (RQ) for Xilinx UltraScale series FPGAs.

pcie_us_msi module

MSI shim for Xilinx UltraScale series FPGAs.

priority_encoder module

Parametrizable priority encoder.

pulse_merge module

Parametrizable pulse merge module. Combines several single-cycle pulse status signals together.

Common signals

Common parameters

Source Files

arbiter.v                  : Parametrizable arbiter
axis_arb_mux.v             : Parametrizable AXI stream mux
dma_client_axis_sink.v     : AXI stream sink DMA client
dma_client_axis_source.v   : AXI stream source DMA client
dma_if_axi.v               : AXI DMA interface
dma_if_axi_rd.v            : AXI DMA interface (read)
dma_if_axi_wr.v            : AXI DMA interface (write)
dma_if_desc_mux.v          : DMA interface descriptor mux
dma_if_mux.v               : DMA interface mux
dma_if_mux_rd.v            : DMA interface mux (read)
dma_if_mux_wr.v            : DMA interface mux (write)
dma_if_pcie.v              : PCIe DMA interface
dma_if_pcie_rd.v           : PCIe DMA interface (read)
dma_if_pcie_wr.v           : PCIe DMA interface (write)
dma_if_pcie_us.v           : PCIe DMA interface for Xilinx UltraScale
dma_if_pcie_us_rd.v        : PCIe DMA interface for Xilinx UltraScale (read)
dma_if_pcie_us_wr.v        : PCIe DMA interface for Xilinx UltraScale (write)
dma_psdpram.v              : DMA RAM (segmented simple dual port RAM)
dma_psdpram_async.v        : DMA RAM (segmented simple dual port RAM, asynchronous clocks)
dma_ram_demux.v            : DMA RAM demultiplexer
dma_ram_demux_rd.v         : DMA RAM demultiplexer (read)
dma_ram_demux_wr.v         : DMA RAM demultiplexer (write)
pcie_axi_dma_desc_mux.v    : Descriptor mux for DMA engine
pcie_axi_master.v          : PCIe AXI master module
pcie_axi_master_rd.v       : PCIe AXI master read module
pcie_axi_master_wr.v       : PCIe AXI master write module
pcie_axil_master.v         : PCIe AXI Lite master module
pcie_axil_master_minimal.v : PCIe AXI Lite master module (minimal)
pcie_msix.v                : PCIe MSI-X support module
pcie_ptile_cfg.v           : Configuration shim for Intel P-Tile
pcie_ptile_if.v            : PCIe interface shim (Intel P-Tile)
pcie_ptile_if_rx.v         : PCIe interface shim (RX) (Intel P-Tile)
pcie_ptile_if_tx.v         : PCIe interface shim (TX) (Intel P-Tile)
pcie_s10_cfg.v             : Configuration shim for Intel Stratix 10
pcie_s10_if.v              : PCIe interface shim (Intel Stratix 10)
pcie_s10_if_rx.v           : PCIe interface shim (RX) (Intel Stratix 10)
pcie_s10_if_tx.v           : PCIe interface shim (TX) (Intel Stratix 10)
pcie_s10_msi.v             : MSI shim for Intel Stratix 10 devices
pcie_tlp_demux.v           : PCIe TLP demultiplexer
pcie_tlp_demux_bar.v       : PCIe TLP demultiplexer (BAR ID)
pcie_tlp_fifo.v            : PCIe TLP FIFO
pcie_tlp_fifo_raw.v        : PCIe TLP FIFO (raw output)
pcie_tlp_fifo_mux.v        : PCIe TLP FIFO + multiplexer
pcie_tlp_mux.v             : PCIe TLP multiplexer
pcie_us_axi_dma.v          : PCIe AXI DMA module (Xilinx UltraScale)
pcie_us_axi_dma_rd.v       : PCIe AXI DMA read module (Xilinx UltraScale)
pcie_us_axi_dma_wr.v       : PCIe AXI DMA write module (Xilinx UltraScale)
pcie_us_axi_master.v       : PCIe AXI master module (Xilinx UltraScale)
pcie_us_axi_master_rd.v    : PCIe AXI master read module (Xilinx UltraScale)
pcie_us_axi_master_wr.v    : PCIe AXI master write module (Xilinx UltraScale)
pcie_us_axil_master.v      : PCIe AXI Lite master module (Xilinx UltraScale)
pcie_us_axis_cq_demux.v    : Parametrizable AXI stream CQ demux
pcie_us_axis_rc_demux.v    : Parametrizable AXI stream RC demux
pcie_us_cfg.v              : Configuration shim for Xilinx UltraScale devices
pcie_us_if.v               : PCIe interface shim (Xilinx UltraScale)
pcie_us_if_cc.v            : PCIe interface shim (CC) (Xilinx UltraScale)
pcie_us_if_cq.v            : PCIe interface shim (CQ) (Xilinx UltraScale)
pcie_us_if_rc.v            : PCIe interface shim (RC) (Xilinx UltraScale)
pcie_us_if_rq.v            : PCIe interface shim (RQ) (Xilinx UltraScale)
pcie_us_msi.v              : MSI shim for Xilinx UltraScale devices
priority_encoder.v         : Parametrizable priority encoder
pulse_merge.v              : Parametrizable pulse merge module

Testing

Running the included testbenches requires cocotb, cocotbext-axi, cocotbext-pcie, and Icarus Verilog. The testbenches can be run with pytest directly (requires cocotb-test), pytest via tox, or via cocotb makefiles.

verilog-pcie's People

Contributors

alexforencich, andreasbraun90, basseuph

verilog-pcie's Issues

typo in Stratix 10 shim

Hi Alex, I found a typo in pcie_s10_if_tx.v, line 321:

"wr_req_fifo_wr_data[127:96] = tx_rd_req_tlp_hdr[31:0];"

Where "rd" should be replaced by "wr". ;)

FPGA versus FPGA_AXI Example Questions

Hi Alex,

Thanks for sharing your verilog-pcie library! I have been running simulations for the various components and examples as well as studying the source but still have a few questions. For all the Xilinx/AMD examples, there are fpga and fpga_axi versions. The fpga version uses the dma_psdpram segmented memory but the fpga_axi version does not. What are the pros and cons of these different implementations?

Although the fpga version with dma_psdpram has no client-side interface, can a dma_if_axi.v interface be added to create an AXI4-MM master port for DMA traffic, similar to the fpga_axi version? Are there any examples of this?

All the examples appear to use only 2 segments for the dma_psdpram. Is there any advantage to more segments?

Best Regards,

Steve Haynal

Communication between logic and host using PCIe

Hi,

I want to deliver data generated by custom logic to the host over PCIe.
Can you advise on the necessary steps for this?

Please bear with me if the question seems basic; I am a beginner at FPGA programming.
Thank you for reading.

pcie_disable_fatal_err.sh not working

Hi,
Our servers are crashing repeatedly with:

[ 5420.891904] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 5420.891949] {1}[Hardware Error]: event severity: fatal
[ 5420.891967] {1}[Hardware Error]:  Error 0, type: fatal
[ 5420.891984] {1}[Hardware Error]:   section_type: PCIe error
[ 5420.892002] {1}[Hardware Error]:   port_type: 4, root port
[ 5420.892020] {1}[Hardware Error]:   version: 3.0
[ 5420.892035] {1}[Hardware Error]:   command: 0x0547, status: 0x4010
[ 5420.892053] {1}[Hardware Error]:   device_id: 0000:5d:00.0
[ 5420.892070] {1}[Hardware Error]:   slot: 0
[ 5420.892082] {1}[Hardware Error]:   secondary_bus: 0x5e
[ 5420.892098] {1}[Hardware Error]:   vendor_id: 0x8086, device_id: 0x2030
[ 5420.892117] {1}[Hardware Error]:   class_code: 000406
[ 5420.892132] {1}[Hardware Error]:   bridge: secondary_status: 0x2000, control: 0x0003
[ 5420.892156] Kernel panic - not syncing: Fatal hardware error!

model: PowerEdge FC640
NVMe: Intel
OS: CentOS Linux release 7.4.1708

I tried to use pcie_disable_fatal_err.sh, but it fails with these errors:

line 33: "0x0547" & ~0x0100: syntax error: operand expected (error token is ""0x0547" & ~0x0100")
line 40: "0x012e" & ~0x0004: syntax error: operand expected (error token is ""0x012e" & ~0x0004")

Full output:

[root@foo001 ~]# bash -x pcie_disable_fatal_err.sh 88:00.0
+ dev=88:00.0
+ '[' -z 88:00.0 ']'
+ '[' '!' -e /sys/bus/pci/devices/88:00.0 ']'
+ dev=0000:88:00.0
+ '[' '!' -e /sys/bus/pci/devices/0000:88:00.0 ']'
++++ readlink /sys/bus/pci/devices/0000:88:00.0
+++ dirname ../../../devices/pci0000:85/0000:85:02.0/0000:88:00.0
++ basename ../../../devices/pci0000:85/0000:85:02.0
+ port=0000:85:02.0
+ '[' '!' -e /sys/bus/pci/devices/0000:85:02.0 ']'
+ echo 'Disabling fatal error reporting on port 0000:85:02.0...'
Disabling fatal error reporting on port 0000:85:02.0...
++ setpci -s 0000:85:02.0 COMMAND
+ cmd=0547
+ echo Command: 0547
Command: 0547
pcie_disable_fatal_err.sh: line 33: "0x0547" & ~0x0100: syntax error: operand expected (error token is ""0x0547" & ~0x0100")
+ setpci -s 0000:85:02.0 COMMAND=
setpci: Missing value.
Try `setpci --help' for more information.
++ setpci -s 0000:85:02.0 CAP_EXP+8.w
+ ctrl=012e
+ echo 'Device control:' 012e
Device control: 012e
pcie_disable_fatal_err.sh: line 40: "0x012e" & ~0x0004: syntax error: operand expected (error token is ""0x012e" & ~0x0004")
+ setpci -s 0000:85:02.0 CAP_EXP+8.w=
setpci: Missing value.
Try `setpci --help' for more information.

Module not found error

Hello team,

I tried to run the tests after creating the Vivado project. When running make in the tb/fpga_core folder, the error below appears.

make results.xml
make[1]: Entering directory '/media/joy/D/cocotb/cocotb_work/cocotbext-pcie/verilog-pcie/example/ZCU106/fpga_axi/tb/fpga_core'
mkdir -p sim_build
/usr/bin/iverilog -o sim_build/sim.vvp -D COCOTB_SIM=1 -s fpga_core -P fpga_core.AXIS_PCIE_DATA_WIDTH=128 -P fpga_core.AXIS_PCIE_KEEP_WIDTH=4 -P fpga_core.AXIS_PCIE_RQ_USER_WIDTH=62 -P fpga_core.AXIS_PCIE_RC_USER_WIDTH=75 -P fpga_core.AXIS_PCIE_CQ_USER_WIDTH=88 -P fpga_core.AXIS_PCIE_CC_USER_WIDTH=33 -P fpga_core.RQ_SEQ_NUM_WIDTH=6 -f sim_build/cmds.f -g2012   ../../rtl/fpga_core.v ../../rtl/axi_ram.v ../../rtl/axis_register.v ../../lib/pcie/rtl/axis_arb_mux.v ../../lib/pcie/rtl/pcie_us_axil_master.v ../../lib/pcie/rtl/pcie_us_axi_dma.v ../../lib/pcie/rtl/pcie_us_axi_dma_rd.v ../../lib/pcie/rtl/pcie_us_axi_dma_wr.v ../../lib/pcie/rtl/pcie_us_axi_master.v ../../lib/pcie/rtl/pcie_us_axi_master_rd.v ../../lib/pcie/rtl/pcie_us_axi_master_wr.v ../../lib/pcie/rtl/pcie_us_axis_cq_demux.v ../../lib/pcie/rtl/pcie_us_cfg.v ../../lib/pcie/rtl/pcie_us_msi.v ../../lib/pcie/rtl/arbiter.v ../../lib/pcie/rtl/priority_encoder.v ../../lib/pcie/rtl/pulse_merge.v
:0: warning: parameter RQ_SEQ_NUM_WIDTH not found in fpga_core.
../../rtl/fpga_core.v:175: warning: Port 20 (enable) of pcie_us_axis_cq_demux expects 1 bits, got 32.
../../rtl/fpga_core.v:175:        : Pruning (signed) 31 high bits of the expression.
../../rtl/fpga_core.v:175: warning: Port 21 (drop) of pcie_us_axis_cq_demux expects 1 bits, got 32.
../../rtl/fpga_core.v:175:        : Pruning (signed) 31 high bits of the expression.
../../rtl/fpga_core.v:229: warning: Port 8 (s_axis_tid) of axis_arb_mux expects 16 bits, got 32.
../../rtl/fpga_core.v:229:        : Pruning (signed) 16 high bits of the expression.
../../rtl/fpga_core.v:229: warning: Port 9 (s_axis_tdest) of axis_arb_mux expects 16 bits, got 32.
../../rtl/fpga_core.v:229:        : Pruning (signed) 16 high bits of the expression.
../../rtl/fpga_core.v:778: warning: Port 4 (s_axi_awaddr) of axi_ram expects 16 bits, got 32.
../../rtl/fpga_core.v:778:        : Pruning 16 high bits of the expression.
../../rtl/fpga_core.v:778: warning: Port 23 (s_axi_araddr) of axi_ram expects 16 bits, got 32.
../../rtl/fpga_core.v:778:        : Pruning 16 high bits of the expression.
../../rtl/fpga_core.v:835: warning: Port 8 (s_axis_tid) of axis_register expects 8 bits, got 32.
../../rtl/fpga_core.v:835:        : Pruning (signed) 24 high bits of the expression.
../../rtl/fpga_core.v:835: warning: Port 9 (s_axis_tdest) of axis_register expects 8 bits, got 32.
../../rtl/fpga_core.v:835:        : Pruning (signed) 24 high bits of the expression.
../../rtl/fpga_core.v:998: warning: Port 4 (s_axi_awaddr) of axi_ram expects 16 bits, got 32.
../../rtl/fpga_core.v:998:        : Pruning 16 high bits of the expression.
../../rtl/fpga_core.v:998: warning: Port 23 (s_axi_araddr) of axi_ram expects 16 bits, got 32.
../../rtl/fpga_core.v:998:        : Pruning 16 high bits of the expression.
../../rtl/fpga_core.v:1067: warning: Port 5 (cfg_interrupt_msi_vf_enable) of pcie_us_msi expects 8 bits, got 32.
../../rtl/fpga_core.v:1067:        : Pruning (signed) 24 high bits of the expression.
MODULE=test_fpga_core TESTCASE= TOPLEVEL=fpga_core TOPLEVEL_LANG=verilog \
        /usr/bin/vvp -M /usr/local/lib/python3.8/dist-packages/cocotb/libs -m libcocotbvpi_icarus   sim_build/sim.vvp -fst
     -.--ns INFO     cocotb.gpi                         ..mbed/gpi_embed.cpp:74   in set_program_name_in_venv        Did not detect Python virtual environment. Using system-wide Python interpreter
     -.--ns INFO     cocotb.gpi                         ../gpi/GpiCommon.cpp:105  in gpi_print_registered_impl       VPI registered
     -.--ns INFO     cocotb.gpi                         ..mbed/gpi_embed.cpp:244  in embed_sim_init                  Python interpreter initialized and cocotb loaded!
     0.00ns INFO     cocotb                                      __init__.py:202  in _initialise_testbench           Running on Icarus Verilog version 10.3 (stable)
     0.00ns INFO     cocotb                                      __init__.py:208  in _initialise_testbench           Running tests with cocotb v1.4.0 from /usr/local/lib/python3.8/dist-packages/cocotb
     0.00ns INFO     cocotb                                      __init__.py:229  in _initialise_testbench           Seeding Python random module with 1615876260
     0.00ns CRITICAL cocotb.regression                         regression.py:177  in _discover_tests                 Failed to import module test_fpga_core: No module named 'cocotb_test'
     0.00ns INFO     cocotb.regression                         regression.py:178  in _discover_tests                 MODULE variable was "test_fpga_core"
     0.00ns INFO     cocotb.regression                         regression.py:179  in _discover_tests                 Traceback: 
     0.00ns INFO     cocotb.regression                         regression.py:180  in _discover_tests                 Traceback (most recent call last):
                                                                                                                       File "/usr/local/lib/python3.8/dist-packages/cocotb/regression.py", line 175, in _discover_tests
                                                                                                                         module = _my_import(module_name)
                                                                                                                       File "/usr/local/lib/python3.8/dist-packages/cocotb/regression.py", line 69, in _my_import
                                                                                                                         mod = __import__(name)
                                                                                                                       File "/media/joy/D/cocotb/cocotb_work/cocotbext-pcie/verilog-pcie/example/ZCU106/fpga_axi/tb/fpga_core/test_fpga_core.py", line 28, in <module>
                                                                                                                         import cocotb_test.simulator
                                                                                                                     ModuleNotFoundError: No module named 'cocotb_test'
                                                                                                                     
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/cocotb/__init__.py", line 246, in _initialise_testbench
    regression_manager = RegressionManager.from_discovery(dut)
  File "/usr/local/lib/python3.8/dist-packages/cocotb/regression.py", line 154, in from_discovery
    return cls(dut, tests, hooks)
  File "/usr/local/lib/python3.8/dist-packages/cocotb/regression.py", line 126, in __init__
    for test in tests:
  File "/usr/local/lib/python3.8/dist-packages/cocotb/regression.py", line 175, in _discover_tests
    module = _my_import(module_name)
  File "/usr/local/lib/python3.8/dist-packages/cocotb/regression.py", line 69, in _my_import
    mod = __import__(name)
  File "/media/joy/D/cocotb/cocotb_work/cocotbext-pcie/verilog-pcie/example/ZCU106/fpga_axi/tb/fpga_core/test_fpga_core.py", line 28, in <module>
    import cocotb_test.simulator
ModuleNotFoundError: No module named 'cocotb_test'
     0.00ns ERROR    cocotb.gpi                                gpi_embed.cpp:314  in embed_sim_init                  cocotb initialization failed - exiting
     0.00ns ERROR    cocotb.scheduler                            __init__.py:269  in _sim_event                      Failing test at simulator request before test run completion: Simulator shutdown prematurely
ERROR: results.xml was not written by the simulation!
make[1]: *** [/usr/local/lib/python3.8/dist-packages/cocotb/share/makefiles/simulators/Makefile.icarus:69: results.xml] Error 1
make[1]: Leaving directory '/media/joy/D/cocotb/cocotb_work/cocotbext-pcie/verilog-pcie/example/ZCU106/fpga_axi/tb/fpga_core'
make: *** [/usr/local/lib/python3.8/dist-packages/cocotb/share/makefiles/Makefile.inc:40: sim] Error 2

Please advise!

Support for Xilinx soft PCIe PHY?

Hello,

I'm wondering if parts of this project can be used to control the Xilinx soft PCIe PHY (PG239). As stated in the description, only the hard IP is supported.

Do you have any experience with that core, or do you know of other resources for it?

Thank you! 👍

Implementation of PCIe "Non-Transparent Bridge" (NTB)

PCIe interconnects with multiple PCIe endpoints and hosts (FPGA, DPU, CPU, GPU, ...) can be organized in several topologies.

An NTB implementation for FPGA would enable several scenarios, such as:

  • fail-over support,
  • cross-domain communication at the PCIe level, without relying on an Ethernet PHY, and a way to support these other topologies.

Driver and Verilog mismatch

Is the driver not compatible with the Verilog code?

Driver:

static void dma_cpl_buf_test(struct example_dev *edev, dma_addr_t dma_addr,
		u64 size, u64 stride, u64 count, int stall)
{
	unsigned long t;
	u64 cycles;
	u32 rd_req;
	u32 rd_cpl;

	rd_req = ioread32(edev->bar[0] + 0x000020);
	rd_cpl = ioread32(edev->bar[0] + 0x000024);

	// DMA base address
	iowrite32(dma_addr & 0xffffffff, edev->bar[0] + 0x001080);
	iowrite32((dma_addr >> 32) & 0xffffffff, edev->bar[0] + 0x001084);
	// DMA offset address
	iowrite32(0, edev->bar[0] + 0x001088);
	iowrite32(0, edev->bar[0] + 0x00108c);
	// DMA offset mask
	iowrite32(0x3fff, edev->bar[0] + 0x001090);
	iowrite32(0, edev->bar[0] + 0x001094);

I see 0x20... and lots of 0x1xxx addresses, but they don't correspond to anything in the Verilog code:

(screenshot of the Verilog source omitted)

Why?
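For reference, the offsets touched by the snippet can be tabulated straight from the driver code above. The names below are inferred from the driver's comments only, not from the Verilog, so treat them as guesses rather than the authoritative register map:

```python
# Register offsets written by dma_cpl_buf_test() in the snippet above.
# Names are inferred from the driver's comments; they are NOT taken from
# the Verilog, which is exactly what this question is about.
REGS = {
    0x000020: "read request count (read before the test)",
    0x000024: "read completion count (read before the test)",
    0x001080: "DMA base address [31:0]",
    0x001084: "DMA base address [63:32]",
    0x001088: "DMA offset address [31:0]",
    0x00108c: "DMA offset address [63:32]",
    0x001090: "DMA offset mask [31:0]",
    0x001094: "DMA offset mask [63:32]",
}

# The driver splits each 64-bit value into two 32-bit writes, low word first:
dma_addr = 0x123456789
lo = dma_addr & 0xffffffff
hi = (dma_addr >> 32) & 0xffffffff
print(hex(lo), hex(hi))  # 0x23456789 0x1
```

One thing to check when offsets do not line up is whether the driver and the HDL come from the same example design; the register maps differ between example variants.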

make: No rule to make target

Hello, I am a novice FPGA user. I am trying to execute 'make' in the verilog-pcie-master/example/AU50/fpga/tb/fpga_core directory. Here is the content of the makefile:

# Copyright (c) 2020 Alex Forencich
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.

TOPLEVEL_LANG = verilog

SIM ?= icarus
WAVES ?= 0

COCOTB_HDL_TIMEUNIT = 1ns
COCOTB_HDL_TIMEPRECISION = 1ps

DUT = fpga_core
TOPLEVEL = $(DUT)
MODULE = test_$(DUT)
VERILOG_SOURCES += ../../rtl/$(DUT).v
VERILOG_SOURCES += ../../rtl/common/example_core_pcie_us.v
VERILOG_SOURCES += ../../rtl/common/example_core_pcie.v
VERILOG_SOURCES += ../../rtl/common/example_core.v
VERILOG_SOURCES += ../../rtl/common/axi_ram.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_us_if.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_us_if_rc.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_us_if_rq.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_us_if_cq.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_us_if_cc.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_us_cfg.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_axil_master.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_axi_master.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_axi_master_rd.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_axi_master_wr.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_tlp_demux_bar.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_tlp_demux.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_tlp_mux.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_tlp_fifo.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_tlp_fifo_raw.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pcie_msix.v
VERILOG_SOURCES += ../../lib/pcie/rtl/dma_if_pcie.v
VERILOG_SOURCES += ../../lib/pcie/rtl/dma_if_pcie_rd.v
VERILOG_SOURCES += ../../lib/pcie/rtl/dma_if_pcie_wr.v
VERILOG_SOURCES += ../../lib/pcie/rtl/dma_psdpram.v
VERILOG_SOURCES += ../../lib/pcie/rtl/priority_encoder.v
VERILOG_SOURCES += ../../lib/pcie/rtl/pulse_merge.v

# module parameters

export PARAM_AXIS_PCIE_DATA_WIDTH := 512
export PARAM_AXIS_PCIE_KEEP_WIDTH := $(shell expr $(PARAM_AXIS_PCIE_DATA_WIDTH) / 32 )
export PARAM_AXIS_PCIE_RQ_USER_WIDTH := $(if $(filter-out 512,$(PARAM_AXIS_PCIE_DATA_WIDTH)),62,137)
export PARAM_AXIS_PCIE_RC_USER_WIDTH := $(if $(filter-out 512,$(PARAM_AXIS_PCIE_DATA_WIDTH)),75,161)
export PARAM_AXIS_PCIE_CQ_USER_WIDTH := $(if $(filter-out 512,$(PARAM_AXIS_PCIE_DATA_WIDTH)),88,183)
export PARAM_AXIS_PCIE_CC_USER_WIDTH := $(if $(filter-out 512,$(PARAM_AXIS_PCIE_DATA_WIDTH)),33,81)
export PARAM_RC_STRADDLE := $(if $(filter-out 256 512,$(PARAM_AXIS_PCIE_DATA_WIDTH)),0,1)
export PARAM_RQ_STRADDLE := $(if $(filter-out 512,$(PARAM_AXIS_PCIE_DATA_WIDTH)),0,1)
export PARAM_CQ_STRADDLE := $(if $(filter-out 512,$(PARAM_AXIS_PCIE_DATA_WIDTH)),0,1)
export PARAM_CC_STRADDLE := $(if $(filter-out 512,$(PARAM_AXIS_PCIE_DATA_WIDTH)),0,1)
export PARAM_RQ_SEQ_NUM_WIDTH := 6
export PARAM_RQ_SEQ_NUM_ENABLE := 1
export PARAM_PCIE_TAG_COUNT := 64
export PARAM_BAR0_APERTURE := 24
export PARAM_BAR2_APERTURE := 24
export PARAM_BAR4_APERTURE := 16

ifeq ($(SIM), icarus)
PLUSARGS += -fst

COMPILE_ARGS += $(foreach v,$(filter PARAM_%,$(.VARIABLES)),-P $(TOPLEVEL).$(subst PARAM_,,$(v))=$($(v)))

ifeq ($(WAVES), 1)
	VERILOG_SOURCES += iverilog_dump.v
	COMPILE_ARGS += -s iverilog_dump
endif

else ifeq ($(SIM), verilator)
COMPILE_ARGS += -Wno-SELRANGE -Wno-WIDTH

COMPILE_ARGS += $(foreach v,$(filter PARAM_%,$(.VARIABLES)),-G$(subst PARAM_,,$(v))=$($(v)))

ifeq ($(WAVES), 1)
	COMPILE_ARGS += --trace-fst
endif

endif

include $(shell cocotb-config --makefiles)/Makefile.sim

iverilog_dump.v:
	echo 'module iverilog_dump();' > $@
	echo 'initial begin' >> $@
	echo ' $$dumpfile("$(TOPLEVEL).fst");' >> $@
	echo ' $$dumpvars(0, $(TOPLEVEL));' >> $@
	echo 'end' >> $@
	echo 'endmodule' >> $@

clean::
	@rm -rf iverilog_dump.v
	@rm -rf dump.fst $(TOPLEVEL).fst

But I encountered the following error:
rm -f results.xml
make -f Makefile results.xml
make[1]: Entering directory '/home/jalen/verilog-pcie-master/example/AU50/fpga/tb/fpga_core'
make[1]: *** No rule to make target '../../rtl/common/example_core_pcie_us.v', needed by 'sim_build/sim.vvp'. Stop.
make[1]: Leaving directory '/home/jalen/verilog-pcie-master/example/AU50/fpga/tb/fpga_core'
make: *** [/home/jalen/.local/lib/python3.10/site-packages/cocotb/share/makefiles/Makefile.inc:40: sim] Error 2
It seems that the .v files under the corresponding paths are not found when resolving VERILOG_SOURCES. However, I made sure that I downloaded the complete compressed package and extracted it. Could you please advise on how to solve this problem, or identify which step I might have done wrong? Looking forward to your answer!
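One likely cause: the example directories reach the shared RTL through relative symlinks, and GitHub's "Download ZIP" does not preserve symlinks, so a `git clone` is the reliable way to get a working tree. A quick way to check for this condition, sketched here with a made-up stand-in layout rather than the real repo paths:

```python
# GitHub's "Download ZIP" replaces symlinks with plain files or drops them,
# while the example trees rely on relative symlinks to reach shared RTL.
# This scans a tree for symlinks whose targets are missing. The directory
# layout below is a hypothetical stand-in, not the actual repo structure.
import os
import tempfile

def dangling_symlinks(root):
    """Return paths of symlinks under root whose targets do not exist."""
    bad = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.exists(path):
                bad.append(path)
    return bad

demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "lib"))
# Stand-in for a broken example symlink whose target was lost in a ZIP:
os.symlink(os.path.join(demo, "no-such-rtl"), os.path.join(demo, "lib", "pcie"))
print(dangling_symlinks(demo))  # lists the one broken link
```

If such a scan over the extracted archive reports broken links, re-fetch the repository with git instead of the ZIP download.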

Why is test data mismatching in example_driver

I've built fpga.bit for the Alveo U50 and written the bitstream to the card. Then I loaded the example_driver kernel module, and it seemed to be running.
Unfortunately, dma_cpl_buf_test fails with a test data mismatch.

dmesg:

[ 112.305279] edev 0000:18:00.0: Read status
[ 112.305281] edev 0000:18:00.0: 00000001
[ 4152.782667] edev 0000:18:00.0: edev remove
[ 4152.782671] edev 0000:18:00.0: Interrupt
[ 4152.782789] edev 0000:18:00.0: Unmapped BAR[0]
[ 4152.782791] edev 0000:18:00.0: Unmapped BAR[2]
[ 4152.782793] edev 0000:18:00.0: Unmapped BAR[4]
[ 4168.930886] edev 0000:18:00.0: edev probe
[ 4168.930888] edev 0000:18:00.0: Vendor: 0x1234
[ 4168.930888] edev 0000:18:00.0: Device: 0x0001
[ 4168.930889] edev 0000:18:00.0: Subsystem vendor: 0x10ee
[ 4168.930890] edev 0000:18:00.0: Subsystem device: 0x9037
[ 4168.930890] edev 0000:18:00.0: Class: 0x058000
[ 4168.930891] edev 0000:18:00.0: PCI ID: 0000:18:00.0
[ 4168.930895] edev 0000:18:00.0: Max payload size: 256 bytes
[ 4168.930895] edev 0000:18:00.0: Max read request size: 512 bytes
[ 4168.930896] edev 0000:18:00.0: Read completion boundary: 64 bytes
[ 4168.930897] edev 0000:18:00.0: Link capability: gen 3 x16
[ 4168.930897] edev 0000:18:00.0: Link status: gen 3 x8
[ 4168.930898] edev 0000:18:00.0: Relaxed ordering: enabled
[ 4168.930899] edev 0000:18:00.0: Phantom functions: disabled
[ 4168.930899] edev 0000:18:00.0: Extended tags: enabled
[ 4168.930900] edev 0000:18:00.0: No snoop: enabled
[ 4168.930900] edev 0000:18:00.0: NUMA node: 0
[ 4168.930905] edev 0000:18:00.0: 63.008 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x8 link at 0000:16:02.0 (capable of 126.016 Gb/s with 8.0 GT/s PCIe x16 link)
[ 4168.930908] edev 0000:18:00.0: Allocated DMA region virt 00000000be827f9a, phys 00000000f388d7f4
[ 4168.930918] edev 0000:18:00.0: BAR[0] 0x39fffe000000-0x39fffeffffff flags 0x0014220c
[ 4168.930919] edev 0000:18:00.0: BAR[2] 0x39fffd000000-0x39fffdffffff flags 0x0014220c
[ 4168.930920] edev 0000:18:00.0: BAR[4] 0x39ffff000000-0x39ffff00ffff flags 0x0014220c
[ 4168.930932] edev 0000:18:00.0: BAR[0] mapped at 0x000000004093ab23 with length 16777216
[ 4168.930937] edev 0000:18:00.0: BAR[2] mapped at 0x0000000075611646 with length 16777216
[ 4168.930940] edev 0000:18:00.0: BAR[4] mapped at 0x00000000da4696c3 with length 65536
[ 4168.931125] edev 0000:18:00.0: write to BAR2
[ 4168.931126] edev 0000:18:00.0: read from BAR2
[ 4168.931127] edev 0000:18:00.0: 11223344
[ 4168.931128] edev 0000:18:00.0: write test data
[ 4168.931129] edev 0000:18:00.0: read test data
[ 4168.931130] 00000000be827f9a: 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f ................
[ 4168.931130] 000000004d7ca939: 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f ................
[ 4168.931131] 000000000117967b: 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f  !"#$%&'()*+,-./
[ 4168.931132] 0000000037f89332: 30 31 32 33 34 35 36 37 38 39 3a 3b 3c 3d 3e 3f 0123456789:;<=>?
[ 4168.931132] 00000000d1aee458: 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f @ABCDEFGHIJKLMNO
[ 4168.931133] 000000008fb80d1c: 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f PQRSTUVWXYZ[\]^_
[ 4168.931133] 00000000871e84fb: 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f `abcdefghijklmno
[ 4168.931134] 00000000daee07c3: 70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f pqrstuvwxyz{|}~.
[ 4168.931134] 000000006eafa1da: 80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f ................
[ 4168.931135] 00000000d3dd3f46: 90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f ................
[ 4168.931135] 000000001ffff600: a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af ................
[ 4168.931136] 00000000e858a5ae: b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf ................
[ 4168.931136] 00000000e3ac005e: c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf ................
[ 4168.931137] 0000000062d160df: d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df ................
[ 4168.931137] 00000000c8c5069d: e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef ................
[ 4168.931138] 00000000ea855a29: f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff ................
[ 4168.931138] edev 0000:18:00.0: check DMA enable
[ 4168.931140] edev 0000:18:00.0: 00000001
[ 4168.931140] edev 0000:18:00.0: enable DMA
[ 4168.931141] edev 0000:18:00.0: check DMA enable
[ 4168.931142] edev 0000:18:00.0: 00000001
[ 4168.931142] edev 0000:18:00.0: enable interrupts
[ 4168.931143] edev 0000:18:00.0: start copy to card
[ 4168.931148] edev 0000:18:00.0: Interrupt
[ 4168.933737] edev 0000:18:00.0: Read status
[ 4168.933738] edev 0000:18:00.0: 00000001
[ 4168.933739] edev 0000:18:00.0: 800000aa
[ 4168.933740] edev 0000:18:00.0: start copy to host
[ 4168.933742] edev 0000:18:00.0: Interrupt
[ 4168.936621] edev 0000:18:00.0: Read status
[ 4168.936623] edev 0000:18:00.0: 00000001
[ 4168.936624] edev 0000:18:00.0: 80000055
[ 4168.936625] edev 0000:18:00.0: read test data
[ 4168.936626] 0000000058afc124: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 4168.936626] 00000000a8eb7c77: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 4168.936627] 000000006fd472b1: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 4168.936627] 000000003ec4c5b8: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[ 4168.936628] 000000007fc4777f: 40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f @ABCDEFGHIJKLMNO
[ 4168.936628] 00000000abaa1662: 50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f PQRSTUVWXYZ[\]^_
[ 4168.936629] 00000000f70d419a: 60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f `abcdefghijklmno
[ 4168.936629] 0000000062f23c1e: 70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f pqrstuvwxyz{|}~.
[ 4168.936630] 0000000068817542: 80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f ................
[ 4168.936631] 00000000931cb7b7: 90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f ................
[ 4168.936631] 0000000020dda83f: a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af ................
[ 4168.936632] 00000000304a4f7d: b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf ................
[ 4168.936632] 00000000d60ca053: c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf ................
[ 4168.936633] 0000000007871f63: d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df ................
[ 4168.936633] 00000000ac8249b3: e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef ................
[ 4168.936634] 000000001232ed2a: f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff ................
[ 4168.936634] edev 0000:18:00.0: test data mismatch
[ 4168.936635] edev 0000:18:00.0: start immediate write to host
[ 4168.936637] edev 0000:18:00.0: Interrupt
[ 4168.939621] edev 0000:18:00.0: Read status
[ 4168.939622] edev 0000:18:00.0: 00000001
[ 4168.939623] edev 0000:18:00.0: 800000aa
[ 4168.939624] edev 0000:18:00.0: read data
[ 4168.939625] 11 22 33 44 ."3D
[ 4168.939625] edev 0000:18:00.0: Read status
[ 4168.939626] edev 0000:18:00.0: 00000001

I don't know why the first 64 bytes (dma_region + 0x200) are all zero! It has confused me for a long time. What could cause this, and how can I fix it?

Thx.
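One mechanical observation on the dump above (my own arithmetic, not a diagnosis): the zeroed span is exactly the first four 16-byte hexdump lines, which equals one 512-bit word of the segmented DMA RAM used in the UltraScale+ examples, consistent with a single segment write going missing:

```python
# The zeroed region in the read-back covers exactly four 16-byte hexdump
# lines, i.e. 64 bytes -- the width of one 512-bit segmented-RAM word.
# This is consistent with (but does not prove) one segment write being lost.
zero_lines = 4          # hexdump lines showing only 00 bytes
bytes_per_line = 16
seg_width_bits = 512    # psdpram segment data width in the US+ examples
print(zero_lines * bytes_per_line == seg_width_bits // 8)  # True
```
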

AXI dma client support?

Hi Alex,

I want to support an AXI DMA client. I think you would recommend implementing something like dma_client_axi_sink/dma_client_axi_source. Is that true?

As far as I know, another option is to use pcie_us_axi_dma, but you do not recommend it. In #31 you said: "I don't recommend using it, as it only supports Xilinx US/US+ and it has some significant performance limitations due to how AXI works."

I have 3 questions:

  1. Why does it only support Xilinx US/US+ rather than the FPGA-independent interface?
  2. Why does it have significant performance limitations? Can you explain in detail?
  3. Which method would you recommend? (If possible, can you analyze the pros and cons?)

Intel PCIe IP model

Hi,

Would you be open to adding support for the Intel PCIe IP cores? If so, I might be able to contribute that.

Do you have any idea of how one might best integrate a new pinout and behaviour? While the PCIe BFM probably can remain 99% intact (I assume?), the interfaces from the core are quite different.

If you're curious, the interfaces are described here under "Key Interfaces".

Make failure for example VCU1525/fpga_axi/driver

Hi Alex,

First, thanks for providing this source code!

I'm looking for a Linux host driver for FPGA PCIe. I was trying to use your driver under example/VCU1525/fpga_axi/driver. When I run make under that path, I encounter an error: implicit declaration of function 'pci_alloc_irq_vectors'. I'm wondering if this can be solved by adding an extra header in the include section. The full error message is attached below:

$ make
make: Warning: File 'Makefile' has modification time 593811147 s in the future
make -C /lib/modules/3.10.0-693.2.2.el7.x86_64/build M=/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver modules
make[1]: Entering directory '/usr/src/kernels/3.10.0-693.2.2.el7.x86_64'
make[1]: Warning: File 'arch/x86/Makefile' has modification time 518859695 s in the future
make[2]: Warning: File 'scripts/Makefile.lib' has modification time 518856459 s in the future
CC [M] /AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.o
/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.c: In function ‘edev_probe’:
/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.c:134:11: error: implicit declaration of function ‘pci_alloc_irq_vectors’ [-Werror=implicit-function-declaration]
ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
^~~~~~~~~~~~~~~~~~~~~
/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.c:134:46: error: ‘PCI_IRQ_MSI’ undeclared (first use in this function)
ret = pci_alloc_irq_vectors(pdev, 1, 32, PCI_IRQ_MSI);
^~~~~~~~~~~
/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.c:134:46: note: each undeclared identifier is reported only once for each function it appears in
/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.c:142:11: error: implicit declaration of function ‘pci_request_irq’ [-Werror=implicit-function-declaration]
ret = pci_request_irq(pdev, 0, edev_intr, 0, edev, "edev");
^~~~~~~~~~~~~~~
/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.c:230:5: error: implicit declaration of function ‘pci_free_irq_vectors’ [-Werror=implicit-function-declaration]
pci_free_irq_vectors(pdev);
^~~~~~~~~~~~~~~~~~~~
/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.c: In function ‘edev_remove’:
/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.c:254:5: error: implicit declaration of function ‘pci_free_irq’ [-Werror=implicit-function-declaration]
pci_free_irq(pdev, 0, edev);
^~~~~~~~~~~~
cc1: some warnings being treated as errors
scripts/Makefile.build:341: recipe for target '/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.o' failed
make[2]: *** [/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver/example_driver.o] Error 1
Makefile:1305: recipe for target 'module/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver' failed
make[1]: *** [module/AlexForencich/verilog-pcie/example/VCU1525/fpga_axi/driver] Error 2
make[1]: Leaving directory '/usr/src/kernels/3.10.0-693.2.2.el7.x86_64'
Makefile:7: recipe for target 'all' failed
make: *** [all] Error 2

Package tb/pcie.py as a pip package

Hi,

I'm starting to figure out how I want to do the testbenches for my project, and your pcie.py is really impressive and useful. It seems to be standalone to the point where it's the only thing one would need to simulate PCIe transactions, and in my case, I think, the only thing I would have to use.

That made me think: would you be open to breaking that file (and potentially the AXI and US stuff as well, although I don't need that myself right now) out into its own library and making it a pip3 module? That way we can both use it, and patches would go to a single place.

How to test pcie_us_axi_dma on board?

Hi,

We want to use pcie_us_axi_dma in the two scenarios below:

1. To access DDR memory from HOST, using pcie_us_axi_dma
2. To access DDR memory from Xilinx APU / RPU, using pcie_us_axi_dma

Can you please give us some pointers to get started? For example:

  • The required registers, the sequence to access those registers and possible values to be written in the registers
  • Any example C code or library

I have two questions, can someone help me?

Hello, I am a beginner. I want to use these open-source modules to build something with functionality similar to Xilinx's XDMA IP. Regarding the pcie_axi_master module, the description says: "It can be used to terminate device-to-device DMA operations with reasonable performance." I don't understand what "terminate" means here; can you help me? And how can I combine these modules into something similar to XDMA? I am just learning as an individual and do not need it for any other purpose.

Standalone use

I'm interested in using this library as a version of a board-to-board interconnect, sort of like Xilinx's chip2chip. One device would be connected to a host with a normal pcie connection, but other devices would hang off that one. From the perspective of the host there would only be one device, but the address space would span all the other boards as well because they would route axi transactions through this library to custom phys.

Would it be practical for me to write my own phy layer, possibly running at lower speeds than pcie typically runs? Would all the configuration signals (cfg_*) between pcie_us_cfg and the phy need to be functional (besides fairly obvious ones like link status)?

How much configuration is really required to get axi transactions to go across the physical interface? If I connected the physical interfaces of two boards running this core, would I be able to pass axi transactions between them without much configuration?

Segmented memory address bug caused by ram_mask_1_reg assertion in dma_if_pcie_rd.v

Thanks for this great project!

I am prototyping a system that uses DMA on an Alveo U250 FPGA, with Vivado 2023.2 and Rev. 23 of the UltraScale+ Integrated Block for PCI Express. My setup uses pcie_us_if + dma_if_pcie + dma_psdpram to perform reads and writes from a host, which I believe is the suggested configuration. Issuing DMA reads of less than 64 bytes works perfectly, and the data is written to the psdpram correctly. Reads greater than 64 bytes work except for an incorrect first address generated by dma_if_pcie when writing the received data to the psdpram.

Here is the ILA capture of the first batch of writes from dma_if_pcie_rd to the psdpram in completion of a 256-byte read. The first 64-byte block of the read is 01010101..., the second is 02020202..., the third is 03030303..., and the fourth is 04040404..., destined for address 8192 of the psdpram, which I believe should be segment address 0x40 of the even chunk of the segmented memory.
(ILA screenshot: pcie_wr_ila)
The generated address is incorrect, specifying 0x41 of the even chunk of the segmented memory. Oddly, the successive chunks are placed at the correct addresses. The RAM segment address width is 11 bits, and the data is written at the following locations:
01010101.... -> 0x41 (even chunk)
02020202.... -> 0x40 (odd chunk)
03030303.... -> 0x41 (even chunk)
04040404.... -> 0x41 (odd chunk)
This results in data loss: the 64 bytes at address 8192 remain in their default state of 0.

Digging into the code in dma_if_pcie_rd it seems that for some reason this block of code is being activated:

if (ram_mask_1_reg[i]) begin
    ram_wr_cmd_addr_pipe_next[i*RAM_SEG_ADDR_WIDTH +: RAM_SEG_ADDR_WIDTH] = addr_delay_reg[RAM_ADDR_WIDTH-1:RAM_ADDR_WIDTH-RAM_SEG_ADDR_WIDTH]+1;
end

This explains where the +1 in the first address comes from, but I believe there is no reason for the mask register to be asserted here. I have not been able to piece together exactly what the issue is yet, but by removing that block of code I get the following from the ILA:

(ILA screenshot: pcie_wr_ila_good)

The addresses are correct and things seem to be working. I am sure that removing that block of code breaks all sorts of other functionality (probably anything that is not aligned), but luckily what I am doing so far does not require it. I would like to find a full solution if possible, though!
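For what it's worth, the +1 is needed when a transfer starts unaligned and wraps past the last segment of an interleave group, so those segments must write one slot further. A toy model of the 2 x 512-bit interleaved segmented addressing (my own simplification, not the module's actual logic) shows when a segment legitimately needs base address + 1:

```python
# Toy model of interleaved segmented RAM addressing (2 segments x 64 B),
# a simplification of the dma_psdpram layout, not the RTL itself.
SEG_COUNT = 2
SEG_BYTES = 64  # 512-bit segment data width

def seg_slot(byte_addr):
    """Map a byte address to (segment index, segment-local address)."""
    blk = byte_addr // SEG_BYTES
    return blk % SEG_COUNT, blk // SEG_COUNT

def write_plan(start, length):
    """Segments touched by a write, with the seg-local address each uses."""
    plan = {}
    addr = start
    while addr < start + length:
        seg, slot = seg_slot(addr)
        plan.setdefault(seg, slot)
        addr += SEG_BYTES - (addr % SEG_BYTES)  # advance to next segment word
    return plan

base_seg, base_slot = seg_slot(8192)  # segment 0, slot 0x40
# Aligned 128 B write at 8192: both segments use the same slot, no +1.
print(write_plan(8192, 128))
# Write starting one segment in (8192 + 64) wraps into the next interleave
# group, so segment 0 must use base_slot + 1 -- the only case where the
# +1 path should fire.
print(write_plan(8192 + 64, 128))
```

In the reported failure, the +1 was applied to an aligned transfer, which the model shows should never happen; that points at the condition driving ram_mask_1_reg rather than the +1 arithmetic itself.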

Here are the full config parameters of my setup:

AXIS_PCIE_DATA_WIDTH = 512,
AXIS_PCIE_KEEP_WIDTH = 16,
AXIS_PCIE_RC_USER_WIDTH = 161,
AXIS_PCIE_RQ_USER_WIDTH = 137,
AXIS_PCIE_CQ_USER_WIDTH = 183,
AXIS_PCIE_CC_USER_WIDTH = 81,
RC_STRADDLE = 1,
RQ_STRADDLE = 1,
CQ_STRADDLE = 1,
CC_STRADDLE = 1,
RQ_SEQ_NUM_WIDTH = 6,
RQ_SEQ_NUM_ENABLE = 1,
IMM_ENABLE = 0,
IMM_WIDTH = 32,
PCIE_TAG_COUNT = 256,
READ_OP_TABLE_SIZE = 256,
READ_TX_LIMIT = 32,
READ_CPLH_FC_LIMIT = 512,
READ_CPLD_FC_LIMIT = 2048,
WRITE_OP_TABLE_SIZE = 32,
WRITE_TX_LIMIT = 32,
TLP_DATA_WIDTH = 512,
TLP_STRB_WIDTH = 16,
TLP_HDR_WIDTH = 128,
TLP_SEG_COUNT = 1,
TX_SEQ_NUM_COUNT = 2,
TX_SEQ_NUM_WIDTH = 5,
TX_SEQ_NUM_ENABLE = 1,
PF_COUNT = 1,
VF_COUNT = 0,
F_COUNT = PF_COUNT+VF_COUNT,
TLP_FORCE_64_BIT_ADDR = 0,
CHECK_BUS_NUMBER = 0,
RAM_SEL_WIDTH = 2,
RAM_ADDR_WIDTH = 18,
RAM_SEG_COUNT = 2,
RAM_SEG_DATA_WIDTH = 512,
RAM_SEG_BE_WIDTH = 64,
RAM_SEG_ADDR_WIDTH = 11,
AXI_DATA_WIDTH = 512,
AXI_STRB_WIDTH = 64,
AXI_ADDR_WIDTH = 26,
AXI_ID_WIDTH = 8,
PCIE_ADDR_WIDTH = 64,
LEN_WIDTH = 16,
TAG_WIDTH = 8

Update: There seems to be a similar or identical issue with the legacy UltraScale+ design as well.

dma + axi memory

Hi Alex! First of all, congrats and thanks for all your great work on the verilog-* repositories and Corundum.

I have been testing verilog-pcie on an Alveo U25 platform (using AU200 as a baseline), successfully communicating between the host (a workstation running Linux) and the card (the AU25 connected to the host via PCIe). In these tests, I configure the PCIe DMA from the host side via the driver running on the Linux host and move data in both directions (host -> card and card -> host). The data on the card side is stored in the psdpram (embedded in the FPGA). All this uses the "standard fpga" version (not the fpga_axi one).

However, I now find myself in the following situation: I need to move data host -> FPGA, do some processing on the FPGA side, and then move the results back to the host, with data sizes on the order of several MB (> 10 MB). I can no longer use a DMA psdpram embedded in the FPGA because of FPGA resource limitations. My plan is to use the PS DDR (AXI) as the DMA memory instead. Then I wonder:

  • Is the best option for my scenario to use fpga_axi instead? This way the DMA could interface with the PS DDR directly via its AXI ports (HP0, for instance). I know you mention in some issues that it is legacy and less performant because of how AXI works (in fact I think it has been removed from the repo), but in my case I have no choice but to use an AXI interface. Is it still a bad idea to use fpga_axi?
  • I also see an AXI RAM connected to BAR2, but I think I cannot use the DMA in that case and would have to move data with iowrite/ioread word by word from the host (which does not seem feasible performance-wise).
  • Is there any alternative that you would suggest? I was thinking of building a psdp <-> AXI bridge, but I guess I would end up with the same performance as directly using the fpga_axi version, right?

Many thanks and best regards!

Why is there no sgDMA in example_core.v?

Hi Alex, I have gotten a lot out of your videos and your project Corundum.
I have been reading Corundum and this repo for days, and I found that much of the code in Corundum deals with scatter-gather DMA (sgDMA), while there is only a simple block DMA in the verilog-pcie repo.
In fact, I plan to implement an sgDMA for an open-source RDMA design. I wonder why you did not integrate sgDMA into example_core but placed it in Corundum instead. That is very different from RIFFA.
Are there any cons to doing it that way?

AU50 example fails to generate bitstream

Hello, Alex.

I have tried to build the AU50 example, but it fails to generate a bitstream with the following messages.

DRC complains about "Unspecified I/O Standard" and "Unconstrained Logical Port".

Vivado version is 2020.1.

$ make
...
source generate_bit.tcl
# open_project fpga.xpr
Scanning sources...
Finished scanning sources
INFO: [IP_Flow 19-234] Refreshing IP repositories
INFO: [IP_Flow 19-1704] No user IP repositories specified
INFO: [IP_Flow 19-2313] Loaded Vivado IP repository '/tools/Xilinx/Vivado/2020.1/data/ip'.
open_project: Time (s): cpu = 00:00:04 ; elapsed = 00:00:05 . Memory (MB): peak = 2158.355 ; gain = 2.016 ; free physical = 247677 ; free virtual = 255592
# open_run impl_1
INFO: [Device 21-403] Loading part xcu50-fsvh2104-2-e
Netlist sorting complete. Time (s): cpu = 00:00:00.57 ; elapsed = 00:00:00.58 . Memory (MB): peak = 2906.109 ; gain = 0.000 ; free physical = 246452 ; free virtual = 254366
INFO: [Netlist 29-17] Analyzing 670 Unisim elements for replacement
INFO: [Netlist 29-28] Unisim Transformation completed in 0 CPU seconds
INFO: [Project 1-479] Netlist was created with Vivado 2020.1
INFO: [Project 1-570] Preparing netlist for logic optimization
INFO: [Timing 38-478] Restoring timing data from binary archive.
INFO: [Timing 38-479] Binary timing data restore complete.
INFO: [Project 1-856] Restoring constraints from binary archive.
INFO: [Project 1-853] Binary constraint restore complete.
Reading XDEF placement.
Reading placer database...
Reading XDEF routing.
Read XDEF File: Time (s): cpu = 00:00:04 ; elapsed = 00:00:05 . Memory (MB): peak = 3627.258 ; gain = 102.719 ; free physical = 245854 ; free virtual = 253769
Restored from archive | CPU: 4.690000 secs | Memory: 72.418594 MB |
Finished XDEF File Restore: Time (s): cpu = 00:00:05 ; elapsed = 00:00:05 . Memory (MB): peak = 3627.258 ; gain = 102.719 ; free physical = 245854 ; free virtual = 253769
Netlist sorting complete. Time (s): cpu = 00:00:00 ; elapsed = 00:00:00 . Memory (MB): peak = 3627.258 ; gain = 0.000 ; free physical = 245861 ; free virtual = 253776
INFO: [Project 1-111] Unisim Transformation Summary:
  A total of 152 instances were transformed.
  IBUF => IBUF (IBUFCTRL, INBUF): 1 instance
  RAM256X1D => RAM256X1D (MUXF7(x4), MUXF8(x2), RAMD64E(x8)): 54 instances
  RAM256X1S => RAM256X1S (MUXF7(x2), MUXF8, RAMS64E(x4)): 2 instances
  RAM32M16 => RAM32M16 (RAMD32(x14), RAMS32(x2)): 10 instances
  RAM32X1D => RAM32X1D (RAMD32(x2)): 1 instance
  RAM64M8 => RAM64M8 (RAMD64E(x8)): 72 instances
  RAM64X1D => RAM64X1D (RAMD64E(x2)): 12 instances

open_run: Time (s): cpu = 00:00:37 ; elapsed = 00:00:52 . Memory (MB): peak = 3627.258 ; gain = 1468.902 ; free physical = 245861 ; free virtual = 253776
# write_bitstream -force fpga.bit
Command: write_bitstream -force fpga.bit
Attempting to get a license for feature 'Implementation' and/or device 'xcu50'
INFO: [Common 17-349] Got license for feature 'Implementation' and/or device 'xcu50'
Running DRC as a precondition to command write_bitstream
INFO: [IP_Flow 19-1839] IP Catalog is up to date.
INFO: [DRC 23-27] Running DRC with 8 threads
ERROR: [DRC NSTD-1] Unspecified I/O Standard: 3 out of 71 logical ports use I/O standard (IOSTANDARD) value 'DEFAULT', instead of a user assigned specific value. This may cause I/O contention or incompatibility with the board power or connectivity affecting performance, signal integrity or in extreme cases cause damage to the device or the components to which it is connected. To correct this violation, specify all I/O standards. This design will fail to generate a bitstream unless all logical ports have a user specified I/O standard value defined. To allow bitstream creation with unspecified I/O standard values (not recommended), use this command: set_property SEVERITY {Warning} [get_drc_checks NSTD-1].  NOTE: When using the Vivado Runs infrastructure (e.g. launch_runs Tcl command), add this command to a .tcl file and add that file as a pre-hook for write_bitstream step for the implementation run. Problem ports: qsfp_led_stat_y, qsfp_led_act, and qsfp_led_stat_g.
ERROR: [DRC UCIO-1] Unconstrained Logical Port: 3 out of 71 logical ports have no user assigned specific location constraint (LOC). This may cause I/O contention or incompatibility with the board power or connectivity affecting performance, signal integrity or in extreme cases cause damage to the device or the components to which it is connected. To correct this violation, specify all pin locations. This design will fail to generate a bitstream unless all logical ports have a user specified site LOC constraint defined.  To allow bitstream creation with unspecified pin locations (not recommended), use this command: set_property SEVERITY {Warning} [get_drc_checks UCIO-1].  NOTE: When using the Vivado Runs infrastructure (e.g. launch_runs Tcl command), add this command to a .tcl file and add that file as a pre-hook for write_bitstream step for the implementation run.  Problem ports: qsfp_led_stat_y, qsfp_led_act, and qsfp_led_stat_g.
INFO: [DRC REQP-1858] RAMB36E2_writefirst_collision_advisory: Synchronous clocking is detected for BRAM (pcie4c_uscale_plus_inst/inst/pcie_4_0_pipe_inst/pcie_4_0_bram_inst/RAM32K.bram_comp_inst/bram_16k_0_int/ECC_RAM.RAMB36E2[0].ramb36e2_inst) in SDP mode with WRITE_FIRST write-mode. It is strongly suggested to change this mode to NO_CHANGE for best power characteristics. However, both WRITE_FIRST and NO_CHANGE may exhibit address collisions if the same address appears on both read and write ports resulting in unknown or corrupted read data. It is suggested to confirm via simulation that an address collision never occurs and if so it is suggested to try and avoid this situation. If address collisions cannot be avoided, the write-mode may be set to READ_FIRST which guarantees that the read data is the prior contents of the memory at the cost of additional power in the design. See the FPGA Memory Resources User Guide for additional information.
[21 further identical REQP-1858 advisories for the remaining RAMB36E2 instances omitted]
INFO: [Vivado 12-3199] DRC finished with 2 Errors, 22 Advisories
INFO: [Vivado 12-3200] Please refer to the DRC report (report_drc) for more information.
ERROR: [Vivado 12-1345] Error(s) found during DRC. Bitgen not run.
INFO: [Common 17-83] Releasing license: Implementation
28 Infos, 0 Warnings, 0 Critical Warnings and 3 Errors encountered.
write_bitstream failed
write_bitstream: Time (s): cpu = 00:00:24 ; elapsed = 00:00:11 . Memory (MB): peak = 4017.777 ; gain = 390.520 ; free physical = 245743 ; free virtual = 253658
ERROR: [Common 17-39] 'write_bitstream' failed due to earlier errors.

    while executing
"write_bitstream -force fpga.bit"
    (file "generate_bit.tcl" line 3)
INFO: [Common 17-206] Exiting Vivado at Fri Jul 24 14:14:00 2020...
../common/vivado.mk:110: recipe for target 'fpga.bit' failed
make[1]: *** [fpga.bit] Error 1
make[1]: Leaving directory '/home/tkheo/verilog-pcie/example/AU50/fpga_axi/fpga'
Makefile:14: recipe for target 'fpga' failed
make: *** [fpga] Error 2

Is PCIe Gen4/Gen5 supported?

Hi, thanks for this great project. I see there is a reference design for the Intel DE10-Agilex, and I have rebuilt it with Quartus, but its PCIe rate is still Gen3 x16, so I wonder whether this design supports PCIe Gen4/Gen5 or not?

What is the top-level module name for the Verilog files in the rtl folder?

I am trying to synthesize all the Verilog files in the rtl folder, but there are multiple top-level modules, so I am getting errors saying "Multiple designs are available. Specify the design you want to use". Can you tell me the top-level module name, or am I missing the top-level Verilog file? My RTL list is given below:
arbiter.v \
axis_arb_mux.v \
dma_client_axis_sink.v \
dma_client_axis_source.v \
dma_if_axi.v \
dma_if_axi_rd.v \
dma_if_axi_wr.v \
dma_if_desc_mux.v \
dma_if_mux.v \
dma_if_mux_rd.v \
dma_if_mux_wr.v \
dma_if_pcie.v \
dma_if_pcie_rd.v \
dma_if_pcie_wr.v \
dma_if_pcie_us.v \
dma_if_pcie_us_rd.v \
dma_if_pcie_us_wr.v \
dma_psdpram.v \
dma_psdpram_async.v \
dma_ram_demux.v \
dma_ram_demux_rd.v \
dma_ram_demux_wr.v \
pcie_axi_dma_desc_mux.v \
pcie_axi_master.v \
pcie_axi_master_rd.v \
pcie_axi_master_wr.v \
pcie_axil_master.v \
pcie_axil_master_minimal.v \
pcie_msix.v \
pcie_ptile_cfg.v \
pcie_ptile_if.v \
pcie_ptile_if_rx.v \
pcie_ptile_if_tx.v \
pcie_s10_cfg.v \
pcie_s10_if.v \
pcie_s10_if_rx.v \
pcie_s10_if_tx.v \
pcie_s10_msi.v \
pcie_tlp_demux.v \
pcie_tlp_demux_bar.v \
pcie_tlp_fifo.v \
pcie_tlp_fifo_raw.v \
pcie_tlp_fifo_mux.v \
pcie_tlp_mux.v \
pcie_us_axi_dma.v \
pcie_us_axi_dma_rd.v \
pcie_us_axi_dma_wr.v \
pcie_us_axi_master.v \
pcie_us_axi_master_rd.v \
pcie_us_axi_master_wr.v \
pcie_us_axil_master.v \
pcie_us_axis_cq_demux.v \
pcie_us_axis_rc_demux.v \
pcie_us_cfg.v \
pcie_us_if.v \
pcie_us_if_cc.v \
pcie_us_if_cq.v \
pcie_us_if_rc.v \
pcie_us_if_rq.v \
pcie_us_msi.v \
priority_encoder.v \
pulse_merge.v

About pcie_us_if

Hello!
I have just started learning your code, and one question confuses me. Does the pcie_us_if module correspond to the PCIe integrated block in UltraScale+, i.e. the left part of the diagram below? If so, why is its AXI-Stream direction different from what is described in the product guide?

Example host code on top of the kernel module?

Hi Alex,

After I've managed to compile the kernel module and load it using 'insmod', is there a recommended host C program that can make use of the kernel module (example.ko) and perform some simple read and write tests?

Thank you for your time!

Send immediate data to host?

I wonder what the best way is to send a small amount of immediate data (say, 4 bytes of data not in the RAM) to the host?
Would it be reasonable to add s_axis_write_desc_imm_data/s_axis_write_desc_imm_data_enable to dma_if_pcie_wr so it can bypass the RAM read?

Using both DMA and separate AXI Slave as PCIe requester?

I've currently got a design set up using the pcie_us_axi_dma as the sole user of the PCIe requester interface. This works just as I'd expect and I can DMA between the device and the host, but the design has changed and now calls for the ability to have the device access the host's memory using individual AXI transactions from a separate AXI slave module.

I realize that having a DMA engine basically lets me do the same thing as having the device access CPU memory directly, but the behavior of parts of the system external to the design is forcing my hand a bit here. The device must be able to handle AXI transactions generated by the design that potentially cross the PCIe interface and end up in the host's address space.

Is there any way using the library as it is now to split the RQ and RC interfaces and have them shared by two separate users? Are there any plans to add a drop-in AXI slave module that would let the design treat the entire PCIe host address space as an AXI bus?

cocotb installation commands

The commands below work on my system (Ubuntu 18.04):

apt-get install python3-pip
pip3 install cocotb
pip3 install cocotb_test
pip3 install cocotb_bus
pip3 install cocotbext-pcie
pip3 install cocotbext-axi

add-apt-repository ppa:team-electronics/ppa
apt-get update
apt-get install iverilog

Doc?

Hi, this repo looks potentially very interesting and comprehensive. I would strongly prefer a bit more documentation on how to use it (or a YouTube video, if that is easier).

64bit write/read instead of 32bit

Hello Alex,

I want to write 64 bits at a time instead of doing 32-bit writes.
I have updated the design as well as the driver for 64-bit access using BAR2.
Can you please tell me what changes I need to check for 64-bit writes/reads?
For now I'm using writel instead of iowrite32.

bug in dma_if_pcie_rd when max read request size is set to 4096 bytes

Hi Alex,
please consider this situation:

  1. max read request size is set to 4096 bytes.
  2. request a dma read from any address that is 4K aligned (say 0x1000), dma length is 4096 bytes.

Normally only one MRd TLP should be sent, but dma_if_pcie_rd generates two TLPs: one MRd from address 0x1000 and another MRd from address 0x2000, both with a DWORD count of 1024 (4096 bytes). The latter MRd will corrupt the data.

I think the cause is in dma_if_pcie_rd.v, line 675:

req_last_tlp = (((req_pcie_addr_reg & 12'hfff) + (req_op_count_reg & 12'hfff)) & 12'hfff) == 0 && req_op_count_reg >> 12 == 0;

In the above situation, "(((req_pcie_addr_reg & 12'hfff) + (req_op_count_reg & 12'hfff)) & 12'hfff) == 0" is true. req_op_count_reg is 4096, hence "req_op_count_reg >> 12" equals 1, so "req_op_count_reg >> 12 == 0" is false, req_last_tlp is not set to 1, and an unexpected TLP is generated.

Is it OK to delete "&& req_op_count_reg >> 12 == 0" in line 675?
req_op_count_reg must be no greater than the max read request size to reach line 675 (see line 663), and the maximum value of the max read request size is 4096. "req_op_count_reg >> 12 == 0" is false only when req_op_count_reg equals 4096.
In the case that req_op_count_reg equals 4096, there is a chance that it is the last TLP (the situation above), and "(((req_pcie_addr_reg & 12'hfff) + (req_op_count_reg & 12'hfff)) & 12'hfff) == 0" alone will determine whether it is the last TLP.
In the other cases, where req_op_count_reg is less than 4096, "req_op_count_reg >> 12 == 0" is always true, so there is no need to evaluate it.

Please tell me if I was wrong.
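The arithmetic above can be checked directly. Below is a minimal Python sketch of the last-TLP condition; the function and variable names are hypothetical mirrors of the signals in dma_if_pcie_rd.v, and the `fixed` variant reflects the deletion proposed above:

```python
def req_last_tlp(pcie_addr, op_count, fixed=False):
    # Model of the test at dma_if_pcie_rd.v line 675: the transfer ends
    # exactly on a 4 KB boundary and (in the original form) the remaining
    # op count is below 4096 bytes.
    wraps_to_boundary = ((pcie_addr & 0xFFF) + (op_count & 0xFFF)) & 0xFFF == 0
    if fixed:
        # proposed fix: drop the trailing 'req_op_count_reg >> 12 == 0' term
        return wraps_to_boundary
    return wraps_to_boundary and (op_count >> 12) == 0

# 4096-byte read from 4K-aligned address 0x1000 with MRRS = 4096:
print(req_last_tlp(0x1000, 4096))              # False - extra TLP generated
print(req_last_tlp(0x1000, 4096, fixed=True))  # True  - single MRd TLP

# For op counts below 4096 the two forms agree, so the change is safe there
print(all(req_last_tlp(0x1000, n) == req_last_tlp(0x1000, n, fixed=True)
          for n in range(1, 4096)))            # True
```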

dma_read_desc_status_valid not asserted when requesting memory read length > 8

My testbench is based on verilog-pcie and S10PcieDevice, with max_payload_size=0x1 and max_read_request_size=0x2:

    self.rc = RootComplex()
    self.rc.max_payload_size = 0x1  # 256 bytes
    self.rc.max_read_request_size = 0x2  # 512 bytes

    self.dev = S10PcieDevice()

I found that when I request a DMA memory read of length 8 (m_axis_dma_read_desc_len=0x20), it works fine and the output port dma_read_desc_status_valid from dma_if_rd toggles.

But when I increase m_axis_dma_read_desc_len to 0x40, dma_read_desc_status_valid stays stuck at 0.

Here is the log for the two cases. Please comment. Thanks.

=================== memory read length == 8, dma_read_desc_status_valid  asserted =====================
#   3760.00ns INFO     RX frame: S10PcieFrame(data=[0x00000008, 0x010000ff, 0x00000000], parity=[0x0, 0x0, 0x0], func_num=0, vf_num=None, bar_range=0, err=0)
#   3772.54ns INFO     Memory read, address 0x00000000, length 8, BE 0xf/0xf, tag 0
#   3789.14ns INFO     TX frame: S10PcieFrame(data=[0x4a000008, 0x00000020, 0x01000000, 0x00080000, 0x00000000, 0x00000800, 0x00000000, 0x00080800, 0x00000000, 0x00000800, 0x00000000], parity=[0x6, 0xe, 0x7, 0xb, 0xf, 0xd, 0xf, 0x9, 0xf, 0xd, 0xf], func_num=0, vf_num=None, bar_range=0, err=0)
#   3864.00ns INFO     RX frame: S10PcieFrame(data=[0x00000009, 0x0100013c, 0x00000000], parity=[0x0, 0x0, 0x0], func_num=0, vf_num=None, bar_range=0, err=0)
#   3876.54ns INFO     Memory read, address 0x00000000, length 9, BE 0xc/0x3, tag 1
#   3893.65ns INFO     TX frame: S10PcieFrame(data=[0x4a000009, 0x00000020, 0x01000102, 0x00080000, 0x00000000, 0x00000800, 0x00000000, 0x00080800, 0x00000000, 0x00000800, 0x00000000, 0x00081000], parity=[0x7, 0xe, 0x4, 0xb, 0xf, 0xd, 0xf, 0x9, 0xf, 0xd, 0xf, 0x9], func_num=0, vf_num=None, bar_range=0, err=0)
#   3972.00ns INFO     RX frame: S10PcieFrame(data=[0x00000008, 0x010002ff, 0x00000004], parity=[0x0, 0x0, 0x0], func_num=0, vf_num=None, bar_range=0, err=0)
#   3984.54ns INFO     Memory read, address 0x00000004, length 8, BE 0xf/0xf, tag 2
#   4001.14ns INFO     TX frame: S10PcieFrame(data=[0x4a000008, 0x00000020, 0x01000204, 0x00000000, 0x00000800, 0x00000000, 0x00080800, 0x00000000, 0x00000800, 0x00000000, 0x00081000], parity=[0x6, 0xe, 0x4, 0xf, 0xd, 0xf, 0x9, 0xf, 0xd, 0xf, 0x9], func_num=0, vf_num=None, bar_range=0, err=0)
#   4076.00ns INFO     RX frame: S10PcieFrame(data=[0x00000009, 0x0100033c, 0x00000004], parity=[0x0, 0x0, 0x0], func_num=0, vf_num=None, bar_range=0, err=0)
#   4088.54ns INFO     Memory read, address 0x00000004, length 9, BE 0xc/0x3, tag 3


=================== memory read length == 16, dma_read_desc_status_valid  not asserted =====================
#   3760.00ns INFO     RX frame: S10PcieFrame(data=[0x00000010, 0x010000ff, 0x00000000], parity=[0x0, 0x0, 0x0], func_num=0, vf_num=None, bar_range=0, err=0)
#   3772.54ns INFO     Memory read, address 0x00000000, length 16, BE 0xf/0xf, tag 0
#   3793.20ns INFO     TX frame: S10PcieFrame(data=[0x4a000010, 0x00000040, 0x01000000, 0x00080000, 0x00000000, 0x00000800, 0x00000000, 0x00080800, 0x00000000, 0x00000800, 0x00000000, 0x00081000, 0x00000000, 0x00000800, 0x00000000, 0x00081800, 0x00000000, 0x00000800, 0x00000000], parity=[0x6, 0xe, 0x7, 0xb, 0xf, 0xd, 0xf, 0x9, 0xf, 0xd, 0xf, 0x9, 0xf, 0xd, 0xf, 0xb, 0xf, 0xd, 0xf], func_num=0, vf_num=None, bar_range=0, err=0)
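As an aside, the encoded max_payload_size and max_read_request_size values in the snippet above map to byte counts as 128 << n (per the PCIe configuration field encoding), which is where the 256-byte and 512-byte comments come from. A quick sketch, where `cfg_size_bytes` is a hypothetical helper name:

```python
def cfg_size_bytes(encoded):
    # Decode the 3-bit PCIe max payload size / max read request size field:
    # 0 -> 128 B, 1 -> 256 B, 2 -> 512 B, ..., 5 -> 4096 B
    if not 0 <= encoded <= 5:
        raise ValueError("valid encodings are 0 through 5")
    return 128 << encoded

print(cfg_size_bytes(0x1))  # 256 - matches the max_payload_size comment
print(cfg_size_bytes(0x2))  # 512 - matches the max_read_request_size comment
```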

Request for U200 Support

I noticed that you recently added U280 and U50 support to your project. Are you in the process of adding U200 support, or what could we do to migrate either your U50 or U280 example design to our U200 board? I see that your pcie4 Tcl script for the U280 example has some parameters that are too wide bit-wise for the U200 pcie4 block. Is it as simple as modifying this Tcl script and keeping everything else the same?

Any help is appreciated, thanks!

vivado version

I tried to build the VCU1525 example design with the Makefile using Vivado 2019.2, but it fails to generate a bitstream. Which Vivado version should I use for this project?

Here is the error message:
[DRC NSTD-1] Unspecified I/O Standard: 28 out of 42 logical ports use I/O standard (IOSTANDARD) value 'DEFAULT', instead of a user assigned specific value. This may cause I/O contention or incompatibility with the board power or connectivity affecting performance, signal integrity or in extreme cases cause damage to the device or the components to which it is connected. To correct this violation, specify all I/O standards. This design will fail to generate a bitstream unless all logical ports have a user specified I/O standard value defined. To allow bitstream creation with unspecified I/O standard values (not recommended), use this command: set_property SEVERITY {Warning} [get_drc_checks NSTD-1]. NOTE: When using the Vivado Runs infrastructure (e.g. launch_runs Tcl command), add this command to a .tcl file and add that file as a pre-hook for write_bitstream step for the implementation run. Problem ports: pcie_tx_n[15], pcie_tx_n[14], pcie_tx_n[13], pcie_tx_n[12], pcie_tx_n[11], pcie_tx_n[10], pcie_tx_n[9], pcie_tx_n[8], pcie_tx_n[7], pcie_tx_n[6], pcie_tx_n[5], pcie_tx_n[4], pcie_tx_n[3], pcie_tx_n[2], pcie_tx_p[15]... and (the first 15 of 28 listed).

thanks

unexpected dma read request logged

From the log I see one request:

#   5618.12ns INFO     Memory read, address 0x00001000, length 28, BE 0xf/0xf, tag 30

However, I logged each request on dma_if_pcie_rd as follows and found no such request at all.

// Log every read descriptor accepted on the dma_if_pcie_rd input
always @(posedge clk) begin
    if (s_axis_read_desc_valid && s_axis_read_desc_ready) begin
        $display("[%0t] dma_if_pcie_rd: read dma_addr %0h len %0h", $stime,
            s_axis_read_desc_pcie_addr,
            s_axis_read_desc_len);
    end
end

No request with address 0x00001000 or length 28 was logged.
What could be going wrong?
