
Comments (9)

cliffordwolf commented on July 18, 2024

Yes. IMO the lack of support for BRAM initialization is currently one of the biggest showstoppers for FPGA flows. I am aiming to have BRAM initialization support in Yosys 0.6, i.e. some time within the next few months, or sooner if you follow the git head.

from yosys.

cliffordwolf commented on July 18, 2024

Just for clarification: $readmemb is supported in Yosys. It just prevents BRAM instantiation, so the memory is implemented with logic gates and flip-flops.

Do you have a problem in read_verilog with $readmemb (syntax error, etc)?
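
For readers following along, here is a minimal sketch of the pattern under discussion. It is not taken from the reporter's design: the module name, the sizes, and the file name init.data are made up for illustration.

```verilog
// Hypothetical example: rom_example, its ports, and "init.data" are illustrative only.
module rom_example (input clk, input [3:0] addr, output reg [7:0] data);
  // 16 words of 8 bits each.
  reg [7:0] mem [0:15];

  // Load the initial contents from a text file of binary words.
  initial $readmemb("init.data", mem);

  // Synchronous read, the usual pattern for block RAM inference.
  always @(posedge clk)
    data <= mem[addr];
endmodule
```

As described above, Yosys at this point accepts code like this but builds the array from flip-flops and logic gates rather than a block RAM.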


fabrizioferrandi commented on July 18, 2024

No, I do not have problems with $readmemb. Anyway, I've got the impression that it is better to pass -nomem2reg to read_verilog. This seems to avoid some problems with multi-port inference. Is my feeling supported by some facts?

For clarity, the actual script I used is a Tcl script passed to Vivado 2014.4, with Yosys used as the RTL synthesizer. Here is the script:

set outputDir HLS_output//Synthesis/vivado_flow
file mkdir $outputDir
create_project main_minimal_interface -force
chan configure stdout -buffering none
exec >&@stdout yosys -p "read_verilog -nomem2reg -defer top.v" -p "synth_xilinx -edif HLS_output//Synthesis/vivado_flow/main_minimal_interface.edif -top main_minimal_interface"
read_xdc HLS_output//Synthesis/vivado_flow/main_minimal_interface.sdc
read_edif $outputDir/main_minimal_interface.edif
link_design -top main_minimal_interface -part xc7z020clg484-1
write_checkpoint -force $outputDir/post_synth.dcp
report_timing_summary -file $outputDir/post_synth_timing_summary.rpt
report_utilization -file $outputDir/post_synth_util.rpt
opt_design
report_utilization -file $outputDir/post_opt_design_util.rpt
place_design -directive Explore
report_clock_utilization -file $outputDir/clock_util.rpt
# Optionally run optimization if there are timing violations after placement
if {[get_property SLACK [get_timing_paths -max_paths 1 -nworst 1 -setup]] < 0} {
  puts "Found setup timing violations => running physical optimization"
  phys_opt_design
}
write_checkpoint -force $outputDir/post_place.dcp
report_utilization -file $outputDir/post_place_util.rpt
report_timing_summary -file $outputDir/post_place_timing_summary.rpt
route_design -directive Explore
write_checkpoint -force $outputDir/post_route.dcp
report_route_status -file $outputDir/post_route_status.rpt
report_timing_summary -file $outputDir/post_route_timing_summary.rpt
report_power -file $outputDir/post_route_power.rpt
report_drc -file $outputDir/post_imp_drc.rpt
report_utilization -file $outputDir/post_route_util.rpt
close_design
close_project


cliffordwolf commented on July 18, 2024

> Anyway, I've got the impression that it is better to pass -nomem2reg to read_verilog. This seems to avoid some problems with multi-port inference. Is my feeling supported by some facts?

The mem2reg mechanism detects cases that cannot be implemented using the current Yosys memory model and transforms those memories to registers directly in the Verilog front-end, without going through the memory representation. If you call the front-end with -nomem2reg (or set the nomem2reg attribute on a module or array), you disable this mechanism and force use of the memory representation. This can (and usually will) result in simulation-synthesis mismatches.
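
As a sketch of the attribute form mentioned above (the module and signal names are hypothetical; the attribute name is the one given in the Yosys README):

```verilog
// Hypothetical example: attr_example and its ports are illustrative only.
module attr_example (input clk, input we, input [1:0] addr,
                     input [7:0] din, output reg [7:0] dout);
  // Ask the front-end not to convert this array into separate registers.
  (* nomem2reg *) reg [7:0] mem [0:3];

  always @(posedge clk) begin
    if (we)
      mem[addr] <= din;   // non-blocking write
    dout <= mem[addr];    // synchronous read
  end
endmodule
```

This array is only written with non-blocking assignments and read synchronously, so it presumably would not trigger the conversion anyway. The attribute is shown purely for its syntax.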

From the README:

- The "nomem2reg" attribute on modules or arrays prohibits the
  automatic early conversion of arrays to separate registers. This
  is potentially dangerous. Usually the front-end has good reasons
  for converting an array to a list of registers. Prohibiting this
  step will likely result in incorrect synthesis results.

For example:

module test (input clk, a, output reg b);
  reg mem [3:0];
  always @(posedge clk) begin
    mem[0] = a;
    b = mem[0];
  end
endmodule

This module introduces a delay of one clock cycle from a to b when synthesized without -nomem2reg (which is correct) and a delay of two clock cycles when synthesized with -nomem2reg (which is incorrect). You should only use -nomem2reg when you are sure that the mem2reg mechanism incorrectly determined that the conversion to registers was necessary.
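
The one-cycle behaviour can be checked in simulation with a small self-checking testbench. This is only a sketch: tb_test, the clock period, and the stimulus are assumptions added for illustration, and only the test module itself comes from the example above.

```verilog
`timescale 1ns / 1ps
// The module from the example above, repeated so the file is self-contained.
module test (input clk, a, output reg b);
  reg mem [3:0];
  always @(posedge clk) begin
    mem[0] = a;   // blocking write ...
    b = mem[0];   // ... so this read sees the value just written
  end
endmodule

// Hypothetical testbench: names and timing are illustrative only.
module tb_test;
  reg clk = 0;
  reg a = 0;
  wire b;

  test dut (.clk(clk), .a(a), .b(b));

  // Free-running clock with a 10 ns period.
  always #5 clk = ~clk;

  initial begin
    @(negedge clk) a = 1;   // change a between rising edges
    @(posedge clk) #1;      // wait for the next rising edge, then sample
    if (b === 1'b1)
      $display("PASS: b follows a after one clock edge");
    else
      $display("FAIL: b = %b after one clock edge", b);
    $finish;
  end
endmodule
```

Running the same testbench against the netlists Yosys produces with and without -nomem2reg is one way to observe the mismatch described above.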

If you have a case where mem2reg triggers incorrectly (i.e. -nomem2reg produces correct (and hopefully also better) results because the memory can in fact be described with the current yosys memory model), then I'd be interested in a test case.


fabrizioferrandi commented on July 18, 2024

I have a test case where a module inferring a BRAM with initialization is not properly synthesized into flip-flops and gates.
After reading the netlist, Vivado stops during the opt_design command (e.g., Phase 3 Sweep) because it finds multi-driven nets. The same Verilog is synthesized correctly by Vivado with the synth_design command.

Here is the Verilog code:

// File automatically generated by: PandA framework version=0.9.3-dev
// Send any bug to: [email protected]
// ************************************************************************
// The following text holds for all the components tagged with PANDA_GPLv3.
// 
// This hardware description is free; you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation; either version 3, or (at your option)
// any later version.
// 
// This hardware description is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
// or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
// for more details.
// 
// You should have received a copy of the GNU General Public License
// along with the PandA framework; see the files COPYING
// If not, see <http://www.gnu.org/licenses/>.
// ************************************************************************

`ifdef __ICARUS__
  `define _SIM_HAVE_CLOG2
`endif
`ifdef VERILATOR
  `define _SIM_HAVE_CLOG2
`endif
`ifdef MODEL_TECH
  `define _SIM_HAVE_CLOG2
`endif
`ifdef VCS
  `define _SIM_HAVE_CLOG2
`endif
`ifdef NCVERILOG
  `define _SIM_HAVE_CLOG2
`endif
`ifdef XILINX_SIMULATOR
  `define _SIM_HAVE_CLOG2
`endif
`ifdef XILINX_ISIM
  `define _SIM_HAVE_CLOG2
`endif

// This component is part of the BAMBU/PANDA IP LIBRARY
// Copyright (C) 2004-2015 Politecnico di Milano
// Author(s): Fabrizio Ferrandi <[email protected]>
// License: PANDA_GPLv3
`timescale 1ns / 1ps
module ADDRESS_DECODING_LOGIC(clock, reset, in1, in2, in3, sel_LOAD, sel_STORE, S_oe_ram, S_we_ram, S_addr_ram, S_Wdata_ram, Sin_Rdata_ram, S_data_ram_size, Sin_DataRdy, out1, Sout_Rdata_ram, Sout_DataRdy, proxy_in1, proxy_in2, proxy_in3, proxy_sel_LOAD, proxy_sel_STORE, proxy_out1, dout_a, dout_b, memory_addr_a, memory_addr_b, din_value_aggregated, be, bram_write);
  parameter BITSIZE_in1=1, BITSIZE_in2=1, BITSIZE_in3=1, BITSIZE_out1=1, BITSIZE_S_addr_ram=1, BITSIZE_S_Wdata_ram=8, BITSIZE_Sin_Rdata_ram=8, BITSIZE_Sout_Rdata_ram=8, BITSIZE_S_data_ram_size=1, address_space_begin=0, address_space_rangesize=4, BUS_PIPELINED=1, BRAM_BITSIZE=32, PRIVATE_MEMORY=0, USE_SPARSE_MEMORY=1, HIGH_LATENCY=0, BITSIZE_proxy_in1=1, BITSIZE_proxy_in2=1, BITSIZE_proxy_in3=1, BITSIZE_proxy_out1=1, BITSIZE_dout_a=1, BITSIZE_dout_b=1, BITSIZE_memory_addr_a=1, BITSIZE_memory_addr_b=1, BITSIZE_din_value_aggregated=1, BITSIZE_be=1, nbit_read_addr=32, n_byte_on_databus=4, n_mem_elements=4, n_bytes=4;
  // IN
  input clock;
  input reset;
  input [BITSIZE_in1-1:0] in1;
  input [BITSIZE_in2-1:0] in2;
  input [BITSIZE_in3-1:0] in3;
  input sel_LOAD;
  input sel_STORE;
  input S_oe_ram;
  input S_we_ram;
  input [BITSIZE_S_addr_ram-1:0] S_addr_ram;
  input [BITSIZE_S_Wdata_ram-1:0] S_Wdata_ram;
  input [BITSIZE_Sin_Rdata_ram-1:0] Sin_Rdata_ram;
  input [BITSIZE_S_data_ram_size-1:0] S_data_ram_size;
  input Sin_DataRdy;
  input [BITSIZE_proxy_in1-1:0] proxy_in1;
  input [BITSIZE_proxy_in2-1:0] proxy_in2;
  input [BITSIZE_proxy_in3-1:0] proxy_in3;
  input proxy_sel_LOAD;
  input proxy_sel_STORE;
  input [BITSIZE_dout_a-1:0] dout_a;
  input [BITSIZE_dout_b-1:0] dout_b;
  // OUT
  output [BITSIZE_out1-1:0] out1;
  output [BITSIZE_Sout_Rdata_ram-1:0] Sout_Rdata_ram;
  output Sout_DataRdy;
  output [BITSIZE_proxy_out1-1:0] proxy_out1;
  output [BITSIZE_memory_addr_a-1:0] memory_addr_a;
  output [BITSIZE_memory_addr_b-1:0] memory_addr_b;
  output [BITSIZE_din_value_aggregated-1:0] din_value_aggregated;
  output [BITSIZE_be-1:0] be;
  output bram_write;
  `ifndef _SIM_HAVE_CLOG2
    function integer log2;
       input integer value;
       integer temp_value;
      begin
        temp_value = value-1;
        for (log2=0; temp_value>0; log2=log2+1)
          temp_value = temp_value>>1;
      end
    endfunction
  `endif
  `ifdef _SIM_HAVE_CLOG2
    parameter nbit_addr = BITSIZE_S_addr_ram/*n_bytes ==  1 ? 1 : $clog2(n_bytes)*/;
    parameter nbits_address_space_rangesize = $clog2(address_space_rangesize);
    parameter nbits_byte_offset = n_byte_on_databus==1 ? 1 : $clog2(n_byte_on_databus);
  `else
    parameter nbit_addr = BITSIZE_S_addr_ram/*n_bytes ==  1 ? 1 : log2(n_bytes)*/;
    parameter nbits_address_space_rangesize = log2(address_space_rangesize);
    parameter nbits_byte_offset = n_byte_on_databus==1 ? 1 : log2(n_byte_on_databus);
  `endif


  function [n_byte_on_databus*2-1:0] CONV;
    input [n_byte_on_databus*2-1:0] po2;
  begin
    case (po2)
      1:CONV=(1<<1)-1;
      2:CONV=(1<<2)-1;
      4:CONV=(1<<4)-1;
      8:CONV=(1<<8)-1;
      16:CONV=(1<<16)-1;
      32:CONV=(1<<32)-1;
      default:CONV=-1;
    endcase
  end
  endfunction
  wire [2*BRAM_BITSIZE-1:0] dout;
  wire [2*BRAM_BITSIZE-1:0] out1_shifted;
  wire [2*BRAM_BITSIZE-1:0] S_Wdata_ram_int;
  wire cs, oe_ram_cs, we_ram_cs;
  wire [n_byte_on_databus*2-1:0] conv_in;
  wire [n_byte_on_databus*2-1:0] conv_out;
  wire [nbits_byte_offset-1:0] byte_offset;
  wire [BITSIZE_in2-1:0] tmp_addr;
  wire [nbit_addr-1:0] relative_addr;

  reg we_ram_cs_delayed;
  reg oe_ram_cs_delayed;
  reg oe_ram_cs_delayed_registered;
  reg [nbits_byte_offset-1:0] delayed_byte_offset;
  reg [nbits_byte_offset-1:0] delayed_byte_offset_registered;

  assign tmp_addr = (proxy_sel_LOAD||proxy_sel_STORE) ? proxy_in2 : in2;

  generate
  genvar j0_a;
    for (j0_a=0; j0_a<n_byte_on_databus; j0_a=j0_a+1)
    begin  : dout_a_computation
      assign dout[(j0_a+1)*8-1:j0_a*8] = dout_a[(j0_a+1)*8-1:j0_a*8];
    end
  endgenerate

  generate
  genvar j0_b;
    for (j0_b=0; j0_b<n_byte_on_databus; j0_b=j0_b+1)
    begin  : dout_b_computation
      assign dout[(j0_b+n_byte_on_databus+1)*8-1:(j0_b+n_byte_on_databus)*8] = dout_b[(j0_b+1)*8-1:j0_b*8];
    end
  endgenerate

  generate
    if(PRIVATE_MEMORY==0 && USE_SPARSE_MEMORY==0)
      assign cs = (S_addr_ram >= (address_space_begin)) && (S_addr_ram < (address_space_begin+address_space_rangesize));
    else if(PRIVATE_MEMORY==0)
      assign cs = S_addr_ram[nbit_addr-1:nbits_address_space_rangesize] == address_space_begin[nbit_addr-1:nbits_address_space_rangesize];
    else
      assign cs = 1'b0;
  endgenerate
  assign oe_ram_cs = S_oe_ram && cs;
  assign we_ram_cs = S_we_ram && cs;
  generate
    if(PRIVATE_MEMORY==0 && USE_SPARSE_MEMORY==0)
      assign relative_addr = (sel_STORE===1'b1 || sel_LOAD===1'b1 || proxy_sel_STORE===1'b1 || proxy_sel_LOAD===1'b1) ? tmp_addr-address_space_begin[nbit_addr-1:0] : S_addr_ram-address_space_begin[nbit_addr-1:0];
    else if(PRIVATE_MEMORY==0)
      assign relative_addr = (sel_STORE===1'b1 || sel_LOAD===1'b1 || proxy_sel_STORE===1'b1 || proxy_sel_LOAD===1'b1) ? tmp_addr[nbits_address_space_rangesize-1:0] : S_addr_ram[nbits_address_space_rangesize-1:0];
    else if(USE_SPARSE_MEMORY==1)
      assign relative_addr = tmp_addr[nbits_address_space_rangesize-1:0];
    else
      assign relative_addr = tmp_addr-address_space_begin[nbit_addr-1:0];
  endgenerate

  generate
    if (n_mem_elements==1)
      assign memory_addr_a = {nbit_read_addr{1'b0}};
    else if(n_byte_on_databus==1)
      assign memory_addr_a = relative_addr[nbit_read_addr-1:0];
    else
      assign memory_addr_a = relative_addr[nbit_read_addr+nbits_byte_offset-1:nbits_byte_offset];
  endgenerate

  generate
    if (n_bytes <= BRAM_BITSIZE/8)
      assign memory_addr_b = {nbit_read_addr{1'b0}};
    else if(n_byte_on_databus==1)
      assign memory_addr_b = relative_addr[nbit_read_addr-1:0] + 1'b1;
    else
      assign memory_addr_b = relative_addr[nbit_read_addr+nbits_byte_offset-1:nbits_byte_offset] + 1'b1;
  endgenerate

  generate
    if (n_byte_on_databus==1)
      assign byte_offset = {nbits_byte_offset{1'b0}};
    else
      assign byte_offset = relative_addr[nbits_byte_offset-1:0];
  endgenerate

  generate
    if(PRIVATE_MEMORY==0)
    begin
      assign conv_in = proxy_sel_STORE ? proxy_in3[BITSIZE_proxy_in3-1:3] : (sel_STORE ? in3[BITSIZE_in3-1:3] : S_data_ram_size[BITSIZE_S_data_ram_size-1:3]);
      assign conv_out = CONV(conv_in);
      assign be = conv_out << byte_offset;
    end
    else
    begin
      assign conv_in = proxy_sel_STORE ? proxy_in3[BITSIZE_proxy_in3-1:3] : in3[BITSIZE_in3-1:3];
      assign conv_out = CONV(conv_in);
      assign be = conv_out << byte_offset;
    end
  endgenerate

  generate
    if (BITSIZE_S_Wdata_ram < 2*BRAM_BITSIZE)
      assign S_Wdata_ram_int = {{2*BRAM_BITSIZE-BITSIZE_S_Wdata_ram{1'b0}}, S_Wdata_ram};
    else
      assign S_Wdata_ram_int = S_Wdata_ram[2*BRAM_BITSIZE-1:0];
  endgenerate

  generate
    if(PRIVATE_MEMORY==0)
      assign din_value_aggregated = proxy_sel_STORE ? proxy_in1 << byte_offset*8 : (sel_STORE ? in1 << byte_offset*8 : S_Wdata_ram_int << byte_offset*8);
    else
      assign din_value_aggregated = proxy_sel_STORE ? proxy_in1 << byte_offset*8 : in1 << byte_offset*8;
  endgenerate

  assign out1_shifted = dout >> delayed_byte_offset*8;
  assign out1 = out1_shifted[BITSIZE_out1-1:0];
  assign proxy_out1 = out1_shifted[BITSIZE_proxy_out1-1:0];

  always @(posedge clock or negedge reset)
  begin
    if(!reset)
    begin
      oe_ram_cs_delayed <= 1'b0;
      if(HIGH_LATENCY != 0) oe_ram_cs_delayed_registered <= 1'b0;
    end
    else
    begin
      if(HIGH_LATENCY == 0)
      begin
        oe_ram_cs_delayed <= oe_ram_cs & (!oe_ram_cs_delayed | BUS_PIPELINED);
      end
      else
      begin
        oe_ram_cs_delayed_registered <= oe_ram_cs & ((!oe_ram_cs_delayed_registered & !oe_ram_cs_delayed) | BUS_PIPELINED);
        oe_ram_cs_delayed <= oe_ram_cs_delayed_registered;
      end
    end
  end

  always @(posedge clock)
  begin
    if(HIGH_LATENCY == 0)
      delayed_byte_offset <= byte_offset;
    else
    begin
      delayed_byte_offset_registered <= byte_offset;
      delayed_byte_offset <= delayed_byte_offset_registered;
    end
  end

  always @(posedge clock or negedge reset)
  begin
    if(!reset)
      we_ram_cs_delayed <= 1'b0;
    else
      we_ram_cs_delayed <= we_ram_cs & !we_ram_cs_delayed;
  end

  generate
    if(PRIVATE_MEMORY==1)
      assign Sout_Rdata_ram =Sin_Rdata_ram;
    else if (BITSIZE_Sout_Rdata_ram <= 2*BRAM_BITSIZE)
      assign Sout_Rdata_ram = oe_ram_cs_delayed ? out1_shifted[BITSIZE_Sout_Rdata_ram-1:0] : Sin_Rdata_ram;
    else
      assign Sout_Rdata_ram = oe_ram_cs_delayed ? {{BITSIZE_Sout_Rdata_ram-2*BRAM_BITSIZE{1'b0}}, out1_shifted} : Sin_Rdata_ram;
  endgenerate

  generate
    if(PRIVATE_MEMORY==1)
      assign Sout_DataRdy = Sin_DataRdy;
    else
      assign Sout_DataRdy = oe_ram_cs_delayed | Sin_DataRdy | we_ram_cs_delayed;
  endgenerate

  assign bram_write = sel_STORE || proxy_sel_STORE || we_ram_cs;

  // Add assertion here
  // psl default clock = (posedge clock);
  // psl ERROR_S_data_ram_size: assert never {S_data_ram_size>2*BRAM_BITSIZE && (we_ram_cs || oe_ram_cs)};
  // psl ERROR_memory_addr: assert never {memory_addr_a>=n_mem_elements && (we_ram_cs || oe_ram_cs || sel_STORE || sel_LOAD || proxy_sel_STORE || proxy_sel_LOAD)};
  // psl ERROR_relative_addr: assert never {relative_addr+(S_data_ram_size/8) >n_bytes && (we_ram_cs || oe_ram_cs)};
  // psl ERROR_unaligned_access: assert never {byte_offset+S_data_ram_size[BITSIZE_S_data_ram_size-1:3] > BRAM_BITSIZE/4 && (we_ram_cs || oe_ram_cs)};
  // psl ERROR_oe_ram_cs_we_ram_cs: assert never {(we_ram_cs & oe_ram_cs) != 0};
  // psl ERROR_LOAD_S_oe_ram: assert never {sel_LOAD && oe_ram_cs};
  // psl ERROR_proxy_LOAD_S_oe_ram: assert never {proxy_sel_LOAD && oe_ram_cs};
  // psl ERROR_STORE_S_we_ram: assert never {sel_STORE && we_ram_cs};
  // psl ERROR_proxy_STORE_S_we_ram: assert never {proxy_sel_STORE && we_ram_cs};
  // psl ERROR_LOAD_we_ram_cs: assert never {sel_LOAD && we_ram_cs};
  // psl ERROR_proxy_LOAD_we_ram_cs: assert never {proxy_sel_LOAD && we_ram_cs};
  // psl ERROR_STORE_oe_ram_cs: assert never {sel_STORE && oe_ram_cs};
  // psl ERROR_proxy_STORE_oe_ram_cs: assert never {proxy_sel_STORE && oe_ram_cs};
  // psl ERROR_Sin_DataRdy_oe_ram_cs_delayed: assert never {Sin_DataRdy && oe_ram_cs_delayed};
  // psl ERROR_in3_size: assert never {in3>2*BRAM_BITSIZE && (sel_STORE || sel_LOAD)};
  // psl ERROR_proxy_in3_size: assert never {proxy_in3>2*BRAM_BITSIZE && (proxy_sel_STORE || proxy_sel_LOAD)};
  // psl ERROR_requested_size: assert never {BITSIZE_out1<in3 && (sel_LOAD)};
  // psl ERROR_proxy_requested_size: assert never {BITSIZE_proxy_out1<proxy_in3 && (proxy_sel_LOAD)};
  // psl ERROR_STORE_LOAD: assert never {sel_STORE && sel_LOAD};
  // psl ERROR_proxy_STORE_proxy_LOAD: assert never {proxy_sel_STORE && proxy_sel_LOAD};
endmodule

// This component is part of the BAMBU/PANDA IP LIBRARY
// Copyright (C) 2004-2015 Politecnico di Milano
// Author(s): Fabrizio Ferrandi <[email protected]>
// License: PANDA_GPLv3
`timescale 1ns / 1ps
module BRAM_MEMORY_SP(clock, bram_write, memory_addr_a, memory_addr_b, din_value_aggregated, be, dout_a, dout_b);
  parameter BITSIZE_dout_a=1, BITSIZE_dout_b=1, BITSIZE_memory_addr_a=1, BITSIZE_memory_addr_b=1, BITSIZE_din_value_aggregated=1, BITSIZE_be=1, INIT_file="array.data", BRAM_BITSIZE=32, nbit_read_addr=32, n_byte_on_databus=4, n_mem_elements=4, n_bytes=4, HIGH_LATENCY=0;
  // IN
  input clock;
  input bram_write;
  input [BITSIZE_memory_addr_a-1:0] memory_addr_a;
  input [BITSIZE_memory_addr_b-1:0] memory_addr_b;
  input [BITSIZE_din_value_aggregated-1:0] din_value_aggregated;
  input [BITSIZE_be-1:0] be;
  // OUT
  output [BITSIZE_dout_a-1:0] dout_a;
  output [BITSIZE_dout_b-1:0] dout_b;
  wire [n_byte_on_databus-1:0] we_a;
  wire [n_byte_on_databus-1:0] we_b;

  reg [BITSIZE_dout_a-1:0] dout_a;
  reg [BITSIZE_dout_a-1:0] dout_a_registered;
  reg [BITSIZE_dout_b-1:0] dout_b;
  reg [BITSIZE_dout_b-1:0] dout_b_registered;
  reg [BRAM_BITSIZE-1:0] memory [0:n_mem_elements-1] /* synthesis syn_ramstyle = "no_rw_check" */;

  initial
  begin
    $readmemb(INIT_file, memory, 0, n_mem_elements-1);
  end

  always @(posedge clock)
  begin
    if(HIGH_LATENCY==0)
    begin
      dout_a <= memory[memory_addr_a];
    end
    else
    begin
      dout_a_registered <= memory[memory_addr_a];
      dout_a <= dout_a_registered;
    end

  end

  generate
  genvar i11;
    for (i11=0; i11<n_byte_on_databus; i11=i11+1)
    begin : L11_write_a
      always @(posedge clock)
      begin
        if(we_a[i11])
          memory[memory_addr_a][(i11+1)*8-1:i11*8] <= din_value_aggregated[(i11+1)*8-1:i11*8];
      end
    end
  endgenerate

  generate
    if (n_bytes > BRAM_BITSIZE/8)
    begin
      always @(posedge clock)
      begin
        if(HIGH_LATENCY==0)
        begin
          dout_b <= memory[memory_addr_b];
        end
        else
        begin
          dout_b_registered <= memory[memory_addr_b];
          dout_b <= dout_b_registered;
        end
      end
      for (i11=0; i11<n_byte_on_databus; i11=i11+1)
      begin : L11_write_b
        always @(posedge clock)
        begin
          if(we_b[i11])
            memory[memory_addr_b][(i11+1)*8-1:i11*8] <= din_value_aggregated[(i11+1+n_byte_on_databus)*8-1:(i11+n_byte_on_databus)*8];
        end
      end
    end
  endgenerate

  generate
  genvar i2_a;
    for (i2_a=0; i2_a<n_byte_on_databus; i2_a=i2_a+1)
    begin  : write_enable_a
      assign we_a[i2_a] = (bram_write) && be[i2_a];
    end
  endgenerate

  generate
  genvar i2_b;
    for (i2_b=0; i2_b<n_byte_on_databus; i2_b=i2_b+1)
    begin  : write_enable_b
      assign we_b[i2_b] = (bram_write) && be[i2_b+n_byte_on_databus];
    end
  endgenerate
endmodule

// This component is part of the BAMBU/PANDA IP LIBRARY
// Copyright (C) 2004-2015 Politecnico di Milano
// Author(s): Fabrizio Ferrandi <[email protected]>
// License: PANDA_GPLv3
`timescale 1ns / 1ps
module ARRAY_1D_STD_BRAM_SP(clock, reset, in1, in2, in3, sel_LOAD, sel_STORE, S_oe_ram, S_we_ram, S_addr_ram, S_Wdata_ram, Sin_Rdata_ram, S_data_ram_size, Sin_DataRdy, out1, Sout_Rdata_ram, Sout_DataRdy, proxy_in1, proxy_in2, proxy_in3, proxy_sel_LOAD, proxy_sel_STORE, proxy_out1);
  parameter BITSIZE_in1=1, BITSIZE_in2=1, BITSIZE_in3=1, BITSIZE_out1=1, BITSIZE_S_addr_ram=1, BITSIZE_S_Wdata_ram=8, BITSIZE_Sin_Rdata_ram=8, BITSIZE_Sout_Rdata_ram=8, BITSIZE_S_data_ram_size=1, INIT_file="array.data", n_elements=1, data_size=32, address_space_begin=0, address_space_rangesize=4, BUS_PIPELINED=1, BRAM_BITSIZE=32, PRIVATE_MEMORY=0, USE_SPARSE_MEMORY=1, HIGH_LATENCY=0, BITSIZE_proxy_in1=1, BITSIZE_proxy_in2=1, BITSIZE_proxy_in3=1, BITSIZE_proxy_out1=1;
  // IN
  input clock;
  input reset;
  input [BITSIZE_in1-1:0] in1;
  input [BITSIZE_in2-1:0] in2;
  input [BITSIZE_in3-1:0] in3;
  input sel_LOAD;
  input sel_STORE;
  input S_oe_ram;
  input S_we_ram;
  input [BITSIZE_S_addr_ram-1:0] S_addr_ram;
  input [BITSIZE_S_Wdata_ram-1:0] S_Wdata_ram;
  input [BITSIZE_Sin_Rdata_ram-1:0] Sin_Rdata_ram;
  input [BITSIZE_S_data_ram_size-1:0] S_data_ram_size;
  input Sin_DataRdy;
  input [BITSIZE_proxy_in1-1:0] proxy_in1;
  input [BITSIZE_proxy_in2-1:0] proxy_in2;
  input [BITSIZE_proxy_in3-1:0] proxy_in3;
  input proxy_sel_LOAD;
  input proxy_sel_STORE;
  // OUT
  output [BITSIZE_out1-1:0] out1;
  output [BITSIZE_Sout_Rdata_ram-1:0] Sout_Rdata_ram;
  output Sout_DataRdy;
  output [BITSIZE_proxy_out1-1:0] proxy_out1;
  `ifndef _SIM_HAVE_CLOG2
    function integer log2;
       input integer value;
       integer temp_value;
      begin
        temp_value = value-1;
        for (log2=0; temp_value>0; log2=log2+1)
          temp_value = temp_value>>1;
      end
    endfunction
  `endif
  parameter n_bytes = n_elements*(data_size/8);
  parameter n_byte_on_databus = BRAM_BITSIZE/8;
  parameter n_mem_elements = n_bytes/(n_byte_on_databus) + (n_bytes%(n_byte_on_databus) == 0 ? 0 : 1);
  `ifdef _SIM_HAVE_CLOG2
    parameter nbit_read_addr = n_mem_elements == 1 ? 1 : $clog2(n_mem_elements);
  `else
    parameter nbit_read_addr = n_mem_elements == 1 ? 1 : log2(n_mem_elements);
  `endif

  wire [nbit_read_addr-1:0] memory_addr_a;
  wire [nbit_read_addr-1:0] memory_addr_b;
  wire [n_byte_on_databus*2-1:0] be;

  wire [2*BRAM_BITSIZE-1:0] din_value_aggregated;
  wire bram_write;
  wire [BRAM_BITSIZE-1:0] dout_a;
  wire [BRAM_BITSIZE-1:0] dout_b;

  BRAM_MEMORY_SP #(.BITSIZE_memory_addr_a(nbit_read_addr), .BITSIZE_memory_addr_b(nbit_read_addr), .BITSIZE_din_value_aggregated(2*BRAM_BITSIZE), .BITSIZE_be(n_byte_on_databus*2), .BITSIZE_dout_a(BRAM_BITSIZE), .BITSIZE_dout_b(BRAM_BITSIZE), .INIT_file(INIT_file), .BRAM_BITSIZE(BRAM_BITSIZE), .nbit_read_addr(nbit_read_addr), .n_byte_on_databus(n_byte_on_databus), .n_mem_elements(n_mem_elements), .n_bytes(n_bytes), .HIGH_LATENCY(HIGH_LATENCY)) BRAM_MEMORY_instance (.clock(clock), .bram_write(bram_write), .memory_addr_a(memory_addr_a), .memory_addr_b(memory_addr_b), .din_value_aggregated(din_value_aggregated), .be(be), .dout_a(dout_a), .dout_b(dout_b));

  ADDRESS_DECODING_LOGIC #(.BITSIZE_memory_addr_a(nbit_read_addr), .BITSIZE_memory_addr_b(nbit_read_addr), .BITSIZE_din_value_aggregated(2*BRAM_BITSIZE), .BITSIZE_be(n_byte_on_databus*2), .BITSIZE_dout_a(BRAM_BITSIZE), .BITSIZE_dout_b(BRAM_BITSIZE), .BITSIZE_in1(BITSIZE_in1), .BITSIZE_in2(BITSIZE_in2), .BITSIZE_in3(BITSIZE_in3), .BITSIZE_out1(BITSIZE_out1), .BITSIZE_S_addr_ram(BITSIZE_S_addr_ram), .BITSIZE_S_Wdata_ram(BITSIZE_S_Wdata_ram), .BITSIZE_Sin_Rdata_ram(BITSIZE_Sin_Rdata_ram), .BITSIZE_Sout_Rdata_ram(BITSIZE_Sout_Rdata_ram), .BITSIZE_S_data_ram_size(BITSIZE_S_data_ram_size), .address_space_begin(address_space_begin), .address_space_rangesize(address_space_rangesize), .BUS_PIPELINED(BUS_PIPELINED), .BRAM_BITSIZE(BRAM_BITSIZE), .PRIVATE_MEMORY(PRIVATE_MEMORY), .USE_SPARSE_MEMORY(USE_SPARSE_MEMORY), .BITSIZE_proxy_in1(BITSIZE_proxy_in1), .BITSIZE_proxy_in2(BITSIZE_proxy_in2), .BITSIZE_proxy_in3(BITSIZE_proxy_in3), .BITSIZE_proxy_out1(BITSIZE_proxy_out1), .nbit_read_addr(nbit_read_addr), .n_byte_on_databus(n_byte_on_databus), .n_mem_elements(n_mem_elements), .n_bytes(n_bytes), .HIGH_LATENCY(HIGH_LATENCY)) ADDRESS_DECODING_LOGIC_instance (.out1(out1), .Sout_Rdata_ram(Sout_Rdata_ram), .Sout_DataRdy(Sout_DataRdy), .proxy_out1(proxy_out1), .clock(clock), .reset(reset), .in1(in1), .in2(in2), .in3(in3), .sel_LOAD(sel_LOAD), .sel_STORE(sel_STORE), .S_oe_ram(S_oe_ram), .S_we_ram(S_we_ram), .S_addr_ram(S_addr_ram), .S_Wdata_ram(S_Wdata_ram), .Sin_Rdata_ram(Sin_Rdata_ram), .S_data_ram_size(S_data_ram_size), .Sin_DataRdy(Sin_DataRdy), .proxy_in1(proxy_in1), .proxy_in2(proxy_in2), .proxy_in3(proxy_in3), .proxy_sel_LOAD(proxy_sel_LOAD), .proxy_sel_STORE(proxy_sel_STORE), .bram_write(bram_write), .memory_addr_a(memory_addr_a), .memory_addr_b(memory_addr_b), .din_value_aggregated(din_value_aggregated), .be(be), .dout_a(dout_a), .dout_b(dout_b));
endmodule

// This component is part of the BAMBU/PANDA IP LIBRARY
// Copyright (C) 2004-2015 Politecnico di Milano
// Author(s): Fabrizio Ferrandi <[email protected]>
// License: PANDA_GPLv3
`timescale 1ns / 1ps
module STD_BRAM_SP(clock, reset, S_oe_ram, S_we_ram, S_addr_ram, S_Wdata_ram, Sin_Rdata_ram, Sout_Rdata_ram, S_data_ram_size, Sin_DataRdy, Sout_DataRdy);
  parameter BITSIZE_S_addr_ram=1, BITSIZE_S_Wdata_ram=8, BITSIZE_Sin_Rdata_ram=8, BITSIZE_Sout_Rdata_ram=8, BITSIZE_S_data_ram_size=1, INIT_file="array.data", n_elements=1, data_size=32, address_space_begin=0, address_space_rangesize=4, BUS_PIPELINED=1, BRAM_BITSIZE=32, USE_SPARSE_MEMORY=1, HIGH_LATENCY=0;
  // IN
  input clock;
  input reset;
  input S_oe_ram;
  input S_we_ram;
  input [BITSIZE_S_addr_ram-1:0] S_addr_ram;
  input [BITSIZE_S_Wdata_ram-1:0] S_Wdata_ram;
  input [BITSIZE_Sin_Rdata_ram-1:0] Sin_Rdata_ram;
  input [BITSIZE_S_data_ram_size-1:0] S_data_ram_size;
  input Sin_DataRdy;
  // OUT
  output [BITSIZE_Sout_Rdata_ram-1:0] Sout_Rdata_ram;
  output Sout_DataRdy;
  ARRAY_1D_STD_BRAM_SP #(.BITSIZE_in1(1), .BITSIZE_in2(BITSIZE_S_addr_ram), .BITSIZE_in3(BITSIZE_S_data_ram_size), .BITSIZE_out1(1), .BITSIZE_S_addr_ram(BITSIZE_S_addr_ram), .BITSIZE_S_Wdata_ram(BITSIZE_S_Wdata_ram), .BITSIZE_Sin_Rdata_ram(BITSIZE_Sin_Rdata_ram), .BITSIZE_Sout_Rdata_ram(BITSIZE_Sout_Rdata_ram), .BITSIZE_S_data_ram_size(BITSIZE_S_data_ram_size), .INIT_file(INIT_file), .n_elements(n_elements), .data_size(data_size), .address_space_begin(address_space_begin), .address_space_rangesize(address_space_rangesize), .BUS_PIPELINED(BUS_PIPELINED), .BRAM_BITSIZE(BRAM_BITSIZE), .PRIVATE_MEMORY(0), .USE_SPARSE_MEMORY(USE_SPARSE_MEMORY), .HIGH_LATENCY(HIGH_LATENCY), .BITSIZE_proxy_in1(1), .BITSIZE_proxy_in2(BITSIZE_S_addr_ram), .BITSIZE_proxy_in3(BITSIZE_S_data_ram_size), .BITSIZE_proxy_out1(1)) ARRAY_1D_STD_BRAM_instance (.Sout_Rdata_ram(Sout_Rdata_ram), .Sout_DataRdy(Sout_DataRdy), .clock(clock), .reset(reset), .in1(1'b0), .in2({BITSIZE_S_addr_ram{1'b0}}), .in3({BITSIZE_S_data_ram_size{1'b0}}), .sel_LOAD(1'b0), .sel_STORE(1'b0), .S_oe_ram(S_oe_ram), .S_we_ram(S_we_ram), .S_addr_ram(S_addr_ram), .S_Wdata_ram(S_Wdata_ram), .Sin_Rdata_ram(Sin_Rdata_ram), .S_data_ram_size(S_data_ram_size), .Sin_DataRdy(Sin_DataRdy), .proxy_in1(1'b0), .proxy_in2({BITSIZE_S_addr_ram{1'b0}}), .proxy_in3({BITSIZE_S_data_ram_size{1'b0}}), .proxy_sel_LOAD(1'b0), .proxy_sel_STORE(1'b0));
endmodule

// This component is part of the BAMBU/PANDA IP LIBRARY
// Copyright (C) 2004-2015 Politecnico di Milano
// Author(s): Fabrizio Ferrandi <[email protected]>
// License: PANDA_GPLv3
`timescale 1ns / 1ps
module STD_BRAM(clock, reset, S_oe_ram, S_we_ram, S_addr_ram, S_Wdata_ram, Sin_Rdata_ram, Sout_Rdata_ram, S_data_ram_size, Sin_DataRdy, Sout_DataRdy);
  parameter BITSIZE_S_addr_ram=1, BITSIZE_S_Wdata_ram=8, BITSIZE_Sin_Rdata_ram=8, BITSIZE_Sout_Rdata_ram=8, BITSIZE_S_data_ram_size=1, INIT_file="array.data", n_elements=1, data_size=32, address_space_begin=0, address_space_rangesize=4, BUS_PIPELINED=1, BRAM_BITSIZE=32, USE_SPARSE_MEMORY=1;
  // IN
  input clock;
  input reset;
  input S_oe_ram;
  input S_we_ram;
  input [BITSIZE_S_addr_ram-1:0] S_addr_ram;
  input [BITSIZE_S_Wdata_ram-1:0] S_Wdata_ram;
  input [BITSIZE_Sin_Rdata_ram-1:0] Sin_Rdata_ram;
  input [BITSIZE_S_data_ram_size-1:0] S_data_ram_size;
  input Sin_DataRdy;
  // OUT
  output [BITSIZE_Sout_Rdata_ram-1:0] Sout_Rdata_ram;
  output Sout_DataRdy;
  STD_BRAM_SP #(.BITSIZE_S_addr_ram(BITSIZE_S_addr_ram), .BITSIZE_S_Wdata_ram(BITSIZE_S_Wdata_ram), .BITSIZE_Sin_Rdata_ram(BITSIZE_Sin_Rdata_ram), .BITSIZE_Sout_Rdata_ram(BITSIZE_Sout_Rdata_ram), .BITSIZE_S_data_ram_size(BITSIZE_S_data_ram_size), .INIT_file(INIT_file), .n_elements(n_elements), .data_size(data_size), .address_space_begin(address_space_begin), .address_space_rangesize(address_space_rangesize), .BUS_PIPELINED(BUS_PIPELINED), .BRAM_BITSIZE(BRAM_BITSIZE), .USE_SPARSE_MEMORY(USE_SPARSE_MEMORY), .HIGH_LATENCY(0)) STD_BRAM_instance (.Sout_Rdata_ram(Sout_Rdata_ram), .Sout_DataRdy(Sout_DataRdy), .clock(clock), .reset(reset), .S_oe_ram(S_oe_ram), .S_we_ram(S_we_ram), .S_addr_ram(S_addr_ram), .S_Wdata_ram(S_Wdata_ram), .Sin_Rdata_ram(Sin_Rdata_ram), .S_data_ram_size(S_data_ram_size), .Sin_DataRdy(Sin_DataRdy));
endmodule

// This component is part of the BAMBU/PANDA IP LIBRARY
// Copyright (C) 2013-2015 Politecnico di Milano
// Author(s): Fabrizio Ferrandi <[email protected]>
// License: PANDA_GPLv3
`timescale 1ns / 1ps
module register_AR_NORETIME(clock, reset, in1, out1);
  parameter BITSIZE_in1=1, BITSIZE_out1=1;
  // IN
  input clock;
  input reset;
  input [BITSIZE_in1-1:0] in1;
  // OUT
  output [BITSIZE_out1-1:0] out1;
  (* dont_retime *)(* keep = "true" *) reg [BITSIZE_out1-1:0] reg_out1;
  assign out1 = reg_out1;
  always @(posedge clock or negedge reset)
    if (!reset)
      reg_out1 <= {BITSIZE_out1{1'b0}};
    else
      reg_out1 <= in1;

  // Add assertion here
  // psl default clock = (posedge clock);
  // psl ERROR_regout_unknown: assert never {(^in1) === 1'bX && wenable === 1'b1};
endmodule

// 
// 
// Author(s): 
// License: 
`timescale 1ns / 1ps
module STD_BRAM_wrapper(clock, reset, S_oe_ram, S_we_ram, S_addr_ram, S_Wdata_ram, Sin_Rdata_ram, S_data_ram_size, Sin_DataRdy, Sout_Rdata_ram, Sout_DataRdy);
  // IN
  input clock;
  input reset;
  input S_oe_ram;
  input S_we_ram;
  input [14:0] S_addr_ram;
  input [31:0] S_Wdata_ram;
  input [31:0] Sin_Rdata_ram;
  input [6:0] S_data_ram_size;
  input Sin_DataRdy;
  // OUT
  output [31:0] Sout_Rdata_ram;
  output Sout_DataRdy;
  // Component and signal declarations
  wire [31:0] S_Wdata_ram_SIGI1;
  wire [31:0] S_Wdata_ram_SIGI2;
  wire [14:0] S_addr_ram_SIGI1;
  wire [14:0] S_addr_ram_SIGI2;
  wire [6:0] S_data_ram_size_SIGI1;
  wire [6:0] S_data_ram_size_SIGI2;
  wire S_oe_ram_SIGI1;
  wire S_oe_ram_SIGI2;
  wire S_we_ram_SIGI1;
  wire S_we_ram_SIGI2;
  wire Sin_DataRdy_SIGI1;
  wire Sin_DataRdy_SIGI2;
  wire [31:0] Sin_Rdata_ram_SIGI1;
  wire [31:0] Sin_Rdata_ram_SIGI2;
  wire Sout_DataRdy_SIGO1;
  wire Sout_DataRdy_SIGO2;
  wire [31:0] Sout_Rdata_ram_SIGO1;
  wire [31:0] Sout_Rdata_ram_SIGO2;

  STD_BRAM #(.BITSIZE_S_addr_ram(15), .BITSIZE_S_Wdata_ram(32), .BITSIZE_Sin_Rdata_ram(32), .BITSIZE_Sout_Rdata_ram(32), .BITSIZE_S_data_ram_size(7), .INIT_file("array_ref_0.data"), .n_elements(256), .data_size(32), .address_space_begin(0), .address_space_rangesize(1024), .BUS_PIPELINED(1), .BRAM_BITSIZE(16), .USE_SPARSE_MEMORY(1)) STD_BRAM (.Sout_Rdata_ram(Sout_Rdata_ram_SIGO1), .Sout_DataRdy(Sout_DataRdy_SIGO1), .clock(clock), .reset(reset), .S_oe_ram(S_oe_ram_SIGI2), .S_we_ram(S_we_ram_SIGI2), .S_addr_ram(S_addr_ram_SIGI2), .S_Wdata_ram(S_Wdata_ram_SIGI2), .Sin_Rdata_ram(Sin_Rdata_ram_SIGI2), .S_data_ram_size(S_data_ram_size_SIGI2), .Sin_DataRdy(Sin_DataRdy_SIGI2));
  register_AR_NORETIME #(.BITSIZE_in1(32), .BITSIZE_out1(32)) S_Wdata_ram_REG (.out1(S_Wdata_ram_SIGI2), .clock(clock), .reset(reset), .in1(S_Wdata_ram_SIGI1));
  register_AR_NORETIME #(.BITSIZE_in1(15), .BITSIZE_out1(15)) S_addr_ram_REG (.out1(S_addr_ram_SIGI2), .clock(clock), .reset(reset), .in1(S_addr_ram_SIGI1));
  register_AR_NORETIME #(.BITSIZE_in1(7), .BITSIZE_out1(7)) S_data_ram_size_REG (.out1(S_data_ram_size_SIGI2), .clock(clock), .reset(reset), .in1(S_data_ram_size_SIGI1));
  register_AR_NORETIME #(.BITSIZE_in1(1), .BITSIZE_out1(1)) S_oe_ram_REG (.out1(S_oe_ram_SIGI2), .clock(clock), .reset(reset), .in1(S_oe_ram_SIGI1));
  register_AR_NORETIME #(.BITSIZE_in1(1), .BITSIZE_out1(1)) S_we_ram_REG (.out1(S_we_ram_SIGI2), .clock(clock), .reset(reset), .in1(S_we_ram_SIGI1));
  register_AR_NORETIME #(.BITSIZE_in1(1), .BITSIZE_out1(1)) Sin_DataRdy_REG (.out1(Sin_DataRdy_SIGI2), .clock(clock), .reset(reset), .in1(Sin_DataRdy_SIGI1));
  register_AR_NORETIME #(.BITSIZE_in1(32), .BITSIZE_out1(32)) Sin_Rdata_ram_REG (.out1(Sin_Rdata_ram_SIGI2), .clock(clock), .reset(reset), .in1(Sin_Rdata_ram_SIGI1));
  register_AR_NORETIME #(.BITSIZE_in1(1), .BITSIZE_out1(1)) Sout_DataRdy_REG (.out1(Sout_DataRdy_SIGO2), .clock(clock), .reset(reset), .in1(Sout_DataRdy_SIGO1));
  register_AR_NORETIME #(.BITSIZE_in1(32), .BITSIZE_out1(32)) Sout_Rdata_ram_REG (.out1(Sout_Rdata_ram_SIGO2), .clock(clock), .reset(reset), .in1(Sout_Rdata_ram_SIGO1));
  // io-signal post fix
  assign S_oe_ram_SIGI1 = S_oe_ram;
  assign S_we_ram_SIGI1 = S_we_ram;
  assign S_addr_ram_SIGI1 = S_addr_ram;
  assign S_Wdata_ram_SIGI1 = S_Wdata_ram;
  assign Sin_Rdata_ram_SIGI1 = Sin_Rdata_ram;
  assign S_data_ram_size_SIGI1 = S_data_ram_size;
  assign Sin_DataRdy_SIGI1 = Sin_DataRdy;
  assign Sout_Rdata_ram = Sout_Rdata_ram_SIGO2;
  assign Sout_DataRdy = Sout_DataRdy_SIGO2;

endmodule

The initialization file, array_ref_0.data:

1110001001010010
0110001010001010
1001011100000110
0000000100110111
1110001001110010
0000010011000110
1001110101001001
0010110110110011
1100001010000010
1100001110111000
0000111011110011
1110011110001001
0011000101101011
0010101111101011
0010110101111001
0110001110100001
0111010001101010
1101110111101110
0010000010010011
1110011111100001
1011001100000111
0010010101100011
0100111110111000
1001100101101111
0111010011101000
0111110111100000
1110010101001001
1001100000100110
0001000010011100
1110101100001101
1001010010101110
0010001001110100
0110010101011101
1110110101011011
1011001011100101
0111000110001110
1101111001100111
0101110010101010
1100011000011101
0001100101111001
1011101001111111
1111010001010001
0011001100011101
1101001011011011
0011000010100011
1100110110001101
1100100010110001
0011100001110110
0101000101001001
0111110001110010
0110111001001011
0001001011111011
1111001111100010
1011011001111111
0000111000101000
0011101110011101
0001111100011011
0001010001110110
1110011011100000
0001001010101000
1000110010110111
1101011111101001
0011111010101111
0100000111101111
1000001100101010
1101100001110011
1101110010010110
0110000011100011
0101001001110011
1011011011011101
1011001100000110
1010010100011010
0000011001100000
1001100100010100
1000100111000001
0001000001100100
1001000000001011
0110110101111110
1110100100110010
1000101010101011
1010010111010111
1101111000110001
0111010110010101
0001010011111001
1000101010111000
0110010000110011
0111010111001000
1000110011111001
0010111001001101
1010110000100010
0001111001000011
0000100100110110
1011101011001111
1110001001100101
1100110000011010
1001000011010001
1100001111001111
0100101001100001
1001010101000110
0110000010010011
0101010000001110
1001001101100010
1011101100101000
0010000101101100
1010001000001001
0001000011000101
1111101010000011
0101101010001010
1111101001011000
1011001100110110
0011100110000000
0110000001010110
0001000101111010
1001000100001111
1101110101011111
1000001001100011
1000101110011001
0010101110110101
1010001101001011
1000011111010001
0110101010010101
1011110000100110
0000010110001110
1111011011010111
1111010011000101
0100000110011100
0110000110110100
0110111001110000
1110111001100001
1110110101011011
0011000011001010
0001100000111001
1101001101111001
1000100111000111
0100010001110111
0111000110111000
1001101101100010
1011010001111011
0000001010010100
0110100110010000
0000001101111111
0011000001101101
1110101100011100
0110100111100010
1000111100011011
1101100000000101
1010001001101111
0101001001000111
1001111111100110
0110000011110001
1010001111110110
0010011100000101
1111110100001001
0101110111010011
1010111110100000
1010100010110001
1110110011111111
1110110011110101
1010110011100011
1110010011111000
0101101000000000
1101111001111111
0101011010010011
0001101110101011
0110000100001011
0001010111100010
1001100100010100
1011100011011010
0111111001100010
0010000011111111
0110000011111010
0101101010101100
0100101010110110
0111101111100100
0001011011110000
1110001111110011
0001101110010011
0010011000100111
1000011110010100
1101011001100011
0110011111010001
0010001010010001
1010111000110100
0110000100100110
1100010010101001
1111111100001101
0010001100110010
1011000000000110
1011001000010110
1011111100101000
1101110010101000
0101011010001110
0110010111101111
0111001000011111
0110010001011111
0001001001011101
0001101011101001
0000011011100011
0101111101111110
0100111000010100
1000111101100000
1010000000110111
1110110101101110
0010010000101111
0011111100010100
1101001001000110
0001011011010011
1101010000000110
0110000110110001
0010010011110010
1101110111000110
1001100111001000
1110011110111110
1101100110100101
0100010111000110
0110111010001010
1101101001101001
1110000000000101
1000110111101010
1100100000001010
0001110000000001
1110110010010011
0001110110110001
1000100111011101
1100000111011010
0110111001010000
1101100000011101
0111000110000100
0011100100011011
1000101001010010
1000001101010101
0011101100000111
0110101100001011
0101101101001101
0111010010000100
0000011001110100
0010110010010000
1001100111100100
1010110101001000
0101101010011111
1010001010000011
1100111011110001
1101111011101011
1110100001101100
1010101110110001
1110011000010001
0110001100000011
0000110110111101
0010000100101101
1011110011000001
0111010001110100
1110010110011001
1001100101000001
1111111001111000
0101101111010001
0111001010100000
1000010100000000
1110100010010001
1011111111000001
1010000100100111
1010001111111010
0000000001010101
0110101101011101
0010111101001111
0010101111001011
1110001100111101
1110101100110101
0110011011100111
0101110110001011
0110010101100010
1111100001011001
1111010001001001
1000001011111100
0011111000000111
1101111100001001
0000100000001110
0111000100111111
1111111101110100
0111000001010110
0010001011110110
1011011101010111
0010000111011100
1110101101001111
1011110000011011
1010100110100111
1010111001111100
1110111111110111
1001111100111111
1010111101000111
1101010001111100
1001100110110001
0100011000011011
0001000101000000
1000110000111101
1011010110110110
0101101111000111
1000001100000100
0011110001000111
0110101111110010
0001101111111011
1010101011111011
0110001101111001
0111111000110001
0110001010110101
0101001000110000
1101101100011001
1000010110000001
1110101111001001
1011010110110011
1000011110000011
0010010101101000
0000100000100000
1111010001101101
1100101100100101
0101011100101110
0111100100101000
1010101010001110
1010000001011011
1010100000010111
1111111110000011
0001001010011101
0101100001001001
0011001101001011
0111111101011010
0010000100111001
1101110100100000
0110111110000110
0111011100100110
0001101001110011
0000111110011111
1010000011100010
0100101001000111
1011000011010110
0001000010001011
0101101001000100
1101100000010001
1100110100100001
1010010011001010
1101000010010001
1111010000010110
0110010110010001
0110110110111011
0010011111101111
1000101010001101
0110101010000001
0010000110000010
1000011101101110
1000001111111011
0110001001010110
1100011111001110
0100101111111010
0100101011000000
1100110011101101
0111101010010111
0011111000001110
0111111111010000
0111100110101101
0001110011111101
0101001110101001
0011011100011001
0110101000100110
1011000001011011
1011001010011001
0100011001011111
1001111000110100
1111001011100011
1100100101001100
0110100001100001
0011010000011111
0010110111110100
0110011011010101
0011110011111111
0010101001100010
0011010111001111
1000110100100110
1011010110101010
0101010010111010
0000001110100011
0100011110000100
0010101010010100
1111001101101101
0111001101001110
0010001000011001
0100111111100000
0100010010111101
0111000111100100
1111011000111101
0101100010011100
0110000001111101
0001010010101111
0001101101110101
0100000110101000
1010001010100010
1101010111100101
1110000110110011
0100001110010001
0110001101001111
0111100010011001
0010001011010100
0011001101010100
1101011101000100
1011001001000001
0000111100101100
1111000000000000
0001000001110110
1011111111101101
0100100111011110
1110001011000010
1100111001110110
0000000110111000
1010100000101101
1110101001111101
0111101101010001
1110110000111101
0010001011110111
0000100101101111
1011111000001100
1001000111101000
1100010100111111
1100101100110111
1111100100001011
1111100101001000
1110101000010101
0101101101011011
1011011011010000
0111101101011100
0010101011111111
0110101110101101
1000010001111111
0011011110101101
1010100111000100
1101010111001010
0101100000100111
0111010110100100
0111111100001110
0010011100101111
0000001001000100
0100011000111010
0100110010010000
1001010011101001
0001001111011010
1101001110101011
0100001110010111
0100111000010001
1011010010111011
0111100110010000
0011101000100010
0101100000010010
0011001101111011
1101101001100010
0001110101001011
1001011110110010
1110111100100111
1101010110010001
1110110111110001
1110011010100111
1100001111001011
1010000101101100
0000011101101111
1011011100111111
1111101110111001
0101010001011110
1000111011111100
0010011001110001
1010110000111110
0000110001110101
1010111110010010
0100100111011100
1110101001101000
0001000011100011
1010001010111111
0100100010110001
0100110111001111
0111001100100110
1100011001010000
1011011010001100
1111001000000011
0010100100110101
0100101101001110
1001010011010001
1110100110111010
1111010000110001
1110011110000111
1011011011111100
1101110011110011
0010101010100101
0001100011001111
1000110011110100
0001100010001010
0101000011110010
0101100000010011
1100100011101000
0011010000100010
1000000011010010
1011010001000001
0000000000101000
0110110111111001
0011101100110110
0111010100010100
1010001000001110
0011100001100001
0000010100110010
0110100111111101
1010001000101111
0011010111000100
1101111011000100
0110110000001001
0001101110111111
1000100100101011
0100101000110011
0111110110100010
0001010001100010

And the Vivado Tcl synthesis script:

proc dump_statistics {  } {
  set util_rpt [report_utilization -return_string]
  set LUTFFPairs 0
  set SliceRegisters 0
  set Slice 0
  set SliceLUTs 0
  set SliceLUTs1 0
  set BRAMFIFO36 0
  set BRAMFIFO18 0
  set BRAMFIFO36_star 0
  set BRAMFIFO18_star 0
  set BRAM18 0
  set BRAMFIFO 0
  set BIOB 0
  set DSPs 0
  set TotPower 0
  set design_slack 0
  set design_req 0
  set design_delay 0
  regexp --  {\s*LUT Flip Flop Pairs\s*\|\s*([^[:blank:]]+)} $util_rpt ignore LUTFFPairs
  regexp --  {\s*Slice Registers\s*\|\s*([^[:blank:]]+)} $util_rpt ignore SliceRegisters
  regexp --  {\s*Slice\s*\|\s*([^[:blank:]]+)} $util_rpt ignore Slice
  regexp --  {\s*Slice LUTs\s*\|\s*([^[:blank:]]+)} $util_rpt ignore SliceLUTs
  regexp --  {\s*Slice LUTs\*\s*\|\s*([^[:blank:]]+)} $util_rpt ignore SliceLUTs1
  if { [expr {$LUTFFPairs == 0}] } {
    set LUTFFPairs $SliceLUTs1
  }
  if { [expr {$SliceLUTs == 0}] } {
    set SliceLUTs $SliceLUTs1
  }
  regexp --  {\s*RAMB36/FIFO36\s*\|\s*([^[:blank:]]+)} $util_rpt ignore BRAMFIFO36
  regexp --  {\s*RAMB18/FIFO18\s*\|\s*([^[:blank:]]+)} $util_rpt ignore BRAMFIFO18
  regexp --  {\s*RAMB36/FIFO\*\s*\|\s*([^[:blank:]]+)} $util_rpt ignore BRAMFIFO36_star
  regexp --  {\s*RAMB18/FIFO\*\s*\|\s*([^[:blank:]]+)} $util_rpt ignore BRAMFIFO18_star
  regexp --  {\s*RAMB18\s*\|\s*([^[:blank:]]+)} $util_rpt ignore BRAM18
  set BRAMFIFO [expr {(2 *$BRAMFIFO36) + $BRAMFIFO18 + (2*$BRAMFIFO36_star) + $BRAMFIFO18_star + $BRAM18}]
  regexp --  {\s*Bonded IOB\s*\|\s*([^[:blank:]]+)} $util_rpt ignore BIOB
  regexp --  {\s*DSPs\s*\|\s*([^[:blank:]]+)} $util_rpt ignore DSPs
  set power_rpt [report_power -return_string]
  regexp --  {\s*Total On-Chip Power \(W\)\s*\|\s*([^[:blank:]]+)} $power_rpt ignore TotPower
  set Timing_Paths [get_timing_paths -max_paths 1 -nworst 1 -setup]
  if { [expr {$Timing_Paths == ""}] } {
    set design_slack 0
    set design_req 0
  } else {
    set design_slack [get_property SLACK $Timing_Paths]
    set design_req [get_property REQUIREMENT  $Timing_Paths]
  }
  if { [expr {$design_slack == ""}] } {
    set design_slack 0
  }
  if { [expr {$design_req == ""}] } {
    set design_req 0
  }
  set design_delay [expr {$design_req - $design_slack}]
  puts $design_delay 
  file delete -force vivado_flowSTD_BRAM_wrapper_report.xml 
  set ofile_report [open vivado_flowSTD_BRAM_wrapper_report.xml w]
  puts $ofile_report "<?xml version=\"1.0\"?>"
  puts $ofile_report "<document>"
  puts $ofile_report "  <application>"
  puts $ofile_report "    <section stringID=\"XILINX_SYNTHESIS_SUMMARY\">"
  puts $ofile_report "      <item stringID=\"XILINX_LUT_FLIP_FLOP_PAIRS_USED\" value=\"$LUTFFPairs\"/>"
  puts $ofile_report "      <item stringID=\"XILINX_SLICE\" value=\"$Slice\"/>"
  puts $ofile_report "      <item stringID=\"XILINX_SLICE_REGISTERS\" value=\"$SliceRegisters\"/>"
  puts $ofile_report "      <item stringID=\"XILINX_SLICE_LUTS\" value=\"$SliceLUTs\"/>"
  puts $ofile_report "      <item stringID=\"XILINX_BLOCK_RAMFIFO\" value=\"$BRAMFIFO\"/>"
  puts $ofile_report "      <item stringID=\"XILINX_IOPIN\" value=\"$BIOB\"/>"
  puts $ofile_report "      <item stringID=\"XILINX_DSPS\" value=\"$DSPs\"/>"
  puts $ofile_report "      <item stringID=\"XILINX_POWER\" value=\"$TotPower\"/>"
  puts $ofile_report "      <item stringID=\"XILINX_DESIGN_DELAY\" value=\"$design_delay\"/>"
  puts $ofile_report "    </section>"
  puts $ofile_report "  </application>"
  puts $ofile_report "</document>"
  close $ofile_report
}; #END PROC
set outputDir vivado_flow
file mkdir $outputDir
create_project STD_BRAM_wrapper -force
chan configure stdout -buffering none
exec >&@stdout yosys -p "read_verilog -defer STD_BRAM.v" -p "synth_xilinx -edif vivado_flow/STD_BRAM_wrapper.edif -top STD_BRAM_wrapper"
read_xdc STD_BRAM_wrapper.sdc
read_edif $outputDir/STD_BRAM_wrapper.edif
link_design -mode out_of_context -top STD_BRAM_wrapper -part xc7z020clg484-1 
set_property HD.CLK_SRC BUFGCTRL_X0Y16 [get_ports clock]
dump_statistics
opt_design
dump_statistics
place_design
dump_statistics
route_design
dump_statistics
close_design
close_project

cliffordwolf avatar cliffordwolf commented on July 18, 2024

The following (much smaller ;) testcase shows what is going on here:

module mem2reg_with_two_always_blocks(
    input clk,
    input [1:0] a_addr, a_din, b_addr, b_din,
    input a_wen, b_wen,
    output reg [1:0] a_dout, b_dout
);
    reg [1:0] memory [0:3];

    initial begin
        memory[0] = 0;
        memory[1] = 1;
        memory[2] = 2;
        memory[3] = 3;
    end

    always @(posedge clk) begin
        if (a_wen)
            memory[a_addr] <= a_din;
        a_dout <= memory[a_addr];
    end

    always @(posedge clk) begin
        if (b_wen)
            memory[b_addr] <= b_din;
        b_dout <= memory[b_addr];
    end
endmodule

After mem2reg, these are two always blocks, each inferring a flip-flop to drive each element of memory[], so every element ends up driven from both blocks.

Interestingly, if I synthesize the example above with Vivado, it produces a RAMB18E1 cell. (For a total of 8 bits of memory!!!) So even Vivado has its problems fully understanding what is going on when you write to a memory from two always blocks. The following implementation, using only one always block, produces the expected result (with both Vivado and Yosys):

module mem2reg_with_two_always_blocks(
    input clk,
    input [1:0] a_addr, a_din, b_addr, b_din,
    input a_wen, b_wen,
    output reg [1:0] a_dout, b_dout
);
    reg [1:0] memory [0:3];

    initial begin
        memory[0] = 0;
        memory[1] = 1;
        memory[2] = 2;
        memory[3] = 3;
    end

    always @(posedge clk) begin
        if (a_wen)
            memory[a_addr] <= a_din;
        if (b_wen)
            memory[b_addr] <= b_din;
        a_dout <= memory[a_addr];
        b_dout <= memory[b_addr];
    end
endmodule

In general you should only write to a memory from two always blocks if the writes are from two different clock domains.
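
A minimal sketch of the one case mentioned above where two always blocks writing the same memory is the natural coding style: a dual-port memory with one clock per port. The module name and the `clk_a`/`clk_b` ports are hypothetical, and whether a given tool maps this onto a block RAM depends on that tool's documented inference templates.

```verilog
// Hypothetical example: one always block per clock domain.
module dual_clock_mem(
    input clk_a, clk_b,
    input [1:0] a_addr, a_din, b_addr, b_din,
    input a_wen, b_wen,
    output reg [1:0] a_dout, b_dout
);
    reg [1:0] memory [0:3];

    // Port A: written and read in the clk_a domain
    always @(posedge clk_a) begin
        if (a_wen)
            memory[a_addr] <= a_din;
        a_dout <= memory[a_addr];
    end

    // Port B: written and read in the clk_b domain
    always @(posedge clk_b) begin
        if (b_wen)
            memory[b_addr] <= b_din;
        b_dout <= memory[b_addr];
    end
endmodule
```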

I am not sure yet if I will create a workaround for this problem in Yosys. For this specific case the problem will go away anyway as soon as support for memory initialization is added. I have to think a little more about it.

But given that Vivado's synth_design creates horrible results for small memories when the write port is implemented with two always blocks as shown above, and given the generic nature of the BRAM_MEMORY_SP module, it might be a good idea to rewrite it to use a single-always-block coding style anyway.

PS: Another problem is that the initialization values for FFs do not make it into the EDIF netlist at the moment. You have to convert them to an explicit reset path with a reset input pin, like you would in an ASIC flow (Sec. IV in Appnote 011 shows how it is done). In short: initialized data isn't really supported in Yosys 0.5 synthesis. (In my defense: it is well supported in the formal verification flow. ;) But I am aware that this is a very important feature and hope to have much better support for it in Yosys 0.6.
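
As a rough sketch of the workaround described in the PS, the single-always-block test case can be given an explicit synchronous reset that loads the values the `initial` block used to set. The module name and the `rst` port are assumptions made for illustration and are not taken from the appnote. Note that a memory whose elements are all reset in the same cycle will generally end up in flip-flops rather than in a block RAM.

```verilog
// Hypothetical example: initial values replaced by an explicit reset path.
module mem_with_explicit_reset(
    input clk, rst,
    input [1:0] a_addr, a_din, b_addr, b_din,
    input a_wen, b_wen,
    output reg [1:0] a_dout, b_dout
);
    reg [1:0] memory [0:3];

    always @(posedge clk) begin
        if (rst) begin
            // Same values the initial block assigned in the test case
            memory[0] <= 0;
            memory[1] <= 1;
            memory[2] <= 2;
            memory[3] <= 3;
        end else begin
            if (a_wen)
                memory[a_addr] <= a_din;
            if (b_wen)
                memory[b_addr] <= b_din;
        end
        a_dout <= memory[a_addr];
        b_dout <= memory[b_addr];
    end
endmodule
```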

fabrizioferrandi avatar fabrizioferrandi commented on July 18, 2024

Thanks for digging into the issue. It was, and still is, a pain to model block RAM in Verilog for FPGAs. In our framework we have different models for different FPGA vendors.
The example I've submitted is a customization of a more general case.
Anyway, the big problem I had in modeling BRAM_MEMORY_SP was using the byte-enable feature. So, the solution I've found that passes all the regressions is to split reading from writing and to describe the byte enabling with generate-for. Altera, for example, goes down the same path you are suggesting, but it is less parametrizable (e.g., I was not able to use generate-for) and requires SystemVerilog.
In the more complex cases we have in our framework for Xilinx, the number of read and write ports can differ, so I have other models where reading and writing are separated into different processes. This mainly comes from the fact that Xilinx supports multiple reads and up to two write ports.

> In general you should only write to a memory from two always blocks if the writes are from two different clock domains.

I do not have evidence of this from the Xilinx documentation. Maybe I am missing something.

Anyway, thanks for looking into the issue.

cliffordwolf avatar cliffordwolf commented on July 18, 2024

> So, the solution I've found that passes all the regressions is to split reading from writing and to describe the byte enabling with generate-for.

Unrolling for loops in always blocks is part of synthesizable Verilog. (Sec. 7.7.6 of the 2005 Verilog RTL Synthesis Standard says they are supported under the condition that "Loop bounds shall be statically computable for a for loop.") So you can write everything in one always block by substituting a regular for loop for the generate-for:

integer i11;
always @(posedge clock)
begin
  // Port A read, with an extra output register when HIGH_LATENCY != 0
  if(HIGH_LATENCY==0)
  begin
    dout_a <= memory[memory_addr_a];
  end
  else
  begin
    dout_a_registered <= memory[memory_addr_a];
    dout_a <= dout_a_registered;
  end

  // Port A byte-enabled write; the loop is unrolled during synthesis
  for (i11=0; i11<n_byte_on_databus; i11=i11+1)
  begin
    if(we_a[i11])
      memory[memory_addr_a][(i11+1)*8-1:i11*8] <= din_value_aggregated[(i11+1)*8-1:i11*8];
  end

  // Port B logic, present only when n_bytes > BRAM_BITSIZE/8
  if (n_bytes > BRAM_BITSIZE/8)
  begin
    if(HIGH_LATENCY==0)
    begin
      dout_b <= memory[memory_addr_b];
    end
    else
    begin
      dout_b_registered <= memory[memory_addr_b];
      dout_b <= dout_b_registered;
    end

    // Port B byte-enabled write, taken from the upper half of din_value_aggregated
    for (i11=0; i11<n_byte_on_databus; i11=i11+1)
    begin
      if(we_b[i11])
        memory[memory_addr_b][(i11+1)*8-1:i11*8] <= din_value_aggregated[(i11+1+n_byte_on_databus)*8-1:(i11+n_byte_on_databus)*8];
    end
  end
end
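
For a self-contained version of the same idea, here is a minimal sketch of a single-port memory with byte enables, written in one always block. The module name, port names, and parameter values are all hypothetical. It uses the Verilog-2001 indexed part-select (`[i*8 +: 8]`), which selects the same bits as the `[(i+1)*8-1:i*8]` slices above once the loop is unrolled. Whether a particular synthesis tool maps this template onto a block RAM is not something this sketch guarantees.

```verilog
// Hypothetical example: byte-enabled write with a regular for loop.
module byte_enable_mem #(
    parameter ADDR_BITS = 8,
    parameter N_BYTES   = 4
) (
    input                      clock,
    input      [N_BYTES-1:0]   we,
    input      [ADDR_BITS-1:0] addr,
    input      [N_BYTES*8-1:0] din,
    output reg [N_BYTES*8-1:0] dout
);
    reg [N_BYTES*8-1:0] memory [0:(1<<ADDR_BITS)-1];
    integer i;

    always @(posedge clock) begin
        // Synchronous read of the whole word
        dout <= memory[addr];
        // One write enable per byte; loop bounds are static, so it unrolls
        for (i = 0; i < N_BYTES; i = i + 1)
            if (we[i])
                memory[addr][i*8 +: 8] <= din[i*8 +: 8];
    end
endmodule
```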

> In general you should only write to a memory from two always blocks if the writes are from two different clock domains.

> I do not have evidence of this from the Xilinx documentation. Maybe I am missing something.

This is not Xilinx-specific. The synthesis language spec does not define what should happen in those cases. So where it is supported (for cases like clock domain crossing), it is always a non-standard extension, and it should be assumed not to be supported unless the documentation states otherwise.

cliffordwolf avatar cliffordwolf commented on July 18, 2024

Support for initialized (Xilinx) BRAMs has now been implemented in 8520b7f. Please test.

I'm now closing this issue. Please open a new issue if you find a bug in the feature.
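
A small test case for the new feature might look like the sketch below: a memory initialized with `$readmemb` from the array_ref_0.data file posted earlier in this thread (512 words of 16 bits) and read synchronously. The module name is hypothetical, and this is only one possible way to exercise the feature, not the test used for the commit.

```verilog
// Hypothetical example: memory initialized from the file posted above.
module init_mem_test(
    input             clock,
    input             we,
    input      [8:0]  addr,
    input      [15:0] din,
    output reg [15:0] dout
);
    // 512 words of 16 bits, matching array_ref_0.data
    reg [15:0] memory [0:511];

    initial $readmemb("array_ref_0.data", memory);

    always @(posedge clock) begin
        if (we)
            memory[addr] <= din;
        dout <= memory[addr];
    end
endmodule
```

Assuming the module is saved as init_mem_test.v (a file name chosen here for illustration), it can go through the same kind of flow as the script above, for example `yosys -p "read_verilog init_mem_test.v" -p "synth_xilinx -top init_mem_test"`, after which the resulting netlist can be inspected for a block RAM primitive.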
