Comments (7)
Kevin P says it should work passing a C variable to a preprocessor directive, so that makes things easier!
from hls4ml.
Neat! I have not tried this before, but happy to hear it's possible.
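For concreteness, the idea of setting a value at build time and forwarding it into the generated code could look something like the following. This is a minimal sketch in plain C++ (the define name `N_INPUTS` and the function are hypothetical, not part of any existing library):

```cpp
// Hypothetical sketch: a build-time define (e.g. -DN_INPUTS=8 on the
// compiler command line) forwarded into a template parameter, so one
// knob controls the size of the generated logic.
#ifndef N_INPUTS
#define N_INPUTS 8   // default; can be overridden from the build
#endif

template <unsigned N>
unsigned vec_sum(const int (&x)[N]) {
    // N is a compile-time constant, so HLS could fully unroll this loop
    unsigned s = 0;
    for (unsigned i = 0; i < N; i++)
        s += x[i];
    return s;
}
```

The template deduces `N` from the array reference, so the preprocessor value flows through without being repeated at each call site.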
As a related thought: I've recently been using a few digital signal processing operations provided with HLS that might provide a helpful model: 1) CORDIC and 2) FFT. Both the CORDIC and FFT libraries use template functions to control their implementation details, including bitwidths, fixed point vs floating point, HDL architecture, and even what type of operation they run.
Here's a code snippet for an example CORDIC operation. It's a somewhat extreme use of templating, but I think it's instructive to see some of what we could theoretically do. For example, the CORDIC is running a "Translate" operation, it's using scaled radians (range -1 to +1 instead of -pi to +pi), it's choosing the number of iterations automatically, and it's using BRAM to scale the magnitude result, to call out a few of the capabilities.
// Find mag/angle via cordic
typename translate_inputs<cmplx_width, hls::CORDIC_FORMAT_SIG_FRAC>::in cordicdata;
typename translate_outputs<cordic_width, hls::CORDIC_FORMAT_SIG_FRAC>::out outputdata;
cordicdata.cartesian = in[ii];
hls::cordic_base<hls::CORDIC_F_TRANSLATE, hls::CORDIC_TRUE,
                 hls::CORDIC_FORMAT_SIG_FRAC, hls::CORDIC_FORMAT_SCA,
                 cmplx_width, cordic_width,
                 hls::CORDIC_ITER_AUTO, hls::CORDIC_PREC_AUTO,
                 hls::CORDIC_ROUND_TRUNCATE, hls::CORDIC_SCALE_BRAM>(cordicdata, outputdata);
Here's an example code snippet for running an FFT...
struct static_config : hls::ip_fft::params_t {
    // Default parameters: fixed-point config!
    static const unsigned ordering_opt = hls::ip_fft::natural_order;
    static const unsigned max_nfft = FFT_NFFT_MAX;
    static const unsigned config_width = FFT_CONFIG_WIDTH;
    static const unsigned phase_factor_width = FFT_PHASE_FACTOR_WIDTH;
    static const unsigned input_width = FFT_INPUT_WIDTH;
    static const unsigned output_width = FFT_OUTPUT_WIDTH;
};
typedef hls::ip_fft::config_t<static_config> static_config_t;
typedef hls::ip_fft::status_t<static_config> static_status_t;
static_config_t fft_config;
static_status_t fft_status;
fft_config.setDir(1);     // 1 = forward transform
fft_config.setSch(0x2AB); // scaling schedule
hls::fft<static_config>(iq_in, iq_out, &fft_status, &fft_config);
bool ovflo = fft_status.getOvflo();
There's a LOT that goes into configuring and using the HLS FFT, but the important takeaway, in my opinion, is that we can create a struct which gets passed in to a template function. I suspect we might be able to use this architecture as inspiration for an improved interface to the neural network library, one that can plug in to Keras with more flexibility.
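To illustrate the struct-into-template pattern described above, here is a minimal sketch in plain C++, in the spirit of the FFT's `params_t`. All names (`dense_config`, `dense`, the fields) are hypothetical, and a real HLS version would use `ap_fixed` types rather than `float`:

```cpp
// Hypothetical sketch: a per-layer configuration struct handed to a
// template function, so sizes and types are compile-time constants.
struct dense_config {
    static const unsigned n_in = 2;
    static const unsigned n_out = 2;
    typedef float accum_t;   // stand-in for an ap_fixed accumulator type
};

// Fully-connected layer: out = in * W + b, with W stored row-major
// as weights[i * n_out + j].
template <class data_T, class res_T, typename CONFIG_T>
void dense(const data_T in[CONFIG_T::n_in],
           const data_T weights[CONFIG_T::n_in * CONFIG_T::n_out],
           const data_T biases[CONFIG_T::n_out],
           res_T out[CONFIG_T::n_out]) {
    for (unsigned j = 0; j < CONFIG_T::n_out; j++) {
        typename CONFIG_T::accum_t acc = biases[j];
        for (unsigned i = 0; i < CONFIG_T::n_in; i++)
            acc += in[i] * weights[i * CONFIG_T::n_out + j];
        out[j] = acc;
    }
}
```

Because every dimension is a `static const` member of the config struct, the synthesis tool sees fixed loop bounds, and a code generator driven from Keras could emit one config struct per layer.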
Thoughts?
@ejk43 I like the idea of having the templated configuration for each layer. Just to make sure I understand, we could even configure without preprocessor directives and then combine both functionalities pretty cleanly, right?
also good tips on the CORDIC!
Good progress in the recently merged #7. We should do something similar for the activation functions.
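Extending the templated-config idea to activations could be as simple as the sketch below (plain C++, hypothetical names; a real version would use HLS fixed-point types):

```cpp
// Hypothetical sketch: the same per-layer config pattern applied to an
// activation function.
struct relu_config {
    static const unsigned n_in = 4;
};

// Element-wise ReLU: out[i] = max(in[i], 0)
template <class data_T, class res_T, typename CONFIG_T>
void relu(const data_T in[CONFIG_T::n_in], res_T out[CONFIG_T::n_in]) {
    for (unsigned i = 0; i < CONFIG_T::n_in; i++)
        out[i] = (in[i] > 0) ? (res_T)in[i] : (res_T)0;
}
```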
Another thought here-- would it make sense to supply the configuration struct with a target "initiation interval"? The II is the real driver of potential resource reuse, so it would be great to be able to supply the target II to the nnet library, which then calculates the ideal partitioning and unroll factors to hit the target... For example, II = 1 would require full unrolling, II = 2 would want unrolling by half the total operations, etc.
I've found the PIPELINE pragma with a specified II is basically unreliable unless the partitioning and unroll directives are also set correctly. This suggests to me that the nnet library should do the required calculations to get the unroll/partition settings right.
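The II-to-unroll-factor calculation described above amounts to a ceiling division. A sketch (the helper name is hypothetical):

```cpp
// Sketch: derive an unroll factor from a target initiation interval.
// unroll = ceil(total_ops / target_ii): II = 1 means full unroll,
// II = 2 means unrolling half the operations, and so on.
unsigned unroll_factor(unsigned total_ops, unsigned target_ii) {
    return (total_ops + target_ii - 1) / target_ii;
}
```

A code generator could compute this per layer and emit matching UNROLL and ARRAY_PARTITION factors, rather than hoping the tool infers them from PIPELINE alone.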
(Also, a pretty neat feature could be chaining multiple layers with different network sizes... I wonder if we could intelligently adjust unroll factors for downstream/upstream layers based on the input initiation interval target, to get more resource reuse downstream where possible. This may be a bit of a stretch goal, though.)
That sounds good. We also found that the pipelining pragma did weird things without unroll/partition set correctly -- so it was actually commented out in the latest PR. The other problem was that it was in the top function, so putting it in each layer should give us finer control.
Closing this issue. Technical basics are solved and now finer details are being discussed in other issues!