Comments (7)
Kevin P says it should work passing a C variable to a preprocessor directive, so that makes things easier!
from hls4ml.
Neat! I have not tried this before, but happy to hear it's possible.
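For concreteness, the idea of setting a value at build time and forwarding it into the generated code could look something like the following. This is a minimal sketch in plain C++ (the define name `N_INPUTS` and the function are hypothetical, not part of any existing library):

```cpp
// Hypothetical sketch: a build-time define (e.g. -DN_INPUTS=8 on the
// compiler command line) forwarded into a template parameter, so one
// knob controls the size of the generated logic.
#ifndef N_INPUTS
#define N_INPUTS 8   // default; can be overridden from the build
#endif

template <unsigned N>
unsigned vec_sum(const int (&x)[N]) {
    // N is a compile-time constant, so HLS could fully unroll this loop
    unsigned s = 0;
    for (unsigned i = 0; i < N; i++)
        s += x[i];
    return s;
}
```

The template deduces `N` from the array reference, so the preprocessor value flows through without being repeated at each call site.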
As a related thought: I've recently been using a few digital signal processing operations provided with HLS that might provide a helpful model: 1) CORDIC and 2) FFT. Both the CORDIC and FFT libraries use template functions to control their implementation details, including bitwidths, fixed point vs floating point, HDL architecture, and even what type of operation they run.
Here's a code snippet for an example CORDIC operation. It's a somewhat extreme use of templating, but I think it's instructive to see some of what we could theoretically do. For example, the CORDIC is running a "Translate" operation, it's using scaled radians (range -1 to +1 instead of -pi to +pi), it's choosing the number of iterations automatically, and it's using BRAM to scale the magnitude result, to call out a few of the capabilities.
// Find mag/angle via cordic
typename translate_inputs<cmplx_width, hls::CORDIC_FORMAT_SIG_FRAC>::in cordicdata;
typename translate_outputs<cordic_width, hls::CORDIC_FORMAT_SIG_FRAC>::out outputdata;
cordicdata.cartesian = in[ii];
hls::cordic_base<hls::CORDIC_F_TRANSLATE, hls::CORDIC_TRUE,
                 hls::CORDIC_FORMAT_SIG_FRAC, hls::CORDIC_FORMAT_SCA,
                 cmplx_width, cordic_width,
                 hls::CORDIC_ITER_AUTO, hls::CORDIC_PREC_AUTO,
                 hls::CORDIC_ROUND_TRUNCATE, hls::CORDIC_SCALE_BRAM>(cordicdata, outputdata);
Here's an example code snippet for running an FFT...
struct static_config : hls::ip_fft::params_t {
    // Default parameters: fixed-point config!
    static const unsigned ordering_opt = hls::ip_fft::natural_order;
    static const unsigned max_nfft = FFT_NFFT_MAX;
    static const unsigned config_width = FFT_CONFIG_WIDTH;
    static const unsigned phase_factor_width = FFT_PHASE_FACTOR_WIDTH;
    static const unsigned input_width = FFT_INPUT_WIDTH;
    static const unsigned output_width = FFT_OUTPUT_WIDTH;
};
typedef hls::ip_fft::config_t<static_config> static_config_t;
typedef hls::ip_fft::status_t<static_config> static_status_t;
static_config_t fft_config;
static_status_t fft_status;
fft_config.setDir(1);     // 1 = forward transform
fft_config.setSch(0x2AB); // scaling schedule
hls::fft<static_config>(iq_in, iq_out, &fft_status, &fft_config);
bool ovflo = fft_status.getOvflo();
There's a LOT that goes into configuring and using the HLS FFT, but the important takeaway, in my opinion, is that we can create a struct which gets passed in to a template function. I suspect we might be able to use this architecture as inspiration for an improved interface to the neural network library, one that can plug in to Keras with more flexibility.
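To illustrate the struct-into-template pattern described above, here is a minimal sketch in plain C++, in the spirit of the FFT's `params_t`. All names (`dense_config`, `dense`, the fields) are hypothetical, and a real HLS version would use `ap_fixed` types rather than `float`:

```cpp
// Hypothetical sketch: a per-layer configuration struct handed to a
// template function, so sizes and types are compile-time constants.
struct dense_config {
    static const unsigned n_in = 2;
    static const unsigned n_out = 2;
    typedef float accum_t;   // stand-in for an ap_fixed accumulator type
};

// Fully-connected layer: out = in * W + b, with W stored row-major
// as weights[i * n_out + j].
template <class data_T, class res_T, typename CONFIG_T>
void dense(const data_T in[CONFIG_T::n_in],
           const data_T weights[CONFIG_T::n_in * CONFIG_T::n_out],
           const data_T biases[CONFIG_T::n_out],
           res_T out[CONFIG_T::n_out]) {
    for (unsigned j = 0; j < CONFIG_T::n_out; j++) {
        typename CONFIG_T::accum_t acc = biases[j];
        for (unsigned i = 0; i < CONFIG_T::n_in; i++)
            acc += in[i] * weights[i * CONFIG_T::n_out + j];
        out[j] = acc;
    }
}
```

Because every dimension is a `static const` member of the config struct, the synthesis tool sees fixed loop bounds, and a code generator driven from Keras could emit one config struct per layer.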
Thoughts?
@ejk43 I like the idea of having the templated configuration for each layer. Just to make sure I understand, we could even configure without preprocessor directives and then combine both functionalities pretty cleanly, right?
also good tips on the CORDIC!
Good progress in the recently merged #7. We should do something similar for the activation functions.
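Extending the templated-config idea to activations could be as simple as the sketch below (plain C++, hypothetical names; a real version would use HLS fixed-point types):

```cpp
// Hypothetical sketch: the same per-layer config pattern applied to an
// activation function.
struct relu_config {
    static const unsigned n_in = 4;
};

// Element-wise ReLU: out[i] = max(in[i], 0)
template <class data_T, class res_T, typename CONFIG_T>
void relu(const data_T in[CONFIG_T::n_in], res_T out[CONFIG_T::n_in]) {
    for (unsigned i = 0; i < CONFIG_T::n_in; i++)
        out[i] = (in[i] > 0) ? (res_T)in[i] : (res_T)0;
}
```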
Another thought here-- would it make sense to supply the configuration struct with a target "initiation interval"? The II is the real driver of potential resource reuse, so it would be great to be able to supply the target II to the nnet library, which then calculates the ideal partitioning and unroll factors to hit the target... For example, II = 1 would require full unrolling, II = 2 would want unrolling by half the total operations, etc.
I've found the PIPELINE pragma with a specified II is basically unreliable unless the partitioning and unroll directives are also set correctly. This suggests to me that the nnet library should do the required calculations to get the unroll/partition settings right.
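The II-to-unroll-factor calculation described above amounts to a ceiling division. A sketch (the helper name is hypothetical):

```cpp
// Sketch: derive an unroll factor from a target initiation interval.
// unroll = ceil(total_ops / target_ii): II = 1 means full unroll,
// II = 2 means unrolling half the operations, and so on.
unsigned unroll_factor(unsigned total_ops, unsigned target_ii) {
    return (total_ops + target_ii - 1) / target_ii;
}
```

A code generator could compute this per layer and emit matching UNROLL and ARRAY_PARTITION factors, rather than hoping the tool infers them from PIPELINE alone.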
(Also, a pretty neat feature could be chaining multiple layers with different network sizes... I wonder if we could intelligently adjust unroll factors for downstream/upstream layers based on the input initiation interval target, to get more resource reuse downstream where possible. This may be a bit of a stretch goal, though.)
That sounds good. We also found that the pipelining pragma did weird things without unroll/partition set correctly -- so it was actually commented out in the latest PR. The other problem was that it was in the top function, so putting it in each layer should give us finer control.
Closing this issue. Technical basics are solved and now finer details are being discussed in other issues!