Comments (5)
@hecmay Interesting. It is tricky because we cannot use hardware emulation to debug this. I ran into a similar problem. When I was setting the IP of the first FPGA (FPGA1), I found that valid ARP entries for FPGA2 and FPGA3 were already present, even though I hadn't set them up yet. My solution was to change the IP address (e.g., to 192.168.3.10, or even more unusual ones). Resetting the FPGA with 'xbutil reset -d ...' sometimes helps as well.
In addition, I write the received data to data_received.txt, so you can check whether an entire UDP packet is lost or only some of the 64-byte stream transfers. For example, once I tried sending 1408+704 bytes (1.5 UDP packets) but received only the first 22 stream transfers (33 were expected). In that case the whole second UDP packet was lost, which suggests the cause lies in their network setup. If, say, 25 transfers had been received instead, the more likely cause would be my kernels dropping some stream transfers.
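The arithmetic behind this check can be sketched in Python. This is a hypothetical helper, not part of the project template: it assumes 64-byte stream transfers and 1408-byte UDP payloads (as in the 1408+704 example), and it takes the number of transfers you counted in data_received.txt, whose exact format is not described here.

```python
import math

# Assumptions taken from the thread: one stream transfer carries 64 bytes,
# and a full UDP packet carries 1408 bytes ("1408+704 Bytes (1.5 UDP packet)").
BEAT_BYTES = 64
PACKET_BYTES = 1408


def packet_beats(total_bytes: int) -> list[int]:
    """Number of 64-byte stream transfers carried by each UDP packet."""
    beats = []
    remaining = total_bytes
    while remaining > 0:
        chunk = min(remaining, PACKET_BYTES)
        beats.append(math.ceil(chunk / BEAT_BYTES))
        remaining -= chunk
    return beats


def diagnose(total_bytes: int, received_beats: int) -> str:
    """Classify a short read as whole-packet loss or partial-transfer loss."""
    beats = packet_beats(total_bytes)
    expected = sum(beats)
    if received_beats == expected:
        return "complete"
    if received_beats > expected:
        return "extra"  # more data arrived than was sent
    # Every transfer count that can result from dropping whole packets only.
    reachable = {0}
    for b in beats:
        reachable |= {s + b for s in reachable}
    if received_beats in reachable:
        return "packet_lost"  # points at the network setup
    return "beats_lost"  # points at the kernels
```

For the example above, `diagnose(2112, 22)` returns "packet_lost" and `diagnose(2112, 25)` returns "beats_lost". Checking against all subset sums, rather than just multiples of 22, also covers the case where the first packet is lost and only the shorter second one arrives.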
from oct_fpga_project_template_for_nextlab.
Oh, I just figured out that you need to set the IP address on the HEAD node once again before initiating a data transmit request on it:
$ ./udp_setup bit_container_0.xclbin head_ip.ini
$ ./head_bin bit_container_0.xclbin 1024
and then it works!
Yes, it is tricky because we cannot set up all the FPGAs simultaneously. When we set up FPGA1, FPGA2 has not been set up yet, so ARP discovery on FPGA1 cannot find FPGA2. In practice, we have to run udp_setup twice on every FPGA except the last one...
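The resulting setup order can be sketched as a small Python helper that only builds the command sequence. It is hypothetical: udp_setup, head_bin, head_ip.ini, bit_container_0.xclbin and the 1024 argument come from this thread, while the other .ini names are placeholders. Each udp_setup command still has to be run on the node that hosts that FPGA; the list only captures the order.

```python
def setup_plan(ini_files: list[str], head_ini: str,
               xclbin: str = "bit_container_0.xclbin",
               size: int = 1024) -> list[list[str]]:
    """Return the command sequence suggested in the thread, in order."""
    # First pass: configure every FPGA once, in ring order.
    plan = [["./udp_setup", xclbin, ini] for ini in ini_files]
    # When FPGA k was configured, the FPGAs after it had no IP yet, so its
    # ARP discovery could not find them. A second pass over all but the
    # last FPGA lets those lookups succeed.
    plan += [["./udp_setup", xclbin, ini] for ini in ini_files[:-1]]
    # Re-run setup on the HEAD node right before starting the transfer.
    if not plan or plan[-1][2] != head_ini:
        plan.append(["./udp_setup", xclbin, head_ini])
    plan.append(["./head_bin", xclbin, str(size)])
    return plan


if __name__ == "__main__":
    # Placeholder file names for a hypothetical three-FPGA ring.
    for argv in setup_plan(["head_ip.ini", "fpga2_ip.ini", "fpga3_ip.ini"],
                           head_ini="head_ip.ini"):
        print(" ".join(argv))
```

With the HEAD node listed first, the second pass ends on FPGA2, so an extra HEAD setup is appended before head_bin, matching the tip about re-setting the HEAD IP just before transmitting.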
@ngdxzy Thanks for the information. I also found that even when all the FPGAs are set up correctly, the FPGA on the HEAD node can still deadlock sometimes (due to packet loss or something, I guess?).
I was able to run it successfully just now (around 1000 times, to collect some latency data), and then all of a sudden it stopped working (i.e., the HEAD node no longer receives any of the data it has sent out, and the Rx kernel just keeps waiting).
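Because a stalled Rx kernel blocks the host program indefinitely, a long latency-collection loop could wrap each run in a timeout so that a hang is detected instead of waiting forever. A minimal Python sketch, assuming head_bin exits normally after each transfer; the timeout value and the `recover` callback (for example, the xbutil reset and udp_setup steps mentioned earlier) are placeholders you would fill in.

```python
import subprocess
import sys


def run_once(argv: list[str], timeout_s: float) -> str:
    """Run one transfer; report 'ok', 'failed' (non-zero exit) or 'hung'."""
    try:
        subprocess.run(argv, check=True, timeout=timeout_s)
        return "ok"
    except subprocess.TimeoutExpired:
        # subprocess.run kills the host process on timeout; the kernel on
        # the FPGA may still be stuck and need a device reset.
        return "hung"
    except subprocess.CalledProcessError:
        return "failed"


def run_many(argv: list[str], runs: int, timeout_s: float,
             recover=None) -> int:
    """Repeat a transfer, stopping at the first problem. Returns runs completed."""
    for i in range(runs):
        status = run_once(argv, timeout_s)
        if status != "ok":
            print(f"run {i}: {status}", file=sys.stderr)
            if recover is not None:
                recover()  # hypothetical hook: reset device, re-run udp_setup
            return i
    return runs
```

A call such as `run_many(["./head_bin", "bit_container_0.xclbin", "1024"], 1000, 10.0)` would then report which iteration stalled rather than hanging the whole session.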
I'm not sure why. My hypothesis is that someone else on the cluster is using the same IP:PORT settings, so the packets were forwarded to their device. I ran into this when running the udp-demo example: my receiver node received some data before I had even started the sender node (so it must have been someone else's data).
But even after I changed the IP, it still does not work. The only good thing is that at least it works sometimes.
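One way to test the foreign-traffic hypothesis is to make each run's payload distinguishable and compare what comes back with what was sent. The sketch below is hypothetical: it assumes you can choose the bytes the HEAD node sends and read back the bytes it receives, and the tag it writes (run id plus transfer index at the start of each 64-byte transfer) is invented purely for illustration.

```python
import struct

BEAT_BYTES = 64             # one stream transfer, as described in the thread
TAG = struct.Struct("<II")  # hypothetical tag: (run_id, beat_index), 8 bytes


def make_payload(run_id: int, size: int) -> bytes:
    """Fill `size` bytes, starting each 64-byte transfer with a tag."""
    out = bytearray()
    beat = 0
    while len(out) < size:
        out += TAG.pack(run_id, beat) + bytes(BEAT_BYTES - TAG.size)
        beat += 1
    return bytes(out[:size])


def classify(sent: bytes, received: bytes, run_id: int) -> str:
    """Compare received data with what this run sent."""
    if received == sent:
        return "match"
    # A transfer tagged with another run id did not come from this run
    # (someone else's traffic, or stale data from an earlier run).
    for off in range(0, len(received) - TAG.size + 1, BEAT_BYTES):
        rid, _ = TAG.unpack_from(received, off)
        if rid != run_id:
            return "foreign"
    if len(received) < len(sent) and sent.startswith(received):
        return "truncated"
    return "corrupt"
```

A "foreign" result would support the shared IP:PORT explanation, while "truncated" points back to packet loss of the kind discussed at the top of the thread.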