
Comments (17)

pkelsey avatar pkelsey commented on August 22, 2024

Are you using the default make file settings, which compiles libuinet with no optimization at all, or have you modified them?

from libuinet.

petergsnm avatar petergsnm commented on August 22, 2024

I have not modified them yet. Here is what I am going to try next:

  1. Rebuild with compiler optimization enabled and see how the results change.
  2. Run callgrind to see where the time is being spent inside libuinet.

Can you also suggest any other optimizations I could try to get better numbers, and whether there are equivalent sysctl changes to make?

Also, in the setup above, my server runs with libuinet and netmap in a VM, and my clients are on the host machine.

Thanks...
~Peter

petergsnm avatar petergsnm commented on August 22, 2024

Also, if I increase the client request rate, I see accept failing. The numbers below are from my test program, which prints the number of connections handled in each one-second interval.

this 1 sec : connections 3379
this 1 sec : connections 3254
this 1 sec : connections 3140
this 1 sec : connections 3173
this 1 sec : connections 3232
this 1 sec : connections 3209
accept failed (53)
this 1 sec : connections 3028
accept failed (53)
accept failed (53)
this 1 sec : connections 3170
this 1 sec : connections 3074
accept failed (53)
accept failed (53)
this 1 sec : connections 3031
accept failed (53)
accept failed (53)
accept failed (53)

I think it is because accepts are being picked up too slowly and the listen queue is overflowing. In that case, how can we increase the queue size?
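
For reference, errno 53 on FreeBSD is ECONNABORTED, which accept() returns when a queued connection is reset before the server gets to it - often a symptom of the listen queue overflowing under load. A minimal sketch of widening the backlog with the standard sockets API (a hypothetical helper for illustration; libuinet's own listen path differs in detail):

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a loopback listener with an explicit,
 * large backlog.  The backlog argument caps the accept queue, which is
 * what overflows when accepts are serviced too slowly. */
int make_listener(int backlog)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = 0;                      /* let the OS pick a port */

    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```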

petergsnm avatar petergsnm commented on August 22, 2024

I am no longer seeing the accept failed errors after I increased the value of MAXCON in sys/sys/socket.h.
But my CPS still does not go beyond 5K in a single-core VM.

Now I have compiled libuinet and the sample application with the -O3 flag and run them with nice (19); the maximum CPS I was able to achieve is 10K connections, which is less than a kernel-space TCP/IP application achieves.

top output:

top - 05:34:28 up 38 min, 3 users, load average: 1.30, 1.16, 1.04
Threads: 104 total, 2 running, 102 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.3 us, 15.5 sy, 36.1 ni, 46.7 id, 0.0 wa, 0.0 hi, 1.4 si, 0.0 st
KiB Mem: 8177624 total, 1401420 used, 6776204 free, 17636 buffers
KiB Swap: 303100 total, 0 used, 303100 free, 141176 cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2205 root 39 19 1482m 769m 1896 S 36.5 9.6 3:12.49 nm_rx: netmap0
2206 root 39 19 1482m 769m 1896 S 10.0 9.6 0:49.71 server
2204 root 39 19 1482m 769m 1896 S 6.6 9.6 0:35.17 nm_tx: netmap0

My observation is that the nm_tx thread's CPU usage dropped in the -O3 run, while the nm_rx thread's did not.

I also printed the various constants used in this program and observed that while the maxsockets number looks good, the maxfiles number is very low.

uinet starting: cpus=1, nmbclusters=262144
callwheelsize=524288
callwheelsize=524288
link_elf_lookup_symbol: missing symbol hash table
link_elf_lookup_symbol: missing symbol hash table
UINET multiprocessor subsystem configured with 1 CPUs
Timecounters tick every 10.000 msec
maxusers=1
maxfiles=72
maxsockets=262144
nmbclusters=262144
tcp_recvspace=65536
tcp_finwait2_timeout=6000
tcp_fast_finwait2_recycle=0
tcp_recvspace=65536
configstr is eth1
netmap0: Ethernet address: 08:00:27:96:e4:88

pkelsey avatar pkelsey commented on August 22, 2024

Peter,

Thank you for all of the detailed information on what results you are getting and how you are getting them. I am really busy, but I am working my way towards reproducing what you are seeing and will get back to you.

In the meantime, one thing you could try if you are up to it is batch-processing accepts in accept_cb. You can look at accept_cb() in bin/passive.c for an example. If you ignore all of the references to peer sockets and connections there, I think the structure is pretty straightforward to transfer to your test program. Batching accepts should reduce the total event loop overhead under high connection rates.
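
The batching idea can be sketched without the libuinet specifics: drain the whole listen queue in one callback instead of taking one connection per event-loop wakeup. This uses the standard sockets API on a non-blocking listener for illustration; the real accept_cb() in bin/passive.c uses libuinet's own calls, and the placeholder handler below is purely for the sketch.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Batch-process accepts on a non-blocking listen socket: keep calling
 * accept() until the queue is drained.  Returns the number accepted. */
int accept_batch(int listen_fd, void (*handle)(int fd))
{
    int n = 0;
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0) {
            if (errno == ECONNABORTED)
                continue;      /* peer reset while queued; skip it */
            break;             /* EAGAIN/EWOULDBLOCK: queue drained */
        }
        handle(fd);
        n++;
    }
    return n;
}

/* Placeholder handler for the sketch: just close the connection. */
void close_conn(int fd)
{
    close(fd);
}
```

The win is that one event-loop iteration amortizes its overhead over every connection sitting in the queue, which matters most exactly when the connection rate is high.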

petergsnm avatar petergsnm commented on August 22, 2024

Hi Patrick,
Batch processing increased the number by around 500 to 1K. We also logged that one batch was processing at most 30 to 40 accepts.

From the callgrind analysis we figured out that considerable time is spent when the server closes the connection. We tried having the server not close connections immediately to see if performance improves, but it looks like we can't keep more than 65K concurrent connections. I am not sure whether this is limited by some constant or define. Do you remember what we can change to raise the concurrent-connection limit?

I have emailed you a callgrind screenshot, which should help show where the time is being spent.

Please let us know.

Thanks, Peter

pkelsey avatar pkelsey commented on August 22, 2024

What command line are you using on the server side, and what are you using to drive traffic? The first thing I am thinking of given the apparent 65k limit is exhaustion of the 16-bit port space on the client side.

The only limit on the libuinet side should be the maximum number of sockets configured via the second parameter to uinet_init. This limit is really an upper bound of the size of the pool used for socket context - making it a huge number at init time will not result in any immediate additional allocation, it will just allow the pool to grow that large if required during operation. If the issue is that you are hitting the limit due to connections being in time-wait, increasing that parameter should relieve the issue.

libuinet has been tested with up to 1 million concurrent listen sockets plus 1 million concurrent active sockets, which requires a suitably large value for the second parameter to uinet_init(), and also a suitable multiplicity of available {server_IP, server_port, client_IP, client_port} tuples.
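
The 16-bit port-space suspicion is easy to sanity-check with arithmetic: each connection from one client IP to one {server_IP, server_port} consumes a distinct client port, and that 4-tuple stays unusable while either end holds it in TIME_WAIT. A back-of-the-envelope sketch, where the port range and the 60-second TIME_WAIT figure are illustrative defaults rather than measured values:

```c
/* Back-of-the-envelope check for client-side port exhaustion against a
 * single {server_IP, server_port}.  Returns nonzero if one client IP
 * can sustain `cps` connections/sec given `time_wait_secs` of TIME_WAIT
 * tying up each 4-tuple.  Values are illustrative defaults. */
int single_client_ip_sustains(long cps, long time_wait_secs)
{
    const long usable_ports = 65535 - 1024;   /* skip reserved range */
    /* Each connection ties up one client port for the TIME_WAIT period,
     * so sustaining `cps` needs cps * time_wait_secs distinct ports. */
    return cps * time_wait_secs <= usable_ports;
}
```

At 10K CPS and 60 seconds of TIME_WAIT, a single client IP would need roughly 600K simultaneous ports, an order of magnitude more than exist, which is consistent with the ~65K ceiling observed.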

petergsnm avatar petergsnm commented on August 22, 2024

I was using one client machine, which I think is running out of ports. I will use multiple clients and let you know.

On the second front, maxsockets is set to 262144; the other parameters are shown below.

Also, I have sent you the callgrind output by email.

uinet starting: cpus=1, nmbclusters=262144
callwheelsize=524288
callwheelsize=524288
link_elf_lookup_symbol: missing symbol hash table
link_elf_lookup_symbol: missing symbol hash table
UINET multiprocessor subsystem configured with 1 CPUs
Timecounters tick every 10.000 msec
maxusers=1
maxfiles=72
maxsockets=262144
nmbclusters=262144
tcp_recvspace=65536
tcp_finwait2_timeout=6000
tcp_fast_finwait2_recycle=0
tcp_recvspace=65536
configstr is eth1
netmap0: Ethernet address: 08:00:27:96:e4:88

pkelsey avatar pkelsey commented on August 22, 2024

OK. To answer an earlier question of yours regarding the small value of maxfiles: don't worry about it. The maxfiles parameter exists as part of the FreeBSD common kernel infrastructure included in libuinet, but libuinet makes no use of it - there is no emulation or use of kernel file descriptors at all in libuinet.

petergsnm avatar petergsnm commented on August 22, 2024

Thanks. Please let me know what you find in the callgrind output. I am going to measure the CPS without closing the accepted connections (soclose was taking significant CPU cycles, as shown in the callgrind output). I am also going to replace arc4random with a simple static variable to save the cost of random number generation. With these two changes, let me see how much CPS I can get. I am just trying to identify the places where we need to do some optimization.
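
The arc4random swap can be as simple as a static counter. This is strictly a benchmarking hack: it removes the unpredictability the stack normally gets from arc4random (e.g. for sequence numbers), so it must never go beyond test builds.

```c
#include <stdint.h>

/* Benchmark-only stand-in for arc4random(): a monotonically increasing
 * counter instead of a CSPRNG.  Deterministic and predictable, which
 * defeats the randomization the stack relies on -- test builds only. */
uint32_t fake_arc4random(void)
{
    static uint32_t counter;
    return counter++;
}
```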

I know you are busy with your presentation tomorrow, so please look at this when you have time. I will keep you updated on my progress.

Thanks...Peter

petergsnm avatar petergsnm commented on August 22, 2024

I tried to see what CPS I can achieve without closing the socket; it was around 18K connections per second. Compared with open-and-close, that is +7K sessions per second.

Next I would like to try disabling the syncache. Could you please let me know how I can do that?

Thanks

pkelsey avatar pkelsey commented on August 22, 2024

I am getting closer to the point where I can spend a little time digging into this. It is interesting that the close reduces performance so significantly. Until I can reproduce this on my end and have something more concrete to comment on, here are a couple of things that I think frame the issue:

It is a known issue that FreeBSD performance is currently lagging in the area of short-lived connections - see http://www.freebsd.org/cgi/query-pr.cgi?pr=183659. This doesn't mean further tuning and application-side work won't improve the numbers you are seeing, but I think it does set expectations for how high the numbers might go.

libuinet itself is just entering the phase where performance will be analyzed and improved. One of the things that really needs to happen ahead of this work is updating the stack sources libuinet is using to something considerably more recent than the 9.1-RELEASE version it currently uses. Not only do we want to avoid measuring and 'fixing' issues that no longer exist due to subsequent improvements in the main line sources, but in cases where the libuinet work indicates there could be general improvements made to the stack itself, we want to avoid the work of then reproducing the issue with more current sources and developing equivalent patches for submission.

petergsnm avatar petergsnm commented on August 22, 2024

Thank you.

Please let me know once you finish the integration. I can do the testing for you and help identify any remaining kinks. I have also integrated a small web server with libuinet to measure RPS and CPS, and I have a KVM VM handy for performance measurement.

Looking forward to hearing from you.

qiaobing avatar qiaobing commented on August 22, 2024

Is there a time frame for the migration away from 9.1-RELEASE?

Also, I wonder whether the userland stack loses the checksum-offloading benefits that the kernel stack enjoys on a physical box (I understand petergsnm's tests were done on KVM).

caladri avatar caladri commented on August 22, 2024

See issue #11 for information on libuinet's current inability to make use of checksum offloading due to deficiencies in netmap. And yes, when running in a VM or on hardware that doesn't provide or preserve checksum offload, the stack has to compute checksums itself.

sdu07xd avatar sdu07xd commented on August 22, 2024

I can't compile it on Linux!
Should I set some environment variable?

The errors are as follows:
uinet_if_netmap_host.c:331:71: error: ‘struct ifreq’ declared inside parameter list [-Werror]
uinet_if_netmap_host.c:331:71: error: its scope is only this definition or declaration, which is probably not what you want [-Werror]
uinet_if_netmap_host.c: In function ‘if_netmap_ethtool_set_flag’:
uinet_if_netmap_host.c:335:5: error: dereferencing pointer to incomplete type
uinet_if_netmap_host.c: At top level:
uinet_if_netmap_host.c:364:75: error: ‘struct ifreq’ declared inside parameter list [-Werror]
uinet_if_netmap_host.c: In function ‘if_netmap_ethtool_set_discrete’:
uinet_if_netmap_host.c:368:5: error: dereferencing pointer to incomplete type
uinet_if_netmap_host.c: In function ‘if_netmap_set_offload’:
uinet_if_netmap_host.c:396:15: error: storage size of ‘ifr’ isn’t known
uinet_if_netmap_host.c:396:15: error: unused variable ‘ifr’ [-Werror=unused-variable]
uinet_if_netmap_host.c: In function ‘if_netmap_set_promisc’:
uinet_if_netmap_host.c:447:15: error: storage size of ‘ifr’ isn’t known
uinet_if_netmap_host.c:469:19: error: ‘IFF_PROMISC’ undeclared (first use in this function)
uinet_if_netmap_host.c:469:19: note: each undeclared identifier is reported only once for each function it appears in
uinet_if_netmap_host.c:447:15: error: unused variable ‘ifr’ [-Werror=unused-variable]

pkelsey avatar pkelsey commented on August 22, 2024

Please don't piggyback on existing unrelated issues.

Open a new issue for this and include necessary context for interpreting your problem, such as the specific Linux distribution and version you are using, whether you are using something other than the stock compiler for that distribution, the command you executed and the directory you executed it in.
