Comments (13)
Yep. The performance isn't as bad as first reported, though; further testing has shown we are on par with, and in some cases better than, nanomsg.
There may be some opportunities to reduce use of the global aio lock, or even break it up. For example, I can see we grab it in aio_abort, but all we do is collect the value of a function pointer -- most likely that can be done without a lock, or by using some kind of atomic instead.
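A minimal sketch of that idea, using C11 atomics instead of the global lock to read the cancellation callback (all names here are hypothetical stand-ins, not nng's actual internals):

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical sketch: store the cancellation callback in an _Atomic
 * pointer so abort can load it without serializing on a global mutex.
 * The setter stores with release; the reader loads with acquire. */
typedef void (*cancel_fn)(void *arg, int rv);

struct fake_aio {
    _Atomic(cancel_fn) cancel; /* written with release semantics */
    void              *arg;
};

static void fake_aio_abort(struct fake_aio *aio, int rv)
{
    /* acquire load pairs with the release store in the setter,
     * so no lock is needed just to read the pointer */
    cancel_fn fn = atomic_load_explicit(&aio->cancel, memory_order_acquire);

    if (fn != NULL) {
        fn(aio->arg, rv);
    }
}
```

The remaining difficulty, as noted, is that lock-free reads only help if the callback's lifetime is also safe without the lock.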
I'm remembering now that the global lock for aios was introduced to support expiration. I had trouble figuring out how to detangle multiple locks that would be required otherwise, because we have just a single global expiration thread.
It may be possible to utilize the capabilities in the poller(s) to support expiration, and thus eliminate some of this, but that requires more careful planning and thought, and research.
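One way a poller could absorb expiration duties is to pass the nearest pending deadline as its wait timeout, instead of relying on a dedicated expiration thread. A small sketch of just the timeout computation (names are illustrative, not nng internals):

```c
#include <limits.h>

/* Compute a poll()/epoll_wait()-style timeout (in ms) from the nearest
 * pending aio deadline. A negative deadline means nothing is pending,
 * so the poller may block indefinitely (-1). */
int timeout_for(long long now_ms, long long nearest_deadline_ms)
{
    if (nearest_deadline_ms < 0) {
        return -1; /* no pending expirations: block until I/O */
    }
    long long d = nearest_deadline_ms - now_ms;
    if (d <= 0) {
        return 0;  /* already expired: poll without blocking */
    }
    return (d > INT_MAX) ? INT_MAX : (int)d;
}
```

The hard part this sketch glosses over is tracking "nearest deadline" across many aios without reintroducing a contended shared structure.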
from nng.
If you're considering websockets, you might as well give up on the question of performance. WebSockets are abysmal for performance, everywhere -- because the specification requires it. (The extra bogus "encryption" layer means unavoidable data copies, etc.)
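The "bogus encryption" here is the mandatory client-to-server masking from RFC 6455 §5.3: every payload byte must be XOR'd with a 4-byte key, which forces a pass (and typically a copy) over the whole message. The transform itself is trivial:

```c
#include <stddef.h>
#include <stdint.h>

/* RFC 6455 §5.3 masking: every client frame payload byte is XOR'd with
 * a rotating 4-byte key. Applying it twice restores the original data,
 * but either way it costs a full pass over the payload. */
void ws_mask(uint8_t *payload, size_t len, const uint8_t key[4])
{
    for (size_t i = 0; i < len; i++) {
        payload[i] ^= key[i % 4];
    }
}
```

Real implementations vectorize this, but the per-byte touch (and the copy it usually implies) cannot be specified away.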
Having said that, I probably have a different idea about performance than you do. I'm concerned about insane levels of scaling, and super low latencies. The goal is to support single digit microsecond latencies. (We are not there yet.)
For your use case described above, I don't think it much matters which way you go, and I'd go for whatever is easiest for you. Probably the PUB/SUB approach with NNG will be easier, as with vanilla websockets you still have to do your own message framing etc. (Basically the upper layer protocol bits.)
The bit of profiling I've done has pointed pretty clearly to contention on the static nni_mtx nni_aio_lk in aio.c. I need to understand the aio system better to formulate a concrete suggestion, but it stands to reason that a global lock for I/O operations could be a bottleneck.
So there are some important things. First off, I've found ways to significantly improve performance, shaving 15-20 usec per operation. This comes from eliminating an extra set of context switches: completions that are already running asynchronously now call other completions synchronously, avoiding a pointless round of thread rescheduling.
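A toy model of that change, with the cross-thread handoff reduced to a counter (this is an illustration of the idea, not nng's actual scheduler code):

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct task {
    void (*cb)(struct task *);
    struct task *next;  /* completion to run after this one finishes */
} task;

static int handoffs;  /* counts simulated cross-thread dispatches */

/* Old path (simulated): every completion is queued back to a worker,
 * costing a wakeup and a context switch per hop. */
static void reschedule(task *t)
{
    handoffs++;  /* stand-in for enqueue + worker thread wakeup */
    t->cb(t);
}

/* New path: a callback that is already running asynchronously calls the
 * next completion directly, skipping the reschedule entirely. */
static void complete(task *t, bool already_async)
{
    if (t->next == NULL) {
        return;
    }
    if (already_async) {
        t->next->cb(t->next);  /* synchronous call: no handoff */
    } else {
        reschedule(t->next);
    }
}
```

The trade-off is stack depth and re-entrancy: calling completions inline means each callback must tolerate running in its caller's context.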
Second, my own performance tests indicate that nanomsg is still faster when using its own performance tools. But I've come to believe that these synthetic benchmarks are actually useless, and that in the real world, nng is probably faster than nanomsg.
There are two reasons for believing this:
First is that almost everyone integrates nanomsg into a poll loop, where they use nn_poll or the NN_RECVFD or NN_SENDFD options to get descriptors that they can integrate into their own poll loop. This means extra system calls before the application gets to know about things. With nng, we can use the aio structure to get notifications via condition variable, bypassing two extra system calls per operation. This should be rather huge. (Note that the synthetic benchmarks don't use these at all.)
Second, nanomsg is inherently single threaded in the backend. This means it does not scale at all, failing to engage multiple cores. For some applications this is fine, but for large numbers of applications this becomes severely limiting. (Worse, nanomsg steals the CPU from the application, by running significant amounts of protocol processing on the application's thread. This leads to faster single threaded performance by avoiding context switches, but it prevents the application from doing anything else useful at the same time.)
If your application is inherently single threaded, using only blocking nn_send and nn_recv calls, then you will see slightly reduced performance compared with nanomsg. While these types of applications are common, they are rarely performance sensitive. Far more common in performance sensitive areas are asynchronous application consumers.
Things should be much better now... but there is still work to do.
Pollers could utilize multiple threads for increased scalability. (There are some tricky race-related considerations, though.) We also need to do a better job of auto-scaling based on the underlying system (more CPUs == more threads).
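A sketch of what that auto-scaling could look like on POSIX systems -- sizing the poller pool from the online CPU count, clamped to a sane range (the bounds here are illustrative, not anything nng has committed to):

```c
#include <unistd.h>

/* Pick a poller thread count from the online CPU count.
 * Floor of 2 so one blocked callback can't stall all I/O;
 * cap of 8 to bound contention between poller threads. */
int poller_threads(void)
{
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);

    if (ncpu < 1) {
        ncpu = 1;  /* sysconf can fail; assume a single core */
    }
    if (ncpu < 2) {
        return 2;
    }
    if (ncpu > 8) {
        return 8;
    }
    return (int)ncpu;
}
```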
Is there a reason this is still open?
Probably I should close it and replace it with specific tickets for specific enhancements.
That would be a helpful communication.
I'm oscillating back and forth between nng and libwebsockets for my next venture. It's backed by SQLite running in a write-only thread, with several readers. My app lives mostly inside the SQLite database, using triggers for the workload, and my thought is to send messages out through a virtual table of some sort. I like the pub/sub stuff in NNG, and I like the ease of establishing a web service using LWS. It all comes down to which will give me more manageable IPC. Comparing the two approaches is NOT straightforward; the biggest thing is thinking through the memory and message management scenarios. Anything I can glean on performance is definitely helpful...
libwebsockets helps with the sub-protocols. It's pretty interesting and has some really unique features: static serving of compressed files directly from inside a zip file, and its stacking of protocol mounts. But your messaging patterns are pretty nifty. I think I'll end up trying both before I decide the best course of action...
I'm going to close this -- we've broken this up by identifying a bunch of additional work items, and on the way to v1.3 we've actually made more significant strides.
The white-hot aio lock is still a problem, but I have been experimenting with ways to reduce that -- more to come later.
@gdamore any recent measurements to look at?
I've been looking mostly at micro benchmarks on my mac and PC at this point. I can say for some workloads I've seen latencies drop by more than half -- up to 75% in one case -- though that was a somewhat contrived test. The smallest improvement I saw was about 5%.
If the dominant factor in your workload is actually moving the message across the wire, and you're using pipeline or pair (not polyamorous), then you will probably see the smallest benefit.
The pair and pipeline protocols leave the most still on the table. REQ/REP, PUB/SUB, and BUS have the most gains so far. The data copy reductions will improve workloads moving large messages the most. The micro-optimizations and contention improvements will probably show the biggest gains on workloads with small messages.
I can tell you that my changes shaved 1-2 dozen microseconds in round-trip latency for typical workloads on my hardware.
The problem with generating "real comparisons" is deciding what workloads to model, and then actually having dedicated hardware to run repeatable benchmarks.
Actually this short overview is fine for me, thanks 😉.