I straced both sides and the output is not exactly shocking:
p1:
with call_fetch:
write(4, "\2\ncall_fetch\24\2O\7\23\0\0\0\23\0\2\1+\24\2OO", 29) = 29
select(11, [3 4 5 6 7 8 9 10], NULL, NULL, {10, 0}) = 1 (in [4], left {9, 999997})
read(4, "\2\6result\2\5fetch\24\2O\7\23\0\0\0P", 8191) = 24
with call then fetch:
write(4, "'\24\2O\7\24\0\0\0\23\0\2\1+\24\2OO", 18) = 18
write(4, "\2\5fetch\24\2O\7\24\0\0\0", 15) = 15
select(11, [3 4 5 6 7 8 9 10], NULL, NULL, {10, 0}) = 1 (in [4], left {9, 961472})
read(4, "\2\6result\2\5fetch\24\2O\7\24\0\0\0P", 8191) = 24
p2:
with call_fetch:
select(15, [0 11 12 13 14], NULL, NULL, {10, 0}) = 1 (in [0], left {8, 622846})
read(0, "\2\ncall_fetch\24\2O\7\26\0\0\0\23\0\2\1+\24\2OO", 8191) = 29
select(15, [0 11 12 13 14], NULL, NULL, {0, 0}) = 0 (Timeout)
write(0, "\2\6result\2\5fetch\24\2O\7\26\0\0\0P", 24) = 24
with call then fetch:
select(15, [0 11 12 13 14], NULL, NULL, {10, 0}) = 1 (in [0], left {8, 853315})
read(0, "'\24\2O\7\27\0\0\0\23\0\2\1+\24\2OO", 8191) = 18
select(15, [0 11 12 13 14], NULL, NULL, {0, 0}) = 0 (Timeout)
select(15, [0 11 12 13 14], NULL, NULL, {10, 0}) = 1 (in [0], left {9, 961106})
read(0, "\2\5fetch\24\2O\7\27\0\0\0", 8191) = 15
write(0, "\2\6result\2\5fetch\24\2O\7\27\0\0\0P", 24) = 24
I can get rid of the 0-timeout selects by avoiding select entirely when the workqueue is non-empty, but this makes no difference.
from julia.
On Mac, I see reasonable results. Perhaps it has something to do with a buffer setting in the Linux kernel, one of those /proc entries?
julia> tic();println(remote_call_fetch(2,+,1,1));toc()
2
elapsed time: 0.0008530616760254 sec
0.0008530616760254
julia> tic();println(fetch(remote_call(2,+,1,1)));toc()
2
elapsed time: 0.00111985206604 sec
0.00111985206604
Darwin The-Surfing-Burrito.local 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386
I tried it on 64-bit Linux (Opteron, not that it should matter) and see the same 60x slowdown.
Linux neumann.cs.ucsb.edu 2.6.18-8.1.6.el5 #1 SMP Thu Jun 14 17:29:04 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
This may help:
http://www.ibm.com/developerworks/linux/library/l-hisock/index.html
I'm not sure any of those are the issue. I think this is odder. Maybe one of these Linux-specific TCP bugs? Unfortunately, I could only find the Google cached version.
Jeff, I assume you're holding the TCP connection open, right? If so, can you try doing the latter in a loop? If only the first iteration is slow, I'm more inclined to finger slow start as the culprit.
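The loop test can also record each iteration separately, which shows directly whether only the first call is slow (pointing at slow start) or every call pays the same penalty. A minimal sketch in current Julia; `per_call_times` is a hypothetical helper, not part of Julia or of the code discussed here:

```julia
# Hypothetical helper: run `f` n times and record the wall-clock time of each
# call separately, so a slow first call (e.g. TCP slow start) stands out from
# a loop in which every call is uniformly slow.
function per_call_times(f, n::Integer)
    times = Vector{Float64}(undef, n)
    for i in 1:n
        times[i] = @elapsed f()   # seconds spent in this single call
    end
    return times
end
```

With the Distributed standard library in current Julia (where the argument order is `remotecall(f, id, args...)` rather than the `remote_call(2,+,1,1)` form used in this thread), the two cases could be compared after `using Distributed; addprocs(1)` as `per_call_times(() -> fetch(remotecall(+, 2, 1, 1)), 100)` versus `per_call_times(() -> remotecall_fetch(+, 2, 1, 1), 100)`.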
When I loop it N times it always seems to take exactly N times longer:
julia> tic();for i=1:100;fetch(remote_call(2,+,1,1));end;toc()
elapsed time: 3.9971921443939209 sec
And looping the first one is proportionally faster? Can you confirm that the same TCP connection gets used throughout?
Looping the first one 100x seems to take only 50x longer. Interesting. It's almost exactly like somebody inserted a sleep call in the second case.
I'm doing all the socket stuff manually so I'm pretty sure there are no new connections. Wouldn't that show up in the strace output?
Yeah, it should show up in the strace output since doing anything with opening or closing sockets requires a system call.
This problem is officially shitty.
I have a new idea for fixing this: implement timer events, and use them to combine messages sent within, say, 2ms of each other into a single write. Then at least on the sending side both cases will have the same strace signature.
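A minimal sketch of that timer idea in current Julia, assuming a single-threaded event loop: outgoing messages are appended to an `IOBuffer`, the first message of a batch arms a `Timer`, and when it fires everything queued within the window (2 ms by default) goes out in one `write`. `CoalescingWriter`, `queue_msg`, `send_now`, and `flush_msgs` are hypothetical names for illustration, not Julia's actual messaging code.

```julia
# Hypothetical write-coalescing wrapper (not Julia's real implementation):
# messages queued within `delay` seconds of the first one go out in one write().
mutable struct CoalescingWriter
    io::IO                          # underlying connection, e.g. a TCPSocket
    buf::IOBuffer                   # messages waiting to be sent
    delay::Float64                  # aggregation window in seconds
    timer::Union{Timer,Nothing}     # pending timed flush, if any
end

CoalescingWriter(io::IO; delay::Real = 0.002) =
    CoalescingWriter(io, IOBuffer(), Float64(delay), nothing)

# Send everything queued so far with a single write() on the connection.
function flush_msgs(w::CoalescingWriter)
    if w.timer !== nothing
        close(w.timer)              # cancel the pending timed flush
        w.timer = nothing
    end
    data = take!(w.buf)             # grab the queued bytes and reset the buffer
    isempty(data) || write(w.io, data)
    return nothing
end

# Queue a message; the first message in a batch arms the flush timer.
function queue_msg(w::CoalescingWriter, msg)
    write(w.buf, msg)
    if w.timer === nothing
        w.timer = Timer(_ -> flush_msgs(w), w.delay)
    end
    return nothing
end

# Queue a message and flush at once, e.g. when the caller is about to block
# waiting for a reply.
function send_now(w::CoalescingWriter, msg)
    write(w.buf, msg)
    flush_msgs(w)
end
```

On the requesting side, the `remote_call` message could go through `queue_msg` and the following `fetch` through `send_now`, so both leave in one write, which would give the call-then-fetch case the same single-write sending signature as `call_fetch` in the strace output above.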
Ugh. This is such a hack. Not a complaint about the approach really, but it's just so fucked up that this has to be done. Why is the Linux TCP implementation this buggy?
I have had problems with latency on Linux for as long as I can remember. For the last 10 years, one has always had to hunt for kernel patches and /proc entries, and nothing really addresses it.
-viral
(Replying to StefanKarpinski's comment of Jun 20, 2011, 9:42 AM, on issue #45.)
It might be a good thing to do anyway; delaying and aggregating messages is pretty common.
I think this should be moved to 2.0.
OK, get excited. I just tried this and it actually worked:
julia> @time println(remote_call_fetch(2,+,1,1))
2
elapsed time: 0.00174689292907715 sec
julia> @time println(fetch(remote_call(2,+,1,1)))
2
elapsed time: 0.00170516967773438 sec
Turns out we can do even better by doing message aggregation only on the requesting side:
julia> @time println(remote_call_fetch(2,+,1,1))
2
elapsed time: 0.00072383880615234 sec
julia> @time println(fetch(remote_call(2,+,1,1)))
2
elapsed time: 0.00075888633728027 sec
Which commit fixed this?
Not committed yet.
This performance problem is back. The fix needs to be restored within the new I/O system.
I think we had dropped the distinction between send_msg and send_msg_now when IOBuffer didn't exist. This can probably be restored fairly easily now.
See also f9637a2. Threads may be necessary to make this work properly.
I used Wireshark to capture network packets.
For
julia> tic(); remotecall_fetch(2,+,1,1); toc()
elapsed time: 0.001487489 seconds
0.001487489
the capture showed:
For
julia> tic();fetch(remotecall(2,+,1,1));toc()
elapsed time: 0.040404147 seconds
0.040404147
the capture showed:
Is it possible it is not a network issue? In the second case, the response to the first packet arrives after 0.039 seconds, while the request/response for the second packet takes hardly any time.
Does it have anything to do with creating a RemoteRef in the second case (which is not required with remotecall_fetch)?
NOTE: The term "pichat" refers to port 9009 in the images. Apparently 9009 is the standard port for a P2P application called pichat.
This is a really weird bug. Notice the timings below:
julia> tic();a=remotecall(2,+,1,1);toc();tic();remotecall_fetch(2,+,1,1);toc();tic();fetch(a);toc()
elapsed time: 0.001218872 seconds
elapsed time: 0.038245016 seconds
elapsed time: 0.000710215 seconds
0.000710215
The remotecall_fetch() immediately after the remotecall() is slow!
and
julia> tic();a=remotecall_fetch(2,+,1,1);toc();tic();remotecall_fetch(2,+,1,1);toc();tic();remotecall_fetch(2,+,1,1);toc();
elapsed time: 0.001811197 seconds
elapsed time: 0.00117068 seconds
elapsed time: 0.001172147 seconds
No such issues here.
I was able to fix this before by aggregating messages into a single write using a timer and an io thread.
(Replying to Amit Murthy's comment of Apr 26, 2013, 7:23 AM, on issue #45.)
I don't notice any of these on the Mac.
This is a bit of a shot in the dark, but is it at all possible that this is related to #2816?
I believe the discovery there was an unnecessary buffer copy in a very low-level function right above the TCP stack.