Comments (14)
I'm amazed! I was wondering whether cabal might somehow pick a different, stale executable, but I can still reproduce the issue when running the executables directly.
from tasty-bench.
I can reproduce the issue, but the slowdown cannot be attributed to the `--pattern` option. If you pass `-p ers` (so that both readers and writers match), performance is as fast as without `-p`. And vice versa: if you do not pass `-p` at all, but just comment out `bgroup "readers"`, performance does degrade.
Good sleuthing! And what a weird effect!
I was able to get somewhat better results with this patch: https://github.com/Bodigrim/pandoc/commit/95aa335146a650544dc7dd4cbaf8d3cc9bf12c95

I think the rest is a weird code layout / sharing / laziness issue, business as usual (sigh), but I do not know pandoc well enough to investigate further.
Oof, sorry for bothering you with this. I didn't expect that the problem would be on the pandoc side!
I can reproduce this too, and I have been trying various things (increasing strictness, forcing values, etc.), to no avail. What's really odd is that if you use the pattern `writers`, everything is slower, but if you use `asciidoc`, it's fast again, even though there is no asciidoc reader, so it's only writers that are being tested in this case.
Btw, I tried the patch above and it didn't make the problem go away.
I tried switching to `gauge`, and I found that:

- I no longer get different results depending on whether I use the pattern `writers`
- I get longer run times than reported by tasty-bench (~9.3 ms for the asciidoc writer, compared to 5.0 or 6.9 ms with tasty-bench, depending on the pattern used)

This, and the fact that I couldn't get the funny behavior to go away by increasing strictness or making other changes in the benchmark suite, leads me to believe that this could in fact be an issue with tasty-bench.
Generally speaking, it is expected that executing only selected benchmarks (or the same benchmarks in a different order) can affect their measurements. That's because all benchmarks pay a tax for GC, which depends on the global heap layout. More often than not, executing fewer benchmarks makes them faster; for example, consider the following scenario:
```haskell
import Data.List (genericLength)
import Test.Tasty.Bench

testData :: String
testData = replicate 1000000 'a'

main :: IO ()
main = defaultMain
  [ bench "length" $ nf length testData
  , bench "square" $ nf (^2) 10
  , bench "genericLength" $ nf genericLength testData
  ]
```
Here the first benchmark allocates a huge amount of heap, which is retained because the third benchmark also uses it. That means the second benchmark will be really slow: each GC kicking in during its execution is extremely expensive. Now, if one reruns this suite with `-p square`, so that instead of a huge string only a small thunk is kept on the heap, the results will be drastically faster.
It is a bit less expected and more counterintuitive that executing only a few benchmarks can make them slower. A natural hypothesis would be that `--pattern` matching is expensive, but as discussed above this is not the case.
My (uneducated) guess is that since `readers` benchmarks involve the corresponding `writers` as well (e.g., `commonmark`), there is something funny with sharing going on. Say, there is a thunk referenced from both bench groups; as long as the second group is present, GC will never prune it. But once `readers` is disabled, GC eagerly prunes this thunk, causing its re-evaluation and slowing down `writers`. Dunno, it's hard for me to tell.
The issue of heap layout plagues all Haskell benchmarks. It was reported for `criterion`, and while in this particular case `gauge` seems stable, it is not immune to it either. One workaround is to run each benchmark in a separate process. If you are interested in exploring this path, I can come up with a Bash incantation.
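One possible shape for such an incantation, assuming the suite's executable accepts tasty's standard `-l` flag to list benchmark names and `-p` with an awk-style pattern (the executable path below is a placeholder; adjust it to whatever your project builds):

```shell
# Run every benchmark of a tasty-bench suite in its own process, so one
# benchmark's heap layout cannot affect another's measurements.
# Usage: run_isolated path/to/bench-executable
run_isolated() {
  bench="$1"
  # `-l` lists full benchmark names, one per line; `-p` with an awk-style
  # pattern `$0 == "name"` selects exactly one of them.
  "$bench" -l | while IFS= read -r name; do
    "$bench" -p "\$0 == \"$name\""
  done
}
```

Benchmark names containing double quotes would need extra escaping, but that is rare in practice.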
Why am I blaming heap layout and GC for this conundrum? If I run pandoc benchmarks with `-p ers +RTS -s`, I see 50% of time spent in GC (already quite bad for getting stable benchmark results). But if I run with `-p writers +RTS -s`, time spent in GC grows to 60%! I cannot really attribute this change to anything inherent to `tasty-bench`: we still use a pattern, and `readers` / `writers` benchmarks are relatively long (several ms), so it's not like `tasty-bench` bookkeeping can eat comparable resources.
Now, if I increase the nursery size with `-A256m`, the problem is gone, at least on my machine: `-p writers +RTS -A256m` produces the same measurements as just `+RTS -A256m`. That's the reason why I think the issue is caused by the RTS and heap layout and not by `tasty-bench`.
I suggest running benchmarks with an increased nursery (`+RTS -A256m`) to eliminate this kind of noise.
(Upd.: I copied `-A256m` from pandoc's `cabal.project`; in fact something like `-A32m` could probably do as well.)
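One way to make the larger nursery the default, so nobody has to remember to pass it by hand, is to bake the RTS options into the benchmark executable via its `.cabal` stanza. A sketch, where the component name and `main-is` are placeholders to be adjusted to your project:

```
benchmark benchmarks
  type:        exitcode-stdio-1.0
  main-is:     benchmark.hs
  ghc-options: -rtsopts "-with-rtsopts=-A32m"
```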
Thanks for the explanation and the observations -- and also for the suggestion!
I will try it (and maybe move back to tasty-bench, which I like).
Reporting back: using `+RTS -A256m` for the benchmarks does fix things. (The effect of `--pattern` disappears, all the timings get shorter, and the timings from gauge and tasty-bench line up.) So I think you correctly diagnosed this issue and it can be closed.
Awesome! It would be great if this advice was documented!
@sjakobi I updated documentation in 2b4d1e4
Looks great! Thank you! :)