Hi,
I'm not able to observe any performance benefit from the Propeller toolchain for the included test program (main.cc, callee.cc). I followed the steps given in Propeller_RFC.pdf.
High level observations:
- Elapsed time doesn't show any improvement.
- Cycles, instructions, and branch mispredicts are almost the same.
- Overall cache-misses are lower, but L1-icache-load-misses are similar.
$ time ./a.out.orig.labels 1000000000 2 >& /dev/null
real 0m21.094s
user 0m20.489s
sys 0m0.604s
$ time ./a.out.labels 1000000000 2 >& /dev/null
real 0m20.357s
user 0m19.908s
sys 0m0.448s
Elapsed time varies by 1 to 5% from run to run.
Perf data
Original binary
$ perf stat -e cycles,instructions,cache-misses,L1-icache-load-misses,br_misp_retired.all_branches,br_inst_retired.all_branches,icache_64b.iftag_stall ./a.out.orig.labels 1000000000 1> /dev/null
Performance counter stats for './a.out.orig.labels 1000000000':
80,231,347,233 cycles (66.67%)
243,314,361,618 instructions # 3.03 insn per cycle (83.33%)
22,522 cache-misses (83.33%)
2,644,077 L1-icache-load-misses (83.33%)
20,400,061 br_misp_retired.all_branches (83.33%)
53,442,616,374 br_inst_retired.all_branches (83.34%)
68,554,744 icache_64b.iftag_stall (57.14%)
21.191516400 seconds time elapsed
Optimized binary
$ perf stat -e cycles,instructions,cache-misses,L1-icache-load-misses,br_misp_retired.all_branches,br_inst_retired.all_branches,icache_64b.iftag_stall ./a.out.labels 1000000000 1> /dev/null
Performance counter stats for './a.out.labels 1000000000':
81,446,698,907 cycles (66.66%)
243,218,220,681 instructions # 2.99 insn per cycle (83.33%)
14,907 cache-misses (83.34%)
2,533,002 L1-icache-load-misses (83.34%)
20,571,010 br_misp_retired.all_branches (83.34%)
53,455,580,211 br_inst_retired.all_branches (83.33%)
68,847,492 icache_64b.iftag_stall (57.14%)
21.512644234 seconds time elapsed
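To put the two runs side by side, here is a quick Python sketch that computes the relative change per counter. The values are copied from the perf output above; the names `original`, `optimized`, and `percent_change` are just for illustration.

```python
# Counter values copied from the two perf stat runs above.
original = {
    "cycles": 80_231_347_233,
    "instructions": 243_314_361_618,
    "cache-misses": 22_522,
    "L1-icache-load-misses": 2_644_077,
    "br_misp_retired.all_branches": 20_400_061,
    "br_inst_retired.all_branches": 53_442_616_374,
    "icache_64b.iftag_stall": 68_554_744,
}
optimized = {
    "cycles": 81_446_698_907,
    "instructions": 243_218_220_681,
    "cache-misses": 14_907,
    "L1-icache-load-misses": 2_533_002,
    "br_misp_retired.all_branches": 20_571_010,
    "br_inst_retired.all_branches": 53_455_580_211,
    "icache_64b.iftag_stall": 68_847_492,
}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Positive means the optimized binary has the higher count.
deltas = {name: percent_change(original[name], optimized[name]) for name in original}

for name, delta in deltas.items():
    print(f"{name:32s} {delta:+8.2f}%")
```

With these inputs, cycles come out roughly 1.5% higher for the optimized binary, cache-misses roughly 34% lower (on very small absolute counts), and L1-icache-load-misses roughly 4% lower. The remaining counters differ by less than 1%.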
The referenced paper doesn't mention the expected benefit for the included test program. What improvement should I expect for this test?
Please see more details (build and runtime steps, etc.) in the following gist.
https://gist.github.com/uttampawar/5407f998bc3f02f58c4b83b0b4dc20fe
Any hint is appreciated.