vineetgarc / gcc
This project forked from riscvarchive/riscv-gcc
License: GNU General Public License v2.0
This directory contains the GNU Compiler Collection (GCC).
The GNU Compiler Collection is free software. See the files whose names start with COPYING for copying permission. The manuals, and some of the runtime libraries, are under different terms; see the individual source files for details.
The directory INSTALL contains copies of the installation information as HTML and plain text. The source of this information is gcc/doc/install.texi. The installation information includes details of what is included in the GCC sources and what files GCC installs.
See the file gcc/doc/gcc.texi (together with other files that it includes) for usage and porting information. An online readable version of the manual is in the files gcc/doc/gcc.info*.
See http://gcc.gnu.org/bugs/ for how to report bugs usefully.
Copyright years on GCC source files may be listed using range notation, e.g., 1987-2012, indicating that every year in the range, inclusive, is a copyrightable year that could otherwise be listed individually.
Note: This issue involves an upstream gcc patch (for gcc 13, officially backported to gcc-12), but it occurs on a gcc-11 backport, hence there is no official Bugzilla entry or issue on the riscv gcc repo.
The gcc change b646d7d ("RISC-V: Inhibit FP <--> int register moves via tune param"), which elides FMV instructions, can benefit performance on uarches where FMV is costlier and repeated stack spills could potentially be cached. It is supposed to be codesize neutral: under high register pressure it would replace an FMV.x.d/FMV.d.x pair with an FLD/FSD pair.
However, with a gcc 11 backport of the above, we see a regression on the SPECFP2017 benchmark 519.lbm (dynamic instruction count).
The additional stack spills are there, as expected, but there are also additional FLDs that load floating-point constants from .rodata, multiple times, and in a few cases the register holding the constant is still live.
336: u2 = 1.5 * (ux*ux + uy*uy + uz*uz);
338: feqs[C ] = (1.0/3.0)*rho*(1.0 - u2);
339: feqs[N ] = feqs[S ] = (1.0/18.0)*rho*(1.0 + 4.5*(+uy )*(+uy ) - u2);
...
codegen BEFORE b646d7d
10cb8: 8345784b fnmsub.d fa6,fa0,fs4,fa6
10cbc: d3b27243 fmadd.d ft4,ft4,fs11,fs10
10cc0: 5b4575cb fnmsub.d fa1,fa0,fs4,fa1
10cc4: 1236f6d3 fmul.d fa3,fa3,ft3
10cc8: d3457dcb fnmsub.d fs11,fa0,fs4,fs10
10ccc: 0b4570cb fnmsub.d ft1,fa0,fs4,ft1
10cd0: 0345704b fnmsub.d ft0,fa0,fs4,ft0
10cd4: 2b4572cb fnmsub.d ft5,fa0,fs4,ft5
10cd8: 3345734b fnmsub.d ft6,fa0,fs4,ft6
10cdc: 2345724b fnmsub.d ft4,fa0,fs4,ft4
10ce0: 1345714b fnmsub.d ft2,fa0,fs4,ft2
10ce4: 9345794b fnmsub.d fs2,fa0,fs4,fs2
10ce8: bc42 fsd fa6,56(sp)
10cea: b42e fsd fa1,40(sp)
10cec: ac36 fsd fa3,24(sp)
10cee: be05 j 1081e <main+0x2e0>
codegen AFTER b646d7d
Note the repeated refetch of the constant at .LC29 (1.5), which is used as the multiplier in line 336 of the source.
# lbm.c:338: feqs[C ] = (1.0/3.0)*rho*(1.0 - u2);
10c8e: c316f8cb fnmsub.d fa7,fa3,fa7,fs8
10c92: 3c06 fld fs8,96(sp)
10c94: bcc6 fsd fa7,120(sp)
10c96: 8a81b887 fld fa7,-1880(gp) # fld fa7, %lo(.LC29)(a3)
10c9a: c316f8cb fnmsub.d fa7,fa3,fa7,fs8
10c9e: b8c6 fsd fa7,112(sp)
10ca0: 8a81bc07 fld fs8,-1880(gp) # fld fs8, %lo(.LC29)(a3)
10ca4: 28e6 fld fa7,88(sp)
10ca6: f386ff4b fnmsub.d ft10,fa3,fs8,ft10
10caa: b4fa fsd ft10,104(sp)
10cac: 8b86ff4b fnmsub.d ft10,fa3,fs8,fa7
10cb0: 2c46 fld fs8,80(sp)
10cb2: 8a81b887 fld fa7,-1880(gp) # fld fa7, %lo(.LC29)(a3)
10cb6: b0fa fsd ft10,96(sp)
10cb8: c316ff4b fnmsub.d ft10,fa3,fa7,fs8
10cbc: 2886 fld fa7,64(sp) # NOK: could have reused fa7 and fld into fa8
10cbe: 8a81bc07 fld fs8,-1880(gp) # fld fs8, %lo(.LC29)(a3)
10cc2: acfa fsd ft10,88(sp)
10cc4: 8b86ff4b fnmsub.d ft10,fa3,fs8,fa7
10cc8: 38c2 fld fa7,48(sp)
10cca: 8a81bc07 fld fs8,-1880(gp) # fld fs8, %lo(.LC29)(a3)
10cce: a8fa fsd ft10,80(sp)
10cd0: 8b86ff4b fnmsub.d ft10,fa3,fs8,fa7
10cd4: 3c22 fld fs8,40(sp)
10cd6: a0fa fsd ft10,64(sp)
10cd8: 8a81bf07 fld ft10,-1880(gp) # fld ft10, %lo(.LC29)(a3)
10cdc: c3e6f8cb fnmsub.d fa7,fa3,ft10,fs8
10ce0: 2c62 fld fs8,24(sp)
10ce2: b846 fsd fa7,48(sp)
10ce4: c3e6f8cb fnmsub.d fa7,fa3,ft10,fs8
10ce8: 2c42 fld fs8,16(sp)
10cea: a83a fsd fa4,16(sp)
10cec: c3e6f6cb fnmsub.d fa3,fa3,ft10,fs8
10cf0: b446 fsd fa7,40(sp)
10cf2: ac36 fsd fa3,24(sp)
10cf4: b61d j 1081a <main+0x2dc>
...
.LC29:
.word 0
.word 1073217536 # 1.5
Just before this codegen, there is another such needless refetch of a different constant, .LC27 (1.0), which is used in lines 339 and beyond.
fld fa7,%lo(.LC27)(s10)
fmadd.d fa3,fa3,fs4,fa7
# lbm.c:336: u2 = 1.5 * (ux*ux + uy*uy + uz*uz);
fld fa7,16(sp)
fsd fs8,104(sp)
fmul.d fs8,fa4,fa4
fmul.d fa4,fa4,fs7
fsd fa3,96(sp)
fadd.d fa3,fa7,fs8
fld fa7,%lo(.LC27)(s10)
fmadd.d fs8,fs8,fs4,fa7
fmadd.d ft10,ft10,fs4,fa7
fld fa7,%lo(.LC27)(s10)
fsd fs8,88(sp)
fmul.d fs8,fs10,fs10
fmadd.d fs8,fs8,fs4,fa7
fsd fs8,80(sp)
fmul.d fs8,fs9,fs9
fmadd.d fa7,fs8,fs4,fa7
fld fs8,24(sp)
fsd fa7,64(sp)
fld fa7,%lo(.LC27)(s10)
fmadd.d fs8,fs8,fs4,fa7
fld fa7,%lo(.LC27)(s10)
fsd fs8,48(sp)
fld fs8,40(sp)
fmadd.d fs8,fs8,fs4,fa7
fld fa7,%lo(.LC27)(s10)
fsd fs8,40(sp)
fmul.d fs8,fs11,fs11
fmadd.d fs8,fs8,fs4,fa7
fsd fs8,24(sp)
fld fs8,104(sp)
fmadd.d fa7,fs8,fs4,fa7
fld fs8,%lo(.LC27)(s10)
fsd fa7,16(sp)
...
.align 3
.LC27:
.word 0
.word 1072693248 # 1.0
I have yet to create a small test case, as this only happens with -flto=auto on the final link of the benchmark.