Comments (13)
Good news!
When I change -O2 to -O1, everything works fine.
In case you still need details of my development environment:
My OS is WSL2.
The (cross-)compiler is the latest release, "gcc-arm-10.3-2021.07-x86_64-aarch64-none-elf.tar.xz".
I don’t know how to thank you enough! ^o^/
from rpi4-osdev.
Nice tutorial.
The solution is to add the compiler flag -mstrict-align.
I've now lost count of the number of times I've seen GCC produce code with unaligned accesses when the processor doesn't support them (or isn't in a mode where it supports them). The BCM2711 (ARMv8) can support unaligned access, but it requires a particular register bit to be set correctly (UNALIGNED_TRP, I believe).
Adding this compiler flag results in a working part 6 demo (at least for me), as well as allowing my own code to work :-).
For other targets (e.g. when generating 32-bit ARM object files), the required setting is -mno-unaligned-access.
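To make "unaligned access" concrete, here is a minimal sketch in C. The helper names and the example addresses are my own, purely for illustration; they do not come from the tutorial code. As far as I know, on AArch64 an access that fails this check faults when alignment checking is enabled, or when the memory is treated as Device memory, which is the case for data accesses while the MMU is off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if an access of `size` bytes at `addr`
   is naturally aligned. `size` must be a power of two (1, 2, 4, 8, ...). */
static int is_naturally_aligned(uintptr_t addr, size_t size)
{
    return (addr & (uintptr_t)(size - 1)) == 0;
}

/* Two adjacent 4-byte fields at addr and addr + 4 can each be written with
   an aligned 4-byte store, yet one merged 8-byte store at addr may not be
   aligned. Returns 1 only if the merged 8-byte store would be aligned. */
static int merged_store_is_safe(uintptr_t addr)
{
    assert(is_naturally_aligned(addr, 4)); /* precondition for this sketch */
    return is_naturally_aligned(addr, 8);
}
```

For example, merged_store_is_safe(0x100c) returns 0, because 0x100c is a multiple of 4 but not of 8. My understanding is that -mstrict-align tells GCC not to generate accesses that may be unaligned, so stores like these stay separate.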
from rpi4-osdev.
Interesting! I think I saw similar behaviour during the development process.
Can you clarify for me - what is your development environment e.g. OS, (cross-)compiler version etc.?
I'll try and repeat the problem tonight.
These freaky issues popped up more than I would have hoped for, and can be the result of many things - bad compiler optimisation, stack overflows etc.
I'll try and be as helpful as I can :)
Thanks,
Adam
from rpi4-osdev.
If I could make a quick suggestion, try changing -O2 to -O1 in the compiler flags in the Makefile. If it then works, the compiler optimisation is doing something funky!
from rpi4-osdev.
Really glad to hear it! I still would love to investigate the actual cause, but I'm glad you're up and running :)
from rpi4-osdev.
I also want to understand the real cause.
It's just that I'm still a beginner, so I may not be able to help you much.
Can you reproduce this bug now?
What can I do to help? Do you need my assembly file?
from rpi4-osdev.
As a "beginner", you've done well to identify and articulate the problem so accurately. It takes skill to pinpoint an issue like this, so don't underestimate your ability!
I've got everything I need from you, and can quickly spin up an environment that matches yours. Just one question: are you using Ubuntu in WSL?
All I'm short on is time! ;-) Let's see if I can spend an hour or two tonight!
from rpi4-osdev.
Yes, my OS is Ubuntu 20.04.
from rpi4-osdev.
What's clear to me is that this is a compiler trying to over-optimise. See the Arm gcc output here using -O2 (it starts at the line where initBricks calls drawRect in the loop):
81170: 1d fc ff 97 bl 0x801e4 <drawRect>
81174: 40 03 40 b9 ldr w0, [x26]
81178: 84 03 40 f9 ldr x4, [x28]
8117c: 01 04 00 11 add w1, w0, #1
81180: 00 7c b7 9b umull x0, w0, w23
81184: 82 00 00 8b add x2, x4, x0
81188: 9b 68 20 b8 str w27, [x4, x0]
8118c: 41 03 00 b9 str w1, [x26]
81190: 53 d0 00 29 stp w19, w20, [x2, #4]
81194: 73 ea 02 11 add w19, w19, #186
81198: 56 c0 00 f8 stur x22, [x2, #12]
8119c: 5b 50 00 39 strb w27, [x2, #20]
811a0: 7f be 1e 71 cmp w19, #1967
811a4: a1 fd ff 54 b.ne 0x81158 <initBricks+0x58>
811a8: 94 52 00 11 add w20, w20, #20
811ac: 18 13 00 91 add x24, x24, #4
811b0: 9f 2a 02 71 cmp w20, #138
811b4: 60 00 00 54 b.eq 0x811c0 <initBricks+0xc0>
811b8: 15 03 40 b9 ldr w21, [x24]
811bc: e4 ff ff 17 b 0x8114c <initBricks+0x4c>
811c0: f3 53 41 a9 ldp x19, x20, [sp, #16]
811c4: f5 5b 42 a9 ldp x21, x22, [sp, #32]
811c8: f7 63 43 a9 ldp x23, x24, [sp, #48]
811cc: f9 6b 44 a9 ldp x25, x26, [sp, #64]
811d0: fb 73 45 a9 ldp x27, x28, [sp, #80]
811d4: fd 7b c6 a8 ldp x29, x30, [sp], #96
811d8: c0 03 5f d6 ret
Here's the same snippet in the output from clang (which does not reproduce the error you are seeing - it runs just fine):
81698: d3 fa ff 97 bl 0x801e4 <drawRect>
8169c: 28 57 44 f9 ldr x8, [x25, #2216]
816a0: 49 d3 48 b9 ldr w9, [x26, #2256]
816a4: 29 7d 1b 9b mul x9, x9, x27
816a8: 18 69 29 b8 str w24, [x8, x9]
816ac: 49 d3 48 b9 ldr w9, [x26, #2256]
816b0: 29 21 1b 9b madd x9, x9, x27, x8
816b4: 36 05 00 b9 str w22, [x9, #4]
816b8: 49 d3 48 b9 ldr w9, [x26, #2256]
816bc: 29 21 1b 9b madd x9, x9, x27, x8
816c0: 33 09 00 b9 str w19, [x9, #8]
816c4: 49 d3 48 b9 ldr w9, [x26, #2256]
816c8: 29 21 1b 9b madd x9, x9, x27, x8
816cc: 3c 0d 00 b9 str w28, [x9, #12]
816d0: 49 d3 48 b9 ldr w9, [x26, #2256]
816d4: 29 21 1b 9b madd x9, x9, x27, x8
816d8: 37 11 00 b9 str w23, [x9, #16]
816dc: 49 d3 48 b9 ldr w9, [x26, #2256]
816e0: 28 21 1b 9b madd x8, x9, x27, x8
816e4: 29 05 00 11 add w9, w9, #1
816e8: df d6 1b 71 cmp w22, #1781
816ec: 18 51 00 39 strb w24, [x8, #20]
816f0: d6 ea 02 11 add w22, w22, #186
816f4: 49 d3 08 b9 str w9, [x26, #2256]
816f8: 41 fc ff 54 b.ne 0x81680 <initBricks+0x58>
816fc: 73 52 00 11 add w19, w19, #20
81700: e8 07 40 f9 ldr x8, [sp, #8]
81704: 08 05 00 91 add x8, x8, #1
81708: 1f 15 00 f1 cmp x8, #5
8170c: e1 fa ff 54 b.ne 0x81668 <initBricks+0x40>
81710: f4 4f 46 a9 ldp x20, x19, [sp, #96]
81714: f6 57 45 a9 ldp x22, x21, [sp, #80]
81718: f8 5f 44 a9 ldp x24, x23, [sp, #64]
8171c: fa 67 43 a9 ldp x26, x25, [sp, #48]
81720: fc 6f 42 a9 ldp x28, x27, [sp, #32]
81724: fd 7b 41 a9 ldp x29, x30, [sp, #16]
81728: ff c3 01 91 add sp, sp, #112
8172c: c0 03 5f d6 ret
Frankly, working out what's going wrong in the gcc example is a little above my pay grade!
If you figure it out, I'd love to know though... I know my code isn't great, but the compiler shouldn't trip over these simple lines.
Note how much shorter the Arm gcc example is though. Interesting, eh?
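One possible reading of the gcc -O2 listing, offered as a sketch and not a confirmed diagnosis: the instruction `stur x22, [x2, #12]` writes 8 bytes at offset 12 from the start of the element, apparently covering two adjacent 4-byte fields in one store, where clang issues separate `str` instructions at #12 and #16. The struct below is a hypothetical layout inferred only from the store offsets in the listings (0, 4, 8, 12, 16 and a byte at 20); the field names are invented. Assuming the array base is 8-byte aligned and each element is 24 bytes, offset 12 is never 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; field names are invented for illustration. */
struct brick_sketch {
    uint32_t type;    /* offset 0  */
    uint32_t x;       /* offset 4  */
    uint32_t y;       /* offset 8  */
    uint32_t width;   /* offset 12 */
    uint32_t height;  /* offset 16 */
    uint8_t  alive;   /* offset 20, followed by 3 bytes of padding */
};

/* The sketch assumes the usual ABI where this struct occupies 24 bytes. */
static_assert(sizeof(struct brick_sketch) == 24, "expected a 24-byte element");

/* Byte address of the `width` field of element `index`, given the array base. */
static uintptr_t width_address(uintptr_t base, size_t index)
{
    return base + index * sizeof(struct brick_sketch)
                + offsetof(struct brick_sketch, width);
}

/* Returns 1 if one 8-byte store covering width and height would be aligned. */
static int merged_width_height_store_aligned(uintptr_t base, size_t index)
{
    return width_address(base, index) % 8 == 0;
}
```

With an 8-byte-aligned base such as 0x1000, merged_width_height_store_aligned returns 0 for every index, since 24 * index + 12 is never a multiple of 8.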
from rpi4-osdev.
For comparison, I disassembled the Arm gcc output when -O1 is specified in the Makefile (less compiler optimisation, and the thing that worked for you):
80f10: 97fffcb7 bl 801ec <drawRect>
80f14: b9400280 ldr w0, [x20]
80f18: f945bea1 ldr x1, [x21, #2936]
80f1c: 8b000400 add x0, x0, x0, lsl #1
80f20: d37df000 lsl x0, x0, #3
80f24: b8206836 str w22, [x1, x0]
80f28: b9400282 ldr w2, [x20]
80f2c: d37f7c40 ubfiz x0, x2, #1, #32
80f30: 8b224000 add x0, x0, w2, uxtw
80f34: d37df001 lsl x1, x0, #3
80f38: f945bea0 ldr x0, [x21, #2936]
80f3c: 8b010000 add x0, x0, x1
80f40: b9000413 str w19, [x0, #4]
80f44: f945bea0 ldr x0, [x21, #2936]
80f48: 8b010000 add x0, x0, x1
80f4c: b9000817 str w23, [x0, #8]
80f50: b9000c1b str w27, [x0, #12]
80f54: b900101a str w26, [x0, #16]
80f58: 39005016 strb w22, [x0, #20]
80f5c: 11000442 add w2, w2, #0x1
80f60: b9000282 str w2, [x20]
80f64: 1102ea73 add w19, w19, #0xba
80f68: 711ebe7f cmp w19, #0x7af
80f6c: 54fffc61 b.ne 80ef8 <initBricks+0x48> // b.any
80f70: 110052f7 add w23, w23, #0x14
80f74: 91001318 add x24, x24, #0x4
80f78: 11005339 add w25, w25, #0x14
80f7c: 71022aff cmp w23, #0x8a
80f80: 54fffba1 b.ne 80ef4 <initBricks+0x44> // b.any
80f84: a94153f3 ldp x19, x20, [sp, #16]
80f88: a9425bf5 ldp x21, x22, [sp, #32]
80f8c: a94363f7 ldp x23, x24, [sp, #48]
80f90: a9446bf9 ldp x25, x26, [sp, #64]
80f94: f9402bfb ldr x27, [sp, #80]
80f98: a8c67bfd ldp x29, x30, [sp], #96
80f9c: d65f03c0 ret
Interesting how it looks a lot more like the Clang output (if you squint hard enough!).
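As a reading aid for the -O1 listing, the pair `add x0, x0, x0, lsl #1` and `lsl x0, x0, #3` computes the byte offset of an array element without a multiply instruction. A minimal sketch of that arithmetic in C follows; the function name is mine, and the conclusion that each element is 24 bytes is inferred from the listing, not from the source code.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the two instructions from the -O1 listing:
     add x0, x0, x0, lsl #1   ->  x0 = index + (index << 1) = index * 3
     lsl x0, x0, #3           ->  x0 = x0 << 3              = index * 24 */
static uint64_t element_offset(uint64_t index)
{
    uint64_t times3 = index + (index << 1);
    return times3 << 3;
}
```

element_offset(1) is 24, which is consistent with stores at offsets 0 to 20 plus padding. The -O2 build appears to reach the same offset with a single umull by a value held in w23, presumably the same element size.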
from rpi4-osdev.
One thing I wonder is whether the compiler optimisation has somehow required the FPU, which we don't enable in our boot code? You might try adding the GCC flag -mcpu=cortex-a72+nofp to your Makefile and trying again at -O2. I've done a build and the assembly code differs. I haven't run it up yet to see if it works though...
UPDATE: tried the build and that hasn't solved the issue. Will keep thinking on this one! In the meantime, I've added something to the docs - thanks for pointing this out.
from rpi4-osdev.
Awesome - I'm going to test it myself and then update the Makefiles across the board! What an odd quirk...
from rpi4-osdev.
Tested and updated! Thanks again - that's a great fix :)
from rpi4-osdev.