

osandov commented on June 2, 2024

Huh, my guess is this is some sort of double fault (i.e., we faulted while handling whatever caused the original stack trace). I'm not sure why we didn't unwind past page_fault(), though. What was the actual cause of the first stack trace? A GPF or something else? And is this kernel using the ORC unwinder or the frame pointer unwinder?
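One way to answer the unwinder question is to read the kernel's build config, e.g. /boot/config-<release>, which Ubuntu installs alongside the kernel. The sketch below is an illustrative helper, not part of drgn: parse_kconfig and x86_unwinder are hypothetical names, and it assumes the x86 Kconfig symbols CONFIG_UNWINDER_ORC, CONFIG_UNWINDER_FRAME_POINTER, and CONFIG_UNWINDER_GUESS used by 5.x kernels.

```python
def parse_kconfig(text):
    """Parse kernel .config text into a {"CONFIG_FOO": "value"} dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            config[name] = value
    return config


def x86_unwinder(config):
    """Report which x86 kernel stack unwinder a parsed config selects."""
    if config.get("CONFIG_UNWINDER_ORC") == "y":
        return "ORC"
    if config.get("CONFIG_UNWINDER_FRAME_POINTER") == "y":
        return "frame pointer"
    if config.get("CONFIG_UNWINDER_GUESS") == "y":
        return "guess"
    return "unknown"
```

On the machine running the kernel, something like `x86_unwinder(parse_kconfig(open("/boot/config-" + os.uname().release).read()))` would apply it; for a frame-pointer build it should report "frame pointer".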


danobi commented on June 2, 2024

What was the actual cause of the first stack trace? A GPF or something else?

Null ptr deref, which was fixed in torvalds/linux@2b33d6ffa9e38f344418976b06.

And is this kernel using the ORC unwinder or the frame pointer unwinder?

Frame pointer unwinder. FWIW, this was an Ubuntu 5.4.0-1080-aws kernel.


brenns10 commented on June 2, 2024

@osandov I'm not sure whether this is the same issue or a related one, but I've also seen stack traces that lack the full expected context, at least compared to what crash gives me. For example, here's the same backtrace from both crash and drgn:

crash7latest> bt -c 7
PID: 102485  TASK: ffff978d3b680000  CPU: 7   COMMAND: "ping"
 #0 [fffffe0000161e38] crash_nmi_callback at ffffffff9605ef57
 #1 [fffffe0000161e48] nmi_handle at ffffffff96033ecd
 #2 [fffffe0000161ea0] default_do_nmi at ffffffff96034416
 #3 [fffffe0000161ec8] do_nmi at ffffffff960345f6
 #4 [fffffe0000161ef0] end_repeat_nmi at ffffffff96a0476e
    [exception RIP: mlx5_eq_int+176]
    RIP: ffffffffc1e8d1c0  RSP: ffff9779807c3e50  RFLAGS: 00000046
    RAX: 0000000000000001  RBX: ffff971924e897c0  RCX: ffff9728e0438000
    RDX: 00000000086a225f  RSI: 0000000000000046  RDI: 0000000000000002
    RBP: ffff9779807c3eb8   R8: ffff977980fe78c0   R9: ffff96be22473d80
    R10: 00000000000000ed  R11: 000000000000b741  R12: 0000000000000001
    R13: ffff9722fa380060  R14: 0000000000000404  R15: ffff976cd2205700
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #5 [ffff9779807c3e50] mlx5_eq_int at ffffffffc1e8d1c0 [mlx5_core]
 #6 [ffff9779807c3ec0] __handle_irq_event_percpu at ffffffff96103979
 #7 [ffff9779807c3f10] handle_irq_event_percpu at ffffffff96103b22
 #8 [ffff9779807c3f40] handle_irq_event at ffffffff96103bb0
 #9 [ffff9779807c3f68] handle_edge_irq at ffffffff9610779b
#10 [ffff9779807c3f88] handle_irq at ffffffff96032c2b
#11 [ffff9779807c3fb8] do_IRQ at ffffffff96a065a0
--- <IRQ stack> ---
#12 [ffffa3863bc739d8] ret_from_intr at ffffffff96a00bc6
    [exception RIP: dump_stack+65]
    RIP: ffffffff96871690  RSP: ffffa3863bc73a88  RFLAGS: 00000202
    RAX: 000000000000001f  RBX: 0000000000000286  RCX: 0000000000000830
    RDX: 0000000000000007  RSI: 00000000000008fb  RDI: 0000000000000286
    RBP: ffffa3863bc73a98   R8: 0000000000000004   R9: ffffffff97a052e9
    R10: 00078e561ecaf45e  R11: 0000000000000000  R12: 00000000ffffffff
    R13: 0000000000000019  R14: 000000000000001c  R15: 0000000000022970
    ORIG_RAX: ffffffffffffff39  CS: 0010  SS: 0018
#13 [ffffa3863bc73aa0] __csd_lock_wait at ffffffff96131af5
#14 [ffffa3863bc73b20] smp_call_function_many at ffffffff9613269a
#15 [ffffa3863bc73b78] on_each_cpu at ffffffff961327bd
#16 [ffffa3863bc73ba0] flush_tlb_kernel_range at ffffffff96085c4b
#17 [ffffa3863bc73be0] __purge_vmap_area_lazy at ffffffff96239f30
#18 [ffffa3863bc73c08] vm_unmap_aliases at ffffffff9623a403
#19 [ffffa3863bc73c50] change_page_attr_set_clr at ffffffff96081408
#20 [ffffa3863bc73cf8] set_memory_ro at ffffffff960824af
#21 [ffffa3863bc73d18] bpf_int_jit_compile at ffffffff96095870
#22 [ffffa3863bc73d90] bpf_prog_select_runtime at ffffffff961b0fbc
#23 [ffffa3863bc73db0] bpf_prepare_filter at ffffffff96759637
#24 [ffffa3863bc73e00] __get_filter at ffffffff96759961
#25 [ffffa3863bc73e40] sk_attach_filter at ffffffff96759e28
#26 [ffffa3863bc73e68] sock_setsockopt at ffffffff9671cc3c
#27 [ffffa3863bc73ee0] sys_setsockopt at ffffffff96717513
#28 [ffffa3863bc73f28] do_syscall_64 at ffffffff96003ca9
#29 [ffffa3863bc73f50] entry_SYSCALL_64_after_hwframe at ffffffff96a001b1
    RIP: 00007f03f15acc4a  RSP: 00007ffd8c526a78  RFLAGS: 00000203
    RAX: ffffffffffffffda  RBX: 00007ffd8c526ae0  RCX: 00007f03f15acc4a
    RDX: 000000000000001a  RSI: 0000000000000001  RDI: 0000000000000006
    RBP: 00007ffd8c528100   R8: 0000000000000010   R9: 0000000000003736
    R10: 000055c6b42fa0a0  R11: 0000000000000203  R12: 00007ffd8c526b20
    R13: 000055c6b42fa0c0  R14: 0000001d00000001  R15: 000055c6b42fc6bc
    ORIG_RAX: 0000000000000036  CS: 0033  SS: 002b

And from drgn:

>>> prog.stack_trace(102485)
#0  next_eqe_sw (/home/staging/linux-uek/drivers/net/ethernet/mellanox/mlx5/core//eq.c:111)
#1  mlx5_eq_int (/home/staging/linux-uek/drivers/net/ethernet/mellanox/mlx5/core//eq.c:414)
#2  __handle_irq_event_percpu (kernel/irq/handle.c:147)
#3  handle_irq_event_percpu (kernel/irq/handle.c:187)
#4  handle_irq_event (kernel/irq/handle.c:204)
#5  handle_edge_irq (kernel/irq/chip.c:793)
#6  generic_handle_irq_desc (include/linux/irqdesc.h:162)
#7  handle_irq (arch/x86/kernel/irq_64.c:87)
#8  do_IRQ (arch/x86/kernel/irq.c:246)
#9  common_interrupt+0x1c6/0x382 (arch/x86/entry/entry_64.S:590)

I can understand not having the NMI portion: it's just the call stack leading to either kdump or halting a CPU during a crash. But the missing system call portion is a bother. drgn's trace ends at common_interrupt, while crash continues past ret_from_intr into the interrupted setsockopt() call, and I care very much about what the kernel was doing before it received the interrupt :)

Again, not sure if it's the same underlying issue, but it seems similar.


brenns10 commented on June 2, 2024

(Of course, the drgn stack trace is much nicer than crash's due to the unwinding of inline functions, which I love to see!)


brenns10 commented on June 2, 2024

Now that #179 is merged, I see a nice solution to this. I find that these sorts of "truncated stacks" appear at interrupt/fault boundaries, where we usually see a struct pt_regs * variable in the stack trace. Now that it's easy to automatically find variables in a stack trace, a helper could simply watch for the last struct pt_regs * variable and, if it finds one, call prog.stack_trace(pt_regs) to continue the trace.


marxin commented on June 2, 2024

Btw, would it be possible to teach drgn to print the NMI exception stack and IRQ stack, like crash does?
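As a rough illustration of what crash-style labels involve, the sketch below groups frames by which stack their stack pointer falls in and emits a "--- <name> ---" separator when leaving a stack, as in the crash output above. It is hypothetical: label_stacks is not a drgn API, it takes each frame's stack pointer as input (in drgn one might read it with StackFrame.register("rsp"), which is an assumption here), and the caller must supply the stack bounds. A real implementation would have to look up each CPU's actual NMI and IRQ stack ranges in the kernel.

```python
def label_stacks(frames, stacks):
    """Render frames with crash-style separators between stacks.

    frames: (stack_pointer, function_name) pairs, innermost frame first.
    stacks: (label, low, high) address ranges; high is exclusive.
    """
    def stack_of(sp):
        for label, low, high in stacks:
            if low <= sp < high:
                return label
        return "unknown stack"

    lines = []
    current = None
    for sp, name in frames:
        label = stack_of(sp)
        # Like crash, name the stack we are leaving once we move to another.
        if current is not None and label != current:
            lines.append(f"--- <{current}> ---")
        current = label
        lines.append(f"[{sp:016x}] {name}")
    return lines
```

Fed some of the stack addresses from the crash output, together with made-up ranges that cover them, it places the separators where crash does; the hard part left open is discovering the real ranges.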
