onyx's People

Contributors

heatd, jjuran, lovemeforareason, petershh

onyx's Issues

Use and pass around iovec_iters in the network stack

Most of the network stack predates iovec_iter and, as such, hand-rolls a lot of the iovec_iter logic. For instance:

  • unix.cpp was written post-iovec_iter and as such creates its own iovec_iter
  • tcp.cpp, udp.cpp, icmp.cpp do not use iovec_iter and hand-roll the logic

This would probably mean passing around a structure different from msghdr, one that carries an iovec_iter instead of a struct iovec.

#93 is related to this. Until the conversion happens, sendmsg can't be used with !IOVEC_USER.
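
To make the idea concrete, here is a minimal userspace sketch of what the protocols would share instead of each hand-rolling it. The names `sketch_iovec`, `sketch_iovec_iter` and `copy_from_iter` are hypothetical stand-ins, not Onyx's actual iovec_iter API, and user-memory access checks are left out entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins for illustration; not Onyx's real types.
struct sketch_iovec
{
    void *base;
    std::size_t len;
};

struct sketch_iovec_iter
{
    const sketch_iovec *vec; // current segment
    std::size_t nr_segs;     // segments left, including the current one
    std::size_t offset;      // bytes already consumed in the current segment

    std::size_t bytes_left() const
    {
        std::size_t total = 0;
        for (std::size_t i = 0; i < nr_segs; i++)
            total += vec[i].len;
        return total - offset;
    }
};

// Copy up to len bytes out of the iterator into dst, advancing the iterator.
// Returns the number of bytes actually copied.
std::size_t copy_from_iter(sketch_iovec_iter &it, void *dst, std::size_t len)
{
    std::size_t copied = 0;
    while (copied < len && it.nr_segs > 0)
    {
        const sketch_iovec &seg = *it.vec;
        std::size_t chunk = std::min(len - copied, seg.len - it.offset);
        std::memcpy(static_cast<char *>(dst) + copied,
                    static_cast<const char *>(seg.base) + it.offset, chunk);
        copied += chunk;
        it.offset += chunk;
        if (it.offset == seg.len)
        {
            // Segment exhausted, move on to the next one.
            it.vec++;
            it.nr_segs--;
            it.offset = 0;
        }
    }
    return copied;
}
```

With something of this shape, a protocol's sendmsg path would only ever ask the iterator for the next N bytes and never look at individual iovecs.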

Unify IDE and AHCI ata code

Instead of having everything duplicated, the ATA code should just probe published drives on the ATA bus.

Also, SCSI on top of ATA? Like Linux?

Strlen is broken

WORD_SIZE should be just sizeof(size_t), not sizeof(size_t)/CHAR_BIT.
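
For context, a rough sketch of a word-at-a-time strlen with WORD_SIZE defined as the plain sizeof. This is not Onyx's actual implementation; `sketch_strlen` and `has_zero_byte` are made-up names, and the code assumes 8-bit bytes. Like most such implementations, it may read a few bytes past the terminator within the same aligned word, which is fine for a kernel or libc but not strictly portable C++.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bytes per machine word. Dividing by CHAR_BIT would give 1 on a 64-bit
// target, which breaks both the alignment loop and the stride below.
constexpr std::size_t WORD_SIZE = sizeof(std::size_t);

constexpr std::size_t ONES = static_cast<std::size_t>(-1) / UCHAR_MAX; // 0x0101...01
constexpr std::size_t HIGHS = ONES * (UCHAR_MAX / 2 + 1);              // 0x8080...80

// True if any byte of w is zero (the classic bit trick).
inline bool has_zero_byte(std::size_t w)
{
    return ((w - ONES) & ~w & HIGHS) != 0;
}

std::size_t sketch_strlen(const char *s)
{
    const char *p = s;

    // Byte-at-a-time until p is word aligned.
    while (reinterpret_cast<std::uintptr_t>(p) % WORD_SIZE != 0)
    {
        if (*p == '\0')
            return static_cast<std::size_t>(p - s);
        ++p;
    }

    // Word-at-a-time until a word contains the terminator.
    for (;;)
    {
        std::size_t w;
        std::memcpy(&w, p, WORD_SIZE);
        if (has_zero_byte(w))
            break;
        p += WORD_SIZE;
    }

    // Locate the exact terminator inside the final word.
    while (*p != '\0')
        ++p;
    return static_cast<std::size_t>(p - s);
}
```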

Convert various fds to implement read_iter and write_iter

Since e7871d3 we have a way to properly implement readv/writev for devices (the legacy ->read and ->write are deprecated). With read_iter and write_iter, writev and readv work just like the regular variants. Without them, think of the tty readv doing an internal read for one iov, getting data, then hanging for the next... yuck.

Serial input causes panic

Using the CI build of disk-image.img from https://github.com/heatd/Onyx/actions/runs/2907275605 (produced from 2cec04e) and the following QEMU incantation:

qemu-system-x86_64 -drive file=disk-image.img,format=raw,media=disk -boot d -enable-kvm -m 1G -cpu host,migratable=on,+invtsc -smp 4 -vga qxl -device usb-ehci -device usb-mouse -machine q35 -bios /usr/share/qemu/OVMF.fd  -netdev user,id=u1 -device virtio-net,netdev=u1 -serial stdio

Attempting to use the resulting serial console immediately results in a panic on the first keystroke:

[Screenshot from 2022-08-23 07-17-34]

No further output beyond /bin/login: username: is evident in the serial console.

For additional reference, two builds of QEMU were used with the same results:

  • QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.23)
  • QEMU emulator version 6.2.92 (v6.0.0-8451-gb992cef642)

And since I am using -cpu host as requested, the host CPU is an Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz (Q3'15 Skylake, poor old thing is in dire need of replacement).

Replace the spinlocks with better implementations

Our current spinlocks are essentially:

struct spinlock { unsigned int word; };

void lock(struct spinlock *s)
{
    /* Spin until we swap word from 0 (free) to 1 (held). */
    while (!cmpxchg(&s->word, 0, 1))
        cpu_pause();
}

This is alright for raw throughput but is palpably unfair to other, less fortunate CPUs.
Two options (and we'll need both) are available:

  1. Ticket locks
    These are pretty much just struct spinlock { u16 next_ticket; u16 curr_ticket; };. Waiters increment next_ticket (through an atomic fetch_add), then wait for curr_ticket to be == their ticket. This is still cache-awful but at least it's much fairer. Throughput should take a hit, but that's okay. (Note: while writing this up, I noticed that a spin_try_lock in this would be tricky, but it's doable with a single 32-bit cmpxchg.)

  2. MCS locks
    MCS locks basically make waiters spin on their own cacheline, so it's OPTIMAL for systems with lots of threads and CPUs. The idea here is to essentially make a linked list of CPUs and use percpu accessors to access these separate structs. These are kept percpu and we should keep one struct per level (irqsave, normal (no preempt)).

We likely want both, since MCS locks aren't necessarily needed in many cases, e.g. RISC-V machines that have a small-ish number of CPUs.
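
A possible shape for the ticket lock, as a userspace sketch using std::atomic. Everything here (`ticket_lock`, `ticket_cpu_relax`, the function names) is made up for illustration. It packs both tickets into one 32-bit word so the try-lock is a single 32-bit compare-exchange and the struct stays the size of an unsigned int. A kernel version would likely bump curr_ticket with a plain 16-bit operation; portable C++ cannot alias the halves of an atomic, so the release below uses a compare-exchange loop instead.

```cpp
#include <atomic>
#include <cstdint>

struct ticket_lock
{
    // High 16 bits: next_ticket. Low 16 bits: curr_ticket (now serving).
    std::atomic<std::uint32_t> word{0};
};

static_assert(sizeof(ticket_lock) == sizeof(unsigned int),
              "the lock must not grow past the current spinlock size");

// Stand-in for the kernel's cpu_pause()/cpu_relax().
inline void ticket_cpu_relax()
{
}

void ticket_lock_acquire(ticket_lock &l)
{
    // Take a ticket. A carry out of the high half simply wraps around.
    std::uint32_t old = l.word.fetch_add(1u << 16, std::memory_order_relaxed);
    std::uint16_t my_ticket = static_cast<std::uint16_t>(old >> 16);

    // Wait until our ticket is being served.
    while (static_cast<std::uint16_t>(l.word.load(std::memory_order_acquire)) != my_ticket)
        ticket_cpu_relax();
}

bool ticket_lock_try(ticket_lock &l)
{
    std::uint32_t old = l.word.load(std::memory_order_relaxed);
    std::uint16_t next = static_cast<std::uint16_t>(old >> 16);
    std::uint16_t curr = static_cast<std::uint16_t>(old);
    if (next != curr)
        return false; // held, or waiters are queued

    // Take the next ticket only if nobody raced with us: one 32-bit cmpxchg.
    std::uint32_t desired =
        (static_cast<std::uint32_t>(static_cast<std::uint16_t>(next + 1)) << 16) | curr;
    return l.word.compare_exchange_strong(old, desired, std::memory_order_acquire,
                                          std::memory_order_relaxed);
}

void ticket_lock_release(ticket_lock &l)
{
    // Increment curr_ticket without carrying into next_ticket.
    std::uint32_t old = l.word.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do
    {
        desired = (old & 0xffff0000u) | static_cast<std::uint16_t>(old + 1);
    } while (!l.word.compare_exchange_weak(old, desired, std::memory_order_release,
                                           std::memory_order_relaxed));
}
```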

NOTE: it's super important to keep the current struct spinlock size, we do not want to bloat up all the various users of spinlocks in the kernel.
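
For reference, the textbook MCS algorithm looks roughly like the sketch below, with made-up names (`mcs_lock`, `mcs_node`, `mcs_cpu_relax`). Note the pointer-sized tail: that violates the size constraint above on 64-bit, so a kernel version would presumably store a CPU number in the lock word and find the node through percpu accessors. Here the caller just passes its own node.

```cpp
#include <atomic>

// One queue node per waiter; each waiter spins only on its own node.
struct mcs_node
{
    std::atomic<mcs_node *> next{nullptr};
    std::atomic<bool> locked{false};
};

struct mcs_lock
{
    std::atomic<mcs_node *> tail{nullptr};
};

// Stand-in for the kernel's cpu_pause()/cpu_relax().
inline void mcs_cpu_relax()
{
}

void mcs_lock_acquire(mcs_lock &l, mcs_node &me)
{
    me.next.store(nullptr, std::memory_order_relaxed);
    me.locked.store(true, std::memory_order_relaxed);

    // Append ourselves to the queue.
    mcs_node *prev = l.tail.exchange(&me, std::memory_order_acq_rel);
    if (!prev)
        return; // queue was empty, we own the lock

    // Link behind the previous waiter and spin on our own flag.
    prev->next.store(&me, std::memory_order_release);
    while (me.locked.load(std::memory_order_acquire))
        mcs_cpu_relax();
}

void mcs_lock_release(mcs_lock &l, mcs_node &me)
{
    mcs_node *succ = me.next.load(std::memory_order_acquire);
    if (!succ)
    {
        // No visible successor: try to mark the queue empty.
        mcs_node *expected = &me;
        if (l.tail.compare_exchange_strong(expected, nullptr, std::memory_order_release,
                                           std::memory_order_relaxed))
            return;

        // Someone is mid-enqueue; wait for them to link themselves in.
        while (!(succ = me.next.load(std::memory_order_acquire)))
            mcs_cpu_relax();
    }

    // Hand the lock to the successor.
    succ->locked.store(false, std::memory_order_release);
}
```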

Add eBPF support

eBPF (and BPF) is useful for tracing functions dynamically. Using this and the current skeleton of nop'ed mcount calls, we'll be able to have really fast and efficient tracing.

Finish TCP support

A good start is finishing up the UDP corking and bringing it to TCP.

Implement job control

Sending SIGHUPs to orphaned process groups, setting up the tty's concept of foreground pgrp, controlling terminals, and signals all need to be implemented.

/proc support?

Some kernel support for /proc and pseudo filesystems in general is in order, maybe?

At least, dentries need to be invalidate-able.

Default minimal build (without CONFIG_ZSTD) cannot boot

The default builds create a zstd initrd, which fails to decompress at boot time since we don't build the zstd code.

The build scripts should attempt to detect this and fall back to no compression?

Or possibly build zstd by default. It's not too large.

SLAB crashes on network intensive workloads

Using ping -f 127.0.0.1 > /dev/null we can find various allocator-related crashes. Note that these were all found with UBSAN and KASAN enabled.

For instance:

#0  halt () at arch/x86_64/debug.cpp:15
#1  0xffffffff8102dc3e in panic (msg=0xffffffff812efa3d "Assertion %s failed in %s:%u, in function %s\n") at kernel/panic.cpp:129
#2  0xffffffff8102dcab in __assert_fail (assertion=0xffffffff812f3ec7 "!list_is_empty(&cache->partial_slabs)", file=0xffffffff8130473c "kernel/mm/slab.cpp", line=596, 
    function=0xffffffff812fc7e2 "kmem_cache_alloc_refill_mag") at kernel/panic.cpp:135
#3  0xffffffff812df505 in kmem_cache_alloc_refill_mag (cache=cache@entry=0xffff80131cbdd380, pcpu=pcpu@entry=0xffff80131cbdd880, flags=flags@entry=0) at kernel/mm/slab.cpp:596
#4  0xffffffff812de7c2 in kmem_cache_alloc (cache=0xffff80131cbdd380, flags=0) at kernel/mm/slab.cpp:797
#5  0xffffffff812e0379 in kmalloc (size=176, flags=0) at kernel/mm/slab.cpp:1237
#6  0xffffffff8101b22b in operator new (size=18446744071582674336) at kernel/cppnew.cpp:17
#7  0xffffffff810a0918 in vmo_create (size=size@entry=240, priv=priv@entry=0x0 <abi::abi_data>) at kernel/mm/vm_object.cpp:39
#8  0xffffffff810e44eb in packetbuf::allocate_space (this=this@entry=0xffffd000367f9a80, length=length@entry=240) at kernel/net/packetbuf.cpp:54
#9  0xffffffff810e5449 in packetbuf_clone (original=0xffffd000367f9ee0) at kernel/net/packetbuf.cpp:195
#10 0xffffffff811025fa in loopback_send_packet (buf=0xffffffff813a6da0 <buffer_lock>, nif=0x0 <abi::abi_data>) at kernel/net/loopback.cpp:36
#11 0xffffffff810ce114 in ip::v4::send_packet (flow=..., buf=buf@entry=0xffffd000367f9ee0, options=...) at kernel/net/ipv4/ipv4.cpp:347
#12 0xffffffff810c6a1c in icmp::icmp_socket::sendmsg (this=<optimized out>, msg=<optimized out>, flags=<optimized out>) at kernel/net/ipv4/icmp.cpp:310
#13 0xffffffff81128f41 in socket_sendmsg (sock=0xffff80131e783900, umsg=0x556f5b691c20, flags=0) at kernel/net/socket.cpp:1322
#14 sys_sendmsg (sockfd=<optimized out>, msg=0x556f5b691c20, flags=0) at kernel/net/socket.cpp:1374
#15 0xffffffff811cde12 in do_syscall64 (frame=<optimized out>) at arch/x86_64/syscall.cpp:44
#16 0xffffffff811b9b0f in syscall_ENTRY64 () at arch/x86_64/entry.S:130
#17 0x0000000000000033 in abi::abi_data ()

and

Page fault inside list_remove (with 0xDEB5 aka LIST_REMOVE_POISON)

0xffff804d72890ca0:     0xffff804d72890cd0      0xffffffff812df9a6 <_ZL17kmem_free_to_slabP10slab_cacheP4slabPv+166>
0xffff804d72890cd0:     0xffff804d72890cf0      0xffffffff812dd898 <_ZN10quarantine3popEv+40>
0xffff804d72890cf0:     0xffff804d72890d30      0xffffffff812ddb98 <_ZN10quarantine5flushEv+120>
0xffff804d72890d30:     0xffff804d72890e50      0xffffffff810b2bb1 <_Z15page_do_reclaimP12reclaim_data+257>
0xffff804d72890e50:     0xffff804d72890fd8      0xffffffff8109fd60 <_ZL10pagedaemonPv+1168>
0xffff804d72890fd8:     0xffff804d72890ff8      0xffffffff811cec26 <_ZN3x868internal19kernel_thread_startEPv+70>

Are these triggered because of memory pressure + the KASAN quarantine? I can't tell.

Switch neighbours to queue pending packets

Currently we're (naively) waiting for ARP/NDP replies, which crashes the kernel if we do it on the bottom half.
Linux queues pending packets on arp_queue and dispatches them when switching a neighbour to NUD_REACHABLE.
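
A very rough sketch of that queueing scheme, to show the shape of the change. All names (`neighbour_sketch`, `nud_state`, `packet`, `xmit`) and the queue cap are assumptions for illustration, not Onyx or Linux code, and locking is omitted.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

enum class nud_state
{
    incomplete,
    reachable
};

using packet = std::vector<unsigned char>;

struct neighbour_sketch
{
    nud_state state = nud_state::incomplete;
    std::deque<packet> pending;                   // packets waiting for resolution
    static constexpr std::size_t max_pending = 3; // arbitrary cap, an assumption
    std::function<void(const packet &)> xmit;     // stand-in for the real transmit path

    // Never blocks: either transmit right away or park the packet.
    void output(packet p)
    {
        if (state == nud_state::reachable)
        {
            xmit(p);
            return;
        }

        if (pending.size() >= max_pending)
            pending.pop_front(); // drop the oldest packet
        pending.push_back(std::move(p));
    }

    // Called when the ARP/NDP reply arrives.
    void resolved()
    {
        state = nud_state::reachable;
        while (!pending.empty())
        {
            xmit(pending.front());
            pending.pop_front();
        }
    }
};
```

Since output() never sleeps, it would be safe to call from the bottom half; the flush happens in whatever context processes the reply.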

dentry_wait_for_pending deadlock

There's a deadlock related to dentry_wait_for_pending. Couldn't get much more info than that.

Shower thoughts:

  • Let's say I'm going to create a file (dir inode write locked), do a lookup, and find a pending dentry
  • The pending resolver tries to take the same inode lock in shared mode
    This sounds like a problem?

Stability of Onyx booted off hard drive?

I think it would be nice to have a copy of Onyx installed on a (virtual) hard drive. It would be useful for e.g. porting userspace stuff.

But IIRC there are some stability issues in such setups. What are those issues?

dcache tests

The dcache is very complex and could use some in-kernel unit tests for things that are hard to reach from userspace (i.e. not related to namei).

Hunt for sleep under no-preemption space

Add an is_preemption_enabled() check in mutex_lock(), the wait_queue code, etc. This should find some bugs across the kernel; I usually trigger some of them when stressing it.
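
Something along these lines, as a userspace sketch. All names are hypothetical, a thread_local counter stands in for the percpu preemption counter, and a real kernel would warn or panic with a stack trace instead of just counting.

```cpp
#include <cstdio>

// Stand-in for the percpu preemption counter.
static thread_local unsigned int preempt_count = 0;
// Number of violations seen so far (a kernel would WARN/panic instead).
static unsigned long sleep_violations = 0;

void sketch_disable_preempt()
{
    preempt_count++;
}

void sketch_enable_preempt()
{
    preempt_count--;
}

bool is_preemption_enabled()
{
    return preempt_count == 0;
}

// Call at the top of anything that may sleep.
void might_sleep_check(const char *file, int line)
{
    if (is_preemption_enabled())
        return;
    sleep_violations++;
    std::fprintf(stderr, "BUG: sleeping call with preemption disabled at %s:%d\n", file, line);
}

#define MIGHT_SLEEP() might_sleep_check(__FILE__, __LINE__)

struct sketch_mutex
{
    bool held = false;
};

void sketch_mutex_lock(sketch_mutex &m)
{
    MIGHT_SLEEP(); // check even when uncontended, so the bug shows up every time
    m.held = true;
}

void sketch_mutex_unlock(sketch_mutex &m)
{
    m.held = false;
}
```

Checking unconditionally, rather than only when the lock is actually contended, is what makes these bugs reproducible without stress testing.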

Better kernel crash call traces?

Right now, only addresses are printed in the event of a crash, so one needs to run llvm-symbolizer on those to get the names of the functions involved. I think it would be good if this data was printed automatically. I'm thinking of Linux's call trace as an example.

C++ mangled names may complicate things though.

Efficient stat counters

We have a strong need for efficient, percpu stat counters that do not require expensive atomics in fast paths. This may require a percpu memory allocator.

See Paul McKenney's perfbook for more details or ideas.
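
The basic shape, sketched in userspace with a fixed-size array standing in for percpu storage. `stat_counter`, `percpu_slot`, `NR_CPUS_SKETCH` and the 64-byte cacheline are assumptions for illustration. The writer does a relaxed load and store rather than an atomic read-modify-write, which is only valid if each slot has exactly one writer, i.e. the caller stays on that CPU for the duration of the update.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::size_t NR_CPUS_SKETCH = 4; // assumption: fixed CPU count
constexpr std::size_t CACHELINE = 64;     // assumption: 64-byte cachelines

// One slot per CPU, padded so CPUs never share a cacheline.
struct alignas(CACHELINE) percpu_slot
{
    std::atomic<std::uint64_t> value{0};
};

struct stat_counter
{
    percpu_slot slots[NR_CPUS_SKETCH];

    // Fast path: no locked instruction. Requires a single writer per slot.
    void add(std::size_t cpu, std::uint64_t n)
    {
        std::atomic<std::uint64_t> &v = slots[cpu].value;
        v.store(v.load(std::memory_order_relaxed) + n, std::memory_order_relaxed);
    }

    // Slow path: sum every slot. The result may be slightly stale while
    // other CPUs keep counting, which is fine for statistics.
    std::uint64_t sum() const
    {
        std::uint64_t total = 0;
        for (const percpu_slot &s : slots)
            total += s.value.load(std::memory_order_relaxed);
        return total;
    }
};
```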

[x86] Deal with non-zero BSP APIC ids

Some systems do not (or may not) have a CPU whose APIC ID is 0. A case in point was AMD 15h.

Check IO APIC redirection stuff (surely broken) and MSI stuff (probably broken) for this.
