
protura's People

Contributors

mkilgore

protura's Issues

console: Support resizing the screen (for fbcon)

This one is pretty messy. With the advent of fbcon, the console may not be 80x25 anymore. While the basic screen drawing logic can be converted to use rows and columns fields for its calculations (more or less just replace SCR_ROWS and SCR_COLS with those), dynamically allocating the screen information is harder. The big problem is that the console logic also houses the duplicate screens supporting multiple VTs, and those need to be resized when switching screens in the console_swap_active_screen() logic. But that logic is all protected by spinlocks, so allocating memory in there is possible but not ideal. We could have new screen buffers supplied to console_swap_active_screen(), but I don't really like that approach - it basically leaks implementation details to the caller.

What would likely make more sense is to start protecting the console with a mutex instead of a spinlock - there's a lot of logic going on in there, and turning off interrupts for all of it is probably a bit heavy anyway. But we can't just put in a mutex, because the kp code calls the console code directly while holding its own spinlock. This is something that might be addressed as part of #27. If we could achieve that change, then we could just allocate the memory while holding the mutex, and it's largely pretty easy.

The other detail is carrying over existing screen state, but that should be pretty easy - just copy the screen buffers and clip/extend them appropriately to keep things in the same spot (But potentially lose information, if the new screen is smaller).

coreutils: Add chmod

We don't currently have a chmod util, which is problematic because we can't set the execute bit on files via the command line. Unfortunately, chmod is a little harder to implement because our custom arg parser is not quite flexible enough for some of the weirder arguments like -x and +x. This may just require some custom argument parsing to handle that format.
chmod should at least allow toggling the rwx bits to be 'complete'; supporting the -R flag would be a nice extra.

task: Use a `struct ktimer` instead of `wake_up`

This is something that's just left over from the early days. In the scheduler, all the tasks are kept in a single list, and during the process of picking a new task, wake_up is checked against the current tick to determine if a task is done sleeping. This is only used for actual time-based sleeping, like usleep(). Checking the value on every schedule is not ideal though, and it's the only special case contained in the scheduling code (when it was added, we had nothing else to use). Now that we have struct ktimers, we can just embed one in the struct task and use that to wake up the task after some time has passed, removing the special case from the scheduler (and simplifying it a decent amount in the process).

PCI: Don't enumerate every bus.

Currently, the code loops over all 256 possible PCI buses to find the valid ones. This is very inefficient; a better solution is to enumerate only bus 0, and then enumerate any other buses we find in the process. Other buses will be connected by a bridge device on the main bus, which we can check for during enumeration (and then enumerate those new buses and repeat the process).

TCP: Add packet retransmission.

Currently, send() does not queue packets for retransmission if we don't receive an ACK. This shouldn't be too bad to add since we already have basic timer support for the delay-ack timer. The basics are:

  1. Add a new retransmission timer, and associated functions in tcp_timer.c.
  2. Modify tcp_send() to make copies of the struct packet entries and place them on a queue on the struct socket. Also arm the retransmission timer.
  3. When the retransmission timer goes off, we make a copy of the packet (or packets?) on the queue and send a new copy. The TCP send logic should take care of updating the ACK/sequence information on it.
  4. In tcp_input.c, when we update snd_una we also need to remove packets from the retransmission queue, and restart the retransmission timer.

There are some complexities in there, though we don't need to worry about all of them:

  1. When snd_una is updated, it's possible they only update it half-way through a packet. We don't really have to worry about this much though, it's perfectly fine for us to resend a packet with partially ACK'd data. We just have to ensure we don't remove a packet from the queue until all the data in that packet is ACK'd.
  2. We should really obey the window information given to us by the other side, and pause sending when the retransmission queue exceeds the window. This likely isn't actually all that hard to do, it just requires a wait queue that we sit on when the window is full. When snd_una is updated, we trigger the wait-queue.

As a note, this can be tested via configuring a TAP device on Linux to drop packets, and watch the packet transmission. Adding actual tests via ktest would be a big win, though potentially a lot harder.

toolchain: Don't have Makefile run git submodule update automatically

Currently, the Makefile runs git submodule update automatically before building the toolchain or newlib. This is convenient in some ways, but very annoying when doing newlib or other toolchain development, because when you go to build either of them after making new commits to newlib or the toolchain, they'll get dropped when the HEAD is reset by git submodule update.

I think it's worth having the Makefile run git submodule update on the first build, since if you forget to run it, it won't be obvious what's wrong. But we should touch a file or something to indicate we don't need to run it a second time, or (perhaps) condition it on the existence of the .git directory inside of toolchain. Either way, it would be nice if it didn't run on every build after the first.

block: Gracefully handle bad blocks

We currently don't do anything on bad blocks - in fact, they'll probably lock up the block-cache since the ATA driver may not be able to cope.

Supporting them in the block-cache is likely not that bad - I would picture something similar to the 'bad inode' concept: a special BLOCK_BAD marker that the driver can set. I think the worse part is the implications for the code using the block cache - it needs to handle bad blocks and return -EIO. That said, it's still doable, but some of our primitives for the block cache may need a bit of work.

Another interesting point is that some file-systems can actually mark off what blocks on the system are bad, so we may need some kind of notification mechanism there for the file-system to be told when a bad-block is encountered, since with the current system the block-cache doesn't know about the file system on top (by design!).

block: Use a spinlock for the block cache, and call block_new() outside of the lock.

This is purely an optimization, but the block-cache gets hit a lot. A mutex here is more expensive than a spinlock, and the time spent holding the lock should be very brief, as we only hold it to get a reference to a block - everything else is done outside the lock. The only 'expensive' operation that we can't do under a spinlock is the block_new() allocation, and it's easy enough to drop the lock for that and free the duplicate if a race happens (same as with an inode).

sh: Non-interactive shell should not use any job control

This is an interesting detail I stumbled upon. A non-interactive shell session should not use any job control. This also means a non-interactive shell should not use process-groups (which makes sense - a non-interactive shell is not a session leader and can't manage them). This is actually likely not all that hard, but requires touching a lot of the job support to turn off some features:

  1. struct job should support having no pgrp associated. All of the non-job-control code doesn't use the pgrp, so this should work out OK (Just set it to -1, likely).
  2. Keep struct job around, since it groups the struct progs together. job_make_forground() needs a new non-interactive counterpart, which does not handle the WIFSTOPPED() and WIFCONTINUED() statuses, along with not setting tcsetpgrp() and such. The non-interactive version should simply keep looping on waitpid() until job_is_exited() returns true for the provided job.

vfs: Enforce user permissions in open()

Currently, while we have UID/GID support for processes, we don't actually check these anywhere to restrict permissions. At the very least, open() should check the mode bits on the selected files/directories and return -EACCES when the check does not pass.

userspace: Add dynamic-linking support

Requires #14
Dynamic linking is not necessarily all that complex, though wiring everything up will likely take a lot of fiddling. The approximate steps are (After we have a userspace mmap):

  1. Write the dynamic loader. It does approximately the same thing as binfmt_elf, reading the ELF file and using mmap to do the heavy lifting. There's still some complexity that needs to be figured out in relation to the relocation information. It also needs to be statically linked (obviously...) and might(?) not be able to use libc, though if it is statically linked there might not be an issue there.
  2. Wire up the toolchain (gcc) to use the dynamic loader by default.
  3. Configure things to compile without using static linking (Some of the extras like ncurses need to have their builds adjusted).

TCP: Using close() or shutdown() does not terminate the TCP connection.

The semantics of close() and shutdown() for a TCP socket are unfortunately pretty unclear. The big issue is that there's an impedance mismatch between the TCP design/RFC and BSD sockets. TCP requires a two-way close where the connection cannot close until both parties choose to do so. If you choose to close your side, the TCP connection exists in a half-closed state until the other party closes their end. This makes some sense, but does not map well onto your typical BSD socket close(), which technically closes the socket for both read and write, even though there is no way to force the other computer to close their TCP connection and stop sending data. But any data they send after that point will simply be lost to the void.
The exact behavior still requires some investigation, but from what I've found, it basically boils down to matching how a pipe() works in this situation as closely as possible. For a pipe, if the reader closes their end and the writer tries to write to the closed pipe, they get -EPIPE along with a SIGPIPE. Conversely, if the writer closes their side, the reader can keep reading just fine but will eventually receive an EOF in the form of read() returning 0.
TCP sockets work the same way - if the other side closes their TCP connection first, then the TCP enters the half-open state; you can still write to the socket just fine, but eventually read() will return 0, indicating EOF and that the other side is done. If you then close() your side at this point, the other half of the TCP close happens and the connection closes properly. Conversely, you can opt to close your side first via shutdown(SHUT_WR). By shutting the socket for writing, after all the packets are transmitted the other side will see an EOF. You can then keep reading from the socket until you get an EOF yourself. It gets a bit messy here, but once you get the EOF the TCP connection is dead (you already closed your side, and the other side just closed theirs) and close() doesn't actually have much of anything left to do for the TCP side of things.
The above is the happy case. In those cases, both sides successfully send a FIN and then wait for the other to do the same. A FIN results in an EOF to the socket reader. The unhappy case is when not all data will reach the other side due to a bad closing sequence. Ex. you call close() or shutdown(SHUT_RD), and then the other side (which is not affected by these commands and still has an open connection) attempts to write some new data to the connection (which you will never process, since you already closed the socket). In this situation (where a pipe would trigger a SIGPIPE) we send a RST to kill the connection (and then go into TIME_WAIT, I believe(?)). There are other cases that can trigger this behavior as well.
Another case is calling close() before the other side has shut down their connection, which I'm not quite sure of the best/correct way to handle yet. We cannot wait forever for the other side to send us a FIN, but we also don't want to just unconditionally close with a RST rather than the correct process.

kernel: Produce a core dump when a user program crashes

Currently, there really isn't much in the way of support for debugging userspace programs, which does make working on them a bit painful at times. We do support printing a stack trace of the userspace program if/when it crashes, which is definitely very useful, but producing a coredump file that could then be loaded in gdb on a separate system is probably the next step we want to take.

The actual format needs some research, but conceptually it doesn't seem too hard. The coredump format we'll want to use is ELF, where we produce a segment inside the ELF file likely per struct vm_map in the process address-space. The entire thing can likely be based off of the struct address_space rather than processes themselves.

event: Support poll on the event device

This was mostly just an oversight - event does not support poll because it wasn't needed at the time. This should be pretty trivial to implement - take the event's spinlock, check for any queued events, and call poll_table_add() with the event's wait-queue.

kp: Errors when writing to kernel log can deadlock the kernel, cause other problems

To give some background on this issue, back when the kernel was first started I wrote a "basic_printf" implementation with an abstraction over the actual output of printf - it would call provided putchar and putnstr functions to do the actual output. This implementation is really nice from the standpoint of not having to provide a string buffer in advance - the printf output is effectively streamed to the output as it gets created. It also can be easily used to create an snprintf function, by simply tying the output to a string buffer.

However, in hindsight, now that we support multiple kp outputs (including one that just logs to an in-memory buffer), it's not that great of an approach. The format string and arguments have to be parsed per kp output, and if there is an error, the locking around writing to the kp outputs (and kp as a whole) can lock up the kernel and prevent the panic stack trace from printing.

Fixing this is actually fairly straightforward - put a limit on the length of a kp string, use snprintf to write to it (which can be done without holding any locks at all, just use per-cpu buffers), and then write that string to all of the outputs. That allows the outputs to prevent interleaved output while not having to hold a lock across the entire format string parsing.

ext2: Correctly check ext2 feature flags and abort mount if we don't support them.

Currently, we don't properly check the ext2 feature flags to ensure we support the ones listed. We check for some and print some warnings, but we don't actually do anything with that information. Now that we have a better handle on what we currently do and don't support, we should start utilizing these flags to refuse to mount ext2 drives that don't match the feature flags we support.

This may require some work on the mounting infrastructure as I'm not sure it properly handles mount failures at the moment.

docs: The build documentation should list dependencies that need to be installed on the host system

The full OS (the kernel, along with the cross compiler and various other utilities) has a fair number of build dependencies, most of them brought in by things like compiling the cross compiler or extra utilities. There is an approximate "list" of these build dependencies in the CI.yml file, but the build.md file should list these dependencies directly, along with which parts of the build process require which dependencies.

block: Re-add ATA DMA support

During the process of completely rewriting the ATA driver, the DMA support was dropped with the intention of adding it back later. In theory, from what I've read, the DMA support should be listed in the IDENTIFY block we get back. Combined with the DMA I/O location being given to us in 4th BAR from the PCI device, we should be able to also make this support fairly generic like the ATA driver is.

kernel: Remove uninterruptable kernel tasks

These are basically a remnant of the times before interrupts were working and "non-interruptible" tasks were the default. At this point it makes sense to remove that cruft and simplify the kernel task creation logic.

block: AHCI support

This one is somewhat low on the priority list because the ATA support we do have covers most stuff already - even if a PC uses SATA, they typically still have an option to expose a typical ATA interface at the IDE PCI class/subclass. Still, AHCI seems like it could be a good block device driver to add, and it would be good to have more than one block device driver.

ext2: Invalid char or block device numbers crash the kernel

This one is pretty simple: char_dev_get() and block_dev_get() can both return NULL when the major/minor pair passed to them is invalid, but super.c in the ext2 driver fails to check for NULL and thus blows up when attempting to load those inodes. What exactly happens in this situation should be researched, but we probably just need a 'no device' file_ops for this case that returns -ENODEV when open is called.

kernel: Add basic USB support

Currently the kernel has zero USB support - which hasn't yet really mattered, but is definitely something important and would give us access to a variety of new devices. USB memory sticks in particular would be very nice for copying data in/out of an install on real hardware (But that's probably a long ways away). This issue is mostly for investigating the necessary details for USB support and implementing some of the basic building blocks.

keyboard: Repeat rate is not configured, and sometimes too low.

Currently, we don't configure the keyboard's repeat rate, instead just leaving it at its default. This works fine for qemu, where the default is pretty good, but on VirtualBox it's fairly slow and a little annoying.

I believe you can configure the repeat rate via talking to the keyboard controller, but how to do that needs to be investigated. A sane default also needs to be picked, though a quick google shows Linux just sets it as high as possible (and the docs suggest the value is at least somewhat keyboard dependent...).

As a bonus, I believe there are escape sequences for turning the repeat rate on and off that could be supported in the VT100 layer, and also we could expose setting the repeat rate via a syscall or proc node (We don't have a direct keyboard node, but we could add one and allow setting the repeat rate via an ioctl...)

kernel: Add APIC and SMP support

Currently, we use the 8259 PIC, which works fine but cannot be used for SMP. To support SMP, we have to make use of the APIC (The IOAPIC and LAPIC). This by itself will require some messing with the lower-level x86 stuff, and perhaps creating a few new abstractions to support both APIC and 8259 PIC, but in general it should be doable with what we already have (kmmap can be used to access the memory-mapped registers for the APIC).

Unfortunately, once we implement APIC support, the PCI interrupt numbers no longer work; instead they are mapped to the INTA, INTB, INTC, and INTD lines, which requires reading the ACPI tables to figure out which maps to what hardware interrupt, and also apparently requires parsing some other table to figure out if the BIOS has remapped the PCI devices onto different interrupt lines. And dropping PCI support for the moment is arguably worse than not supporting the APIC/SMP.

sh: Add variables and more shell syntax, likely rewrite parser.

The existing shell is not bad, but it uses a very simple hand-written parser and lexer, and there is no intermediate representation - it turns the commands directly into programs and jobs. This is unfortunately not very flexible or extendable to things like for, which require much more complex parsing (where the inner parts of the for have to be reevaluated every loop, and the for command itself involves multiple commands).
The likely conclusion is a completely rewritten parser and lexer, along with a new intermediate representation for commands that can be used to start/reevaluate commands as necessary. This will largely just be a parse tree, but shell syntax is pretty convoluted, so it's not the easiest thing to get right. A bison/flex parser/lexer would be a fun choice, but may not be the easiest way to parse shell syntax - a hand-written recursive-descent parser may do a better job. We really only need a bare minimum of functionality though, not a full set of commands, so either may be good enough. The big things I'd like to see support for:

  • Local shell variables (we only support exported env variables ATM)
  • Local environment variables for commands (Setting X=Y with a command)
  • Separating commands with newline or ;
  • Support for if
  • Support for for
  • Support for while
  • Support for $() command execution
  • Support for $? and $!
  • Support for && and ||

OS: Dynamically create the /dev entries at runtime

Currently, all the /dev nodes are just statically allocated onto the FS. This isn't terrible, but it does need a lot of updating and isn't the greatest solution. It leads to situations where device nodes exist that are not necessary (ex. a node for every possible partition and disk). I think there are two general approaches we could take:

  1. A devfs approach - a kernel-managed file-system which exposes all the device nodes and is mounted on /dev. The advantage of this approach is likely its simplicity, since it would live entirely in the kernel. The disadvantage (from my understanding) is that the kernel would either need to handle the permissions and UID/GID assignments, or rely on userspace to correctly redo them at some later point to match the proper groups.
  2. Have a userspace daemon that receives events from the kernel and creates the proper nodes as they come in. This approach leads to less complexity in the kernel, and the userspace daemon can easily handle assigning the correct groups and permissions.

kernel: Add ARM architecture support

I actually attempted this a few years back and did make some small progress. In general, it's definitely possible, and it should uncover a lot of the more "non-portable" things that are currently sitting in the ./src directory, so adding a new architecture should really improve the kernel as a whole.

ARM itself covers a lot of different devices, so starting with generic support isn't really realistic. That said, the Raspberry Pi (either one version, or all of them) seems to be a good choice of device to focus on first - real hardware is easy to get, and qemu offers built-in support for emulating those devices in particular.

The gist for how this will work is that we add a new ./arch/arm or ./arch/raspi directory into the kernel. We then need to replicate most/all of the interfaces exposed by the ./arch/x86/include that are used by ./src - which covers a variety of low-level things like registering interrupt handlers, page table modifications, task switching, etc. There's only about ~6000 lines of ./arch/x86 code so it's not actually all that much stuff - that may just be a bit of a symptom of the amount of non-portable code in ./src though, we'll have to see.

kernel: Add PTY support

This will get pretty involved, and could be split up. The basic details are:

  1. Each PTY needs an associated struct tty in the kernel. To start, it will be much easier to make it a static array with a fixed size. Making them dynamic means we need to start reference counting them, and there's a lot of complexity in ensuring they don't get destroyed while references to them still exist. These will just be mapped to their own major device number, with the PTYs being the minors. I think the existing struct tty_driver interface is sufficient for implementing the master TTY side, but it may need one or two updates, and the struct tty_driver for a PTY needs to be written.
  2. Implement /dev/ptmx and the devpts filesystem. These need some research, but the basics (from my understanding) are that opening /dev/ptmx opens the master for the lowest unopened PTY (just check them all one by one). When a free PTY is found, a char device node for that major/minor is automatically added as an entry in the devpts filesystem (if it's mounted). The char device node gets its UID/GID settings from the process that opened the master side (getting these permissions right is most of the point of devpts).
  3. Add the openpty(), forkpty(), and login_tty() calls, implemented via using the above new systems. They could potentially be placed into newlib, but they actually belong in a new libutil library.

init: /etc/inittab is executed in the wrong order.

This is pretty simple - we create a simple linked-list of struct tab_ent structures and then execute them. But we append to the front of the list, which means the list is actually created in reverse order.

We should just do away with the manual linked-list and make use of <protura/list.h> (which we already provide to userspace). It's doubly linked, so we can easily append to the end in O(1), and better yet, it's all encapsulated in the list functions, so we don't need to worry about re-implementing it.

net: Add Unix Domain socket support

This is a somewhat involved change, but not too crazy as it doesn't actually require interfacing with any devices. Unix Domain sockets are similar to two-way pipes, and the implementation would be somewhat similar. The most interesting part will likely be integrating it with the common socket layer we already have - some of it may be a bit too IPv4 specific, though I believe that may already be taken care of.
Unix domain sockets also allow transferring a file descriptor from one process to another. We already allow processes to share the same file descriptor (that happens during a fork()), so this should be pretty straightforward. We don't actually have support for sendmsg and recvmsg though, so syscalls and wrappers would have to be added. That process may also mean retrofitting them onto the existing IPv4 stack (though that work could potentially be put off until a later time).

getty: Should reset tty settings when it starts.

Currently, getty just assumes the tty is already set up when it acquires it, which is silly (half the point of getty is to set up the tty to a sane state). This also means that if something messes up the tty settings and kills the shell, it may be unrecoverable, since you may not be able to login. To fix this, getty should reset the terminal settings to a sane set when it starts up, and allow some parameters to be configured via new arguments (though in our case, we currently always use the same settings anyway...)

kernel: VGA framebuffer support

The kernel should support using a VGA framebuffer for the terminal, and arbitrary read/write to it. Some quick notes:

  1. We can already request the VGA settings via multiboot, so we don't need to worry about setting them ourselves; GRUB can do that.
  2. Our VT layer is designed to be a little flexible, so we should be able to switch in a new VGA backend. The one thing that would likely be necessary is that the struct screen interface will need a new update call, which will tell the VGA system to update the screen with the new buffer contents. The text mode didn't need this because it writes directly to 0xB8000.
  3. We'll need to add a basic ASCII font to the kernel for writing to the screen with.
  4. Beyond getting the text console working, we'll want to expose the framebuffer to userspace for writing arbitrary pixels. That can likely be modeled after Linux's API (though perhaps with some of the more complex stuff stripped out; I haven't looked at it that closely). But there should be a /dev/fbX that you can open and read/write/ioctl to use the screen. Beyond that, we could start looking at more userspace utilities for creating windows/GUIs or similar, though that's a bit more than this issue needs to address.

kernel: Enforce the execute bit in `execve()`.

Currently, in src/fs/exec.c, check_credentials() never returns -EACCES. This is largely because #11 is not complete, which makes this very inconvenient, since on the command line you would have no way to manually toggle the execute bit. Once #11 is done, we should start enforcing that bit.

mount: support mounting a disk as read-only

Pretty self explanatory, we should be able to pass a flag into mount() and have the super mounted as read-only. We'll want to look at what Linux does/did for this case, it likely requires bubbling this logic up into some of the other FS code in some fashion so that it can check for the read-only mount and refuse write access.

We could also potentially have a read-only block-device flag, which would make the block-device refuse to write dirty blocks and output a warning instead, to ensure the disk definitely doesn't get touched even if the FS layer screws up.

mount: Add -a flag and /etc/fstab support.

Currently, mount does not support the -a flag to automatically mount everything in /etc/fstab. Consequently, the OS also does not support using /etc/fstab, which this change would allow. A few details:

  1. mount currently does not acquire the current mount information from the kernel - this would need to change so that we don't try to remount already mounted filesystems. /proc/mounts does exist and is mostly sufficient, but probably not in quite the right format and might need to be adjusted.
  2. Then, /etc/fstab needs to be parsed and mounted one entry at a time, cross-referencing against the list of already mounted filesystems.

mm: Expose `mmap` syscall

Currently, we have kernel-only mmap support via struct vm_map. It supports mapping files and anonymous mappings. This could likely be exposed as the mmap syscall without too much trouble. The only thing we don't currently support is writing to files via mmap (because loading exes does not require this support). Supporting writing is more complicated because doing it right requires a page cache, which we don't currently have (though it doesn't really need to be that complex to be functional...). This could be put off until later though, as the main use cases right now are userspace dynamic linking and anonymous mappings for memory allocation.

There is still some complexity here involving mmap. In particular, you can mmap overlapping regions and punch holes in existing mappings - basic support could potentially skip this though. There also needs to be some basic logic for finding a hole in the address space large enough for the new mapping, when a fixed address is not given.

block: Support more than one ATA disk and fix partition abstraction

Currently, we only support one ATA disk (Which can have one master and one slave). The reason is mostly because the block-device support is very poor - they are statically allocated by major number, and the partition and block-size information is explicitly tied to the block-device structure. The actual ATA driver is based off of the struct ata_drive and could easily support multiple disks by simply allocating more struct ata_drives and initializing them, but we can't quite get there because we still need the block-device to tell us the partition information.

Looking at the Linux kernel, they take a much better approach - they seem to have a block-device structure per major/minor pair, with the separate struct gendisk representing the partition information and such.

Speaking more about Protura, we likely need to move the small partition logic to act on a different abstraction from the struct block_device (Linux has a struct gendisk that seems to have a similar purpose). Once that happens, we can get rid of the existing information on the struct block_devices themselves and they basically just become a map of major's to drivers. We can then figure out what to do with the actual struct block_devices themselves - Linux allocates a struct block_device per major/minor, and we could do that as well, which would simplify some of our logic at the expense of reference counting and caching and etc.

They seem to make use of the file-system logic to make some of the caching logic simpler, which we could probably do as well. I think the idea would be that the drivers would just create the disk abstraction (holding the partition information), and then request the relevant major/minor block device inodes and fill them in when the disk is detected. But we probably want to start by separating out the partition information first and then see where it goes.

keyboard: Do something about invalid mod key states after keyboard state changes

This one's pretty simple - when someone turns off the keyboard via keyboard_set_state(), none of the keyboard's "state" like key_pressed_map, led_status, or mod_count are touched. The result is that when you turn the keyboard back on it may be completely screwed up with no way to recover.

On Linux, my understanding is that when a program turns the keyboard back on they also have to feed the current state back into the kernel so that it knows what to expect. IMO this is kind of a messy solution though, and a bit annoying for the callers, so I'd rather not bother with it since this situation is a little obscure anyway. I think we probably have two options here:

  1. Just keep updating the key_pressed_map even when the keyboard is off, then we can attempt to apply the current keyboard state based on the currently pressed keys. The only thing that can't work with this approach is the led_status controlling numlock and caps-lock, they would need a different solution.
  2. Don't bother attempting to fix this, and instead make the magic Print Screen key also reset the keyboard state, so that we can at least recover by pressing that key while nothing else is held.

Because the first approach really can't work 100%, we probably need to do the second approach regardless.

block: Add a way to acquire exclusive access to a block device

Currently, block devices are just statically allocated and can be used by anybody. This in particular affects the file-system drivers because the block device could be messed with while they are using it.
In theory this is pretty simple to do. The biggest thing is probably that we'll need to start reference counting the block devices so that we know how many users a device currently has (existing users would cause an exclusive-access request to fail), or so we can block people while someone is accessing it exclusively. After the reference counting, just add a flag to the block device structure to mark that it's currently exclusively owned, and have block_dev_get() return an error instead of the block device.

fs: Add fsync support

Currently, we only have sync support, which works well but is pretty heavy. fsync() allows syncing a single file to the file system, which should be much less resource intensive. It already has some candidate users in the form of /bin/init, which could use it to sync the init log, and probably some others.

I haven't yet tackled fsync() because struct inode's do not contain a direct list of the dirty struct blocks associated with them (And we don't need that information, since we hit up bmap() to get a block from a struct inode). That said, we don't actually need any of that information if we don't want it; we can simply implement fsync() in a file-system-specific way. ext2 would be the only one at the moment, and to implement it we simply use the block list attached to the ext2-specific inode (the same one used by bmap()) and sync every block before syncing the inode itself. We should be able to hold the read lock on the inode the entire time without issue, which will ensure the inode doesn't get resized or etc. during the syncing process.

There probably needs to be a bit of optimization here though as we probably don't want an fsync to also trigger reading the entire file off of the disk, which is what would happen if we just call bread() on every block to then sync it - if the block doesn't exist, it'll reach out to the disk to grab it. That said, in theory, while bread() currently always returns valid blocks, there's not a particular reason it has to do that. Inside bread() the block is already created and synced separately, and the block's contents can't be examined without locking it, so we could have a separate bread() call that does not read the block off the disk and leaves valid unset - and then in fsync we just check for the dirty flag.

The above approach would still trigger all the corresponding struct blocks to be allocated, so perhaps a better approach is a bread() that only returns the block if it already exists in the cache and returns NULL otherwise. Because during fsync() we would hold a lock on the inode we know nobody will be touching these particular blocks so this shouldn't be racy.

x86: boot: Copy multiboot information into the kernel before switching GDT and page directory.

Currently, we simply pass the multiboot header pointer verbatim to cmain. This generally works out ok, since at that point the identity map of the first 16MB is still in place, but there's no real guarantee the pointer is valid after that point.
What we should do instead is reserve a spot of static memory in the kernel and copy the multiboot info into that memory on boot (before we set up the GDT or page directory). Then, in cmain we can simply reference the copy, which we know is in a valid location.
The code for this will live in boot_multiboot.S (and be in assembly). The structure of static memory could be declared elsewhere.

kernel: Redo initialization logic to not require an explicit array

This idea comes from the Linux Kernel, and I've intended to implement it for a while now. Similar to how the ktest_module list is detected, the Linux Kernel creates a list of 'initcall's via a separate section that just consists of a bunch of function pointers. Conceptually it's easy - create a pointer to the function in the right section, and on init we loop over all the functions in the section and call them one at a time.

The hard part is really just nailing down the order of the init process and what levels we need - Some initialization steps require other steps to already be complete, so the order needs to be expressed somehow. On Linux, there's a variety of initcall levels you can specify (which places the calls into different sections), and then the initcalls are run in order according to their levels. We just need to figure out what levels make sense and then change the existing init stuff over to the new system.

kernel: Add 64-bit support

Obviously, this is a pretty big card. There's already a (very) optimistic setting in the Makefile for BITS that doesn't do anything, but presumably we'd set that to select 64-bit. Ideally, we keep the same x86 architecture and just have combined 32-bit and 64-bit support under it (Which is what Linux does). That said, there's a lot of unknowns here, and the real challenge is figuring out what new things we need to support to have any hope of x86-64 support, which will eventually be documented here. The new calling convention probably means rewriting some of our task bootstrapping logic (not too bad), we need more than two-level page table support, and it may require APIC and ACPI support, among other things I don't yet know about. There's also likely a host of other things that aren't quite compatible and need to be adjusted (Though I've tried to use compatible types such as uintptr_t, which will hopefully ease the transition).

newlib: Add gethostbyname() support

This is somewhat straightforward: to have a usable IP system we need to offer gethostbyname() (in newlib) to allow looking up an IP from a hostname. The final version should ideally support /etc/hosts along with basic DNS support. This can already be completed, as we have pretty much completely functioning UDP support. The messy part is just building and parsing the DNS request and getting that right.

slab: Don't call palloc() while still holding the slab spinlock.

This one's a bit embarrassing - the slab code is protected by a spinlock, and it calls palloc_va() while still holding that lock. The result is that even if the caller is not calling from an interrupt/atomic context, palloc_va() is still called from one anyway. Worse, we don't force the PAL_ATOMIC flag in __slab_frame_new(), so palloc_va() might attempt to sleep and hang the whole system.

This is probably not actually that hard to fix - just drop the lock before the allocation, and then reacquire it after. Appending a new frame is always a valid operation regardless of the state of the slab after reacquiring the lock, so the act of dropping the lock shouldn't really matter. The one catch would be potential fragmentation on races - on a race each caller would create a whole new frame and allocate a single object from it, tying up all that extra memory when they could have just used a single frame. If that's a concern we could probably handle this case by simply checking the frames again for free objects after reacquiring the lock.

poll: Use a `struct delay_work` to trigger the timeout instead of `wake_up`

Currently, the poll() code sets wake_up directly to trigger the task to wake up when the timeout is hit, but the details of that code are a bit suspect. poll() would be much simpler if it ignored the existing wake_up stuff and used a struct delay_work to just trigger poll_table_wait_queue_callback() when the timer is up. That would unify all of the waiting logic (the timeout and the struct wait_queues would do the same thing) and remove some of the weird cases in there (Because the wait-queue and the timer would both trigger table.event, so we don't care which happens).

fs: Add FAT32 implementation

Goes along with 'Add USB support' for the most part - the primary use case for FAT32 is reading USB storage devices formatted with it. It's otherwise somewhat useless; ext2 is much better and we don't have a need to use FAT32 for anything in particular. It would still be good to have another storage-based FS though, regardless.

kernel: Separate headers into userspace and kernel headers

The Linux kernel somewhat recently made this change - they now have a separate set of headers for userspace (the uapi headers), and those are included into the kernel headers which the kernel references.
The idea (which is a good one) is that it makes a hard separation between things that can be used in the kernel, and things that can be used by userspace. All userspace programs include the kernel headers, and a fair amount of those headers don't actually directly exclude parts that shouldn't be exposed to userspace (Either because they pollute the namespace, or because they can't be used from userspace).
