rxinu's People

Contributors

robert-w-gries, toor

rxinu's Issues

Creating Box wrapper for trait object points to itself instead of trait object

Problem

We see an exception when trying to call scheduler.kill() in process_ret on i686.

In test process!
[try to kill process]

ExceptionStack {
    error_code: 0x0,
    instruction_pointer: 0x14647c,
    code_segment: 0x8,
    cpu_flags: 0x206,
    stack_pointer: 0x40000d3c,
    stack_segment: 0x3
}
InterruptDescription {
    vector: 14,
    mnemonic: "#PF",
    description: "Page Fault",
    irqtype: "Fault",
    source: "Any memory reference."
}

Page fault while accessing 0x61

I tracked down the issue to an incorrect trait object value. The following gdb session shows that we get the proper pointer to SCHEDULER (with some name mangling):

(gdb) list
102	    let scheduler_ptr: *mut &DoesScheduling;
103	    asm!("pop $0" : "=r"(scheduler_ptr) : : "memory" : "intel", "volatile");
104	
105	    let scheduler = Box::from_raw(scheduler_ptr);
106	
107	    let curr_id: ProcessId = scheduler.getid();
108	    scheduler.kill(curr_id);
109	}
(gdb) n
105	    let scheduler = Box::from_raw(scheduler_ptr);
(gdb) p/x scheduler_ptr
$1 = 0x400010f4
(gdb) x 0x400010f4
0x400010f4:	0x0014b360
(gdb) x 0x14b360
0x14b360 <_ZN72_$LT$rxinu..scheduling..SCHEDULER$u20$as$u20$core..ops..deref..Deref$GT$5deref11__stability4LAZY17h402ae1403f065449E+4>:	0x00000001

scheduler_ptr represents a raw pointer to a &DoesScheduling trait object stored during process creation. We want to create a Box<&DoesScheduling>, called scheduler, from this raw pointer.

On x86_64, scheduler is a Box<&DoesScheduling> that points to our global SCHEDULER as expected. On i686, however, scheduler is a Box<&DoesScheduling> that points to itself.
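The round trip that process_ret depends on can be reproduced on a host target. A minimal sketch, where the trait and struct are simplified stand-ins for the kernel's DoesScheduling and SCHEDULER:

```rust
trait DoesScheduling {
    fn getid(&self) -> usize;
}

struct Sched;

impl DoesScheduling for Sched {
    fn getid(&self) -> usize {
        7
    }
}

fn main() {
    let s = Sched;
    // Box a *reference* to the trait object, as process creation does.
    let boxed: Box<&dyn DoesScheduling> = Box::new(&s);
    let raw: *mut &dyn DoesScheduling = Box::into_raw(boxed);
    // Later (in process_ret), reconstruct the Box from the raw pointer.
    let back = unsafe { Box::from_raw(raw) };
    assert_eq!(back.getid(), 7);
    println!("round trip ok: id = {}", back.getid());
}
```

On x86_64 this round trip behaves as expected; the i686 failure above suggests the pointer written to the process stack is not the one reconstructed here.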

i686 behavior

(gdb) p/x scheduler_ptr
$1 = 0x400010f4
(gdb) p *scheduler
$3 = rxinu::scheduling::&DoesScheduling {pointer: 0x400010f4 "\364\020\000", vtable: 0x14af60}

x86_64 behavior

(gdb) p *scheduler
$1 = rxinu::scheduling::&DoesScheduling {pointer: 0x147be8 <<rxinu::scheduling::SCHEDULER as core::ops::deref::Deref>::deref::__stability::LAZY+8> "\001\000", vtable: 0x147a10}

Workaround

The only workaround I've been able to find is removing the let scheduler = Box::from_raw(scheduler_ptr); line for i686 and replacing it with a hardcoded reference to SCHEDULER:

diff --git a/src/scheduling/process.rs b/src/scheduling/process.rs
index cf946da..e0df2fa 100644
--- a/src/scheduling/process.rs
+++ b/src/scheduling/process.rs
@@ -102,8 +102,14 @@ pub unsafe extern "C" fn process_ret() {
     let scheduler_ptr: *mut &DoesScheduling;
     asm!("pop $0" : "=r"(scheduler_ptr) : : "memory" : "intel", "volatile");
 
+    #[cfg(target_arch = "x86_64")]
     let scheduler = Box::from_raw(scheduler_ptr);
 
+    // TODO: there seems to be an issue with i686 and trait objects
+    // We need to investigate more
+    #[cfg(target_arch = "x86")]
+    let scheduler = &::scheduling::SCHEDULER;
+
     let curr_id: ProcessId = scheduler.getid();
     scheduler.kill(curr_id);
 }

Process allocation occurs on the heap?

It seems to me that you are using a Vec as the process stack in the scheduler. Surely this is bad practice, as heap-allocating Vecs is relatively slow compared to stack allocation. Would it be possible to utilise the stack allocator directly rather than using vectors for stacks?

Tracking issue for Minimal Viable Product

It is looking likely that Marquette undergraduates will work on the kernel as a part of Marquette's summer research program. rXinu is mostly stable, but more work needs to be done to make it a viable project for new developers.

Tools

It is important to provide tools so that new developers can properly acclimate to the project. We need to accomplish the following:

  • Testing framework
  • Fewer dependencies
  • Removal of grub from suggested workflow

Features

rXinu will need to be a legitimate micro-kernel in order to open up avenues of research. It is not worth much as a project if the students are too limited in what they can contribute.

Minimal Viable Product

Stretch Goals

  • Message Passing
  • Syscalls
  • MIPS target
  • User mode
  • Higher half kernel

Code Readability

Documentation

Documentation is sorely needed in the scheduling module, and rXinu could benefit from more documentation in general.

  • scheduling module
  • ps/2 keyboard driver
  • serial driver

Refactoring

As I learn more Rust and gain experience, I discover more idiomatic ways to express the code and simplify the codebase.

  • PS/2 Keyboard
  • Scheduling Component

Setup pre-emption

  • Project 4 - #33
    • include/proc.h
    • include/queue.h
    • system/main.c - A "main program" for testing scheduling.
    • system/queue.c - An implementation of the queue data structure.
    • system/create.c - A partial function for creating a new process.
    • system/ctxsw.S - An incomplete assembly routine for switching process contexts.
    • system/ready.c - A function for adding a process to the ready queue.
    • system/resched.c - The primary scheduling code, equivalent to yield().
    • system/getstk.c - A rudimentary function for dynamically allocating new stacks for new processes.
  • Add PIT device driver for context switch - #44
  • Implement syscalls
    • Create - #33
    • Kill
    • Yield
    • Resume/Suspend
      * [ ] Arbitrary number of arguments
      * [ ] Enter Ring3

Remove usage of static mut from gdt and idt

Ideally, we can use lazy_static and Once to make GDT and IDT both static and non-mut.

The problem is the following compiler error:

We need to make it so that the IdtEntry struct does not cause data races
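As a sketch of the goal, the same static-without-mut pattern can be shown with std's OnceLock standing in for the no_std lazy_static/Once crates, and a toy Idt struct standing in for the real IDT type (both are illustrative, not the kernel's actual API):

```rust
use std::sync::OnceLock;

// Toy stand-in for the real IDT type.
struct Idt {
    entries: [u64; 256],
}

impl Idt {
    fn new() -> Self {
        Idt { entries: [0; 256] }
    }
}

// Static and non-mut: initialized exactly once, immutable afterwards.
static IDT: OnceLock<Idt> = OnceLock::new();

fn init() -> &'static Idt {
    IDT.get_or_init(|| {
        let mut idt = Idt::new();
        idt.entries[14] = 0xdead_beef; // e.g. install a page-fault handler address
        idt
    })
}

fn main() {
    let idt = init();
    assert_eq!(idt.entries[14], 0xdead_beef);
    println!("IDT initialized once, immutable thereafter");
}
```

The compiler accepts this because the interior type is only written during the one-time initialization, which is exactly the property `static mut` fails to guarantee.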

Using dynamic dispatch on DoesScheduling trait object causes invalid memory reference in rustc

I wanted to change the dynamic dispatch in process_ret() to use a DoesScheduling trait object instead of assuming the scheduler is the CoopScheduler type.

diff --git a/src/scheduling/cooperative_scheduler.rs b/src/scheduling/cooperative_scheduler.rs
index d65b2ab..4a1fbce 100644
--- a/src/scheduling/cooperative_scheduler.rs
+++ b/src/scheduling/cooperative_scheduler.rs
@@ -34,7 +34,7 @@ impl DoesScheduling for CoopScheduler {
         let stack_values: Vec<usize> = vec![
             new_proc as usize,
             process::process_ret as usize,
-            self as *const Scheduler as usize,
+            self as *const DoesScheduling as *const usize as usize,
         ];
 
         for (i, val) in stack_values.iter().enumerate() {
diff --git a/src/scheduling/process.rs b/src/scheduling/process.rs
index 60189a8..2077a17 100644
--- a/src/scheduling/process.rs
+++ b/src/scheduling/process.rs
@@ -98,7 +98,7 @@ impl Process {
 pub unsafe extern "C" fn process_ret() {
     use scheduling::{DoesScheduling, Scheduler};
 
-    let scheduler: &mut Scheduler;
+    let scheduler: &DoesScheduling;
     asm!("pop $0" : "=r"(scheduler) : : "memory" : "intel", "volatile");
 
     let curr_id: ProcessId = scheduler.getid();

However, running this code results in a rustc crash due to an invalid memory reference:

error: Could not compile `rxinu`.

Caused by:
  process didn't exit successfully: `rustc --crate-name rxinu src/lib.rs --crate-type staticlib --emit=dep-info,link -C debuginfo=2 --cfg feature="default" --cfg feature="serial" -C metadata=31a0723d36405193 -C extra-filename=-31a0723d36405193 --out-dir /home/rob/rxinu/target/x86_64-rxinu/debug/deps --target x86_64-rxinu -L dependency=/home/rob/rxinu/target/x86_64-rxinu/debug/deps -L dependency=/home/rob/rxinu/target/debug/deps --extern volatile=/home/rob/rxinu/target/x86_64-rxinu/debug/deps/libvolatile-39e0b219e05681b4.rlib --extern x86=/home/rob/rxinu/target/x86_64-rxinu/debug/deps/libx86-455753b7dd6b85e3.rlib --extern bitflags=/home/rob/rxinu/target/x86_64-rxinu/debug/deps/libbitflags-6b084702002cf111.rlib --extern spin=/home/rob/rxinu/target/x86_64-rxinu/debug/deps/libspin-9bd689ac3bbcfdfa.rlib --extern bit_field=/home/rob/rxinu/target/x86_64-rxinu/debug/deps/libbit_field-677c54a61c03a33f.rlib --extern linked_list_allocator=/home/rob/rxinu/target/x86_64-rxinu/debug/deps/liblinked_list_allocator-ef23ed764b8b68b8.rlib --extern rlibc=/home/rob/rxinu/target/x86_64-rxinu/debug/deps/librlibc-8eabc116dc6e8246.rlib --extern once=/home/rob/rxinu/target/x86_64-rxinu/debug/deps/libonce-2c4225e39ad50031.rlib --extern multiboot2=/home/rob/rxinu/target/x86_64-rxinu/debug/deps/libmultiboot2-542aac65ec57e3e0.rlib --extern lazy_static=/home/rob/rxinu/target/x86_64-rxinu/debug/deps/liblazy_static-870f43b7a19a5d5a.rlib --sysroot /home/rob/.xargo` (signal: 11, SIGSEGV: invalid memory reference)
make: *** [Makefile:71: cargo] Error 101

i686 builds fail due to compiler_builtins issue

Travis failure

It is clearly an external issue. I have filed an issue in the compiler_builtins repo for now and will wait until it is confirmed to be a compiler_builtins problem before tracking it here.

Edit: We need to use nightly-2018-04-07 or earlier to test i686

Possible PIT implementation.

I have forked rxinu and implemented a basic PIT driver. Under this model, the PIT operates in Mode 3 using the lobyte/hibyte configuration. I experimented with moving the resched() call into the timer IRQ and saw a couple of test processes run, printing to the screen. However, after that it stopped, which I still need to debug. In any case, I thought you might like to check out my most recent few commits and give your opinion on the code so far.

Screenshot of rxinu running with the new model: [image]

Ideas about preemption.

I thought I'd just open this issue to share my ideas about preemptive multitasking. Currently, we have a very basic preemptive scheduler, using something akin to the Round Robin scheduling algorithm. A simple way to implement priority scheduling would be to add several different process queues. In pseudocode:

pub struct Scheduler {
    // ...
    process_queues: [RwLock<ProcessList>; n],
}

N.B. I have renamed CoopScheduler to Scheduler, since our global kernel scheduler has, realistically speaking, not been a cooperative scheduler since PIT support was added.

n here can just be however many process queues we want to have. Ideally, when a process is created, it has a set priority and the scheduler automagically pushes the new process to the correct queue. A helper method checks whether each queue is empty in turn when a rescheduling call is made. If any of the higher process queues are not empty, then the first available process there gets marked as the next process to run. Of course, this process would also have to be marked as ready for the scheduler to be allowed to run it. This in theory should not be too difficult to implement. If such a scheduling algorithm were to be implemented, we would be using Priority-based Round Robin scheduling of a kind.
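The queue scan described above can be sketched as follows; names, priority count, and types here are illustrative, not the kernel's actual API:

```rust
use std::collections::VecDeque;

const NUM_PRIORITIES: usize = 4;

struct Scheduler {
    // One ready queue per priority level; queue 0 is the highest priority.
    process_queues: [VecDeque<usize>; NUM_PRIORITIES],
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { process_queues: Default::default() }
    }

    // On creation, the process lands in the queue matching its priority.
    fn ready(&mut self, pid: usize, priority: usize) {
        self.process_queues[priority].push_back(pid);
    }

    // Rescheduling helper: take the first process from the highest
    // non-empty queue, falling through to lower priorities.
    fn next(&mut self) -> Option<usize> {
        self.process_queues.iter_mut().find_map(|q| q.pop_front())
    }
}

fn main() {
    let mut s = Scheduler::new();
    s.ready(10, 3);
    s.ready(20, 1);
    s.ready(30, 1);
    assert_eq!(s.next(), Some(20)); // higher-priority queue drains first
    assert_eq!(s.next(), Some(30));
    assert_eq!(s.next(), Some(10));
    println!("priority round-robin order ok");
}
```

Within one queue this degenerates to plain round robin, which matches the Priority-based Round Robin description.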

Add docs directory and rXinu: The Book

The Book should contain the following info:

  • What is rXinu?
    • High-level description
    • Goals
  • Running rXinu
    • Emulators
    • Command line arguments
  • rXinu Design
    • Memory Management
    • Userspace
    • Drivers

Switch to x86_64 crate

Now that i686 support has been temporarily removed, we should move to the rust-osdev/x86_64 crate.

If i686 support is added back in, we should make and use a rust-osdev/i686 library in the kernel.

Changing process stack size causes strange page faults

In #64 I increased the process stack size as a hacky workaround for an issue where we hit a Page Fault after running about a dozen processes.

Error

It looks like the kernel gets into a bad state because the instruction pointer is a bad value.

In main process!

................
Error code: (empty)
ExceptionStack {
    instruction_pointer: 0x0,
    code_segment: 0x8,
    cpu_flags: 0x2,
    stack_pointer: 0x40004730,
    stack_segment: 0x10
}
InterruptDescription {
    vector: 14,
    mnemonic: "#PF",
    description: "Page Fault",
    irqtype: "Fault",
    source: "Any memory reference."
}

Page fault while accessing 0x0

Changes needed to hit issue

diff --git a/src/main.rs b/src/main.rs
index 23de134..bc5caec 100644
--- a/src/main.rs
+++ b/src/main.rs
@@ -81,7 +81,7 @@ pub extern "C" fn rxinu_main() {
     arch::console::clear_screen();
 
     kprintln!("In main process!\n");
-    syscall::create(created_process, String::from("rxinu_test"));
+    syscall::create(cycle_process_a, String::from("rxinu_test"));
 }
 
 pub extern "C" fn test_process() {
diff --git a/src/task/mod.rs b/src/task/mod.rs
index 9281f74..593cf94 100644
--- a/src/task/mod.rs
+++ b/src/task/mod.rs
@@ -19,7 +19,7 @@ pub trait Scheduling {
 
 const MAX_PROCS: usize = usize::max_value() - 1;
 // TODO: Investigage requirements for size of stack
-const INIT_STK_SIZE: usize = 1024 * 2;
+const INIT_STK_SIZE: usize = 1024 * 1;
 
 lazy_static! {
     pub static ref SCHEDULER: Scheduler = Scheduler::new();

Page fault while trying to deallocate kstack

rxinu code that triggers issue

    fn kill(&self, id: ProcessId) {
            [snip]
            proc_lock.kstack = None;  // This line page faults

Error message

ExceptionStack {
    error_code: 0x0,
    instruction_pointer: 0x13c16f,
    code_segment: 0x8,
    cpu_flags: 0x6,
    stack_pointer: 0x40005dd0,
    stack_segment: 0x6
}
InterruptDescription {
    vector: 14,
    mnemonic: "#PF",
    description: "Page Fault",
    irqtype: "Fault",
    source: "Any memory reference."
}

Instruction Pointer

0013c160 <_ZN21linked_list_allocator4hole4Hole4info17h2969c3bff7ef9cc1E>:
  13c160:	55                   	push   %ebp
  13c161:	89 e5                	mov    %esp,%ebp
  13c163:	83 ec 10             	sub    $0x10,%esp
  13c166:	8b 45 08             	mov    0x8(%ebp),%eax
  13c169:	89 45 f4             	mov    %eax,-0xc(%ebp)
  13c16c:	8b 45 f4             	mov    -0xc(%ebp),%eax
>>13c16f:	8b 08                	mov    (%eax),%ecx
  13c171:	89 45 f8             	mov    %eax,-0x8(%ebp)
  13c174:	89 4d fc             	mov    %ecx,-0x4(%ebp)
  13c177:	8b 45 f8             	mov    -0x8(%ebp),%eax
  13c17a:	8b 55 fc             	mov    -0x4(%ebp),%edx
  13c17d:	83 c4 10             	add    $0x10,%esp
  13c180:	5d                   	pop    %ebp
  13c181:	c3                   	ret    

Heap allocation doesn't properly detect out of memory error

This problem comes up while testing scheduling.

On x86_64, we can create around 50 processes. If we then start a process cycle where each process creates the other, we hit a page fault and sometimes a double fault.

Error code: (empty)
ExceptionStack {
    instruction_pointer: 0x1116b6,
    code_segment: 0x8,
    cpu_flags: 0x6,
    stack_pointer: 0x40017940,
    stack_segment: 0x10
}
InterruptDescription {
    vector: 14,
    mnemonic: "#PF",
    description: "Page Fault",
    irqtype: "Fault",
    source: "Any memory reference."
}

Page fault while accessing 0x3fffffff

The instruction pointer points to a BTreeMap related function:

000000000013f130 <_ZN219_$LT$alloc..btree..node..Handle$LT$alloc..btree..node..NodeRef$LT$alloc..btree..node..marker..Mut$LT$$u27$a$GT$$C$$u20$K$C$$u20$V$C$$u20$alloc..btree..node..marker..Leaf$GT$$C$$u20$alloc..btree..node..marker..KV$GT$$GT$5split17h962bd6a453e19508E>:
  13f130:	55                   	push   %rbp
  13f131:	48 89 e5             	mov    %rsp,%rbp
  13f134:	48 81 ec a0 0f 00 00 	sub    $0xfa0,%rsp
  13f13b:	48 89 f8             	mov    %rdi,%rax
  13f13e:	48 8d 8d 60 f1 ff ff 	lea    -0xea0(%rbp),%rcx
  13f145:	c6 85 56 f9 ff ff 00 	movb   $0x0,-0x6aa(%rbp)
  13f14c:	c6 85 55 f9 ff ff 00 	movb   $0x0,-0x6ab(%rbp)
  13f153:	c6 85 57 f9 ff ff 00 	movb   $0x0,-0x6a9(%rbp)
  13f15a:	c6 85 54 f9 ff ff 00 	movb   $0x0,-0x6ac(%rbp)
  13f161:	c6 85 53 f9 ff ff 00 	movb   $0x0,-0x6ad(%rbp)
  13f168:	48 89 bd 50 f1 ff ff 	mov    %rdi,-0xeb0(%rbp)
  13f16f:	48 89 cf             	mov    %rcx,%rdi
  13f172:	48 89 85 48 f1 ff ff 	mov    %rax,-0xeb8(%rbp)
  13f179:	48 89 b5 40 f1 ff ff 	mov    %rsi,-0xec0(%rbp)
  13f180:	e8 1b 25 00 00       	callq  1416a0 <_ZN55_$LT$alloc..btree..node..LeafNode$LT$K$C$$u20$V$GT$$GT$3new17h54747f5e44c2a0b4E>
  13f185:	eb 0e                	jmp    13f195 <_ZN219_$LT$alloc..btree..node..Handle$LT$alloc..btree..node..NodeRef$LT$alloc..btree..node..marker..Mut$LT$$u27$a$GT$$C$$u20$K$C$$u20$V$C$$u20$alloc..btree..node..marker..Leaf$GT$$C$$u20$alloc..btree..node..marker..KV$GT$$GT$5split17h962bd6a453e19508E+0x65>

Add testing framework for the kernel

Integration Tests

We should test every component:

  • Memory
  • Interrupts
  • Scheduling
  • Inter-process Communication

Original Post

We are starting to get into the meat of OS dev, so we'll need some proper testing tools.

What do we want in our testing framework?

  • Unit tests
  • Integration tests
  • Regression tests
  • Testing on actual hardware (stretch goal)

What can be done now?

Unit tests and regression tests can be implemented without much effort. We already have CI with Travis, and unit tests should be able to run without any dependencies on external code/tools.

What needs to be done?

Regression tests are increasingly vital as the kernel grows more complex. A couple of lines of kprintln! output will not be sufficient to test preemption, inter-process communication, and file systems.

Ideally, we can create a test harness that uses a virtualization crate to run our kernel and perform tests.

KVM/libvirt bindings

rust-x86 appears to use the kvm crate to accomplish its testing, which is a good place to start. See gz/rust-x86#20 for a relevant conversation.

A full list of virt crates can be found here

utest crate

Another intriguing option is the utest crate. However, there are limitations and requirements that might not mesh well with this kernel.

Base pointer is zero in the process_ret function, causing PROTECTION_VIOLATION

Error

In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!

Error code: PROTECTION_VIOLATION
ExceptionStack {
    instruction_pointer: 0x2160e1,
    code_segment: 0x8,
    cpu_flags: 0x206,
    stack_pointer: 0x400042d8,
    stack_segment: 0x10
}
InterruptDescription {
    vector: 14,
    mnemonic: "#PF",
    description: "Page Fault",
    irqtype: "Fault",
    source: "Any memory reference."
}

Page fault while accessing 0xffffffffffffffe8

Root Cause

When a process is finished and jumps to the process_ret function, the rbp value is 0x0.

The process_ret function has the following generated assembly:

00000000002160e0 <_ZN5rxinu10scheduling7process11process_ret17hc68f08b81b5fa6b7E>:
  2160e0:	58                   	pop    %rax
  2160e1:	48 89 45 e8          	mov    %rax,-0x18(%rbp)
  2160e5:	48 8b 7d e8          	mov    -0x18(%rbp),%rdi
  2160e9:	e8 f2 f2 ff ff       	callq  2153e0 <_ZN35_$LT$alloc..boxed..Box$LT$T$GT$$GT$8from_raw17hfba216aa9a4cbfcdE>
  2160ee:	48 89 45 f0          	mov    %rax,-0x10(%rbp)
  2160f2:	48 8b 45 f0          	mov    -0x10(%rbp),%rax
  2160f6:	48 8b 38             	mov    (%rax),%rdi
  2160f9:	48 8b 40 08          	mov    0x8(%rax),%rax
  2160fd:	ff 50 20             	callq  *0x20(%rax)
  216100:	48 89 45 f8          	mov    %rax,-0x8(%rbp)
  216104:	48 8b 45 f0          	mov    -0x10(%rbp),%rax
  216108:	48 8b 38             	mov    (%rax),%rdi
  21610b:	48 8b 40 08          	mov    0x8(%rax),%rax
  21610f:	48 8b 75 f8          	mov    -0x8(%rbp),%rsi
  216113:	ff 50 28             	callq  *0x28(%rax)
  216116:	48 8d 7d f0          	lea    -0x10(%rbp),%rdi
  21611a:	e8 d1 60 ff ff       	callq  20c1f0 <_ZN4core3ptr13drop_in_place17h3f6ea748fd98a16cE>
  21611f:	c3                   	retq   

The page fault happens at instruction 0x2160e1 because we attempt to access memory located at -0x18(%rbp), which translates to the value stored at (0x0 - 0x18).

Using gdb, we can see that this value is:

(gdb) p/x 0x0 - 0x18
$2 = 0xffffffffffffffe8

Note: This is a Day 1 scheduling error that is being revealed now that our bootloader properly protects this memory location.

Using wrong offset for base_pointer

process
    .context
    .set_base_pointer(stack.as_ptr() as usize + stack.len());

We need to multiply stack.len() by mem::size_of::<usize>() to get the proper offset.
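A small host-side example illustrates the scaling error; the arithmetic, not the kernel API, is the point here:

```rust
use std::mem;

fn main() {
    let stack: Vec<usize> = vec![0; 16];
    let base = stack.as_ptr() as usize;

    // Buggy: adds the element *count*, landing inside the buffer.
    let wrong_top = base + stack.len();

    // Fixed: scale by the element size to reach one-past-the-end,
    // where the base pointer of a downward-growing stack belongs.
    let right_top = base + stack.len() * mem::size_of::<usize>();

    // The fixed value matches proper pointer arithmetic.
    assert_eq!(right_top, unsafe { stack.as_ptr().add(stack.len()) } as usize);
    assert!(wrong_top < right_top);
    println!("base={:#x} wrong={:#x} right={:#x}", base, wrong_top, right_top);
}
```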

WebAssembly target

Tracking phil-opp/blog_os#368

Motivation

Rust has the ability to compile to WebAssembly, and all relevant browsers now ship support for the WebAssembly standard.

This WebAssembly target could have several use cases for blog_os.

Run the kernel in a browser

Worth it for the cool factor alone :-)

Allow the compilation and running of the kernel at different stages of your blog posts

After you explain a complex topic, such as paging, you generally provide code as a reference. It'd be nice to be able to compile that code within the blog post's webpage and run the full kernel. The reader could even fiddle with the code, re-run it, and see how their modifications affect the kernel.

The Rust by Example website has the functionality I'm talking about.

Experiment with the concept of kernels hosted on browsers.

I know that JavaScript PC emulators exist, but WebAssembly seems to have more potential in terms of performance and codability.

Extremely portable emulation

Everyone has a browser and everyone could run the kernel regardless of which OS they're using.

Random Notes

Use custom PriorityQueue struct with efficient remove

Currently, the preemptive scheduler does an ugly, inefficient clone() to copy all processes to a new ready_list, excluding the process we want to unready.

The solution is to write our own PriorityQueue implementation that supports a remove() API call.
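A minimal sketch of what that remove() could look like, with a VecDeque standing in for the kernel's process list (names are illustrative):

```rust
use std::collections::VecDeque;

// Drop one process from the ready list in place,
// instead of cloning everything except the victim.
struct ReadyList {
    ids: VecDeque<usize>,
}

impl ReadyList {
    fn new() -> Self {
        ReadyList { ids: VecDeque::new() }
    }

    fn push(&mut self, id: usize) {
        self.ids.push_back(id);
    }

    // Returns true if the process was found and removed.
    fn remove(&mut self, id: usize) -> bool {
        match self.ids.iter().position(|&x| x == id) {
            Some(pos) => {
                self.ids.remove(pos);
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut rl = ReadyList::new();
    for id in [1, 2, 3] {
        rl.push(id);
    }
    assert!(rl.remove(2));
    assert!(!rl.remove(2)); // already gone
    assert_eq!(rl.ids.iter().copied().collect::<Vec<_>>(), vec![1, 3]);
    println!("in-place remove ok");
}
```

A real PriorityQueue would combine this with the priority ordering, but the key API change is the same: removal by id without rebuilding the list.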

Processes cannot be retrieved from process list

When scheduling more than 10 processes on i686, the kernel panics after trying to kill a process:

In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!
In test process!


PANIC in /home/rob/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/option.rs at line 891:
    Could not find process to kill

This issue was hit before, and it seemed related to a broken reference to the process list. More investigation is needed.

Mysterious endless spinlock in PIT init when compiling with FEATURES=vga

When building with make run FEATURES=vga, we hit an endless spinlock in PIT init function here

PIT.lock()[0].write(PIT_SET);
(gdb) 
<spin::mutex::Mutex<T>>::obtain_lock (self=0x146b98 <rxinu::device::pit::PIT>)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/spin-0.4.7/src/mutex.rs:169
169	                cpu_relax();

It should be impossible for PIT to already be locked at this point because this line in init() is the first instance where PIT is locked.

Strangely, removing code from a loop lower down in rust_main fixes the deadlock:

diff --git a/src/lib.rs b/src/lib.rs
index dae5ba2..6085fc9 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -59,16 +59,6 @@ pub extern "C" fn rust_main(multiboot_information_address: usize) {
     syscall::create(rxinu_main, String::from("rxinu_main"));
 
     loop {
-        #[cfg(feature = "serial")]
-        {
-            use device::uart_16550 as uart;
-            uart::read(1024);
-        }
-        #[cfg(feature = "vga")]
-        {
-            use device::keyboard::ps2 as kbd;
-            kbd::read(1024);
-        }
     }
 }

This needs more investigation.

  • Check if older versions of nightly compiler work with original code
  • Check for bugs related to file size

Random deadlocks caused by PIT interrupt during allocation

Problem

Now that the timer IRQ handler is calling resched(), any code that is not wrapped in a call to the interrupts::disable_then_restore() function can be interrupted to schedule a new process.

Since we do not support preemption in our kernel yet, we essentially wrap all scheduling code in disable_then_restore(). This prevents our kernel from being interrupted while holding important locks, such as the PIC locks or the process table locks.

However, one deadlock issue remains. Currently, when processes use the allocator API to create structures like Vec or String, there is a chance that the kernel can interrupt the linked-list-allocator while it is holding the Heap lock.

We need to find a way to disable interrupts before using the allocator API, preferably without vendoring the linked-list-allocator code in this repo.

Possible Solutions

Naive solution

We can wrap every allocator call in a process with disable_then_restore(). This is impractical and poor design.

Use syscall for all memory allocation

I don't know if this will be feasible, since the allocator API is invoked automatically when creating a Vec or String. It seems possible but undesirable, since Vec and String creation would need to be wrapped to use the syscall.

Wrap the linked-list-allocator

We might be able to wrap the linked-list-allocator with our own allocator that simply disables interrupts and then calls the allocate/deallocate methods in linked-list-allocator. I think this is our best path forward.
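A rough host-side sketch of the wrapper idea: std's System allocator stands in for linked-list-allocator, and an atomic counter stands in for the kernel's disable_then_restore(); none of these names are the kernel's actual API.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for interrupt disable/restore: in the kernel this would be
// cli/sti-style nesting, here it is just a counter we can observe.
static DISABLE_DEPTH: AtomicUsize = AtomicUsize::new(0);

// Wrap any inner allocator so every alloc/dealloc runs with
// "interrupts disabled", keeping the inner heap lock IRQ-safe.
struct InterruptSafeAlloc<A>(A);

unsafe impl<A: GlobalAlloc> GlobalAlloc for InterruptSafeAlloc<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        DISABLE_DEPTH.fetch_add(1, Ordering::SeqCst); // disable()
        let ptr = self.0.alloc(layout);
        DISABLE_DEPTH.fetch_sub(1, Ordering::SeqCst); // restore()
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        DISABLE_DEPTH.fetch_add(1, Ordering::SeqCst);
        self.0.dealloc(ptr, layout);
        DISABLE_DEPTH.fetch_sub(1, Ordering::SeqCst);
    }
}

fn main() {
    let a = InterruptSafeAlloc(System);
    unsafe {
        let layout = Layout::from_size_align(64, 8).unwrap();
        let p = a.alloc(layout);
        assert!(!p.is_null());
        a.dealloc(p, layout);
    }
    println!("allocation round-trip ok");
}
```

In the kernel, the wrapper would be registered as the #[global_allocator] in place of the bare linked-list-allocator, which avoids vendoring its code.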

Debugging

Call #4 in the call stack below is where we trigger the deadlock in linked-list-allocator. You can tell it's a deadlock because interrupts stop firing and the same lock check keeps getting executed in a loop.

#4  0x0000000000131ac4 in linked_list_allocator::{{impl}}::dealloc (self=0x40004090, 
    ptr=0x400022d8 "\000", layout=...)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/linked_list_allocator-0.4.3/src/lib.rs:176
    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {
        self.0.lock().deallocate(ptr, layout)
    }
Debug info:
(gdb) list
1486	
1487	#[inline]
1488	unsafe fn atomic_load<T>(dst: *const T, order: Ordering) -> T {
1489	    match order {
1490	        Acquire => intrinsics::atomic_load_acq(dst),
1491	        Relaxed => intrinsics::atomic_load_relaxed(dst),
1492	        SeqCst => intrinsics::atomic_load(dst),
1493	        Release => panic!("there is no such thing as a release load"),
1494	        AcqRel => panic!("there is no such thing as an acquire/release load"),
1495	        __Nonexhaustive => panic!("invalid memory ordering"),
(gdb) info stack
#0  core::sync::atomic::atomic_load<u8> (dst=0x147018 <rxinu::HEAP_ALLOCATOR> "\001\000", 
    order=core::sync::atomic::Ordering::Relaxed)
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/sync/atomic.rs:1491
#1  0x000000000014581c in core::sync::atomic::AtomicBool::load (
    self=0x147018 <rxinu::HEAP_ALLOCATOR>, order=core::sync::atomic::Ordering::Relaxed)
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/sync/atomic.rs:316
#2  0x0000000000132d5f in spin::mutex::Mutex<linked_list_allocator::Heap>::obtain_lock<linked_list_allocator::Heap> (self=0x147018 <rxinu::HEAP_ALLOCATOR>)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/spin-0.4.6/src/mutex.rs:167
#3  0x0000000000132d95 in spin::mutex::Mutex<linked_list_allocator::Heap>::lock<linked_list_allocator::Heap> (self=0x147018 <rxinu::HEAP_ALLOCATOR>)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/spin-0.4.6/src/mutex.rs:191
#4  0x0000000000131ac4 in linked_list_allocator::{{impl}}::dealloc (self=0x40004090, 
    ptr=0x400022d8 "\000", layout=...)
    at /home/rob/.cargo/registry/src/github.com-1ecc6299db9ec823/linked_list_allocator-0.4.3/src/lib.rs:176
#5  0x0000000000129964 in rxinu::__rg_allocator_abi::__rg_dealloc (arg0=0x400022d8 "\000", 
    arg1=8192, arg2=8) at src/lib.rs:123
#6  0x000000000011456e in alloc::heap::{{impl}}::dealloc (self=0x40001b70, ptr=0x400022d8 "\000", 
    layout=...)
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/heap.rs:104
#7  0x0000000000111e70 in alloc::raw_vec::RawVec<usize, alloc::heap::Heap>::dealloc_buffer<usize,alloc::heap::Heap> (self=0x40001b70)
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/raw_vec.rs:687
#8  0x0000000000112f05 in alloc::raw_vec::{{impl}}::drop<usize,alloc::heap::Heap> (self=0x40001b70)
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/liballoc/raw_vec.rs:696
#9  0x0000000000141595 in core::ptr::drop_in_place<alloc::raw_vec::RawVec<usize, alloc::heap::Heap>>
    ()
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr.rs:59
#10 0x00000000001411cf in core::ptr::drop_in_place<alloc::vec::Vec<usize>> ()
---Type <return> to continue, or q <return> to quit---
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr.rs:59
#11 0x0000000000141b44 in core::ptr::drop_in_place<core::option::Option<alloc::vec::Vec<usize>>> ()
    at /home/rob/.rustup/toolchains/nightly-2017-12-23-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/src/libcore/ptr.rs:59
#12 0x0000000000121095 in rxinu::scheduling::cooperative_scheduler::{{impl}}::kill (
    self=0x148bc8 <<rxinu::scheduling::SCHEDULER as core::ops::deref::Deref>::deref::__stability::LAZY+8>, id=...) at src/scheduling/cooperative_scheduler.rs:84
#13 0x0000000000119703 in rxinu::scheduling::process::process_ret () at src/scheduling/process.rs:108
#14 0x0000000000148bc8 in <rxinu::scheduling::SCHEDULER as core::ops::deref::Deref>::deref::__stability::LAZY ()
#15 0x00000000001489f0 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x0000000000000001 in ?? ()
#18 0x0000000000000000 in ?? ()

Musical notes are printed instead of "IT DID NOT CRASH!" (WIP: To be copied to blog_os)

The initial problem

After implementing interrupts in my kernel, I noticed a strange issue where the kernel did not seem to properly print the stack trace of a page fault.

image of broken exception printing

Narrowing down the issue to printing

This issue is not specific to printing exception stack frames. I also see it when printing "It did not crash!". The issue always causes a page-fault CPU exception:

image of cpu exception

The virtual address that caused the exception points to a string in a sane location that should not have caused a page fault. Even weirder, the page fault only came up intermittently.

Garbage musical notes (undefined behavior?)

I played around with the kernel to narrow down where the issue is triggered. The trigger point appears to be the loading of the IDT. To force the issue to appear, I added #[inline(always)] to the interrupts::init function and added several print calls. I then saw the following strange behavior:

image of weird printing musical notes

Workaround?

I was stumped for several months on how to fix this issue. Nothing I tried fixed it until I removed the lazy_static definition of the IDT and moved to a static mut.

diff --git a/arch/x86/x86_64/src/interrupts/mod.rs b/arch/x86/x86_64/src/interrupts/mod.rs
index 34954a8..5f900dc 100644
--- a/arch/x86/x86_64/src/interrupts/mod.rs
+++ b/arch/x86/x86_64/src/interrupts/mod.rs
@@ -8,6 +8,7 @@ mod irq;
 
 const DOUBLE_FAULT_IST_INDEX: usize = 0;
 
+/*
 lazy_static! {
     static ref IDT: Idt = {
         let mut idt = Idt::new();
@@ -29,11 +30,15 @@ lazy_static! {
         idt
     };
 }
+*/
+
+static mut IDT: Option<Idt> = None;
 
 static TSS: Once<TaskStateSegment> = Once::new();
 static GDT: Once<gdt::Gdt> = Once::new();
 
 /// Initialize double fault stack and load gdt and idt 
+#[inline(always)]
 pub fn init(memory_controller: &mut MemoryController) {
     use x86_64::VirtualAddress;
     use x86_64::structures::gdt::SegmentSelector;
@@ -64,9 +69,34 @@ pub fn init(memory_controller: &mut MemoryController) {
         // reload code segment register and load TSS
         set_cs(code_selector);
         load_tss(tss_selector);
-    }
 
-    IDT.load();
+        // setup IDT
+        IDT = Some(Idt::new());
+        let idt = IDT.as_mut().unwrap();
+        idt.breakpoint.set_handler_fn(breakpoint_handler);
+        idt.divide_by_zero.set_handler_fn(divide_by_zero_handler);
+        idt.invalid_opcode.set_handler_fn(invalid_opcode_handler);
+        idt.page_fault.set_handler_fn(page_fault_handler);
+
+        /// The set_stack_index method is unsafe because the caller must ensure
+        /// that the used index is valid and not already used for another exception.
+        unsafe {
+            idt.double_fault.set_handler_fn(double_fault_handler)
+                .set_stack_index(DOUBLE_FAULT_IST_INDEX as u16);
+        }
+
+        idt.interrupts[2].set_handler_fn(irq::cascade);
+        idt.interrupts[3].set_handler_fn(irq::com2);
+        idt.interrupts[4].set_handler_fn(irq::com1);
+
+        idt.load();
+        println!("TEST!");
+        println!("TEST!");
+        println!("TEST!");
+        println!("TEST!");
+        println!("TEST!");
+        println!("TEST!");
+    }
 }

With this change, the kernel no longer hits the random page fault and weird behavior. Please let me know if this is just hiding the real issue or if it is an actual workaround. I also suspect the issue is related to the lazy_static crate rather than this repository.

EDIT: This is a duplicate of #11

UART does not utilize buffers

Xinu Behavior

  • uartWrite
    • write method sends all bytes from given buffer
    • Checks for non-blocking mode then sends bytes to output buffer if available
  • uartRead
    • reads a requested number of bytes
    • reads from UART buffer

Current rXinu behavior

read(nbytes) and write(data_buffer_to_write) functions do not exist in the current API.

Change Needed

We need to add these functions.
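A hypothetical shape for the missing API, modeled on Xinu's uartRead/uartWrite; the struct, field names, and VecDeque ring-buffer stand-ins are illustrative, not the driver's real types:

```rust
use std::collections::VecDeque;

struct Uart {
    rx: VecDeque<u8>, // filled by the receive IRQ handler
    tx: VecDeque<u8>, // drained by the transmit IRQ handler
}

impl Uart {
    // Read up to nbytes from the input buffer; a short read is allowed
    // when fewer bytes have arrived.
    fn read(&mut self, nbytes: usize) -> Vec<u8> {
        let n = nbytes.min(self.rx.len());
        self.rx.drain(..n).collect()
    }

    // Queue all bytes on the output buffer; returns the count accepted.
    fn write(&mut self, data: &[u8]) -> usize {
        self.tx.extend(data.iter().copied());
        data.len()
    }
}

fn main() {
    let mut uart = Uart { rx: VecDeque::from(vec![b'h', b'i']), tx: VecDeque::new() };
    assert_eq!(uart.read(1024), vec![b'h', b'i']); // short read
    assert_eq!(uart.write(b"ok"), 2);
    println!("buffered uart sketch ok");
}
```

The non-blocking check from uartWrite would slot into write() before touching the output buffer.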
