linux-kernel-module-rust's People

Contributors

ahomescu, alex, bobo1239, frewsxcv, geofft, luisgerhorst, nathansgreen, thedan64, thinkier, xdevs23

linux-kernel-module-rust's Issues

Architecture support (tracking issue)

Rust supported architectures: https://forge.rust-lang.org/platform-support.html For our purposes, we don't care much about Tier 1 vs. Tier 2, and even Tier 3 support is probably fine.

Another useful reference is Debian's arch status as of February 2018: https://alioth-lists.debian.net/pipermail/pkg-rust-maintainers/2018-February/001215.html (doesn't include all the arches the kernel supports)

arch       | description                                     | LLVM status                         | Rust status
-----------|-------------------------------------------------|-------------------------------------|------------------------------------------------------
alpha      | DEC Alpha                                       | removed from LLVM in 2011           |
arc        | Synopsys ARC                                    | present                             | needs port
arm        | ARM 32-bit                                      | present                             | ARMv5 and up is Tier 2
arm64      | ARM 64-bit aka AARCH64                          | present                             | Tier 2
c6x        | TI C6000-series VLIW DSPs                       | no LLVM port                        |
csky       | C-Sky                                           | no LLVM port                        |
h8300      | Hitachi H8                                      | no LLVM port                        |
hexagon    | Qualcomm Hexagon DSP, part of Snapdragon SoCs   | present                             | added July 2019
ia64       | Intel Itanium                                   | removed from LLVM in 2009           |
m68k       | Motorola 680x0                                  | port exists, proposed for merge     | work in progress
microblaze | Xilinx MicroBlaze for FPGAs                     | removed from LLVM                   |
mips       | MIPS (R3000 / MIPS I and up)                    | present for R4000 / MIPS II and up  | Tier 2
nds32      | Andes AndeStar 32-bit                           | proposed for merge in 2017          | needs port
nios2      | Altera Nios II for FPGAs                        | removed from LLVM in January 2019   |
openrisc   | OpenRISC                                        | out-of-tree backend                 | needs port
parisc     | Hewlett-Packard PA-RISC aka hppa                | no LLVM port                        |
powerpc    | PowerPC / POWER 32-bit, 64-bit LE, 64-bit BE    | all present                         | all Tier 2
riscv      | RISC-V                                          | present                             | Tier 2
s390       | z/Architecture 64-bit aka s390x                 | present                             | Tier 2
sh         | Hitachi SuperH / J2                             | experimental out-of-tree backend    | needs port
sparc      | Sun SPARC 32- and 64-bit                        | present                             | Tier 2 for 64-bit (SPARC v9), needs 32-bit port
um         | User-Mode Linux (on any userspace arch)         | N/A                                 | might just work?
x86        | Intel x86 aka i386 and x86-64 aka amd64         | present                             | Tier 1
xtensa     | Tensilica Xtensa, used on ESP microcontrollers  | proposed for merge in March 2019    | work in progress

Re ARMv4 - the linked issue implies ARMv4 is only a microcontroller for OS-less use, but arch/arm/mach-gemini/ looks like an actual Linux target for ARMv4. There are also a few ARMv4T boards (i.e., with Thumb support). It looks like LLVM wants to emit BLX instructions which switch to Thumb mode, so unclear if plain ARMv4 will work. (Though forcing it to emit non-Thumb code only doesn't sound like it should be too difficult....)

Re MIPS I - see simias/psx-sdk-rs#1. MIPS I has load delay slots, LLVM's codegen would need to be taught about it. https://github.com/impiaaa/llvm-project seems to have a patch.

Some possible approaches for merging into mainline and dealing with architectures with no LLVM support and no realistic plans (e.g., processors no longer manufactured):

Build fails if rustfmt is not installed

bindgen doesn't fail if rustfmt is unavailable, which seems probably reasonable:
https://github.com/rust-lang/rust-bindgen/blob/v0.51.0/src/lib.rs#L1890-L1898
but the build fails because we can't scrape LINUX_VERSION_CODE if it's not on a line by itself. We should do something clearer:

  • scrape the code more robustly (regex crate?)
  • in a subprocess, compile and run fn main() {println!("{}", LINUX_VERSION_CODE);}
  • figure out how to get the variable directly, using some libclang bindings, instead of scraping it
  • scrape it from the C sources
  • throw an error if rustfmt isn't installed
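
As a rough illustration of the first option, the scrape can be made independent of line layout without pulling in the regex crate. This is only a sketch: `find_linux_version_code` and `parse_value` are hypothetical names, and it assumes bindgen emits the constant as `pub const LINUX_VERSION_CODE: <type> = <decimal>;`, with or without rustfmt's line breaks.

```rust
/// True for characters that can appear inside a Rust identifier.
fn is_ident_char(c: char) -> bool {
    c.is_alphanumeric() || c == '_'
}

/// Parse `: <type> = <digits> ;` from the text that follows the constant name.
fn parse_value(after_name: &str) -> Option<u64> {
    let rest = after_name.trim_start().strip_prefix(':')?;
    let eq = rest.find('=')?;
    let semi = rest.find(';')?;
    if eq > semi {
        // The first `=` belongs to a later item, so this was not a definition.
        return None;
    }
    rest[eq + 1..semi].trim().parse().ok()
}

/// Find the value of LINUX_VERSION_CODE anywhere in the generated bindings,
/// regardless of whether it sits on a line by itself.
fn find_linux_version_code(bindings: &str) -> Option<u64> {
    const NAME: &str = "LINUX_VERSION_CODE";
    let mut start = 0;
    while let Some(rel) = bindings[start..].find(NAME) {
        let pos = start + rel;
        let end = pos + NAME.len();
        // Skip matches that are the tail of a longer identifier.
        let ident_before = bindings[..pos]
            .chars()
            .next_back()
            .map_or(false, is_ident_char);
        if !ident_before {
            if let Some(value) = parse_value(&bindings[end..]) {
                return Some(value);
            }
        }
        start = end;
    }
    None
}
```

Returning `None` instead of panicking lets build.rs print a clear message (for example, suggesting that rustfmt be installed) when the constant cannot be found.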

Add RCU bindings

There are some APIs like for_each_process that want you to hold an RCU read lock when calling them. We should support this, at minimum. We probably want to also have full support for RCU-protected data, but maybe we can put that on hold.

Upstream RCU docs: https://www.kernel.org/doc/Documentation/RCU/whatisRCU.txt

The rough idea of RCU is to implement a reader-writer lock-ish pattern where readers are low-overhead, under the assumption that all readers are short-lived - a valid assumption for kernels, where a "reader" is the execution of a single syscall/interrupt; anything that persists state beyond this will count as a "writer". In particular, readers should not be slowed down when a writer is trying to do an update, so ideally reads should happen concurrently. The way this is done is by having writers allocate and fill in a new structure, atomically update the pointer to it, synchronize memory and wait for existing readers to complete, and then deallocate the old structure. (This pointer can either be a single pointer to some single structure, or one of the pointers in a linked list, or something.) That way, existing readers see a consistent view - either the old or new version - and you're guaranteed that once you're finished synchronizing, any readers still in progress see only the new version.

(Multiple writers are not protected by RCU in any way. They should use a conventional mutex, or something.)

The basic RCU API in Rusty pseudocode is

extern {
    fn rcu_read_lock();
    fn rcu_read_unlock();
    fn synchronize_rcu();
    fn rcu_assign_pointer<T: Sized>(p: &*mut T, v: *mut T);
    fn rcu_dereference<T: Sized>(p: &*T) -> *T;
}

(Note the latter two in C are macros, into which p is syntactically passed by reference. If we want to bind them directly, we probably want to expose C helpers that take void ** and handle the type safety in the Rust wrapper.)

rcu_read_lock and rcu_read_unlock take no context info; they delimit an RCU read-side critical section that applies to any RCU use in the kernel. Between those you may call rcu_dereference on anything. (That said, there are two other RCU "flavors" in the kernel, rcu_read_lock_bh etc. and rcu_read_lock_sched etc. They appear to have been consolidated last year but I think from the API they should still be treated as separate? In any case, the other flavors are rarely used and we can probably get away with ignoring them for now.)

The rules as I understand them are:

  • You can only read an RCU pointer between rcu_read_lock and rcu_read_unlock (aka "within a read-side critical section"), and you can only read it with rcu_dereference (though it compiles out on all architectures besides Alpha). rcu_dereference doesn't actually do the dereferencing itself; it simply gives you a pointer you can safely dereference until rcu_read_unlock.
  • You cannot block in a read-side critical section.
  • You can only write an RCU pointer with rcu_assign_pointer, though you can do so at any point without advance notice.
  • You must keep the old object pointed to by an rcu_assign_pointer valid until you've called synchronize_rcu.

You may nest / overlap read-side critical sections, though. It's a reader lock, you can have more than one of those and writers can't do anything to invalidate the data you're reading until all reader locks are dropped. The only difference with RCU is that writers only wait on synchronize_rcu for existing reader locks, once they start, any new reader locks don't affect it (new reads are now guaranteed to be ordered after any previous rcu_assign_pointers).

I think the rough way to handle this is to

  • create an Rcu<T> pointer type
  • have an empty RcuRead object whose constructor calls rcu_read_lock(), whose destructor calls rcu_read_unlock(), and which is required to read an Rcu<T>
  • have ... some sort of operation to assign to an Rcu<T> that keeps the old value alive. Perhaps it returns an RcuDropGuard<T> object that holds on to a pointer to the old object, and that object's destructor calls synchronize_rcu() and then frees it?

I think it's memory-safe that we're using RAII / destructors here. The worst that can happen is that you deadlock (if you forget an RcuRead) or you keep the old value alive forever (if you forget RcuDropGuard<T>). The unsafety in std::thread::scoped::JoinGuard was that there were operations that were unsafe to do before the JoinGuard was dropped and safe after (like, deallocate data used by the thread). There are no operations that are enabled by either an RcuRead or an RcuDropGuard<T> going out of scope. (The RcuDropGuard<T> itself needs to handle deallocating the old object, for this reason, to guarantee that synchronize_rcu() was in fact called.)

I'm a little less sure about making sure you don't hold the result of reading an Rcu<T> after the originating RcuRead goes out of scope / after you call rcu_read_unlock. Can we use lifetimes here, by giving you a pointer that's constrained to the lifetime of RcuRead? Or do we need to insist on making you pass a callback/lambda?

For the short term, I'd like to at least introduce RcuRead and have it be a required parameter for binding things like for_each_process, and we can handle doing RCU pointer reads and writes in Rust later. (Which also lets us defer figuring out how to integrate the mutex needed for avoiding concurrent writes.)
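
A userspace sketch of the RcuRead idea, including the lifetime-based answer to the question above, might look like the following. Everything here is an assumption for illustration: the mock lock functions only count nesting depth in place of the real rcu_read_lock/rcu_read_unlock bindings, and the `Rcu<T>` stand-in stores its value inline instead of behind an RCU-managed pointer.

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicIsize, Ordering};

// Mock replacements for the kernel's rcu_read_lock()/rcu_read_unlock().
// A real module would call the bindgen-generated bindings instead.
static READ_DEPTH: AtomicIsize = AtomicIsize::new(0);

fn mock_rcu_read_lock() {
    READ_DEPTH.fetch_add(1, Ordering::SeqCst);
}

fn mock_rcu_read_unlock() {
    READ_DEPTH.fetch_sub(1, Ordering::SeqCst);
}

pub fn read_depth() -> isize {
    READ_DEPTH.load(Ordering::SeqCst)
}

/// Proof that the caller is inside a read-side critical section.
pub struct RcuRead {
    // The raw-pointer marker makes the guard !Send and !Sync, so the unlock
    // cannot run on a different thread from the lock.
    _not_send: PhantomData<*mut ()>,
}

impl RcuRead {
    pub fn new() -> RcuRead {
        mock_rcu_read_lock();
        RcuRead { _not_send: PhantomData }
    }
}

impl Drop for RcuRead {
    fn drop(&mut self) {
        mock_rcu_read_unlock();
    }
}

/// Simplified stand-in for an RCU-protected pointer (no writers in this mock).
pub struct Rcu<T> {
    value: T,
}

impl<T> Rcu<T> {
    pub fn new(value: T) -> Rcu<T> {
        Rcu { value }
    }

    /// The returned reference borrows the guard, so the borrow checker
    /// rejects any use of it after the guard has been dropped.
    pub fn read<'a>(&'a self, _guard: &'a RcuRead) -> &'a T {
        &self.value
    }
}
```

A binding for something like for_each_process would then take `&RcuRead` as an extra parameter, so it cannot be called outside a read-side critical section. Nesting guards works naturally because each guard pairs one lock with one unlock.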

More interesting RCU docs:

More description of things that RCU does and does not guarantee: https://www.kernel.org/doc/Documentation/RCU/Design/Requirements/Requirements.html

An RCU home page, more or less: http://www.rdrop.com/~paulmck/RCU/

"RCU's first-ever CVE, and how I lived to tell the tale": https://youtu.be/hZX1aokdNiY http://www.rdrop.com/~paulmck/RCU/cve.2019.01.23e.pdf (tl;dr: use-after-free because they locked the wrong RCU flavor, but the solution is a bit more complicated than that)

Make it possible to write modules in stable Rust

Even though you'll need the unstable compiler for a while for this crate, modules themselves should be possible to write in stable Rust, in the same way that libstd itself requires unstable but can be used from stable Rust.

Right now hello-world uses #![feature(alloc)] to use alloc::borrow::ToOwned and alloc::String. Both alloc::borrow and alloc::string are stably re-exported in libstd. So we should do the same thing. Strictly speaking, this permits those modules to change as long as libstd makes an API-compatible facade, but practically that's unlikely to happen and if it does we can just steal whatever facade libstd comes up with (or in the worst case, take a semver hit).

Potentially the way to do this is to re-export a module named std that contains a subset of what's in actual libstd, with the intention of consumers doing

#[macro_use]
extern crate linux_kernel_module;
use linux_kernel_module::std;
use linux_kernel_module::std::prelude::v1::*;

That way, those things from std that do/can exist in kernelspace can just be used like normal.

Do something reasonable in panic_fmt

Currently we just hang, which is not a world record holder for best UX.

In the short term using BUG is probably reasonable, but I'm not positive about how the stack will look. I think because of panic = abort it'll just be invoked in the frame where we panic'd, so it'll all be good?

i386 support

While looking at something else I remembered that the i386 kernel builds with -mregparm=3 -freg-struct-return, i.e., pass the first three arguments in registers (instead of on the stack, the normal ABI) and try to return structures in a register if they fit. I'm not sure if Rust and/or bindgen know how to deal with this.

If they don't, put a note in README.md and move on. If they do, let's add i386 to CI at some point....

Remove explicit list of symbols in build.rs?

It's a pain to have to constantly update it. Is there a reason not to just bind everything -- other than the compile time penalty (which should theoretically happen less often since we'll be editing build.rs less?)

Must use fallible allocation and not panic

The code seems to use Box::new at times, which panics on allocation failure.

This is obviously not acceptable, since the kernel must not panic on allocation failure, so this crate must NOT use Box, Vec, etc. and instead use fallible alternatives.
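
To make the fallible alternative concrete, here is a minimal sketch using `Vec::try_reserve`, assuming a toolchain recent enough to have it (it is stable as of Rust 1.57, well after this issue was filed). The helper names `try_push` and `try_copy_bytes` are hypothetical and not part of this crate.

```rust
use std::collections::TryReserveError;

/// Push a value, reporting allocation failure instead of panicking.
pub fn try_push<T>(vec: &mut Vec<T>, value: T) -> Result<(), TryReserveError> {
    // Reserve room for one more element; this is the only step that can
    // allocate, and it returns Err on failure.
    vec.try_reserve(1)?;
    // Capacity is already available, so this push does not reallocate.
    vec.push(value);
    Ok(())
}

/// Copy a byte slice into a freshly allocated Vec, fallibly.
pub fn try_copy_bytes(src: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut out = Vec::new();
    out.try_reserve_exact(src.len())?;
    out.extend_from_slice(src);
    Ok(out)
}
```

In a module, the error would presumably be mapped to ENOMEM and propagated with `?` like any other kernel error.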

Add support for registering sysctls

Rough cut concept for the API:

struct MySysctlExample {
    sysctl_table: SysctlTableRegistration,
}

impl KernelModule for MySysctlExample {
    fn init() -> KernelResult<Self> {
        let table = SysctlTable::new();
        let sub_table = SysctlTable::new();
        // The third argument is an open question: something that describes
        // the value's type and its valid range.
        sub_table.register(INTEGER_VALUE, "name", some_parameter_about_valid_values_and_type?);
        table.register_subtable(INTEGER_VALUE, "name", sub_table);

        return Ok(MySysctlExample {
            sysctl_table: table.register(),
        });
    }
}

Feedback welcome, going to spike on this tomorrow

`hello-world` fails to build

Trying to build hello-world as described here, but I'm getting this error:

   Compiling linux-kernel-module v0.1.0 (/home/vnz/work/playground/linux-kernel-module-rust)
error: failed to run custom build command for `linux-kernel-module v0.1.0 (/home/vnz/work/playground/linux-kernel-module-rust)`

Caused by:
  process didn't exit successfully: `/home/vnz/work/playground/linux-kernel-module-rust/hello-world/target/debug/build/linux-kernel-module-b8153acc5bce9d04/build-script-build` (exit code: 1)
--- stdout
cargo:rerun-if-env-changed=KDIR
cargo:rerun-if-env-changed=CLANG
cargo:rerun-if-changed=kernel-cflags-finder/Makefile

--- stderr
kernel-cflags-finder did not succeed
stdout: -nostdinc -isystem /usr/lib/clang/8.0.1/include -I/usr/lib/modules/5.2.8-1-MANJARO/build/./arch/x86/include -I/usr/lib/modules/5.2.8-1-MANJARO/build/./arch/x86/include/generated -I/usr/lib/modules/5.2.8-1-MANJARO/build/./include -I/usr/lib/modules/5.2.8-1-MANJARO/build/./arch/x86/include/uapi -I/usr/lib/modules/5.2.8-1-MANJARO/build/./arch/x86/include/generated/uapi -I/usr/lib/modules/5.2.8-1-MANJARO/build/./include/uapi -I/usr/lib/modules/5.2.8-1-MANJARO/build/./include/generated/uapi -include /usr/lib/modules/5.2.8-1-MANJARO/build/include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Wno-format-security -std=gnu89 -no-integrated-as -Werror=unknown-warning-option -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -mno-80387 -mstack-alignment=8 -mtune=generic -mno-red-zone -mcmodel=kernel -DCONFIG_X86_X32_ABI -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_AVX512=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -Wno-sign-compare -fno-asynchronous-unwind-tables -mretpoline-external-thunk -fno-jump-tables -fno-delete-null-pointer-checks -Wno-address-of-packed-member -O2 -fplugin=./scripts/gcc-plugins/structleak_plugin.so -fplugin-arg-structleak_plugin-byref-all -DSTRUCTLEAK_PLUGIN -Wframe-larger-than=2048 -fstack-protector-strong -Wno-unused-but-set-variable -pg -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-unknown-warning-option -DMODULE

stderr: clang-8: error: unknown argument: '-fplugin-arg-structleak_plugin-byref-all'
make[2]: *** [scripts/Makefile.build:285: /home/vnz/work/playground/linux-kernel-module-rust/kernel-cflags-finder/dummy.o] Error 1
make[1]: *** [Makefile:1597: _module_/home/vnz/work/playground/linux-kernel-module-rust/kernel-cflags-finder] Error 2
make: *** [Makefile:33: all] Error 2

I'm on Manjaro Linux 18.0.4, with Linux kernel 5.2.8-1.

Add chrdev!

https://lwn.net/Articles/195805/ -- starts at "To that end, here".

There seem to me to be two essential pieces to this API:

  • allocating/obtaining device numbers
  • file ops structure

File Ops is probably an interface that people can implement, which we convert into the explicit vtable, does that sound right? From there it's probably just a matter of going through all the function pointers in it and turning them into interface methods. We'll be responsible for wrapping/unwrapping the private data and managing self's lifetime.

The device numbers stuff is probably just some types with imperative methods, although I haven't dug in deeply

Switch back to panic=unwind

I think we need to unwind when we panic so we appropriately drop things, poison mutexes, etc. I don't really know how to integrate with the kernel's unwinder: the challenge is we need to unwind every Rust function on the stack, even if there's C code in between, but not unwind past the end of the stack. (I'm also not sure the kernel has a meaningful unwinder.) We'll need to define an appropriate / meaningful eh_personality lang item, see comments in src/libpanic_unwind/gcc.rs for details and an example.

Or, I can be convinced that not dropping things and leaving mutexes locked is sound. (Seems suboptimal though)

Run hello-world through tests

It uses String which none of the others do, and it has its own makefile. We should have coverage for it since it's the user-facing example.

Upstream support for building rust modules to mainline kernel

Kees thinks this is a good next step -- what we can do is upstream support for building rust with an arbitrary target file + xbuild, but not include any of our code (including our build.rs).

We think we want to support something that looks roughly like:

obj-m += helloworld.o
helloworld-crates := Cargo.toml

all:
        $(MAKE) -C $(KDIR) M=$(CURDIR)

clean:
        $(MAKE) -C $(KDIR) M=$(CURDIR) clean

Are there things in libstd that we want?

While most of std::collections is reexported from liballoc (see #37), std::collections::HashMap and std::collections::HashSet are not. As far as I can tell, this is because they use std::hashmap_random_keys(), which returns a 128-bit random value. We can do that in kernelspace pretty easily.

Can we manage to export a HashMap and HashSet that people can use? (Will this involve an automated filter-branch of upstream libstd, or is there a better way to do that?)

Also in libstd are std::sync::RwLock and other things in std::sync, which use OS native sync functionality (e.g. libc::pthread_rwlock_rdlock on UNIX). Can we make those work? (That's probably a whole subproject on its own re kernel sync primitives, but once we have that, should we try to wire up libstd's RwLock implementation?)

Anything else? std::time? The types (but not functions) from std::net?

bindgen derives Debug on a packed struct

ubuntu@ip-172-16-0-28:~/linux-kernel-module-rust/hello-world$ RUST_TARGET_PATH=$(pwd)/.. cargo xbuild --target x86_64-linux-kernel-module
   Compiling linux-kernel-module v0.1.0 (file:///home/ubuntu/linux-kernel-module-rust)
warning: #[derive] can't be used on a non-Copy #[repr(packed)] struct (error E0133)
    --> /home/ubuntu/linux-kernel-module-rust/hello-world/target/x86_64-linux-kernel-module/debug/build/linux-kernel-module-ba51b8057957ede2/out/bindings.rs:2455:10
     |
2455 | #[derive(Debug, Default)]
     |          ^^^^^
     |
     = note: #[warn(safe_packed_borrows)] on by default
     = warning: this was previously accepted by the compiler but is being phased out; it will become a hard error in a future release!
     = note: for more information, see issue #46043 <https://github.com/rust-lang/rust/issues/46043>

tl;dr: if packing a struct causes the field to get misaligned, accesses are unsafe, and the derived implementation of Debug involves accessing these fields.

The struct in question is desc_struct (which is the type of a member of thread_struct, which we need).

Do we care? Also I think this is a bindgen bug and we can't do anything about it?

Investigate LLVM C backend

https://github.com/JuliaComputing/llvm-cbe

Looks like it's being actively developed. If it works enough that we can load a module on x86_64 by generating C and having gcc compile that, then that could be our answer for architectures LLVM doesn't support. If so we should get it into CI to make sure it doesn't regress (and document how to do it).

modinfo needs a const char [], not a const char *

We use this (from src/lib.rs) to set modinfo:

#[link_section = ".modinfo"]
#[allow(non_upper_case_globals)]
// TODO: Generate a name the same way the kernel's `__MODULE_INFO` does.
// TODO: This needs to be a `[u8; _]`, since the kernel defines this as a  `const char []`.
// See https://github.com/rust-lang/rfcs/pull/2545
pub static $name: &'static [u8] = concat!(stringify!($name), "=", $value, '\0').as_bytes();

This (likely) stores a pointer to a string in the modinfo section instead of storing the string directly.
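
One way to get an actual byte array, sketched under the assumption of a toolchain with const generics and loops in const fn (both newer than this issue): copy the concatenated string into a `[u8; N]` at compile time, so the static holds the bytes themselves. The names `str_to_array`, `LICENSE_TEXT` and `LICENSE` are hypothetical, and the literals stand in for the macro's `stringify!($name)` and `$value`. The real static would keep the `#[link_section = ".modinfo"]` attribute, which is left out here so the sketch runs in userspace.

```rust
/// Copy the first N bytes of a string into a fixed-size array at compile time.
/// If N exceeds the string length, the out-of-bounds index fails the build
/// when this is evaluated in a static or const initializer.
const fn str_to_array<const N: usize>(s: &str) -> [u8; N] {
    let bytes = s.as_bytes();
    let mut out = [0u8; N];
    let mut i = 0;
    while i < N {
        out[i] = bytes[i];
        i += 1;
    }
    out
}

// Stand-in for concat!(stringify!($name), "=", $value, '\0') in the macro.
const LICENSE_TEXT: &str = concat!("license", "=", "GPL", "\0");

// The array length comes from the string, so nothing is counted by hand.
pub static LICENSE: [u8; LICENSE_TEXT.len()] = str_to_array(LICENSE_TEXT);
```

Because `LICENSE` is an array and not a reference, the section would contain `license=GPL\0` directly, matching what the kernel expects from a `const char []`.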

Write and publish docs

We should have docs for at least all public APIs, and published rustdocs linked from the README.

It's not super straightforward to use this repo with crates.io (you need a local checkout for getting to the target JSON file, see #1). docs.rs automatically publishes crates on crates.io; I'm not sure if we want to publish the crate for the sake of using that (+ getting into things like crater runs?), or just self-host docs. (I run enough websites that I'm not going to be sad about self-hosting docs somewhere....)

Come up with a shorter idiom for handling errors in C calls

Right now error handling for C functions is fairly verbose:

let res = unsafe { bindings::call() };
if res != 0 {
    return Err(error::Error::from_kernel_errno(res));
}

I think we want a try!()-style macro for checking the return value and propagating it upwards. The one challenge is not all kernel functions have the same return value convention, for some != 0 means error, for others < 0 means error, and I'm sure there's some third convention.
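
A possible shape for this, sketched with plain functions plus the `?` operator in place of a macro: one small helper per return-value convention. The `Error` type below is a simplified stand-in for the crate's `error::Error`, and `check_nonzero`, `check_negative` and `mock_register` are hypothetical names used only for illustration.

```rust
/// Simplified stand-in for the crate's error::Error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Error(i32);

impl Error {
    pub fn from_kernel_errno(errno: i32) -> Error {
        Error(errno)
    }
}

pub type KernelResult<T> = Result<T, Error>;

/// Convention 1: zero means success, anything else is an errno.
pub fn check_nonzero(ret: i32) -> KernelResult<()> {
    if ret != 0 {
        Err(Error::from_kernel_errno(ret))
    } else {
        Ok(())
    }
}

/// Convention 2: negative means errno, non-negative carries a value.
pub fn check_negative(ret: i32) -> KernelResult<u32> {
    if ret < 0 {
        Err(Error::from_kernel_errno(ret))
    } else {
        Ok(ret as u32)
    }
}

/// Mock C function: returns 0 on success or -12 (ENOMEM) on failure.
unsafe fn mock_register(fail: bool) -> i32 {
    if fail { -12 } else { 0 }
}

/// The four-line pattern above collapses to one line per call.
pub fn init(fail: bool) -> KernelResult<()> {
    check_nonzero(unsafe { mock_register(fail) })?;
    Ok(())
}
```

Naming the helper after the convention keeps the choice visible at each call site, which may be easier to audit than a single macro that guesses.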

[Suggestion] An organization for the Rust-in-Linux effort

@alex asked about the requirements for eventually merging something into the kernel in alex/linux#3 (comment)

While there are advantages of developing within the kernel (e.g. nowadays we have staging), this would be a treewide effort, not to mention multi-repo. To submit any changes to the kernel eventually, ideally everything required should be in stable Rust and a clean patch series is prepared for submission where everyone can discuss it in the LKML.

Therefore, in order to prepare everything, it would be nice to have something like the ClangBuiltLinux organization for all the Rust-in-Linux (Rust-for-Linux?) related things:

  • The Linux fork with the framework and any other changes required (e.g. like those @alex started), plus a branch with an example driver, etc.
  • A rustc fork (and any others) with all the features needed to build the Linux fork until they are in the official repo(s) (e.g. cargo xbuild as @joshtriplett mentioned)
  • The CI setup
  • etc.

Figure out how to make std work

There are a couple of annoying things about #![no_std], including:

  • Cargo doesn't handle having different flags on a crate that's used as both a build-dependency and a runtime dependency (rust-lang/cargo#5730, rust-osdev/cargo-xbuild#10). This means that none of the dependencies of bindgen that are optionally-no-std are usable. Notably, this includes byteorder, which is both useful in its own right and an indirect dependency of serde-json-core.
  • Most no-std crates interpret that to mean no-alloc either, although allocation works totally fine for us. Regular serde-json is capable of handling dynamic structures (e.g., deserializing a Vec); serde-json-core is not.
  • While in theory you could add a feature flag to serde-json for no-std but yes-alloc (serde-rs/json#362), in practice it seems all of its serialization uses the std::io::Write trait and the impl Write for Vec<u8> implementation, so it would be a sizable rewrite to make the crate functional without std.
  • As noted in #38 and #16, there are other things we likely want from libstd, including hash maps, hash sets, and std::ffi.

I think we should try to port over libstd, by making all the filesystem, subprocess, etc. operations just unconditionally return Err. (Bonus points if we can raise compile-time errors with cfg but that seems like a bigger delta against upstream libstd than I'd like.)

I'm not sure how to do that, though - xargo would have been the obvious answer in the past. I asked over in the rust-osdev Gitter if anyone has recommendations. And as #38 mentions, we'll probably want a bot to automatically rebase our version of it, until it gets upstreamed. (But I suspect this is a reasonable thing to want to upstream and other projects will want it?)

usercopy handling and KERNEL_DS

This project currently seems to assume that a pointer checked with access_ok() always points to userspace; however, thanks to the set_fs(KERNEL_DS) mechanism, that's not necessarily true.

access_ok() doesn't actually always check that the given pointer points to userspace. If you want real kernel memory safety around access_ok(), you'll have to split userspace access into two types:

  • One that passes userspace pointers wrapped in opaque structs that can't be directly accessed from safe code, with helpers for safe slicing and such, just like for kernel buffers. This variant would also need to have object lifetime checking such that a handle to a "userspace" memory region can't be persisted beyond the end of the current function. (I don't know much about rust, but this seems like something rust probably provides?)
  • One that accepts raw userspace pointers, but does a stronger check than access_ok() (comparison to TASK_SIZE_MAX or so). (Of course this would still permit corrupting userspace memory, but I assume that's acceptable.)
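
The second option could be sketched roughly as below. This is illustrative only: `is_user_range` and `CheckedUserRange` are hypothetical names, and the limit is passed in as a parameter because the real TASK_SIZE_MAX depends on the architecture and kernel configuration.

```rust
/// True if [addr, addr + len) lies entirely below `user_limit`.
/// Unlike access_ok(), the limit here is fixed by the caller and is not
/// affected by set_fs().
pub fn is_user_range(addr: usize, len: usize, user_limit: usize) -> bool {
    match addr.checked_add(len) {
        // Reject ranges that extend past the userspace limit.
        Some(end) => end <= user_limit,
        // Reject ranges whose end wraps around the address space.
        None => false,
    }
}

/// Opaque handle to a validated userspace range. The fields are private, so
/// safe code cannot turn it into a raw pointer and dereference it directly.
pub struct CheckedUserRange {
    addr: usize,
    len: usize,
}

impl CheckedUserRange {
    pub fn new(addr: usize, len: usize, user_limit: usize) -> Option<CheckedUserRange> {
        if is_user_range(addr, len, user_limit) {
            Some(CheckedUserRange { addr, len })
        } else {
            None
        }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn start(&self) -> usize {
        self.addr
    }
}
```

The overflow check matters as much as the comparison: without it, a large `addr` with a small `len` could wrap around and pass.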

Longer explanation of the problem:

Linux has various pieces of kernel code that normally operate on userspace pointers, and are optimized for this case - for example, filesystem read and write handlers (->f_op->read(), ->f_op->write(), ...). These handlers take userspace pointers (annotated with __user and, somewhat redundantly, accessed through helpers that check for access_ok()). However, sometimes the kernel needs to directly call into such APIs, operating on kernel buffers. In current kernel versions, this is in particular true for the sys_splice() syscall, which can move data directly between a pipe and a file without going through userspace memory. (There have been discussions about cleaning this up to make it less hazardous, but that hasn't happened yet.)

Normally, the access_ok() checks performed by filesystem code would prevent this from working - after all, these checks are explicitly designed to block the use of kernel pointers. To work around this, the kernel has a function set_fs() that can be used to temporarily override the address limit used for access_ok() checks so that all pointers are accepted. Therefore, if your code can run in a context where address limit checks have been disabled, UserSlicePtr could AFAICS be used to access arbitrary kernel pointers.

Kernel code that does this looks like this (example from fs/read_write.c):

ssize_t kernel_read(struct file *file, void *buf, size_t count, loff_t *pos)
{
	mm_segment_t old_fs;
	ssize_t result;

	old_fs = get_fs();
	set_fs(get_ds());
	/* The cast to a user pointer is valid due to the set_fs() */
	result = vfs_read(file, (void __user *)buf, count, pos);
	set_fs(old_fs);
	return result;
}

There were two recent LWN articles related to this issue:

Examples of kernel bugs caused by this behavior:

Use stable Rust (tracking ticket for unstable features we use)

More ambitious than #37, and won't be possible for a long while yet. Here's what we use currently:

  • #![feature(alloc)]: to be able to use liballoc (Box, String, Vec, etc.). rust-lang/rust#27783 I think, though I don't see any recent progress about liballoc specifically. (But as noted in #37 most things we need are re-exported through libstd, so maybe we can advocate for pushing the unstable attribute down to the things that aren't re-exported?)
  • #![feature(global_allocator)]: to be able to define an allocator backed by kmalloc. PR rust-lang/rust#51241 from today stabilizes global_allocator and I think all the things under allocator_api that we use. (Also, as part of its work, it pushes down some of liballoc's wide instability to particular modules; we might want to advocate to do more of that to stabilize the re-exported parts of liballoc to solve the previous one.)
  • #![feature(allocator_api)]: See rust-lang/rust#51540
  • #![feature(lang_items)] for #[lang = "oom"]: See rust-lang/rust#51540
  • #![feature(use_extern_macros)]: I think we don't need it, #40.
  • #![feature(panic_implementation)]: for #[panic_implementation]
  • #![feature(alloc_error_handler)]: See either rust-lang/rust#51540 or rust-lang/rfcs#2492.
  • #![feature(const_str_as_bytes)]: See rust-lang/rust#63770

make C strings a little more ergonomic

  1. See if we can get std::ffi::CString and std::ffi::CStr to work; they're actually in libstd, but I think CString only uses alloc/containers and CStr doesn't even need that. (And decide if we want it?)
  2. Find something better than b"foo\0" for compile-time static C strings. Options include https://github.com/mzabaluev/rust-c-str and https://github.com/abonander/const-cstr (both of which use std::ffi::CStr) or just coding it ourselves as suggested in rust-lang/rfcs#400 (comment) . If we write it ourselves I might argue for calling it c! to keep it easy to read/write.
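
If we do write it ourselves, a minimal c! could look like the sketch below. It assumes CStr is available (std::ffi::CStr here; a kernel build would need a core- or crate-provided equivalent), and it checks for interior NULs at run time, not at compile time.

```rust
/// Build a `&'static CStr` from a string literal, appending the terminating
/// NUL with concat! so callers never write "\0" by hand.
macro_rules! c {
    ($s:literal) => {
        ::std::ffi::CStr::from_bytes_with_nul(concat!($s, "\0").as_bytes())
            .expect("c! literal must not contain interior NUL bytes")
    };
}
```

Usage is then `c!("foo")` wherever a C string is expected, with `.as_ptr()` giving the pointer to hand to C.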

Support things that would be required for binderfs

Decide what to do about `malloc()` in filesystem::register

Upside: it works cleanly and has a nice API

Downside: extra malloc, having the file_system_type in heap memory means it's writable which means the function ptrs in it can be targets for control flow hijacking.

Possible solutions are: using a macro to convert the trait implementer to a static, mapping the memory read-only, or deciding it's all fine since our rust modules won't be full of exploitable vulnerabilities (<--- probably not this).

usercopy uses u64 on 4.15.0-1007-aws

On the Ubuntu 18.04 EC2 box (which I still haven't shut down oops):

error[E0308]: mismatched types
   --> /home/ubuntu/t2/src/user_ptr.rs:103:17
    |
103 |                 data.len() as u32,   
    |                 ^^^^^^^^^^^^^^^^^ expected u64, found u32
help: you can cast an `u32` to `u64`, which will zero-extend the source value
    |
103 |                 (data.len() as u32).into(),
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0308]: mismatched types
   --> /home/ubuntu/t2/src/user_ptr.rs:129:17
    |
129 |                 data.len() as u32,   
    |                 ^^^^^^^^^^^^^^^^^ expected u64, found u32
help: you can cast an `u32` to `u64`, which will zero-extend the source value
    |
129 |                 (data.len() as u32).into(),
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to 2 previous errors

Changing it to u64 fixes it.

Probably something switched to `size_t` or whatever?

Build fails on v5.2+

Since torvalds/linux@cdd750b, the __c_flags variable used by kernel-cflags-finder is gone, and we should just use _c_flags. Probably the right thing to do is use __c_flags if it exists and fall back to _c_flags, for correctness with older kernels. (You can also do version comparisons with like, $(shell expr) or something but that sounds like a pain.)

Maybe it's worth seeing if there's a cleaner way of doing all this.

Originally reported in lizhuohua/linux-kernel-module-rust#1.
