microsoft / cheriot-sail Goto Github PK

Sail code model of the CHERIoT ISA

License: Other

Makefile 3.17% Isabelle 0.01% Coq 0.71% OCaml 0.12% Shell 0.26% TeX 94.38% Perl 0.74% Jupyter Notebook 0.11% Python 0.50%

cheriot-sail's Issues

Broken pattern matches on otypes

In src/cheri_cap_common.sail's isCapSentry function and the execute clause for CSeal in src/cheri_insts.sail there are pattern matchs on an otype that uses names for global let bindings in some of the cases. For example,

  match unsigned(cap.otype) {
    otype_sentry => true,
    otype_sentry_id => true,
    otype_sentry_ie => true,
    _  => false
  }

Unfortunately Sail treats these as fresh variable bindings and so it always uses the first clause. (The current development version of Sail now produces "redundant case" warnings for these.)

Changing them to use comparisons should work:

  match unsigned(cap.otype) {
    ty if ty == otype_sentry => true,
    ...

Transitive load permissions vs. sealed capabilities

A second follow-up to microsoft/cheriot-ibex#13

There's inconsistency between
https://github.com/microsoft/cheriot-sail/blob/19ad1e982c0f876bbb3f21f39b3cee6935278adc/src/cheri_insts.sail#L775-L782 in that (missing) transitive load global's effects are applied even to sealed capabilities, while load mutable's are not.

We think that the correct thing to do is actually more nuanced:

For unsealed load targets,
- Missing transitive load mutable should clear both store and transitive load mutable permissions.
- Missing transitive load global should clear both the global "permission" and the transitive load global permission.
For sealed load targets,
- Missing transitive load mutable should, as now, not impact the permissions under seal (that is, the authority missing this permission will have no effect on the loaded capability).
- Missing transitive load global should clear the global permission of a sealed capability, but not clear the transitive load global permission under seal.

CSRs that can be accessed without `PERMIT_ACCESS_SYSTEM_REGISTERS` should be specific to compartments

If a compartment wants to read, for instance, the global time, it can do so with a sentry that has PERMIT_ACCESS_SYSTEM_REGISTERS permission. That way, if we have a system that has empty import and export lists for each compartment, and disjoint PCC and CGP caps between compartments, the compartments will be trivially isolated from each other. In terms of performance impact, a CSRR instruction will be replaced by a <CJALR; CSRR; CRet> sequence.

For the CSRs that are read/write (fflags, etc), we need to spill those during a context switch in the switcher (https://github.com/microsoft/cheriot-rtos/blob/main/sdk/core/switcher/entry.S)

This repo is missing a LICENSE file

This repository is currently missing a LICENSE file.

A license helps users understand how to use your project in a compliant manner. You can find the standard MIT license Microsoft uses at: https://github.com/microsoft/repo-templates/blob/main/shared/LICENSE.

If you would like to learn more about open source licenses, please visit the document at https://aka.ms/license (Microsoft-internal guidance).

v2? isentry support

We have considered adding indirect sentries to CHERIoT. At least one rationale is to allow library calls to pick up a data segment beyond .text/PCC. (See additional discussion in #39.)

Optimise bounds encoding

There is an opportunity to adopt a trick from CHERI Concentrate to slightly optimise the capability bounds encoding. In CHERI Concentrate the top bit of the length is implied from the exponent in a similar way to the IEEE floating point mantissa. We could adopt the same technique by reducing the T field by one bit to 8-bits and using this freed bit to create a full 5-bit exponent. This would result in some spare exponent values, one of which could be used to encode the denormal case. The resulting encoding would have the same precision and representable bounds as the existing one, but would also support exponents greater than 14, giving better precision for very large allocations.

Stack high water mark may not be updated correctly for writes spanning MSHWMB

As described in microsoft/cheriot-ibex#14 the MSHWMB will not be updated if a write occurs below MSHWMB but with a length that means bytes above MSHWMB are written. We should fix this and also change the spec to require 16-byte alignment of MSHWMB and MSHWM.

Cap loads through authorities missing load cap and a transitive perm will result in unexpected bit patterns

Following up on microsoft/cheriot-ibex#13, we think that https://github.com/microsoft/cheriot-sail/blob/19ad1e982c0f876bbb3f21f39b3cee6935278adc/src/cheri_insts.sail#L783-L785 should be before, rather than after, https://github.com/microsoft/cheriot-sail/blob/19ad1e982c0f876bbb3f21f39b3cee6935278adc/src/cheri_insts.sail#L775-L782

As written, loading a capability through an authority lacking capability load permission as well as a transitive load {mutable,global} permission results in the transitive permissions being stripped from the loaded value before the tag is cleared. We think it would be less surprising if the tag were cleared and the permission bits not altered, though the use cases we have in mind (telemetry / crash reporting) are pretty narrow.

Why the need for Special Capability Registers and a special instruction for SCR access, instead of simply memory mapping all system registers and using simple loads/stores?

What is the disadvantage of getting rid of Permit_Access_System_Registers permission (SR) and relying only on capabilities' bounds (and other permissions) to control access to system registers (which are all memory mapped)?

MCAUSE/MTVAL for illegal SCR addresses

When r/w an unimplemented SCR address (e.g., 22), Looks like sail is trapping with MCAUSE = 2 and MTVAL=0. Just want to confirm that's indeed the intention (in the case of PERM_SR violation it returns MCAUSE = 0x1c).

v2: Split sentries by direction

We have reserved space for, but do not presently implement, differentiating forward and backward sentries. There would then be at least five forms: forward with IRQ {preserved, disabled, enabled} and backward with IRQ {enabled, disabled}.

The ISA itself would need some tweaking: cjalr would need to seal PCC as one of the reverse types rather than one of the forward types, and cj and cjalr would need to differentiate between return and non-return transfers. Likely, absent any other encoding changes, this would enshrine cj ra (that is, cjalr x0, ra) as being a return instruction and prohibit its use for forward transfers, requiring modest toolchain work.

Intended use for reserved bit in capabilities

Sorry to raise it as an issue, perhaps this should go into a mailing list - please let me know if you have one.

I was wondering if there's any intended use for the reserved bit in the CherIoT? The main reason I would like to reclaim the space is to have better granularities for bigger bounds, between 2^(9+14) to 2^32. It looks like lack of big buffer support in CherIoT is a reason to avoid it even for applications that don't require 64-bit datatypes except for big buffer references.

misa reports incorrect value in Sail

Sail currently sets bits C, D, E, F, M, V and X in misa. This is wrong as D (double float), F (float) and V (vector) are not supported. ~~Also B (bit vector) is supported and should be set~~ (removed as I don't think this is standardized).

The -F and -W options to the simulator can be used to disable the floating point and vector extensions, but we should change the default in sail-riscv/c_emulator/riscv_platform_impl.c as we don't even compile the instruction definitions.

v2? cgp as SCR

Right now, CGP is just an ISA name for one of the GPRs reachable by all register selectors. We have contemplated instead converting it to a SCR. That SCR would likely be ambiently readable, via AUICGP as well as the existing SCR access instructions, but would require ASR to modify, restricting modification to the TCB switcher.¹ As we already consider library code as running within the calling compartment,² this does not change the security model.

The effect on code density should be investigated: does freeing up a GPR eliminate enough spills to offset the cost of fetching CGP via AUICGP in all cases?

Requiring ASR to modify means that a compartment and the libraries it calls cannot foul up its CGP. The switcher could also rely on the constancy of this SCR outside itself, but it's not clear that there would be significant gains. ↩
Though the fast unsealer and perhaps other "privileged libraries" blur the text here, as they manipulate secrets not exposed to the calling compartment. ↩

Libraries accessing Read/Write data through caps stored within PCC

From my understanding, libraries work as follows: if one compartment calls a library function, it can rest assured that the data it sent the library cannot go outside the caps present in the passed arguments. If a second compartment calls the same library function again, the library function cannot access the old data passed by the first compartment unless the second compartment also passes in the necessary caps. The library relies on not having a read/write cap to global memory to ensure the above.

But, one can have a cap with read/write permissions inside the PCC region of a library that can be loaded by the library. This violates the above invariant, and the library can pass confidential data obtained from the first caller to the second caller. Is this behavior expected? If so, how are library calls not subject to the same sanitization as compartments, via the switcher? What is the expected use case of libraries?

Why is MC implicitly given to caps with EX permission? Is there any use case where you want to read caps from the PCC region? I can understand reading read-only data from the PCC region.

Allowing delayed check for jump/branch instructions

In the case of branch/jump instructions, would like to have the option to delay the PCC-based branch target checks (address bound/permission) to instruction fetch time (errors associated with the branch target instruction), rather than handling it at EX time (error associated with the branch/jump themselves).

The benefit is better performance (checking logic will not be in the critical instruction feth timing path, for example) and fewer changes to execution logic (espically branch instructions).

Extending support for DII execution mode

The following features when in DII mode are needed in order to use the sail model in the design verification flow.

Handling interrupts. If an instruction packet is flagged with an interrupt, the sail model should perform interrupt handling (e.g., update MSTATUS.MIE if interrupt is taken) based on the current MIE status.
Similar to #1, it's also desirable to be able to model a bus error for the load/store and instruction fetch interfaces. Actually - if we can have a generalized way to make the sail model to talk to a volatile peripheral it would be great.
Adding a status-dump pseudo-instruction so that we can dump all architecturally visible registers and use that as a checkpoint against RTL simulation.

Sealing and Unsealing Authorities for addresses rather than otypes

The current sealing and unsealing authorities permit sealing and unsealing of specific otypes. This restricts the number of different data authorities to 7 (otype encoded between 9 to 15). It would be useful to have a address-based sealing/unsealing authority as follows:

Compartment A has a sealing and unsealing authority for a range of addresses.
Compartment A can pass any cap within that range after sealing it, using the sealing authority for that range, to compartment B
When the compartment B calls a function of compartment A by passing the sealed cap, compartment A can unseal the cap using the unsealing authority for that range.

This gives a way to have a lot more "types" for sealing.

Is there any issue in having such a sealing type? Perhaps, one of the 7 data sealing otypes can be allotted for such an address-based sealing so that the hardware can perform the necessary checks. I agree that the SW can use one of the otypes and perform such a check manually, but it would be simpler to do the checks in hardware.

ISA Specification bug: Bounds check during taken branches and jumps is either redundant or wrong

It looks like the bounds check uses "minimum-inst-bytes" when checking if a taken address is in-bounds. This means that if the jump/branch target is a 32-bit instruction, where the last 16 bits are out-of-bounds, we will get an out-of-bounds error after fetching the target instruction, which kind of defeats the purpose w.r.t. triaging why the branch or the jump failed; we might as well not do out-of-bounds checks during jumps/taken-branches and wait for the instruction fetch to perform the bounds check.

A potential scenario where this exception (as it is spec-ed currently) becomes useless is as follows:
We have a compressed trampoline initially, allowing us to jump correctly to the trampoline and out of it with just a 2-byte bounds. Now the trampoline is removed and the code is inlined (perhaps some JIT-optimization). When we are debugging/trap handling an out-of-bounds exception because of this, we will still want access to the old PC, so that we can fix the PC bounds there. The way the spec is, we will lose the previous PC information in this scenario, which is not ideal.

Potential fixes:
a) check for 4-bytes in-bounds on taken-branch/jump. This has the problem if the target is a single instruction compressed trampoline with precisely set bounds - we will cause a spurious exception while executing the previous branch/jump.
b) store the previous PC and taken-ness in some micro-architectural state, and if the new instruction's fetch fails check for boundedness, set the EPC with the previous PC if the taken-ness is set. If taken-ness is not set, set EPC to current PC.
c) store the previous PC and taken-ness in some architectural state, and every boundedness failure during instruction fetch stores the previous PC and current PC. This could be easier for writing the trap handlers - the return address for the trap handler is still the current PC, but the fix of the bounds will be done on the previous PC as opposed to (b) where the trap handler has to compute the PC to return to.

### Tasks

Have CJALR create !G sentries?

Some discussion with Murali about CFI has prompted the question: "can we have CJALR create local (!G) sentries instead?" Very little software attempts to do anything with return addresses other than stash them to the stack or use them to return; the examples that leap to mind are things like setjmp() and the somehow still holding on {get,make,set,swap}context() family of last century, neither of which we have, or have perceived need of, in CHERIoT-RTOS.

I think it would suffice to change https://github.com/microsoft/cheriot-sail/blob/344c9456a6031160183221e5881bd7f3efa3481f/src/cheri_insts.sail#L180 to instead read

C(cd) = { sealCap(linkCap, to_bits(cap_otype_width, sentry_type)) with global = false };

FWIW, perhaps unsurprisingly, having made that change, we still pass the RTOS test suite.

Type error while building with upstream Sail

I am building with upstream Sail, commit: b81e2823f8b4ffcf14cb0b11c260d79d3a19b0dc

I'm getting the following type error:

Type error:
src/cheri_riscv_types.sail:143.31-33:
143 |    EXC_LOAD_CAP_PAGE_FAULT => 26,
    |                               ^^
    | int(26) is not a subtype of {('e : Int), (0 <= 'e & 'e <= 2). int('e)}
    | as (0 <= 'ex2448167 & 'ex2448167 <= 2) could not be proven
    | 
    | type variable 'ex2448167:
    | src/cheri_riscv_types.sail:143.31-33:
    | 143 |    EXC_LOAD_CAP_PAGE_FAULT => 26,
    |     |                               ^^ bound here
    |     | has constraint: 26 == 'ex2448167

In the code the type definition says the execution code must be less than xlen:
https://github.com/microsoft/cheriot-sail/blob/ef8cfbccdbaf760798449827b5d82ffa2ba9a582/src/cheri_riscv_types.sail#L140

I'm not exactly sure why the Sail compiler thinks xlen is equal to 2.

CSub redundant?

The CSub instruction is currently functionally identical to the RISC-V's sub instruction. The only reason to keep it is to separate pointer operations from integer ones, which could become important in future ISA versions. For example:

capability addresses might be reduced to less than XLEN width, freeing up capability encoding bits for other uses. In this case CSub may or may not be the same as sub.
if capability registers were separate from integer ones (split register files).

At the very least we should add some discussion of this to the page in ISA reference.

v2? Permuted capability encoding optimized for 33-bit busses

Microsoft's CHERIoT-ibex is optimized to be small and so uses a 33-bit memory bus. As a result, capability fetches take two bus cycles (and the two out-of-band bits are ANDed together to form the capability tag, reminiscent of IBM's System/38.)

The current capability layout does not allow for much parallel processing or decompression in the 2nd of those two bus cycles: the cursor/address field is not particularly useful unto itself and the bounds metadata fields are not particularly meaningful without some of the cursor/address bits.

A long while ago (how time flies), Kunyan and I sketched a layout in which we shuffle the bits around a bit, such that the two words shake out containing...

the reserved bit, the 4-bit exponent field, the 9-bit T field, the 9-bit B field, and the middle 9 bits of the address (that is, the ones that overlap the shifted T and B fields in decoding).
the 6-bit permissions, the 3-bit otype, and the residual 23 top and bottom bits of the address (the exponent also serves to divide the 23 bits into top and bottom).

This gives all the information needed for decompression in the first word, so that logic can happen in parallel with the 2nd bus cycle.

The non-contiguousness of the address bits is exciting and strongly differentiates CGetLow from CGetAddr. See https://github.com/CTSRD-CHERI/cheri-specification/wiki/Tracking-discussion-for-CGetAddr-vs-CToPtr-vs-as-integer-alias-vs-CGetLow

Clean up MTDC

We inherited the MTDC (trap default capability) special capability register from upstream but it isn't needed as we don't support hybrid mode. It would make more sense to get rid of it and keep only MScratchC, MEPCC and MTCC. This would need a few changes in ISA and RTOS:

MTDC currently contains the RW root on reset. We'd need to put it somewhere else, perhaps in the c1/x1 GPR?
The RTOS would need to switch to using MScratchC as a scratch register instead of MTDC! This should be trivial.

Incorrect behaviour when registers greater than 15 used as destination in certain instructions

@mndstrmr points out in microsoft/cheriot-ibex#30 that the sail behaves incorrectly in cases where registers greater than 15 (which are not valid in RV32E) are used in instructions that have side-effects other than writing the destination register such as CSRRW.
This is due to the way that we use a Sail exception in the register read / write functions to detect the invalid register rather than detecting it during decode.
This was for expedience i.e. to avoid many changes to the upstream sail-riscv repo.

It would be best if the upstream had support for RV32E. In fact we could do with clarification of the RISC-V specification as there are two reasonable interpretations at the moment:

raise a reserved instruction exception (i.e. fail at decode stage)
ignore the top bit of the register fields

We are attempting to implement 1. and Ibex used to do 2. 2 is easier to implement but probably less useful.

microsoft / cheriot-sail Goto Github PK

cheriot-sail's Issues

Footnotes

Recommend Projects

Recommend Topics

Recommend Org