scarv-cpu's Introduction

SCARV: processor core implementation

Acting as a component part of the wider SCARV project, the RISC-V compatible SCARV micro-controller (comprising a processor core and SoC) is the eponymous, capstone output, e.g., representing a demonstrator for the research oriented XCrypto ISE and the industry oriented RISC-V Scalar Cryptography ISE. The main repository acts as a general container for associated resources; this specific submodule houses the processor core implementation.

Branches:

Contents:

Overview

This is a 5-stage, single-issue, in-order CPU core, implementing the RISC-V 32-bit integer base architecture, along with the Compressed and Multiply extensions. It is a micro-controller, with no cache, branch prediction or virtual memory.

Pipeline Diagram

Documentation

See the docs/ folder for information on the design requirements and the pipeline structure.

Quickstart

  • Install the following tools to use all parts of the design flow:

  • Check out the repository and required submodules.

    $> git clone git@github.com:scarv-cpu/scarv-cpu.git
    $> cd scarv-cpu/
    $> git submodule update --init --remote
  • Set up tool environment variables.

    $> export YOSYS_ROOT=<path to yosys installation>
    $> export RISCV=<path to toolchain installation>
  • Configure the project environment.

    $> source bin/conf.sh
  • Build the verilator simulation model:

    $> make verilator_build
  • Run the basic RISC-V compliance tests:

    $> make riscv-compliance-build
    $> make riscv-compliance-run
  • Run the standard Yosys Synthesis flow:

    $> make synthesise

Acknowledgements

This work has been supported in part by EPSRC via grant EP/R012288/1, under the RISE programme.

scarv-cpu's People

Contributors

ben-marshall, danpage, flaviens

scarv-cpu's Issues

Core Frontend - XCrypto and Bitmanip

Add instruction decode and operand gathering for the new XCrypto and Bitmanipulation instructions.

  • Individual Instruction Decode.
  • Operand Gathering.
  • Additional pipeline fields:
    • Pack width specifier for the ALU.
    • Widening the s*_uop field where needed to fit extra instructions.
  • Extra Register File read port for RS3.
    • Associated forwarding logic.
    • Pipeline documentation updates.

SKW: Merge masking ISE functionality

  • Currently, the masking ISE is implemented on the scarv/xcrypto/masking-ise branch only.
  • This branch has diverged, or will diverge, heavily from the scarv/skywater/main branch.
  • The masking ISE functionality can partially be copied over (particularly the ALU) but much of it will simply need to be re-implemented:
    • Masking ISE Decode.
    • Operand Reversal.
    • 4-read-2-write.
    • Forwarding.

SKW: Static branch prediction.

Currently, all control flow changes have a 5 cycle penalty, since they wait until the instruction reaches writeback before triggering the change.
Two improvements are possible within the scope of a microcontroller like the SCARV-CPU:

  • Decode stage taking of jump with immediate instructions.
  • "Always taken" or "predict taken if backwards" branch prediction.

Step 1, tracked here, will just implement static early branching in the decode stage for control flow changes where it is possible.

  • Instructions eligible: c.j,c.jal,jal.
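A minimal Python sketch of the decode-stage check described above, not the RTL itself. The function names are hypothetical; the encodings used (jal opcode 0x6f, c.j and c.jal in compressed quadrant 01) are the standard RV32 ones. The "predict taken if backwards" rule from the second improvement is included for comparison.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def jal_offset(insn):
    """Extract the 21-bit signed byte offset from a 32-bit jal encoding."""
    imm20 = (insn >> 31) & 0x1
    imm10_1 = (insn >> 21) & 0x3FF
    imm11 = (insn >> 20) & 0x1
    imm19_12 = (insn >> 12) & 0xFF
    raw = (imm20 << 20) | (imm19_12 << 12) | (imm11 << 11) | (imm10_1 << 1)
    return sign_extend(raw, 21)

def is_early_jump(insn):
    """True for the instructions eligible for decode-stage jumps: c.j, c.jal, jal."""
    if (insn & 0b11) == 0b11:
        # 32-bit instruction: only jal has a PC-relative immediate target.
        return (insn & 0x7F) == 0x6F
    funct3 = (insn >> 13) & 0b111
    # Compressed quadrant 01: funct3 001 is c.jal (RV32 only), 101 is c.j.
    return (insn & 0b11) == 0b01 and funct3 in (0b001, 0b101)

def predict_taken_if_backwards(offset):
    """Static predictor: backward branches (negative offset) are predicted taken."""
    return offset < 0
```

For example, `jal x0, -4` encodes as 0xffdff06f, so `jal_offset(0xffdff06f)` gives -4 and `is_early_jump` accepts it, while jalr (opcode 0x67) is rejected because its target depends on a register value.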

SKW: Bitmanip xperm instruction

Implement the Bitmanip xperm.* instructions. Currently blocked by upstream bitmanip since the instruction has no opcode.

  • Decode
  • Execute
  • Verification

Run an ELF, deadlock and memory protocol

Hi there!

  1. Do you have instructions on how to run an ELF on the scarv-cpu and get the execution trace?

I wrote a little system with an scarv-cpu, adapters and simple memories, but I get a deadlock with a program as simple as below, therefore I'd like to compare with the expected execution.

  la t0, inf_loop
  csrw mtvec, t0
inf_loop: # aligned on 8 bytes
  j inf_loop

Digging into the CPU, I found that there is some instruction fetch pending (imem_recv is high but imem_ack is low), which prevents the CPU from doing the required control flow fetch. Interestingly I don't usually get problems without CSR instructions, probably because then the CPU stalls less.

Therefore, I believe I don't get the protocol at the top of the core. Is there documentation somewhere?
As far as I could reverse engineer (assuming parameter FRV_MAX_REQS_OUTSTANDING = 1), imem_req signals a memory request and imem_gnt handshakes with it (and may depend on imem_req). Then, once the data is available, the memory side asserts imem_recv and waits for the CPU to handshake using imem_ack.
In my adapter implementation, I do not assert imem_gnt if currently imem_recv && !imem_ack, because I would overwrite the data that I am offering.
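The grant rule described in this paragraph can be restated as a small Python function. This is only a sketch of the adapter behaviour as reported here, not of the core's actual protocol; the function name is hypothetical and the signal names are taken from the text above.

```python
def adapter_gnt(imem_req, imem_recv, imem_ack):
    """Grant a request unless a response is still being offered and not yet acknowledged."""
    response_blocked = imem_recv and not imem_ack
    return bool(imem_req and not response_blocked)

# Enumerate the rule for every combination of the three signals.
for req in (False, True):
    for recv in (False, True):
        for ack in (False, True):
            print(f"req={req!s:5} recv={recv!s:5} ack={ack!s:5} -> gnt={adapter_gnt(req, recv, ack)}")
```

Under this rule, any cycle with imem_recv high and imem_ack low withholds the grant, which matches the pending-fetch state observed above.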

  2. Where am I misunderstanding the protocol? Do you have documentation somewhere?

Thanks!
Flavien

SKW: Crypto Bitmanip Instructions

Decode:

  • rotate
  • grev
  • [un]shfl
  • clmul[h]
  • andn, xnor, orn
  • pack
  • xperm See #39

Execute:

  • rotate
  • grev
  • [un]shfl
  • clmul[h]
  • andn, xnor, orn
  • pack
  • xperm See #39

Verification:

  • rotate
  • grev
  • [un]shfl
  • clmul[h]
  • andn, xnor, orn
  • pack
  • xperm See #39
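For the verification items, a software reference model can be useful to compare against. The sketch below gives 32-bit Python models for some of the listed instructions, following their usual definitions in the Bitmanip draft; the function names are hypothetical and the models are not taken from this repository. [un]shfl and xperm are left out.

```python
MASK32 = 0xFFFFFFFF

def ror(rs1, rs2):
    """Rotate right by the low 5 bits of rs2."""
    sh = rs2 & 31
    return ((rs1 >> sh) | (rs1 << ((32 - sh) & 31))) & MASK32

def grev(rs1, rs2):
    """Generalised bit reverse: swap adjacent groups of 1, 2, 4, 8, 16 bits as selected by rs2."""
    x = rs1 & MASK32
    stages = [(1, 0x55555555), (2, 0x33333333), (4, 0x0F0F0F0F),
              (8, 0x00FF00FF), (16, 0x0000FFFF)]
    for width, low_mask in stages:
        if rs2 & width:
            x = ((x & low_mask) << width) | ((x & ~low_mask & MASK32) >> width)
    return x & MASK32

def clmul(rs1, rs2):
    """Low 32 bits of the carry-less product."""
    x = 0
    for i in range(32):
        if (rs2 >> i) & 1:
            x ^= rs1 << i
    return x & MASK32

def clmulh(rs1, rs2):
    """High 32 bits of the carry-less product."""
    x = 0
    for i in range(1, 32):
        if (rs2 >> i) & 1:
            x ^= rs1 >> (32 - i)
    return x & MASK32

def andn(rs1, rs2):
    return rs1 & ~rs2 & MASK32

def orn(rs1, rs2):
    return (rs1 | ~rs2) & MASK32

def xnor(rs1, rs2):
    return ~(rs1 ^ rs2) & MASK32

def pack(rs1, rs2):
    """Low halfword of rs1 in the low half, low halfword of rs2 in the high half."""
    return ((rs1 & 0xFFFF) | (rs2 << 16)) & MASK32
```

As a quick sanity check, `grev(x, 24)` is a byte swap and `grev(x, 31)` is a full bit reversal.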

SKW: Scalar Crypto ISE Implementation

Implement the RV32 scalar cryptography extensions.

EX Stage FU:

  • AES
  • SM3
  • SM4
  • SHA

Decode Integration:

  • AES
  • SM3
  • SM4
  • SHA

Formal Checkers:

  • AES
  • SM3
  • SM4
  • SHA

Note: Issue #35 tracks implementation of the Bitmanip instructions that the crypto ISE borrows.
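As an illustration of what a SHA formal checker would compare against, here is a Python sketch of the four SHA-256 functions from the hash standard that the scalar cryptography SHA-256 instructions compute. The function names mirror the instruction mnemonics but are otherwise hypothetical; this is a reference sketch, not the checker used in this repository.

```python
MASK32 = 0xFFFFFFFF

def ror32(x, n):
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32

def sha256sig0(x):
    """SHA-256 small sigma 0: used in the message schedule."""
    return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3)

def sha256sig1(x):
    """SHA-256 small sigma 1: used in the message schedule."""
    return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10)

def sha256sum0(x):
    """SHA-256 big sigma 0: used in the compression function."""
    return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22)

def sha256sum1(x):
    """SHA-256 big sigma 1: used in the compression function."""
    return ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25)
```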

Masking ISE: finite field instructions

Tracking the status of the finite field instructions: mask.f.mul and mask.f.aff.

  • Decoder and operand selection RTL.
  • Execute stage operation selection / routing.
  • Masking ALU instruction implementation.
  • Unit test.
  • Formal checker.

Note: This work should be done only on the scarv/xcrypto/masking-ise branch.

Load in shadow of `mul`/`div` doesn't write loaded data back to GPRs.

If a load is immediately before a mul* or div* instruction, then it does not progress into the write-back stage fast enough to "catch" returning read data. The writeback stage then ignores the returned load data and no GPR write occurs.

  • There is no unit test to catch this.
  • The riscv-formal framework doesn't catch it because the proofs aren't allowed to run for long enough.

Fixes:

  • Allow memory stage instructions to progress before EX stage instructions are finished.
  • Add unit test to catch regressions on this.

ALU input preshift

Need to specify a pre-shift amount for the ALU RHS input when doing certain instructions. This is because they auto-align their "offset" operand to their natural datatype.

  • xc.ldr.h[u]
  • xc.str.h
  • xc.ldr.w
  • xc.str.w
  • xc.scatter.h
  • xc.gather.h
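A rough Python sketch of the pre-shift idea, assuming the offset operand is scaled by the access size (shift by 1 for halfwords, 2 for words) before being added to the base. The table and function names are hypothetical, and the base-plus-scaled-offset address form is an assumption rather than a statement of the XCrypto specification.

```python
MASK32 = 0xFFFFFFFF

# Assumed pre-shift amount per instruction: log2 of the access size in bytes.
PRESHIFT = {
    "xc.ldr.h": 1, "xc.ldr.hu": 1, "xc.str.h": 1,
    "xc.ldr.w": 2, "xc.str.w": 2,
    "xc.scatter.h": 1, "xc.gather.h": 1,
}

def alu_rhs(mnemonic, offset):
    """Right-hand ALU input after the pre-shift; instructions not listed get no shift."""
    return (offset << PRESHIFT.get(mnemonic, 0)) & MASK32

def effective_address(mnemonic, base, offset):
    """Assumed address calculation: base plus the pre-shifted offset."""
    return (base + alu_rhs(mnemonic, offset)) & MASK32
```

With these assumptions, `effective_address("xc.ldr.w", 0x1000, 3)` is 0x100c and the halfword variant with the same operands is 0x1006.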

Change memory interface to AHB

Long term todo for standardising the memory interface.

  • Currently using a generic two channel system, one for requests and one for responses.
  • This is a hangover from the scarv-cpu parent project and is needlessly complex.
  • AHB is appropriate for a micro-controller, and makes building the SoC / interconnect around it much simpler than it would be for a channel-based interface.

SKW: Additional Bitmanip Instructions

Add support for complete sub-sets of the Bitmanip extension: zbb, zbp.

Decode:

  • clz, ctz
  • pcnt
  • min[u], max[u]
  • sext.h, sext.b
  • gorc
  • slo[i], sro[i].

Execute:

  • clz, ctz
  • pcnt
  • min[u], max[u]
  • sext.h, sext.b
  • gorc
  • slo[i], sro[i].

Verification:

  • clz, ctz
  • pcnt
  • min[u], max[u]
  • sext.h, sext.b
  • gorc
  • slo[i], sro[i].
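A Python sketch of 32-bit reference models for the instructions listed above, following their usual Bitmanip draft definitions. The function names are hypothetical and nothing here is taken from the repository's own checkers.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    """Reinterpret a 32-bit value as a signed integer."""
    return x - (1 << 32) if x & 0x80000000 else x

def clz(x):
    """Count leading zeros; 32 for an input of zero."""
    return 32 - (x & MASK32).bit_length()

def ctz(x):
    """Count trailing zeros; 32 for an input of zero."""
    x &= MASK32
    return 32 if x == 0 else (x & -x).bit_length() - 1

def pcnt(x):
    """Population count."""
    return bin(x & MASK32).count("1")

def min_signed(a, b):
    return a if to_signed(a) < to_signed(b) else b

def minu(a, b):
    return min(a & MASK32, b & MASK32)

def sext_b(x):
    """Sign extend the low byte to 32 bits."""
    return ((x & 0xFF) ^ 0x80) - 0x80 & MASK32

def sext_h(x):
    """Sign extend the low halfword to 32 bits."""
    return ((x & 0xFFFF) ^ 0x8000) - 0x8000 & MASK32

def gorc(rs1, rs2):
    """Generalised OR-combine: like grev, but ORs the swapped groups into the value."""
    x = rs1 & MASK32
    stages = [(1, 0x55555555), (2, 0x33333333), (4, 0x0F0F0F0F),
              (8, 0x00FF00FF), (16, 0x0000FFFF)]
    for width, low_mask in stages:
        if rs2 & width:
            x |= ((x & low_mask) << width) | ((x & ~low_mask & MASK32) >> width)
    return x & MASK32

def slo(rs1, rs2):
    """Shift left, filling vacated bits with ones."""
    sh = rs2 & 31
    return ((rs1 << sh) | ((1 << sh) - 1)) & MASK32

def sro(rs1, rs2):
    """Shift right, filling vacated bits with ones."""
    sh = rs2 & 31
    return ((rs1 & MASK32) >> sh) | (~(MASK32 >> sh) & MASK32)
```

The max[u] variants follow the same pattern as the min models with the comparison reversed.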

RVFI Extensions

Extend the RVFI internal signals and ports to handle the increase in operands needed by XCrypto and Bitmanip.

  • rs3_addr
  • rs3_data
  • mem_rstrb -> mem_rstrb_0/1/2/3
  • mem_wstrb -> mem_wstrb_0/1/2/3
  • mem_addr -> mem_addr_0/1/2/3
  • mem_rdata -> mem_rdata_0/1/2/3
  • mem_wdata -> mem_wdata_0/1/2/3

These signals will be ignored by the normal riscv-formal flow and checkers.

A new flow and set of checkers will be built for the XCrypto and Bitmanip instructions, orthogonal to the riscv-formal flow.

SKW: Remove XCrypto functionality, leaving FENL only.

Depends on:

Tasks:

  • Remove XCrypto instructions from decoder.
  • Remove XCrypto execution units.
    • frv_asi
    • frv_bitwise
    • RNG - Remove for now. Will be re-added in future. See #24.
  • ALU: Replace Packed Adder
  • ALU: Replace Packed Shift/Rotate
  • MDU: Remove packed instruction options.
  • LSU: remove indexed load/store.
  • Remove XCrypto feature parameters.

SKW: Move timers and counters out of the CPU hierarchy.

Depends on:

  • #22 - memory interface change.

Tasks:

  • Change module interface to support new memory interface. - SoC level issue now.
  • Add counter module interfaces for instruction retired etc.
  • Add core top-level interfaces for instruction retired etc.

Notes:

  • They will later be part of the SoC.

SKW: Upgrade the `MDU`

Depends on:

Tasks:

  • Replace the existing packed/carryless multiplier module with a better one, possibly adapted from this
  • Ensure all riscv-compliance tests are passing.

SKW: Fix fetch buffer performance and overflows.

The fetch buffer currently has one functional bug and one performance bug:

  • It can end up trying to fit too much data in the fetch buffer, which results in lost instructions.
  • It cannot handle mixed 32/16-bit instruction sequences at full speed.

Tasks:

  • Assertion to prove the buffer never overflows.
  • Fix the overflow bug.
  • Confirm that the core can sustain 1 IPC on straight-line code.
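To make the overflow condition concrete, here is a small Python model of a halfword-granular fetch buffer. It is a behavioural sketch only: the class name, the capacity of four halfwords and the 32-bit fetch width are assumptions, not details of the actual RTL. The assertion in `push_word` corresponds to the "buffer never overflows" property in the task list.

```python
from collections import deque

class FetchBuffer:
    """Queue of 16-bit parcels fed by 32-bit fetches and drained one instruction at a time."""

    def __init__(self, capacity_halfwords=4):
        self.capacity = capacity_halfwords  # assumed depth, in halfwords
        self.data = deque()

    def can_accept(self, n_halfwords=2):
        """Only request a fetch when the whole response is guaranteed to fit."""
        return len(self.data) + n_halfwords <= self.capacity

    def push_word(self, word):
        """Store a 32-bit fetch response as two halfwords, low half first."""
        assert self.can_accept(2), "fetch buffer overflow"
        self.data.append(word & 0xFFFF)
        self.data.append((word >> 16) & 0xFFFF)

    def pop_instruction(self):
        """Return the next complete instruction, or None if it is not fully buffered."""
        if not self.data:
            return None
        if (self.data[0] & 0b11) != 0b11:
            # Low two bits other than 11 mark a 16-bit compressed instruction.
            return self.data.popleft()
        if len(self.data) < 2:
            return None  # upper half of a 32-bit instruction has not arrived yet
        low = self.data.popleft()
        high = self.data.popleft()
        return (high << 16) | low
```

A mixed sequence exercises the awkward case: after pushing 0x00130001, the first pop returns the compressed instruction 0x0001, and the second returns None until the next fetch supplies the upper half of the 32-bit instruction.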

Vectored Interrupt Support

Add vectored interrupt support to the scarv-cpu.

  • Requires #16 to be fixed first.

frv_csrs module:

  • mtvec support for "vectored" and "direct" interrupt modes.
  • Only support 64-byte aligned mtvec.base values when in vectored interrupt mode, so we can just "or" the scaled cause value into it to get the target vector.
  • Support any 4-byte alignment when in direct mode.

frv_pipeline_writeback module:

  • Calculate correct target address based on cause, interrupt mode and control flow change cause.
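The target calculation can be sketched in Python as follows. The mtvec layout (MODE in bits 1:0, with 0 for direct and 1 for vectored) and the rule that only interrupts are vectored come from the RISC-V privileged specification; the function name is hypothetical. With a 64-byte aligned base, OR-ing in the scaled cause equals adding it as long as the cause is below 16.

```python
def trap_target(mtvec, cause, is_interrupt):
    """Return the PC to jump to when a trap with the given cause is taken."""
    mode = mtvec & 0b11
    base = mtvec & ~0b11 & 0xFFFFFFFF
    if mode == 1 and is_interrupt:
        # Vectored mode: the base must be 64-byte aligned so the OR acts as an add.
        assert base % 64 == 0, "vectored mode requires a 64-byte aligned base"
        assert 0 <= cause < 16, "scaled cause must fit inside the 64-byte window"
        return base | (cause << 2)
    # Direct mode, or a synchronous exception in vectored mode: jump to the base.
    return base
```

For example, a machine timer interrupt (cause 7) with mtvec set to 0x80000001 targets 0x8000001c, while an exception with the same mtvec targets 0x80000000.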

Core Support Package.

A library / single header which contains intrinsics, constants and functions related to managing the core.

  • CSR Access functions.
  • Machine-mode timer control and access.
  • User mode timer control and access.
  • Time base access.

Failing XCFI Coverage

Some of the XCFI Coverage proofs are failing unexpectedly as of 4906ac7:

insn_clmul_bmc/logfile.txt:45:SBY 14:26:52 [
insn_clmul_cov/logfile.txt:48:SBY 14:23:39 [
insn_clmulh_bmc/logfile.txt:45:SBY 14:27:14 [
insn_clmulh_cov/logfile.txt:48:SBY 14:24:53 [
insn_clmulr_bmc/logfile.txt:45:SBY 14:27:47 [
insn_clmulr_cov/logfile.txt:48:SBY 14:24:55 [
insn_xc_aesmix_dec_cov/logfile.txt:54:SBY 14:25:07 [
insn_xc_aesmix_enc_cov/logfile.txt:54:SBY 14:23:50 [
insn_xc_aessub_dec_cov/logfile.txt:55:SBY 14:23:57 [
insn_xc_aessub_decrot_cov/logfile.txt:54:SBY 14:23:56 [
insn_xc_aessub_enc_cov/logfile.txt:54:SBY 14:23:56 [
insn_xc_aessub_encrot_cov/logfile.txt:54:SBY 14:25:16 [
insn_xc_mmul_3_bmc/logfile.txt:45:SBY 14:26:17 [
insn_xc_mmul_3_cov/logfile.txt:48:SBY 14:23:38 [
insn_xc_pror_i_cov/logfile.txt:54:SBY 14:25:06 [
insn_xc_psll_i_cov/logfile.txt:54:SBY 14:25:58 [
insn_xc_sha256_s0_cov/logfile.txt:54:SBY 14:25:05 [
insn_xc_sha256_s1_cov/logfile.txt:54:SBY 14:25:02 [
insn_xc_sha256_s2_cov/logfile.txt:54:SBY 14:23:50 [
insn_xc_sha256_s3_cov/logfile.txt:54:SBY 14:26:11 [
insn_xc_sha3_x1_cov/logfile.txt:54:SBY 14:24:59 [
insn_xc_sha3_x2_cov/logfile.txt:54:SBY 14:25:04 [
insn_xc_sha3_x4_cov/logfile.txt:54:SBY 14:23:50 [
insn_xc_sha3_xy_cov/logfile.txt:54:SBY 14:23:47 [
insn_xc_sha3_yx_cov/logfile.txt:54:SBY 14:26:11 [

Add a Travis CI flow.

Depends on:

  • #20 - riscv-compliance checks passing.

Tasks:

  • Tool setup scripts. See this example
  • RISC-V compliance check flow.
  • Unit Tests Flow.
  • Yosys Synthesis Flow.
  • Designer Assertions Flow.
  • (partial) riscv-formal flow.
  • Embench flow. See #27.

Integrate xcrypto-rtl::xc_malu module

Use the xc_malu module from the xcrypto-rtl repository in place of the current frv_alu_muldiv module.

  • Remove the MP Instructions Class
    • Merge xc.mror into the Integer ALU class bitmanip class.
    • Merge xc.madd,xc.msub, xc.macc,xc.mmul into the Mul/Div Class.
    • Alter frv_common.vh: Remove *MPI* constants.
    • Alter decoder to remove MPI class and re-distribute instructions as above.
  • Remove old frv_alu_muldiv and replace with xc_malu.
  • Add formal checkers as per #5
    • xc.madd
    • xc.msub
    • xc.macc
    • xc.mmul
    • Update with riscv-formal pseudo function for mul*, div*, and rem* instructions.

CSR illegal instruction traps.

Currently, the CPU has no way for the frv_csrs module to indicate a bad write to a CSR field, or non-existent CSR.

  • Add output wire from frv_csrs to indicate a bad write to a field or a non-existent CSR.
  • Add illegal opcode trap in writeback stage when this happens.
  • Test for bad field write.
  • Test for non-existent CSR write.
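A Python sketch of the check the new output wire would represent. The set of implemented CSRs is a hypothetical subset chosen for illustration; the rule that CSR address bits 11:10 equal to 11 mark a read-only register comes from the RISC-V privileged specification. Bad writes to individual fields are not modelled here.

```python
# Hypothetical subset of implemented CSRs: mstatus, mtvec, mepc, mcause, mhartid.
IMPLEMENTED_CSRS = {0x300, 0x305, 0x341, 0x342, 0xF14}

def csr_access_is_illegal(addr, is_write):
    """True when the access should raise an illegal instruction trap."""
    if addr not in IMPLEMENTED_CSRS:
        return True  # non-existent CSR
    read_only = ((addr >> 10) & 0b11) == 0b11
    return is_write and read_only  # write to a read-only CSR
```

With this table, writing mhartid (0xf14) or touching the unimplemented address 0x7ff traps, while writing mtvec (0x305) does not.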

Implementation Profiles

We should define 2 or 3 implementation profiles:

  • minimal: The bare minimum required to run the core, with XCrypto included.
  • standard: Includes things like counters and other small extras.
  • full: All of the above, plus use the "fast" versions of each functional unit where applicable.

This sort of profiling helps tame the implementation space, and is similar to how commercial core configurations are specified. This in turn makes verification and implementation runs easier.

A suggested set of configurations:

Feature                        minimal  standard  full
RV32IMC                        x        x         x
XCrypto                        x        x         x
External/Software Interrupts   x        x         x
Operand Forwarding                      x         x
Counters                                x         x
Fast Multiply                                     x
Fast AES SBox                                     x
Fast AES Mixcolumn                                x
Fast Long Accumulate/Sub                          x

xcrypto-formal Flow and Checkers

The new XCrypto and borrowed Bitmanip instructions need a new set of formal checkers and a flow to run them.

Each checker will be a single verilog module following the same template as the riscv-formal checkers, but with the additional signals from scarv/scarv-soc#1.

SymbiYosys will be used to manage the verification flow and the running of the BMC engines.

  • Checker template.
  • Macros.
  • SymbiYosys flow.

Instruction Checkers:

  • xc_ldr_b

  • xc_ldr_h

  • xc_ldr_w

  • xc_ldr_bu

  • xc_ldr_hu

  • xc_str_b

  • xc_str_h

  • xc_str_w

  • xc_mmul_3

  • xc_macc_1

  • xc_madd_3

  • xc_msub_3

  • xc_rngtest

  • xc_rngsamp

  • xc_rngseed

  • xc_padd

  • xc_psub

  • xc_pror

  • xc_psll

  • xc_psrl

  • xc_pror_i

  • xc_psll_i

  • xc_psrl_i

  • xc_pmul_l

  • xc_pmul_h

  • xc_pclmul_l

  • xc_pclmul_h

  • xc_scatter_b

  • xc_scatter_h

  • xc_gather_b

  • xc_gather_h

  • xc_aessub_enc

  • xc_aessub_encrot

  • xc_aessub_dec

  • xc_aessub_decrot

  • xc_aesmix_enc

  • xc_aesmix_dec

  • xc_sha3_xy

  • xc_sha3_x1

  • xc_sha3_x2

  • xc_sha3_x4

  • xc_sha3_yx

  • xc_sha256_s0

  • xc_sha256_s1

  • xc_sha256_s2

  • xc_sha256_s3

  • b_cmov

  • b_ror

  • b_rori

  • xc_lut

  • xc_bop

  • xc_mror

  • b_fsl

  • b_fsr

  • b_fsri

  • b_clmul

  • b_clmulr

  • b_clmulh

  • b_bdep

  • b_bext

  • b_grev

  • b_grevi

  • Depends on scarv/scarv-soc#1

External interrupt causes and NMI

Currently the CPU supports two external interrupt sources:

  • int_external
  • int_software

These should be extended in the following ways:

  • Add a non-maskable interrupt pin (NMI). This should have its own documented cause code.
  • Add a cause field to the int_external line which finds its way to mcause when such an interrupt is taken.
  • Add a cause field to the int_software line which finds its way to mcause when such an interrupt is taken.

Core Complex

Implement a "core complex" module, which wraps the CPU core with:

  • A dual port, tightly coupled RAM.
  • A single port, tightly coupled ROM.
  • The TRNG / pollentropy entropy source.
  • The memory mapped machine timers and counters. See #28.
  • The physical memory protection registers, when implemented. See #37.
  • An "external" memory port out of the CCX to peripherals in a wider SoC system.

The core-complex wraps the CPU in a "drop-in" style module, containing everything the CPU needs to run. This simplifies the construction of the SoC significantly, and lets people re-use it more easily.

Todo List:

  • RAM / ROM modules.
  • scarv_ccx_top module.
  • Interconnect
    • Arbiter
    • Router
    • Hooks for PMP checks. - Decided PMP will live inside the core level.
    • Memory interface definition.
  • Memory mapped IO module for:
    • TRNG access.
    • Memory mapped counters.

External RNG Interface

Define and implement a CPU core level external interface for a random number generator.

Must support:

  • Seeding
  • Sampling
  • Health Checking

Better XCrypto feature selection

Currently, XCrypto instruction classes can be dis/enabled using top level core parameters.
These only affect the decoder, so lots of downstream logic in the decoder and the execute stage is left in place when it should be optimised away.
This issue captures progress toward making that logic properly parameterisable.

  • RANDOMNESS
  • MEMORY
  • BIT
  • PACKED
  • MULTIARITH
  • AES
  • SHA2
  • SHA3
  • LEAK

SKW: Change memory interfaces to Wishbone style:

Depends on:

  • None

Tasks:

  • Instruction memory interface.
  • Data memory interface.
  • Simulation / testbench memory interface models.

Notes:

  • req/gnt interface.
  • CPU always accepts responses instantly.

Core Execute: XCrypto and Bitmanip functionality

Functional unit modification and additions for XCrypto and Bitmanip.

  • New functional unit for SHA256/SHA512/SHA3
  • New functional unit for AES
  • New multi-precision functional unit.
  • New RNG functional unit.
  • Swap ALU functional unit for packed implementation from scarv/xcrypto-rtl
    • Add funnel shift implementation.
    • Add cmov implementation.
    • Add xc.lut and xc.bop implementation.
  • FU result multiplexing for next pipeline stage.

Scatter/Gather Implementation

New memory and writeback stage functionality for indexed load/store and scatter/gather instructions.

  • xc.scatter.b
  • xc.scatter.h
  • xc.gather.b
  • xc.gather.h

Embench IoT Flow

The Embench flow currently has some benchmarks which fail their verification. These need fixing, but there is no point doing this until everything is passing the riscv-formal flow.
