openhwgroup / core-v-xif Goto Github PK

RISC-V eXtension interface that provides a generalized framework suitable to implement custom coprocessors and ISA extensions

Home Page: https://docs.openhwgroup.org/projects/openhw-group-core-v-xif/en

License: Other

SystemVerilog 92.04% Makefile 7.96%

core-v-xif's People

Contributors

Stargazers

Watchers

core-v-xif's Issues

How to deal with synchronous exception or debug trigger match on non-last memory request?

Should the following memory requests be generated by the coprocessor (so always ending with a memory request with last=1) or should the coprocessor immediately stop generating memory requests for the same id?

Spec unclear about how speculative memory transactions are handled

What if a coprocessor signals a memory transaction as speculative, but at the same time or later the CPU signals commit_kill = 0 for that instruction (is it then still speculative)?
What happens to the memory request and memory result handshakes when a speculative memory transaction receives a commit_kill = 0 (is it allowed to complete or will it get dropped immediately)?

Are exceptions generated by a coprocessor a use-case

The current definition of CV-X-IF allows a coprocessor to raise an exception. This is true also without the memory interface implemented. RISC-V has taken an approach to not raise exceptions beyond illegal instruction exceptions for unprivileged instructions.
Floating point exceptions only result in a CSR being updated. The vector specification does mention several conditions under which a trap occurs, but those are all illegal instruction exceptions which can be caught in the decoder, i.e. can be communicated via not accepting the instruction for offload.

Long story short, I don't see the use-case for issue_resp.exc, and the exception signals in the result interface are optionally tied to the memory interface.

Missing image for building documentation

I tried running the build, but i am getting the following:

WARNING: logo file '../images/openhw-landscape.svg' does not exist

I assume that some assumption is being made that a user will have this image provided at the given location. I think it would be very reasonable to put a copy of the logo into the repository to make it self-contained

a killed instruction can not signal a synchronous exception

"A synchronous exception will lead to a trap in CPU unless the corresponding instruction is killed" how can a killed instruction signal a synchronous exception? knowing that there is no result for killed instructions.
@silabs-robin

Clarify err - NMI interaction

Bus errors are signaled to the coprocessor for offloaded loads/stores via the err signal (see https://docs.openhwgroup.org/projects/openhw-group-core-v-xif/x_ext.html#memory-result-interface). It is also stated that NMI is signaled for such bus errors. It is the intention that the processor (as opposed to the coprocessor) signals the NMI, but this is not explicitly stated. Maybe NMIs should not even be mandated, but only allowed (such that the processor can itself decide how it wants to handle bus errors (precise or imprecise).

Should the parameters be signed?

Hello,
I have noticed that many parameters in the standard are 'int' instead of 'unsigned int'. Since the legal values of these parameters are all positive integers, I think the parameters does not need to be signed int.

Clarification on unaligned accesses

Specifically, if an unaligned memory operation crosses a X_MEM_WIDTH boundary, then it shall be broken into multiple transactions on the memory (request/response) interface by the coprocessor.

Why would this be mandated to be performed by the coprocessor? It is not necessary if the CPU itself is able to perform unaligned memory operations (in whatever way).

Parameter to inform coprocessor about possible load/store exceptions?

The possibility for a synchronous exception during the handling of offloaded loads/stores as desribed in https://docs.openhwgroup.org/projects/openhw-group-core-v-xif/x_ext.html#memory-request-response-interface) has a potentially significant impact on the coprocessor design and/or performance (e.g. consider an exception during an offloaded vector load instruction). We might want to add a (hardware compile time) parameter for coprocessors so that they can know if they are used in a context where such exceptions are possible (both 'yes' and 'no' would need to be supported by the coprocessor). A processor should simply document whether it can potentially generate exceptions during the handling of offloaded loads/stores or not.

It could also be considered to let the processor provide such information at run-time, but that seems overly complex.

Address width on x_mem_req_t needs to be generalized

Address width on memory request needs to be generalized in width:

What are the virtual address widths supported by RISC-V?

Tables are difficult to read

limit width/force breaking of long text

XIF result interface `we` signal type is not consistent with description

The type definition of the we signal in the result packet of the XIF result interface is not consistent with its description. The type of that signal is defined as logic [X_RFW_WIDTH-XLEN:0], which yields a signal with a width of 1 bit in case X_RFW_WIDTH == XLEN and a width of 33 bits in case X_RFW_WIDTH is 64 and XLEN is 32. However, the description states that:

we is 2 bits wide when XLEN = 32 and X_RFR_WIDTH = 64, and 1 bit wide otherwise. If we is 2 bits wide, then we[1] is only allowed to be 1 if we[0] is 1 as well (i.e. for dual writeback).

I assume that the type of the we signal is meant to be logic [X_RFW_WIDTH/XLEN-1:0] instead. Also, the description uses X_RFR_WIDTH instead of X_RFW_WIDTH.

Update copyright notice

Clarify the mode should follow the effective privilege level

Currently, mode is specified to be the privilege level of the CPU. However, it should really be the effective privilege level. If machine mode software changes its effective privilege mode by setting mstatus.MPRV, instructions should be offloaded at the effective level.

Add support for fault-only-first loads to memory (request/response) interface

Register file data towards coprocessor

The current interface has the register file data as part of the instruction issue, although with valid bits that are apparently meant to be allowed to be asserted several cycles later.

I'm sure this will be fine for some micro-architectures, but when forwarding is needed it may cause unnecessary stalls.

For example, on the NOEL-V (stages: fetch, decode, register access, execute, memory, exception, write-back) we currently issue FPU instructions from the execute stage (will be moved to an earlier point in the future), but results from the past few integer instructions may not be available until a couple of cycles later (besides memory accesses, there is also another set of ALUs in the exception stage). So, waiting on that data forwarding before issueing would cause a couple of stall cycles.

Instead we treat the register accesses the same way as memory reads, and pass along the data after the exception stage (providing the instruction ID to identify it). That way we can each cycle issue an FPU instruction that requires data from the integer side - it will just not see the data until some cycles later (the FPU pipeline must accept it whenever it arrives, stalling at some point if necessary).

Note that there is no requirement that the register data must be late. The exact same interface (thanks to the supplied ID) could be used to pass data earlier (but doing it from multiple points in the pipeline would make things more complicated, obviously).

In our case we use the exact same mechanism as for memory reads, since the FPU instructions never take more than one integer argument. For CV-X-IF there will need to be several "channels" for register data (well, I suppose it could be possible to allow for multiple consecutive data returns on fewer "channels", if a register number was passed along - might be useful for smaller implementations).

Add parameter with specification version

In the current list of parameters, i would suggest adding major and minor version of the specification. This will improve determining compatibility of components

Make ordering rules more explicit

Clarify following line from spec:

A commit interface handshake cannot be initiated before the corresponding issue interface handshake is initiated.
A memory (request/response) interface handshake cannot be initiated before the corresponding issue interface handshake is initiated.
A memory result interface transactions cannot be initiated before the corresponding memory request interface handshake is completed. Note that a coprocessor shall be able to tolerate memory result transactions for which it did not perform the corresponding memory request handshake itself.
A result interface handshake cannot be initiated before the corresponding issue interface handshake is initiated.
A result interface handshake cannot be initiated before the corresponding commit interface handshake is initiated (and the instruction is allowed to commit).

Proposed rewrite:

A commit interface handshake cannot be initiated before the corresponding issue interface handshake is initiated. It is allowed to be initiated at the same time or later.
A memory (request/response) interface handshake cannot be initiated before the corresponding issue interface handshake is initiated. It is allowed to be initiated at the same time or later.
Memory result interface transactions cannot be initiated before the corresponding memory request interface handshake is completed. They are allowed to be initiated at the same time as or after completion of the memory request interface handshake. Note that a coprocessor shall be able to tolerate memory result transactions for which it did not perform the corresponding memory request handshake itself.
A result interface handshake cannot be initiated before the corresponding issue interface handshake is initiated. It is allowed to be initiated at the same time or later.
A result interface handshake cannot be initiated before the corresponding commit interface handshake is initiated (and the instruction is allowed to commit). It is allowed to be initiated at the same time or later.

List dependencies for building documentation in readme

Dependencies should be explicit, not for a user to find out in trial and error. From what I can see, those packages are needed

pip install recommonmark
pip install sphinxcontrib-svg2pdfconverter
pip install sphinx_github_changelog
pip install sphinx-rtd-theme

Specify how the commit handshake should be interpreted when an instruction is not accepted by a coprocessor

Even if a coprocessor does not accept a specific offloaded instruction, then the CPU still has to provide a related handshake over the commit interface. It should be stated that the coprocessor will receive this handshake, but that it can ignore the value of commit_kill (as from the coprocessor's point of view there is no offloaded instruction to commit or kill). This prevents a combinatorial path from the issue response (accept) to the commit interface.

Inconsistency between issue and result interface

Issue response uses writeback and dualwrite signals to communicate to the CPU that it will writeback to one or two registers. There is no requirement that states dualwrite may not be high when writeback is low.
Result uses a we signal, which is two bits wide for register pairs. The bit corresponding to the odd register may only be asserted if the one for the even is asserted.

The issue solution is a bit cleaner when it gets to how this is supposed to be used, if you would add the requirement that either dualwrite shall be driven low when writeback is low, or if you state that the CPU should ignore dualwrite signal if writeback is 0.

The we solution has the advantage of being parameterized, i.e. you do not have the dualwrite signal in the interface when not supporting pairs. Thus, I would be inclined to change the writeback signal to the type of we.

In more general terms, do we actually need those signals on the result interface? The CPU knows which registers this instruction is writing back to based of the issue response. Maybe this can be removed alltogether.

Width of uncompressed instructions

In the compressed response type and the issue interface, the instr field is defined as logic [31:0]. However, lowest bits will always be b11. Why carry this all the way through? Imo logic [31:2] would make a lot of sense, as you wouldn't want a coprocessor to look at the lower two bits.

Spec unclear about retracting memory request transaction

The following contradict each other:

A memory request transaction is defined as the combination of all mem_req signals during which mem_valid is 1 and the id remains unchanged. I.e. a new transaction can be started by just changing the id signal and keeping the valid signal asserted.
- The valid signals are not allowed to be retracted by a |coprocessor| (e.g. once mem_valid is asserted it must remain asserted until the handshake with mem_ready has been performed).

x_mem_resp_t type needs to be expanded with debug trigger match information

If a debug trigger with 'before timing (mcontrol.timing=0)' has been configured on the data address of an offloaded load or store, then such load/store shall not be allowed to complete in case of a trigger match. The coprocessor (which would normally have been told that it was allowed to commit such load/store already) needs to be informed of such trigger match and that the instruction is not allowed to complete.

The proposal is to add this information to the x_mem_resp_t type.

Parameterization of capabilities

I have a question on expectations and decision making in the interface. As an example, lets take dual read ability in the issue interface.

The specification will require the interface to include a logic [2:0] dualread signal no matter what. There is no parameterization of the signal based on X_DUALREAD. So the specification admits that some cores will not support dual read (or just limited), but at the same time wants the dualread signal to be present. Looking at implementations, e.g. CV32E40X does carry the dualread signal, while not being dualread capable.

Can someone give me background why it is done this way?

XIF needs access to mstatus

The vector spec states:

Attempts to execute any vector instruction, or to access the vector CSRs, raise an illegal-instruction exception when mstatus.VS is set to Off.
When mstatus.VS is set to Initial or Clean, executing any instruction that changes vector state, including the vector CSRs, will change mstatus.VS to Dirty.

Therefore mstatus needs to become part of the XIF protocol (e.g. part of the issue/result interface)

A similar requirement exists for the FPU

Add rules to prevent deadlock

Rules need to be added in the specification to prevent system deadlock caused by interfaces waiting on each other in a circular fashion.

Issue with changelog may require update

When trying to generate the documentation, the following exception was seen:

Exception occurred:
  File "C:\Users\nxa23603\AppData\Local\Programs\Python\Python310\lib\site-packages\sphinx_github_changelog\changelog.py", line 44, in compute_changelog
    return no_token(changelog_url=options["changelog-url"])
KeyError: 'changelog-url'

From the documentation, I concluded I have to add this specified

:changelog-url: https://docs.openhwgroup.org/projects/openhw-group-core-v-xif/en/latest/preface.html

Exception is gone, and documentation is generated. I assume this happens after a certain version of sphinx_github_changelog

Elaborate on use-case for changing issue_req.id

From the documentation, it is not clear what the benefit of changing id is over deasserting valid.

Custom CSRs

The manual states:

The eXtension interface enables extension of CPU with:

[...]

Custom CSRs and related instructions.

However, I don't really see something related to CSRs. What is the intended direction? Is it intended, that it is possible to add CSRs in the custom ranges of CSRs (e.g. 0x800-0x8FF for M-mode R/W CSRs). As those would be addressable by Zicsr, I think this would require a dedicated interface to add such CSRs to the core.

Opinions?

Question about stability of invalid signals during XIF transactions

Hi, I have a question regarding the stability of signals during XIF transactions.

At least the memory request/response interface and the result interface require that signals remain stable during an ongoing transaction (which starts once the associated *_valid signal is asserted and ends when both the *_valid and *_ready handshake signals are asserted simultaneously).

Some of the signals that are part of these transactions may be invalid. For instance, a result transaction with we set to 0 does not need to provide valid data and rd signals, since there is no writeback anyways (please correct me if that assumption is wrong). The same appears to apply to the signal exccode when exc is 0.

Does the requirement that these signals must remain stable include signals that do not provide valid data for that transaction, or would these be allowed to change? I assume that the intent of this requirement is to allow the CPU to start acting on behalf of a transaction while acknowledging it later. Since the CPU will not use invalid data, there appears to be no harm if invalid data changes during the transaction.

Sequence of result and commit interface transactions

I would like to discuss the following part of the specification

A coprocessor is only allowed to provide a result for an instruction once the core has indicated (via the commit interface) that this instruction is allowed to be committed.

As CPUs become more complex, issuing multiple instructions, the chances get significantly higher that copro instruction complete execution before being committed. This will stall the pipeline quite a bit. Allowing instructions to writeback the result before being committed can solve this. What the CPU does with this is secondary. It could use the result within a reorder buffer.

Of course, it is a parameter of the CPU if it is able to deal with results being written back before they are committed. On the other hand, this does take away complexity from the coprocessor, especially if it doesn't have state of its own.

Looking forward to your input.

Increase minimum ID_WIDTH

ID_WIDTH is currently specified as being between 1 and 32. Requiring support for ID_WIDTH = 1 causes artificial corner cases already in relatively short pipelines. E.g. a CPU would have to put back pressure on its compressed interface if there is already more than 1 other offloaded instruction in flight (which would be quite normal). Mandating a minimum width of say 3 would allow for 8 in-flight instructions, which would not require a back pressure mechanism to be implemented on normal use cases for the current OpenHW cores that are on the roadmap. On the CPU side this does not imply additional hardware cost as the CPU is still free to only support only 1 or 2 in-flight offloaded instructions (and tie of some MSBs of the ID). On the coprocessor side the increased minimum does imply an increased minimum cost (but not added complexity).

rs_valid needs clarification

rs_valid is currently a bit underspecified.

the use of those signals is not very explicit. The spec does state:

A coprocessor can (only) accept an offloaded instruction when: The required source registers are marked valid by the offloading core (issue_valid is 1 and required bit(s) rs_valid are 1).

but it is not quite clear what required means.
the definition of rs_valid is unclear for X_RFR_WIDTH=64, XLEN=32. What does it mean, if a bit is asserted? Without further detailing, I would assume this means that the entire register pair is valid. This would be a huge performance hit for instructions that do not make use of pairs, as they would still have to wait for the entire pair to be ready. My suggestion would be to change the Type from logic [X_NUM_RS-1:0] to logic [X_NUM_RS+X_DUALREAD-1:0] to be able to specify the validity of each individual x register

Do id values on commit interface need to be incremental?

The spec currently states "The id values of subsequent commit transactions will increment (and wrap around)".

I added that myself, but there is no strong reason for this requirement other than that it will look logical. Once we have a bit more implementation feedback we need to decide if we keep or relax this requirement.

Allow combinatorial path inside processor from memory request interface to memory response interface

The memory response (mem_resp) will depend on the memory request (except for simple processors that cannot cause a synchronous exception or 'before timing' debug trigger match based on the memory request. Therefore the processor cannot provide a default memory response and if we forbid a combinatorial path, then the memory response can be sent earliest in the next cycle. Therefore the memory request transaction will need to be 1 for at least two cycles per memory request which limits the load/store bandwidth.

I will make a PR to allow such combinatorial paths and to disallow a combinatorial path from memory response to memory request within the coprocessor (to avoid loops).

Reduce the focus of the core-v-xif project 1.0 to the X-interface (core to/from co-processor)

The current spec specifies what happens between the core and the adapter (X-if) and between the adapter and the co-processor (C-if).

We shall limit the spec between the core and the co-processor and move to an "example folder" the adapter implementation/specification together with the C-if for multiple co-processor implementations.

Size width of x_mem_req_t too limited - replace by byte enables

The size bitfield of x_mem_req_t needs to be update to support load/store widths as required per the v1.0 RISC-V 'V' Vector Extension.

Also X_MEM_WIDTH should be adapted if needed to support the Vector spec.

Contradictory requirements for result signal stability

Hi @Silabs-ArjanB

The lines below seem to contradict each other. First it says, all result signals except data must be stable (data must only be stable if we is not 0 during the result transaction) and in line 892 it says the result signals should remain stable, which implies that all result signals (including data) should remain stable.

One could remove the second sentence of line 892 to resolve this contradiction.

core-v-xif/docs/source/x_ext.rst

Lines 866 to 868 in 78a2907

 A result transaction starts in the cycle that ``result_valid`` = 1 and ends in the cycle that both ``result_valid`` = 1 and ``result_ready`` = 1. The signals in ``result`` are 

 valid when ``result_valid`` is 1. The signals in ``result`` shall remain stable during a result transaction, except that ``data`` is only required to remain stable during 

 result transactions in which ``we`` is not 0.

core-v-xif/docs/source/x_ext.rst

Line 892 in 78a2907

 The signals in ``result`` are valid when ``result_valid`` is 1. These signals remain stable during a result transaction. 

Additionally, one could remove the stability condition for the ecsdata signal, if the corresponding ecswe field is low.

Let me know what you think. I can also create a PR if you agree with these changes.

Best
Moritz

Contradiction in handshake dependency rules

Hi,

The section about handshake dependencies states that:

Transactions with an earlier issued id shall not depend on transactions with a later issued id (e.g. a coprocessor is not allowed to delay generating issue_ready = 1 because it first wants to see result_ready = 1 for an older instruction).

It seems to me that the example contradicts the rule. To my understanding, an earlier issued id would correspond to an older instruction, whereas a later issued id corresponds to a newer instruction. Hence, the rule states that older instructions (with earlier issued id) shall not depend on newer instructions (with later issued id), but the example claims the opposite, i.e., that a newer instruction must not depend on an older one.

I believe that the example's requirement that issue_ready must not be delayed due to waiting for result_ready of an older instruction cannot be met. As the coprocessor's pipeline fills up it will eventually need to delay accepting further instructions until earlier results are transmitted to the CPU.

Some parameters need clarification

In particular, X_MISA and X_ECS_XS lack sufficient explanation to understand what they would be used for in a coprocessor. They also do not quite appear in the rest of the specification like other parameters.

Spec not clear about when memory request transactions are allowed

The following sentence in the spec is not very clear:

A coprocessor is required to (only) perform a memory request transaction(s) for non-killed instructions that it earlier accepted via the issue interface as load/store instructions (i.e. loadstore is 1).

It is allowed for a coprocessor to cause memory request transactions for oflloaded instructions that have not yet received commit vs. kill info via the commit interface e.g. by setting spec = 1). Once commit_kill = 1 has been signaled, then the memory request transaction should no longer be generated (also need to be more precise if that applies in the same cycle already or only in following cycles).

Error in description related to re-use of IDs

The following text needs to be updated:

The id is a unique identification number for offloaded instructions. An id value can be reused after an earlier instruction related to the same id value has fully completed (i.e. because it was not accepted for offload, because it was killed or because it retired). The same id value will be used for all transaction packets on all interfaces that logically relate to the same instruction.

To:

The id is a unique identification number for offloaded instructions. An id value can be reused after an earlier instruction related to the same id value has fully completed (i.e. because it was not accepted for offload and the related commit handshake has been performed, because it was killed or because it performed the related result handshake). The same id value will be used for all transaction packets on all interfaces that logically relate to the same instruction.

Add system verilog package as reference

Writeback to x0

Hi @Silabs-ArjanB,

In both the "F" and "V" RISC-V standard extensions there exist instructions (e.g. vset{i}vl{i} rd, {...}, csrrw rd, fcsr, rs) that will write back to the general purpose register file only if rd != x0. Is it allowed for a coprocessor to assert writeback in the issue transaction and we in the result transaction even if rd == x0? Or more general, should a core be able to ignore writebacks to x0 (and associated signals) in general?

I know from @michael-platzer that in Vicuna writeback and we are asserted even if rd == x0. The same is true for the floating-point coprocessor I developed a while ago.

Coprocessor should take RV32E into account

CORE-V-XIF needs to define how XIF can be used together with RV32E.

An initial proposal:

A processor configured to use RV32E will declare all instructions that would use a source or destination register within x16-x31 if the RV32I base would have been used as illegal and attempt them for offload.
A processor will never indicate that a register file source is valid via rs_valid if the related source is within x16-x31
A coprocessor shall only be allowed to accept such instructions if it is not actually using any register file register within the x16-x31 range (i.e. a coprocessor shall not wait for validity via rs_valid if the related source is within x16-x31 as it will never happen and the coprocessor shall ignore rs if the related source is within x16-x31. The coprocessor shall also never write to x16-x31 via the result interface)

x_mem_result_ready needs to be added for subsystem sharing

In order to support the sharing of a subsystem between multiple cores using the x-interface, an x_mem_result_ready would have to be added. The current spec states that there is no ready because the subsystem must be ready to accept the memory result at any time. With multiple cores, this cannot be guaranteed as it would then even be possible for multiple cores to raise the x_mem_valid at the same time. The subsystem would in this case only be able to react to one of those data packets, violating the property that it must be ready to accept the memory result at any time for every core raising the valid.

The standard needs to carify CPU or coprocessor should kill instructions after an exception

Hello,
In the interface, there are three types of exceptions that both CPU and the co-processor knows, since these exception signals are passed through the interface. The three types are:

Co-processor raises an exception, and informs it to CPU via result interface.
CPU raises an exception while using load & store unit, which is dealing with a request from memory interface from co-processor. CPU informs co-processor this exception via 'mem_resp' signal in memory interface.
Same as the second case, the only different is that CPU informs co-processor this exception via memory result interface.

To support precise exception, after an exception is raised, CPU should kill every instruction after that instruction. For the three types of exception above, the instructions in co-processor that are issued but not committed should all be killed. The question is, since both CPU and accelerator knows this exception, should CPU kill these instructions via commit interface or should the accelerator kill these instructions by itself. I think this should be clarified by the standard.

I think the second one "The accelerator kill these instructions by itself. " is more reasonable. If the CPU kills the instruction via commit interface, it would take multiple cycles because CPU can only kill one instruction at a cycle. And if we choose the second choice, CPU can keep running without waiting for killing instructions.

Adding an User signal to CPU to COPROC interfaces

Shall we add a USER interface in the ISSUE interface channel (x_issue_req_t) ?

Maybe also in the memory channel (x_mem_req_t).

Can be tight to 0 if no USER signal is specified.

Generally, it can be extended to every interface (compressed, etc) but it may look ugly to have such USER signals everywhere - However, I think it may be useful for future extensions

@Silabs-ArjanB @vogelpi @moimfeld

	A result transaction starts in the cycle that ``result_valid`` = 1 and ends in the cycle that both ``result_valid`` = 1 and ``result_ready`` = 1. The signals in ``result`` are
	valid when ``result_valid`` is 1. The signals in ``result`` shall remain stable during a result transaction, except that ``data`` is only required to remain stable during
	result transactions in which ``we`` is not 0.

openhwgroup / core-v-xif Goto Github PK

core-v-xif's People

Contributors

Stargazers

Watchers

Forkers

core-v-xif's Issues

Recommend Projects

Recommend Topics

Recommend Org