
mra_tools's Introduction

Tools to extract ARM's Machine Readable Architecture Specification.

These tools unpack the ASL spec from inside the XML so that the spec is easy to process.

One blog post explains the structure of ARM's releases and describes the innards of these tools; another gives some ideas on what can be done with the specification once it has been unpacked.

Usage

The following commands will download ARM's specification and unpack it.

mkdir -p v8.6
cd v8.6

wget https://developer.arm.com/-/media/developer/products/architecture/armv8-a-architecture/2019-12/SysReg_xml_v86A-2019-12.tar.gz
wget https://developer.arm.com/-/media/developer/products/architecture/armv8-a-architecture/2019-12/A64_ISA_xml_v86A-2019-12.tar.gz
wget https://developer.arm.com/-/media/developer/products/architecture/armv8-a-architecture/2019-12/AArch32_ISA_xml_v86A-2019-12.tar.gz

tar zxf A64_ISA_xml_v86A-2019-12.tar.gz
tar zxf AArch32_ISA_xml_v86A-2019-12.tar.gz
tar zxf SysReg_xml_v86A-2019-12.tar.gz

cd ..

make all

You may need to manually add prototypes for the following functions to arch.asl:

bits(4) _MemTag[AddressDescriptor desc]
_MemTag[AddressDescriptor desc] = bits(4) value;

Running make all generates:

  • arch.asl: all the ASL support code (This file uses an alternative syntax for bitslices that is easier to parse. Remove the --altslicesyntax flag from the Makefile to get the original ASL.)
  • arch.tag: all the instruction encodings and decode/execute ASL
  • arch_instrs.tag: all the instruction encodings and decode/execute ASL (alternate format)
  • arch_decode.tag: instruction decode trees in ASL
  • regs.asl: type of each system register

You can also extract various subsets of the full architecture specification. For example, if you want a subset of the usermode AArch64 instructions, you can use the following command.

make FILTER=--filter=usermode.json all

The selected subset may not contain every instruction you want; see Subsetting for more details.

Help

$ bin/instrs2asl.py  -h
usage: instrs2asl.py [-h] [--verbose] [--altslicesyntax] [--demangle]
                     [--output FILE] [--filter [FILE [FILE ...]]]
                     [--arch {AArch32,AArch64}]
                     <dir> [<dir> ...]

Unpack ARM instruction XML files extracting the encoding information and ASL
code within it.

positional arguments:
  <dir>                 input directories

optional arguments:
  -h, --help            show this help message and exit
  --verbose, -v         Use verbose output
  --altslicesyntax      Convert to alternative slice syntax
  --demangle            Demangle instruction ASL
  --output FILE, -o FILE
                        Basename for output files
  --filter [FILE [FILE ...]]
                        Optional input json file to filter definitions
  --arch {AArch32,AArch64}
                        Optional list of architecture states to extract

Subsetting

Various subsets of the architecture can be generated using these additional flags:

--arch=AArch32
--arch=AArch64
--arch=AArch32 --arch=AArch64
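
If you drive the script from Python rather than through the Makefile, the flags above can be assembled programmatically. This is a minimal sketch that uses only options listed in the help text; the helper name `build_command` and the input directory name are hypothetical.

```python
from typing import List, Optional, Sequence

VALID_ARCHS = ("AArch32", "AArch64")  # choices shown in the --arch help text


def build_command(
    dirs: Sequence[str],
    archs: Sequence[str] = (),
    filter_file: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Assemble an instrs2asl.py command line (hypothetical helper)."""
    if not dirs:
        raise ValueError("at least one input directory is required")
    cmd = ["bin/instrs2asl.py"]
    for arch in archs:
        if arch not in VALID_ARCHS:
            raise ValueError(f"unknown architecture state: {arch}")
        cmd.append(f"--arch={arch}")
    if filter_file is not None:
        cmd.append(f"--filter={filter_file}")
    if output is not None:
        cmd.extend(["--output", output])
    cmd.extend(dirs)
    return cmd


if __name__ == "__main__":
    # "ISA_A64_dir" is a placeholder, not a real directory name.
    print(" ".join(build_command(["ISA_A64_dir"], archs=["AArch64"])))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues.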

For finer control, you can supply a filter that selects exactly which instructions and which subset of the call graph to include:

make FILTER=--filter=usermode.json all

The filter is controlled by a json file that has this format:

{
    "instructions": [
        // regexp list goes here
    ],
    "roots": [
        // root definitions go here
    ],
    "cuts": [
        // cut functions go here
    ],
    "canaries": [
        // canary definitions go here
    ]
}
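
A filter file with this layout can also be generated from a script. The sketch below builds one with Python's standard `json` module, using the example entries from the descriptions that follow. The helper name `filter_to_json` is an assumption, and the entries are illustrative rather than a recommended usermode subset.

```python
import json
from typing import Iterable


def filter_to_json(
    instructions: Iterable[str],
    roots: Iterable[str] = (),
    cuts: Iterable[str] = (),
    canaries: Iterable[str] = (),
) -> str:
    """Return filter-file text with the four keys described here."""
    doc = {
        "instructions": list(instructions),  # regexps over instruction names
        "roots": list(roots),                # extra functions to keep
        "cuts": list(cuts),                  # functions reduced to prototypes
        "canaries": list(canaries),          # functions whose paths are reported
    }
    return json.dumps(doc, indent=4) + "\n"


if __name__ == "__main__":
    print(
        filter_to_json(
            instructions=["aarch64/branch/conditional/.*"],
            roots=["AArch32.ITAdvance.0"],
            cuts=["AArch64.MemSingle.read.4", "AArch64.MemSingle.write.4"],
            canaries=["ELUsingAArch32.1"],
        ),
        end="",
    )
```

Redirect the output to a file and pass that file name through FILTER=--filter=... as shown above.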

The four parts of this are:

  • 'instructions' and 'roots' define what you want to include

    • 'instructions' is a list of regexps that match instruction names. For example, "aarch64/branch/conditional/.*". You can find the list of instruction names by looking in the file arch.tag.

      grep TAG arch.tag | grep decode
      
    • 'roots' is a list of functions that you wish to keep even though they are not referred to by instructions. For example, after executing an instruction in Thumb mode, you should call "AArch32.ITAdvance()" (which has 0 arguments) so add "AArch32.ITAdvance.0" to the list of roots. The ".0" suffix indicates that the function has 0 arguments.

  • 'cuts' defines what you want to exclude.

    This should be a list of functions that you wish to provide your own implementations for. For example, if all you are interested in is usermode execution, you might want to omit all the code to implement page table lookups and replace the functions to read or write memory by adding the following to the cut list

      "AArch64.MemSingle.read.4",
      "AArch64.MemSingle.write.4",
    

    This will cause the definitions of these functions to be replaced by function prototypes.

    The right set of cuts depends on which functionality you need from the part you extract and on what you want to implement yourself in your analysis/simulation framework.

  • 'canaries' are optional but are useful when trying to understand why your 'cuts' are not behaving as intended.

    Any uncut path from the instructions or roots to a canary is reported.

    For example, if you are trying to eliminate as much of the AArch32 support as possible, you might want to omit the function "ELUsingAArch32.1". But there are many possible code paths to that function and it is hard to find which functions to cut. So add "ELUsingAArch32.1" to the list of canaries and you will get a report that looks a bit like this:

      Canary ELUsingAArch32.1 ELIsInHost.1 IsInHost.0 AArch64.TakeException.4 AArch64.UndefinedFault.0
      Canary ELUsingAArch32.1 ELIsInHost.1 S1TranslationRegime.0 ESR[ AArch64.ReportException.2 AArch64.TakeException.4 AArch64.UndefinedFault.0
      Canary ELUsingAArch32.1 ELIsInHost.1 S1TranslationRegime.0 AArch64.ReportException.2 AArch64.TakeException.4 AArch64.UndefinedFault.0
      ...
      Canary ELUsingAArch32.1 ELIsInHost.1 S1TranslationRegime.0 AArch64.TakeException.4 AArch64.UndefinedFault.0
      Canary ELUsingAArch32.1 ELIsInHost.1 AArch64.TakeException.4 AArch64.UndefinedFault.0
      Canary ELUsingAArch32.1 S1TranslationRegime.0 ESR[ AArch64.ReportException.2 AArch64.TakeException.4 AArch64.UndefinedFault.0
      Canary ELUsingAArch32.1 S1TranslationRegime.0 AArch64.ReportException.2 AArch64.TakeException.4 AArch64.UndefinedFault.0
      ...
      Canary ELUsingAArch32.1 S1TranslationRegime.0 VBAR.read.0 AArch64.TakeException.4 AArch64.UndefinedFault.0
      Canary ELUsingAArch32.1 S1TranslationRegime.0 AArch64.TakeException.4 AArch64.UndefinedFault.0
      Canary ELUsingAArch32.1 AArch64.TakeException.4 AArch64.UndefinedFault.0
    

    This shows that the final calls to ELUsingAArch32.1 are from ELIsInHost.1, S1TranslationRegime.0 and AArch64.TakeException.4. So we could choose to cut all those functions.

    It also shows that the root call to ELUsingAArch32.1 is AArch64.UndefinedFault.0 so the easiest fix is to cut just that function.
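
For long reports, this reading can be automated. The following sketch assumes each report line has the shape shown above: the word "Canary", the canary itself, then the call path ending at the root. It counts the direct callers and the roots. The function name `summarize_canaries` is hypothetical, and the real report format may differ in detail.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_canaries(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count direct callers and roots over 'Canary ...' report lines."""
    callers: Counter = Counter()
    roots: Counter = Counter()
    for line in lines:
        tokens = line.split()
        # Skip elided "..." lines and anything too short to be a path.
        if len(tokens) < 3 or tokens[0] != "Canary":
            continue
        callers[tokens[2]] += 1  # function that calls the canary directly
        roots[tokens[-1]] += 1   # instruction or root at the far end of the path
    return callers, roots


if __name__ == "__main__":
    report = [
        "Canary ELUsingAArch32.1 ELIsInHost.1 IsInHost.0 AArch64.TakeException.4 AArch64.UndefinedFault.0",
        "...",
        "Canary ELUsingAArch32.1 S1TranslationRegime.0 AArch64.TakeException.4 AArch64.UndefinedFault.0",
        "Canary ELUsingAArch32.1 AArch64.TakeException.4 AArch64.UndefinedFault.0",
    ]
    callers, roots = summarize_canaries(report)
    print("direct callers:", sorted(callers))
    print("roots:", sorted(roots))
```

A single entry in the roots counter suggests one cut at that root is enough; many entries suggest cutting closer to the canary.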

Currently implemented

  • Unpack all the ASL code in the 'shared_pseudocode' file into one giant ASL file
  • Unpack instructions to 'tagfile' format
  • Quick and dirty unpack of system register spec to ASL file

All generated files include ARM's license notice.

Shared pseudocode

The shared pseudocode is sorted so that definitions come before uses.
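
Ordering definitions before uses is a topological sort of the "uses" relation. The sketch below illustrates the idea with Python's standard `graphlib` module (Python 3.9 or later). It is not the code used by these tools, and the dependency map is a small hand-written example based on the call paths in the canary report above.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def definitions_before_uses(uses: Dict[str, Iterable[str]]) -> List[str]:
    """Order names so that everything a definition uses comes before it.

    `uses` maps each definition to the names it refers to.
    """
    # TopologicalSorter treats the mapped values as predecessors,
    # so the names a definition uses are emitted first.
    return list(TopologicalSorter(uses).static_order())


if __name__ == "__main__":
    uses = {
        "AArch64.UndefinedFault.0": ["AArch64.TakeException.4"],
        "AArch64.TakeException.4": ["ELUsingAArch32.1"],
    }
    for name in definitions_before_uses(uses):
        print(name)
```

Mutually recursive definitions form a cycle, for which `graphlib` raises `CycleError`; a real implementation would need a policy for that case.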

Tagfile format for functions

A tagfile consists of sections that start with a line of the form "TAG:$label:$kind". There are six different kinds:

  • diagram: Instruction encoding consisting of:

    • An initial line that specifies the encoding: A64, A32, T32 or T16
    • Field specifiers of the form "hi:lo name constants" where the name "_" is used for anonymous fields and each constant is of the form:
      • 0 or 1
      • x meaning don't care
      • (0) or (1) meaning 'should be 0/1' (UNPREDICTABLE if not)

    For example:

      T32
      31:25 _ 1110101
      24:21 op1 1000
      20:20 S x
      19:16 Rn 1101
      15:15 _ (0)
      14:12 imm3 xxx
      11:8 Rd xxxx
      7:6 imm2 xx
      5:4 type xx
      3:0 Rm xxxx
    
  • decode: ASL code to decode an instruction encoding

  • postdecode: Additional code to continue decoding an instruction encoding

  • execute: ASL code to execute after postdecode

  • index: Identifies the different parts of an instruction encoding and consists of:

    • Decode: [tag of decode section]@[tag of diagram section]
    • Postdecode: optional [tag of postdecode section]
    • Execute: [tag of execute section]

    There can be multiple Decode lines all sharing the same postdecode and execute parts.

  • asl: ASL definitions (e.g., function definitions)
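
The format above is simple enough to read with a few lines of Python. The sketch below splits a tagfile into sections and turns a diagram section into a mask/value pair for matching opcodes. The function names and the label in the sample are hypothetical. Treating the should-be bits (0) and (1) as don't-care when matching is an assumption of this sketch, and only the constant forms listed above are handled.

```python
import re
from typing import List, Tuple

Section = Tuple[str, str, List[str]]  # (label, kind, body lines)
BIT = re.compile(r"\([01]\)|[01x]")   # one token per encoding bit

SAMPLE = """TAG:example/label:diagram
T32
31:25 _ 1110101
24:21 op1 1000
20:20 S x
19:16 Rn 1101
15:15 _ (0)
14:12 imm3 xxx
11:8 Rd xxxx
7:6 imm2 xx
5:4 type xx
3:0 Rm xxxx
"""


def split_sections(text: str) -> List[Section]:
    """Split tagfile text at each "TAG:$label:$kind" line."""
    sections: List[Section] = []
    for line in text.splitlines():
        if line.startswith("TAG:"):
            label, kind = line[len("TAG:"):].rsplit(":", 1)
            sections.append((label, kind, []))
        elif sections:
            sections[-1][2].append(line)
    return sections


def diagram_mask_value(body: List[str]) -> Tuple[str, int, int]:
    """Return (encoding, mask, value) for the fixed 0/1 bits of a diagram."""
    lines = [l.strip() for l in body if l.strip()]
    encoding, mask, value = lines[0], 0, 0
    for field in lines[1:]:
        span, _name, constants = field.split(None, 2)
        hi, lo = (int(n) for n in span.split(":"))
        bits = BIT.findall(constants)
        if len(bits) != hi - lo + 1:
            raise ValueError(f"width mismatch in field: {field}")
        for pos, bit in zip(range(hi, lo - 1, -1), bits):
            if bit in ("0", "1"):  # x, (0) and (1) are not matched on
                mask |= 1 << pos
                value |= int(bit) << pos
    return encoding, mask, value


if __name__ == "__main__":
    label, kind, body = split_sections(SAMPLE)[0]
    enc, mask, value = diagram_mask_value(body)
    print(label, kind, enc, hex(mask), hex(value))
```

An opcode matches the diagram when `opcode & mask == value`.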

Register spec

At the moment, we unpack all the information about fields and declare a variable with the right name and with named fields. This uses an unofficial ASL extension to declare the location of each field.

__register 32 {
    31:31 N, 30:30 Z, 29:29 C, 28:28 V, 27:27 Q, 24:24 J, 22:22 PAN, 19:16 GE,
    9:9 E, 8:8 A, 7:7 I, 6:6 F, 5:5 T, 7:2, 1:0 IT, 3:0 M
} CPSR;

The system register specification also contains a lot of information about how to refer to a system register, permission checking, constant value fields, etc. but none of that is being extracted at the moment.
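
Declarations in this form are easy to consume from a script. The sketch below parses `__register` declarations into field lists and extracts a field from a register value. The names `parse_registers` and `extract_field` are hypothetical, and the syntax is inferred from the example above rather than from a grammar. An entry without a name, such as `7:2` in the CPSR example, is kept with an empty name rather than guessed at.

```python
import re
from typing import Dict, List, Tuple

Field = Tuple[int, int, str]  # (hi, lo, name); name is "" when absent

REGISTER = re.compile(r"__register\s+(\d+)\s*\{(.*?)\}\s*(\w+)\s*;", re.DOTALL)

CPSR_TEXT = """__register 32 {
    31:31 N, 30:30 Z, 29:29 C, 28:28 V, 27:27 Q, 24:24 J, 22:22 PAN, 19:16 GE,
    9:9 E, 8:8 A, 7:7 I, 6:6 F, 5:5 T, 7:2, 1:0 IT, 3:0 M
} CPSR;
"""


def parse_registers(text: str) -> Dict[str, Tuple[int, List[Field]]]:
    """Map each register name to (width, fields)."""
    regs: Dict[str, Tuple[int, List[Field]]] = {}
    for width, body, name in REGISTER.findall(text):
        fields: List[Field] = []
        for entry in body.split(","):
            parts = entry.split()
            if not parts:
                continue
            hi, lo = (int(n) for n in parts[0].split(":"))
            fields.append((hi, lo, parts[1] if len(parts) > 1 else ""))
        regs[name] = (int(width), fields)
    return regs


def extract_field(value: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


if __name__ == "__main__":
    width, fields = parse_registers(CPSR_TEXT)["CPSR"]
    print(width, fields[:3])
```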

Experimental parser, etc.

There is an experimental parser for the language written in OCaml. This requires some tools to be installed. The following instructions are for a Mac.

brew install ocaml opam
opam install menhir core

Test it using the following:

make test

At the moment, all it does is parse the ASL code extracted from the XML files. It does not yet have a typechecker.

mra_tools's People

Contributors

alastairreid, lev-rumyantsev, ojno, smattr


mra_tools's Issues

Parser failing on nested statements

This function (the second one in t.asl) is failing on the last line

boolean HaveEL(bits(2) el)
    if el IN {EL1,EL0} then
        return TRUE;                             // EL1 and EL0 must exist
    return boolean IMPLEMENTATION_DEFINED;

I think the problem is the handling of nested statements in the if-statement

'_sre.SRE_Match' object is not subscriptable at bin/instrs2asl.py:line 131

$ make all
bin/instrs2asl.py v8.2/ISA_v82A_AArch32_xml_00bet3.2 v8.2/ISA_v82A_A64_xml_00bet3.2
Traceback (most recent call last):
  File "bin/instrs2asl.py", line 404, in <module>
    sys.exit(main())
  File "bin/instrs2asl.py", line 319, in main
    (shared,names) = readShared([ f for d in args.dir for f in glob.glob(os.path.join(d, 'shared_pseudocode.xml'))])
  File "bin/instrs2asl.py", line 131, in readShared
    r.defs.add(m[2])
TypeError: '_sre.SRE_Match' object is not subscriptable
make: *** [arch.asl] Error 1

how to decode ASL

Hi,

I have been trying to understand how to use the scripts, mainly instrs2asl.py, to no avail.

I'm mainly interested in understanding and possibly converting the ASL to C or python...

For example, I have this code. Can I use the script to decode it?

len = HighestSetBit(immN:NOT(imms));
bits(6) levels;
levels = ZeroExtend(Ones(len), 6);

Many thanks
Hoss

PMCEID field oddness

After building regs.asl, PMCEID's fields look odd,

$ grep PMCEID[0-1]_EL0 regs.asl
__register 64 { 63:32 IDhi, 63:32 ID } PMCEID1_EL0;
__register 64 { 31:0 IDhi, 31:0 ID } PMCEID0_EL0;

It seems to me that this should be { 63:32 IDhi, 31:0 ID } for both, matching field_msb and field_lsb in the XML.

The script generating regs.asl appears to interpret the names "IDhi[62:32]" and "IDhi[31:0]" as bit slices, so the field names and the bit ranges do not agree. I can imagine situations in which a simulator might not notice such an issue unless tests interleave changes to both fields.

Is this working as intended?

Stand alone slices and slice syntax

Hi,

This problem is more about the specification than the parsing tools. Is a standalone slice like in <X, Y> valid ASL? The specification says nothing about that case but the SHA1C instruction has pseudocode containing <Y, X> = ROL(Y:X, 32);. To me that seems like the person meant to write (Y, X) = ROL(Y:X, 32); or Y:X = ROL(Y:X, 32); (if concatenations are assignable).

The other problem is that the new slice syntax overlaps with the array-style function syntax. A free token that comes to my mind would be <<< and >>> or { and }.

PS: If this question is off-topic feel free to close it and/or direct me to the appropriate place to ask the question.
