SPARC Architecture and Compiler Design

Abstract

This project presents the architecture of a modern RISC computer, circa 1996, and its relationship to the Unix operating system and the C programming language. The goal is to understand how high-level languages are represented in a form executable by such a computer, along with the underlying machine language and structure. The project covers the C programming language, the SPARC architecture, boolean logic, number systems, and computer arithmetic; macro assembly language programming and subroutine linkage; the operating system interface and input/output; reading the output of the C compiler; and using C to generate specific assembly language instructions.

Objective

Understand the architecture of a RISC machine, specifically the Sun SPARCstation 5 workstation; understand the microSPARC-II (code-named Swift) microprocessor, which implements the SPARC V8 instruction set architecture (ISA) developed by Sun Microsystems; learn the C language grammar and assembly language programming, the form of assembly language generated by a compiler, and the interface to an operating system; and design a complete SPARC C compiler using flex and bison.

Text Used

Richard Paul, “SPARC Architecture, Assembly Language Programming, and C.” Prentice Hall.
Brian Kernighan and Dennis Ritchie, “The C Programming Language.” Second Edition, Prentice Hall.
Samuel P. Harbison and Guy L. Steele Jr., “C: A Reference Manual.” Third Edition, Prentice Hall.
Alfred V. Aho et al., “Compilers: Principles, Techniques, and Tools.” Addison-Wesley.
John R. Levine et al., “lex & yacc.” Second Edition, O'Reilly & Associates.
Richard M. Stallman, “GNU Emacs Manual.” Free Software Foundation.

Usage

Building the compiler itself requires only a clang environment. Assembling the SPARC assembly code that the compiler generates requires either a cross-platform toolchain or a real or virtual SPARCstation 5/10/20 machine. On a Linux machine that uses apt (for example Debian or Ubuntu), run the following as root:

apt install git make clang vim bison flex
git clone https://github.com/ekbann/sparc-compiler
cd sparc-compiler
make
./CC < tests/test1.c

I also compiled it successfully on macOS, using the Xcode environment from a Terminal. An alternative is to use QEMU and Buildroot to make tiny virtual machines. The rest of this tutorial describes how to compile or assemble simple user-level programs for a SPARC V8 target and step through their execution using QEMU and gdb. It assumes you are using Linux.

Cross Compiler

A cross compiler is needed when the machine on which the compiler runs (called the host) has a different architecture (say x86) than the machine for which the executable is produced (called the target, which is SPARC V8 in our case). The simplest way of obtaining a working cross compiler is to use Buildroot.

Download the latest buildroot tarball and untar it or simply clone from GitHub:

$ sudo apt install rsync
$ git clone https://github.com/buildroot/buildroot.git buildroot

Navigate to the untarred (or cloned) buildroot directory and run the following commands:

$ cd buildroot
$ make qemu_sparc_ss10_defconfig
$ make menuconfig

This opens a text-based configuration menu. Go to Toolchain --->, scroll down, and select the option Build cross gdb for the host. This option is needed because the Buildroot configuration for SPARC V8 (qemu_sparc_ss10_defconfig) does not include a cross gdb by default. Save, exit the menu, and run make:

$ make

This downloads and builds the required packages and can take a while. When make finishes, we have a working cross compiler toolchain. The binaries (sparc-linux-gcc, sparc-linux-as, sparc-linux-gdb, etc.) are in the folder <path-to-buildroot>/output/host/usr/bin. Add this location to your system's PATH variable to use the cross compiler binaries outside Buildroot.

Compiling and Assembling a Program

Consider a simple assembly program Foo.s:

Foo.s

.global _start
_start:
		  	  ! comments start with '!'
	mov 2, %g1        !
	mov 3, %g2        !
	add %g1, %g2, %g3 ! g3 should now contain 5
	nop
	nop
	nop

Assemble and link it to get an executable Foo. (The -g option is to include debugging symbols in the generated executable).

$ sparc-linux-as -g -o Foo.o Foo.s
$ sparc-linux-ld -g -o Foo   Foo.o

Instead of assembly, you can start with a simple C program Bar.c:

Bar.c

int a,b,c=0;
int main()
{
	a=2;
	b=3;
	c=a+b;
	return 0;
}

Compile, assemble and link it as follows.

$ sparc-linux-gcc -g -S      -o Bar.s Bar.c
$ sparc-linux-as  -g         -o Bar.o Bar.s
$ sparc-linux-ld  -g -e main -o Bar   Bar.o

The -e option tells the linker the location of the first executable instruction (the entry point). Here the entry point is set to the function main(). The disassembled instructions in Bar can be viewed with the objdump utility:

$ sparc-linux-objdump -d -S Bar

Running on QEMU with gdb

Install the qemu-user package. It installs binaries for several targets, for example qemu-alpha, qemu-mips, and qemu-sparc.

$ sudo apt-get install qemu-user

In a terminal, start qemu-sparc with the -g option so that it waits for a remote gdb connection on port 1234.

$ qemu-sparc -g 1234 Foo

In another terminal, open gdb.

$ sparc-linux-gdb  Foo

Inside gdb, attach to QEMU:

(gdb) target remote :1234

In gdb, enter s to step through the assembly instructions. Use the command `info reg <reg-name>` to examine register contents.

(gdb) target remote :1234
Remote debugging using :1234
_start () at Foo.s:4
4		mov 2, %g1        !
(gdb) s
5		mov 3, %g2        !
(gdb) s
6		add %g1, %g2, %g3 ! g3 should now contain 5
(gdb) info reg g3
g3             0x0	0
(gdb) s
7		nop
(gdb) info reg g3
g3             0x5	5
(gdb)

Design Notes

Summary of the important aspects of my compiler:

The code was originally written in 1996 on a Sun SPARCstation 20 (32-bit RISC architecture) running Solaris 2.6, using gcc 2.7.x, which was not ANSI C compliant. Some parts of the code had to be rewritten or fixed to eliminate compiler warnings and errors. Pointer use in those days was somewhat hazardous, and the code sometimes relied on undefined behavior (UB) to work. Building with a modern gcc 10.2.x broke the code, causing occasional segmentation faults (try compiling with gcc and running ./CC < tests/gcc-segfault.c). Luckily, building with clang instead preserved the behavior the code depends on, and the compiler ran smoothly. Perhaps one day I'll rewrite the compiler with proper pointer usage.
All external_decls are assigned modifier type EXTERN unless specifically defined in the source code.
I added a debug directive, debug(node_dump_on) and its counterpart debug(node_dump_off), to keep track of the creation of syntax tree nodes. The output of a few sample nodes is:

node_type: STATEMENT [0x600000f7c240]
	  left:	LEAF [0x600000f7c1e0]
	 right: NODE [0x0]

node_type: STATEMENT [0x600000f7c2a0]
	  left:	LEAF [0x600000f7c180]
	 right: NODE [0x600000f7c240]

The number in square brackets is a pointer to that specific node. At the end of parsing the source code, this directive outputs the pointer to the ROOT of the program syntax tree, e.g.

syntax tree root = [0x600000f7c2a0]

This allows the user to reconstruct the syntax tree by hand and verify that it was built properly. Another way is to use the directive debug(statement_dump) to get a verbose view of the syntax tree:

*** STATEMENT DUMP

=, e_var, t_int, c_scalar
  x, e_var, t_int, c_scalar
  3, e_const, t_int, c_scalar

=, e_var, t_int, c_scalar
  y, e_var, t_int, c_scalar
  10, e_const, t_int, c_scalar

=, e_var, t_int, c_scalar
  z, e_var, t_int, c_scalar
  +, e_var, t_int, c_scalar
    x, e_var, t_int, c_scalar
    y, e_var, t_int, c_scalar

The debug directive debug(symtab_dump) dumps the symbol table at the current context level. After the closing brace of a compound statement (see statement in CC.y), the compiler deletes the context level being closed because its symbols are no longer needed.

*** SYMBOL TABLE DUMP, e_<entry type>, t_<variable type>, c_<constructor type>

bucket 24
<"main" scope 0, e_fn, t_void, c_scalar, references: 1>

bucket 4
<""hello"" scope 0, e_const, t_char, c_array, references: 1>

bucket 2
<"'c'" scope 0, e_const, t_char, c_scalar, references: 1>

The debug directives debug(comment_on) and debug(comment_off) toggle the output of comments in the generated code. This is very useful for analyzing a specific segment of code. So far, comments are emitted only for variable identification and some trivial operations such as ++, --, and register flushing. A future version of my compiler will have detailed comments in the output code.
A global structure pointer named fn_p is used to store the main function entry in the symbol table so that type checking can be performed on RETURN nodes.
Multi-source code is implemented allowing one to compile multiple sources into objects and then link them together, e.g. compiling main(), init(), sort(), and dump_array() from the tests/sort directory and linking them to an executable.
I added a node type ARRAY to represent an ID and an index.
Test code snippets can be found in the tests directory along with the multi-source programs in tests/sort and tests/euclid.
There is a basic design flaw in how constants are inserted into the symbol table. Because there is one table entry per INTEGER CONSTANT, conflicts arise in p->where when the same integer constant is used in both the LVAL and RVAL expressions, e.g. array[1] = 1;. A future version of my compiler should treat each occurrence of a constant as a unique entry. This bug probably also affects FCON, CCON, and SCON.

The following features have not yet been implemented in my compiler:

FLOATS, and related functions ITOF and FTOI;
Special chars \ooo and \xhh, i.e. octal and hex ASCII code;
Passing ARRAY pointers to external functions. (This has been implemented but does not yet work.)

Some Words of Wisdom

A good portion of my code was written ad hoc, with little structure, relying heavily on my intuition. Proper planning went into the design of the node structure and into syntax tree generation using a hash table. Debugging consisted of many strategically placed printf calls, with minimal use of the gdb debugger for code tracing. Writing the compiler this way showed me many ways in which things can go wrong, especially with improper pointer use from the pre-ANSI C era that does not conform to the modern C17 standard. Rewriting the compiler from scratch with the knowledge gained here would probably produce much cleaner and more efficient code without redundancies.
