- This is a final assignment in which you build a compiler for a minimum subset of C language, dubbed 'minC'
- This page explains how you should work on it
parser/
: a C -> XML parser written in Python + tatsu (parser generator), which converts a C source into an equivalent XMLrs/minc
: a directory for each languagetest/
: test programs
minc_grammar.y
: grammar definition of the minimum C language we work on, which is converted into a working python program (minc_grammar.py
) using tatsu parser generator library cd ~/minC Compiler/parser tatsu minc_grammar.y > minc_grammar.pyminc_to_xml.py
: the main parser program that drives minc_grammar.py to convert a C source into an equivalent XML python3 minc_to_xml.py example/ex.c > ex.xml- examine the result and understand how each C construct is converted into XML cat example/ex.c cat ex.xml
- you don't have to modify
minc_grammar.y
orminc_to_xml.py
unless you extend the grammar for extra points - yet you are encouraged to see how simple does tatsu (or any parser generator, for that matter) make it to write a parser; just take a look at
minc_grammar.y
- it is unnecessary and unusual to convert the source program first into XML and then to the abstract syntax tree
- more commonly, you use a parser generator for the language you write your compiler with Rust, which allows you to directly build the abstract syntax tree you can manipulate in that language
- for example, C/C++ has flex/bison parser generator, whose grammar description file (analogous to minc_grammar.y) allows you to build any C/C++ data structure in it; OCaml has ocamllex/ocamlyacc (Menhir is a newer version of ocamlyacc). there is a parser generator that supports multiple languages, most notably ANTLR, which supports Java and Python code generation. in circumstances where there is a tool available for your language, it is much more straightforward and convenient to use these tools without going through XML
- the reasons why we go through XML are
- I could not find a popular parser generator for some of the languages (Go and Julia)
- even if one exists in each language, there will be differences between them that make it difficult/tricky/cumbersome to explain them
- so I arrived at a parser generator (tatsu) for Python as a common middle ground and XML as a common data structure all languages can easily read
- in each rust directory , there is a toplevel directory
minc
- the code given there is a skeleton of a compiler that reads an XML file, builds its abstract syntax tree, and finally calls the code generator
- the code generator is currently almost empty and raises an exception when called
rs/
minc/
src/
minc_ast.rs
--- abstract syntax tree (AST) definitionminc_parse.rs
--- XML -> AST converterminc_cogen.rs
--- AST -> assembly codemain.rs
--- the main file
cd rs/minc
cargo build
- be sure you have generated ex.xml by
minc_to_xml.py
- try to compile a small program and see that the code generator raises an exception
./target/debug/minc ../../parser/ex.xml ex.s
- take a look at the source code that caused the exception cat src/minc_cogen.rs
- your job is to implement
ast_to_asm_program
function
test/
src/
f00.c, f01.c, f02.c, ...,
--- test programs each definint a functionf
main.c
--- a file that calls functionf
Makefile
--- executes all the tests
the Makefile
does the following for each test program (src/f00.c, src/f01.c,
...)
- convert
src/fXX.c
toxml/fXX.xml
withparser/minc_to_xml.py
- compile
xml/fXX.xml
tominc/fXX.s
with the minC compiler you are supposed to write - compile
main.c
andminc/fXX.s
into an executable - compile
main.c
andsrc/fXX.c
into an executable with gcc - execute the two executables and compare their outputs
- in the
Makefile
- you must set the variable
mincc
to the path to your compiler (relative to the test directory)
mincc := ../ml/minc/_build/default/bin/main.exe
- you can set the variable
srcs
to change the functions tested. the default is all the 75 files insrc/
srcs := $(wildcard src/f*.c)
for example, if you set this to
srcs := $(wildcard src/f00.c)
you can only test src/f00.c
make -n
will show you what will be executed without actually executing it
cd ~/notebooks/pl10/test
make -n srcs=src/f00.c
-
if the test fails, identify which file it failed for and how
-
it first converts the C source file into XML using minc_to_xml.py; if the test fails here, it's not your fault, unless you modified the grammar
echo "# convert src/f00.c to xml/f00.xml"
../parser/minc_to_xml.py src/f00.c > xml/f00.xml
- next it calls your compiler to convert the XML file into asm (the command line depends on the language you chose)
echo "# compile xml/f00.xml to asm with your minC compiler"
../ml/minc/_build/default/bin/main.exe xml/f00.xml asm/f00.s
if this fails, examine the original C source file (f00.c in the case above) and XML source file to examine what kind of source code makes it fail
- then it calls the gcc to compile the assembly you generated into an executable
echo "# generate the executable that calls f with your minC compiler"
gcc -o minc/f00.exe -DTEST_NO=0 main.c asm/f00.s -O0 -g
if you generate a syntactically invalid assembly code, gcc will complain here. read the gcc error message, examine the assembly you generated (asm/f00.s in the case above) and understand why it failed
- then it calls executables generated by gcc as well as your minC compiler
echo "# run the executable generated by gcc"
./gcc/f00.exe | tee out/f00.gcc
echo "# run the executable generated by your minC compiler"
./minc/f00.exe | tee out/f00.minc
the gcc-generated executable is unlikely to fail. if your compiler-generated executable fails, you might be able to see the reason just by looking aat the assembly code you generated; otherwise, run the executable with a debugger. GDB, for example, can step-execute an assembly program and allow you to examine the value of registers.
- it finally compares the output of the two executables
echo "# take the diff of the two"
diff out/f00.gcc out/f00.minc
if it fails, the debugging strategy is the same as above; you might be able to see the reason just by looking at the assembly code you generated; otherwise, run the executable with a debugger. GDB, for example, can step-execute an assembly program and allow you to examine the value of registers.
- if you work on any of the extension, you are likely to add test programs too
- src/src/fun.c contains all the C functions to test, each one being guarded by
#if TEST_NO == ???
#endif
add your test case at the end of the file, using the next number as your TEST_NO
- modify the following line in
test/src/src/Makefile
tests := $(shell seq -f "%02.0f" 0 74)
to reflect the tests you added. for example, if you add two test cases (TEST_NO = 75 and 76), you should change it to
tests := $(shell seq -f "%02.0f" 0 76)
and run make in test/src/src
cd ~/notebooks/pl10/test/src/src
make
- do a similar thing for
test/main.c
; as long as all functions take onlylong
arguments and return along
value, you can use the samemain
function. in that case you just change
#if 0 <= TEST_NO && TEST_NO <= 74
...
#endif
to, say,
#if 0 <= TEST_NO && TEST_NO <= 76
...
#endif
- if you support types other than
long
, you are likely to add a different main function. modifymain.c
accordingly
- implement the compiler for minC (you will mainly write
minc_cogen.rs
) - your code generator must be heavily commented (explain how it works, as if you are writing a report, except you write it in the source code)
- pass all the tests, that is, the following command executes without an error
cd test
make -B
- by default, the entire test stops as soon as any test fails. you may want to try
-k
option for make, to skip files that fails and go ahead to others
cd test
make -B -k
- write the summary of tests that passed/failed, in
results.csv
file in thetest/
directory - P : passed
- C : cannot assemble (your compiler fails to produce assembly)
- I : invalid assembly (your compiler produced an assembly code but it does not compile with gcc)
- R : runtime error (your compiler produced an executable, but it does not terminate successfully)
- W : wrong result (your compiler produced an executable that terminates successfully, but the output result is different from that of gcc-generated executable) cat results.csv
-
you get extra points by doing more than required above
-
you can extend the minC language in any ways, but possible extensions include (but are not limited to):
- syntactic extensions
- [difficulty level 2] for loop
- [difficulty level 2.5] initializing declaration (e.g., int x = y + 2)
- type extensions and type checks
- note that supporting a type other than long requires the compiler know the type of an expression; it's a heavy lifting
- [difficulty level 3] pointers (long*, long**, etc.)
- [difficulty level 4] floating point numbers (double)
- [difficulty level 5] types of different sizes (int, float)
- [difficulty level 6] structures and typedefs
- optimization
- use registers more aggressively
- inline-expand function calls
-
if you have done extra work beyond requirements, describe what you did below