cci's Introduction

CCI: C11 Compiler Infrastructure

⚠️ This project is discontinued, and is getting rewritten in Rust at feroldi/atlas. The reason for that is that I don't feel like writing C++ code as a hobby anymore.

This is an experimental project of a C compiler written in C++20. The implementation follows the ISO/IEC 9899:2011 standard, i.e., C11. The main purpose of this project is to teach myself compiler data structures, language design and optimization techniques.

Building

Use cmake to build the project. The following sequence of commands builds the library, tools and unit tests:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release ..
cmake --build .

You may also specify a toolchain when generating build files by defining CMAKE_TOOLCHAIN_FILE to one of the supported toolchains in cmake/toolchains/. Both Clang and GCC are able to compile this project. So, for example, if you're going to build with GCC, you may specify the GCC toolchain like so:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_TOOLCHAIN_FILE=../cmake/toolchains/gcc.cmake ..
cmake --build .

The same goes for Clang. Just replace the gcc in -DCMAKE_TOOLCHAIN_FILE=../cmake/toolchains/gcc.cmake with clang.

Usage

This project is still a work in progress. Usage instructions are yet to be written.

Running tests

This project makes use of GoogleTest for unit tests, so you'll need to install it beforehand. After installing GoogleTest, go to the build/ directory we created, and run ctest. For example:

cd build
ctest --output-on-failure

If you're not going to run unit tests, it's possible to disable them by specifying BUILD_TESTING=NO at the build generation step like so:

mkdir build && cd build
cmake -DBUILD_TESTING=NO -DCMAKE_BUILD_TYPE=Release ..
cmake --build .

Compiler design

This document is an attempt to describe the API and project design.

Summary:

  • General
    • Meaning of "Infrastructure" in CCI
    • Project's directory structure

General

There are a few non-obvious choices and terminologies used in this project, so this section is intended to explain them.

Meaning of "Infrastructure" in CCI

CCI stands for C11 Compiler Infrastructure. That means this is not just a tool you can use to compile C code. CCI has an API, which you can use to manipulate C code. The goal is for it to allow you to scan code, generate and traverse a parse tree, generate an IR, produce an executable, write a back-end for it, and so on.

Project's directory structure

  • include/: Exposes CCI's API, which you can use to write your own applications. There are functions for scanning, parsing, diagnosing, analysing, generating IRs, etc.
  • lib/: This is where most of CCI's code base lives. APIs are implemented here.
  • src/: This is where some CCI tools live, where each directory is a separate project.
    • For example, the CCI compiler tool lives under src/cci/.
  • unittest/: Contains unit tests.
  • doc/: Documentation and manuals.
  • cmake/: Contains some modules used across the build system.

Almost all directories have a README.md file explaining their structure and purpose, what they do and solve etc.

Why C11?

C11 is a great, challenging language to make a compiler for. It's also true that one can learn a lot by writing a compiler. That being so, C11 seems to be an option that gets the most out of the experience.

License

This project is licensed under the MIT license. See LICENSE.

cci's People

Contributors

feroldi, sarcasm

cci's Issues

Command-line interface

Implement an initial command-line interface that accepts parameters/arguments, flags/options and source files like so:

ccompiler [OPTIONS] files ...

Initial options:

  1. -pedantic - More warnings.
  2. -pedantic-errors - Pedantic warnings become errors.
  3. -Woption - Diagnostic options.
  4. -fsyntax-only - Analyses the source code, but only outputs diagnostics, if any.
  5. -o - Binary output (file name).
  6. -On - Optimization level.
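
As a rough illustration of how the initial options above could be recognised, here is a minimal sketch. The `Options` struct, its field names and `parse_args` are hypothetical and not part of CCI's API; the sketch also assumes that `-o` takes its file name as the next argument and that `-On` uses a single digit.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical option set; the field names are illustrative only.
struct Options
{
  bool pedantic = false;
  bool pedantic_errors = false;
  bool syntax_only = false;
  int opt_level = 0;                 // digit given to -On
  std::string output;                // file name given to -o
  std::vector<std::string> warnings; // names given via -W<name>
  std::vector<std::string> files;    // positional source files
};

// Parses the arguments that follow the program name. Returns std::nullopt
// on a malformed command line: `-o` without a file name, or an unknown flag.
inline std::optional<Options> parse_args(const std::vector<std::string_view> &args)
{
  Options opts;
  for (std::size_t i = 0; i < args.size(); ++i)
  {
    const std::string_view arg = args[i];
    if (arg == "-pedantic")
      opts.pedantic = true;
    else if (arg == "-pedantic-errors")
      opts.pedantic_errors = true;
    else if (arg == "-fsyntax-only")
      opts.syntax_only = true;
    else if (arg == "-o")
    {
      if (++i == args.size())
        return std::nullopt; // missing output file name
      opts.output = std::string(args[i]);
    }
    else if (arg.size() == 3 && arg.substr(0, 2) == "-O" && arg[2] >= '0' && arg[2] <= '9')
      opts.opt_level = arg[2] - '0';
    else if (arg.size() > 2 && arg.substr(0, 2) == "-W")
      opts.warnings.emplace_back(arg.substr(2));
    else if (!arg.empty() && arg[0] == '-')
      return std::nullopt; // unknown flag
    else
      opts.files.emplace_back(arg);
  }
  return opts;
}
```

With this sketch, `parse_args({"-pedantic", "-O2", "-o", "a.out", "main.c"})` yields an `Options` value with `pedantic` set, `opt_level` equal to 2, `output` equal to `a.out` and one source file.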

declaration

declaration:
  declaration-specifiers init-declarator-list ';'
  declaration-specifiers ';'
  static-assert-declaration

declaration-specifiers:
  declaration-specifier+

declaration-specifier:
  storage-class-specifier
  type-specifier
  type-qualifier
  function-specifier
  alignment-specifier

init-declarator-list:
  init-declarator
  init-declarator-list ',' init-declarator

init-declarator:
  declarator
  declarator '=' initializer

storage-class-specifier:
  'typedef'
  'extern'
  'static'
  '_Thread_local'
  'auto'
  'register'

function-specifier:
  'inline'
  '_Noreturn'
  '__stdcall'
  '__declspec' '(' identifier ')'

alignment-specifier:
  '_Alignas' '(' type-name ')'
  '_Alignas' '(' constant-expression ')'

direct-declarator:
  identifier
  '(' declarator ')'
  direct-declarator '[' type-qualifier-list? assignment-expression? ']'
  direct-declarator '[' 'static' type-qualifier-list? assignment-expression ']'
  direct-declarator '[' type-qualifier-list 'static' assignment-expression ']'
  direct-declarator '[' type-qualifier-list? '*' ']'
  direct-declarator '(' parameter-type-list? ')'

identifier-list:
  identifier
  identifier-list ',' identifier

declarator:
  pointer? direct-declarator

parameter-type-list:
  parameter-list
  parameter-list ',' '...'

parameter-list:
  parameter-declaration
  parameter-list ',' parameter-declaration

parameter-declaration:
  declaration-specifiers declarator
  declaration-specifiers abstract-declarator?

Flag diagnostic messages accordingly

  • ProgramContext::warn on opts.warning_as_error:

program.warn("this is a warning!");

<source>:n:m: error: this is a warning! [-Werror]

  • ProgramContext::pedantic on opts.pedantic_errors:

program.pedantic("this is pedantic!");

<source>:n:m: error: this is pedantic! [-pedantic-errors]

  • ProgramContext::pedantic on opts.pedantic:

program.pedantic("this is pedantic!");

<source>:n:m: warning: this is pedantic! [-pedantic]

  • ProgramContext::pedantic on opts.pedantic && opts.warning_as_error:

program.pedantic("this is pedantic!");

<source>:n:m: error: this is pedantic! [-Werror, -pedantic]
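
The four cases above can be sketched as a small decision function. This is only an illustration: `DiagOptions`, `Flagged`, `classify_warning`, `classify_pedantic` and `render` are hypothetical names. Two choices are assumptions rather than anything stated above: `pedantic_errors` wins over the other options, and a pedantic diagnostic is dropped when neither pedantic option is set.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class Severity { Warning, Error };

// Hypothetical mirror of the `opts` fields referred to above.
struct DiagOptions
{
  bool warning_as_error = false;
  bool pedantic = false;
  bool pedantic_errors = false;
};

// Severity of a diagnostic plus the flag list shown in brackets.
struct Flagged
{
  Severity severity;
  std::string flags;
};

// Plain warnings become errors under -Werror.
inline Flagged classify_warning(const DiagOptions &o)
{
  if (o.warning_as_error)
    return {Severity::Error, "-Werror"};
  return {Severity::Warning, ""};
}

// Pedantic diagnostics. Assumption: -pedantic-errors takes precedence, and
// nothing is emitted when neither pedantic option is enabled.
inline std::optional<Flagged> classify_pedantic(const DiagOptions &o)
{
  if (o.pedantic_errors)
    return Flagged{Severity::Error, "-pedantic-errors"};
  if (!o.pedantic)
    return std::nullopt;
  if (o.warning_as_error)
    return Flagged{Severity::Error, "-Werror, -pedantic"};
  return Flagged{Severity::Warning, "-pedantic"};
}

// Renders "<source>:n:m: severity: message [flags]".
inline std::string render(const std::string &source, int line, int column,
                          const Flagged &f, const std::string &message)
{
  std::string out = source + ':' + std::to_string(line) + ':' + std::to_string(column) + ": ";
  out += f.severity == Severity::Error ? "error: " : "warning: ";
  out += message;
  if (!f.flags.empty())
    out += " [" + f.flags + "]";
  return out;
}
```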

fragments

identifier-nondigit:
  nondigit
  universal-character-name

nondigit:
  [a-zA-Z_]

digit:
  [0-9]

universal-character-name:
  '\\u' hex-quad
  '\\U' hex-quad hex-quad

hex-quad:
  hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit

integer-constant:
  decimal-constant integer-suffix?
  octal-constant integer-suffix?
  hexadecimal-constant integer-suffix?
  binary-constant

binary-constant:
  '0' [bB] [0-1]+

decimal-constant:
  nonzero-digit digit*

octal-constant:
  '0' octal-digit*

hexadecimal-constant:
  hexadecimal-prefix hexadecimal-digit+

hexadecimal-prefix:
  '0' [xX]

nonzero-digit:
  [1-9]

octal-digit:
  [0-7]

hexadecimal-digit:
  [0-9a-fA-F]

integer-suffix:
  unsigned-suffix long-suffix?
  unsigned-suffix long-long-suffix
  long-suffix unsigned-suffix?
  long-long-suffix unsigned-suffix?

unsigned-suffix:
  [uU]

long-suffix:
  [lL]

long-long-suffix:
  'll' | 'LL'

floating-constant:
  decimal-floating-constant
  hexadecimal-floating-constant

decimal-floating-constant:
  fractional-constant exponent-part? floating-suffix?
  digit-sequence exponent-part floating-suffix?

hexadecimal-floating-constant:
  hexadecimal-prefix hexadecimal-fractional-constant binary-exponent-part floating-suffix?
  hexadecimal-prefix hexadecimal-digit-sequence binary-exponent-part floating-suffix?

fractional-constant:
  digit-sequence? '.' digit-sequence
  digit-sequence '.'

exponent-part:
  'e' sign? digit-sequence
  'E' sign? digit-sequence

sign:
  '+' | '-'

digit-sequence:
  digit+

hexadecimal-fractional-constant:
  hexadecimal-digit-sequence? '.' hexadecimal-digit-sequence
  hexadecimal-digit-sequence '.'

binary-exponent-part:
  'p' sign? digit-sequence
  'P' sign? digit-sequence

hexadecimal-digit-sequence:
  hexadecimal-digit+

floating-suffix:
  'f' | 'l' | 'F' | 'L'

character-constant:
  '\''  c-char-sequence '\''
  'L\'' c-char-sequence '\''
  'u\'' c-char-sequence '\''
  'U\'' c-char-sequence '\''

c-char-sequence:
  c-char+

c-char:
  ~['\\\r\n]
  escape-sequence

escape-sequence:
  simple-escape-sequence
  octal-escape-sequence
  hexadecimal-escape-sequence
  universal-character-name

simple-escape-sequence:
  '\\' ['"?abfnrtv\\]

octal-escape-sequence:
  '\\' octal-digit
  '\\' octal-digit octal-digit
  '\\' octal-digit octal-digit octal-digit

hexadecimal-escape-sequence:
  '\\x' hexadecimal-digit+

string-literal:
  encoding-prefix? '"' s-char-sequence? '"'

encoding-prefix:
  'u8'
  'u'
  'U'
  'L'

s-char-sequence:
  s-char+

s-char:
  ~["\\\r\n]
  escape-sequence
  '\\\n'
  '\\\r\n'
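
To make the integer-constant rules above concrete, here is a minimal sketch of a classifier for the digits of an integer constant, with any integer-suffix already stripped. `IntegerKind` and `classify_integer` are hypothetical names used only for illustration. Note that, following the grammar, a lone `0` is an octal constant.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string_view>

enum class IntegerKind { Decimal, Octal, Hexadecimal, Binary };

// Classifies the spelling of an integer constant (without its suffix)
// according to the grammar above. Returns std::nullopt if `text` matches
// none of the four alternatives.
inline std::optional<IntegerKind> classify_integer(std::string_view text)
{
  const auto is_dec = [](char c) { return c >= '0' && c <= '9'; };
  const auto is_oct = [](char c) { return c >= '0' && c <= '7'; };
  const auto is_bin = [](char c) { return c == '0' || c == '1'; };
  const auto is_hex = [&](char c) {
    return is_dec(c) || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
  };
  // True when `s` is non-empty and every character satisfies `pred`.
  const auto one_or_more = [](std::string_view s, auto pred) {
    return !s.empty() && std::all_of(s.begin(), s.end(), pred);
  };

  if (text.empty())
    return std::nullopt;

  // decimal-constant: nonzero-digit digit*
  if (text[0] != '0')
  {
    if (one_or_more(text, is_dec))
      return IntegerKind::Decimal;
    return std::nullopt;
  }

  const std::string_view rest = text.substr(1);

  // hexadecimal-constant: '0' [xX] hexadecimal-digit+
  if (!rest.empty() && (rest[0] == 'x' || rest[0] == 'X'))
  {
    if (one_or_more(rest.substr(1), is_hex))
      return IntegerKind::Hexadecimal;
    return std::nullopt;
  }

  // binary-constant: '0' [bB] [0-1]+
  if (!rest.empty() && (rest[0] == 'b' || rest[0] == 'B'))
  {
    if (one_or_more(rest.substr(1), is_bin))
      return IntegerKind::Binary;
    return std::nullopt;
  }

  // octal-constant: '0' octal-digit*
  if (std::all_of(rest.begin(), rest.end(), is_oct))
    return IntegerKind::Octal;
  return std::nullopt;
}
```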

SourceManager should take care of multiple files

Right now, SourceManager can only handle one file, whose content is read and kept in a byte stream. It should be capable of loading multiple files and concatenating their content into one big buffer. Every file would have an offset into the buffer, pointing to where that file starts. That's useful, for example, for discovering which file a SourceLocation belongs to.
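
A minimal sketch of the idea, assuming a source location can be reduced to a plain byte offset into the shared buffer. `MultiFileBuffer`, `add_file` and `file_of` are hypothetical names, not the existing SourceManager interface.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Keeps the content of every loaded file in one contiguous buffer and
// remembers the offset at which each file starts.
class MultiFileBuffer
{
  struct Entry
  {
    std::string name;
    std::size_t start; // offset of the file's first byte in buffer_
  };

  std::string buffer_;
  std::vector<Entry> files_; // sorted by `start`, since files are only appended

public:
  // Appends a file's content to the buffer and returns the file's index.
  std::size_t add_file(std::string name, std::string_view content)
  {
    files_.push_back({std::move(name), buffer_.size()});
    buffer_.append(content);
    return files_.size() - 1;
  }

  // Finds the file that owns the byte at `offset`, using a binary search
  // for the last file whose start is not greater than `offset`.
  const std::string &file_of(std::size_t offset) const
  {
    assert(offset < buffer_.size());
    auto it = std::upper_bound(
      files_.begin(), files_.end(), offset,
      [](std::size_t off, const Entry &e) { return off < e.start; });
    return std::prev(it)->name;
  }
};
```

Because files are only ever appended, the start offsets stay sorted, so the lookup is logarithmic in the number of files.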

enumerator

enum-specifier:
  'enum' identifier? '{' enumerator-list '}'
  'enum' identifier? '{' enumerator-list ',' '}'
  'enum' identifier

enumerator-list:
  enumerator
  enumerator-list ',' enumerator

enumerator:
  enumeration-constant
  enumeration-constant '=' constant-expression

enumeration-constant:
  identifier

`-dump-ast` command-line option

-dump-ast makes the compiler dump the generated AST from translation units to stderr.

Each AST node would be broken down into an entity and a display name:

  • Entity is the expression name (e.g. IfCondition, Operator, etc.).
  • Display is a user-defined name (e.g. in int var;, VariableDeclaration is the entity name, and var is the display name).

Syntax could be similar to libclang's:

int main()
{
    int i = 42;
    return i;
}

Would generate:

- FunctionDefinition(int(), main)
    - Operator(=)
        - VarDeclaration(int, i)
        - Constant(42)
    - Return(i)

This is just an illustration, and it can/will change.

edit-1:

Stick to the current general AST. Example:

$ cat src.c
int main(int argc, char** argv)
{
  return 0;
}

$ ccompiler -dump-ast src.c
compilation unit:
  function definition:
    declaration specifiers:
      type specifier(int)
    direct declarator:
      identifier(main)
      parameter list:
        parameter declaration:
          declaration specifiers:
            type specifier(int)
          identifier(argc)
        parameter declaration:
          declaration specifiers:
            type specifier(char)
          declarator:
            pointer declarator(*):
              pointer declarator(*)
            identifier(argv)
    compound statement({):
      jump statement(return):
        integer constant(0)
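
The indented format above could be produced by a simple recursive walk. The sketch below uses a hypothetical `Node` type, not CCI's actual AST classes. It prints two spaces per nesting level and appends a colon to nodes that have children, matching the sample output.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical tree node: a label such as "type specifier(int)" plus children.
struct Node
{
  std::string label;
  std::vector<Node> children;
};

// Writes `node` and its descendants to `out`, indenting by two spaces per
// level. Nodes with children get a trailing ':'.
inline void dump(const Node &node, std::ostream &out, std::size_t depth = 0)
{
  out << std::string(depth * 2, ' ') << node.label;
  if (!node.children.empty())
    out << ':';
  out << '\n';
  for (const Node &child : node.children)
    dump(child, out, depth + 1);
}

// Convenience wrapper that returns the dump as a string.
inline std::string dump_to_string(const Node &root)
{
  std::ostringstream os;
  dump(root, os);
  return os.str();
}
```

To match the behaviour described for -dump-ast, a caller would pass `std::cerr` as the output stream.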

There will be another compiler option to emit the intermediate language (IR).

Allow empty structs as an extension

C90-§6.2.5-20:

A structure type describes a sequentially allocated nonempty set of member objects

Generate a warning if -pedantic is on. Disable this extension with -fno-empty-structures.
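
The intended behaviour could be sketched as follows. `Verdict` and `check_struct_members` are hypothetical names, and the sketch assumes that an empty struct is rejected outright once the extension is disabled.

```cpp
#include <cassert>
#include <cstddef>

enum class Verdict { Accept, Warn, Reject };

// Decides how to treat a struct with `member_count` members.
//   allow_empty: false when -fno-empty-structures was given.
//   pedantic:    true when -pedantic is on.
inline Verdict check_struct_members(std::size_t member_count, bool allow_empty, bool pedantic)
{
  if (member_count > 0)
    return Verdict::Accept;
  if (!allow_empty)
    return Verdict::Reject; // assumption: a hard error without the extension
  return pedantic ? Verdict::Warn : Verdict::Accept;
}
```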

initializer

initializer:
  assignment-expression
  '{' initializer-list '}'
  '{' initializer-list ',' '}'

initializer-list:
  designation? initializer
  initializer-list ',' designation? initializer

designation:
  designator-list '='

designator-list:
  designator
  designator-list designator

designator:
  '[' constant-expression ']'
  '.' identifier

keywords

auto: 'auto'
break: 'break'
case: 'case'
char: 'char'
const: 'const'
continue: 'continue'
default: 'default'
do: 'do'
double: 'double'
else: 'else'
enum: 'enum'
extern: 'extern'
float: 'float'
for: 'for'
goto: 'goto'
if: 'if'
inline: 'inline'
int: 'int'
long: 'long'
register: 'register'
restrict: 'restrict'
return: 'return'
short: 'short'
signed: 'signed'
sizeof: 'sizeof'
static: 'static'
struct: 'struct'
switch: 'switch'
typedef: 'typedef'
union: 'union'
unsigned: 'unsigned'
void: 'void'
volatile: 'volatile'
while: 'while'

alignas: '_Alignas'
alignof: '_Alignof'
atomic: '_Atomic'
bool: '_Bool'
complex: '_Complex'
generic: '_Generic'
imaginary: '_Imaginary'
noreturn: '_Noreturn'
static-assert: '_Static_assert'
thread-local: '_Thread_local'

left-paren: '('
right-paren: ')'
left-bracket: '['
right-bracket: ']'
left-brace: '{'
right-brace: '}'

less: '<'
less-equal: '<='
greater: '>'
greater-equal: '>='
left-shift: '<<'
right-shift: '>>'

plus: '+'
plus-plus: '++'
minus: '-'
minus-minus: '--'
star: '*'
div: '/'
mod: '%'

and: '&'
or: '|'
and-and: '&&'
or-or: '||'
caret: '^'
not: '!'
tilde: '~'

question: '?'
colon: ':'
semi: ';'
comma: ','

assign: '='
star-assign: '*='
div-assign: '/='
mod-assign: '%='
plus-assign: '+='
minus-assign: '-='
left-shift-assign: '<<='
right-shift-assign: '>>='
and-assign: '&='
xor-assign: '^='
or-assign: '|='

equal: '=='
not-equal: '!='

arrow: '->'
dot: '.'
ellipsis: '...'

postfix-expression

postfix-expression:
   primary-expression
   postfix-expression '[' expression ']'
   postfix-expression '(' argument-expression-list? ')'
   postfix-expression '.' identifier
   postfix-expression '->' identifier
   postfix-expression '++'
   postfix-expression '--'
   '(' type-name ')' '{' initializer-list '}'
   '(' type-name ')' '{' initializer-list ',' '}'

type-name

type-name:
  specifier-qualifier-list abstract-declarator?

specifier-qualifier-list:
  (type-specifier | type-qualifier)+

abstract-declarator:
  pointer
  pointer? direct-abstract-declarator

direct-abstract-declarator:
  '(' abstract-declarator ')'
  '[' type-qualifier-list? assignment-expression? ']'
  '[' 'static' type-qualifier-list? assignment-expression ']'
  '[' type-qualifier-list 'static' assignment-expression ']'
  '[' '*' ']'
  '(' parameter-type-list? ')'
  direct-abstract-declarator '[' type-qualifier-list? assignment-expression? ']'
  direct-abstract-declarator '[' 'static' type-qualifier-list? assignment-expression ']'
  direct-abstract-declarator '[' type-qualifier-list 'static' assignment-expression ']'
  direct-abstract-declarator '[' '*' ']'
  direct-abstract-declarator '(' parameter-type-list? ')'

pointer:
  '*' type-qualifier-list?
  '*' type-qualifier-list? pointer

type-qualifier-list:
  type-qualifier+

type-qualifier:
  'const'
  'restrict'
  'volatile'
  '_Atomic'

type-specifier:
  'void'
  'char'
  'short'
  'int'
  'long'
  'float'
  'double'
  'signed'
  'unsigned'
  '_Bool'
  '_Complex'
  '__m128'
  '__m128d'
  '__m128i'
  atomic-type-specifier
  struct-or-union-specifier
  enum-specifier
  typedef-name

atomic-type-specifier:
  '_Atomic' '(' type-name ')'

typedef-name:
  identifier

statement

statement:
  labeled-statement
  compound-statement
  expression-statement
  selection-statement
  iteration-statement
  jump-statement
  // missing `asm` statement

labeled-statement:
  identifier ':' statement
  'case' constant-expression ':' statement
  'default' ':' statement

compound-statement:
  '{' block-item-list? '}'

block-item-list:
  block-item
  block-item-list block-item

block-item:
  declaration
  statement

expression-statement:
  expression? ';'

selection-statement:
  'if' '(' expression ')' statement ('else' statement)?
  'switch' '(' expression ')' statement

iteration-statement:
  'while' '(' expression ')' statement
  'do' statement 'while' '(' expression ')' ';'
  'for' '(' expression? ';' expression? ';' expression? ')' statement
  'for' '(' declaration expression? ';' expression? ')' statement

jump-statement:
  'goto' identifier ';'
  'continue' ';'
  'break' ';'
  'return' expression? ';'

compilation-unit

compilation-unit:
  translation-unit? EOF

translation-unit:
  external-declaration
  translation-unit external-declaration

external-declaration:
  function-definition
  declaration
  ';'

function-definition:
  declaration-specifiers declarator declaration-list? compound-statement

declaration-list:
  declaration
  declaration-list declaration

Accept only ';' or declarator after struct-union or enum declarator

This should be an error:

struct S { int i; }
//                 ^ missing ';'
int main() {}

Currently it compiles to:

compilation unit:
  function definition:
    declaration specifiers:
      struct or union specifier(struct):
        identifier(S)
        struct declaration:
          type specifier(int)
          identifier(i)
      type specifier(int)
    direct declarator:
      identifier(main)
      empty
    compound statement({):
      empty

static-assert

static-assert-declaration:
  '_Static_assert' '(' constant-expression ',' string-literal+ ')' ';'

struct and union

struct-or-union-specifier:
  struct-or-union identifier? '{' struct-declaration-list '}'
  struct-or-union identifier

struct-or-union:
  'struct'
  'union'

struct-declaration-list:
  struct-declaration
  struct-declaration-list struct-declaration

struct-declaration:
  specifier-qualifier-list struct-declarator-list? ';'
  static-assert-declaration

struct-declarator-list:
  struct-declarator
  struct-declarator-list ',' struct-declarator

struct-declarator:
  declarator
  declarator? ':' constant-expression

constant prefixes and suffixes

Integers:

integer-constant
    decimal-constant integer-suffix?
    octal-constant integer-suffix?
    hexadecimal-constant integer-suffix?
    binary-constant

integer-suffix
    unsigned-suffix long-suffix?
    unsigned-suffix long-long-suffix
    long-suffix unsigned-suffix?
    long-long-suffix unsigned-suffix?


unsigned-suffix
    [uU]

long-suffix
    [lL]

long-long-suffix
    'll' | 'LL'

Floating constant:

floating-suffix
    'f' | 'l' | 'F' | 'L'

Character and string literals:

character-constant
    '\'' c-char-sequence '\''
    'L\'' c-char-sequence '\''
    'u\'' c-char-sequence '\''
    'U\'' c-char-sequence '\''

encoding-prefix
    'u8' | 'u' | 'U' | 'L'

Lexical analysis should provide a token for every prefix and suffix.
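
As an illustration for the integer-suffix rules above, here is a minimal sketch that decodes a suffix into its parts. `IntegerSuffix` and `parse_integer_suffix` are hypothetical names. An empty suffix is accepted, since the suffix is optional on every constant that allows one.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Decoded integer-suffix: whether [uU] was present, and how many "longs"
// were requested (0 = none, 1 = l/L, 2 = ll/LL).
struct IntegerSuffix
{
  bool is_unsigned = false;
  int longs = 0;
};

// Parses the text that follows the digits of an integer constant.
// Returns std::nullopt if it is not a valid integer-suffix.
inline std::optional<IntegerSuffix> parse_integer_suffix(std::string_view s)
{
  IntegerSuffix result;

  // Consumes an unsigned-suffix if present and reports whether it did.
  const auto eat_unsigned = [&] {
    if (!s.empty() && (s[0] == 'u' || s[0] == 'U'))
    {
      result.is_unsigned = true;
      s.remove_prefix(1);
      return true;
    }
    return false;
  };

  // Consumes a long-long-suffix or long-suffix if present. Mixed case
  // such as "lL" is not a long-long-suffix, so only one 'l' is consumed.
  const auto eat_long = [&] {
    if (s.size() >= 2 && (s.substr(0, 2) == "ll" || s.substr(0, 2) == "LL"))
    {
      result.longs = 2;
      s.remove_prefix(2);
    }
    else if (!s.empty() && (s[0] == 'l' || s[0] == 'L'))
    {
      result.longs = 1;
      s.remove_prefix(1);
    }
  };

  // Either order is allowed: unsigned first, or long/long-long first.
  if (eat_unsigned())
    eat_long();
  else
  {
    eat_long();
    eat_unsigned();
  }

  if (!s.empty())
    return std::nullopt; // trailing characters that fit no alternative
  return result;
}
```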

Complete rewriting of the front-end

Maintaining and implementing features in the previous diagnostics system, source management, lexer and parser proved difficult. For that reason, they are being completely rewritten to follow a library-like design.

TODO list:

  • Source and file management
  • Diagnostics system
  • Preprocessor
  • Source lexing
  • Source parsing
  • Semantic analyses

Once all of the above items are checked off, we can proceed to implementing the back-end.
