A toolchain starting from assembly so you don't have to self-host your next programming language

License: Other

do-not-self-host

A development toolchain from the ground up, starting from assembly. Don't self-host your next language! Make it possible for us to build from source, from scratch, without needing a bootstrap package!

This is a long-term hobby project, so please do not expect regular updates :). However, I certainly welcome others who want to contribute.

The toolchain assumes a development environment that provides stdin/stdout and redirection.

Current status

  • ngb: VM (in C)
  • ngbasm: assembler (in Python)

Editor support

ngb assembly files have the extension .nas. A Vim syntax configuration is available here.

I'm not the only one

The Facebook Buck build system also doesn't self-host by default (although it can). The Buck FAQ says, in part:

Q: Why is Buck built with Ant instead of Buck?

A: Self-hosting systems can be more difficult to maintain and debug. If Buck built itself using Buck, then every time a change was made to Buck's source, the commit would have to include a new Buck binary that included that change. It would be easy to forget to include the binary, difficult to verify that it was the correct binary, and wasteful to bloat the Git history of the repository with binaries that could be rebuilt from source. Building Buck using Ant ensures we are always building from source, which is simpler to verify.

Installation and testing

The code is currently C and Python, but the infrastructure runs in Perl. Tests use Perl's prove.

Building

  • If you don't already have it, install Perl (e.g., using perlbrew).
  • Install cpanminus.

Then build using:

perl Makefile.PL
cpanm --installdeps .
make
cd mtok
make

Once you have run the perl and cpanm steps, you shouldn't need to do so again if you are only working on the C/Python/ngbasm sources. Just run make as necessary.

Testing

Once you have done the build steps, run prove or make test in the top level of the repository.

Older notes

Based on crcx/Nga-Bootstrap, which provides:

  • naje - a basic assembler (Python)
  • nmfcx - a Machine Forth Cross Compiler (Retro)

In the pipeline:

  • NGA+:

    • Implement NGA VM in x86 assembly (NASM?)
    • Read/write stdin/stdout (port-based, a la retro? Maybe not - that's flexible, but perhaps more than we need).
    • Add support for record blocks A and B - configurable number of fields per block; aload, astore, bload, bstore, aread, awrite, bread, bwrite
    • .const
  • Minimal Infix High-Level Language (Minhi) - <program>::=<expression>+, and everything else is an expression.

    • Why expressions? Because infix expressions are easy to parse based on a table, as described in A Retargetable C Compiler: Design and Implementation.
    • Lexer written in NGA+ that takes source and outputs token stream
    • Parser written in NGA+ that takes token stream (block A) and outputs AST (block B)
    • Compiler that produces NGA+ assembly
    • Later, a compiler that produces x86 assembly
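
One way to read "easy to parse based on a table" is precedence climbing driven by an operator table. The following is a minimal Python sketch of that idea, not Minhi's actual parser: the operator set, the precedences, and the names `BINARY_OPS`, `parse_expression`, and `parse_primary` are all assumptions made for illustration.

```python
# Hypothetical operator table: symbol -> (precedence, is_right_associative).
# These operators and precedences are illustrative, not Minhi's actual table.
BINARY_OPS = {
    "=": (1, True),
    "+": (2, False),
    "-": (2, False),
    "*": (3, False),
    "/": (3, False),
}


def parse_expression(tokens, pos=0, min_prec=1):
    """Parse tokens[pos:] and return (ast, next_pos).

    The AST is a nested tuple (op, left, right); operands stay as strings.
    """
    left, pos = parse_primary(tokens, pos)
    while pos < len(tokens) and tokens[pos] in BINARY_OPS:
        op = tokens[pos]
        prec, right_assoc = BINARY_OPS[op]
        if prec < min_prec:
            break
        # A left-associative operator needs a strictly tighter right operand.
        next_min = prec if right_assoc else prec + 1
        right, pos = parse_expression(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos


def parse_primary(tokens, pos):
    """Parse a single operand or a parenthesized sub-expression."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = parse_expression(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tok in BINARY_OPS or tok == ")":
        raise SyntaxError(f"unexpected token {tok!r}")
    return tok, pos + 1
```

Adding an operator in this style only means adding a row to the table, which is the property that makes an everything-is-an-expression language attractive to parse.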

Future: to be determined... (but possibly a C compiler written in Minhi)

Contributors

crcx, cxw42

do-not-self-host's Issues

Add debug info to ngb, ngbasm

Because it would be nice to have while reading traces.

  • ngbasm: Emit the symbol<->address map
  • ngb: Read the map if present in an input file
  • ngb: while printing trace output, also print the symbol name if any matches.
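
A rough Python sketch of what the map round trip could look like. The "address name" line format and the function names are assumptions, not an existing ngbasm or ngb feature. The lookup also goes one step beyond the exact-match idea above by reporting an offset from the nearest preceding symbol.

```python
import bisect


def format_symbol_map(symbols):
    """Render {name: address} as 'address name' lines, sorted by address."""
    ordered = sorted(symbols.items(), key=lambda item: item[1])
    return "\n".join(f"{addr} {name}" for name, addr in ordered)


def parse_symbol_map(text):
    """Inverse of format_symbol_map: return [(address, name), ...] sorted by address."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        addr, name = line.split(None, 1)
        entries.append((int(addr), name))
    entries.sort()
    return entries


def annotate(address, entries):
    """Return 'name' or 'name+offset' for the nearest symbol at or below address.

    Returns an empty string if no symbol is at or below the address.
    """
    addresses = [addr for addr, _ in entries]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return ""
    base, name = entries[index]
    offset = address - base
    return name if offset == 0 else f"{name}+{offset}"
```

A trace printer could then append `annotate(ip, entries)` to each line whenever the result is non-empty.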

Implement Minhi

  • Tokenizer (currently regex-based)
  • Table-driven parser
  • Bytecode generator
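
For reference, a regex-based tokenizer in Python can be quite small. This is a generic sketch, not the tokenizer in this repository: the token kinds, the patterns, and the name `tokenize` are assumptions.

```python
import re

# Hypothetical token kinds; order matters because earlier alternatives win.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(source):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(
                f"unexpected character {text!r} at offset {match.start()}"
            )
        tokens.append((kind, text))
    return tokens
```

The resulting token stream is the kind of input a table-driven parser would consume.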

Add scopes to the assembler

E.g., so this will work:

:global_var
    .data 42
.scope
    :local1_var
         .data 1
    fetch &global_var    ; OK
    fetch &local1_var    ; OK
    fetch &local2_var    ; assembly-time error - not in scope
.endscope
.scope
    :local2_var
         .data 2
    fetch &global_var    ; OK
    fetch &local1_var    ; assembly-time error - not in scope
    fetch &local2_var    ; OK
.endscope
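
A possible shape for the scope check, sketched in Python under some assumptions: scopes do not nest, `;` starts a comment, labels are lines beginning with `:`, and references are words beginning with `&`. The function name `find_scope_errors` is hypothetical and nothing here is existing ngbasm code.

```python
def find_scope_errors(lines):
    """Return [(line_number, name), ...] for label references that are not in scope.

    Labels defined outside any .scope block are global. Labels defined inside
    a block are visible only inside that block.
    """

    def walk():
        """Yield (line_number, scope_id, code) for each non-directive line."""
        scope_id = None  # None means "global"
        next_id = 0
        for number, line in enumerate(lines, start=1):
            code = line.split(";", 1)[0].strip()
            if code == ".scope":
                scope_id, next_id = next_id, next_id + 1
            elif code == ".endscope":
                scope_id = None
            else:
                yield number, scope_id, code

    # Pass 1: collect definitions, so forward references work.
    defined = set()
    for _, scope_id, code in walk():
        if code.startswith(":"):
            defined.add((scope_id, code[1:]))

    # Pass 2: check every &reference against its own scope and the global scope.
    errors = []
    for number, scope_id, code in walk():
        for word in code.split():
            if word.startswith("&"):
                name = word[1:]
                if (scope_id, name) not in defined and (None, name) not in defined:
                    errors.append((number, name))
    return errors
```

Run on the sixteen-line example above, this sketch flags `local2_var` on line 8 and `local1_var` on line 14, matching the two comments marked as assembly-time errors.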

Make ngb VM and assembler

General:

  • Add Makefile

VM:

  • Rename nga->ngb (to avoid confusion). (Note: I'm basing ngb on nga rather than retro since nga is much simpler, and I will eventually be implementing the VM in x86 assembly.)
  • Add in, out instructions (currently in ngaita.c)
  • Add err instruction to output to stderr (since we're going to want that for the toolchain)
  • Add iseof: push ((TOS == -1) ? -1 : 0), leaving the character on the stack
  • Rename cjump to ccall, since that's what it is in nga.c
  • Add real cjump
  • Add dedicated counter variable and setcount/getcount instructions
  • Add loop &label a la x86 (because it's a nice, convenient thing to have).
    • I think this should be loopcheck &done_label so you can put it at the top of the loop, since most of the loops I have written have been top-check. The resulting loop would look like:

          setcount 5
      :loop
          loopcheck &loop_done    ; branch if counter==0; otherwise, decrement counter.
          getcount
          outnum
          jump &loop
      :loop_done
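
To check the proposed semantics, here is a toy Python model of just the counter instructions. It is a sketch under assumptions, not the ngb VM: instructions are (opcode, operand) pairs, `setcount` takes an immediate operand as in the example above, labels are pre-resolved to list indices, and `outnum` simply records the popped value.

```python
def run(program, labels):
    """Run (opcode, operand) pairs and return the values emitted by outnum."""
    counter = 0
    stack = []
    output = []
    pc = 0
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "setcount":
            counter = arg
        elif op == "getcount":
            stack.append(counter)
        elif op == "outnum":
            output.append(stack.pop())
        elif op == "jump":
            pc = labels[arg]
        elif op == "loopcheck":
            # Branch if the counter is zero; otherwise decrement and fall through.
            if counter == 0:
                pc = labels[arg]
            else:
                counter -= 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return output


# The loop from the example above, with labels resolved to list indices.
PROGRAM = [
    ("setcount", 5),
    ("loopcheck", "loop_done"),  # :loop
    ("getcount", None),
    ("outnum", None),
    ("jump", "loop"),
]
LABELS = {"loop": 1, "loop_done": len(PROGRAM)}
```

With these semantics, `run(PROGRAM, LABELS)` returns `[4, 3, 2, 1, 0]`: the body runs five times, and because the decrement happens in `loopcheck`, the body sees the counter after it has been decremented.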
      

Assembler:

  • Permit comments in asm source files
  • Add .include directive
  • Add .lit <n> directive that emits a lit <n> instruction.
  • Change syntax so that any operand becomes a lit <n> before the instruction. E.g., cjump &done is assembled to lit, &done, cjump, and eq 32 becomes lit, 32, eq. (This requires more care on the programmer's part with stack discipline :).)
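
The operand-to-lit rewrite could be done as a simple source-to-source pass before assembly. A minimal Python sketch follows, assuming `;` comments, at most one operand per instruction, and that labels (`:name`) and directives (`.data`, `.scope`, and so on) pass through untouched. The function name `expand_operands` is hypothetical.

```python
def expand_operands(lines):
    """Rewrite 'op operand' as 'lit operand' followed by 'op'.

    Comments and blank lines are dropped. Labels, directives, explicit
    'lit' instructions, and operand-less instructions are kept as they are.
    """
    out = []
    for line in lines:
        code = line.split(";", 1)[0].strip()
        if not code:
            continue
        parts = code.split()
        first = parts[0]
        if first.startswith((":", ".")) or first == "lit" or len(parts) == 1:
            out.append(code)
            continue
        if len(parts) != 2:
            raise ValueError(f"expected at most one operand: {code!r}")
        op, operand = parts
        out.extend([f"lit {operand}", op])
    return out
```

For example, `expand_operands(["cjump &done", "eq 32"])` yields `["lit &done", "cjump", "lit 32", "eq"]`. Keeping this as a separate pass means the core assembler only ever sees `lit` with an operand.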
