Giter VIP home page Giter VIP logo

m-forth's Introduction

M-Forth

Abstract

M-Forth is an implementation of FORTH for the M1 processor on Mac OS X.

Revision history

2021-12-06: M-Forth is now capable of running all the code in stdlib.f. Not all primitives are ready though, and documentation is still incomplete. The primitive "." prints in hex until I've written the proper definition in stdlib.f.

2021-12-15: M-Forth can now print numbers with . which respects the number base stored in the BASE variable. The words DECIMAL and HEX in stdlib.f changes BASE to 10 or 16, respectively.

2021-12-20: Fixed some serious bugs with LITSTRING and >CFA so now you can print strings with ." Hello world".

2021-12-28: Very stable now. Almost all of the library code in stdlib.f is ready. (Expect to be finished in a few days.)

2021-12-31: It's New Year's Eve and I managed to get M-FORTH up and running. There's an annoying bug (?) with FILE-OPEN which returns the file descriptor 2 rather than a negative value. Also, it would be nice if DOES> could be implemented. But we have to save something for 2022!

Introduction and background

This document will not explain FORTH in detail — there are plenty of resources for that. Rather, we will just cover the salient features of the language, in order to understand the implementation.

Forth words are akin to subroutines in other programming languages. Words can be either primitive, i.e., written in assembly language, or defined in terms of other words. The syntax for defining a new word is

: word word1 word2wordn ;

which is equivalent to

procedureword() { call word1(); call word2(); … call wordn(); }

in a traditional programming language. Notice that in Forth, arguments to words are not named. Instead, they are assumed to appear on a stack. As an example, let's look at the following definitions:
: DOUBLE DUP PLUS ;
: QUADRUPLE DOUBLE DOUBLE ;
Here, we define two new words — DOUBLE and QUADRUPLE — using the primitives DUP and PLUS. (The primitive DUP duplicates the top element of the stack.) In a working FORTH system, we could then continue and enter
42 QUADRUPLE .
and the system would reply 168. (If you find this confusing, I urge you read a tutorial on FORTH first.)

FORTH programmers can enjoy identifiers with pretty much any character (but not space), so . is another primitive which removes and prints out the top stack element.

The inner interpreter

As we've seen, FORTH words can be either primitives — consisting entirely of assembly language snippets — or user-defined, referencing a list of previously defined words (which, in themselves, reference other words, etc, until they ultimately reference a primitive).

We will define the inner interpreter — the core of the FORTH system. To execute a primitive word like DUP or PLUS is obvious: simply call the assembly code. But in order to execute a user-defined word, we must potentially invoke the interpreter recursively. For instance, when executing QUADRUPLE we must (twice) execute the word DOUBLE which, in itself, must execute DUP and PLUS.

In a higher-level language, the interpreter could be expressed recursively but here we must create a tight and fast interpreter in assembly language. It is therefore necessary for the interpreter to maintain another stack to remember the location before going off to execute the list of words in the referenced word. More specifically, when we execute the body of QUADRUPLE we first encounter DOUBLE so before calling the interpreter again with DOUBLE we save the location of where we must continue when we are done (which, incidentally, is another call to DOUBLE).

M-Forth uses "Indirect Threading". In the first stage we write the so called "inner interpreter" which can execute words but does not handle user input, i.e., all words (primitive or not) must be defined in the assembler file.

First we need to look at the data structure for word definitions. Let's look at the previous example:

: DOUBLE DUP PLUS ;
: QUADRUPLE DOUBLE DOUBLE ;

In this figure, gray cells denote the so called "codewords" which contains a pointer to a piece of code that handles the word. What is meant by "handling" the word depends on the word being a primitive (written in assembler) or not (consisting of links to other words).

M-FORTH language specifics

Control structures

condition IF true-part THEN

Algol equivalent: IF condition THEN true-part

( Calculate the absolute value, e.g., -3 ABS -- 3 )
: ABS           ( a -- |a| )
  DUP           ( a a )
  0< IF         ( a )
    NEGATE      ( -a )
  THEN
;

condition IF true-part ELSE false-part THEN

Algol equivalent: IF condition THEN true-part ELSE else-part

( Find maximum value, e.g., 5 9 MAX -- 9 )
: MAX           ( a b -- max(a,b) )
  2DUP          ( a b a b )
  > IF          ( a b )
    DROP        ( a )
  ELSE
    SWAP DROP   ( b )
  THEN
;

BEGIN loop-part condition UNTIL

Algol: DO loop-part WHILE condition

( Write n spaces, e.g., 10 SPACES writes 10 " " )
: SPACES        ( n -- )
  BEGIN
    SPACE       
    -1 +        ( n-1 -- )
    DUP 0 =
  UNTIL         ( until n=0 )
;

This is compiled to

        ,-----------------------------.
        V                              \
: SPACES SPACE -1 + DUP 0 = 0BRANCH ( -72 ) :

BEGIN loop-part AGAIN

Algol: WHILE true DO loop-part

( Print random numbers 0..9 until 0 occurs )
: RANDOM-NUMBERS  ( -- )
  RANDOMIZE
  BEGIN
    RND 10 MOD
    DUP . CR      ( print random number 0..9 )
    0= IF
      EXIT        ( use EXIT to leave infinite loop )
    THEN        
  AGAIN
;

Note that since BEGIN ... AGAIN forms an infinite loop, you must exit with EXIT, which returns from the word.

BEGIN condition WHILE loop-part REPEAT

Algol: WHILE condition DO loop-part

( Calculate sum of a + ... + b, e.g., 1 10 SUM -- 55 )
: SUM           ( a b -- sum ) ( if a < b, sum is 0 )
  0             ( a b 0 )
  -ROT          ( 0 a b )
  BEGIN
    2DUP <=     ( while a <= b ... )
  WHILE         ( sum a b )
    -ROT        ( b sum a )
    DUP         ( b sum a a )
    -ROT        ( b a sum a )
    +           ( b a sum+a )
    -ROT        ( sum+a b a )
    1+          ( sum+a b a+1 )
    SWAP        ( sum+a a+1 b )
  REPEAT
  DROP DROP     ( sum )
;

condition UNLESS true-part THEN

Algol: IF not condition THEN true-part

( Return TRUE if negative, otherwise FALSE, e.g., -3 NEGATIVE -- TRUE )
: NEGATIVE	( a -- TRUE | FALSE )
  TRUE		( a TRUE )
  SWAP		( TRUE a )
  0< UNLESS
    1+		( TRUE -> FALSE )
  THEN
;

Deviations from Jones Forth

  • Jones relies heavily on macros defcode, defvar and defconst for defining primitives, variables and constants. The assembler I use does not let me re-define the variable link which is used to chain together all the words. Instead I wrote a small Python script words.py which generates the necessary assembler definitions for all the primitives, variables and constants. The Makefile generates the file primitives.s which is then included in the main file M.s.

  • Rather than putting magic numbers like 4 and 8 in the code, I've decided to add two constants __WORDSIZE and __STACKITEMSIZE. These are (as the names imply) the number of bytes in a word (8) and the size of a stack element (16). These are restrictions enforced by the ARM architecture, so leave them alone unless you are porting this to another platform.

  • As a result of the above, I have not implemented primitives 4+ and 4-. You should use __WORDSIZE + if you are messing around with the internal data structures.

  • M-FORTH checks for stack underflow on the data stack, and stack overflow on the return stack. You can disable this with the STACKWARNING option in the Makefile but in my experience, checking for stack underflow/overflow doesn't add much to overall execution time. While it would be nice to check for data stack overflow and return stack underflow, this is unfortunately not implemented (yet).

  • I've decided to use -1 for true and 0 for false, unlike Jones Forth but in sync with ANS FORTH. I think the reason for sticking to the ANS FORTH convention is that you can use AND, OR, etc both as boolean and bitwise logical operations.

  • Jones Forth relies on the brk(2) system call to figure out where the data segment starts and to request some initial space. In recent Mac OS X, brk has been discontinued so I allocate space for the data segment with a constant INITIAL_DATA_SEGMENT_SIZE in the BSS segment. This does not seem to affect the size of the binary for M-FORTH.

  • The user-defined word .S prints the contents of the stack (non-destructively) but in my implementation I've decided to (1) print the content with . rather than U. so that negative values don't appear unsigned; (2) the stack is printed in the order so that the top element is shown to the right.

  • Normally, M-FORTH is started with

    cat stdlib.f - | ./M

but if M-FORTH is started without loading stdlib.f, there is still a primitive form of . which prints the top-of-stack as a 64-bit hexadecimal number. This is a useful debugging tool when working with the core interpreter.

Things to do (unsorted)

  • DO ... LOOP with LEAVE, CONTINUE and I inside the body
  • A variable, like BASE for case sensitivity, so we can write dup as well as DUP.
  • Documentation
  • >DFA uses INCR8 which should be called something else.

m-forth's People

Contributors

kjepo avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.