Comments (4)
How would we go about describing instruction invocations? They're not context-free -- they're completely mysterious until it's matching time, and are parsed in the context of each possible instruction. This means that if two candidate instructions have expression slots in different positions, what counts as an expression changes for the same invocation.
For example:
#ruledef
{
    ld {a}, x + 1 => 0x11
    ld x + 1, {a} => 0x22
}

x = 0
ld x + 1, x + 1 ; invocation
The invocation's syntax is undefined until the parsing algorithm runs, which will try to parse it twice: one pass for each rule you declared beforehand. When `x + 1` is specified verbatim in an instruction's pattern, it's not parsed as an expression -- it's simply parsed as a sequence of characters (currently, not even as proper tokens!).
With that in mind, do you still think it would make sense to keep an EBNF grammar around? Maybe for the other parts of the language?
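To make the two-pass matching described above concrete, here is a minimal Python sketch. It is not customasm's actual algorithm: `pattern_to_regex` and `match_invocation` are hypothetical helpers, and stripping all whitespace before comparing is a simplifying assumption. Text outside braces is compared character by character, and only the `{a}` slot captures raw text that would later be parsed as an expression.

```python
import re

def pattern_to_regex(pattern):
    """Build a regex from a rule pattern such as 'ld {a}, x + 1'."""
    # re.split with one capture group alternates literal text and slot names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Literal pattern text: matched verbatim, whitespace ignored (assumption).
            regex += re.escape(re.sub(r"\s+", "", part))
        else:
            # Parameter slot: capture the raw text of the would-be expression.
            regex += "(?P<%s>.+?)" % part
    return re.compile(regex)

def match_invocation(invocation, rules):
    """Try the invocation against every rule; return all rules that match."""
    compact = re.sub(r"\s+", "", invocation)
    found = []
    for pattern, output in rules:
        m = pattern_to_regex(pattern).fullmatch(compact)
        if m:
            slots = {name: (text, m.span(name)) for name, text in m.groupdict().items()}
            found.append((output, slots))
    return found

RULES = [("ld {a}, x + 1", "0x11"), ("ld x + 1, {a}", "0x22")]

for output, slots in match_invocation("ld x + 1, x + 1", RULES):
    print(output, slots)
# 0x11 {'a': ('x+1', (2, 5))}
# 0x22 {'a': ('x+1', (6, 9))}
```

Both rules match the same invocation, but the span of the expression slot differs: the first operand for one rule, the second operand for the other. That is the sense in which the invocation has no syntax of its own before matching.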
The reason `asm` block parameters need enclosing braces is to enable the assembler to perform substitution token-for-token, without syntactic context -- since braces are some of the only tokens not allowed to be part of an instruction's pattern, it's easy to spot them in a context-free manner.
Now, the reason you can also specify numerical `asm` block parameters without the braces is kind of an oversight of mine -- behavior from before I realized you need token-for-token substitution to cover all cases. Behavior which maybe should be deprecated? All types of parameters should work fine with enclosing braces anyway, albeit changing the semantics a little.
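As an illustration of why braces make context-free substitution easy, here is a small Python sketch. The tokenizer and the `substitute` helper are assumptions made for this example and do not reflect customasm's real lexer. A `{name}` token is recognized purely by its shape and swapped for the argument text, with no knowledge of the syntax around it.

```python
import re

# Hypothetical tokenizer: a braced name, a word, or any single non-space character.
TOKEN = re.compile(r"\{\w+\}|\w+|\S")

def substitute(asm_line, args):
    """Replace each {name} token with its argument text, token for token."""
    out = []
    for tok in TOKEN.findall(asm_line):
        if tok.startswith("{") and tok.endswith("}"):
            # Braced tokens are the only ones considered for substitution.
            out.append(args.get(tok[1:-1], tok))
        else:
            out.append(tok)
    return " ".join(out)

print(substitute("ld {reg}, {value} + 1", {"reg": "r1", "value": "0x10"}))
# ld r1 , 0x10 + 1
print(substitute("ld value", {"value": "0x10"}))
# ld value
```

In the second call the bare word `value` is left alone: without braces, a token-level pass cannot tell a parameter from ordinary pattern text, so resolving it needs some other mechanism.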
from customasm.
My argument is that there is a grammar for the metalanguage. It might look something like this, just kind of making it up:
letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
| "H" | "I" | "J" | "K" | "L" | "M" | "N"
| "O" | "P" | "Q" | "R" | "S" | "T" | "U"
| "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
| "c" | "d" | "e" | "f" | "g" | "h" | "i"
| "j" | "k" | "l" | "m" | "n" | "o" | "p"
| "q" | "r" | "s" | "t" | "u" | "v" | "w"
| "x" | "y" | "z" ;
nonzero digit = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit = nonzero digit | "0" ;
binary digit = "0" | "1" ;
octal digit = binary digit | "2" | "3" | "4" | "5" | "6" | "7" ;
hex digit = digit | "A" | "B" | "C" | "D" | "E" | "F"
| "a" | "b" | "c" | "d" | "e" | "f" ;
character = letter | digit | "_" ;
binary = "0b", binary digit, { binary digit } ;
octal = "0o", octal digit, { octal digit } ;
decimal = "0" | nonzero digit, { digit } ;
hex = "0x", hex digit, { hex digit } ;
identifier = ( letter | "_" ), { character } ;
number = [ "-" ], ( binary | octal | decimal | hex ) ;
string = '"', { all characters - '"' }, '"' ;
ruledef directive = "#ruledef", white space, [ identifier ], white space, ruledef arguments ;
ruledef arguments = "{", match expression, { match expression }, "}" ;
match expression = match rule, match body ;
match rule = white space, { all characters }, "=>", white space ;
match body = expression | expressions ;
expressions = "{", white space, expression, { white space, expression }, white space, "}" ;
white space = ? white space characters ? ;
all characters = ? all visible characters ? ;
With this, define what an `expression` is and you have a good starting point for a grammar to write `#ruledef` directives, identifiers, numbers, and strings.
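For a quick sanity check of the lexical rules, the `number` and `identifier` productions can be transcribed into regular expressions. This Python sketch is only a rough transcription, with `classify` as a hypothetical helper; the decimal branch also accepts a lone `0`, which is an assumption made so that a statement like `x = 0` would lex.

```python
import re

# number = [ "-" ], ( binary | octal | decimal | hex )
NUMBER = re.compile(r"-?(?:0b[01]+|0o[0-7]+|0x[0-9A-Fa-f]+|0|[1-9][0-9]*)")
# identifier = ( letter | "_" ), { character }
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def classify(text):
    """Return 'number', 'identifier', or None for a single lexeme."""
    if NUMBER.fullmatch(text):
        return "number"
    if IDENTIFIER.fullmatch(text):
        return "identifier"
    return None

for lexeme in ["0x11", "-0b101", "ld", "_tmp1", "0b2", "1x"]:
    print(lexeme, classify(lexeme))
```

Running candidate lexemes through a transcription like this is a cheap way to spot gaps in the grammar before committing to it.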
I don't think it's worth trying to define the grammar of the instructions defined inside `#ruledef`, which seems to be where you are getting stuck. It's enough to understand the grammar at a higher level.
> When `x + 1` is specified verbatim in an instruction's pattern, it's not parsed as an expression -- it's simply parsed as a sequence of characters (currently, not even as proper tokens!).
That's perfectly fine! The grammar for the metalanguage should specify exactly this, and that solves it.
> Now, the reason you can also specify numerical `asm` block parameters without the braces is kind of an oversight of mine -- behavior from before I realized you need token-for-token substitution to cover all cases. Behavior which maybe should be deprecated? All types of parameters should work fine with enclosing braces anyway, albeit changing the semantics a little.
This would probably be nice to address. AFAIK, wrapping integer-typed parameters in braces in the `asm` context does not work.
from customasm.
I think the confusion with the `asm` blocks will mostly be resolved with the next release I'm working on, where all arguments can be specified with braces within the `asm` block. Feel free to open this again if you still think the EBNF is worth it!
from customasm.
I think some specification of the metalanguage syntax is still important, even if it is not EBNF. For instance, when I wrote a syntax definition for Sublime Text, I didn't have a great resource for defining the parser. It is mostly just an approximation based on the wiki and empirical observation.
Cf. #105 (comment)
from customasm.