graphitemaster / gmqcc Goto Github PK
View Code? Open in Web Editor NEWAn Improved Quake C Compiler
License: Other
An Improved Quake C Compiler
License: Other
for completeness' sake.
(adding AST and IR labels since the IR should know that it's a global without giving it DEF_SAVEGLOBAL)
I think we should provide some builtins that are common, as aliases that are hit when their functions are used (simple hashtable lookup). This will allow for greater optimization for certian things, for example:
print("this is a test ");
print(message);
print("\n");
print("test complete\n");
This causes tons of function-call overhead in terms of copying all the OFS_* globals each call and restoring them, the reality is this can be packed into one call.
print("this is a test ", message, "\ntest complete\n");
I have no clue what this would be called "function augmentation?" "function factoring?" anyways. In addition to optimization uses, we can provide better semantics such as annotations to prevent common bugs, opposed to have having a naked proto that lacks annotation.
Compile stuff like:
if not(a) { ... }
Parse support for the C function syntax.
Support arrays the way FTE does it: implicitly creating accessor functions for them.
Later we should optionally use an enhanced instruction set for them.
We'll need the &~=
assignment operator.
It's common behavior that that the following would work in terms of cpp behavior:
#define STR1(x) #x
#define STR2(x) STR1(x)
#define THE_ANSWER 42
#define THE_ANSWER_STR STR2(THE_ANSWER) /* "42" */
With ftepp THE_ANSWER_STR
results in the following string:
"THE_ANSWER"
To figure out the exact expected behavior of macro substitution I looked in the C standard documentation to find this golden gem:
6.10.3.1/1:
... After the arguments for the invocation of a function-like macro have been identified,
argument substitution takes place. A parameter in the replacement list, unless preceded
by a # or ## preprocessing token or followed by a ## preprocessing token (see below), is
replaced by the corresponding argument after all macros contained therein have been expanded...
This sounds confusing due to the way ANSI tried to redact their documents, essentially (it means):
when we use: STR1(THE_ANSWER)
we should get "THE_ANSWER"
, because the argument of STR1
is not macro-expanded. However, the argument of STR2 is macro-expanded when it's substituted into the definition of STR2
, which therefore gives STR1
an argument of 42, with the result of "42"
.
I suspect there is some code that heavily requires this behavior (that didn't surface because valid strings were still generated, albeit semantically generated wrong)
Make the parser thinner, add some AST nodes to represent "idents", (eg. a general "variable-by-name", or "type-by-name" node), which are resolved over multiple steps.
This also means moving constant folding and type-checking completely into the AST. Since the ast already contains file+line information this should be sufficient for all error output.
An effect of this would be that functions can be used without adding a prototype before using them. The same should apply to types like structs and unions once they're implemented. Though this may represent a challenge in the parser since it's hard to know if what we parse is a variable declaration when the type has not yet been defined as such. At first we might just error in such a situation and force the use of the 'local' (or "global") keyword to declare variables with a not-yet defined type.
OTOH we could replace the "local" keyword with "let", then we can use it in both the local and global space without requiring two seperate keywords.
FTE compatible preprocessor for -std=fteqcc
We do not need to support everything. Eg. #if does not need to take variables (even consts) into account, it suffices to work with #defined numbers/conditions for #if conditions. Otherwise it'll be a too heavy task, plus migrating to a more "modern" way of compiling would be hindered. (eg. splitting up the work over more passes to avoid the need for prototyping functions)
Note: special about FTE's preprocessor is its recursiveness: #define can be used inside a macro, thus even allows re#defining a macro to something else on use.
(done) prefix/postfix ++
, --
(done) ternary a?b:c
(done) &~=
what else?
id1 compiles, yet a lot more tests should be made, especially ones which are supposed to fail: (assigning variables of invalid types, operating on variables of incompatible types, passing invalid parameter types to functions, redefining functions with different parameter type or count, or different return type, ...)
It's used for function-types.
00:34:23 * | graphitemaster wants to do something that isn't actual work
C99 says:
"The behavior might cause translation to fail or cause the translator or the resulting program to behave in a non-conforming manner. Any such pragma that is not recognized by the implementation is ignored."
gmqcc does:
../common/util-pre.qh:1: parse error: unrecognized hash-keyword: flag
../common/util-pre.qh:1: parse error: junk after pragma
For reference, gcc does:
foo.c:1:0: warning: ignoring #pragma FOO BAR [-Wunknown-pragmas]
(but not by default)
Personally I suggest a warning for this, enabled by default, but possible to turn off.
I think it's crucial we support standard C/C++ digraph and trigraphs in the lexical analyzer for -std=gmqcc. Some people oppose the idea of it, and it should WARN by default (that drigraphs and trigraphs are just strange ..) it's important (from a constricted environment to support it).
If two functions in a row are called where (1) one of the parameters is the
same, and (2) the first function doesn't overwrite the OFS_PARAM
globals,
then it is not actually necessary to copy the parameter a second time.
could add this as: -Oparameter-omission
For projects like xonotic where there is a lot of gamecode, I think it would be nice to see namespaces for organizing code a little better. I expect the process to work exactly like it does in C++ something along the lines of:
namespace name {
{{{code}}}
}
and then for importing individual functions and variables into the global scope:
using namespace::ident;
This would allow for a hierarchies, server code in namespace server for example, while client in another namespace.
The standard output of the error/warning system needs to be entirely replaced with a log system. Something allowing enabling of coloring (which should talk with istty() to see if said terminal can even do color). This log system should be used by all systems in the compiler, from the lexing/parsing/ast/ssa/codegen (future assembler) etc. This system needs to also allow segregation of message types, i.e splitting warnings/errors to go to different file descriptors (errors to stderr, warnings to stdout as an example). Ability for configurable output files. I.E logging of warnings and errors to user defined files. This should be allowed via command line flags (stdout/stderr also being allowed as valid identifiers at the command line to select segregation types). I'd like to strip the heavy use of inline strings for warnings/errors and mitigate those to standard definition files for future i18n. This system should be defaulted to work the way we currently handle output. Ideally the general direction of this system is complete standalone, in the sense that the logger can be compiled as a separate C file, where everything else is required not to touch the stdio.h print functions. Or even consider using stdio.h include
Mark calls and stores as having side-effects, have ast nodes propagate this flag from their sub-expressions, and accumulate a node "weight" defaulting to 1 for all nodes.
Array setter/getter via functions would receive a weight based on the size of the array (or, since we use a binary search, based on the base 2 logarithm of the array's size).
Since calls set the side-effect flag, they don't need to have a sensible weight.
The reason: Since original QC has no early-out, nodes for &&
and ||
could still perform early-out in cases where the node is marked as having no side effects (iow. when neither side of the expression can have side-effects).
The weight could be used as a general optimization when performing early out: start with the shorter side.
Should be allowed because it's so common. With a warning by default.
Considering entity-members like genericValue1
, genericValue2
, and so on, and since it is common to use entity-chains as lists of data, it may be desirable not to have to have members of each exclusive "data-type" side by side, but in a union.
For instance:
// declare the union type of a double-linked list of floats
struct ListOfFloats {
float value;
entity next;
entity prev;
}
// or a list of entities
struct ListOfEntities {
entity ent;
entity next;
}
// collect all those lists in a union:
union EntityClasses {
ListOfFloats floatlist;
ListOfEntities entitylist;
}
// append the union to the entity-members:
.EntityClasses classes;
This will reduce the size of entities, but is rather ugly to type and requires the coders to keep a list of all the unions in the "EntityClasses" type.
For this there should be syntactic sugar, to allow extending entities at any place without listing them in some "collection-union".
The syntax is subject to change, but here's a taste:
entityclass FloatList {
float value;
entity next;
entity prev;
}
entityclass EntityList {
entity ent;
entity next;
}
// All entityclass structures are automatically appended after the last explicitly created entity-member, so the union lies at the end of the entity data.
// you can implicitly cast between the `entity` and any entity-class type:
EntityList ls = an_entity;
// you cannot cast between two specialized types implicitly, it will error
Additionally each entity could have a __builtin_typeid
member, the type-IDs are compiler generated (in an unspecified way) and can be accessed with a typeid()
intrinsic:
// to get the numerical value of a typename:
float id_of_EntityList = typeid(EntityList);
// to check the type of an entity
entity e = something();
if (typeid(e) == typeid(EntityList)) { ... }
I was thinking of including an ini parser in gmqcc and some logic for it to understand configuration files so users can set defaults for command line arguments, essentially it would be a file that exists in the same directory as the progs.src being compiled, simply named gmqcc.ini.
The warning shouldn't come from the parser because it cannot easily look into if/else. The ast's ir-gen is a better place.
Since QC doesn't really allow any real optimization for switch-statements (like tables, because using arrays in QC is just really slow), we could treat it like a huge if-statement and relax it a bit: a case could contain an actual non-constant expression, or a list or range:
switch (x) {
case 3: code(); break;
case y: code(); break; // y is a variable
case y + 3*z: code(); break; // z is also a variable
case {3, 4, 5}: code(); break; // is the same as: case 3: case 4: case 5:
case {6..9}: code(); break; // same as case 6: case 7: case 8: case 9: optimized to GT/LT instructions
case {13, 14, 15..19, 23}: code(); break; // mixing
// or all together
case {23, y, 1..5, 3*z, 8*y..3*z}:
code();
break;
}
Thoughts? Other than that it won't get into 0.2 ;)
I already mentioned it in #11.
We'd like to be able to label loops to be able to jump out of multiple loops at once:
We'll need to discuss the syntax, the following is just an illustration of what it should do:
while [label_x] (cond) {
code();
while (inner) {
foo();
if (something)
break label_x; // break out of the outter loop
}
}
Open for syntax suggestions...
while:label (...) {...}
<- current favourite I think?
while "label" (...) {...}
while (...) label {...}
we currently lack an enum keyword. Which I think we could greatly benefit from. Opposed to macros they're automatic (the ordering changes as you insert elements into it). they're also scoped, opposed to macros (which are just tetual substitution).
The problem with current macros is they're seriously underpowered for no reason. Many a times in C/C++ I've wanted the following for enumerations:
Often in C programs you may see something like:
enum optional_name {
FLAG_A = 1 << 1,
FLAG_B = 1 << 2,
FLAG_C = 1 << 3
};
The reality is you're better off with macros because you're explicitly setting the values.
I propose for an attribute field for enum that allows you to specify if it's flagged or not
enum(flagged) {
FLAG_A, // 1
FLAG_B, // 2
FLAG_C, // 4
}
Sometimes you may want to specify the side of the incrementation. For example
opposed to incrementing the right side, you may want the left. To do this we may supply
an additional attribute such as: "right", or "left", where left is default.
enum(flagged, right) {
FLAG_A = (4 << 1), // start here
FLAG_B, // (5 << 1) -> 10
FLAG_C // (6 << 1) -> 12
};
You may specify both as well so both sides increment
enum (flagged, right, left) {
FLAG_A = (1 << 1),
FLAG_B, // (2 << 2),
FLAG_C, // (3 << 3)
};
Unlike traditional C/C++ the name should force a scope on A,B and C, opposed to putting them
into global scope. There should be a required name::A, name::B, name::C (like namespaces) to use them.
enum(normal) name {
A, B ,C
};
Often you may want to get a string literal of "FLAG_A" (for debugging). This is often really hard to do without the help of some clever macros or libraries that perform voo-doo. I propose we implement support for string literal identifers in the form of basic stringize operate that is used in the preprocessor. #FLAG_A
It's unambiguous, and already is noted for being "stringize" anyways.
Very often I see (or even run) into cases where the layout of enumerations are very specific: for example something must exist after something, or before it.
I propose a bounds protection system
enum(normal) name {
A:before(B),
B:after(A),
C:after(A,B),
Z:before(_),
_
};
Note the :before and :after tags are sort of like binary predicates :)
If at any time the ordering gets mixed up a compiler error message should occur. This will catch bugs early in the act. Instead of surfacing up later when no one knows what he/she changed to cause it.
Now you remember the binary predicates :before and :after for enumerations. Well, I'd like to allow user-defined predicates on enumerations such as:
enumpred less(X, Y) -> !(X < Y);
enum {
X = 10,
Y = 0,
A:less(X) // 1 < 10 (pass)
B:less(A) // 1 < 2 (fail compiler error)
};
Note: the first argument of the predicate is the currently being defined constant that has a predicate
tag preceding it.
Binary predicates are compile-time lambdas essentially, they're unallowed to reference anything other than constants, and macros. Optionally predicates may contain a body in the form of { } and an explicit return. Predicates are allowed to use variables but they'll implicitly will become constants. In no event shall a binary predicate call a function. But it can call other binary predicates.
break
and continue
are not yet implemented since they were not part of id1 QC. However all the required information is there and the ast already contains 'break' and 'continue' block targets. Just need to keep track of them and have an ast-node to actually perform that action.
(Also: for a later milestone we'll want multiscope break/continue)
Aside from there being an IF_S
in FTE's instruction set, all non-float types need to use their corresponding NOT
instruction first.
We already define GMQCC
Now we need whatever we documented but don't have yet:
__STD_MINOR_VERSION__
__STD_MAJOR_VERSION__
And depending on -std=
one of these:
__STD_QCC__
__STD_FTEQCC__
__STD_GMQCC__
The small obvious optimizations: eg a = b * c
is translated as tmp = b*a; a = tmp
and there's no need to even have a tmp
there.
They should work as functions, being inlined at compile time, with each argument simply being an ast itself. Is what I gather of implementing this cleanly. I like Perls quasi keyword, but I don't like how it attempts to do it. Let me suggest an example of how this could work.
macro loop (n) // note no actual "type" on arguments
{
/// {{{ ... }}} like perl
for (float i = 0; i < {{{n}}}; i++) {
{{{body}}}
}
}
void foo() {
loop(100) {
// do stuff
}
}
the {{{}}} are just expansions (via copy) from the AST's end. There are some implicit
ones we can generate from them, like {{{body}}}, which would be the body added
around the invocation of loop itself (so that the loop gets the body). The reason we
need these sorts of predefined ASTs is because making loop work with the body
as an argument, i.e:
macro loop(n,b) { for(float i = 0; i < {{{n}}}; i++) {{{b}}} }
would require loop to be used as:
loop(100, {/* code */})
Which looks very terrible.
I think it would be nice to support compound literals in the language. In C for example stuff such as this is legal:
return (a ? (float[]){1, 2, 3} : (float[]){3, 2, 1})[b];
It would be nice if we could support that, as well as for functions that can return arrays, such as
float foo[]() {
return (float[]){1, 2, 3, 4, 5};
}
It could become very useful for other things too, like switch statements
switch (foo) {
case (float[]){1, 2, 3, 4}:
}
Which is part of what we discussed for switch statement additions, now we have an unambiguous way of "describing" what it is. Of course we could add range semantics to it as well:
(float[]){[0 .. 5] = 0, 1 /* 6 */, 2 /* 7 */}:
Now we have a way of describing what exactly that does, "switch on compound literal". Assuming we implemented functionality for them, you'd only need a way to partially evaluate the compounds inside, which if these where implemented you'd have a method of doing. Then it's just some IR work in the back to handle.
I can see it having other interesting uses as well for calling functions that take arrays
// I understand that the built-ins can never work like this
// but our functions can
void foo(float []) {
}
// now call
foo((float[]){1, 2, 3});
They could also be of use for storing function pointers themselves
consider this chunk of code:
enum {
FUNC_UPDATE,
FUNC_FOO,
FUNC_BAR
};
void[] getfuncs() {
#ifdef SERVER
return (void[]) { s_update, s_foo, s_bar };
#else
return (void[]) { c_update, c_foo, c_bar };
#endif
}
// call like so:
getfunc()[FUNC_UPDATE]();
They have many uses, we could even use them to store computed goto
goto[] trampoline() {
l1: do_x();
l2: do_y();
l3: do_z();
l4: do_w();
return (goto[]){l1, l2, l3, l4};
}
// goto l1 would actually work like a call
trampoline()[1]; // no requirement for explicit goto
This would really be fun to (ab)use, remember not wanting to use goto well assume you had this code:
float foo() {
if (bar) goto end;
print("hi");
return 0;
end:
return 1;
}
with computed goto we can prevent this using a wrapper, something like this:
goto[] foo_jumps() {
end: return 1;
return (goto[]){end};
};
float foo() {
if (bar) foo_jumps()[0];
print("hi");
return 0;
}
It's strange syntax wise and control-flow semantics are insane here, but I can see some uses of this method
This all possible with compound literals.
We could also use the core of them for implementing tuples:
(float, string, vector){1, "hi", '0 1 2'};
There is tons of things we can do, for example, composing vectors!
If we made float[3] implicitly convertible to vector, and vise-versa (which I think it currently is) we could actually remove hacks used to make vectors from components. For example, instead of:
vector foo = x * '0 0 1' + y * '0 1 0' + z * '1 0 0';
We could actually do something less ugly like:
vector foo = (float[3]) { x, y, z };
There is tons of uses for compound literals. It solves our relaxed goto, our switch statement ideas (including ranges), our computed goto (for nested structure that can break/continue), how to describe what a tuple is, and probably many other things.
What do you think?
Currently only built-in functions can have a variable amount of parameters since plain-old QC doesn't have any efficient way to implement vararg access.
I propose implementing them the same way arrays (#5) will be implemented (and we should push for the integer and GADDRESS, GSTORE
instructions to additionally provide efficient implementations for both).
Here's the proposed syntax:
void foo(...count) {
float i;
for (i = 0; i < count; i += 1) {
print("Parameter ", ftos(i), " = ", ...(i, string));
}
// pass all parameters starting at the 3rd:
bar(...(3));
}
Internally this is handled as follows:
A global float __builtin_argc;
is added, which is filled by the caller. If a counter-name has been provided, this global will be copied into it at the start of a function (since subsequent calls will overwrite the global).
The raw global named __builtin_argc
will not be accessible within a QC program.
The parameters are treated as an array of the highest-sized type: a vector[]
, otherwise a vector would cause alignment issues.
Additionally all provided parameters are copied into function-local section at the start of a function, much like the parameters after the 8th parameter.
I think once I stablize and merge my new allocator / memory tracking system into the allocator branch we should mitigate that into master for the following reasons we dicussed this morning:
I suggested the idea of allowing something like this, this morning:
void mem_freeall(mem_heap_t *heap) {
mem_info_t *info = NULL;
for (info = heap->info; info; info = info->next) {
MEM_FREE (info - 1);
}
}
Which would allow us to just colltate all leaks to the end and free all as if it where a memory pool. Which is what it is essentially. I see some uses for this specifically in the intermediate representation, as you've already mentioned. I'd like to touch on some other interesting aspects this improves though:
frees
argument can be prefetched (blazing fast free)This all comes at a higher size cost though, code-wise, and information wise. But considering that binary size isn't of a concern, as much as speed is, we should be fine.
I'm hoping to get this in for milestone 0.2, although we might move it up to 0.3 depending on how much work needs to be done internally to change all the systems to use their own heap, or multiple heaps, we'll just have to wait and see.
Since this is effort on both ends of the spectrum, i.e me actually finishing my implementation, and changing my code to use their own heaps, and your job to change your code to use their own heaps, I have to assign this to both of us.
EDIT: The computers here at school have the worst webbrowser ever, IE6, the milestone selector, and the assignment system, and label selection all don't work. I'll fix them when I get home.
In response to __builtin_typeid #8 I think it would be better to implement a re-directive interface in header files once we have a CPP (see #2). This way if existing code some-how (for which ever insane reason chooses to use variable names like "typeid" the code will (and should still compile). Much like how in C, bool only works if you #include << stdbool.h >>. This sort of support would do something along the lines of:
which would simply do something like
It's important to provide [[extension]] specifiers here for two reasons.
-1 IT hints the compiler that the code is using an extension (which implies that we can skip the entire check of variables and hit the compilers builtin intrinsic logic (to do what needs to be done).
-2 IT prevents people from using __builtin_typeid the same way the compiler would. So we can warn that it's an extension or even error, and that it would be better if they #include << stdtypeid.qh >>
QC by design only has functions with a maximum of 8 parameters, we still want more. This needs some support code, and we need to choose whether to manually copy from parameter-positions or if it's possible to simply put them after the regular PARAM globals and pack them up with the 8th parameter's "size" attribute of the function... (This would work in DP, I have no idea if other engines would cry on it...)
So first we'll have manual copying...
or I could check what fteqcc does.
In the long run we'll want to support tuples, since they're just so damn useful.
In order for a clear and unambiguous syntax, I propose to drop the comma as an "operator", and allow semicolons within parenthesis expressions to replace the old comma.
x = (a, b); // Old
x = (a; b); // New
The comma would then be reserved for tuples, including destructuring assignments such as:
float x, y, z;
(x, y, z) = a_vector;
// Or:
string s;
(x, y, s) = a_float_float_string;
Just to get this documented. Added.
Adapt ast_instantiate to also require a copy-ctor. Required for inlining and ast-macros, since they essentially insert a copy of an ast-tree.
The names for arrays are given their standard indices bracket fulfilled names, except they don't require it at all. In fact there is no reason for this to be wrote to the string table at all. Or is there? I'm confused. Anyways I vote we don't write that out at all, or just remove those brackets [ ](two bytes), and replace it with just a : so array[100] becomes array:100, to save a big whooping BYTE!. Add this up for xonotic's case, and that is almost 6500 bytes saved.
fteqcc provides an option to enable this, id-QC does not do it
This means we also cannot simply activate it implicitly via -std=fteqcc
, but require a parameter as well. For -std=gmqcc
however it should be on by default!
fteqcc calls it -flo
, will we use the same? I'd vote for a more verbose switch.
In C++ there is a common trick that allows you to define variables having scope to if/else constructs within the body of the if expression.
if(int a = 100) {
// have access to `a` here
} else {
// have access to `a` here
}
// no access here
I'd like to see support for this when -std=gmqcc
because it allows for implementation of certain
tricks like FOR_EACH and could prevent globals (since if/else only shares, and not scope outside).
The only other way of implementing this exact functionality is with an additional set of braces, like such:
{
int a = 100;
if (a == 100) {
// have access to 'a'
} else {
// have access to `a`
}
}
// no access here
Which sucks very much. Anyways the FOR_EACH macro is a good example of what use we can gain from this:
#define FOR_EACH(T,N) \
if (decltype(T)::iterator cur = T.cur()) { } else \
if (decltype(T)::iterator end = T.end()) { } else \
for (decltype(T)::iterator N = cur; cur != end; ++N)
Mind you I have no clue how we could do FOR_EACH like this since we don't have pointer arithmetic or what not, in fact foreach should be a keyword even (we should open a ticket on that when we get there). But this is just an example showcasing how it could be useful to have this.
Similar to C does it, except only allow them as types, not in variable declarations of the form struct { contents } variable_name
.
Also allow unnamed structs and unions, which may be used internally later to create "entity-classes".
The error
function for example causes execution to stop. error
cases in if
statements thus usually don't contain a return
keyword and can cause wrong “used uninitialized” warnings.
How should we solve this? A noreturn
keyword?
noreturn void error();
Or, since a return-type makes no sense, it could be used instead of the type: noreturn error()
?
Or as attribute void error() noreturn;
Or should we just recognize the error
function as such?
Thoughts?
As much as I dislike it...
-std=fteqcc
aslo needs a switch
(This should probably also get an AST node so we can later make use if the SWITCH instructions... though they don't work well with nested switches...)
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.