ziglang / zig-spec
License: MIT License
Currently, the stage 1 implementation of the Zig parser requires all struct field declarations to be contiguous. Any other member placed between two field declarations causes an error. However, the current specification, namely:
ContainerMembers
    <- TestDecl ContainerMembers
     / TopLevelComptime ContainerMembers
     / KEYWORD_pub? TopLevelDecl ContainerMembers
     / ContainerField COMMA ContainerMembers
     / ContainerField
     /
allows for fields to be declared anywhere, and is thus inconsistent with the current language.
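A minimal sketch of the mismatch, assuming a compiler that enforces contiguous fields as described above. The struct names and the default_a constant are made up for illustration. The commented-out form is derivable from the ContainerMembers rule (ContainerField COMMA, then a TopLevelDecl, then another ContainerField) but is the layout the report says the parser rejects.

```zig
const std = @import("std");

// Accepted layout: all fields are contiguous, declarations follow them.
// `Accepted` and `default_a` are hypothetical names for this sketch.
const Accepted = struct {
    a: u32,
    b: u32,

    const default_a = 1;
};

// Layout permitted by the quoted grammar rule but reported as an error,
// because a declaration sits between two fields:
//
// const Rejected = struct {
//     a: u32,
//     const default_a = 1;
//     b: u32,
// };

test "contiguous fields compile" {
    const s = Accepted{ .a = Accepted.default_a, .b = 2 };
    try std.testing.expect(s.a + s.b == 3);
}
```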
In grammar.y, as part of PrimaryTypeExpr:
LoopTypeExpr <- KEYWORD_inline? (ForTypeExpr / WhileTypeExpr)
ForTypeExpr <- ForPrefix TypeExpr (KEYWORD_else TypeExpr)?
WhileTypeExpr <- WhilePrefix TypeExpr (KEYWORD_else Payload? TypeExpr)?
This confuses me, since it would match something like while (true) u8. What am I missing?
I cannot get validation to pass on a 20.04 cloud host, a Gitpod cloud host, or in GitHub Actions. All of the tests report 127 vs 0:
cd grammar
./validate.sh
The link in grammar/README.md for peg parser generator is broken. It's pointing to http://piumarta.com/software/peg/.
It may have been taken down. April 3rd was the last time that it was crawled.
Is there another place that could be linked?
The Rust project has 3 related attempts to cover the spec
It could help to have something simple and showable for interested contributors, for which 1 sounds most approachable.
I suspect that 2 will be relatively simple to derive once we have some sort of model for comptime+runtime semantics, with the exception of custom linker semantics (if ever introduced) and under the assumption that we have exactly 1 compilation step without generated code to be used.
At least for now, this assumes that we don't take into account hardware specifics not covered by the current C11 memory model for parallel execution semantics.
This is because (see the linked paper below) optimizations for weak memory have thread-local and global effects (a thread-local optimization can enable a global one and vice versa), so I am very unsure if and how those should be represented in the type system or how to insert safety checks.
So, from my point of view, to avoid the churn of producing a lot of paper without much meaning, it would be nice to start with 1 and provide a list of open semantic questions that contributors can toy with and come up with something better.
Any ideas how to proceed with this? Is this the correct place to discuss or should I open an issue on the Zig compiler repo?
The related discussion
Q:
"Do you have already ideas on simplifying cache synchronisation protocol semantics? The C11 model has many severe flaws, which prevent a lot optimizations. You probably know this already, but for other interested readers https://plv.mpi-sws.org/c11comp/"
A: "so far I am not planning on touching that. I hope someone else will solve it and I just need to plug in the solutions... and there is some nice progress .. https://dl.acm.org/doi/pdf/10.1145/3385412.3386010" ("Promising 2.0: Global Optimizations in Relaxed Memory Concurrency" by Lee et al.)
Source https://www.reddit.com/r/rust/comments/wiwjch/comment/ijfo2k6/?utm_source=share&utm_medium=web2x&context=3
Link to Coq proofs: https://github.com/snu-sf/promising2-coq
Update1: added brief reddit discussion
Update2: added coq proofs
Currently, grammar.y is out of sync with the actual grammar implemented by the zig compiler (in std/zig/tokenizer and std/zig/parser.zig).
These are the places where grammar.y and zig diverge, found with the improved parser from #42, using files in src and lib/std in the zig repository.
doc-comments not allowed in top-level comptime and test declarations.
/// doc-comment for comptime.
comptime {
    var a = 1;
    _ = a;
}
/// doc-comment for test.
test {}
Saturating arithmetic is not supported by grammar.y.
test {
    var a: isize = 10;
    a +|= 1;
    a -|= 1;
    a *|= 1;
    a <<|= 1;
    const b: isize = a +| 1;
    const c: isize = b -| 1;
    const d: isize = c *| 1;
    const e: isize = d <<| 1;
    _ = e;
}
Mixed doc-comment and line-comment is not supported by grammar.y.
// A doc-comment followed by a line-comment is not supported by grammar.y.
const S = struct {
    //! doc
    /// doc
    // doc
    a: i32,
};
NOTE: I found mixing doc-comment and line-comment confusing, and autodoc does not handle them correctly.
Examples:
std/mem.zig:3760
/// Force an evaluation of the expression; this tries to prevent
/// the compiler from optimizing the computation away even if the
/// result eventually gets discarded.
// TODO: use @declareSideEffect() when it is available - https://github.com/ziglang/zig/issues/6168
pub fn doNotOptimizeAway(val: anytype) void {
See: https://ziglang.org/documentation/master/std/#root;mem.doNotOptimizeAway.
std/coff.zig:354
/// This relocation is meaningful only when the machine type is ARM or Thumb.
/// The base relocation applies the 32-bit address of a symbol across a consecutive MOVW/MOVT instruction pair.
// ARM_MOV32 = 5,
/// This relocation is only meaningful when the machine type is RISC-V.
/// The base relocation applies to the high 20 bits of a 32-bit absolute address.
// RISCV_HIGH20 = 5,
/// Reserved, must be zero.
RESERVED = 6,
See https://ziglang.org/documentation/master/std/#root;coff.BaseRelocationType.
New addrspace keyword.
Commit: ziglang/zig@ccc7f9987 (Address spaces: addrspace(A) parsing)
Date: 2021-09-14
test {
    const y: *allowzero align(8) addrspace(.generic) const volatile u32 = undefined;
    _ = y;
}
Inline switch prong not supported by grammar.y.
Commit: ziglang/zig@b4d81857f (stage1+2: parse inline switch cases)
Date: 2022-02-13
const std = @import("std");
const expect = std.testing.expect;

const SliceTypeA = extern struct {
    len: usize,
    ptr: [*]u32,
};
const SliceTypeB = extern struct {
    ptr: [*]SliceTypeA,
    len: usize,
};
const AnySlice = union(enum) {
    a: SliceTypeA,
    b: SliceTypeB,
    c: []const u8,
    d: []AnySlice,
};

fn withSwitch(any: AnySlice) usize {
    return switch (any) {
        // With `inline else` the function is explicitly generated
        // as the desired switch and the compiler can check that
        // every possible case is handled.
        inline else => |slice| slice.len,
    };
}

test "inline else" {
    var any = AnySlice{ .c = "hello" };
    try expect(withSwitch(any) == 5);
}
New packed struct syntax.
Commit: ziglang/zig@6249a24e8 (stage2: integer-backed packed structs)
Date: 2022-02-23
pub const AbsolutePointerModeAttributes = packed struct(u32) {
    supports_alt_active: bool,
    supports_pressure_as_z: bool,
    _pad: u30 = 0,
};
Part of ziglang/zig#2093
I'm pretty sure this should be valid syntax, but it is not:
const a = x: {}.*;
test.zig:1:16: error: expected token ';', found '.'
const a = x: {}.*;
Both the parser for the Zig compiler and the auto-generated peg parser fail to parse this, so this is a grammar design flaw. I think this is fixed by this diff, but I'll have to do some more testing to ensure nothing breaks.
@@ -88,13 +88,12 @@ PrimaryExpr
/ KEYWORD_continue BreakLabel?
/ KEYWORD_resume Expr
/ KEYWORD_return Expr?
- / LabeledExpr
+ / BlockLabel? LoopExpr
+ / Block
/ CurlySuffixExpr
IfExpr <- IfPrefix Expr (KEYWORD_else Payload? Expr)?
-LabeledExpr <- BlockLabel? (Block / LoopExpr)
-
Block <- LBRACE Statement* RBRACE
LoopExpr <- KEYWORD_inline? (ForExpr / WhileExpr)
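Until the grammar accepts a suffix directly after a labeled block, one possible workaround is to parenthesize the block so that it becomes a grouped expression and the .* suffix can follow. This is only a sketch under that assumption: the value constant and the derefViaGroup helper are hypothetical names, and the block returns a pointer so that the dereference is meaningful.

```zig
const std = @import("std");

// Hypothetical value to point at; not part of the original report.
const value: u32 = 42;

// Hypothetical helper: dereferences the result of a labeled block.
fn derefViaGroup() u32 {
    // Wrapping the labeled block in parentheses makes it a grouped
    // expression, so a suffix operator such as `.*` may follow it.
    return (x: {
        break :x &value;
    }).*;
}

test "dereference a labeled block through parentheses" {
    try std.testing.expect(derefViaGroup() == 42);
}
```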
Currently the grammar does not support multi-variable for loop syntax.
Explained here
Actually, this broke with the grammar changes in #1685. Basically, we have this rule at stmt level:
Statement <- ... / LabeledStatement ... / AssignExpr SEMICOLON
LabeledStatement <- BlockLabel? (Block / LoopStatement)
AssignExpr expands into all other expressions. LabeledStatement is before AssignExpr, so that rule is being parsed first, succeeds, and then Statement succeeds. Then the parser tries to end the __case_1 block, which then fails the parsing.
The parser parses the grammar correctly, but the new grammar just does not allow this syntax. I can't come up with a solution off the top of my head right now.
Edit: Well, a solution would be to make translate-c conform to this new grammar. Idk if we want to put in the work to make x:{}.* = 1; valid at stmt level.
Is this the procedure that you are using to install peg? I am working on getting a GitHub workflow to run check_parser.sh.
wget https://www.piumarta.com/software/peg/peg-0.1.18.tar.gz
tar zxf peg-0.1.18.tar.gz
cd peg-0.1.18
make
Regards, Mark
Hi there,
I'm currently working on a project where I need to parse a simplified version of the Zig programming language using Flex and Bison. To achieve this, I am looking for a simplified Zig grammar in Backus-Naur Form (BNF) that can be used within a .y (Yacc) file. Additionally, I require the corresponding tokens to be defined in a .l (Lex) file. The simplified grammar should cover basic programming features, function calls, arrays, and structure usage.
The goal is to leverage Flex and Bison to parse this simplified Zig grammar effectively, focusing on the core syntax and structures that define the language. This parser will be a foundational tool for further development and analysis of Zig code within our project.
Thank you for your time.
NL = New Line (0x0a)
CR = Carriage Return (0x0d)
TAB = Tab (0x09)
zig fmt, leaving only NL.
zig fmt may not mangle multi-line string literals, and therefore the control character TAB is rejected by the grammar inside multi-line string literals.
For string literals that want to include CR, TAB, or any other control sequences, they will need to use regular string literals and the ++ operator, or @embedFile.
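A small sketch of the regular-literal-plus-++ approach described above: the control characters are written as escapes in an ordinary string literal and concatenated onto a multi-line literal at compile time. The text constant and its contents are made up for illustration.

```zig
const std = @import("std");

// The multi-line literal carries only plain text; TAB and CR are added
// through escapes in a regular string literal joined with `++`.
// `text` is a hypothetical name for this sketch.
const text =
    \\first line
    \\second line
++ "\tindented\r\n";

test "control characters come from the regular literal" {
    try std.testing.expect(std.mem.eql(u8, text, "first line\nsecond line\tindented\r\n"));
}
```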
zig fmt.
zig fmt.
The equivalent of this test in C is UB according to ubsan:
const bytes align(4) = [_]u8{ 1, 2, 3, 4 };
const S = extern struct {
    a: u16,
    b: u16,
    c: u32,
};
const x = @ptrCast(*const S, &bytes);
var p = &x.b; // TODO: Triggers ubsan for CBE, since @sizeOf(S) > @sizeOf(bytes)
const expected = switch (native_endian) {
    .Little => 0x0403,
    .Big => 0x0304,
};
try expect(p.* == expected);
The problem is that we're reinterpreting memory using a type larger than the underlying object. The load itself does not exceed the region of the underlying object however, so this behavior could conceivably be well-defined in Zig.
Is this UB in Zig?
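To make the question concrete, here is a sketch that reads the same two bytes (offsets 2 and 3, where field b of S lives) without going through a pointer to the larger type. It does not settle whether the cast itself is UB in Zig; it only shows that the load stays inside the 4-byte object and yields the expected values for each byte order. The readLittle and readBig helpers are hypothetical names.

```zig
const std = @import("std");

const bytes = [_]u8{ 1, 2, 3, 4 };

// Hypothetical helper: assembles the u16 at byte offset 2 as little-endian.
fn readLittle(buf: [4]u8) u16 {
    return @as(u16, buf[2]) | (@as(u16, buf[3]) << 8);
}

// Hypothetical helper: assembles the u16 at byte offset 2 as big-endian.
fn readBig(buf: [4]u8) u16 {
    return (@as(u16, buf[2]) << 8) | @as(u16, buf[3]);
}

test "bytewise read of field b stays within the 4-byte object" {
    try std.testing.expect(readLittle(bytes) == 0x0403);
    try std.testing.expect(readBig(bytes) == 0x0304);
}
```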
Am I missing a step?
[nix-shell:~/dev/zig-spec/grammar]$ make VERBOSE=1
gcc -O3 -o build/parser build/full.c
/run/user/1000/ccYkPCbK.o: In function `main':
full.c:(.text.startup+0x7): undefined reference to `yyparse'
collect2: error: ld returned 1 exit status
make: *** [Makefile:17: build/parser] Error 1