ziglang / zig-spec

License: MIT License

Makefile 0.06% Yacc 4.10% C 0.36% Shell 0.16% Zig 95.31%

zig-spec's Issues

Grammar for container members does not enforce that fields be adjacent

Currently, the stage 1 implementation of the Zig parser requires all struct field declarations to be contiguous. If any other member appears between two field declarations, the parser reports an error. However, the current specification, namely:

ContainerMembers
    <- TestDecl ContainerMembers
     / TopLevelComptime ContainerMembers
     / KEYWORD_pub? TopLevelDecl ContainerMembers
     / ContainerField COMMA ContainerMembers
     / ContainerField
     /

allows for fields to be declared anywhere, and is thus inconsistent with the current language.
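The adjacency rule that the grammar fails to express can be stated as a small check. The following is a minimal Python sketch, not code from the Zig parser: the function name `fields_are_contiguous` and the string labels for member kinds are assumptions made for illustration.

```python
def fields_are_contiguous(members):
    """Return True if every "field" entry belongs to one unbroken run.

    `members` is a list of member kinds in source order, for example
    ["decl", "field", "field", "test"]. The labels are hypothetical.
    """
    seen_field = False   # at least one field has been seen
    run_ended = False    # a non-field member followed a field
    for kind in members:
        if kind == "field":
            if run_ended:
                # A field appears after the field run was interrupted.
                return False
            seen_field = True
        elif seen_field:
            run_ended = True
    return True
```

Under this check, `["field", "decl", "field"]` is rejected even though the quoted ContainerMembers rule accepts it.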

What is LoopTypeExpr supposed to match?

In grammar.y, as part of PrimaryTypeExpr:

LoopTypeExpr <- KEYWORD_inline? (ForTypeExpr / WhileTypeExpr)

ForTypeExpr <- ForPrefix TypeExpr (KEYWORD_else TypeExpr)?

WhileTypeExpr <- WhilePrefix TypeExpr (KEYWORD_else Payload? TypeExpr)?

This confuses me, since it would match something like while (true) u8. What am I missing?

Trying to validate grammar

I cannot get validation to pass on a 20.04 cloud host, on a Gitpod cloud host, or in GitHub Actions. All of the tests report 127 where 0 is expected.

cd grammar
./validate.sh

The file `grammar/README.md` has a broken link.

The link in grammar/README.md for peg parser generator is broken. It's pointing to http://piumarta.com/software/peg/.

It may have been taken down. April 3rd was the last time that it was crawled.

Is there another place that could be linked?

roadmap to spec

The Rust project has 3 related attempts to cover the spec

1. a human-readable specification in an idealized language https://github.com/RalfJung/minirust
2. a debuggable language spec to simplify type system concepts based on reduction https://github.com/nikomatsakis/a-mir-formality/
3. a C/C++-like "axiomatic in style" one https://github.com/ferrocene/specification (later to verify properties of 1.)

It could help to have something simple and showable for interested contributors, for which 1 sounds most approachable.
I suspect that 2 will be relatively simple to derive once we have some sort of model for comptime+runtime semantics, with the exception of custom linker semantics (if ever introduced) and under the assumption that there is exactly one compilation step that uses no generated code.
At least for now, this also assumes that we do not take into account hardware specifics not covered by the current C11 memory model for parallel execution semantics.
This is because (see the paper linked below) optimizations for weak memory have both thread-local and global effects (a thread-local optimization can enable a global one and vice versa), so I am very unsure whether and how those should be represented in the type system, or how to insert safety checks.

So, at least from my point of view, to avoid the churn of producing a lot of paper without much meaning, it would be nice to start with 1 and provide a list of open semantic questions that contributors can toy with and improve on.

Any ideas how to proceed with this? Is this the correct place to discuss or should I open an issue on the Zig compiler repo?

The related discussion
Q:
"Do you have already ideas on simplifying cache synchronisation protocol semantics? The C11 model has many severe flaws, which prevent a lot optimizations. You probably know this already, but for other interested readers https://plv.mpi-sws.org/c11comp/"
A: "so far I am not planning on touching that. I hope someone else will solve it and I just need to plug in the solutions... and there is some nice progress .. https://dl.acm.org/doi/pdf/10.1145/3385412.3386010" ("Promising 2.0: Global Optimizations in Relaxed Memory Concurrency" by Lee et al.)
Source https://www.reddit.com/r/rust/comments/wiwjch/comment/ijfo2k6/?utm_source=share&utm_medium=web2x&context=3

Link to Coq proofs: https://github.com/snu-sf/promising2-coq

Update1: added brief reddit discussion
Update2: added coq proofs

Update grammar.y

Currently, grammar.y is out of sync with the actual grammar implemented by the zig compiler (in std/zig/tokenizer and std/zig/parser.zig).

These are the places where grammar.y and Zig diverge, found with the improved parser from #42, using files in src and lib/std in the Zig repository.

doc-comments not allowed in top-level comptime and test declaration.

/// doc-comment for comptime.
comptime {
    var a = 1;
    _ = a;
}

/// doc-comment for test.
test {}

Saturating arithmetic is not supported by grammar.y.

test {
    var a: isize = 10;

    a +|= 1;
    a -|= 1;
    a *|= 1;
    a <<|= 1;

    const b: isize = a +| 1;
    const c: isize = b -| 1;
    const d: isize = c *| 1;
    const e: isize = d <<| 1;
    _ = e;
}
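For readers unfamiliar with the operators in the test above, saturating arithmetic clamps a result to the range of the operand type instead of overflowing. This is a minimal Python sketch of that clamping for signed integers. The helper names are hypothetical, the 64-bit default is an assumption about `isize` on a 64-bit target, and the shift operator `<<|` is left out.

```python
def saturate(value, bits=64):
    """Clamp `value` to the range of a signed integer with `bits` bits."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def sat_add(a, b, bits=64):
    # Models `a +| b`: compute exactly, then clamp.
    return saturate(a + b, bits)


def sat_sub(a, b, bits=64):
    # Models `a -| b`.
    return saturate(a - b, bits)


def sat_mul(a, b, bits=64):
    # Models `a *| b`.
    return saturate(a * b, bits)
```

With `bits=8`, `sat_add(127, 1, bits=8)` stays at 127 and `sat_sub(-128, 1, bits=8)` stays at -128.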

Mixed doc-comment and line-comment is not supported by grammar.y.

// A doc-comment followed by a line-comment is not supported by grammar.y.
const S = struct {
    //! doc
    /// doc
    // doc
    a: i32,
};

NOTE: I find mixing doc-comments and line-comments confusing, and autodoc does not handle them correctly.

Examples:

std/mem.zig:3760

/// Force an evaluation of the expression; this tries to prevent
/// the compiler from optimizing the computation away even if the
/// result eventually gets discarded.
// TODO: use @declareSideEffect() when it is available - https://github.com/ziglang/zig/issues/6168
pub fn doNotOptimizeAway(val: anytype) void {

See: https://ziglang.org/documentation/master/std/#root;mem.doNotOptimizeAway.

std/coff.zig:354

/// This relocation is meaningful only when the machine type is ARM or Thumb.
/// The base relocation applies the 32-bit address of a symbol across a consecutive MOVW/MOVT instruction pair.
// ARM_MOV32 = 5,

/// This relocation is only meaningful when the machine type is RISC-V.
/// The base relocation applies to the high 20 bits of a 32-bit absolute address.
// RISCV_HIGH20 = 5,

/// Reserved, must be zero.
RESERVED = 6,

See https://ziglang.org/documentation/master/std/#root;coff.BaseRelocationType.

New addrspace keyword.

Commit: ziglang/zig@ccc7f9987 (Address spaces: addrspace(A) parsing)
Date: 2021-09-14

test {
    const y: *allowzero align(8) addrspace(.generic) const volatile u32 = undefined;
    _ = y;
}

Inline switch prong not supported by grammar.y.

Commit: ziglang/zig@b4d81857f (stage1+2: parse inline switch cases)
Date: 2022-02-13

const std = @import("std");
const expect = std.testing.expect;

const SliceTypeA = extern struct {
    len: usize,
    ptr: [*]u32,
};
const SliceTypeB = extern struct {
    ptr: [*]SliceTypeA,
    len: usize,
};
const AnySlice = union(enum) {
    a: SliceTypeA,
    b: SliceTypeB,
    c: []const u8,
    d: []AnySlice,
};

fn withSwitch(any: AnySlice) usize {
    return switch (any) {
        // With `inline else` the function is explicitly generated
        // as the desired switch and the compiler can check that
        // every possible case is handled.
        inline else => |slice| slice.len,
    };
}

test "inline else" {
    var any = AnySlice{ .c = "hello" };
    try expect(withSwitch(any) == 5);
}

New packed struct syntax.

Commit: ziglang/zig@6249a24e8 (stage2: integer-backed packed structs)
Date: 2022-02-23

pub const AbsolutePointerModeAttributes = packed struct(u32) {
    supports_alt_active: bool,
    supports_pressure_as_z: bool,
    _pad: u30 = 0,
};

Remove binary and octal float literals

Part of ziglang/zig#2093

Suffix op not allowed on labeled blocks

I'm pretty sure this should be valid syntax, but it is not:

const a = x: {}.*;

test.zig:1:16: error: expected token ';', found '.'
const a = x: {}.*;

Both the parser in the Zig compiler and the auto-generated peg parser fail to parse this, so this is a grammar design flaw. I think this is fixed by the diff below, but I'll have to do some more testing to ensure nothing breaks.

@@ -88,13 +88,12 @@ PrimaryExpr
      / KEYWORD_continue BreakLabel?
      / KEYWORD_resume Expr
      / KEYWORD_return Expr?
-     / LabeledExpr
+     / BlockLabel? LoopExpr
+     / Block
      / CurlySuffixExpr

 IfExpr <- IfPrefix Expr (KEYWORD_else Payload? Expr)?

-LabeledExpr <- BlockLabel? (Block / LoopExpr)
-
 Block <- LBRACE Statement* RBRACE

 LoopExpr <- KEYWORD_inline? (ForExpr / WhileExpr)

Grammar does not support new for loop syntax

Currently the grammar does not support multi-variable for loop syntax.

Labeled block not allowed as lhs of assignment at statement level

Explained here

Actually, this broke with the grammar changes in #1685. Basically, we have this rule at stmt level:
Statement
    <- ...
     / LabeledStatement
     ...
     / AssignExpr SEMICOLON

LabeledStatement <- BlockLabel? (Block / LoopStatement)

AssignExpr expands into all other expressions. LabeledStatement comes before AssignExpr, so that rule is tried first; it succeeds, and therefore Statement succeeds. The parser then tries to end the __case_1 block, and parsing fails there.

The parser implements the grammar correctly, but the new grammar just does not allow this syntax. I can't come up with a solution off the top of my head right now.

Edit: Well, a solution would be to make translate-c conform to this new grammar. Idk if we want to put in the work to make x:{}.* = 1; valid at stmt level.
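The ordered-choice behaviour described in this issue can be reproduced with a toy recognizer. This is a minimal Python sketch, not the Zig parser: the token lists, the function names and the drastically reduced grammar (empty blocks only, `.*` as the only suffix operator) are all assumptions made for illustration.

```python
def parse_label(toks, i):
    # BlockLabel?: an identifier followed by ":" (optional, never fails).
    if i + 1 < len(toks) and toks[i].isidentifier() and toks[i + 1] == ":":
        return i + 2
    return i


def parse_block(toks, i):
    # Block: only the empty block "{" "}" in this sketch.
    if toks[i:i + 2] == ["{", "}"]:
        return i + 2
    return None


def parse_labeled_statement(toks, i):
    # LabeledStatement <- BlockLabel? Block   (loops omitted)
    return parse_block(toks, parse_label(toks, i))


def parse_expr(toks, i):
    # Primary (labeled block, identifier or number) followed by ".*" suffixes.
    end = parse_labeled_statement(toks, i)
    if end is None:
        if i < len(toks) and (toks[i].isidentifier() or toks[i].isdigit()):
            end = i + 1
        else:
            return None
    while end < len(toks) and toks[end] == ".*":
        end += 1
    return end


def parse_assign_statement(toks, i):
    # AssignExpr SEMICOLON, with AssignExpr <- Expr ("=" Expr)?
    end = parse_expr(toks, i)
    if end is not None and end < len(toks) and toks[end] == "=":
        end = parse_expr(toks, end + 1)
    if end is not None and end < len(toks) and toks[end] == ";":
        return end + 1
    return None


def parse_statements(toks, labeled_first=True):
    # Statement*: each statement is an ordered choice; the first
    # alternative that matches wins and the others are never tried.
    alternatives = [parse_labeled_statement, parse_assign_statement]
    if not labeled_first:
        alternatives.reverse()
    i = 0
    while i < len(toks):
        for alternative in alternatives:
            end = alternative(toks, i)
            if end is not None:
                break
        else:
            return False  # no alternative matched at position i
        i = end
    return True
```

For the tokens of `x: {}.* = 1;`, `parse_statements` returns False with the labeled alternative first, because `x: {}` is consumed as a complete statement and `.* = 1;` cannot start a new one. Swapping the order makes this toy input parse, but that only illustrates the mechanism and is not a proposed fix for grammar.y.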

Installing peg

Is this the procedure that you are using to install peg? I am working on getting a GitHub workflow to run check_parser.sh.

wget https://www.piumarta.com/software/peg/peg-0.1.18.tar.gz
tar zxf peg-0.1.18.tar.gz
cd peg-0.1.18
make

Regards, Mark

Looking for a Simplified Zig Grammar in BNF for Flex/Bison Parsing

Hi there,

I'm currently working on a project where I need to parse a simplified version of the Zig programming language using Flex and Bison. To achieve this, I am looking for a simplified Zig grammar in Backus-Naur Form (BNF) that can be used within a .y (Yacc) file. Additionally, I require the corresponding tokens to be defined in a .l (Lex) file. The simplified grammar covers basic programming features, function calls, arrays, and structure usages.

The goal is to leverage Flex and Bison to parse this simplified Zig grammar effectively, focusing on the core syntax and structures that define the language. This parser will be a foundational tool for further development and analysis of Zig code within our project.

Specific Requirements:

Simplified Zig Grammar in BNF: A concise version of Zig's grammar, capturing its essential syntax and structures in BNF format. This will be used to generate the parser with Bison.
Tokens for Flex: Definitions of the necessary tokens that correspond to the simplified Zig grammar. These tokens will be utilized by Flex for lexical analysis.

Thank you for your time.

grammar clarifications regarding tabs and carriage returns

NL = New Line (0x0a)
CR = Carriage Return (0x0d)
TAB = Tab (0x09)

Inside Line Comments and Documentation Comments

Any TAB is rejected by the grammar since it is ambiguous how it should be rendered.
CR directly preceding NL is unambiguously part of the newline sequence. It is accepted by the grammar and removed by zig fmt, leaving only NL.
CR anywhere else is rejected by the grammar.
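The three comment rules above amount to a simple byte-level check. This is a minimal Python sketch under the assumption that the input is the raw bytes of a comment including its line ending. The function names are hypothetical and this is not how the Zig tokenizer or zig fmt is implemented.

```python
TAB, NL, CR = 0x09, 0x0A, 0x0D


def comment_bytes_ok(data: bytes) -> bool:
    """Apply the comment rules: no TAB, and CR only directly before NL."""
    for i, byte in enumerate(data):
        if byte == TAB:
            return False
        if byte == CR and (i + 1 >= len(data) or data[i + 1] != NL):
            # A stray CR that is not part of a CR NL newline sequence.
            return False
    return True


def normalize_newlines(data: bytes) -> bytes:
    """Models the described zig fmt behaviour: CR NL becomes a bare NL."""
    return data.replace(b"\r\n", b"\n")
```

For example, `b"// ok\r\n"` passes the check and normalizes to `b"// ok\n"`, while `b"// a\tb\n"` and `b"// a\rb\n"` are rejected.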

Inside Multi-Line String Literals

zig fmt may not mangle multi-line string literals, and therefore the control character TAB is rejected by the grammar inside multi-line string literals.
CR inside a multi-line string literal is also rejected, for the same reason.
However, CR directly before NL is interpreted as only a newline and not as part of the multi-line string. zig fmt will delete the CR.

For string literals that want to include CR, TAB, or any other control sequences, they will need to use regular string literals and the ++ operator, or @embedFile.

Whitespace

TAB used as whitespace is unambiguous. It is accepted by the grammar and replaced by the canonical whitespace by zig fmt.
CR used as whitespace, whether directly preceding NL or stray, is still unambiguously whitespace. It is accepted by the grammar and replaced by the canonical whitespace by zig fmt.

Clarification about reinterpreting memory through over-sized type

The equivalent of this test in C is UB according to ubsan:

const bytes align(4) = [_]u8{ 1, 2, 3, 4 };
const S = extern struct {
    a: u16,
    b: u16,
    c: u32,
};

const x = @ptrCast(*const S, &bytes);
var p = &x.b; // TODO: Triggers ubsan for CBE, since @sizeOf(S) > @sizeOf(bytes)

const expected = switch (native_endian) {
    .Little => 0x0403,
    .Big => 0x0304,
};
try expect(p.* == expected);

The problem is that we're reinterpreting memory using a type larger than the underlying object. The load itself does not exceed the region of the underlying object however, so this behavior could conceivably be well-defined in Zig.

Is this UB in Zig?
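The byte arithmetic behind the question can be checked independently of any language rule. This is a minimal Python sketch using the standard `struct` module. It only shows that the two bytes read for field `b` lie inside the 4-byte object while the struct layout spans 8 bytes; it says nothing about whether the access is defined behaviour in Zig. The variable names are chosen for illustration.

```python
import struct

data = bytes([1, 2, 3, 4])              # the 4-byte underlying object

# Layout of S: a: u16 at offset 0, b: u16 at offset 2, c: u32 at offset 4.
size_of_s = struct.calcsize("<HHI")     # 2 + 2 + 4 bytes
offset_of_b = struct.calcsize("<H")     # b follows a
end_of_b = offset_of_b + struct.calcsize("<H")

# The load of b touches bytes 2..3 only, which exist in `data`.
b_little = struct.unpack_from("<H", data, offset_of_b)[0]
b_big = struct.unpack_from(">H", data, offset_of_b)[0]

print(size_of_s > len(data))            # the type is larger than the object
print(end_of_b <= len(data))            # the load itself stays in bounds
print(hex(b_little), hex(b_big))
```

The two decoded values match the `.Little` and `.Big` constants in the test above.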

undefined reference to `yyparse'

Am I missing a step?

[nix-shell:~/dev/zig-spec/grammar]$ make VERBOSE=1
gcc -O3 -o build/parser build/full.c
/run/user/1000/ccYkPCbK.o: In function `main':
full.c:(.text.startup+0x7): undefined reference to `yyparse'
collect2: error: ld returned 1 exit status
make: *** [Makefile:17: build/parser] Error 1