zig-spec's People

Contributors

andrewrk, bnoordhuis, dajobat, data-man, ehaas, emekoi, g-w1, hejsil, hryx, ifreund, jacobly0, markfirmware, mattbork, mcsinyx, mlugg, mokulus, perillo, scr0nch, thejoshwolfe, vexu

zig-spec's Issues

Grammar for container members does not enforce that fields be adjacent

Currently, the stage 1 implementation of the Zig parser requires all struct field declarations to be contiguous: placing any other member between two field declarations is an error. However, the current specification, namely:

ContainerMembers
    <- TestDecl ContainerMembers
     / TopLevelComptime ContainerMembers
     / KEYWORD_pub? TopLevelDecl ContainerMembers
     / ContainerField COMMA ContainerMembers
     / ContainerField
     /

allows fields to be declared anywhere among the container members, and is therefore inconsistent with the current language.
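To state the stage 1 constraint precisely, here is a minimal Python sketch of a check that could run after parsing with the permissive grammar above. The member-kind labels ("field", "decl", "test", "comptime") and the function name fields_are_contiguous are hypothetical and are not taken from the Zig compiler.

```python
def fields_are_contiguous(kinds):
    """Return True if all "field" entries in kinds form one unbroken run.

    kinds lists container member kinds in source order, for example
    ["decl", "field", "field", "test"]. The labels are illustrative only.
    """
    seen_fields = False  # have we entered the run of fields?
    run_closed = False   # has a non-field member ended that run?
    for kind in kinds:
        if kind == "field":
            if run_closed:
                return False  # a field appears after the run was interrupted
            seen_fields = True
        elif seen_fields:
            run_closed = True
    return True


print(fields_are_contiguous(["decl", "field", "field", "test"]))  # True
print(fields_are_contiguous(["field", "decl", "field"]))          # False
```

The same rule could also be written into the grammar itself, for example by allowing declarations, then a single run of comma-separated fields, then declarations again.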

What is LoopTypeExpr supposed to match?

In grammar.y, as part of PrimaryTypeExpr:

LoopTypeExpr <- KEYWORD_inline? (ForTypeExpr / WhileTypeExpr)

ForTypeExpr <- ForPrefix TypeExpr (KEYWORD_else TypeExpr)?

WhileTypeExpr <- WhilePrefix TypeExpr (KEYWORD_else Payload? TypeExpr)?

This confuses me, since it would match something like while (true) u8. What am I missing?

roadmap to spec

The Rust project has 3 related attempts to cover the spec

It could help to have something simple and presentable for interested contributors, and 1 sounds like the most approachable option.
I suspect that 2 will be relatively simple to derive once we have some model of comptime and runtime semantics, with the exception of custom linker semantics (if they are ever introduced) and assuming exactly one compilation step with no generated code.
At least for now, this also assumes we do not take into account hardware specifics that the current C11 memory model does not cover for parallel execution semantics.
This is because, as the paper linked below shows, optimizations for weak memory have thread-local and global effects (a thread-local optimization can enable a global one and vice versa), so I am very unsure whether and how those should be represented in the type system, or how to insert safety checks.

So, from my point of view, to avoid producing a lot of paper without much meaning, it would be nice to start with 1 and provide a list of open semantic questions that contributors can experiment with and improve on.

Any ideas how to proceed with this? Is this the correct place to discuss or should I open an issue on the Zig compiler repo?

The related discussion
Q:
"Do you have already ideas on simplifying cache synchronisation protocol semantics? The C11 model has many severe flaws, which prevent a lot optimizations. You probably know this already, but for other interested readers https://plv.mpi-sws.org/c11comp/"
A: "so far I am not planning on touching that. I hope someone else will solve it and I just need to plug in the solutions... and there is some nice progress .. https://dl.acm.org/doi/pdf/10.1145/3385412.3386010" ("Promising 2.0: Global Optimizations in Relaxed Memory Concurrency" by Lee et al.)
Source https://www.reddit.com/r/rust/comments/wiwjch/comment/ijfo2k6/?utm_source=share&utm_medium=web2x&context=3

Link to Coq proofs: https://github.com/snu-sf/promising2-coq

Update 1: added brief Reddit discussion
Update 2: added Coq proofs

Update grammar.y

Currently, grammar.y is out of sync with the actual grammar implemented by the zig compiler (in std/zig/tokenizer and std/zig/parser.zig).

These are the places where grammar.y and zig diverge, found by running the improved parser from #42 over the files in src and lib/std of the zig repository.

  • Doc comments are not allowed on top-level comptime and test declarations.

    /// doc-comment for comptime.
    comptime {
        var a = 1;
        _ = a;
    }
    
    /// doc-comment for test.
    test {}
  • Saturating arithmetic is not supported by grammar.y.

    test {
        var a: isize = 10;
    
        a +|= 1;
        a -|= 1;
        a *|= 1;
        a <<|= 1;
    
        const b: isize = a +| 1;
        const c: isize = b -| 1;
        const d: isize = c *| 1;
        const e: isize = d <<| 1;
        _ = e;
    }
  • Mixing doc comments and line comments is not supported by grammar.y.

    // A doc-comment followed by a line-comment is not supported by grammar.y.
    const S = struct {
        //! doc
        /// doc
        // doc
        a: i32,
    };

    NOTE: I found mixing doc comments and line comments confusing, and autodoc doesn't handle them correctly.

    Examples:

    • std/mem.zig:3760

      /// Force an evaluation of the expression; this tries to prevent
      /// the compiler from optimizing the computation away even if the
      /// result eventually gets discarded.
      // TODO: use @declareSideEffect() when it is available - https://github.com/ziglang/zig/issues/6168
      pub fn doNotOptimizeAway(val: anytype) void {

      See: https://ziglang.org/documentation/master/std/#root;mem.doNotOptimizeAway.

    • std/coff.zig:354

      /// This relocation is meaningful only when the machine type is ARM or Thumb.
      /// The base relocation applies the 32-bit address of a symbol across a consecutive MOVW/MOVT instruction pair.
      // ARM_MOV32 = 5,
      
      /// This relocation is only meaningful when the machine type is RISC-V.
      /// The base relocation applies to the high 20 bits of a 32-bit absolute address.
      // RISCV_HIGH20 = 5,
      
      /// Reserved, must be zero.
      RESERVED = 6,

      See https://ziglang.org/documentation/master/std/#root;coff.BaseRelocationType.

  • New addrspace keyword.

    Commit: ziglang/zig@ccc7f9987 (Address spaces: addrspace(A) parsing)
    Date: 2021-09-14

    test {
        const y: *allowzero align(8) addrspace(.generic) const volatile u32 = undefined;
        _ = y;
    }
    
  • Inline switch prong not supported by grammar.y.

    Commit: ziglang/zig@b4d81857f (stage1+2: parse inline switch cases)
    Date: 2022-02-13

    const std = @import("std");
    const expect = std.testing.expect;
    
    const SliceTypeA = extern struct {
        len: usize,
        ptr: [*]u32,
    };
    const SliceTypeB = extern struct {
        ptr: [*]SliceTypeA,
        len: usize,
    };
    const AnySlice = union(enum) {
        a: SliceTypeA,
        b: SliceTypeB,
        c: []const u8,
        d: []AnySlice,
    };
    
    fn withSwitch(any: AnySlice) usize {
        return switch (any) {
            // With `inline else` the function is explicitly generated
            // as the desired switch and the compiler can check that
            // every possible case is handled.
            inline else => |slice| slice.len,
        };
    }
    
    test "inline else" {
        var any = AnySlice{ .c = "hello" };
        try expect(withSwitch(any) == 5);
    }
  • New packed struct syntax.

    Commit: ziglang/zig@6249a24e8 (stage2: integer-backed packed structs)
    Date: 2022-02-23

    pub const AbsolutePointerModeAttributes = packed struct(u32) {
        supports_alt_active: bool,
        supports_pressure_as_z: bool,
        _pad: u30 = 0,
    };
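In the new packed struct syntax, the integer type in parentheses is the backing integer, and the fields' bit widths have to add up to exactly its bit count; in the example above, 1 + 1 + 30 = 32 (a bool occupies one bit in a packed struct). A trivial Python sketch of that arithmetic follows; the dictionary encoding of the fields and the function name are only for illustration.

```python
# Bit widths of the fields of AbsolutePointerModeAttributes above.
FIELDS = {
    "supports_alt_active": 1,     # bool: 1 bit in a packed struct
    "supports_pressure_as_z": 1,  # bool: 1 bit
    "_pad": 30,                   # u30
}
BACKING_BITS = 32                 # packed struct(u32)


def backing_integer_fits(fields, backing_bits):
    """True if the field bit widths add up exactly to the backing integer."""
    return sum(fields.values()) == backing_bits


print(backing_integer_fits(FIELDS, BACKING_BITS))  # True
```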
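The saturating-arithmetic item above also has a lexical side: a +|= 1 has to be read as a, +|=, 1 rather than as + followed by |=. Below is a minimal Python sketch of longest-match (maximal munch) operator lexing. The operator table is a subset of Zig's operators chosen for illustration, and lex_operators is a hypothetical helper, not how std/zig/tokenizer or grammar.y is written.

```python
# A subset of Zig's operators, chosen for illustration (not the full table).
OPERATORS = [
    "+", "+=", "+|", "+|=", "++", "+%", "+%=",
    "-", "-=", "-|", "-|=", "-%", "-%=",
    "*", "*=", "*|", "*|=", "**", "*%", "*%=",
    "<", "<=", "<<", "<<=", "<<|", "<<|=",
    "|", "|=", "||",
]
# Try longer operators first so that "+|=" wins over "+|" and "+".
BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def lex_operators(src):
    """Split src into operators and identifier/number runs (maximal munch)."""
    tokens = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
            continue
        op = next((o for o in BY_LENGTH if src.startswith(o, i)), None)
        if op is not None:
            tokens.append(op)
            i += len(op)
            continue
        start = i
        while i < len(src) and (src[i].isalnum() or src[i] == "_"):
            i += 1
        if i == start:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
        tokens.append(src[start:i])
    return tokens


print(lex_operators("a +|= 1"))  # ['a', '+|=', '1']
print(lex_operators("a <<| 1"))  # ['a', '<<|', '1']
```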

Suffix op not allowed on labeled blocks

I'm pretty sure this should be valid syntax, but it is not:

const a = x: {}.*;
test.zig:1:16: error: expected token ';', found '.'
const a = x: {}.*;

Both the Zig compiler's parser and the parser generated by peg fail to parse this, so this is a flaw in the grammar design. I think this diff fixes it, but I'll have to do more testing to make sure nothing breaks.

@@ -88,13 +88,12 @@ PrimaryExpr
      / KEYWORD_continue BreakLabel?
      / KEYWORD_resume Expr
      / KEYWORD_return Expr?
-     / LabeledExpr
+     / BlockLabel? LoopExpr
+     / Block
      / CurlySuffixExpr

 IfExpr <- IfPrefix Expr (KEYWORD_else Payload? Expr)?

-LabeledExpr <- BlockLabel? (Block / LoopExpr)
-
 Block <- LBRACE Statement* RBRACE

 LoopExpr <- KEYWORD_inline? (ForExpr / WhileExpr)

Labeled block not allowed as lhs of assignment at statement level

Explained here

Actually, this broke with the grammar changes in #1685. Basically, we have this rule at statement level:

Statement
    <- ...
     / LabeledStatement
     ...
     / AssignExpr SEMICOLON

LabeledStatement <- BlockLabel? (Block / LoopStatement)

AssignExpr expands into all other expressions. Because LabeledStatement comes before AssignExpr, it is tried first; it succeeds, so Statement succeeds. The parser then tries to close the __case_1 block, and that is where parsing fails.

The parser implements the grammar correctly; the new grammar simply does not allow this syntax. I can't come up with a solution off the top of my head right now.

Edit: Well, one solution would be to make translate-c conform to the new grammar. I don't know if we want to put in the work to make x:{}.* = 1; valid at statement level.
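The failure comes from PEG's ordered choice: once an alternative of Statement succeeds, the choice commits to it, and a later failure in the enclosing Block does not make the parser go back and try AssignExpr. The Python sketch below reproduces this with a toy grammar that is hypothetical and far smaller than grammar.y, using pre-split tokens for x:{}.* = 1; inside a block.

```python
# Toy PEG (hypothetical, modeled loosely on the rules above):
#   Block        <- '{' Statement* '}'
#   Statement    <- LabeledBlock / AssignExpr ';'    (order is configurable)
#   LabeledBlock <- IDENT ':' Block
#   AssignExpr   <- Expr ('=' Expr)?
#   Expr         <- (LabeledBlock / IDENT / NUMBER) '.*'*
class ToyParser:
    """Recursive-descent PEG matcher over a token list.

    Each method returns the position after a successful match, or None.
    """

    def __init__(self, toks, labeled_first=True):
        self.toks = toks
        self.labeled_first = labeled_first  # order of Statement's alternatives

    def lit(self, pos, text):
        ok = pos is not None and pos < len(self.toks) and self.toks[pos] == text
        return pos + 1 if ok else None

    def block(self, pos):
        pos = self.lit(pos, "{")
        while pos is not None:  # Statement*: greedy, stops at first failure
            nxt = self.statement(pos)
            if nxt is None:
                break
            pos = nxt
        return self.lit(pos, "}")

    def labeled_block(self, pos):
        if pos < len(self.toks) and self.toks[pos].isidentifier():
            return self.block(self.lit(pos + 1, ":"))
        return None

    def expr(self, pos):
        nxt = self.labeled_block(pos)  # a labeled block used as an operand
        if nxt is None:
            tok = self.toks[pos] if pos < len(self.toks) else ""
            if not (tok.isidentifier() or tok.isdigit()):
                return None
            nxt = pos + 1
        while nxt < len(self.toks) and self.toks[nxt] == ".*":
            nxt += 1  # suffix operator '.*', zero or more times
        return nxt

    def assign_stmt(self, pos):
        lhs = self.expr(pos)
        if lhs is None:
            return None
        after_eq = self.lit(lhs, "=")
        end = self.expr(after_eq) if after_eq is not None else None
        # ('=' Expr)? backs off to lhs if the optional group fails.
        return self.lit(end if end is not None else lhs, ";")

    def statement(self, pos):
        alts = [self.labeled_block, self.assign_stmt]
        if not self.labeled_first:
            alts.reverse()
        for alt in alts:
            nxt = alt(pos)
            if nxt is not None:
                return nxt  # ordered choice: commit to the first success
        return None

    def parses(self):
        return self.block(0) == len(self.toks)


TOKENS = ["{", "x", ":", "{", "}", ".*", "=", "1", ";", "}"]
print(ToyParser(TOKENS).parses())                       # False
print(ToyParser(TOKENS, labeled_first=False).parses())  # True
```

In this toy grammar, trying AssignExpr first makes the input parse; whether such a reordering is acceptable for the full grammar is a separate question.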

Installing peg

Is this the procedure you are using to install peg? I am working on getting a GitHub workflow to run check_parser.sh.

wget https://www.piumarta.com/software/peg/peg-0.1.18.tar.gz
tar zxf peg-0.1.18.tar.gz
cd peg-0.1.18
make

Regards, Mark

Looking for a Simplified Zig Grammar in BNF for Flex/Bison Parsing

Hi there,

I'm currently working on a project where I need to parse a simplified version of the Zig programming language using Flex and Bison. For this, I am looking for a simplified Zig grammar in Backus-Naur Form (BNF) that can be used in a .y (Yacc) file, together with the corresponding token definitions in a .l (Lex) file. The simplified grammar should cover basic programming features, function calls, arrays, and structure usage.

The goal is to leverage Flex and Bison to parse this simplified Zig grammar effectively, focusing on the core syntax and structures that define the language. This parser will be a foundational tool for further development and analysis of Zig code within our project.

Specific Requirements:

  • Simplified Zig Grammar in BNF: A concise version of Zig's grammar, capturing its essential syntax and structures in BNF format. This will be used to generate the parser with Bison.
  • Tokens for Flex: Definitions of the necessary tokens that correspond to the simplified Zig grammar. These tokens will be utilized by Flex for lexical analysis.

Thank you for your time.

grammar clarifications regarding tabs and carriage returns

NL = New Line (0x0a)
CR = Carriage Return (0x0d)
TAB = Tab (0x09)

Inside Line Comments and Documentation Comments

  • Any TAB is rejected by the grammar since it is ambiguous how it should be rendered.
  • CR directly preceding NL is unambiguously part of the newline sequence. It is accepted by the grammar and removed by zig fmt, leaving only NL.
  • CR anywhere else is rejected by the grammar.

Inside Multi-Line String Literals

  • zig fmt must not mangle multi-line string literals, so the control character TAB is rejected by the grammar inside them.
  • CR inside a multi-line string literal is also rejected, for the same reason.
  • However, CR directly before NL is interpreted as only a newline and is not part of the multi-line string. zig fmt will delete the CR.

String content that needs CR, TAB, or any other control character must instead use regular string literals and the ++ operator, or @embedFile.
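The comment and multi-line string rules above can be summarized as a check on the raw bytes of a single line. Below is a minimal Python sketch, assuming each line is passed in with its terminating NL (if any); check_line is a hypothetical name, and this is not zig's tokenizer. It rejects TAB and any CR that is not directly followed by NL, and drops the CR of a CR NL pair, as zig fmt does.

```python
NL, CR, TAB = 0x0A, 0x0D, 0x09


def check_line(line: bytes) -> bytes:
    """Validate one comment or multi-line string line.

    TAB is rejected; CR is accepted only directly before NL. Returns the
    line with a trailing CR NL normalized to NL.
    """
    for i, byte in enumerate(line):
        if byte == TAB:
            raise ValueError(f"TAB at offset {i} is not allowed")
        if byte == CR and not (i + 1 < len(line) and line[i + 1] == NL):
            raise ValueError(f"stray CR at offset {i} is not allowed")
    if line.endswith(b"\r\n"):
        return line[:-2] + b"\n"  # the CR belongs to the newline, so drop it
    return line


print(check_line(b"// hello\r\n"))  # b'// hello\n'
```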

Whitespace

  • TAB used as whitespace is unambiguous. It is accepted by the grammar and replaced by the canonical whitespace by zig fmt.
  • CR used as whitespace, whether directly preceding NL or stray, is still unambiguously whitespace. It is accepted by the grammar and replaced by the canonical whitespace by zig fmt.

Clarification about reinterpreting memory through over-sized type

The equivalent of this test in C is UB according to ubsan:

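const std = @import("std");
const builtin = @import("builtin");
const expect = std.testing.expect;
const native_endian = builtin.cpu.arch.endian();

test "reinterpret memory through an over-sized extern struct" {
    const bytes align(4) = [_]u8{ 1, 2, 3, 4 };
    const S = extern struct {
        a: u16,
        b: u16,
        c: u32,
    };

    const x = @ptrCast(*const S, &bytes);
    var p = &x.b; // TODO: Triggers ubsan for CBE, since @sizeOf(S) > @sizeOf(bytes)

    // Field b covers bytes[2..4], so its value depends on byte order.
    const expected = switch (native_endian) {
        .Little => 0x0403,
        .Big => 0x0304,
    };
    try expect(p.* == expected);
}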

The problem is that we're reinterpreting memory using a type larger than the underlying object. However, the load itself does not exceed the underlying object, so this behavior could conceivably be well-defined in Zig.

Is this UB in Zig?
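The size mismatch in the test can be checked with a little layout arithmetic. In this Python sketch, the struct module's standard (unpadded) layout stands in for extern struct S. That is accurate here because a, b and c are already naturally aligned at offsets 0, 2 and 4. S needs 8 bytes while the array has only 4, but the load of b reads only offsets 2 and 3.

```python
import struct

bytes_ = bytes([1, 2, 3, 4])  # the 4-byte array from the test

# Layout of S: a: u16 at offset 0, b: u16 at offset 2, c: u32 at offset 4.
size_of_s = struct.calcsize("<HHI")   # 8 bytes, larger than bytes_
offset_of_b = struct.calcsize("<H")   # b follows a single u16

# Reading b only needs bytes_[2:4], which lies inside the 4-byte object.
little = struct.unpack_from("<H", bytes_, offset_of_b)[0]
big = struct.unpack_from(">H", bytes_, offset_of_b)[0]

print(size_of_s, len(bytes_))  # 8 4
print(hex(little), hex(big))   # 0x403 0x304
```

Both byte orders reproduce the expected values from the test, which supports the observation that the load stays in bounds; it says nothing about whether Zig should define the behavior.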

undefined reference to `yyparse'

Am I missing a step?

[nix-shell:~/dev/zig-spec/grammar]$ make VERBOSE=1
gcc -O3 -o build/parser build/full.c
/run/user/1000/ccYkPCbK.o: In function `main':
full.c:(.text.startup+0x7): undefined reference to `yyparse'
collect2: error: ld returned 1 exit status
make: *** [Makefile:17: build/parser] Error 1
