guacamole's Introduction

NAME

Guacamole - A parser toolkit for Standard Perl

VERSION

version 0.008

SYNOPSIS

use Guacamole;
my ($ast) = Guacamole->parse($string);

DESCRIPTION

Guacamole is a Perl parser toolkit.

It can:

  • Parse Standard Perl

    This is explained in this document.

    For Standard Perl, see the next clause.

  • Check a file is written in Standard Perl

    This is done by the standard module, which is where Standard Perl is described.

  • Lint your code

    See Guacamole::Linter.

  • Deparse your code

    See Guacamole::Deparse.

  • Rewrite your code

    There is a proof-of-concept for this and we hope to provide this as a framework.

Standard Perl

Guacamole only works on Standard Perl. You can read about it in the documentation of the standard module.

Parser

my ($ast) = Guacamole->parse($string);

To parse a string, call Guacamole's parse method. (This might become an object-oriented interface in the future.)

It returns a list of results. If it ever returns more than one, that is a bug: it means the input was parsed ambiguously. This will later be enforced in the interface. The current interface is not official.

AST Nodes

Guacamole returns an AST with two types of nodes.

my ($ast) = Guacamole->parse('$foo = 1');

The above will generate a larger AST than you might expect (which might be pruned in the future). We'll focus on the two types of nodes that appear in it.

Rules

Rule nodes are the non-leaf nodes of the tree. Each one represents a matched grammar rule (such as an expression) and includes its name, its children, and location information: start position, length, line, and column.

$rule = {
    'children'  => [...],
    'column'    => 2,
    'length'    => 3,
    'line'      => 1,
    'name'      => 'VarIdentExpr',
    'start_pos' => 1,
    'type'      => 'rule',
};

This rule is a VarIdentExpr, which is an expression for a variable identifier.

In the code above, it refers to the foo in $foo, which is the identifier itself.

It has one child, described below under Lexemes.

Lexemes

The child of the VarIdentExpr rule should be the value of the identifier.

$lexeme = {
    'name'  => '',
    'type'  => 'lexeme',
    'value' => 'foo',
};

The name attribute of every lexeme is empty. This makes it easy to write code that checks a node's name without first having to check whether the node is a rule.
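A minimal sketch of walking this structure, assuming only the node layout shown above (rule nodes with type, name, and children; lexeme nodes with type, an empty name, and value). The tree is built by hand rather than produced by Guacamole->parse, so the example runs without Guacamole installed; find_rules and collect_lexeme_values are hypothetical helper names, not part of Guacamole.

```perl
use strict;
use warnings;

# Hand-built fragment shaped like the nodes documented above.
# A real tree returned by Guacamole->parse would be much larger.
my $ast = {
    type      => 'rule',
    name      => 'VarIdentExpr',
    start_pos => 1,
    length    => 3,
    line      => 1,
    column    => 2,
    children  => [ { type => 'lexeme', name => '', value => 'foo' } ],
};

# Return every node whose name equals $wanted. Lexemes have an empty name,
# so the comparison is safe without checking the node type first.
sub find_rules {
    my ( $node, $wanted ) = @_;
    my @found = $node->{'name'} eq $wanted ? ($node) : ();
    push @found, find_rules( $_, $wanted ) for @{ $node->{'children'} || [] };
    return @found;
}

# Collect lexeme values in left-to-right order.
sub collect_lexeme_values {
    my ($node) = @_;
    return $node->{'value'} if $node->{'type'} eq 'lexeme';
    return map { collect_lexeme_values($_) } @{ $node->{'children'} || [] };
}

my @idents = find_rules( $ast, 'VarIdentExpr' );
print scalar(@idents), " VarIdentExpr node(s): ",
    join( ',', collect_lexeme_values($ast) ), "\n";
```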

THANKS

  • Damian Conway

    For helping us understand what is feasible, what isn't, and why, and for having infinite patience in explaining it all.

  • Jeffrey Kegler

    For Marpa and for helping us understand how to use Marpa better.

  • Gonzalo Diethelm

    For continuous feedback and support.

  • H. Merijn Brand (@Tux)

    For providing the initial production-level test of Guacamole, which helped shake out many of the bugs in the BNF.

SEE ALSO

AUTHORS

  • Sawyer X
  • Vickenty Fesunov

COPYRIGHT AND LICENSE

This software is Copyright (c) 2022 by Sawyer X.

This is free software, licensed under:

The MIT (X11) License

guacamole's People

Contributors

briandfoy, chromatic, plicease, valcomm, vickenty, xsawyerx

guacamole's Issues

Regex-related functions

The first version does not parse the regex string or regex modifiers as separate elements, only the full regex.

  • split
  • m//
  • s///
  • tr///
  • y///
  • qr//
  • //

Keyword parsing rules

These are my notes on how we should parse and lex keywords.

  • Perl seems to allow Expressions everywhere
  • Keywords that receive two arguments can use two Expressions
  • However, a single Expression that returns two Values doesn't work
crypt "text", "salt";      # ok
crypt text(), salt();      # ok
crypt @{ text_and_salt() } # not ok

Cannot assign from keywords

chmod 0755, "filename";           # ok
my $foo = chmod 0755, "filename"; # not ok

open my $fh, '<', $filename;           # ok
my $foo = open my $fh, '<', $filename; # not ok
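Perl itself accepts both "not ok" forms: chmod returns the number of files it changed, and open returns a true value on success. A small check, assuming a Unix-like system and using a throwaway file from the core File::Temp module:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a temporary file to operate on; it is removed at exit.
my ( $tmp_fh, $filename ) = tempfile( UNLINK => 1 );
close $tmp_fh;

# chmod returns the number of files successfully changed.
my $changed = chmod 0755, $filename;

# open returns a true value on success and false on failure.
my $opened = open my $fh, '<', $filename;
close $fh if $opened;

print "chmod changed $changed file(s); open returned ",
    ( $opened ? 'true' : 'false' ), "\n";
```

So these are gaps in the grammar rather than invalid Perl.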

Bareword STDIN fails with -t

I'm testing my code against use standard and I found what I think is a discrepancy. I have code that tests if STDIN is an interactive terminal thusly:

if (-t STDIN == 0) { ...

This fails with standard, but I think it should be allowed according to what I've read. Alternatively, I found that:

if (-t *STDIN == 0) { ...

removes the warning, but I'm not sure it should be required per the documentation.

Support all keywords

This is a TODO placeholder for all keywords in perlfunc. Once this is done... we're technically done. :)

  • abs
  • accept
  • alarm
  • atan2
  • bind
  • binmode
  • bless
  • break
  • caller
  • chdir
  • chmod
  • chomp
  • chop
  • chown
  • chr
  • chroot
  • close
  • closedir
  • connect
  • continue
  • cos
  • crypt
  • dbmclose
  • dbmopen
  • defined
  • delete
  • die
  • do
  • dump
  • each
  • eof
  • eval
  • evalbytes
  • exec
  • exists
  • exit
  • exp
  • fc
  • fcntl
  • fileno
  • flock
  • fork
  • getc
  • getlogin
  • getpeername
  • getpgrp
  • getppid
  • getpriority
  • getpwnam
  • getgrnam
  • gethostbyname
  • getnetbyname
  • getprotobyname
  • getpwuid
  • getgrgid
  • getservbyname
  • gethostbyaddr
  • getnetbyaddr
  • getprotobynumber
  • getservbyport
  • getpwent
  • getgrent
  • gethostent
  • getnetent
  • getprotoent
  • getservent
  • setpwent
  • setgrent
  • sethostent
  • setnetent
  • setprotoent
  • setservent
  • endpwent
  • endgrent
  • endhostent
  • endnetent
  • endprotoent
  • endservent
  • getsockname
  • getsockopt
  • glob
  • gmtime
  • goto
  • grep
  • hex
  • index
  • int
  • ioctl
  • join
  • keys
  • kill
  • last
  • lc
  • lcfirst
  • length
  • link
  • listen
  • local
  • localtime
  • lock
  • log
  • lstat
  • map
  • mkdir
  • msgctl
  • msgget
  • msgrcv
  • msgsnd
  • my
  • next
  • no
  • oct
  • open
  • opendir
  • ord
  • our
  • pack
  • package
  • pipe
  • pop
  • pos
  • print
  • printf
  • prototype
  • push
  • quotemeta
  • rand
  • read
  • readdir
  • readline
  • readlink
  • readpipe
  • recv
  • redo
  • ref
  • rename
  • require
  • reset
  • return
  • reverse
  • rewinddir
  • rindex
  • rmdir
  • say
  • scalar
  • seek
  • seekdir
  • select
  • semctl
  • semget
  • semop
  • send
  • setpgrp
  • setpriority
  • setsockopt
  • shift
  • shmctl
  • shmget
  • shmread
  • shmwrite
  • shutdown
  • sin
  • sleep
  • socket
  • socketpair
  • sort
  • splice
  • split
  • sprintf
  • sqrt
  • srand
  • stat
  • state
  • study
  • sub
  • substr
  • symlink
  • syscall
  • sysopen
  • sysread
  • sysseek
  • system
  • syswrite
  • tell
  • telldir
  • tie
  • tied
  • time
  • times
  • truncate
  • uc
  • ucfirst
  • umask
  • undef
  • unlink
  • unpack
  • unshift
  • untie
  • use
  • utime
  • values
  • vec
  • wait
  • waitpid
  • wantarray
  • warn
  • write

Control flows:

  • if
  • if postfix
  • elsif
  • else
  • for
  • for postfix
  • foreach
  • foreach postfix
  • unless
  • unless postfix
  • until
  • until postfix
  • while
  • while postfix

File ops:

  • -r
  • -w
  • -x
  • -o
  • -R
  • -W
  • -X
  • -O
  • -e
  • -z
  • -s
  • -f
  • -d
  • -l
  • -p
  • -S
  • -b
  • -c
  • -t
  • -u
  • -g
  • -k
  • -T
  • -B
  • -M
  • -A
  • -C

Regex-based:

  • ` ` (bare backticks)
  • // (bare regex)
  • m//
  • s///
  • tr///
  • y///
  • qr/STRING/ (which is also a q-function)

Q functions:

  • q/STRING/
  • qq/STRING/
  • qw/STRING/
  • qx/STRING/
  • qr/STRING/

Additional syntax:

  • Quote escaping in strings
  • Prefixed dereference @{$var}
  • Postfix dereference $var->@*

Direct array and hash access

# array and hash
$array[$index]
$hash{$value}

# value slices (@ sigil) on an array and a hash
@array[@indices]
@hash{@keys}

# key/value slices (% sigil, Perl 5.20+) on an array and a hash
%array[@indices]
%hash{@keys}
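For reference, this is what each form above evaluates to in perl. The variable names are placeholders chosen to match the snippet; key/value slices need Perl 5.20 or later.

```perl
use 5.020;    # key/value slices (%array[...], %hash{...}) need 5.20+
use strict;
use warnings;

my @array   = qw(a b c d);
my %hash    = ( x => 1, y => 2, z => 3 );
my @indices = ( 1, 3 );
my @keys    = qw(x z);

# Single-element access
my $elem  = $array[ $indices[0] ];    # 'b'
my $value = $hash{ $keys[0] };        # 1

# @-sigil slices return only the values
my @array_values = @array[@indices];  # ('b', 'd')
my @hash_values  = @hash{@keys};      # (1, 3)

# %-sigil slices return index/key => value pairs
my %array_pairs = %array[@indices];   # (1 => 'b', 3 => 'd')
my %hash_pairs  = %hash{@keys};       # (x => 1, z => 3)
```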

@vickenty, do you feel like taking this? :)

Variable dereferencing

The following syntax is not yet supported:

@{$foo}
$foo->@*

This includes:

  • ${...} / ...->$*
  • @{...} / ...->@*
  • %{...} / ...->%*
  • &{...} / ...->&*
  • *{...} / ...->**

I'm honestly happy to only have postfix dereferencing. One recommended way of doing it and that's it.
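Supporting only the postfix form loses no expressiveness, since each prefixed form above has a postfix equivalent that yields the same data. A small sketch, assuming Perl 5.24 or later (where postfix dereferencing is always enabled); the variable names are placeholders:

```perl
use 5.024;    # enables strict; postfix deref is always on from 5.24
use warnings;

my $aref = [ 1, 2, 3 ];
my $href = { a => 1 };
my $sref = \'text';
my $cref = sub { 'called' };

# Prefixed (circumfix) form      # Postfix form
my @a1 = @{$aref};               my @a2 = $aref->@*;
my %h1 = %{$href};               my %h2 = $href->%*;
my $s1 = ${$sref};               my $s2 = $sref->$*;
my $c1 = &{$cref}();             my $c2 = $cref->&*;

say "arrays: @a1 / @a2";
```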

Rename NonBrace* rules

After fixing empty hash literals in #84, NonBrace expressions do not live up to their name, because they can now begin with a brace.

Syntax: inline POD

Inline POD presents a unique problem:

We need to match the beginning of a line specifically, then gobble everything, including all whitespace, until we match the =cut.

Marpa's BNF regex supports ^, so we might be able to get away with ^[#] and ^=cut$, but I'm not sure whether those anchors match at the beginning/end of lines.

Hm, I wonder what it would look like if you tried to write a BNF for Pod from this.

-- perlpodspec

Accidentally supporting Foo::()

This is because we cannot tell the difference between Ident used for SubCall and Ident used for class name.

We need to have two different Ident lexemes. This isn't a top priority. I'm just opening a ticket to keep track.

More data in b628b3e. Test added with TODO item, so when it passes, we'll know.

do{1} is parsed in two ways (block and hashref)

Because do can now be parsed with either a Block or an Expression, do {1} is parsed twice:

  • Once as a Block with Literal Value of 1
  • Once as an Expression with LitHash literal hash reference

We need to figure out #2 (block and hash disambiguation) in order to decide which of these it is.

(It is pretty cool that it shows it could be either this or that.)

Solve left recursion

Introducing NonBraceExpression brought back the left recursion issue. This needs to be resolved to maintain unambiguous parsing.

Rewrite the POD and the wiki page

All of the PODs need to be written.

The wiki includes the main changes, but the talk gave a much better breakdown, so:

  • Move the talk content to the POD
  • Update the wiki or remove it
  • Update the README
  • Merge all approved PRs that are still waiting
  • Add way more examples
  • Release new version

Keywords with parentheses are also parsed as subcalls

parses('open $fh, "<", "foo";');   # unambiguous: keyword
parses('open($fh, "<", "foo");');  # ambiguous: keyword or subcall

Complete parse tree:

(Program
  (StatementSeq
    (Statement
      (NonBraceExprValueR
        (OpListKeywordExpr
          (OpKeywordOpenExpr
            'open'
            (OpListKeywordArg
              (ExprValueR
                (Value
                  (NonLiteral
                    (ParenExpr
                      (ExprComma
                        (ExprValueL
                          (Value
                            (NonLiteral
                              (Variable
                                (VarScalar '$' (VarName (Ident 'fh')) (ElemSeq0))))))
                        ','
                        (ExprComma
                          (ExprValueL
                            (Value
                              (Literal
                                (NonBraceLiteral (InterpolString '"' '<' '"')))))
                          ','
                          (ExprValueR
                            (Value
                              (Literal
                                (NonBraceLiteral (InterpolString '"' 'foo' '"'))))))))
                    (ElemSeq0)))))))))
    ';'))
(Program
  (StatementSeq
    (Statement
      (NonBraceExprValueR
        (NonBraceValue
          (NonLiteral
            (SubCall
              (NonQLikeIdent (NonQLikeFunctionName 'open'))
              (CallArgs
                (ParenExpr
                  (ExprComma
                    (ExprValueL
                      (Value
                        (NonLiteral
                          (Variable
                            (VarScalar '$' (VarName (Ident 'fh')) (ElemSeq0))))))
                    ','
                    (ExprComma
                      (ExprValueL
                        (Value
                          (Literal
                            (NonBraceLiteral (InterpolString '"' '<' '"')))))
                      ','
                      (ExprValueR
                        (Value
                          (Literal
                            (NonBraceLiteral (InterpolString '"' 'foo' '"'))))))))))))))
    ';'))

glob vs package

A method call on a string is dynamically dispatched. If a filehandle with the given name exists, the call is dispatched to it; otherwise perl treats the string as a package name. This does not depend on the name of the method called, only on the value of the invocant.

All these are affected:

my $foobar = "Foo::Bar";
$foobar->say();
Foo::Bar->say();
Foo::Bar::->say();
"Foo::Bar"->say();

Behaviour of the code is changed iff a filehandle is assigned to *Foo::Bar, either via a call to open or via the assignment operator:

open(*Foo::Bar, ">foo");
*Foo::Bar = $filehandle;

Funnily enough, blessing something into "Foo::Bar" undoes the effect:

use strict;
*Foo::Bar = *STDOUT;
my $foobar = "Foo::Bar";
$foobar->say("Hi");
bless {}, "Foo::Bar";
$foobar->say("Hi"); 

outputs:

Hi
Can't locate object method "say" via package "Foo::Bar" at test2.pl line 6.

It seems that this behaviour is fairly new: under 5.18.4 and 5.20.3 the output is different:

Hi
Hi

In the end, this behaviour seems to be baked pretty deep into the interpreter and unavoidable using just syntactic changes.

Pluggable grammars

We can support grammar plugins so people can load additional grammars to parse DSLs.

Examples:

  • Moose
  • Dios
  • Dancer2

We can document the base lexemes so others can use them. They could create their own lexemes with a prefix for them, like MooseKeywordHasExpr and MooseKeywordHas.

Syntax: labels

The Label concept is available but only used for keywords that need labels like next.

We still want proper labeling for expressions and blocks.

Syntax: subroutines

This is covered under #5, but I'm creating a separate ticket for it to keep track.

  • Forward declarations: sub foo;

  • Named subroutines: sub foo { ... }

  • Subroutine signatures: sub foo (...) {...}

  • Subroutine prototypes: sub foo :prototype(...) {...}

  • Parenthesized parameter lists will always be parsed as signatures by default; prototypes must use :prototype

  • Prototypes will be read, but not used.

At the moment we will not be parsing the contents of signatures or prototypes. Perhaps in the future we'll improve the parsing to also understand what they do.

Alternate regexp delimiters trigger a warning

Consider the following code to remove https:// from a string:

my $str = "https://google.com";

# Triggers a warning (but is much more readable)
$str =~ s|https://||;

# No warning, but a lot harder to read
$str =~ s/https:\/\///;

I think alternate regexp delimiters should be considered standard. At least some primary characters, like |, /, !, and maybe @, should be OK.
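For reference, perl treats all of these as the same substitution; only the delimiters differ. A small sketch (the variable names are placeholders):

```perl
use strict;
use warnings;

my $str = "https://google.com";

# Three equivalent substitutions, each on its own copy of $str.
( my $with_pipes   = $str ) =~ s|https://||;
( my $with_braces  = $str ) =~ s{https://}{};
( my $with_slashes = $str ) =~ s/https:\/\///;

print "$with_pipes\n";
```

Bracketing delimiters like s{...}{...} avoid escaping slashes as well, so they may be worth considering alongside |.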

Control whitespace manually

Some places shouldn't allow spaces. For example, $foo->@[$foo] cannot be written as $foo -> @ [ $ foo ].

By default, our BNF ignores all spaces, and we work around that to stop it from ignoring spaces in critical places.

We should remove the automatic spaces in our BNF and then add spaces where we think appropriate.

Differentiate between unary '-' applied to a number and a negative number literal

When we see a -4, it's parsed ambiguously as:

# (Program
#   (StatementSeq
#     (Statement
#       (NonBraceExprValueR
#         (NonBraceValue (NonBraceLiteral (LitNumber '-' '4')))))))
# (Program
#   (StatementSeq
#     (Statement
#       (NonBraceExprUnaryR
#         '-'
#         (ExprValueR
#           (Value
#             (Literal (NonBraceLiteral (LitNumber '4')))))))))

Ambiguity with sort

sort $a + $b

parses as both:

# via OpKeywordSort VarScalar OpListKeywordArg
(sort (varscalar $a) (oplistkeywordarg (exprunary + $b)))

 # via OpKeywordSort OpListKeywordArgNonBrace
(sort (oplistkeywordargnonbrace (expradd '+' (varscalar $a) (varscalar $b)))

perl disambiguates to the latter.

This can be fixed with a new expression variant that does not start with a unary operator, similar to the NonBraceExpr variants, but I don't like the idea of duplicating the whole table again.

Support LiteralValue number with minus and dot

When trying to parse -44.4, we get the following options:

# (Program
#   (StatementSeq
#     (Statement
#       (NonBraceExprValueR
#         (NonBraceValue
#           (NonBraceLiteral (LitNumber '-' '44' '.' '4')))))))
# (Program
#   (StatementSeq
#     (Statement
#       (NonBraceExprUnaryR
#         '-'
#         (ExprValueR
#           (Value
#             (Literal
#               (NonBraceLiteral (LitNumber '44' '.' '4')))))))))
# (Program
#   (StatementSeq
#     (Statement
#       (NonBraceExprAddR
#         (NonBraceExprValueL
#           (NonBraceValue (NonBraceLiteral (LitNumber '-' '44'))))
#         '.'
#         (ExprValueR
#           (Value
#             (Literal (NonBraceLiteral (LitNumber '4')))))))))
# (Program
#   (StatementSeq
#     (Statement
#       (NonBraceExprAddR
#         (NonBraceExprUnaryL
#           '-'
#           (ExprValueL
#             (Value
#               (Literal (NonBraceLiteral (LitNumber '44'))))))
#         '.'
#         (ExprValueR
#           (Value
#             (Literal (NonBraceLiteral (LitNumber '4')))))))))

This isn't urgent. I moved the failing test to TODO.

block and hash disambiguation

This happens when a pair of curly braces is used as a stand-alone statement in a code block (sub, eval, etc), or as the first argument to one of the operators below:

  • print, printf, say
  • system, exec
  • sort, grep, map

(This is not related to prototypes: these operators are always parsed using special rules, even if parentheses are used around the arguments. In expressions, including after the return keyword, curly braces are always treated as a hash literal.)

While a sufficiently powerful parser can probably handle this, the rules used for this disambiguation are rather unique and would complicate the parser too much. The disambiguation rules are also not fully documented. In brief, perl checks whether there's a comma right after the first token inside the braces (full details below).

I'd like to make curlies always interpreted as a block:

  • for operators above, hash argument only makes sense in map context;
  • hashes at top-level are rare (eg. do "config.pl" or sub { { foo => 1 } }).

Several possible solutions:

  1. Require a semicolon after opening brace in ambiguous situations.

    map { $_ => 0 } @a; # not ok
    map {; $_ => 0 } @a; # ok
    sub foo { { 1 => 2 } } # not ok
    sub foo { return { 1 => 2 } } # ok
    sub foo { 1 => 2 } # ok
    { my $x; sub foo { $x } } # not ok
    print { $fh } "hi"; # not ok
  2. Require parens around expressions with comma or fat-comma inside ambiguous blocks.

    map { $_ => 0 } @a; # not ok
    map { ($_ => 0) } @a; # ok
    sub foo { { 1 => 2 } } # not ok
    sub foo { return { 1 => 2 } } # ok
    sub foo { 1 => 2 } # ok
    { my $x; sub { $x } } # ok
    print { $fh } "hi"; # ok
  3. Require parens around all expressions with comma or fat-comma, if not inside an expression. This is global change, but in return code block syntax becomes the same everywhere (unless I missed anything).

    map { $_ => 0 } @a; # not ok
    map { ($_ => 0) } @a; # ok
    sub foo { { 1 => 2 } } # not ok
    sub foo { return { 1 => 2 } } # ok
    sub foo { 1 => 2 } # not ok
    { my $x; sub { $x } } # ok
    print { $fh } "hi"; # ok

Disambiguation rules

Exact details are not really important to this issue, but I put them here for reference and entertainment.

Perl parses a pair of curly braces as a hash if one of the following is true:

  • there is nothing in the braces;
  • the second token is a fat comma;
  • the second token is a regular comma and the first token starts with a 'q' or with a character other than a lowercase letter.

Here a token means a quoted string or command ('', "", ``, q{}, qq{} and qx{}) or a sequence of word characters.

{ } # hash
{ 1 } # block
{ 1, 2 } # hash
{ fuss, 2 } # block
{ Pack, 2 } # hash
{ quiz, 2 } # hash
{ fuss => 2 } # hash
{ qq{} => 2 } # hash
{ qr{} => 2 } # block
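To make the rules above concrete, here is a toy model of them. It is a sketch only, not perl's actual tokenizer logic: it looks at the first two tokens, ignores escapes and nested delimiters inside quotes, and classify_braces is a hypothetical name. It does reproduce every example listed above.

```perl
use strict;
use warnings;

# Toy model of the brace heuristic described above. It only inspects the
# first two tokens inside the braces; it is NOT perl's real implementation.
sub classify_braces {
    my ($text) = @_;    # e.g. '{ fuss => 2 }'
    my ($inner) = $text =~ /\A\s*\{(.*)\}\s*\z/s
        or die "expected a brace-delimited string\n";

    my $token_re = qr/
          '[^']*'              # single-quoted string
        | "[^"]*"              # double-quoted string
        | `[^`]*`              # backtick command
        | q[qx]?\{[^}]*\}      # q{}, qq{} or qx{} with braces
        | \w+                  # run of word characters
        | =>                   # fat comma
        | \S                   # any other single character
    /x;
    my @tokens = $inner =~ /($token_re)/g;

    return 'hash' unless @tokens;               # nothing in the braces
    my ( $first, $second ) = @tokens;
    return 'block' unless defined $second;
    return 'hash' if $second eq '=>';           # second token is a fat comma
    return 'hash'
        if $second eq ',' && $first =~ /\A(?:q|[^a-z])/;
    return 'block';
}

for my $example ( '{ }', '{ 1 }', '{ 1, 2 }', '{ fuss, 2 }', '{ Pack, 2 }',
    '{ quiz, 2 }', '{ fuss => 2 }', '{ qq{} => 2 }', '{ qr{} => 2 }' )
{
    printf "%-14s # %s\n", $example, classify_braces($example);
}
```

Note how qr{} falls through to the word-character rule: the model sees qr as a word and { as the second token, so the braces become a block, matching the last example.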

Remove indirect notation

Seems like we're able to parse indirect notation using ArrowIndirectCall. I suggest we remove it.

Class methods don't work

Foo->thing(); # Fails to parse

This is because Ident is not part of the matches for NonBraceExprArrow or ExprArrow. I tried adding it, but then it would parse things like foo()->... with foo as an Ident and fail to parse the rest.

Not sure how to fix this. @vickenty, work your magic?

Syntax: Flip-flop operators

We can support .. as an infix operator, but ... needs to be detected because it's different from the ellipsis (...) statement.

See "Range Operators" in perlop.
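The two flip-flop operators behave differently, which is why a parser has to keep them, and the statement-level ellipsis, apart. A small sketch of standard perl behaviour; @lines and not_done_yet are placeholder names:

```perl
use strict;
use warnings;

my @lines = ( 'x', 'AB', 'y', 'B', 'z' );

# Two-dot flip-flop: once the left operand is true, the right operand is
# tested on the same element, so 'AB' both opens and closes the range.
my @two_dot;
for (@lines) { push @two_dot, $_ if /A/ .. /B/ }

# Three-dot flip-flop: the right operand is not tested on the element that
# opened the range, so the range runs until the next /B/.
my @three_dot;
for (@lines) { push @three_dot, $_ if /A/ ... /B/ }

# The statement form of ... (the "yada yada" ellipsis) dies when executed.
sub not_done_yet { ... }
my $died = !eval { not_done_yet(); 1 };

print "two dots: @two_dot; three dots: @three_dot\n";
print "ellipsis died with: $@" if $died;
```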

Syntax: phases

You can define phases with sub. We will understand that. But we should also understand phases as keywords replacing sub:

  • BEGIN
  • INIT
  • CHECK
  • UNITCHECK
  • END
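Perl treats a bare phase block and one prefixed with sub identically, which is why both forms need to parse. A small sketch showing this for BEGIN only; @order is a placeholder name:

```perl
use strict;
use warnings;

our @order;

# A phase block runs as soon as perl finishes compiling it.
BEGIN { push @order, 'BEGIN' }

# Runtime statement: runs only after compilation has finished.
push @order, 'runtime';

# The same phase written with the optional (discouraged) sub prefix;
# it also runs at compile time, before the runtime statement above.
sub BEGIN { push @order, 'sub BEGIN' }

print join( ', ', @order ), "\n";
```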

All blocks must end with semicolon, unless last statement

[Edit: added all blocks.]

Because subroutines are considered Statements and Statements use Semicolon between them, the following fails:

sub foo () {...}
sub bar () {...}

Instead, it has to be written this way:

sub foo () {...};
sub bar () {...};

This also affects if() conditions:

if ($foo) {1}                          # ok
if ($foo) {1} foo()                    # not ok
if ($foo) {1}; foo()                   # ok
if ($foo) {1} elsif ($bar) {2} foo();  # not ok
if ($foo) {1} elsif ($bar) {2}; foo(); # ok

A ton of new failures

Test Summary Report
-------------------
t/Statements/Block.t                                    (Wstat: 2048 Tests: 10 Failed: 8)
  Failed tests:  1-8
  Non-zero exit status: 8
t/Statements/Expressions/OpKeywordExpr/OpKeywordChmod.t (Wstat: 2304 Tests: 10 Failed: 9)
  Failed tests:  1, 3-10
  Non-zero exit status: 9
t/Statements/Expressions/OpKeywordExpr/OpKeywordOpen.t  (Wstat: 1280 Tests: 5 Failed: 5)
  Failed tests:  1-5
  Non-zero exit status: 5
t/Statements/Expressions/OpKeywordExpr/OpKeywordSplice.t (Wstat: 256 Tests: 1 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
t/Statements/Expressions/OpKeywordExpr/OpKeywordSplit.t (Wstat: 2048 Tests: 10 Failed: 8)
  Failed tests:  2-5, 7-10
  Non-zero exit status: 8
t/Statements/Expressions/OpKeywordExpr/OpKeywordStat.t  (Wstat: 256 Tests: 1 Failed: 1)
  Failed test:  1
  Non-zero exit status: 1
t/Statements/Expressions/OpKeywordExpr/OpKeywordSub.t   (Wstat: 3328 Tests: 13 Failed: 13)
  Failed tests:  1-13
  Non-zero exit status: 13
t/Statements/Expressions/Value/ArrowDerefVariable.t     (Wstat: 2560 Tests: 20 Failed: 10)
  Failed tests:  1, 3, 5, 7, 9, 11, 13-16
  Non-zero exit status: 10
t/Statements/Expressions/Value/QLikeValue.t             (Wstat: 4352 Tests: 158 Failed: 17)
  Failed tests:  121-128, 131-133, 136-139, 144, 157
  Non-zero exit status: 17
t/Statements/Expressions/arrow.t                        (Wstat: 1280 Tests: 10 Failed: 5)
  Failed tests:  2-3, 7-9
  Non-zero exit status: 5
t/Statements/Expressions/variable.t                     (Wstat: 4352 Tests: 26 Failed: 17)
  Failed tests:  1-5, 7, 11, 13, 15-23
  Non-zero exit status: 17
t/Statements/LoopStatement.t                            (Wstat: 3584 Tests: 23 Failed: 14)
  Failed tests:  1-6, 8-11, 15-16, 20-21
  Non-zero exit status: 14
t/Statements/PackageStatement.t                         (Wstat: 3584 Tests: 15 Failed: 14)
  Failed tests:  1-4, 6-15
  Non-zero exit status: 14
t/Statements/RequireStatement.t                         (Wstat: 512 Tests: 12 Failed: 2)
  Failed tests:  3-4
  Non-zero exit status: 2
t/Statements/SubStatement.t                             (Wstat: 3584 Tests: 32 Failed: 14)
  Failed tests:  4-15, 18, 31
  Non-zero exit status: 14
t/Statements/UseNoStatement.t                           (Wstat: 10240 Tests: 52 Failed: 40)
  Failed tests:  2-6, 12-26, 28-32, 38-52
  Non-zero exit status: 40
t/Statements/WhileStatement.t                           (Wstat: 512 Tests: 2 Failed: 2)
  Failed tests:  1-2
  Non-zero exit status: 2
Files=23, Tests=447, 17 wallclock secs ( 0.15 usr  0.06 sys + 16.14 cusr  0.88 csys = 17.23 CPU)
Result: FAIL

Suggestion: Assigning a scalar to an array without parens should be a warning

I ran into syntax today that I hadn't seen before:

my @a = "one";

For the purposes of Guacamole I don't think you should be able to assign a single scalar to an array like this. It should require parens, or qw() or something.

my @a = ("one"); # Preferred

This is a non-intuitive syntax, and we should encourage people to avoid this.
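For reference, perl gives all of these the same result, so such a rule would be a style check rather than a change in meaning. A small sketch with placeholder names:

```perl
use strict;
use warnings;

my @without_parens = "one";      # a one-element list
my @with_parens    = ("one");    # the preferred spelling
my @from_qw        = qw(one);

# All three arrays hold exactly one element, "one".
print scalar(@without_parens), " ", scalar(@with_parens), " ",
    scalar(@from_qw), "\n";
```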

Do not allow q-like Call parsing

q* functions are their own expression. They don't support whitespace between the operator and the delimiter.

Unfortunately, that means that q () (as well as qq (), qr (), and qx ()) is parsed as a function call. We need to make sure functions cannot be called that way.
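Perl itself accepts whitespace between a quote-like operator and its delimiter and treats these as quotes, not calls, which is why parsing them as functions gives the wrong tree. A small sketch with placeholder names:

```perl
use strict;
use warnings;

# Whitespace between the operator and the delimiter is allowed,
# so none of these are function calls.
my $single = q (hello);
my $double = qq (hello, $single);
my @words  = qw (a b c);

print "$single | $double | @words\n";
```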

Variables within interpolated strings

Interpolated strings can have a lot in them. From an analysis perspective, it would be nice to identify things within them, but to which degree should we support it?

  • Variables:
"Hello, $name!"
  • Braced identifier:
"Hello_${name}_$extension"
  • Dereferencing:
"Hello, ${$nameref}!"
  • Postfix dereferencing (feature postderef_qq):
"Hello, $nameref->$*!"
  • Complete expressions within dereferencing:
"Hello, @{[ do_whatever() . ' and ' . $name ]}!"

My general thought is that identifying a single variable or a regular (prefixed) deref is not difficult. That would still require settling #8 to decide whether prefixed deref is supported at all. ${$var} will likely always be more pleasant than $var->$*, but the postfix form is likely harder to identify within strings. (Considering I want to enable it for good in future versions, we can still decide to use it.)
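To make the categories concrete, here is what perl produces for each form listed above. This sketch assumes Perl 5.24 or later with the postderef_qq feature enabled; do_whatever is a hypothetical helper. The last string uses single quotes inside @{[ ]} because perl finds the end of a double-quoted string before interpolating, so a nested double quote would end the outer string early.

```perl
use strict;
use warnings;
use feature 'postderef_qq';    # Perl 5.20+, no longer experimental from 5.24

my $name      = 'World';
my $extension = 'txt';
my $nameref   = \$name;

sub do_whatever { 'Guacamole' }    # hypothetical helper

my @examples = (
    "Hello, $name!",                                    # variable
    "Hello_${name}_$extension",                         # braced identifier
    "Hello, ${$nameref}!",                              # prefixed dereference
    "Hello, $nameref->$*!",                             # postfix dereference
    "Hello, @{[ do_whatever() . ' and ' . $name ]}!",   # full expression
);
print "$_\n" for @examples;
```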

Define "Standard Perl"

What makes some code "standard" and others not? What Perl don't I get to use if I want to use guacamole?

Calling time() as a function results in a warning

While testing more of my code, I found another "bug". Calling time() as a function with parentheses triggers a warning:

if (time() % 15 == 0) {

However... calling it without parenthesis does not trigger a warning:

if (time % 15 == 0) {

Special variables

Apart from $_ and @_, which match the identifier rules for variable names, special variables (like $^O) are not supported.

Syntax: ellipsis

... should be supported as a statement, not expression.

do {...} is correct, but return ... or $foo or ... should fail.

return statement

The 'Subroutine calls' section says there are no special cases for return, but that does not seem possible.

For example:

foo(1, return(2, 3), 4);

If return is not special and treated like any other subroutine, it will be possible to use it in an expression. If return was a subroutine, it would get arguments (2, 3), so under these rules the snippet above should return (2, 3) as well. But in perl this returns (2, 3, 4) because return consumes everything until the last closing parenthesis.
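A quick way to confirm the behaviour described above in plain perl; foo and wrapper are placeholder names:

```perl
use strict;
use warnings;

sub foo { return @_ }    # never reached below

# return is a list operator exempt from the "looks like a function" rule,
# so return( 2, 3 ), 4 passes the whole list (2, 3), 4 to return.
sub wrapper { foo( 1, return( 2, 3 ), 4 ) }

my @result = wrapper();
print "@result\n";
```

The call to foo never happens: return exits wrapper while its arguments are still being evaluated.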

One way or another return needs to be special. Some options are:

  1. only allow return as a stand-alone statement with a pair of top-level parenthesis around arguments.
return(1, 2); # ok
return(1, 2), 3; # not ok
return(@foo); # ok
return @foo; #not ok
foo(1, return 2); # not ok

(package keyword is not allowed in expressions, so it seems there is precedent for stand-alone statement operators like this already).

  2. only allow return as a stand-alone statement, but accept full syntax:
return (1, 2), 3; # ok
return @foo; # ok
foo(1, return 2) # not ok
  3. don't restrict use of return in any way:
foo(1, return 2) # ok
foo(1, return (2, 3), 4) # ok
