Giter VIP home page Giter VIP logo

polygolf's Introduction

PolyGolf

Ambitious polyglot autogolfer for https://code.golf

Design goals

  • a language (IR) that compiles to many languages (C, Lua, Java, etc.)
  • targeted languages are limited to those available on https://code.golf
    1. C-like mutable imperative (easy): Bash, BASIC, C, C#, C++, COBOL, Crystal, D, Dart, Fortran, Go, Java, JavaScript, Julia, Lua, Nim, Pascal, Perl, PHP, PowerShell, Python, Raku, Ruby, Rust, Swift, V, Zig
    2. mostly functional: Elixir, F#, Haskell, Lisp
    3. other: Assembly, ><>, brainfuck, GolfScript, Hexagony, J, K, Prolog, sed, SQL, VimL
  • can compile PolyGolf to any language without language-specific annotations
  • goal for each language target (vague): at most twice as long on average for all holes, aka score at least 500× number of holes

Read on for usage, or see docs/architecture.md for the architecture.

Playground

You can try Polygolf in your browser using the Playground.

Local usage

Requires npm and node 20.11.1+ installed.

To get started, clone the repository, then run

npm install
npm run build
npm install . --location=global

This will set up the polygolf command to point to the CLI script.

Polygolf CLI supports the following options:

  • --input, -i: path to the input program
  • --lang, -l: target language name or its extension, if omitted, targets all supported languages
  • --output, -o: output path, if omitted, output goes to stdout
  • --chars, -c: if set, switches the objective from bytes to chars
  • --all, -a: if set, outputs all input variants
  • --debug, -d: if set, outputs debug info

To uninstall, use npm uninstall polygolf --location=global

Competitiveness

PolyGolf is designed to be decent at golfing, so there's concern about making it public with applications to the competitive site https://code.golf. However, I will keep this repository and all PolyGolf features public with a few naive reference solutions. To avoid spoilers, solutions should not be shared unless they are naive/obvious solution to simple holes.

Syntax

Program is a tree of nodes. Node is either

  • integer literal 58,
  • text literal "text literal\n another line",
  • variable $very_important_var,
  • root block op1; op2; op3;
  • a block {op1; op2; op3}, potentially with multiple variants { variant1 / variant2 / variant3 } or
  • s-expression

S-expression takes one of the following forms: opcode or (opcode arg1 arg2 arg3 ...) or (arg1 opsymbol arg2) If the expression is top level or a direct child of a block, it is to be semicolon terminated instead: opcode arg1 arg2 arg3 ...; or arg1 opsymbol arg2;

Only symbols and division ops can be used in place of opsymbol.

Each expression can optionally be annotated with a type expression like this: "text":Text.

Type expression is either

  • Integer range -10..10, -oo..oo, 0..oo,
  • Simple type Text, Bool, Void or
  • S-expression using type expressions (List (Set 0..100))

Literals

Integer literals are either

  • base 10 - no prefix, optionally using a scientific notation, so that 1e6 is the same as 1000000,
  • base 2 - 0b prefix,
  • base 16 - 0x prefix.

String literals are JSON string literals.
List literals are written as n-ary s-expressions:
(list 1 2 3 4 5)
Array and set literals are similar:
(array 1 2 3), (set 1 2 3)
Table literals are n-ary s-expressions taking a variable number of key-value pairs:
(table ("x" => 0) ("y" => 1) ("z" => 2))

Semantics

All nodes in the Polygolf tree are expressions and each expression has a specific type. Some expressions don't return a value - these have the unit type.

Types

Polygolf is strongly typed, featuring the following types:

  • Void - the unit type - this is the return type of statements.
  • Bool - boolean.
  • Int or -oo..oo - integer of unlimited size`, with its subtypes:
    • LowerBound..UpperBound - Where the bounds are inclusive integer literals or "-oo" or "oo".
  • Text - unicode string of unlimited length, with its subtypes:
    • (Text SomeIntegerType), where SomeIntegerType is a subtype of Int and signifies the type of the text codepoint length, for example (Text 1..1) is a text with exactly one codepoint.
    • Ascii - string consisting of ascii characters only.
    • (Ascii SomeIntegerType), where SomeIntegerType is a subtype of Int and signifies the type of the text length, for example (Ascii 1..1) is a text with exactly one ascii character.
  • (List MemberType) - dynamic length, zero indexed sequence of items of type MemberType.
  • (Array MemberType lengthLiteral) - fixed length, zero indexed sequence of items of type MemberType. This currently has limited support.
  • (Table InType OutType) - partial table / dictionary / map from values of type InType to values of type OutType.
  • (Set MemberType) - set of items of type MemberType - note that this currently has zero support on the backend.
  • (Func InType_1 ... InType_n OutType) - a function type - currently no support on the backend.

Polygolf has type inference so for example if variable $a is Int, then Polygolf knows that (10 + ($a mod 10)) is 10..19.

Variables

Each variable must be first used in an assignment. Variable type is determined by the type annotation on the first assignment or the type of the value being assigned, if the annotation is missing.

Special expressions

  • {} - Block - combines multiple expressions into a single one. An important feature of blocks is that they can contain multiple variants / alternatives. These variants are expanded (even recursively) so that several different programs solving the hole are generated and the shortest (compilable) one is chosen. This is very useful when variant A is longer (or cannot Polygolf is unable to compile it) than variant B for some subset of languages, but it is shorter for another subset of languages.
  • assign, <- - assigns a value to a variable.
  • key_value, => - this can only be used as a part of a table literal.
  • func - anonymous function literal - last argument is the body, all others are its arguments.
  • if - if statement - expects a boolean condition and 1-2 bodies - a consequent and an optional alternate.
  • for - a for each loop. Expects an optional loop variable, a list to iterate over and a body. To iterate over an integer range, construct a list using .. or ..<. Has the following syntactic sugars:
    • for $i $n body; for for $i (..<$n) body;,
    • for $n body; for for (..<$n) body;,
    • for $c $text body; for for $c (text_to_list[Ascii] $text) body,
    • for[byte] $c $text body; for for $c (text_to_list[byte] $text) body,
    • for[codepoint] $c $text body; for for $c (text_to_list[codepoint] $text) body,
  • while - a while loop. Expects a boolean condition and a body.
  • for_argv - a loop over input arguments. Expects a loop variable and a static integer literal representing the upper bound on the number of arguments.
  • conditional - a ternary conditional expression. Expects a boolean condition, a consequent and an alternate.
  • unsafe_conditional - same as conditional but both branches can be safely evaluated regardless of the condition.
  • list, array, set, table - construct the respective collection with the given items.
  • argv_get - gets a single input arg. Its argument must be a static integer literal - this cannot be used in a loop.

Polygolf operators

All other expressions are Polygolf operators. Most of them return values, but some are used for I/O and some are used for setting values in collections. Complete list of opcodes.

One can reference on opcode be either its name or its alias. Some opcodes share the alias - this is resolved by the used arity / types of inputs. Symbolic aliases and div, mod can also be used in an infix manner: (+ 2 3) is the same as (2 + 3) and in mutating manner: $x <- ($x + 1); is the same as $x +<- 1;.

Many text opcodes have ...[byte], ...[codepoint], ...[Ascii] variants. Use the ascii one where possible, as that will allow target langs to choose any implementation. The other two will force implementations that are valid outside of the ascii range.

Example

For more examples, search this repo for *.test.md files like this one.

Example Fibonacci using variants

$a:0..832040 <- 0;
$b:0..1346269 <- 1;
for $i 31 {
    println $a;
    {   % temp variable
        $t:0..1346269 <- ($a + $b):0..1346269;
        $a <- $b:0..832040;
        $b <- $t;
    /   % arithmetic trick
        $b <- ($b + $a):0..1346269;
        $a <- ($b - $a):0..832040;
    }
};

This could compile to the following in C

a;b=1;i=1;main(){for(;i++<31;b+=a;a=b-a)printf("%d\n",a);}

Note the following C-specific features, besides the syntax:

  • declarations out front
  • default values (0) omitted
  • statements moved inside the for loop (!)

The same unrefined IR could compile to the following in Lua:

a=0
b=1
for i=1,30 do print(a)a,b=a+b,a end

Note the following Lua-specific features, besides the syntax:

  • foreach-range loop instead of a glorified while loop (!)
  • temporary variable replaced with simultaneous assignment (!)

Golfing plugins

Overview of Polygolf's language unspecific golfing knowledge, demonstrated on Python:

Integer arithmetic

  • 31457283<<20
  • 30000003e6 (in Lua, TODO in other langs where possible)
  • x<=5x<6
  • x%10==0x%10<1
  • (x+1)*(y+1)~x*~y
  • x//32x>>5
  • x**2x*x
  • x*x*xx**3

Boolean arithmetic

  • not(a==5 or b!=10)a!=5 and b==10
  • ~(a|b)~a&~b

Variables

  • x=1,y=1x=y=1
  • x=t;x=y;y=tx,y=y,x
  • x=x*3x*=3
  • x=z;y=x*5y=z*5

Loops

  • for i in range(10):print("O")for _ in"X"*10:print("O")
  • for i in range(len(d)):print(d[i])for i in d:print(i)
  • for i in range(0,10,2):print(i)for i in range(5):print(2*i)
  • for i in range(10,20):print(i+1)for i in range(11,21):print(i)

Identifiers

  • variable=0v=0
  • print(x);print(y);print(z)p=print;p(x);p(y);p(z)

Printing

  • print(end="x\n")print("x")

Literals

  • ["ab","cd","e","X","?!"]"ab cd e X ?!".split()
  • {"a":"A","b":"B","c":"C"}[x]["A","B","C"][["a","b","c"].find(x)]
  • ["A","B","C"][x]"abc"[x]
  • ["A","B","C"].find(x)"abc".find(x)

Chaining & shortcircuiting

  • a<b and b<ca<b<c
  • if a==b:print(a)a==b==print(a)

Tips for writing solutions in Polygolf

  • Don't golf the Polygolf source - let Polygolf do the golfing (especially the simple stuff) for you. This includes:
    • Use whitespace to format the source.
    • Use comments if appropriate.
    • Use descriptive variable names - Polygolf will shorten them for you.
    • Store intermediate results of complex expressions in auxilary variables - Polygolf will inline them for you.
  • Help Polygolf understand the problem. This includes:
    • Explicitly annotate types of values. The type inference algorithm isn't perfect or in some cases can't even possible narrow the type down as much as you can. This is especially relevant for
      • Values coming from argv - perhaps you know they will be ascii or that they will be representing an integer in a certain range.
      • Complex arithmetic expressions.
    • Prefer higher level opcodes if they exist. While Polygolf aims to generally be able to convert between lower level implementation and a higher level one, the conversion from low level to high level is harder and might not always work out for you.
  • Use variants. Polygolf is WIP and the set of golfing rules it knowns is limited. If there are two different equivalent implementations that both are sometimes shorter, include them both using the variant syntax. If you believe the case is general enough and that Polygolf should be able to generate one based on the other, open an issue.

polygolf's People

Contributors

akshatj27 avatar dokutan avatar jared-hughes avatar mewheni avatar michalmarsalek avatar shanethegamer avatar steffan153 avatar valanm22 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

polygolf's Issues

Python redundant newline

Language
Python

Polygolf source

for $i 0 10 {
    for $j 0 10 {
        if ($i < $j) ($a <- $j);
        if ($i < $j) ($a <- $j);
    };
    $a <- $i;
};

Current output

for i in range(10):
 for j in range(10):
  if i<j:a=j
  if i<j:a=j

 a=i

Expected output

for i in range(10):
 for j in range(10):
  if i<j:a=j
  if i<j:a=j
 a=i

Comment
Code is valid but unnecesarily longer.

Nim method parenthesization

Compiling

print (text_split "abc" "b");

to Nim produces

include re
"abc".split "b".echo

which throws a type error

Python print without newline

Compiling

print 1;
println 1;

to Python produces

print(end=1)
print(1)

The first line produces a typeError because the end argument must be a string. (The second print is only there so the golfLastPrint plugin isn't run on the first print.)

Variable inlining

Inline variables that are only used once, so that

$a <- (argv_get 0);
$b <- (text_length $a);
$c <- ($b * 3);
print (int_to_text $c);

compiles to Python as

import sys
print(len(sys.argv[1])*3)

This makes writing complex expressions easier in Polygolf by allowing naming the intermediate steps.

Traversal only visits 9 children.

Attempting to build the following polyglot code into Lua gives the error Lua requires inclusive ForRange.

$a <- 1;
$a <- 1;
$a <- 1;
$a <- 1;
$a <- 1;
$a <- 1;
$a <- 1;
$a <- 1;
$a <- 1;
for $b 1 10 {
};

Removing one or more of the assignments fixes the error, but it seems having 9 or more assignments causes a bug.

gcd

  • introduce plugin replacing gcd Polygolf builtin with an impl using Euclid's algorithm (for langs without gcd builtin)

Broken builtin aliasing

Language
Python

Polygolf source

println 1;
println 1;
println 1;

Current output

p=print
print(1)
print(1)
print(1)

Expected output

p=print
p(1)
p(1)
p(1)

Comment
Output code is valid but unnecesarily longer.

Print ops

  • Typecheck print and println to expect single Text argument.
  • Add print_int, println_int opcodes.
  • Add println_list_joined_using opcode expecting a list and text.
  • Add println_many_joined_using opcode expecting a arbitrary many arguments and then one text argument
  • Add related plugins.
  • Remove typecheck for print in Python.
  • Add print and println to UnaryOpCode array & type.

Nim int-to-text precedence

Compiling

$b:0..1 <- 1;
$a <- (int_to_text (1 + $b));

to Nim produces

var
 b=1
 a= $1+b

This is a syntax error. It should produce

var
 b=1
 a= $(1+b)

*.test.md.ts files contribute to build and can prevent it from succeeding

This bug only affects development.

When there's an error in a *.test.md.ts, there's no way to automatically recover and developer needs to manually delete all the files with errors.

Causes are either:

  1. Developer error in a *.test.md file.
  2. Switching to another branch.

Fix is either:

  1. Exclude *.test.md.ts from esbuild.
  2. Generate *.test.md.ts files outisde of src folder.
  3. Delete all *.test.md.ts files after testing.

Python simple block body

Language
Python

Polygolf source

for $i 0 10 {
    $a <- $i;
    println $a;
};

Current output

for i in range(10):
 a=i
 print(a)

Expected output

for i in range(10):a=i;print(a)

Comment
Code is valid but unnecesarily longer.

Use truthiness

Add plugin which golfs

if x!=0:print(x)

to

if x:print(x)

Nim Spacing before $

Compiling

print (int_to_text 1);

to Nim produces

echo$1

. This is a syntax error, it should produce

echo $1

Remove debugEmit

Get rid of src/languages/debug as we now have Polygolf target for debugging.

Replace debugEmit's usage in /src/plugins/loops.test.ts with markdown tests.

Abstract dependencies

New languages often have

    // get dependencies
    // TODO: abstract this part for other languages
    // TODO: cache, and maybe do recursive merging for performance

Need some better way to manage imports.

Compiling invalid polygolf

Compiling

print true;

to polygolf produces

print (true);

which when fed into the parser produces a syntax error. Most likely this should not be a syntax error.

Python if-else spacing

Compiling

if (1==1) { println 1; } { println 2; };

to Python produces

if 1==1:print(1)else:print(2)

This is a syntax error, it should produce

if 1==1:print(1)
else:print(2)

Tree walking interpreter

Create a tree walking interpreter.
Use it for collapsing constant expressions and to get output of the whole program.

Function plugins

Add functions related plugins like inlining, unanonymizing functions, etc.

Nim simple block body

Language
Nim

Polygolf source

for $i 0 10 {
    println $i;
    println $i;
};

Current output

for i in..9:
 i.echo
 i.echo

Expected output

for i in..9:i.echo;i.echo

Comment
Code is valid but unnecesarily longer.

Allow compiling to all languages at once

So do like polygolf -i src/programs/fibonacci.polygolf -l all -o fibonacci and it will output to fibonacci.lua, fibonacci.nim, etc.

Another possible thing is that instead of having -o fibonacci, you can have an option like --cg that will automatically upload everything to code.golf (of course, you have to input your token or whatever).

Invalid multivar assignment in Nim

Compiling

if (1 == 1) {
 $a:0..oo <- 1;
 $b:0..oo <- 2;
 $c:0..oo <- ($a + $b);
};

to Nim produces

if 1==1:
 var(a,b,c)=(1,2,a+b)

which throws Error: undeclared identifier: 'a'. A non-erroring code it could produce is

if 1==1:
 var
  a=1
  b=2
  c=a+b

Note: the if statement is just there to force the variable declaration to be indented, causing the erroneous version of the assignment to be used.

Nim - accidental raw string literal

Language
Nim

Polygolf source

print "Hello\nworld!";

Current output

echo"Hello\nworld!"

Expected output

echo "Hello\nworld!"

Comment
Not using the space makes the string literal to be interpreted as a raw one.
For the correct behaviour, space should be inserted if and only if the literal contains escape sequences.
On the other hand, if the literal is already raw, the r prefix should be removed, i.e.
echo r"\n" should be golfed to echo"\n".

Associative binary ops

Add plugin that regroupschained applications of a associative operator, or integrate this ogic into existing plugins.

ForEachArg IR Node

Add an IR node for looping over argv. Some languages have unique syntax for this task compared to other cases of looping over arrays (ie. AWK and COBOL) or require things like accessing argc.

Make syntax for range types and for looping over ranges consistent

  • Remove the current syntax for loops over ranges: for $i 0 100
  • Introduce for $i 0..99 for inclusive upper bound and for $i 0..<100 for exclusive upper bound instead
  • Consider changing the .. alias of text_concat to something else (++ or <>...)
  • Make sure 0..99 is parsed correctly in both type annotation context and loop context.
  • Make sure the bounds can be any expression, including variants (for $i {0 / 1}..<201 {...)

Lua multiple str-concat ordering

Compiling the following polygolf to Lua:

$a <- "Two";
print (.. "One" $a "Three");

outputs

a="Two"
print("Three".."One"..a)

This issue does not occur in Python or Nim.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.