ngsankha / rbsyn Goto Github PK

View Code? Open in Web Editor NEW

95.0 6.0 4.0 1.09 MB

Program synthesis for Ruby

License: BSD 3-Clause "New" or "Revised" License

Ruby 90.62% Shell 0.09% Python 9.29%

program-synthesis ruby rails ruby-on-rails

rbsyn's People

Contributors

Stargazers

Watchers

Forkers

diohabara designium viamin jigyass

rbsyn's Issues

Ordering of program elimination and branch pruning strategies

Right now, all the strategies are run in an unspecified order by taking the descendants of the base class. Specifying an order is required for best results.

Full username_available? method

Finally get to synthesis of something like:

  def self.username_available?(username, email = nil)
    lower = username.downcase
    return false if reserved_username?(lower)
    return true  if !User.exists?(username_lower: lower)
    # staged users can use the same username since they will take over the account
    email.present? && User.joins(:user_emails).exists?(staged: true, username_lower: lower, user_emails: { primary: true, email: email })
  end

Blockers:

Split prune strategies into separate files

program size might change during synthesis branching conditions

The way our branch synthesis works might sometimes trigger a call to the new candidate programs generation routine. So the final program with the branches might be higher in size than the program before.

The candidate program generation loop takes into account size constraints - but the branching doesn't. Figure out a way to do that while respecting size constraints.

pod#schedule_check_if_needed regression

Test fails now with a no candidate found. It works in the Docker image submitted to artifact review.

auto refactor common subexpression from basic blocks

The subexpression Post.where(slug: arg1).first in #13 is repeated over and over again, it would be nice if it can be refactored out. We have effect information to know that only reads are performed so it should be safe to put this subexpression on top.

Add envref handler to all the AST passes

Update InverseBranchFold strategy to not emit if

We can fall back on the prog cond tuples to do the if statement generation at a later time. Will lead to better branch folding opportunities.

bug in disable side effect flags

Disable side effects flag still knows that a side effect assertion failed just disables the feedback to the synthesis loop. That still an improvement over hunting in the dark. Should stop the assertion check altogether.

Fix this ASAP.

handle overloaded methods

only the first type definition is used during synthesis now. Should use all methods

DSL for specifications

An rspec like DSL for specifying tests that can be used to synthesize functions. Example:

define :username_available?, "(String) -> %bool" do

  spec "returns true when user doesn't exist" do
    username_available? 'bruce1'

    post { |result|
      result == true
    }
  end

  spec "returns false when user exists" do
    pre {
      u = User.create(name: 'Bruce Wayne', username: 'bruce1', password: 'coolcool')
      u.emails.create(email: '[email protected]')
    }

    username_available? 'bruce1'

    post { |result|
      result == false
    }
  end

  assert_equal generate_program, %{
def username_available?(arg0)
  User.exists?(username: arg0)
end
}.strip
end

envref can be used for depth != 0

we only look up envref when filling holes at depth 0. They can also be used for higher depths.

The reasoning for this was the synthesis engine if stumbles across the same expression we can bump up the count. This is slow a lot of times, and reusing old expressions smartly can help.

Joins types are not correct

Joins input type computation ignore the label for a particular association, so they don't work all the time.

Eliminate programs that are not type correct

Given a function argument of type { email: ?String, active: ?%bool, username: ?String, name: ?String }, and if asked to give a program that returns a String the synthesizer will return programs like:

arg0.[](:name)
arg0.[](:username)
arg0.[](:active)
arg0.[](:email)

The program with :active is not correct. Even though it can be eliminated after testing, doing so with type checking could be cheaper. Such programs are generated because the argument can be String or bool. But only after the argument is filled that the precise type can be calculated. Such kind of analysis can be directly implemented as well.

set ordering based on preconditions

reorder worklist based on some similarity index with preconditions

More benchmarks

merge environments

There are multiple environments at the moment:

Type environment: (used at the start) has the function args and return types
Local environment: (used when building expressions) has reusable expressions
Adhoc hashes that are made based on some combination of the above

These could be merged for a much cleaner code

Code cleanup

Need some large scale refactoring to clean up the code.

Synthesizer class interface is outdated after DSL

petri nets?

Can we use petri nets for better computation of paths through types? Side effects help when there are assertions that test that side effect. But there is no way to reach a side effect path if there is no assertion for it. Petri nets could be helpful.

Example:

# type: () -> B
def my_method
  a = A.new()
  a.set_some_state(:foo)
  B.lookup(a)
end

custom exception class

there are place where any possible exceptions are blindly ignored because it could be from the synthesized code (which arguably could be wrong). This makes debugging a pain during development. Make a project specific error class that can be used to wrap all known errors and handled properly.

VSCode Plugin

RbSyn is non-interactive at the moment---there are specs and then the synthesizer finds some code that satisfies it without any further human input. However, the core of RbSyn supports partial programs in the form of holes, but that functionality is not exposed anywhere. A VSCode plugin will be the best way to write programs (partial/full) and let RbSyn do the completion. This involves the following steps:

Add surface syntax support to write type and/or effect holes
Investigate possible ways to integrate type inference of the holes so that types need not be written at the holes
Refactor RbSyn core into a background server that searches for program candidates
A VSCode plugin that takes the surface syntax passes that to a server and displays possible solutions
Investigate with a user study if program synthesis with a human in the loop actually aids in faster and correct programming?

add holes

make holes a part of the AST, right now the expansion of expression feels very adhoc

referring variables in tests

Variables declared in pre-condition, the function invocation and the post condition cannot refer to each other.

better DSL for tests

the specification is provided in a custom DSL that requires us to write pre and post blocks. There is a possible way to write more natural specifications that looks exactly like tests, without separate pre and post blocks.

wrapped_func1:
  call synthesized func
  raise an exception wrapping the return value

wrapped_func2:
  call wrapped_func1
  if need to run user provided post condition:
    catch the exception and return unwrapped return value
  else:
    forward the exception

eval_ast:
  set the flag to indicate if running user provided postcond or not
  define wrapped_func2 instead of synthesized func in the eval context
  eval the spec
  if exception:
    call the engine provided post cond
  else:
    we have run the user post cond

This is a tricky change to make, but will make the user specs look much nicer.

efficient branching

start off with one hole and fill that with the assumption that it will pass all tests.
if one candidate passes n tests and fails m tests, keep the program corresponding to n tests together (along with branch condition) and recursively apply this procedure.
Then apply the merge procedure as in the paper.

This method will eliminate individual cost for generating candidates for each test.

receiver of side effect method may be an expression

the way the code is presently written, the receiver during side effect is always a variable.

variable binding generation

Avoid cyclic dependency in variable assignment by checking DAG of expr environment

Abstract program

Couple of things becoming clear while writing the paper and getting more benchmarks to pass:

ProgWrapper and ProgTuple don't need to be separate. They can be merged into an abstract program that has other abstract programs, the branch and the environment.
Make objects immutable - this will increase memory usage but make things functional and easy to reason about