whitequark / parser

A Ruby parser.

License: Other

Ruby 29.62% Yacc 66.91% Ragel 3.46% Shell 0.01%

parser's Introduction

Parser

Parser is a production-ready Ruby parser written in pure Ruby. It recognizes as much or more code than Ripper, Melbourne, JRubyParser or ruby_parser, and is vastly more convenient to use.

You can also use unparser to produce equivalent source code from Parser's ASTs.

Sponsored by Evil Martians. MacRuby and RubyMotion support sponsored by CodeClimate.

Installation

$ gem install parser

Usage

Load Parser (see the backwards compatibility section below for an explanation of the emit_* calls):

require 'parser/current'
# opt-in to most recent AST format:
Parser::Builders::Default.emit_lambda              = true
Parser::Builders::Default.emit_procarg0            = true
Parser::Builders::Default.emit_encoding            = true
Parser::Builders::Default.emit_index               = true
Parser::Builders::Default.emit_arg_inside_procarg0 = true
Parser::Builders::Default.emit_forward_arg         = true
Parser::Builders::Default.emit_kwargs              = true
Parser::Builders::Default.emit_match_pattern       = true

Parse a chunk of code:

p Parser::CurrentRuby.parse("2 + 2")
# (send
#   (int 2) :+
#   (int 2))

Access the AST's source map:

p Parser::CurrentRuby.parse("2 + 2").loc
# #<Parser::Source::Map::Send:0x007fe5a1ac2388
#   @dot=nil,
#   @begin=nil,
#   @end=nil,
#   @selector=#<Source::Range (string) 2...3>,
#   @expression=#<Source::Range (string) 0...5>>

p Parser::CurrentRuby.parse("2 + 2").loc.selector.source
# "+"

Traverse the AST: see the documentation for gem ast.

Parse a chunk of code and display all diagnostics:

parser = Parser::CurrentRuby.new
parser.diagnostics.consumer = lambda do |diag|
  puts diag.render
end

buffer = Parser::Source::Buffer.new('(string)', source: "foo *bar")

p parser.parse(buffer)
# (string):1:5: warning: `*' interpreted as argument prefix
# foo *bar
#     ^
# (send nil :foo
#   (splat
#     (send nil :bar)))

If you reuse the same parser object for multiple #parse runs, you need to #reset it.

You can also use the ruby-parse utility (it's bundled with the gem) to play with Parser:

$ ruby-parse -L -e "2+2"
(send
  (int 2) :+
  (int 2))
2+2
 ~ selector
~~~ expression
(int 2)
2+2
~ expression
(int 2)
2+2

$ ruby-parse -E -e "2+2"
2+2
^ tINTEGER 2                                    expr_end     [0 <= cond] [0 <= cmdarg]
2+2
 ^ tPLUS "+"                                    expr_beg     [0 <= cond] [0 <= cmdarg]
2+2
  ^ tINTEGER 2                                  expr_end     [0 <= cond] [0 <= cmdarg]
2+2
  ^ false "$eof"                                expr_end     [0 <= cond] [0 <= cmdarg]
(send
  (int 2) :+
  (int 2))

Features

Precise source location reporting.
Documented AST format which is convenient to work with.
A simple interface and a powerful, tweakable one.
Parses 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 3.0, 3.1, and 3.2 syntax with backwards-compatible AST formats.
Parses MacRuby and RubyMotion syntax extensions.
Rewriting support.
Parsing error recovery.
Improved clang-like diagnostic messages with location information.
Written in pure Ruby, runs on MRI >=2.0.0, JRuby and Rubinius (and historically, all versions of Ruby since 1.8).
Only one runtime dependency: the ast gem.
Insane Ruby lexer rewritten from scratch in Ragel.
100% test coverage for Bison grammars (except error recovery).
Readable, commented source code.

Documentation

Documentation for Parser is available online.

Node names

Several Parser nodes seem to be confusing enough to warrant a dedicated README section.

(block)

The (block) node passes a Ruby block, that is, a closure, to a method call represented by its first child, a (send), (super) or (zsuper) node. To demonstrate:

$ ruby-parse -e 'foo { |x| x + 2 }'
(block
  (send nil :foo)
  (args
    (arg :x))
  (send
    (lvar :x) :+
    (int 2)))

(begin) and (kwbegin)

TL;DR: Unless you perform rewriting, treat (begin) and (kwbegin) as the same node type.

Both (begin) and (kwbegin) nodes represent compound statements, that is, several expressions which are executed sequentially, where the value of the last one is the value of the entire compound statement. They may take several forms in the source code:

foo; bar: without delimiters
(foo; bar): parenthesized
begin foo; bar; end: grouped with begin keyword
def x; foo; bar; end: grouped inside a method definition

and so on.

$ ruby-parse -e '(foo; bar)'
(begin
  (send nil :foo)
  (send nil :bar))
$ ruby-parse -e 'def x; foo; bar end'
(def :x
  (args)
  (begin
    (send nil :foo)
    (send nil :bar)))

Note that, despite its name, the kwbegin node has only a tangential relation to the begin keyword. Normally, the Parser AST is semantic, that is, if two constructs look different but behave identically, they get parsed to the same node. However, there exists a peculiar construct called post-loop in Ruby:

begin
  body
end while condition

This specific syntactic construct, that is, a keyword begin..end block followed by a postfix while, behaves very unlike other similar constructs, e.g. (body) while condition: it executes the body once before testing the condition for the first time. While the body itself is wrapped into a while-post node, Parser also supports rewriting, and in that context it is important to not accidentally convert one kind of loop into another.

$ ruby-parse -e 'begin foo end while cond'
(while-post
  (send nil :cond)
  (kwbegin
    (send nil :foo)))
$ ruby-parse -e 'foo while cond'
(while
  (send nil :cond)
  (send nil :foo))
$ ruby-parse -e '(foo) while cond'
(while
  (send nil :cond)
  (begin
    (send nil :foo)))

(Parser also needs the (kwbegin) node type internally, and it is highly problematic to map it back to (begin).)
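
For code that only reads the AST, the advice above can be sketched as a small helper that handles both node types identically. This is a minimal illustration, not part of Parser: `FakeNode` is a hypothetical stand-in for `Parser::AST::Node` (only `type` and `children` are modeled) so the snippet runs without the gem, and `statements_of` is an assumed helper name.

```ruby
# Stand-in for Parser::AST::Node so the sketch runs without the gem
# (hypothetical; only #type and #children are modeled).
FakeNode = Struct.new(:type, :children)

# Node types that represent compound statements.
COMPOUND_TYPES = %i[begin kwbegin].freeze

# Return the statements of a compound node, or the node itself wrapped in an
# array when it is a single expression.
def statements_of(node)
  COMPOUND_TYPES.include?(node.type) ? node.children : [node]
end

foo = FakeNode.new(:send, [nil, :foo])
bar = FakeNode.new(:send, [nil, :bar])

p statements_of(FakeNode.new(:begin, [foo, bar])).size
p statements_of(FakeNode.new(:kwbegin, [foo])).size
p statements_of(foo).size
```

A rewriter should not use such a helper to normalize the tree, since the distinction between the two node types matters once source is regenerated.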

Backwards compatibility

Parser does not use semantic versioning. Parser versions are structured as x.y.z.t, where x.y.z indicates the most recent supported Ruby release (support for every Ruby release that is chronologically earlier is implied), and t is a monotonically increasing number.
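
To make the x.y.z.t scheme concrete, here is a minimal sketch that splits a version string into the Ruby release it tracks and the gem's own counter. The helper name `split_parser_version` is hypothetical and not part of Parser's API, and the version string is only a sample.

```ruby
# Hypothetical helper (not part of Parser): split an x.y.z.t version string
# into the newest supported Ruby release (x.y.z) and the gem counter (t).
def split_parser_version(version)
  parts = version.split('.')
  raise ArgumentError, "expected x.y.z.t, got #{version.inspect}" unless parts.size == 4

  { ruby: parts.first(3).join('.'), counter: Integer(parts.last) }
end

info = split_parser_version('3.2.2.4') # sample string chosen for illustration
puts "tracks Ruby #{info[:ruby]}, counter #{info[:counter]}"
```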

The public API of Parser as well as the AST format (as listed in the documentation) are considered stable forever, although support for old Ruby versions may be removed at some point.

Sometimes it is necessary to modify the format of AST nodes that are already being emitted in a way that would break existing applications. To avoid such breakage, applications must opt-in to these modifications; without explicit opt-in, Parser will continue to emit the old AST node format. The most recent set of opt-ins is specified in the usage section of this README.

Compatibility with Ruby MRI

Unfortunately, Ruby MRI often changes syntax in patchlevel versions. This has happened, at least, for every release since 1.9; for example, commits c5013452 and 04bb9d6b were backported all the way from HEAD to 1.9. Moreover, there is no simple way to track these changes.

This policy makes it all but impossible to make Parser precisely compatible with the Ruby MRI parser. Indeed, as of September 2014, it would be necessary to maintain and update ten different parsers together with their lexer quirks in order to be able to emulate any given released Ruby MRI version.

As a result, Parser chooses a different path: the parser/rubyXY parsers recognize the syntax of the latest minor version of Ruby MRI X.Y at the time of the gem release.

Compatibility with MacRuby and RubyMotion

Parser implements the MacRuby 0.12 and RubyMotion mid-2015 parsers precisely. However, the lexers of these have been forked off Ruby MRI and independently maintained for some time, and because of that, Parser may accept some code that these upstream implementations are unable to parse.

Known issues

Adding support for the following Ruby MRI features in Parser would needlessly complicate it, and as they all are very specific and rarely occurring corner cases, this is not done.

Parser has been extensively tested; in particular, it parses almost the entire RubyGems corpus. For every issue, a breakdown of affected gems is offered.

Void value expressions

Ruby MRI prohibits so-called "void value expressions". For a description of what a void value expression is, see this gist and this Parser issue.

It is unknown whether any gems are affected by this issue.

Syntax check of block exits

Similar to the "void value expression" checks, Ruby MRI also checks for correct usage of break, next and redo. Starting from 3.3.0, Ruby reports a syntax error if one of them is used outside of a {break,next,redo}-able context. The parser gem simply doesn't run this type of check.

It is unknown whether any gems are affected by this issue.

Invalid characters inside comments and literals

Ruby MRI permits arbitrary non-7-bit byte sequences to appear in comments, as well as in string or symbol literals in form of escape sequences, regardless of source encoding. Parser requires all source code, including the expanded escape sequences, to consist of valid byte sequences in the source encoding that are convertible to UTF-8.

As of 2013-07-25, there are about 180 affected gems.
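
A rough way to pre-check a source string against this requirement, using only core `String` methods. The helper name `parser_friendly?` is hypothetical, and the check only approximates the rule described above: it looks at the raw source bytes and does not expand escape sequences the way Parser does.

```ruby
# Hypothetical pre-check (not Parser's implementation): is the source valid in
# its own encoding and convertible to UTF-8?
def parser_friendly?(source)
  return false unless source.valid_encoding?

  source.encode(Encoding::UTF_8)
  true
rescue EncodingError
  # Raised when a byte sequence cannot be converted to UTF-8.
  false
end

puts parser_friendly?("caf\u00E9")                                   # well-formed UTF-8
puts parser_friendly?("caf\xE9".dup.force_encoding(Encoding::UTF_8)) # stray byte tagged as UTF-8
puts parser_friendly?("\xC3".b)                                      # binary string with a high byte
```

Only the first call is expected to pass; the other two show an invalid byte sequence and a binary string that has no UTF-8 conversion.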

\u escape in 1.8 mode

Ruby MRI 1.8 permits a bare \u escape sequence in a string and treats it like u. Ruby MRI 1.9 and later treat \u as the prefix of a Unicode escape sequence and do not allow it to appear bare. Parser follows the 1.9+ behavior.

As of 2013-07-25, affected gems are: activerdf, activerdf_net7, fastreader, gkellog-reddy.

Dollar-dash

(This one is so obscure I couldn't even think of a saner name for this issue.) Pre-2.1 Ruby allows a global variable named $-. Ruby 2.1 and later treat it as a syntax error. Parser follows the 2.1 behavior.

No known code is affected by this issue.

EOF characters after embedded documents before 2.7

Code like "=begin\n""=end\0" is invalid for all versions of Ruby before 2.7. Ruby 2.7 and later parse it normally. Parser follows the 2.7 behavior.

It is unknown whether any gems are affected by this issue.

Contributors

whitequark
Markus Schirp (mbj)
Yorick Peterse (yorickpeterse)
Magnus Holm (judofyr)
Bozhidar Batsov (bbatsov)

Acknowledgements

The lexer testsuite is derived from ruby_parser.

The Bison parser rules are derived from Ruby MRI parse.y.

Contributing

Make sure you have Ragel ~> 6.7 installed
Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request


parser's Issues

Needed: Guide to migrate ripper-dependent libraries to use parser

Libraries that use 'ripper' are CRuby-only. Switching them to use parser would make them compatible with other Rubies. We should put together a guide for how to achieve this migration, and any costs/benefits of doing so.

Incorrect parsing of here documents

When parsing this file:

puts <<D
ABCDEF
D

a parsing error occurs:

$ ruby-parse ex3.rb 
ex3.rb:2:5: error: unexpected token tCONSTANT
ABCDEF
    ^^

A second problem is that when the end-of-text marker does not occur within the string,

puts <<D
ABCEF
D

a const node is produced:

$ ruby-parse ex4.rb 
(begin
  (send nil :puts
    (str "ABCEF\n"))
  (const nil :D))

Parsing class/module definitions yields a nil name

Here's an example:

Parser::Ruby20.parse("class SomeClass < StandardError; end").source_map
=> #<Parser::Source::Map::Definition:0x007f8ae14ae4f8
 @end=#<Source::Range (string) 33...36>,
 @expression=#<Source::Range (string) 0...36>,
 @keyword=#<Source::Range (string) 0...5>,
 @name=nil,
 @operator=#<Source::Range (string) 16...17>>

I'm pretty sure @name should not be nil.

Tested with Parser 1.3.1.

Unknown opcode 0x86 (RuntimeError)

furnace-avm2 -i fram1.abc -d -o ff.abc
/usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/lib/furnace-avm2/abc/primitives/opcode_sequence.rb:206:in `parse': Unknown opcode 0x86 (RuntimeError)
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/lib/furnace-avm2/abc/primitives/opcode_sequence.rb:65:in `opcode_at'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/lib/furnace-avm2/abc/metadata/exception_info.rb:19:in `resolve!'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/lib/furnace-avm2/abc/metadata/method_body_info.rb:20:in `block in after_read'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/lib/furnace-avm2/abc/metadata/method_body_info.rb:19:in `each'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/lib/furnace-avm2/abc/metadata/method_body_info.rb:19:in `after_read'
from (generated-io:Furnace::AVM2::ABC::MethodBodyInfo):54:in `read'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/lib/furnace-avm2/binary/record.rb:314:in `read_nested'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/lib/furnace-avm2/binary/record.rb:340:in `block in read_array'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/lib/furnace-avm2/binary/record.rb:338:in `times'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/lib/furnace-avm2/binary/record.rb:338:in `read_array'
from (generated-io:Furnace::AVM2::ABC::File):64:in `read'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/bin/furnace-avm2:76:in `block in <top (required)>'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/bin/furnace-avm2:74:in `open'
from /usr/lib/ruby/gems/1.9.1/gems/furnace-avm2-1.0.3/bin/furnace-avm2:74:in `<top (required)>'
from /usr/bin/furnace-avm2:23:in `load'
from /usr/bin/furnace-avm2:23:in `

Problems with multiline comments

I'm currently playing with your awesome tool. When I tried to parse a huge code base, I noticed that parser seems to have a problem with multiline comments.

I created the following test case:

def test_multiline
=begin
This is a documentation comment
=end
end

I tried to parse it with the following code:

require 'parser/ruby19'

code = File.read('testcase.rb')
xxxx = Parser::Ruby19.parse(code)
puts xxxx

I'm getting this exception:

.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.0/lib/parser/diagnostic/engine.rb:24:in `process': unexpected token tEQL (Parser::SyntaxError)
    from /Users/mmuench/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.0/lib/parser/base.rb:120:in `on_error'
    from (eval):3:in `_racc_do_parse_c'
    from (eval):3:in `do_parse'
    from /Users/mmuench/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.0/lib/parser/base.rb:66:in `parse'
    from /Users/mmuench/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.0/lib/parser/base.rb:18:in `parse'
    from multicommenttest.rb:4:in `<main>'

Add a way to build trees with a subclass of Parser::AST::Node

Per @yorickpeterse's suggestion.

Keyword is nil in rescue node source map

Here's an example:

Parser::Ruby20.parse('test rescue log').src
=> #<Parser::Source::Map::Condition:0x007f8ae60c98d8
 @begin=nil,
 @else=nil,
 @end=nil,
 @expression=#<Source::Range (string) 0...15>,
 @keyword=nil>

It would be nice if this behaved similarly to regular/modifier if:

Parser::Ruby20.parse('test if log').src
=> #<Parser::Source::Map::Keyword:0x007f8ae60f90b0
 @begin=nil,
 @end=nil,
 @expression=#<Source::Range (string) 0...11>,
 @keyword=#<Source::Range (string) 5...7>>

Having the keyword rescue location at the (rescue ...) node would make it easier to detect whether rescue was used in regular or modifier position (which I need to do, otherwise I would not have bothered you with this). It would also make the generated nodes more consistent with those for if, unless, etc.

Error when parsing Exceptions

I found a bug when I tried to parse the following Ruby code:

begin
  eval "1+1"
rescue SyntaxError, NameError => boom
  print "String doesn't compile: " + boom
rescue::Exception => bang
  print "Error running script: " + bang
end

The problem is the "rescue::Exception => bang" line. While this is horrible code style, it seems to be valid Ruby code (Ruby 1.9 executes it without any problems).

Missing "ambiguous first argument"-warning

See: https://github.com/rtomayko/tilt/blob/master/test/tilt_wikiclothtemplate_test.rb#L22

$ ruby -c -w test/tilt_wikiclothtemplate_test.rb
test/tilt_wikiclothtemplate_test.rb:22: warning: ambiguous first argument; put parentheses or even spaces
test/tilt_wikiclothtemplate_test.rb:27: warning: ambiguous first argument; put parentheses or even spaces
Syntax OK

ruby-parse -w doesn't give any warnings.

Constants as method definition receivers are incorrectly parsed

Parser incorrectly parses method definitions using a constant as a receiver as a method definition on a method call. For example, the following code:

require 'parser'
require 'parser/ruby19'

Parser::Ruby19.parse('def String.example; end')

Results in the following AST:

(defs
  (send nil :String) :example
  (args)
  (nil))

Instead it should be the following:

(defs
  (const nil :String) :example
  (args)
  (nil))

Does not detect duplicate argument names

Does not handle the # coding: magic comment

Add dot/operator attribute to the :send node source map

As discussed here, it would be useful to have a quick way to get the type of the dot operator (. or ::) and its position when dealing with :send nodes.

Does not parse "def foo bar: 1" in 2.1 mode

Better location info for "unicode codepoint too large" diagnostic

While this is an edge case, it should be done for completeness. (And it may actually be very useful.)

Backports gem

We discussed a while back that it might be useful to backport certain functionality from 1.9 and 2.0 to enhance performance on older Rubies and ease the maintenance burden of the Parser codebase (fewer if respond_to? :something branches). There is a backports gem that is pretty modular and seems to be well written and tested. If it's OK with you, we might include it and leverage some useful stuff from it, like bsearch for Ruby 1.9 and 1.8, for instance.

Buffer#line_begin_positions performance

When I started to port RuboCop to Parser I noticed that using the source_map functionality degrades performance significantly. Using ruby-prof I got the following output:

 %self      total      self      wait     child     calls  name
 63.84     12.969    12.969     0.000     0.000     1190   String#each_char
 11.52      3.460     2.340     0.000     1.120     1639   Parser::Lexer#advance
  3.20      0.956     0.651     0.000     0.305    23367   Kernel#loop
  1.25      0.254     0.254     0.000     0.000   306981   Module#===

The following Buffer methods seem to be the problem:

      def line_begin_positions
        # TODO: Optimize this.
        [0] + source.
          each_char.
          with_index.
          select do |char, index|
            char == "\n"
          end.map do |char, index|
            index + 1
          end
      end

      def line_for(position)
        # TODO: Optimize this.
        line_begin_positions.rindex do |line_beg|
          line_beg <= position
        end
      end

Looking at the TODO, I'm pretty sure you're aware of the problem, but I decided to bring it up anyway, since this is a real performance killer.
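
One possible direction, sketched here as a standalone class rather than a patch to Parser's Buffer: compute the line start offsets once, cache them, and binary-search them on each lookup. The class name `LineIndex` is hypothetical, and `Array#bsearch_index` assumes Ruby 2.3 or later, so this would not run on the older Rubies discussed in this thread without a backport.

```ruby
# Hypothetical standalone sketch (not Parser's actual Buffer code): cache the
# offsets at which each line begins, then binary-search them per lookup.
class LineIndex
  def initialize(source)
    @source = source
  end

  # Offsets of the first character of every line, computed only once.
  def line_begin_positions
    @line_begin_positions ||= begin
      positions = [0]
      @source.each_char.with_index do |char, index|
        positions << index + 1 if char == "\n"
      end
      positions
    end
  end

  # Zero-based index of the line containing +position+.
  def line_for(position)
    # Index of the first line that starts after +position+, or nil if none.
    after = line_begin_positions.bsearch_index { |line_beg| line_beg > position }
    (after || line_begin_positions.size) - 1
  end
end

index = LineIndex.new("a = 1\nb = 2\nc = 3\n")
p index.line_begin_positions
p index.line_for(7)
```

With the offsets cached, each `line_for` call costs a binary search over the line table instead of a full scan of the source string.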

AST of the body of a method definition makes no sense

When parsing a method definition with a body that is more than a single line the results are inconsistent.

A method definition with a simple body:

require 'parser'
require 'parser/ruby19'

code = <<-EOF
def example
  return 10
end
EOF

Parser::Ruby19.parse(code)

This produces the following AST:

(def :example
  (args)
  (return
    (int 10)))

However, when using a more complex definition body:

require 'parser'
require 'parser/ruby19'

code = <<-EOF
def example
  # Not really complex but you get the idea.
  10
  20
  30
end
EOF

Parser::Ruby19.parse(code)

We get this instead:

(def :example
  (args)
  (begin
    (int 10)
    (int 20)
    (int 30)))

In both cases Ripper would wrap the body in a (body) node, regardless of the number of lines/nodes in the definition body.

Document the ability to specify a custom node class

See #18, the ability to specify a custom AST node (and how) should be documented either in the source code or in a separate Markdown file.

DOS line endings result in strange token positions

Again, I'm using rubocop to demonstrate a problem that I firmly believe lies within parser. Hope that's OK.

File with CR+LF line endings:


a = {}

Command:

./bin/rubocop --only SpaceAroundBraces /tmp/doseol.rb

Output (with additional printout of tokens that I've added in rubocop):

[[[1, 1], tIDENTIFIER, "a"],
 [[2, 1], tEQL, "="],
 [[2, 3], tLBRACE, "{"],
 [[2, 4], tRCURLY, "}"],
 [[2, 5], tNL, nil]]

It should be:

[[[2, 0], tIDENTIFIER, "a"],
 [[2, 2], tEQL, "="],
 [[2, 4], tLBRACE, "{"],
 [[2, 5], tRCURLY, "}"],
 [[2, 6], tNL, nil]]

String interpolation parsing issue

Here's a strange thing I noticed:

ruby-parse -e '"#{"A"}"'
# => (str "A")

I'm pretty sure that the correct top-level node should be a dstr, otherwise that's kind of confusing. Especially when compared to:

ruby-parse -e '"a#{"A"}"'
# => (dstr (str "a") (str "A"))

I feel that the output from the first example should be (dstr (str "A")).

Needs RubyParser compatibility layer

Virtually every library that depends on parsing Ruby source currently uses ruby_parser. It would be neat to have a compatibility layer for it. The current decoupled parser+builder architecture makes it easy to add one.

Error parsing UTF-8 characters

From rubocop/rubocop#219

Here's the set of tests I ran in RuboCop and their results:

Ruby MRI 1.9.3 - No encoding line: Syntax errors
Ruby MRI 1.9.3 - Encoding line: No errors
Ruby MRI 2.0.0 - No encoding line: Parse error (see stack trace below)
Ruby MRI 2.0.0 - Encoding line: No errors

All cases used the word "düsseldorf" in a single-quoted string as the test and the file was stored as UTF-8.

Test cases 1, 2 and 4 are all expected behavior. Test case 3 appears at first glance to be a bug in Parser.

Stack trace:

"\xC3" from ASCII-8BIT to UTF-8
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/gems/parser-2.0.0.beta3/lib/parser/source/buffer.rb:48:in `encode'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/gems/parser-2.0.0.beta3/lib/parser/source/buffer.rb:48:in `reencode_string'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/gems/parser-2.0.0.beta3/lib/parser/source/buffer.rb:83:in `source='
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/gems/parser-2.0.0.beta3/lib/parser/source/buffer.rb:62:in `block in read'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/gems/parser-2.0.0.beta3/lib/parser/source/buffer.rb:61:in `open'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/gems/parser-2.0.0.beta3/lib/parser/source/buffer.rb:61:in `read'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bundler/gems/rubocop-5472bdf87ab6/lib/rubocop/cli.rb:82:in `block in inspect_file'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bundler/gems/rubocop-5472bdf87ab6/lib/rubocop/cli.rb:233:in `parse'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bundler/gems/rubocop-5472bdf87ab6/lib/rubocop/cli.rb:81:in `inspect_file'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bundler/gems/rubocop-5472bdf87ab6/lib/rubocop/cli.rb:59:in `block in run'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bundler/gems/rubocop-5472bdf87ab6/lib/rubocop/cli.rb:41:in `each'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bundler/gems/rubocop-5472bdf87ab6/lib/rubocop/cli.rb:41:in `run'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bundler/gems/rubocop-5472bdf87ab6/bin/rubocop:14:in `block in <top (required)>'
/Volumes/Data/Users/Lee/.rvm/rubies/ruby-2.0.0-p0/lib/ruby/2.0.0/benchmark.rb:296:in `realtime'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bundler/gems/rubocop-5472bdf87ab6/bin/rubocop:13:in `<top (required)>'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bin/rubocop:23:in `load'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bin/rubocop:23:in `<main>'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bin/ruby_noexec_wrapper:14:in `eval'
/Volumes/Data/Users/Lee/.rvm/gems/ruby-2.0.0-p0@kangaruby/bin/ruby_noexec_wrapper:14:in `<main>'

Confusing node types when dealing with assignments

In certain cases parser returns different node types when assigning class variables and constants compared to when assigning other variables (e.g. locals).

Output when assigning a local variable:

# number = 10
(lvasgn :number (int 10))

Assigning an instance variable:

# @number = 10
(ivasgn :@number (int 10))

And a class variable:

# @@number = 10
(cvdecl :@@number (int 10))

However, when defining a class variable inside a method you get the following instead:

# def foo; @@number = 10; end
(def :foo
  (args)
  (cvasgn :@@number
    (int 10)))

Constants also have their type set to cdecl as opposed to casgn.

Although this may be due to semantics or the way MRI/the spec states that things should be, I find this difference highly confusing.

Especially from a user perspective this becomes annoying. You'll end up having to define multiple callbacks (when iterating over the AST) to deal with both assignment types (even though they do exactly the same in a lot of cases). Another problem is that this will inevitably lead to too much if/else logic to deal with these differences.

String interpolation is broken

require 'parser/ruby19'

Parser::Ruby19.parse('"#{10}"') # => (int 10)

Parsing error for method call near comment

ruby-parse fails when parsing a file with this content:

f a, b #
f

I get

jonas@jonas-laptop:~$ /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-2.0.0.beta2/bin/ruby-parse /tmp/ex1.rb 
/tmp/ex1.rb:2:1: error: unexpected token tIDENTIFIER
f
^

Better whitespace handling in lexer

Currently, whitespace handling in lexer is in most places fragile and may contain errors. It should be refactored.

gemspec not valid on master

$ mkdir test
$ cd test
$ vi Gemfile
$ cat Gemfile

source 'https://rubygems.org'

gem 'parser', :git => 'git://github.com/whitequark/parser.git'

$ bundle -v
Bundler version 1.3.5
$ bundle install
Updating git://github.com/whitequark/parser.git
Fetching gem metadata from https://rubygems.org/.........
Fetching gem metadata from https://rubygems.org/..
Resolving dependencies...
Using ast (1.0.2)
Using slop (3.4.5)
Using parser (1.3.2) from git://github.com/whitequark/parser.git (at master)
parser at /Users/jmettraux/.rvm/gems/ruby-1.9.3-p194/bundler/gems/parser-076a98d148bb did not have a valid gemspec.
This prevents bundler from installing bins or native extensions, but that may not affect its functionality.
The validation message from Rubygems was:
  ["lib/parser/lexer.rb", "lib/parser/ruby18.rb", "lib/parser/ruby19.rb", "lib/parser/ruby20.rb", "lib/parser/ruby21.rb"] are not files

Using bundler (1.3.5)
Your bundle is complete!
Use `bundle show [gemname]` to see where a bundled gem is installed.
$

AST Visitor

I know Parser already has an AST Processor, but I was wondering if it might be sensible to have a simpler method for just walking the AST, yielding interesting nodes to blocks and ignoring certain nodes when necessary. In RuboCop we use this method:

      def on_node(types, node, excludes = [])
        yield node if Array(types).include?(node.type)

        return if Array(excludes).include?(node.type)

        node.children.each do |elem|
          if Parser::AST::Node === elem
            on_node(types, elem, excludes) { |s| yield s }
          end
        end
      end

I guess this is generic enough to be useful to other Parser users. Maybe we could have an AST::Visitor mixin with this as the sole method, or something similar. Since I'm no expert in parsing and walking ASTs by any means, I'd like to hear what you think about the idea.
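
A self-contained variant of the walker above, for experimenting without the gem. `SketchNode` is a hypothetical stand-in for `Parser::AST::Node` that models only `type` and `children`; with the real class, the `is_a?` check would name `Parser::AST::Node` instead.

```ruby
# Stand-in for Parser::AST::Node so the sketch runs without the gem
# (hypothetical; only #type and #children are modeled).
SketchNode = Struct.new(:type, :children)

# Yield every node whose type is in +types+, without descending into nodes
# whose type is in +excludes+.
def on_node(types, node, excludes = [], &block)
  yield node if Array(types).include?(node.type)
  return if Array(excludes).include?(node.type)

  node.children.each do |child|
    # Children may be plain values (symbols, integers, nil); skip those.
    on_node(types, child, excludes, &block) if child.is_a?(SketchNode)
  end
end

# Mirrors (send (int 2) :+ (int 2)) from the usage section.
tree = SketchNode.new(:send, [SketchNode.new(:int, [2]), :+, SketchNode.new(:int, [2])])
on_node(:int, tree) { |n| p n.children.first }
```

Passing the block along with `&block` avoids creating a new wrapper block at every level of recursion.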

Needs a way to get a parser for currently running Ruby

Accepts multiple assignment in a conditional context

Demonstrated by:

$ ruby -e 'if (a, b = foo); end'
-e:1: multiple assignment in conditional
$ ruby -Ilib -rparser/ruby19 -e 'p Parser::Ruby19.parse("if (a, b = foo); end")'
(if
  (masgn
    (mlhs
      (lvasgn :a)
      (lvasgn :b))
    (send nil :foo))
  (nil) nil)

ASTs do not match with `align_eq.rb` example

Thanks for parser. Trying align_eq.rb from the blog post [1], I get the following problem.

$ #parser 1.4 from `gem install parser`
$ more align_eq.rb test.prawn
::::::::::::::
align_eq.rb
::::::::::::::
class AlignEq < Parser::Rewriter
  def on_begin(node)
    eq_nodes = []

    node.children.each do |child_node|
      if assignment?(child_node)
        eq_nodes << child_node
      elsif eq_nodes.any?
        align(eq_nodes)
        eq_nodes = []
      end
    end

    align(eq_nodes)

    super
  end

  def align(eq_nodes)
    aligned_column = eq_nodes.
      map { |node| node.src.operator.column }.
      max

    eq_nodes.each do |node|
      if (column = node.src.operator.column) < aligned_column
        insert_before node.src.operator, ' ' * (aligned_column - column)
      end
    end
  end
end
::::::::::::::
test.prawn
::::::::::::::
require 'barby'

font_size = 12
font_size_x = 16
$ ruby-rewrite -l align_eq.rb test.prawn
ASTs do not match:
--- test.prawn
+++ test.prawn|after AlignEq
@@ -1,7 +1,8 @@
 (begin
   (send nil :require
     (str "barby"))
-  (lvasgn :font_size
-    (int 12))
+  (send nil :font_siz
+    (lvasgn :e
+      (int 12)))
   (lvasgn :font_size_x
     (int 16))

What am I missing?

[1] http://whitequark.org/blog/2013/04/26/lets-play-with-ruby-code/

Does not detect if both an actual block and block-pass are specified

Error when processing constant assignment

I have an error which I can reproduce via rubocop, but the error occurs in parser (or ast), so I report it here. I've added printing of the AST in rubocop.

/tmp/ex2.rb:

A, B = f

Error report:

jonas@jonas-laptop:~/dev4/rubocop$ ./bin/rubocop --only PercentR -d /tmp/ex2.rb 
Scanning /tmp/ex2.rb
(masgn
  (mlhs
    (casgn nil :A)
    (casgn nil :B))
  (send nil :f))
An error occurred while PercentR cop was inspecting /tmp/ex2.rb.
undefined method `to_ast' for nil:NilClass
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/ast-1.0.2/lib/ast/processor.rb:232:in `process'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-2.0.0.beta2/lib/parser/ast/processor.rb:78:in `on_casgn'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/ast-1.0.2/lib/ast/processor.rb:237:in `process'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/ast-1.0.2/lib/ast/processor.rb:253:in `block in process_all'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/ast-1.0.2/lib/ast/processor.rb:252:in `map'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/ast-1.0.2/lib/ast/processor.rb:252:in `process_all'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-2.0.0.beta2/lib/parser/ast/processor.rb:6:in `process_regular_node'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/ast-1.0.2/lib/ast/processor.rb:237:in `process'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/ast-1.0.2/lib/ast/processor.rb:253:in `block in process_all'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/ast-1.0.2/lib/ast/processor.rb:252:in `map'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/ast-1.0.2/lib/ast/processor.rb:252:in `process_all'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-2.0.0.beta2/lib/parser/ast/processor.rb:6:in `process_regular_node'
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/ast-1.0.2/lib/ast/processor.rb:237:in `process'
/home/jonas/dev4/rubocop/lib/rubocop/cop/cop.rb:51:in `inspect'
/home/jonas/dev4/rubocop/lib/rubocop/cli.rb:100:in `block in inspect_file'
/home/jonas/dev4/rubocop/lib/rubocop/cli.rb:92:in `each'
/home/jonas/dev4/rubocop/lib/rubocop/cli.rb:92:in `inspect_file'
/home/jonas/dev4/rubocop/lib/rubocop/cli.rb:59:in `block in run'
/home/jonas/dev4/rubocop/lib/rubocop/cli.rb:41:in `each'
/home/jonas/dev4/rubocop/lib/rubocop/cli.rb:41:in `run'
./bin/rubocop:14:in `block in <main>'
/home/jonas/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/benchmark.rb:295:in `realtime'
./bin/rubocop:13:in `<main>'

1 file inspected, no offences detected

I used the --only flag just to shorten the output.

2.0 release schedule

It seems that all the issues I want to have solved in 2.0 (#31, #36, #46) are coming along nicely, and both the API and internals are being ironed out. So, I'd like to explain the process for the 2.0 release.

As soon as #31 is solved, I release 2.0.0.beta1, so you could play with the new features and port your code to the changed API.
- Proposals which add API incompatibility are accepted.
- Public API is being documented.
At 2013-07-01, I release 2.0.0.pre1. API changes are discouraged.
- At the same time, 1.5.0 is released, which deprecates the old APIs (see below). It is possible to write code compatible with both 1.5.0 and 2.0.0.
- I'm still paying attention to the API consumers and tweaking interfaces if necessary.
- Public API documentation is being finished.
At 2013-08-01, I release 2.0.0. Hopefully this will be the last major version of Parser.

The main goal is to have comprehensive documentation of the public API in the YARD format, which will, in particular, make it possible to strictly follow Semantic Versioning requirements.

If you have something to say, please do. Help with documentation is gladly accepted.

Incompatibilities in 2.0 (#40, #26, #31, #70, #71):

Method renamed: Source::Range#to_source → Source::Range#source.
Method renamed: AST::Node#source_map → AST::Node#location.
Method renamed: AST::Node#src → AST::Node#loc.
Node removed: (cvdecl). (cvasgn) is emitted instead.
Node renamed: (cdecl) → (casgn).
"Synthesized" (nil) nodes are removed. Plain nil (primitive) or empty (begin) nodes are emitted instead, depending on presence of delimiters (parens, begin/end).
Node split: (begin) → (kwbegin). For keyword begin (begin..end), a (kwbegin) node is emitted. Unlike (begin), (kwbegin) nodes may have only a single child.
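A minimal sketch of what "code compatible with both 1.5.0 and 2.0.0" could look like for one of the renamed methods. OldRange, NewRange and range_source are hypothetical names made up for this illustration; they only stand in for Source::Range as it looks before and after the rename, and the parser gem itself is not loaded.

```ruby
# Hypothetical stand-ins for Source::Range before and after the rename.
# The real class lives in the parser gem and is not loaded here.
OldRange = Struct.new(:text) do
  def to_source   # pre-2.0 method name
    text
  end
end

NewRange = Struct.new(:text) do
  def source      # 2.0 method name
    text
  end
end

# Hypothetical helper: prefer the 2.0 name and fall back to the old one.
def range_source(range)
  if range.respond_to?(:source)
    range.source
  else
    range.to_source
  end
end

puts range_source(OldRange.new("2 + 2"))
puts range_source(NewRange.new("2 + 2"))
```

The same respond_to? check would presumably work for the other renames in the list (source_map/location, src/loc), though that is not verified here.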

RDocs online anywhere

I did some searching for the Parser Rdocs on the internet, and couldn't seem to find them anywhere. I ended up browsing them locally, so I know they exist.

Is there something I am missing?

Error when the word encoding appears in a string

I get this error in a file where the first line is require 'cane/encoding_aware_iterator'.

$ /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/bin/ruby-parse /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/cane-2.5.2/lib/cane/file.rb
/home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/source/buffer.rb:29:in `find': unknown encoding name - _aware_iterator (ArgumentError)
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/source/buffer.rb:29:in `recognize_encoding'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/source/buffer.rb:40:in `reencode_string'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/source/buffer.rb:83:in `source='
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/source/buffer.rb:62:in `block in read'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/source/buffer.rb:61:in `open'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/source/buffer.rb:61:in `read'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/runner.rb:154:in `block in process_files'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/runner.rb:152:in `each'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/runner.rb:152:in `process_files'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/runner.rb:134:in `block in process_all_input'
    from /home/jonas/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/benchmark.rb:280:in `measure'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/runner.rb:132:in `process_all_input'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/runner/ruby_parse.rb:115:in `process_all_input'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/runner.rb:32:in `execute'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/lib/parser/runner.rb:12:in `go'
    from /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/bin/ruby-parse:6:in `<main>'
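The symptom above (an encoding named _aware_iterator) is what one would expect if the word "coding" is matched anywhere on the line instead of only inside a magic comment. The sketch below illustrates the difference. Both patterns are assumptions written for this example, not the expressions used in lib/parser/source/buffer.rb.

```ruby
# A hypothetical loose pattern: it matches "coding" anywhere on the line,
# which reproduces the bogus name from the report.
LOOSE = /coding[:=]?\s*([\w-]+)/

# A stricter sketch: the line must be a comment, and the name must follow
# "coding:" or "coding=".
STRICT = /\A\s*#.*coding\s*[:=]\s*([\w.-]+)/

# Hypothetical helper: return the captured encoding name, or nil.
def declared_encoding(line, pattern)
  match = pattern.match(line)
  match && match[1]
end

line = "require 'cane/encoding_aware_iterator'"
p declared_encoding(line, LOOSE)
p declared_encoding(line, STRICT)
p declared_encoding("# -*- coding: utf-8 -*-", STRICT)
```

With the stricter pattern the require line yields no encoding at all, so nothing is passed on to Encoding.find.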

List expected tokens in syntax error messages

The unexpected_token diagnostic should suggest several expected variants. While this functionality is not exported by Racc, it should be trivial to extract from the tables.

unparseable input produces (nil)

Whereas ruby_parser simply passes the Racc::ParseError, parser says it's a nil node.

$ ruby -e "asdf:123"
-e:1: syntax error, unexpected tINTEGER, expecting tSTRING_CONTENT or
tSTRING_DBEG or tSTRING_DVAR or tSTRING_END
$ bundle exec ruby-parse -e "asdf:123"
(nil)
$ bundle exec ruby-parse -e "nil"
(nil)

I'd love to be able to differentiate easily what parses and what doesn't.

[1.4.1] lib/parser/lexer.rb:3139:in `source_buffer=': undefined method `source'

Trying the following example code

$ more ruby_parse.rb
require 'parser/current'

class PrawnProcessor < Parser::AST::Processor
  def on_send(node)
    p "Nice, a send node\n"
  end

  def on_if(node)
    p "Nice, an if node\n"
  end
end

def process(buffer)
  parser = Parser::CurrentRuby.new
  p buffer
  ast = parser.parse(buffer)
  puts ast

  PrawnProcessor.new.process(ast)
  puts "done"
end

filename = ARGV[0]
process(File.read(filename))
$ more test3.rb
2 + 2
$ ruby ruby_parse.rb test3.rb
"2 + 2\n"
/home/joey/.rvm/gems/ruby-1.9.3-p429@sozialdienst2/gems/parser-1.4.1/lib/parser/lexer.rb:3139:in `source_buffer=': undefined method `source' for "2 + 2\n":String (NoMethodError)
    from /home/paul/.rvm/gems/ruby-1.9.3-p429@sozialdienst2/gems/parser-1.4.1/lib/parser/base.rb:64:in `parse'
    from ruby_parse.rb:16:in `process'
    from ruby_parse.rb:24:in `<main>'

Unfortunately I am not able to test if this is a regression, as

gem install parser -V 1.4.0

did not work for some reason.

Lexing support

This issue is closely related to #31 and, incidentally, #36.

What we want is to be able to extract comments from source code, and to be able to get the token stream from it. As it is not possible to tokenize Ruby source without parsing it, the API is bound to Parser::Base.

Suggested API changes/modifications:

Parser::Base#parse(source_buffer). This is a high-level API.
Returns a 2-tuple of [ ast, comments ]. comments is an array of Parser::Source::Comment. Each Comment provides the following:
- #text. Text of the comment, including leading #/=begin.
- #type returning :inline or :document, #inline? and #document?.
- #location, aliased to #loc. Parser::Source::Range describing the source mapping for the comment.
Parser::Base#tokenize(source_buffer). This is a low-level API.
Returns a 3-tuple of [ ast, comments, tokens ]. ast and comments are described above. tokens is an Array of tuples of the following form: [ type, [ value, location ] ].
- type is a Symbol reflecting the type of the token. Comments are present in the stream at appropriate places and have the type :tCOMMENT.
- value is a token-specific token value.
- location is a Parser::Source::Range describing the source mapping for the token.
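To make the proposed token shape concrete, here is a small sketch that consumes a stream of [ type, [ value, location ] ] tuples and pulls the comments out of it. The stream is written by hand, FakeRange is a made-up stand-in for Parser::Source::Range, and comment_tokens is a hypothetical helper; nothing here calls the actual lexer.

```ruby
# Stand-in for Parser::Source::Range, which is not loaded here (assumption).
FakeRange = Struct.new(:begin_pos, :end_pos)

# A hand-written token stream in the proposed shape:
# [ type, [ value, location ] ].
tokens = [
  [:tCOMMENT,    ["# answer", FakeRange.new(0, 8)]],
  [:tIDENTIFIER, ["x",        FakeRange.new(9, 10)]],
  [:tEQL,        ["=",        FakeRange.new(11, 12)]],
  [:tINTEGER,    [42,         FakeRange.new(13, 15)]]
]

# Hypothetical helper: keep only :tCOMMENT tokens and flatten each one
# into [ text, begin_pos, end_pos ].
def comment_tokens(tokens)
  tokens.select { |type, _| type == :tCOMMENT }
        .map { |_, (value, location)| [value, location.begin_pos, location.end_pos] }
end

p comment_tokens(tokens)
```

Because comments sit in the stream at their source positions, a consumer that wants only code tokens can drop :tCOMMENT entries with the inverse filter.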

Better location info for "embedded document meets end of file" diagnostic

Currently, this diagnostic just points at EOF. Make it point at the opening =begin.

Needs proper source map construction

The infrastructure is in place; however, we need a more convenient way to compose source maps and add the composition to all the nodes.

Line Numbers in AST

I would currently like to use this to parse some ruby, but I need the ability to associate particular parts of the parse tree with specific line numbers in the source files. Is this currently possible? If not, is it feasible to add? I'm potentially willing to write the patch myself, but I have no idea where to start.

Thanks,
Evan

Empty class/module defs break the AST::Processor

Processing this code class Test; end results in:

NoMethodError:
       undefined method `to_ast' for nil:NilClass

in the on_class method in AST::Processor.

I guess some nil check is missing in AST::Processor.
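The kind of nil check meant here can be sketched as follows. Node and TinyProcessor are minimal stand-ins invented for this example, not the classes from the ast gem, and the (class name superclass body) layout is an assumption about how an empty class definition is represented.

```ruby
# Minimal stand-ins for AST::Node and AST::Processor (assumptions for this
# sketch; the real classes come from the ast gem).
Node = Struct.new(:type, :children)

class TinyProcessor
  attr_reader :visited

  def initialize
    @visited = []
  end

  # Guard: a missing child (nil) is returned as-is instead of being dispatched.
  def process(node)
    return nil if node.nil?

    @visited << node.type
    handler = :"on_#{node.type}"
    respond_to?(handler) ? send(handler, node) : node
  end

  # Assumed layout: (class name superclass body), where superclass and body
  # may both be nil.
  def on_class(node)
    name, superclass, body = node.children
    Node.new(:class, [process(name), process(superclass), process(body)])
  end
end

# class Test; end: the name is a (const) node, superclass and body are nil.
empty_class = Node.new(:class, [Node.new(:const, [nil, :Test]), nil, nil])
processor = TinyProcessor.new
result = processor.process(empty_class)
p processor.visited
p result.children[1..2]
```

Putting the guard in process rather than in every on_* handler means each handler can pass optional children through without checking them individually.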

Parsing error when calling a capitalized method without parentheses

I get

$ /home/jonas/.rvm/gems/ruby-1.9.3-p194/gems/parser-1.3.3/bin/ruby-parse /tmp/example.rb 
/tmp/example.rb:5:9: error: unexpected token tCARET
Output /^I/
        ^

in this file

def Output(arg)
  p arg
end

Output /^I/

Only the last line is necessary to demonstrate the problem. I've added the method definition to establish that this is a valid ruby program. It prints /^I/ when I run it.

Here-doc problem on Windows

There's an issue for rubocop where you'll find a description of the problem. It can be reproduced with CR+LF line endings, while LF line endings work fine.

Improve efficiency of encoding detection in Source::Buffer

Currently, in order to extract the first two lines, the entire file is split into an array.

The proper way is probably to use a regular expression to find whatever's between the first two \n's.
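A rough sketch of the regular-expression approach, assuming the goal is simply "at most the first two lines, without touching the rest of the file". first_two_lines is a hypothetical helper name, not an existing method of Source::Buffer.

```ruby
# Hypothetical helper: return at most the first two lines of +source+
# without splitting the whole string into an array of lines.
def first_two_lines(source)
  # \A anchors at the start; the optional group takes one newline plus
  # the line that follows it, so scanning stops at the second newline.
  source[/\A[^\n]*(?:\n[^\n]*)?/]
end

source = "#!/usr/bin/env ruby\n# coding: utf-8\n" + ("puts 1\n" * 3)
p first_two_lines(source)
p first_two_lines("puts 1")
```

The sketch assumes the string is validly encoded (or tagged as binary); matching a regular expression against a string with invalid byte sequences would raise before any encoding comment could be read.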

Extracting comments from source code

Not really an issue, just a question I wasn't sure where to ask.

rubocop has some comment style checks that need access to the tokens generated by the lexer, since comments for obvious reasons are not part of the parser AST. Basically I need some equivalent of Ripper.lex. I noticed the Lexer class in Parser's source code and its use to generate the output of ruby-parse -E, but I'm not quite certain how I can interact with it to simply get a list of tokens with their text and locations.

How to handle 1.9/2.0 files in non-US-ASCII encoding when running on 1.8.7?

I see two options here:

Forbid loading parsers for >1.8 on 1.8.
Check encoding for the files being loaded, and bail out if it's not US-ASCII/ASCII-8BIT.
