
swiftstudies / oysterkit


OysterKit is a framework that provides native Swift scanning, lexical analysis, and parsing capabilities. In addition, it provides a language called STLR that can be used to rapidly define the rules used by OysterKit.

License: BSD 2-Clause "Simplified" License

Swift 99.95% Shell 0.05%
decoder language lexical-analysis parsing-capabilities swift

oysterkit's People

Contributors

eimantas, joemcbride, kigi, maxdesiatov, pietbrauer, swiftstudies


oysterkit's Issues

Support parsing a plain byte stream

Based on the examples and the definition of the parse() function, it looks like it's only possible to parse strings, not the more generic Data. I think it would be a nice feature to be able to run the parser over generic Data, as this would allow parsing more complex grammars where some rules define a blob array or something like that.

Finalize changes to STLR language

Remove some of the old transient tokens that are no longer needed, and make the new ones that can be used for error propagation and handling standard instead of custom.

Bug? Pinned tokens could be adopted by their parents

When a successful match results in no node, or in a transient node, for a HomogenousAST, the children are hoisted to the parent. Is this behaviour correct?

Investigation should establish:

  • Whether it should be at least a warning when a pinned node is also transient, or when the constructor creates no node for it (perhaps a side effect of the complexity of this approach?)
  • Whether, if there are pinned children, we should preserve the structure (I don't think so, unless the parent is also pinned)

Nested expression matching

One awesome thing would be the ability to nest pattern matches/calls within your grammar, as in a PEG, ANTLR, or even Flex/Bison:

Example:

statement = [a-zA-Z0-9]+
statements = statement + statements

It might already do this; I just didn't see anything in the docs about it.

ps. awesome work on regex support!

Suggestion to rename .whitespaces to .whitespace or .whitespaceOrTab or to change their behaviour

When reviewing #47 I realised that the .whitespaces naming doesn't play well with .whitespaces*, .whitespaces+ or .whitespaces?. It implies that many whitespace characters may be matched, while only one is matched.

It could improve the user experience of the library to either rename it or make it match multiple characters.
Possible names: .whitespace, .whitespaceChar, .whitespaceCharacter, .whitespaceOrTab.

The same could be applied to .newlines, which I think matches only one newline character.

Overall, looking at character sets, singular/plural naming is inconsistent at first glance. I understand that the plural refers to the number of characters in the character set, but users of STLR might not read it this way (as happened to me). I think that communicating how many characters will be matched is more important than how many characters are in a set.

OysterKit.swift missing

I converted the code to use Swift 1.2, and also fixed the absolute path in the project to get things compiling.

One thing I'm not sure I can work around though, the iOS version of the OysterKit framework includes the file "OysterKit.swift", which is not in the project anywhere. I note that the Mac version of the framework has OKStandard.swift - is that a new name for the same file?

Bork example errors with preposition

I pulled the Bork repo, as well as my own, and when I run the example "ATTACK SNAKE WITH SWORD" I'm getting the following error:

keyNotFound(CodingKeys(stringValue: "noun", intValue: nil), Swift.DecodingError.Context(codingPath: [CodingKeys(stringValue: "secondSubject", intValue: nil)], debugDescription: "No value associated with key CodingKeys(stringValue: \"noun\", intValue: nil) (\"noun\").", underlyingError: nil))

// Lexer

@pin verb         = "INVENTORY" | "GO" | "PICKUP" | "DROP" | "ATTACK"
@pin noun         = "NORTH" | "SOUTH" | "KITTEN" | "SNAKE" | "CLUB" | "SWORD"
@pin adjective    = "FLUFFY" | "ANGRY" | "DEAD"
@pin preposition  = "WITH" | "USING"

// Commands
subject = (adjective .whitespace)? noun
command = verb (.whitespace subject (.whitespace preposition .whitespace @token("secondSubject") subject)? )?

Just like your tutorial.

Thanks!
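
For readers unfamiliar with the error above: keyNotFound is Swift's standard DecodingError case, raised when a Decodable type requires a key that the decoder cannot supply. The sketch below reproduces the same kind of error with JSONDecoder only; the Subject type and the missingKey helper are hypothetical stand-ins, not the types generated for Bork, and the sketch does not diagnose why the "noun" key is absent under "secondSubject" in the report.

```swift
import Foundation

// Hypothetical stand-in for a decoded subject; "noun" is required (non-optional).
struct Subject: Decodable {
    let noun: String
}

// Returns the name of the missing key if decoding fails with keyNotFound,
// or nil if decoding succeeds or fails for another reason.
func missingKey(in json: String) -> String? {
    do {
        _ = try JSONDecoder().decode(Subject.self, from: Data(json.utf8))
        return nil
    } catch DecodingError.keyNotFound(let key, _) {
        return key.stringValue
    } catch {
        return nil
    }
}

// The input has no "noun" key, so decoding fails with keyNotFound.
print(missingKey(in: "{\"adjective\": \"ANGRY\"}") ?? "decoded without a missing key")
```

Declaring the property as optional (let noun: String?) would turn the missing key into nil instead of an error, which is one way to tell a grammar problem from a model problem.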

STLR: using singular term for predefined characters

Use singular for predefined sets (.decimalDigit, .letter vs. .decimalDigits, .letters)

ICU specifies character categories in singular terms (\p{Decimal}, \p{Letter}) with the quantifier being separate (+, *, ?, {n,m})

STLR rules also use modifiers (?, *, +) to specify quantity, which seems more in line with a regular-expression style of declaration. A plural term in the definition seems to imply multiple characters of the given category, when it matches only one of N. Consider the STLR rule:

number = .decimalDigits

One could infer that it matches the number "123", when in fact it only matches the "1".

Examples of the singular w/ modifiers:

digits = .decimalDigit+ // one or more of a decimal digit
ows = .whitespace? // optional whitespace char
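
The distinction can be sketched in plain Swift, independent of OysterKit. The helpers matchOne and matchOneOrMore are hypothetical names used only for illustration: the character-set test consumes exactly one character, and the quantifier decides how many times it is applied.

```swift
// Hypothetical helpers, not OysterKit API.

// Matches exactly one character from the set, like a bare character-set term.
func matchOne(_ input: Substring, where isMember: (Character) -> Bool) -> Substring? {
    guard let first = input.first, isMember(first) else { return nil }
    return input.prefix(1)
}

// Matches one or more characters from the set, like the same term followed by "+".
func matchOneOrMore(_ input: Substring, where isMember: (Character) -> Bool) -> Substring? {
    let matched = input.prefix(while: isMember)
    return matched.isEmpty ? nil : matched
}

// ASCII digits only, to keep the sketch simple.
let isDecimalDigit: (Character) -> Bool = { ("0"..."9").contains($0) }

let source = "123"
print(matchOne(source[...], where: isDecimalDigit) ?? "no match")        // 1
print(matchOneOrMore(source[...], where: isDecimalDigit) ?? "no match")  // 123
```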

Rework error handling

The new stack and rule system enables far simpler error handling, but the code is still littered with old special cases that are no longer required.

Before the release of v1 this should be cleaned up to use simple hierarchical errors, with all error handling removed from IRs and parsing strategies.

You must declare the name of the grammar before any other declarations (e.g. grammar <your-grammar-name>) from 47 to 47

I'm trying to build an stlr file for swift comments (taken from "The Swift Programming Language" book).

This is my grammar file: swift.stlr

grammar SwiftComments

whitespace = whitespace-item whitespace?

whitespaceItem = lineBreak | comment | multiline-comment |
				  "\u0000" | "\u0009" | "\u000B" | "\u000C" | "\u0020"

lineBreak = "\u000A" | "\u000D" | "\u000D\u000A"

comment = "//" commentText lineBreak
multilineComment = "/*" multilineCommentText "*/"

commentText = commentTextItem commentText?
commentTextItem = /[^\r\n]/

multilineCommentText = multitlineCommentTextItem multitlineCommentText?
multilineCommentTextItem = (>> !"/*" | !"*/") (multilineComment | commentTextItem)

When I run stlrc -g swift.stlr I get the error in issue title. Any pointers to where I got this wrong?

Bork tutorial broken

Hey there,

I am not sure if this is still maintained, but I tried to follow the steps in the Bork tutorial and it seems to be broken.

  1. When defining the grammar, stlrc fails because the name of the grammar is not defined (the declaration is missing from the tutorial)
  2. When running stlrc with the defined Bork grammar, no output is produced (macOS 10.15.4, Swift 5.2.2)
  3. After editing the main.swift file to include the user input, compilation fails because the generated Bork.swift module does not include the parse method.

I hope this is still maintained! Thank you :)

Lost identifier annotations in dynamic language generation

In the dynamic language extension of the STLRIntermediateRepresentation.GrammarRule

fileprivate func rule(from grammar:STLRIntermediateRepresentation, 
                                  inContext context:GenerationContext, 
                                  creating token:Token? = nil, 
                                  annotations: RuleAnnotations)->Rule?

It's a bit of a mess really. There are two code paths: one for LHR recursive rules and one for rules that aren't. In both cases, if the expression for the grammar rule is just a single element that does generate a token, the annotations are stripped from the new identifier (they either propagate into the expression or are lost).

I think the right solution is to make an identifier declaration rule different to others in that it is a rule with the token (the identifier) and the annotations on that declared identifier, but the MATCHER is taken from the expression's rule. You would have to be careful you don't end up with double tokens. If the token is transient then there isn't a problem.

GENERATED EXPRESSION'S RULE HAS A NON-TRANSIENT TOKEN OR ITS OWN ANNOTATIONS

WrappingRule(identifier, identifier annotations).matcher =
identifier(annotations).matcher = sequence(creating expression's token, annotated with the expression's annotations) of a single rule using the expression's matcher but stripped of its annotations and transient

GENERATED EXPRESSION'S RULE HAS A NON-TRANSIENT TOKEN

WrappingRule(identifier, identifier annotations).matcher = Expression's matcher

Cache compiled regular expressions

Where a regular expression is used to represent a terminal, at this stage those regular expressions are recompiled each time the rule is referenced in the generated Swift code.

Those regular expressions should be identified and lazily compiled, but only once, rather than being recompiled on each reference.
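
A minimal sketch of the "compile lazily, but only once" idea, using Foundation's NSRegularExpression. RegexCache and its compileCount counter are hypothetical names for illustration; this is not how OysterKit's generated code is structured, and the cache is not thread-safe as written.

```swift
import Foundation

// Hypothetical cache, not part of OysterKit: each pattern is compiled on first
// use and the compiled expression is reused on every later reference.
final class RegexCache {
    private var compiled: [String: NSRegularExpression] = [:]
    // Counts real compilations so the effect of caching can be observed.
    private(set) var compileCount = 0

    func regex(for pattern: String) throws -> NSRegularExpression {
        if let cached = compiled[pattern] {
            return cached
        }
        let regex = try NSRegularExpression(pattern: pattern, options: [])
        compiled[pattern] = regex
        compileCount += 1
        return regex
    }
}

let cache = RegexCache()
do {
    // Two references to the same terminal pattern trigger a single compilation.
    _ = try cache.regex(for: "[^\\r\\n]")
    _ = try cache.regex(for: "[^\\r\\n]")
    print("compilations:", cache.compileCount)
} catch {
    print("invalid pattern:", error)
}
```

In generated code the same effect could come from a static stored property per pattern, since Swift initialises static properties lazily and only once.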

Refactor token streaming

At the moment there is little difference in implementation (and therefore potentially expected memory/performance profiles as well as behavioural characteristics) for streams, homogenous and heterogenous ASTs.

These areas could be refactored to improve:

  • Ease of consumption
  • Delivery of the expected benefits (laziness and therefore low memory consumption)

In order to do this the following should be provided:

  • Streaming should identify "trigger" tokens that will be forwarded, but no others. Consideration will have to be given to range management on the node stack (or a new equivalent) but nodes should not be created
  • There is an attempt to leverage commonality in behaviour between homogenous and heterogenous AST generation. This should be preserved but without the complexity of having to provide an IR and a constructor and still being left in a situation where a lot of casting has to be done

Using OysterKit for syntax highlight

I'm pretty new to language creation with tools like STLR. I was wondering whether OysterKit and STLR would be of any help in creating a simple editor with rudimentary syntax highlighting (by recognizing language node types and providing location information in the source string).

Scanner branch optimisation

Provide a terminal tree scanner rule that optimises branched terminal searches. This may need to be done as an optimisation.
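
One possible reading of a "terminal tree" is a trie over the terminal strings of a choice, so that input is scanned once instead of once per alternative. The sketch below assumes that reading; TerminalTrie and longestMatch are hypothetical names, not OysterKit API.

```swift
// Hypothetical sketch: a trie over the terminals of a choice such as
// "INVENTORY" | "GO" | "PICKUP" | "DROP" | "ATTACK".
final class TerminalTrie {
    private final class Node {
        var children: [Character: Node] = [:]
        var isTerminal = false
    }

    private let root = Node()

    init(_ terminals: [String]) {
        for terminal in terminals {
            var node = root
            for character in terminal {
                if let next = node.children[character] {
                    node = next
                } else {
                    let next = Node()
                    node.children[character] = next
                    node = next
                }
            }
            node.isTerminal = true
        }
    }

    // Returns the longest terminal that is a prefix of `input`, or nil if none is.
    func longestMatch(in input: Substring) -> Substring? {
        var node = root
        var end: Substring.Index? = nil
        var index = input.startIndex
        while index < input.endIndex, let next = node.children[input[index]] {
            node = next
            index = input.index(after: index)
            if node.isTerminal { end = index }
        }
        return end.map { input[input.startIndex..<$0] }
    }
}

let verbs = TerminalTrie(["INVENTORY", "GO", "PICKUP", "DROP", "ATTACK"])
print(verbs.longestMatch(in: "ATTACK SNAKE"[...]) ?? "no match")
```

Note that longest-match is a design choice: an ordered choice would instead return the first listed alternative that matches, so the two can differ when one terminal is a prefix of another.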

STLR - Sub transient evaluation optimiser

When a transient token is encountered all children are disposed of. An optimiser could be created to ensure that those children are not created in the first place so that no performance penalty is incurred.

An optimiser could mark all children as void or transient (investigate as this will impact the preservation of ranges).

Local absolute path in project file

I don't think absolute paths are a good idea. Found this in the project file, trying to guess how to use OysterKit.

/Users/nhughes/Documents/Code/XCode/GitHub/OysterKit/Mac/OysterKit/../../Common/Framework;

Bork tutorial example always fails with obscure error

Hi, thanks for the great library!

When I go through the tutorial, I copy the example grammar exactly as given:

//
// A grammar for the Bork text-adventure
//


// Vocabulary
//
@pin verb        = "INVENTORY" | "GO" | "PICKUP" | "DROP" | "ATTACK"
@pin noun        = "NORTH" | "SOUTH" | "KITTEN" | "SNAKE" | "CLUB" | "SWORD"
@pin adjective   = "FLUFFY" | "ANGRY" | "DEAD"
@pin preposition = "WITH" | "USING"

// Commands
//
subject     = (adjective .whitespaces)? noun
command     = verb (.whitespaces subject (.whitespaces preposition .whitespaces subject)? )?

to a file Bork.stlr

After I test this grammar with swift run stlrc -g Bork.stlr I always get this error with any of the tutorial test strings:

Parsing failed: 
constructionFailed([])

The error doesn't give any info on what exactly failed. OysterKit code is from master branch.

Improve syntactic sugar for @transient and @void

At the moment the "-" suffix signifies consume (which is poorly defined). The following modifiers should be defined:

"-" the generated token should be void
"~" the generated token should be transient

Streams are not lazy enough

Streams currently use the AST node constructor, which maintains a hierarchy of nodes. In reality, streams should simply pass on matched tokens as they are encountered.
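
As a rough illustration of "pass matched tokens as they are encountered", the sketch below models a stream as a Swift Sequence whose iterator scans one token per call to next() and keeps no node hierarchy. Token and TokenStream are hypothetical types with a toy two-rule lexer (runs of ASCII digits, and runs of anything else up to a space or digit); they are not OysterKit's types.

```swift
// Hypothetical types, not OysterKit API.
struct Token: Equatable {
    let name: String
    let text: Substring
}

// A lazy stream: nothing is scanned until the consumer asks for the next token,
// and only the unread remainder of the source is retained.
struct TokenStream: Sequence, IteratorProtocol {
    private var remaining: Substring

    init(_ source: String) {
        remaining = source[...]
    }

    mutating func next() -> Token? {
        // Skip separating spaces without producing tokens for them.
        remaining = remaining.drop(while: { $0 == " " })
        guard let first = remaining.first else { return nil }

        let isDigit: (Character) -> Bool = { ("0"..."9").contains($0) }
        let text = isDigit(first)
            ? remaining.prefix(while: isDigit)
            : remaining.prefix(while: { $0 != " " && !isDigit($0) })
        remaining = remaining.dropFirst(text.count)
        return Token(name: isDigit(first) ? "number" : "word", text: text)
    }
}

for token in TokenStream("GO 12 NORTH") {
    print(token.name, token.text)
}
```

Because the stream is a value-type iterator, a consumer that stops early (for example after the first token) never pays for scanning the rest of the input.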
