Rosie Pattern Language (RPL)

RPL is a variant of modern Regular Expressions (regex) that is designed to scale to big data, many developers, and large collections of patterns. If you use regex, you already know a lot of RPL. RPL adds these features over regex:

  • Looks like a programming language, and plays well with development tools
  • Comes with a library of dozens of useful patterns (timestamps, network addresses, and more)
  • Has development tools: tracing, REPL, color-coded match output
  • Produces JSON output (and other formats)

Color key for the match output: red = network; red underlined = ipv6 specifically; blue = date/time; cyan = identifier; yellow = word

Features

  • Small: the Rosie compiler/runtime/libraries take up less than 600KB of disk space
  • Good performance: faster than Grok, slower than grep, does more than both of them
  • Extensible pattern library
  • Rosie is fluent in UTF-8, ASCII, and the binary language of moisture vaporators (arbitrary byte-encoded data)

Building

Platforms: (most of these were tested with docker)

Prerequisites: git, make, gcc, readline (readline-common), readline-devel (libreadline-dev)

To install Rosie, clone this repository and cd into rosie-pattern-language (which we will call the build directory). Then run:

  1. make
  2. make install (optional)

After make, you can run Rosie from the build directory using bin/rosie. Running make install creates a separate installation directory, by default in /usr/local. The executable is /usr/local/bin/rosie, and the other needed files can be found in /usr/local/lib/rosie/.

Using the CLI

Examples forthcoming

The CLI man page and an HTML version are available. A Markdown version is forthcoming.

Using the REPL

Examples forthcoming

See the REPL documentation.

Using the API

Use Rosie in your own programs! Until this section is complete, see the high-level notes under "Calling Rosie from Go, Python, node.js, Ruby, Java, or ...?" below.

Examples forthcoming

To be written:

  • Language coverage
  • Building librosie
  • Full API documentation

Project extras

Project roadmap

Releases

  • Change to semantic versioning
  • v1.0.0-alpha release
  • v1.0.0-beta release
  • v1.0.0 release

Installation

  • Brew installer for OS X
  • RPM and Debian packages

API and language support

  • API (C)
  • Python module
  • C, Go modules
  • Ruby, node.js modules

Packages

  • Dependency tool to identify dependencies of a set of packages, and to make it easy to upload/download those dependencies.
  • Source code parsing patterns (based on work done at NCSU, Raleigh, NC USA)
  • Log file parsing patterns (based on published examples and new contributions)

Features

  • Unicode character classes
  • Support JSON output for trace, config, list, and other commands
  • Customize color assignments
  • Customize initial environment
  • Generate patterns automatically from locale data
  • Linter
  • Toolkit for user-developed macros
  • Toolkit for user-developed output encoders
  • Compiler optimizations

Contributing

Write new patterns!

We are happy to add more patterns to the initial library we've started in the rpl directory, whether they build on what we have or are entirely new.

Calling Rosie from Go, Python, node.js, Ruby, Java, or ...?

Rosie is available as a C library that is callable from these languages. There are sample programs that demonstrate it, and these could be improved by turning them into proper libraries, one for each target language.

If you're a Python hacker, we could use your help turning our sample librosie client into a Python module. Same for the other languages.

And since librosie is built on libffi, it's pretty easy to access Rosie from other languages. This is another great area to make a contribution to the project.

Wanted: new tools

Because RPL is designed like a programming language (and it has an accessible parser, rpl_1_1.rpl), new tools are relatively easy to write. Here are some ideas:

  • Package doc: Given a package name, display the exported pattern names and, for each, a summary of the strings accepted and rejected.

  • Improved trace: The current trace output could be improved, particularly to make it more compact. A trace is represented internally as a table which could easily be rendered as JSON. And since this data structure represents a complete trace, it is the right input to a new algorithm that produces a compact summary. Or an animated output.

  • Linter: Users of most programming languages are aided by a linting tool, in part because a syntactically correct expression is not always what the programmer intended. For example, the character set [_-.] is a range in RPL, but it is an empty range. The author probably meant to write a set of 3 characters, like [._-].

  • Notebook: A Rosie kernel for a notebook would be useful to many people. So would adding Rosie capabilities to a general notebook environment (e.g. Jupyter).

  • Pattern generators: A number of techniques hold promise for automatically generating RPL patterns, for example:

    • Convert a format string to pattern, e.g. a printf format string, or the posix locale structure's fields that specify how to format numbers, dates/times, and monetary amounts.
    • Infer the format of each field in a CSV (or JSON, HTML, XML) file using analytics techniques such as statistics and machine learning.
    • Convert a regular expression to an RPL pattern.
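
The empty-range check described in the Linter idea above can be sketched in a few lines of Python. This is only an illustration, not part of Rosie: it assumes that a character range is ordered by code point, so a range whose lower bound sorts after its upper bound contains no characters. The function names range_members and lint_range are hypothetical, and a real linter would work on the parsed RPL expression rather than on two bare characters.

```python
from typing import List, Optional


def range_members(lo: str, hi: str) -> List[str]:
    """Return the characters in the inclusive range lo-hi, ordered by code point.

    Assumption: ranges are ordered by code point (not confirmed by the RPL docs here).
    """
    return [chr(c) for c in range(ord(lo), ord(hi) + 1)]


def lint_range(lo: str, hi: str) -> Optional[str]:
    """Return a warning if the range lo-hi is empty, otherwise None."""
    if ord(lo) > ord(hi):
        return (
            f"empty range [{lo}-{hi}]: {lo!r} (U+{ord(lo):04X}) "
            f"sorts after {hi!r} (U+{ord(hi):04X})"
        )
    return None


if __name__ == "__main__":
    # '_' is U+005F and '.' is U+002E, so the range _-. has no members.
    print(range_members("_", "."))  # []
    print(lint_range("_", "."))     # empty range [_-.]: '_' (U+005F) sorts after '.' (U+002E)
    print(lint_range("a", "z"))     # None
```

A warning like this could be paired with a suggestion to move the hyphen to the end of the set, as in [._-], when the author most likely wanted three literal characters.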

Acknowledgements

In addition to the people listed in the CONTRIBUTORS file, we wish to thank:

  • Roberto Ierusalimschy, Waldemar Celes, and Luiz Henrique de Figueiredo, the creators of the Lua language (MIT License); and again Roberto, for his lpeg library (MIT License), which has been critical to implementing Rosie.

  • The Lua community (at large);

  • Mark Pulford, the author of lua-cjson (MIT License);

  • Brian Nash, the author of lua-readline (MIT License);

  • Peter Melnichenko, the author of argparse (MIT License).

Other sources

Rosie on IBM developerWorks Open:

For an introduction to Rosie and explanations of the key concepts, see Rosie's raison d'etre.

Rosie's internal components, as well as the utilities needed to build Rosie, are listed here.

I wrote some notes on Rosie's design and theoretical foundation for my fellow PL and CS enthusiasts.

Contributors

jamiejennings, pkulchenko, subzidion, veratil
