liflab / beepbeep-3

An event stream processor anyone can use

Home Page: http://liflab.github.io/beepbeep-3

License: GNU Lesser General Public License v3.0

Java 99.98% Shell 0.02%

beepbeep event-stream log-analysis cep stream-processing stream-processing-engine

beepbeep-3's Introduction

BeepBeep 3: an expressive query processor for event streams

BeepBeep is an event stream query engine. It can take various sources of events as input, pipe them through various processors, and produce various kinds of output streams from them. For more information about what BeepBeep is (including documentation, examples, etc.), please visit BeepBeep's website.

Using BeepBeep in a project

You can download the latest JAR file and place it in your classpath. Otherwise, you can declare BeepBeep as a dependency in your project.

Maven

<dependency>
  <groupId>io.github.liflab</groupId>
  <artifactId>beepbeep-3</artifactId>
  <version>0.11.1</version>
</dependency>

Ivy

<dependency org="io.github.liflab" name="beepbeep-3" rev="0.11.1"/>

Gradle

compileOnly group: 'io.github.liflab', name: 'beepbeep-3', version: '0.11.1'

Repository structure

The repository is separated across the following folders.

Core: main source files
CoreTest: test source files. You need to compile these files only if you want to run BeepBeep's unit tests.

Compiling the project contained in the present repository generates the file beepbeep-3.jar, which is the minimal file you need to run BeepBeep on your system.

Extensions

BeepBeep's engine contains very few processors. In typical use cases, these basic functionalities are extended by using one or more extra palettes, such as those found in the BeepBeep palette repository.

Compiling and Installing BeepBeep 3

First make sure you have the following installed:

The Java Development Kit (JDK) to compile. BeepBeep is developed to comply with Java version 8; it is probably safe to use any later version.
Ant to automate the compilation and build process

Although the project contains a file named pom.xml, it does not contain enough information to build from the sources (it only declares the project's name and dependencies). You must use Ant.

Download the sources for BeepBeep from GitHub or clone the repository using Git:

git@github.com:liflab/beepbeep-3.git

The repository is separated into multiple projects. Each of these projects has the same Ant build script that allows you to compile them (see below).

If the project you want to compile has dependencies, you can automatically download any libraries missing from your system by typing:

ant download-deps

This will put the missing JAR files in the dep folder in the project's root.

Compiling

Compile the sources by simply typing:

ant

This will produce a file called beepbeep-3.jar (or another library, depending on what you are compiling) in the folder. This file is runnable and stand-alone, or can be used as a library, so it can be moved around to the location of your choice.

In addition, the script generates the Javadoc documentation for using BeepBeep in the doc folder. To show documentation in Eclipse, right-click on the jar, click "Properties", then fill in the Javadoc location.

Testing

BeepBeep can test itself by running:

ant test

Unit tests are run with JUnit; a detailed report of these tests in HTML format is available in the folder tests/junit, which is automatically created. Code coverage is also computed with JaCoCo; a detailed report is available in the folder tests/coverage.

Coverity Scan

BeepBeep uses Coverity Scan for static analysis of its source code and defect detection. Instructions for using Coverity Scan locally are detailed here. In a nutshell, if Coverity Scan is installed, type the following:

cov-build --dir cov-int ant compile

(Make sure to clean up the directory first by launching ant clean, followed by ant download-deps.)

Developing BeepBeep using Eclipse

If you are using Eclipse to develop with BeepBeep, please refer to the dedicated tutorial Installing and Configuring in Eclipse, written by Jalves Nicacio.

In short:

Create a new empty workspace (preferably in a new, empty folder).
Create new projects for each of the folders Core, CoreTest, and optionally, any of the palette folders you wish to develop. Note that these projects will not be located in the default location with respect to the workspace; you need to uncheck the "Use default location" option and fetch them manually.

Then, set up the build path for each project:

Core requires the Bullwinkle library (see above)
CoreTest depends on Core and requires the JUnit 4 library
Each of the palette folders depends on Core and requires the JUnit 4 library
In addition, some of the palette projects may have other dependencies; please refer to their individual documentation

Warning

The BeepBeep project is under heavy development. The repository may be restructured, the API may change, and so on. This is R&D!

About the author

BeepBeep 3 was written by Sylvain Hallé, full professor at Université du Québec à Chicoutimi, Canada. Part of this work has been funded by the Canada Research Chair in Software Specification, Testing and Verification and the Natural Sciences and Engineering Research Council of Canada.

beepbeep-3's Issues

Write more tests to increase coverage

We should reach 90%

page guide for Eclipse build not found

The page https://liflab.github.io/beepbeep-3/guide/building-eclipse.html is not found

A single instance of each constant

Constant objects should be immutable. Hence, when writing:

. . . new FunctionTree(IsGreaterThan.instance,
  new Constant(0),
  new Constant(0));

...two new constant objects will be created, while one could refer to a single instance both times.
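One way to get a single instance per value is an interning factory. The sketch below is only an illustration: `SharedConstant` and its `of` method are hypothetical names, not BeepBeep's actual `Constant` class, and the sketch assumes that constant values are immutable and non-null.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for BeepBeep's Constant; not the library's actual class.
final class SharedConstant {
    // One cached instance per distinct value (values must be immutable and non-null)
    private static final Map<Object, SharedConstant> CACHE = new ConcurrentHashMap<>();

    private final Object value;

    private SharedConstant(Object value) {
        this.value = value;
    }

    // Returns the shared instance associated with the given value
    static SharedConstant of(Object value) {
        return CACHE.computeIfAbsent(value, SharedConstant::new);
    }

    Object getValue() {
        return value;
    }

    public static void main(String[] args) {
        SharedConstant a = SharedConstant.of(0);
        SharedConstant b = SharedConstant.of(0);
        System.out.println(a == b); // prints "true": the same object is reused
    }
}
```

With such a factory, the two `new Constant(0)` calls in the example above would become two lookups that return the same object.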

New type of exception for Functions

The signature of a Function object should allow an exception to be thrown (such as a UndefinedException) when the function has no value to return for given arguments. (For example, when asked to fetch a field from a tuple that does not have such a field.) Currently, these functions silently return null, which may break things downstream for processors that do not expect null.
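As a rough illustration of the idea, the sketch below models a tuple as a plain `Map` and fails loudly when a field is missing. The names `FieldLookup`, `getField` and the unchecked `UndefinedException` are assumptions made for this example; whether the real exception should be checked or unchecked is left open by the issue.

```java
import java.util.HashMap;
import java.util.Map;

class FieldLookup {
    // Hypothetical exception; the issue only suggests a name like this one.
    static class UndefinedException extends RuntimeException {
        UndefinedException(String message) {
            super(message);
        }
    }

    // Fetches a field from a tuple (modelled here as a Map) and throws
    // instead of silently returning null when the field does not exist.
    static Object getField(Map<String, Object> tuple, String name) {
        if (!tuple.containsKey(name)) {
            throw new UndefinedException("No field named '" + name + "'");
        }
        return tuple.get(name);
    }

    public static void main(String[] args) {
        Map<String, Object> tuple = new HashMap<>();
        tuple.put("a", 1);
        System.out.println(getField(tuple, "a"));
        try {
            getField(tuple, "b");
        } catch (UndefinedException e) {
            System.out.println("Undefined: " + e.getMessage());
        }
    }
}
```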

Restrictive constructor

We can create a function as follows: Function f = new Bags.FilterElements();

However, this function only accepts a simple function as its parameter, of type UnaryFunction<?, Boolean>. It does not accept other functions of arity 1:1 that do not extend UnaryFunction, such as a FunctionTree with a single StreamVariable.

You should add the following precondition in JML: /*@ requires f.getInputArity() == 1 @*/

StreamReader uses 100% CPU

The loop where the reader polls for incoming bytes is running too fast. It should work like this:

Poll for incoming bytes
If there are bytes, process them and poll immediately for new bytes
Otherwise, wait some reasonable amount of time (e.g. 0.25 sec) before polling again
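The three steps above could look roughly like the following loop. This is a sketch on a plain `InputStream`, not the actual StreamReader code; the class name `PollingReader` and the `maxIdlePolls` parameter (added only so that the example terminates) are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class PollingReader {
    // Reads bytes until maxIdlePolls consecutive polls find nothing to read.
    static byte[] readAvailable(InputStream in, long sleepMillis, int maxIdlePolls) {
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int idlePolls = 0;
        try {
            while (idlePolls < maxIdlePolls) {
                if (in.available() > 0) {
                    // Bytes are waiting: process them and poll again immediately
                    int n = in.read(buffer);
                    if (n > 0) {
                        collected.write(buffer, 0, n);
                    }
                    idlePolls = 0;
                } else {
                    // Nothing to read: sleep instead of spinning at 100% CPU
                    idlePolls++;
                    Thread.sleep(sleepMillis);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // stop polling, keep the flag
        }
        return collected.toByteArray();
    }

    public static void main(String[] args) {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3});
        System.out.println(readAvailable(in, 250L, 2).length + " bytes read");
    }
}
```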

Increase test coverage

Add `Doubler` and `Adder`

The first example in the book uses the Doubler processor, which is defined in the code example repository. As a result, somebody who just copy-pastes this example without cloning the whole example repo gets a compilation error.

We should just put Doubler and Adder in the core library, even though they are not strictly essential per se.

Fail-fast processors

Some processors (mainly of the ltl package) always return the same event once they reach some state. This is the case of:

All Boolean (not Troolean) processors once they reach a definite (true/false) value, including quantifiers
Any Moore machine that reaches a sink state

This fact should be detectable, to allow their instances to be cleaned up from memory (simply recalling their last value whenever needed). This would remove the need for an explicit cleanup function in the StateSlicer.

Add support for "last" event

In some cases, it may be desirable to call something only on the last event of a trace; we should add a signal to that effect

Implicit distribution of computation

Split parts of a query across sites. Implement inter-site communication through TCP sockets.

Create a minimax processor

For 2-player games

Python palette

Create a palette with a single class with a main, so that Py4j can be used. See: https://www.py4j.org/index.html

Write unit tests to check notifyEndOfTrace

Methods push and pull should throw exceptions

Use case: suppose you forgot to connect a processor to something else. When you call pull, your program will crash with a NullPointerException. This looks like there is a bug in BeepBeep, while in fact it is your fault. ;-)

A more elegant solution would be for push and pull to be able to throw exceptions (say, ProcessorException or something like that). Pullables and pushables should catch null pointers and the like, and wrap them into ProcessorExceptions that the user would be forced to deal with.
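A minimal sketch of the wrapping idea follows. It does not use BeepBeep's actual Pullable interface: the upstream is modelled as a `java.util.function.Supplier`, and `CheckedPull` and the checked `ProcessorException` are hypothetical names taken from the suggestion above.

```java
import java.util.function.Supplier;

class CheckedPull {
    // Hypothetical checked exception, named after the suggestion in the issue.
    static class ProcessorException extends Exception {
        ProcessorException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Pulls one event from upstream; runtime failures such as a
    // NullPointerException are wrapped so the caller has to handle them.
    static Object pull(Supplier<Object> upstream) throws ProcessorException {
        try {
            return upstream.get();
        } catch (RuntimeException e) {
            throw new ProcessorException("Pull failed; is the processor connected?", e);
        }
    }

    public static void main(String[] args) {
        try {
            pull(null); // simulates a processor that was never connected
        } catch (ProcessorException e) {
            System.out.println(e.getMessage());
        }
    }
}
```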

Retrieving CountDecimate last value when does not correspond to specified interval

Let's say we have this code:

QueueSource source = new QueueSource();
source.loop(false);
for(int i = 0; i < 500; i++) {
	source.addEvent(Integer.toString(i));
}

// Initializing Pump and FileWriter

Connector.connect(source, new CountDecimate(501), pump, writer);
pump.run();

Since the CountDecimate interval is greater than the number of events in the source (the same applies whenever the number of events is not an exact multiple of the interval), it will only output the first event (or the first event of the last interval).
It would be nice to have an option (or another class) to specify if we want the CountDecimate to output the LAST event (i.e. when there is nothing more to pull) even if it does not correspond to the interval.
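The requested behaviour can be sketched on a finite list, outside of BeepBeep. The sketch assumes, as described above, that decimation keeps the first event of each interval; `DecimateWithLast` is a hypothetical name, not an existing class.

```java
import java.util.ArrayList;
import java.util.List;

class DecimateWithLast {
    // Keeps the events at positions 0, interval, 2*interval, ... and also the
    // final event of the input if it was not already kept.
    static <T> List<T> decimate(List<T> events, int interval) {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < events.size(); i++) {
            boolean onInterval = i % interval == 0;
            boolean isLast = i == events.size() - 1;
            if (onInterval || isLast) {
                out.add(events.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            events.add(Integer.toString(i));
        }
        System.out.println(decimate(events, 501)); // prints "[0, 499]"
    }
}
```

In a streaming setting, the "is last" test would have to be triggered by an end-of-trace notification rather than by the list size.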

Remove getImage() from GnuplotProcessor

This is taken care of by chaining GnuplotProcessor with a Caller for the gnuplot command instead, so this code is no longer necessary

Serialize/deserialize query state

The name says it all. None of the existing CEP tools have this feature.

Pass runtime type to functions that manipulate collections

http://stackoverflow.com/a/30754982

Inner class: Maps.Get

There is an issue with the Maps.Get inner class. Some examples instantiate this class like this:

Function get = new Maps.Get("KeyString");

But a compilation error occurs because of the Maps.Get constructor. In the source file the constructor is protected; it should simply be made public.

Relax parser rules

Surely we could get rid of some parentheses in obvious situations such as WHERE (a) = (b)

Create a "piano roll" processor

Replays recorded events with the same timing between events

Implement a Graph visitor

A visitor that performs a traversal of the complete processor chain (warning: the chain is not necessarily a tree, so we need graph traversal) and launches a callback exactly once for every distinct processor instance of the chain.

Possible usages: collect all the processors of a chain into a set, starting from any point in that chain
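A breadth-first traversal with an identity-based "seen" set is one way to obtain the exactly-once guarantee, even when the chain contains forks that rejoin or cycles. In the sketch below, `Node` is a hypothetical stand-in for a processor and its neighbours; it is not a BeepBeep class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class ChainVisitor {
    // Hypothetical node type standing in for a processor and its neighbours.
    static class Node {
        final String name;
        final List<Node> neighbours = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Calls the callback exactly once for every distinct node reachable from start.
    static void visit(Node start, Consumer<Node> callback) {
        // Identity semantics: two distinct instances are never confused
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> pending = new ArrayDeque<>();
        seen.add(start);
        pending.add(start);
        while (!pending.isEmpty()) {
            Node current = pending.poll();
            callback.accept(current);
            for (Node next : current.neighbours) {
                if (seen.add(next)) { // add() returns false if already visited
                    pending.add(next);
                }
            }
        }
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        a.neighbours.add(b);
        b.neighbours.add(a); // cycle
        visit(a, n -> System.out.println(n.name));
    }
}
```

Collecting all processors of a chain into a set then amounts to passing `set::add` as the callback.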

Replace CopyCrawler in GroupProcessor

Currently, the duplicate method of the GroupProcessor class uses an object called the CopyCrawler to crawl and pipe processors in the copy like in the original. Crawling could be replaced by merely going through the set of processors of the group (this set is known). There is no added value to a traversal of the processor graph (and this even caused a subtle bug, see a6d17d2).

TokenFeeder only returns first token if start delimiter is the empty string

Possible situation where you might want to do that: splitting a file according to CRLF

Maps.Get class get() methods

When using the Maps.Get class to retrieve a value from a map, the key passed to its constructor can only be a String.

A Map can use any Object for its keys and values, yet Maps.Get only allows lookups by String keys.

I think you should just replace: protected Get(String key) { ... } by protected Get(Object key) { ... }

Running variance

https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm
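The online algorithm on the linked page (Welford's method) updates the mean and the sum of squared deviations one event at a time. The following standalone sketch shows the update rule; the class name `RunningVariance` is an assumption, and wrapping it into a BeepBeep processor is left out.

```java
class RunningVariance {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean

    // Welford's update: processes one value in constant time and memory
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    double mean() {
        return mean;
    }

    // Population variance of the values seen so far
    double variance() {
        return count > 0 ? m2 / count : Double.NaN;
    }

    // Unbiased sample variance of the values seen so far
    double sampleVariance() {
        return count > 1 ? m2 / (count - 1) : Double.NaN;
    }

    public static void main(String[] args) {
        RunningVariance rv = new RunningVariance();
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            rv.add(x);
        }
        System.out.println(rv.mean() + " " + rv.variance());
    }
}
```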

Create a fork grammatical construct

Allow an expression to refer multiple times to the same trace. For example:

FORK (some expression) AS somename IN (some other expression).

Here, all occurrences of somename in the last parenthesis are connected to a fork on the first parenthesis.

Constructor Maps class

I found an issue in Maps.Get, an inner class of Maps. When I create a new Maps.Get object, I see the following error:
The constructor Maps.Get(String) is not visible
In fact, the Maps.Get constructor is protected.

Pack ignores duplication with state

Lists.Pack's method duplicate ignores the with_state parameter

Bundle Bullwinkle into JAR

This way, the resulting JAR will be completely self-contained and not require any other library in the classpath

Benchmark energy consumption of BeepBeep and others

Using this: http://kliu20.github.io/jRAPL/

Subtraction fails for large numbers

Applying a numeric function (at least subtraction, possibly others) to numbers large enough to overflow an int will fail. The following code should output 10.0 but actually outputs 0.0.

import ca.uqac.lif.cep.util.Numbers;
import ca.uqac.lif.cep.functions.Function;

public class TestDiff {
	public static void main(String[] args) {
		Function negation = Numbers.subtraction;
		Object[] out = new Object[1];
		negation.evaluate(new Object[]{1590785415514L, 1590785415504L}, out);
		System.out.println("The return value of the function is: " + out[0]);
	}
}
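A result of 0.0 is consistent with the operands being converted to float before the subtraction, since a float cannot distinguish two values this large that differ by only 10. Whether Numbers.subtraction actually does this is an assumption; the standalone sketch below only reproduces the arithmetic, with hypothetical method names.

```java
class LargeSubtraction {
    // Subtracts after narrowing both operands to single precision
    static float viaFloat(long a, long b) {
        return (float) a - (float) b;
    }

    // Subtracts after converting both operands to double precision
    static double viaDouble(long a, long b) {
        return (double) a - (double) b;
    }

    public static void main(String[] args) {
        long a = 1590785415514L;
        long b = 1590785415504L;
        System.out.println(viaFloat(a, b));  // prints "0.0": both round to the same float
        System.out.println(viaDouble(a, b)); // prints "10.0"
    }
}
```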

Change signature of `compute()` to reuse queue objects

The signature of the compute() method in SingleProcessor is:

public Queue<Object[]> compute(Object[] inputs)

It forces the implementer to create one new instance of a Queue object on every call, only for it to be destroyed very shortly afterwards. We should change the signature of the method to:

public boolean compute(Object[] inputs, Queue<Object[]> outputs)

This would allow SingleProcessor to create a single instance of queue, and always pass it to the method. We save one constructor call (and the corresponding malloc) for every input event handled by every processor.

The Boolean return type would be used to replace the return null that the method currently uses to signal that no new output event will ever be produced.
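The proposal can be sketched as follows. `Computation` is a hypothetical interface carrying the proposed signature, and `runAll` plays the role that SingleProcessor would have: it creates the queue once and empties it after each call.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class ReusedQueueDemo {
    // Hypothetical interface using the signature proposed in the issue.
    interface Computation {
        boolean compute(Object[] inputs, Queue<Object[]> outputs);
    }

    // Feeds every input front to the computation, reusing a single queue,
    // and returns the total number of output fronts produced.
    static int runAll(Computation c, Object[][] events) {
        Queue<Object[]> outputs = new ArrayDeque<>(); // created only once
        int produced = 0;
        for (Object[] inputs : events) {
            boolean more = c.compute(inputs, outputs);
            produced += outputs.size();
            outputs.clear(); // hand the same queue to the next call
            if (!more) {
                break; // false replaces the former "return null"
            }
        }
        return produced;
    }

    public static void main(String[] args) {
        Computation doubler = (in, out) -> {
            out.add(new Object[] {((Integer) in[0]) * 2});
            return true;
        };
        System.out.println(runAll(doubler, new Object[][] {{1}, {2}, {3}}));
    }
}
```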

Remove dependency on Commons CLI

The latest version of Bullwinkle already contains a CLI parser; use it and drop the requirement on Commons CLI

Write unit tests to check duplication

Not all processors are appropriately tested for their behaviour when duplicate is called. More unit tests should be written to check that they all work as intended.

Create a bridge to use R

http://rforge.net/JRI/

Get rid of `objectfactory` package

...and take out everything that relates to this package in other classes. This was the embryo of a yet-to-be-developed GUI editor for processor chains. This will be done instead with the Spiegel library, which is still under development.

Type checking for generic processors?

Most processors in the tmf package declare Variant as their input and output type. Indeed, the event type of a Fork is determined by the output type of the source it is connected to: there can be a fork of integers, or a fork of strings, etc.

However, this means that type checking is disabled for these processors. So, one can connect a queue source of integers to a fork, and one of the outputs of that fork to the Negation processor, and this will not raise an exception, as the fork is "type-agnostic".

One possibility would be to create a Typable interface, which would allow a processor to be told what its input and output types are. The connect() method would check if one of the processors implements Typable, and if so, call its setType() method and give it the other processor's type.

Not sure if the benefit of a more precise runtime type checking is worth all this machinery (and the extra memory involved in each such processor just to remember types).

Create a stack-safe build() method

Rather than leaving the user to pop objects from the parse stack manually, build() should directly receive the objects extracted from the stack, and the parser should push the resulting processor. So if I have a rule like this:

<processor> := BLA <processor> FOO <number> BAR

The parser would look at the rule, pop 5 objects from the parse stack, ignore terminal symbols, and call the processor's build() method passing only the two remaining objects (a processor and a number). The signature of build() would be:

Object build(Object ... arguments)

build returns an object that the parser will put back on the stack, or null if nothing should be put on the stack.

This way, a user-defined build() method never manipulates the parse stack directly, so that it can't mess up with it (e.g. by popping/pushing the wrong number of elements).
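The reduction step described above could be sketched like this. `Builder` and `reduce` are hypothetical names, and the sketch assumes that nonterminals are written between angle brackets, as in the rule shown earlier.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class StackSafeBuild {
    // Hypothetical builder with the signature proposed in the issue.
    interface Builder {
        Object build(Object... arguments);
    }

    // Pops one object per rule symbol, keeps only those matching nonterminals
    // (symbols written <like-this>), calls build() and pushes a non-null result.
    static void reduce(Deque<Object> stack, String[] ruleSymbols, Builder builder) {
        Object[] popped = new Object[ruleSymbols.length];
        for (int i = ruleSymbols.length - 1; i >= 0; i--) {
            popped[i] = stack.pop(); // the last symbol of the rule is on top
        }
        List<Object> arguments = new ArrayList<>();
        for (int i = 0; i < ruleSymbols.length; i++) {
            if (ruleSymbols[i].startsWith("<")) {
                arguments.add(popped[i]);
            }
        }
        Object result = builder.build(arguments.toArray());
        if (result != null) {
            stack.push(result);
        }
    }

    public static void main(String[] args) {
        Deque<Object> stack = new ArrayDeque<>();
        for (Object o : new Object[] {"BLA", "P", "FOO", 3, "BAR"}) {
            stack.push(o);
        }
        String[] rule = {"BLA", "<processor>", "FOO", "<number>", "BAR"};
        reduce(stack, rule, a -> a[0] + "x" + a[1]);
        System.out.println(stack.peek());
    }
}
```

Here the user-supplied lambda only ever sees the two nonterminal objects; all popping and pushing stays inside `reduce`.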

Handle end of trace in Push mode

Some processors operate on finite traces and may want to know when the end has been reached.

Make static references to Pullables and Pushables

For objects that implement their own Pullables and Pushables (e.g. SingleProcessor, SmartFork, etc.), a new instance of pullable or pushable is created every time getPullable/getPushable is called. Yet all instances for the same input or output n are identical at any point in time (they have no internal state).

The Processor class should take care of keeping in memory instances of pullables/pushables that are already created, and to just pass the reference to an existing pullable/pushable when it is asked again.

Load extensions dynamically

The interpreter should be able to look into the classpath (or into a specified folder for JARs?) and load any grammar extension it finds there. Thus one could dynamically extend the grammar by simply copying extension JARs somewhere.

Optimize window processor

Rather than using a queue to store messages from the window, use a fixed-width array and implement a circular buffer.
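A fixed-width circular buffer could look like the sketch below. It is independent of the actual window processor; the class name `CircularWindow` and its methods are assumptions.

```java
class CircularWindow {
    private final Object[] slots;
    private int next = 0; // index where the next event will be written
    private int size = 0; // number of events currently stored

    CircularWindow(int width) {
        slots = new Object[width];
    }

    // Stores an event, overwriting the oldest one once the window is full
    void add(Object event) {
        slots[next] = event;
        next = (next + 1) % slots.length;
        if (size < slots.length) {
            size++;
        }
    }

    boolean isFull() {
        return size == slots.length;
    }

    // Returns the window contents ordered from oldest to newest
    Object[] contents() {
        Object[] out = new Object[size];
        int start = (next - size + slots.length) % slots.length;
        for (int i = 0; i < size; i++) {
            out[i] = slots[(start + i) % slots.length];
        }
        return out;
    }

    public static void main(String[] args) {
        CircularWindow w = new CircularWindow(3);
        for (int i = 1; i <= 5; i++) {
            w.add(i);
        }
        System.out.println(java.util.Arrays.toString(w.contents())); // prints "[3, 4, 5]"
    }
}
```

No element is ever shifted or reallocated: sliding the window by one event is a single array write plus an index update.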

Issue in code (GitHub)

On several occasions, the code uses the UtilityMethods class. However, this class is not defined in the BeepBeep package; it was probably not exported with the package.

I realized this on page 22 with the UtilityMethods.pause() method which appeared to be undefined.

Check Slicer when sorting function returns null

Seems to crash right now...

DRY in grammar extensions

DRY = Don't Repeat Yourself

The names of nonterminals in a grammar extension could be assumed by default to be the class names in the current package; if this is the case, you wouldn't need to create an association in the associations.txt file.

Example: my extension (in my.package) contains a class MyClass. If I use <MyClass> in a grammar rule, this would imply I mean my.package.MyClass, without having to write

<MyClass>,my.package.MyClass

in associations.txt.

FileWriter throws NullPointerException in append mode

The following code throws a NullPointerException:

QueueSource source = new QueueSource();
source.loop(false);
for(int i = 0; i < 42; i++) {
    source.addEvent(Integer.toString(i));
}

Pump pump = new Pump();
FileWriter writer = new FileWriter(new File("./Example/test.txt"), true);

Connector.connect(source, pump, writer);

pump.run();

But it works correctly when append mode is disabled:
FileWriter writer = new FileWriter(new File("./Example/test.txt"), false);

The problem seems to come from FileWriter#append where m_outStream is not defined, whereas m_outStream is well defined in FileWriter#overwrite which is only called when m_append == false.

Iterator.next should throw NoSuchElementException

The Iterator.next() method is specified to throw NoSuchElementException if the iteration has no more elements. It is not supposed to return null in that situation.

Pullable is a subtype of Iterator, but its next() method violates that specification:

  /**
   * Synonym of {@link #pull()}.
   * 
   * @return An event, or <code>null</code> if none could be retrieved
   */
  @Override
  public @Nullable Object next();

It would be helpful to clients if BeepBeep adhered to the iterator specification.
Or, is there a reason it does not do so?
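For reference, an iterator that follows the java.util.Iterator contract looks like the sketch below. It is a standalone queue-backed example, not BeepBeep's Pullable; the class name `QueueIterator` and its `offer` method are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Queue;

class QueueIterator implements Iterator<Object> {
    private final Queue<Object> events = new ArrayDeque<>();

    // Makes one more event available to the iterator
    void offer(Object event) {
        events.add(event);
    }

    @Override
    public boolean hasNext() {
        return !events.isEmpty();
    }

    // Follows the Iterator contract: throws instead of returning null
    @Override
    public Object next() {
        if (events.isEmpty()) {
            throw new NoSuchElementException("No more events");
        }
        return events.remove();
    }

    public static void main(String[] args) {
        QueueIterator it = new QueueIterator();
        it.offer("e1");
        System.out.println(it.next());
        System.out.println(it.hasNext()); // prints "false"
    }
}
```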

Update Ant build script to use Markdown doclet

This one: https://github.com/Abnaxos/markdown-doclet