finraos / datagenerator

DataGenerator is a Java library for systematically producing large volumes of data. DataGenerator frames data production as a modeling problem, with a user providing a model of dependencies among variables and the library traversing the model to produce relevant data sets.

Home Page: http://finraos.github.io/DataGenerator

License: Apache License 2.0

Java 64.79% Scala 35.21%

datagenerator's Issues

Code cleanup

Fix all checkstyle issues. Switch checkstyle to break the build on errors.

We need to build a set of standard data equivalence class generators

We have common data types, like:

  • Date
  • State
  • Country
  • Currency
  • Market symbol
  • Short random text
  • Long random text (with spaces and so on)
  • Phone
  • Address
  • zip code
  • First Name, Last Name
  • SSN
    ...

It would be really useful if these could be used inside the XML configuration. Currently, each one requires a custom solution.
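As a rough illustration of what a few such generators might look like, here is a minimal sketch in plain Java. The class name `EquivalenceClasses`, its method names, and the short state list are hypothetical and are not part of the DataGenerator API. The SSN generator only produces an SSN-shaped string and does not apply real issuance rules.

```java
import java.util.Random;

/** Hypothetical sketch of reusable equivalence class generators (not DataGenerator API). */
final class EquivalenceClasses {
    // Illustrative subset only; a real generator would carry the full list.
    private static final String[] STATES = {"NY", "CA", "TX", "MD"};

    private EquivalenceClasses() {}

    /** Returns a string of {@code length} random decimal digits. */
    static String digits(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('0' + rng.nextInt(10)));
        }
        return sb.toString();
    }

    /** Five-digit zip code shape. */
    static String zipCode(Random rng) {
        return digits(rng, 5);
    }

    /** SSN-shaped value (NNN-NN-NNNN); not validated against real SSN rules. */
    static String ssn(Random rng) {
        return digits(rng, 3) + "-" + digits(rng, 2) + "-" + digits(rng, 4);
    }

    /** Picks one state code from the illustrative list. */
    static String state(Random rng) {
        return STATES[rng.nextInt(STATES.length)];
    }
}
```

Passing a seeded `Random` keeps the generated values reproducible between runs, which tends to matter for test data.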

We need a standard way to postprocess data

One customer wants to generate JSON data (to use as mocks when testing their system)...

Maybe we can provide an easy way to:

  • export as json
  • export as xml
  • export to a relational DB
  • ...
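For the JSON case, a minimal sketch of such a postprocessing step could look like the following. It assumes each generated record is available as a map of variable names to string values. The class `JsonExport` is hypothetical and not part of the library, and every value is written as a JSON string.

```java
import java.util.Map;

/** Hypothetical sketch: turns one generated record into a flat JSON object. */
final class JsonExport {
    private JsonExport() {}

    /** Escapes the characters that JSON requires to be escaped inside a string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 2);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Serializes the record in the map's iteration order. */
    static String toJson(Map<String, String> record) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : record.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }
}
```

In practice an established JSON library would likely be a better choice than hand-written escaping; the sketch only shows where such a step would sit.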

Split into multiple modules

Split into multiple modules:

  1. The main DG library
  2. Samples depending on the DG library
  3. Main pom governing both. Make sure checkstyle still works for the new configuration.

We need a result analyser

Currently we have to analyse and verify results manually.
It is easy to make a mistake in the XML script and generate a data set that differs slightly from the desired one.

With an analysis tool, we could analyse the generated data and verify it easily.

We could also use it to compare generated data against external or original data and see the differences.
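One simple form such an analysis could take is a per-variable count of the distinct values that were generated. The sketch below assumes records are maps of variable names to non-null string values. `ResultAnalyser` is a hypothetical name, not an existing DataGenerator class.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: summarises how often each value occurs per variable. */
final class ResultAnalyser {
    private ResultAnalyser() {}

    /**
     * Returns variable name -> (value -> number of records with that value).
     * Sorted maps make the summary stable and easy to diff between runs.
     */
    static Map<String, Map<String, Integer>> valueCounts(List<Map<String, String>> rows) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Map<String, String> row : rows) {
            for (Map.Entry<String, String> e : row.entrySet()) {
                counts.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                      .merge(e.getValue(), 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Running the same summary over a generated data set and over an original data set, then comparing the two results, would give a first view of where they differ.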

Negative scenarios

Features that allow DataGenerator to insert negative scenarios in the generated data set.

Jetty's data distribution

Allow the main job to fire a jetty instance that responds to requests from the tasks and distribute computations during execution.

Ability to define expressions and calculations on nodes and edges.

R1.4.8: Ability to define expressions and calculations on nodes and edges. Expressions should be able to reference variable types described in R1.4.2 to R1.4.5.

R1.4.8.1: Simple mathematical expressions. (Add, Subtract, Divide, Multiply)
R1.4.8.2: Simple string manipulation. (Concatenate, Replace, Trim, Substring, RegEx replace)

https://github.com/FINRAOS/DataGenerator/wiki/DataGenerator-V2-Requirements#r14-common-rules-for-r11-r12-and-r13
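To make the arithmetic part of the requirement concrete, here is a minimal sketch of applying one binary operation to two previously assigned variables. It assumes variable values are held as decimal strings in a map. `NodeExpressions`, `Op`, and `apply` are hypothetical names and do not describe how the library evaluates expressions.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.util.Map;

/** Hypothetical sketch: one arithmetic operation over two named variables. */
final class NodeExpressions {
    enum Op { ADD, SUBTRACT, MULTIPLY, DIVIDE }

    private NodeExpressions() {}

    /** Looks up both operands by variable name and returns the result as a string. */
    static String apply(Op op, String leftVar, String rightVar, Map<String, String> vars) {
        BigDecimal a = new BigDecimal(vars.get(leftVar));
        BigDecimal b = new BigDecimal(vars.get(rightVar));
        switch (op) {
            case ADD:
                return a.add(b).toPlainString();
            case SUBTRACT:
                return a.subtract(b).toPlainString();
            case MULTIPLY:
                return a.multiply(b).toPlainString();
            case DIVIDE:
                // A MathContext avoids an exception on non-terminating quotients.
                return a.divide(b, MathContext.DECIMAL64).toPlainString();
            default:
                throw new IllegalArgumentException("Unsupported operation: " + op);
        }
    }
}
```

The string operations in R1.4.8.2 map fairly directly onto standard `String` methods such as `concat`, `replace`, `trim`, `substring`, and `replaceAll`.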

GUI editor

A locally running Jetty instance can serve a Jenkins-style GUI for editing state charts and saving them locally, possibly allowing user collaboration.

SearchWorker Race Condition

In simple models, the first SearchWorker finishes its DFS and sets the exit flag before subsequent SearchWorkers can enqueue results.
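One general way to avoid this kind of race is to set the exit flag only after every worker has finished, rather than letting the first finished worker set it. The sketch below shows that pattern with a `CountDownLatch`. It is not the actual SearchWorker code: the class name, the flag, and the stand-in "search" are all hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: signal exit only once all workers have enqueued results. */
final class ExitAfterAllWorkers {
    private ExitAfterAllWorkers() {}

    /** Starts {@code workers} tasks and returns how many results were enqueued. */
    static int run(int workers) {
        Queue<String> results = new ConcurrentLinkedQueue<>();
        AtomicBoolean exit = new AtomicBoolean(false); // what a consumer would poll
        CountDownLatch done = new CountDownLatch(workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            for (int i = 0; i < workers; i++) {
                final int id = i;
                pool.execute(() -> {
                    try {
                        // Stand-in for a worker's depth-first search.
                        results.add("result-from-worker-" + id);
                    } finally {
                        done.countDown(); // count down even if the search throws
                    }
                });
            }
            done.await();   // wait for every worker, not just the first
            exit.set(true); // only now is it safe to tell consumers to stop
            return results.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With this arrangement, a worker that finishes early cannot cut off results from workers that have not yet enqueued anything.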

Allow DataGenerator to Generate Data based off of another Model's dataset

I don't think we've solved this one yet:

Suppose we wanted to generate a set of accounts, each containing identifying information tied to that account (such as the owner's name, birthday, etc.).

We now want to generate a set of transactions tied to each of the accounts generated above. As it stands right now, we need to generate the set of accounts and then generate the transactions using a custom consumer that parses the account file.

We would like to see whether it's possible to eliminate the need for consumers to parse previously generated datasets, and move towards generating the two data sets side by side (if feasible).
#55

Ability to define database queries and simple flat files (R1.4.9.2) as a way to define variable assignments.

R1.4.9: Ability to define database queries (R1.4.9.1) and simple flat files (R1.4.9.2) as a way to define variable assignments. Look at the diagram below; the specification on the left is identical to the specification on the right. The only difference is that the specification on the right uses a SQL query to define its dataset. Note that the SQL can be replaced with a file URL pointing to the table/joined data in CSV format.

https://github.com/FINRAOS/DataGenerator/wiki/DataGenerator-V2-Requirements#r14-common-rules-for-r11-r12-and-r13
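For the flat-file case, the idea can be sketched as reading a CSV with a header row and treating each data row as one set of variable assignments. The sketch assumes plain comma-separated values with no quoting or embedded commas. `FlatFileAssignments` is a hypothetical name, not a DataGenerator class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: each CSV data row becomes one map of variable assignments. */
final class FlatFileAssignments {
    private FlatFileAssignments() {}

    /** The first line names the variables; every later line supplies their values. */
    static List<Map<String, String>> parse(List<String> lines) {
        List<Map<String, String>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        String[] header = lines.get(0).split(",", -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1); // -1 keeps trailing empty cells
            if (cells.length != header.length) {
                throw new IllegalArgumentException("Column count mismatch: " + line);
            }
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length; i++) {
                row.put(header[i].trim(), cells[i].trim());
            }
            rows.add(row);
        }
        return rows;
    }
}
```

A SQL-backed variant would produce the same shape of output, with column labels from the result set taking the place of the CSV header.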

R1.4.12: Ability to define conditionals on nodes and edges, enabling selective traversal that depends on the condition evaluating to true.

R1.4.12: Ability to define conditionals on nodes and edges, enabling selective traversal that depends on the condition evaluating to true.

Note: If a conditional fails on a node or an edge, traversal does not continue from that point to the end node. Whether the variables already set along the traversed path are treated as a valid scenario set depends on a global job configuration variable.
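The behaviour described in the note can be sketched as a depth-first traversal in which each edge carries a condition over the variables assigned so far. The sketch assumes an acyclic graph and puts conditions on edges only. The names `ConditionalTraversal`, `Edge`, and the `keepPartial` flag (standing in for the global job configuration variable) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: DFS that stops along a path when an edge condition fails. */
final class ConditionalTraversal {
    /** An edge assigns one variable and may only be taken if its condition holds. */
    static final class Edge {
        final String target;
        final String variable;
        final String value;
        final Predicate<Map<String, String>> condition;

        Edge(String target, String variable, String value,
             Predicate<Map<String, String>> condition) {
            this.target = target;
            this.variable = variable;
            this.value = value;
            this.condition = condition;
        }
    }

    private ConditionalTraversal() {}

    /** Collects one variable map per emitted scenario. */
    static List<Map<String, String>> traverse(Map<String, List<Edge>> graph,
                                              String start, String end,
                                              boolean keepPartial) {
        List<Map<String, String>> out = new ArrayList<>();
        dfs(graph, start, end, new LinkedHashMap<>(), keepPartial, out);
        return out;
    }

    private static void dfs(Map<String, List<Edge>> graph, String node, String end,
                            Map<String, String> vars, boolean keepPartial,
                            List<Map<String, String>> out) {
        if (node.equals(end)) {
            out.add(new LinkedHashMap<>(vars)); // complete scenario
            return;
        }
        for (Edge e : graph.getOrDefault(node, Collections.<Edge>emptyList())) {
            if (!e.condition.test(vars)) {
                // Failed conditional: no further traversal along this edge.
                if (keepPartial) {
                    out.add(new LinkedHashMap<>(vars)); // keep what was set so far
                }
                continue;
            }
            Map<String, String> next = new LinkedHashMap<>(vars);
            next.put(e.variable, e.value);
            dfs(graph, e.target, end, next, keepPartial, out);
        }
    }
}
```

With `keepPartial` set to false, only paths that reach the end node produce a scenario. With it set to true, the variables assigned before the failed conditional are emitted as a scenario too.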
