bentonam / fakeit

Generates JSON documents based on models defined in YAML and adds them to a Couchbase Bucket

License: MIT License

JavaScript 98.97% Makefile 1.03%

fakeit's People

Contributors

nodoherty mistersender agyryk kerrieclark831 brantburnett alexramey clupo ravirajalingam felipeauol muneebmaster ebroderick abhimanyug phanirajl osdamv facetdigital jns87

fakeit's Issues

Add local web server capability

It would be extremely helpful to create a web interface for the generated fake data.

The first call might have to be an init call specifying the modelset to use, then subsequent calls could provide the same fake data based on REST requests.

This enhancement could be as simple as plain GET requests for the data that was generated, or as complex as a fully capable CRUD manager where you can create new data objects (with the schema enforced by the model) and modify or delete existing ones.

It could alternatively integrate with couchbase after the init call, simply serving as a proxy for a couchbase DB spun up based on the provided dataset.

Javascript is painful to work with

I am making a lot of use of the custom javascript in just about every area it is possible to use it. It's been very challenging to work with for a few reasons:

no apparent way to console.log
no good/helpful error handling
comment support is intermittent at best. I have had some builds fail silently because a comment was in the JavaScript, or was on a separate line rather than tacked on to the end of one. Sometimes I get an error in the JavaScript that comes down to a comment, but there is no helpful information to figure this out other than removing lines until it stops breaking.
syntax is picky. Sometimes it's upset about semicolons existing, sometimes not. Sometimes it seems OK with ES6, sometimes not, but either way it fails awkwardly or doesn't fail at all.

Allow the number of generated documents to be specified from the command line

This should override the setting from the model.

Expose the document_index to the build functions

Provide the current document index to the build functions. This would be useful for generating a fixed number of documents from an input such as a CSV file, because a counter variable would no longer have to be created and attached to globals.
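
A minimal sketch of what this could look like, assuming build strings are compiled into functions whose arguments are named context values. The `runBuild` helper and the `rows` input are hypothetical and are not fakeit's actual internals.

```javascript
// Hypothetical helper: compile a build string into a function whose
// arguments are the keys of `context`, then call it with their values.
function runBuild(body, context) {
  const names = Object.keys(context);
  const fn = new Function(...names, body);
  return fn(...names.map((name) => context[name]));
}

// Illustrative rows standing in for a parsed CSV input.
const rows = [{ name: 'United States' }, { name: 'Canada' }];
const build = "return inputs.rows[document_index].name + '-' + document_index;";

// The index is passed in directly, so no counter has to live on globals.
const docs = rows.map((row, document_index) =>
  runBuild(build, { inputs: { rows }, document_index })
);
console.log(docs);
```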

Allow input to be a URL

It would be nice if the input could point to one or more URLs, where the data is downloaded, parsed, and made available as an input.

Add support to output generated data directly to the console

Better Error Messaging for Build Function Syntax errors

Not all error messages are being properly displayed, and when they are, it is very unclear what exactly is causing the error.

Make sure type:boolean is treated correctly

Make sure that booleans are treated as such. The default value should be false and the value should be correctly typed after generation
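
One way to enforce this is sketched below. The `toBoolean` helper is hypothetical, and accepting the strings 'true', '1', and 'yes' is an assumption rather than documented fakeit behaviour.

```javascript
// Coerce a generated value to a real boolean; missing values default to false.
function toBoolean(value) {
  if (value === undefined || value === null) return false;
  if (typeof value === 'string') {
    // Assumption: only these strings count as true; everything else is false.
    return ['true', '1', 'yes'].includes(value.trim().toLowerCase());
  }
  return Boolean(value);
}

console.log(toBoolean(undefined), toBoolean('false'), toBoolean('true'));
```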

Automatically resolve dependencies and inputs

Currently this is how it works.

fakeit \
  --models 'models/airlines.yaml,models/countries.yaml,models/regions.yaml,models/users.yaml,models/airline_reviews.yaml' \
  --input 'input/airlines.csv,input/countries.csv,input/regions.csv' \
  --exclude 'Countries,Regions,Users,Airlines'

Here's the table break down

Model	Other model dependencies	Required inputs	Output
`models/airlines.yaml`	`models/countries.yaml`	`input/airlines.csv`	❌
`models/countries.yaml`	-	`input/countries.csv`	❌
`models/regions.yaml`	-	`input/regions.csv`	❌
`models/users.yaml`	`models/regions.yaml`	-	❌
`models/airline_reviews.yaml`	`models/airlines.yaml`, `models/users.yaml`	-	✅

It's awesome that we can generate a crap ton of data easily, but not awesome that I as a developer have to remember each model that is ultimately required and each input that all those required models need to function correctly. This is something that should be done for the user without them having to do it.

If a file requires an input of data then it needs to be specified in the options for that model, and we should resolve it automagically for them.
Also if a model requires other models those should also be resolved automatically instead of having to pass in data.

Doing these two things will reduce the total amount of options you have to pass to generate data dramatically.

inputs
dependencies
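
The resolution described above could be sketched as a depth-first walk over per-model metadata. The `models` registry below mirrors the table in this issue. Its shape (`dependencies` and `inputs` arrays) and the `resolve` function are assumptions for illustration, not an existing fakeit API.

```javascript
// Hypothetical per-model metadata; in practice this would be read from each model file.
const models = {
  'models/airlines.yaml': { dependencies: ['models/countries.yaml'], inputs: ['input/airlines.csv'] },
  'models/countries.yaml': { dependencies: [], inputs: ['input/countries.csv'] },
  'models/regions.yaml': { dependencies: [], inputs: ['input/regions.csv'] },
  'models/users.yaml': { dependencies: ['models/regions.yaml'], inputs: [] },
  'models/airline_reviews.yaml': { dependencies: ['models/airlines.yaml', 'models/users.yaml'], inputs: [] },
};

// Returns the models needed for `entry`, ordered so that every dependency
// comes before the model that needs it, plus every input those models require.
function resolve(entry, registry) {
  const order = [];
  const inputs = new Set();
  const visiting = new Set();
  const done = new Set();

  function visit(name) {
    if (done.has(name)) return;
    if (visiting.has(name)) throw new Error(`Circular dependency at ${name}`);
    const model = registry[name];
    if (!model) throw new Error(`Unknown model ${name}`);
    visiting.add(name);
    model.dependencies.forEach((dependency) => visit(dependency));
    visiting.delete(name);
    done.add(name);
    model.inputs.forEach((file) => inputs.add(file));
    order.push(name);
  }

  visit(entry);
  return { order, inputs: [...inputs] };
}

const plan = resolve('models/airline_reviews.yaml', models);
console.log(plan.order);
console.log(plan.inputs);
```

With something like this in place, the user would only name `models/airline_reviews.yaml`, and every other model in `plan.order` could be generated but excluded from the output.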

Definitions within Definitions

It would be great to be able to use definitions from within other definitions. I have some extremely complex data models that would benefit from being able to break definitions down further, but it doesn't appear that one definition can be used from inside another. I attached a simple YAML file that demonstrates this: Email can be included from the primary document, but not from within the Contact definition (it returns null).

name: Test
type: object
key: _id
data:
  dependencies:
properties:
  _id:
    type: string
    data:
      build: "return 'test-' + chance.guid();"
  emails:
    type: array
    description: An array of emails for the user
    items:
      $ref: '#/definitions/Email'
      data:
        min: 1
        max: 3
  contact:
    type: array
    description: An array of contact info for the user
    items:
      $ref: '#/definitions/Contact'
      data:
        min: 1
        max: 3
  name:
    type: string
    data:
      value: "Some name"
definitions:
  Email:
    type: object
    properties:
      type:
        type: string
        description: The email type
        data:
          build: "return faker.random.arrayElement(globals.email_types);"
      email_address:
        type: string
        description: The email address
        data:
          build: "return faker.internet.email()"
      primary:
        type: boolean
        description: If the email address is the primary email address or not
        data:
          value: false
  Contact:
    type: object
    properties:
      address:
        type: "string"
        data:
          value: "123 test st"
      emails:
        type: array
        description: An array of emails for the user
        items:
          $ref: '#/definitions/Email'
          data:
            min: 1
            max: 3
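
A rough sketch of how nested references could be inlined, assuming every reference has the form `#/definitions/Name`. The `resolveRefs` function is hypothetical and works on the parsed model object; it is not how fakeit currently handles `$ref`.

```javascript
// Recursively inline '#/definitions/Name' references, including references
// that appear inside other definitions.
function resolveRefs(node, definitions, seen = []) {
  if (Array.isArray(node)) return node.map((item) => resolveRefs(item, definitions, seen));
  if (node === null || typeof node !== 'object') return node;

  let merged = node;
  let path = seen;
  if (typeof node.$ref === 'string') {
    const name = node.$ref.replace('#/definitions/', '');
    if (seen.includes(name)) throw new Error(`Circular $ref: ${name}`);
    if (!definitions[name]) throw new Error(`Unknown definition: ${name}`);
    const { $ref, ...siblings } = node;
    // Shallow merge: sibling keys such as the data block (min/max) override the definition.
    merged = { ...definitions[name], ...siblings };
    path = seen.concat(name);
  }

  const out = {};
  for (const [key, value] of Object.entries(merged)) {
    out[key] = resolveRefs(value, definitions, path);
  }
  return out;
}

// Trimmed-down version of the definitions from the model above.
const definitions = {
  Email: { type: 'object', properties: { email_address: { type: 'string' } } },
  Contact: {
    type: 'object',
    properties: {
      emails: { type: 'array', items: { $ref: '#/definitions/Email', data: { min: 1, max: 3 } } },
    },
  },
};

const contact = resolveRefs({ $ref: '#/definitions/Contact' }, definitions);
console.log(JSON.stringify(contact, null, 2));
```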

Add Continue on Error Option

Allow for key to also be built and not just referenced

The key can only be a reference to a property. It would be nice for it to also be definable by a build function.

Remove current_document from build functions and assign as `this`

post_build steps not being run in referenced array item object definitions

When defining array items by a referenced definition, the post_build step does not get executed. The following is an example with output:

Array Property Definition:

tags:
  type: array
  description: An array of tags describing the prospect and their interests
  items:
    $ref: '#/definitions/Tags'
    data:
      min: 0
      max: 10

Items Definition

Tags:
  type: string
  data:
    pre_build: "console.log('pre');"
    build: >
      console.log('build');
      return chance.word();
    post_build: "console.log('post');"

Run command
fakeit -m prospects.yaml -i input/ -n 5 -d output/

Output
Generating 5 documents for Prospects model pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build

Add support for an `--input` option to make data available to the generator

General cleanup

Remove several dependencies that should only be installed for development.
Update outdated packages
Switch to use babel-preset-latest to avoid having to install so many babel plugins.
Add support for yarn if it's available.

Add a `-t` option for database timeout settings

Add glob support

Add support for globs for models and inputs

Add Verbose Option

Only output the models being generated and their counts if verbose is set

Add a way to debug the project

Currently the project has // console.log('some message') throughout most of the functions. This isn't too useful, since you have to go through the code and uncomment these instances to see the messages.

We need to add the debug library to make it easy to debug problems that may come up.

data blocks not being run on definitions

If a referenced definition contains a pre_build/post_build/build block, the property containing the reference will generate no data. This can be seen in the contacts example as the emails array will always generate empty. Removing the data block and any dependent properties will result in data being generated via the properties block.

Create Documentation

Need to create documentation on what the generator does and how to use it

Restructure the way the JS is being run

This will allow for unit testing on every file in this app.

It will also update the way the data is being generated to make it a little bit easier to track down bugs and add new features by not using so many globals and separating out variables that are specific to the different functionalities of this app.

Add support for Elasticsearch

Allow Elasticsearch to be a destination

Remove current_value from build functions

Generate Models based on Bucket Analysis from Couchbase

Add New Version Available Alert

CSV Input is not auto-detecting columns

When using CSV as an input, it should auto-detect the headers and use them to create an array of objects.
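
The expected behaviour can be illustrated with a deliberately naive parser. The `csvToObjects` function is hypothetical and assumes no quoted fields, embedded commas, or embedded newlines; a real implementation would rely on a proper CSV library.

```javascript
// Naive CSV parsing: the first row supplies the keys for every later row.
function csvToObjects(text) {
  const rows = text.split(/\r?\n/).filter((line) => line.trim() !== '');
  if (rows.length === 0) return [];
  const headers = rows[0].split(',').map((header) => header.trim());
  return rows.slice(1).map((line) => {
    const cells = line.split(',');
    const record = {};
    headers.forEach((header, i) => {
      // Missing trailing cells become empty strings.
      record[header] = (cells[i] || '').trim();
    });
    return record;
  });
}

const countries = csvToObjects('code,name\nUS,United States\nCA,Canada\n');
console.log(countries);
```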

Add ability for models to be generated but excluded from output

Add documentation

The documentation will be done using the docs library.

The following files need to be documented

We should also create a wiki here or, better yet, build a site where people can see the output as they edit a file.

Use helper libraries to reduce the amount of redundant code

There are several instances of code that could be reduced by using libraries that already take care of these things. This would make the code more readable and mean we don't have to write as many unit tests.

fs-extra-promisify
This will handle reading and writing files, and ensuring directories exist. Which would remove the need for several different util functions spread out through the different files.

async-array-methods
This will remove all instances of code like this

let promises = []
for (let item of items) {
 promises.push(somePromiseFunction(item))
}
promises = await Promise.all(promises)

and change it to be

import { map } from 'async-array-methods'
...
items = await map(items, somePromiseFunction)

es6-promisify
This will reduce the amount of code like the following, which just converts a callback-style function into a promise-style function.

import cson from 'cson';

// parses a cson string
function parseCson(content) {
  return new Promise((resolve, reject) => {
    // console.log('input.load_cson_file');
    cson.parse(content, (err, result) => {
      if (err) {
        reject(err);
      } else {
        resolve(result);
      }
    });
  });
}

Would change to be used like this.

import promisify from 'es6-promisify'
import cson from 'cson';
cson.parse = promisify(cson.parse)

to-js
This would provide better type checking and conversion of variables, because typeof [] returns 'object' rather than 'array'.
lodash
This provides an easy way to create nested objects and get nested items from an object using _.get and _.set as well as other useful functions.
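
To illustrate the type-checking problem mentioned under to-js, here is a small stand-alone sketch. The `typeOf` helper is hypothetical and is not the to-js API.

```javascript
// typeof reports 'object' for both arrays and null, so check those cases first.
function typeOf(value) {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  return typeof value;
}

console.log(typeof [], typeOf([]));
console.log(typeof null, typeOf(null));
```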

Create Output Destination Directories if they do not exist

Ensure model defaults are set

For the `-n` argument only apply that number to models that are not being excluded

The -n argument should not apply to models that are being excluded.

fakeit -m models/users.yaml,models/regions.yaml,models/countries.yaml -e Regions,Countries -n 1

This should generate however many Countries and Regions documents their models specify, and just 1 Users document.
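
The intended rule could be sketched as follows. The `documentCounts` function and the per-model default counts are made up for illustration.

```javascript
// models: [{ name, count }] where count is the model's own default.
// Returns how many documents to generate for each model.
function documentCounts(models, excluded, n) {
  const skip = new Set(excluded);
  const counts = {};
  for (const model of models) {
    // -n only overrides models that will actually be output.
    const overridden = n !== undefined && !skip.has(model.name);
    counts[model.name] = overridden ? n : model.count;
  }
  return counts;
}

const counts = documentCounts(
  [
    { name: 'Users', count: 100 },
    { name: 'Regions', count: 50 },
    { name: 'Countries', count: 250 },
  ],
  ['Regions', 'Countries'],
  1
);
console.log(counts);
```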

Add support for cson output

Add a loader for the cli

This will display the total documents that have been created and how many are left. There are several different ways of going about this so we just need to choose 1.

Make sure that generated CSV files can be archived

Add support for Output Destination Directory

Introduce a -d [path] option to specify an output destination directory. If it is not specified, the current working directory is used.
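
A small sketch of the fallback, which also creates the directory when it is missing. The `resolveOutputDir` function is hypothetical, and the example writes to a scratch folder under the system temp directory.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Resolve the -d value, falling back to the current working directory,
// and make sure the directory exists before anything is written to it.
function resolveOutputDir(destination) {
  const dir = path.resolve(destination || process.cwd());
  fs.mkdirSync(dir, { recursive: true }); // does nothing if the directory already exists
  return dir;
}

// Scratch location used only for this example.
const target = path.join(os.tmpdir(), 'fakeit-example', 'output');
console.log(resolveOutputDir(target));
console.log(resolveOutputDir());
```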

Allow to be called outside of command-line

Generate CSV files based on Model name

In many instances there are multiple models, and with the output of CSV each model should be represented by its own CSV file

Look into allowing models to be written in other filetypes

At least add js as an output type which will make it easier to write javascript functions.

Handle case when the `--models` option is a directory and not a comma-delimited list of files

Add unit tests, and continuous integration

Since this library has grown to accommodate several different functionalities, it needs unit tests as well as comparison tests. Comparison tests ensure that things are output correctly, but they will be a little tricky since we're dealing with random data, so some sort of testing helper will need to be created for them.

This will be done using AVA and ava-spec.
The continuous integration will be done with Travis CI.
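
One possible shape for such a testing helper is to compare the structure of a generated document instead of its random values. The `shapeErrors` function and its shape notation are assumptions for illustration; the helper is plain JavaScript and could be called from any test runner.

```javascript
// Returns a list of mismatches between a document and an expected shape.
// A shape maps each key to 'string', 'number', 'boolean', or 'array',
// or to a nested shape object for nested documents.
function shapeErrors(doc, shape, prefix = '') {
  const errors = [];
  for (const [key, expected] of Object.entries(shape)) {
    const value = doc == null ? undefined : doc[key];
    const where = prefix + key;
    if (typeof expected === 'object') {
      if (value === null || typeof value !== 'object' || Array.isArray(value)) {
        errors.push(`${where}: expected object`);
      } else {
        errors.push(...shapeErrors(value, expected, `${where}.`));
      }
    } else {
      const actual = Array.isArray(value) ? 'array' : typeof value;
      if (actual !== expected) errors.push(`${where}: expected ${expected}, got ${actual}`);
    }
  }
  return errors;
}

// The values are random on every run, but the shape should stay the same.
const generated = { _id: 'test-' + Math.random(), emails: [], contact: { address: '123 test st' } };
const expectedShape = { _id: 'string', emails: 'array', contact: { address: 'string' } };
console.log(shapeErrors(generated, expectedShape));
```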