bentonam / fakeit
Generates JSON documents based on models defined in YAML and adds them to a Couchbase Bucket
License: MIT License
It would be extremely helpful to create a web interface for the generated fake data.
The first call might have to be an init call specifying the modelset to use, then subsequent calls could provide the same fake data based on REST requests.
This enhancement could be as simple as GET requests for the data that was generated, or as complex as a fully capable CRUD manager where you can create new data objects (with the schema enforced by the model) and modify or delete them as well.
Alternatively, it could integrate with Couchbase after the init call, simply serving as a proxy for a Couchbase DB spun up from the provided dataset.
I am making a lot of use of the custom javascript in just about every area it is possible to use it. It's been very challenging to work with for a few reasons:
console.log
This should override the setting from the model.
Provide the current document index to the build functions. This would be useful for generating a fixed number of documents from an input such as CSV, since a counter variable would no longer have to be created and attached to globals.
It would be nice if the input could point to one or more URLs, where the data is downloaded, parsed, and made available as an input.
Not all error messages are being properly displayed, and when they are it is very ambiguous as to what exactly is causing the error
Make sure that booleans are treated as such. The default value should be false, and the value should be correctly typed after generation.
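A minimal sketch of the boolean handling described above. The `toBoolean` helper and the set of strings it accepts as true are assumptions made for illustration, not fakeit's actual behavior:

```javascript
// Hypothetical helper: coerce a generated value to a real boolean.
// Missing values fall back to the default of false.
function toBoolean(value) {
  if (value === undefined || value === null) return false;
  if (typeof value === 'string') {
    // Assumed list of truthy strings; anything else is treated as false.
    return ['true', '1', 'yes'].includes(value.trim().toLowerCase());
  }
  return Boolean(value);
}

console.log(toBoolean(undefined));
console.log(toBoolean('false'));
console.log(toBoolean('true'));
```

The point of the string branch is that `Boolean('false')` is `true` in JavaScript, which is one way a value can end up incorrectly typed after generation.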
Currently this is how it works.
fakeit \
--models 'models/airlines.yaml,models/countries.yaml,models/regions.yaml,models/users.yaml,models/airline_reviews.yaml' \
--input 'input/airlines.csv,input/countries.csv,input/regions.csv' \
--exclude 'Countries,Regions,Users,Airlines'
Here's the table breakdown:
Model | Other model dependencies | Required inputs | Output |
---|---|---|---|
models/airlines.yaml | models/countries.yaml | input/airlines.csv | ❌ |
models/countries.yaml | - | input/countries.csv | ❌ |
models/regions.yaml | - | input/regions.csv | ❌ |
models/users.yaml | models/regions.yaml | - | ❌ |
models/airline_reviews.yaml | models/airlines.yaml, models/users.yaml | - | ✅ |
It's awesome that we can generate a crap ton of data easily, but not awesome that I as a developer have to remember each model that is ultimately required and each input that all those required models need to function correctly. This is something that should be done for the user without them having to do it.
If a file requires an input of data then it needs to be specified in the options for that model, and we should resolve it automagically for them.
Also if a model requires other models those should also be resolved automatically instead of having to pass in data.
Doing these two things will dramatically reduce the number of options you have to pass to generate data.
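Resolving models and inputs automatically amounts to a dependency-first walk over the models. A minimal sketch, assuming each model declares its own dependencies and inputs; the `models` registry and its `dependencies`/`inputs` fields are hypothetical shapes chosen for illustration, not fakeit's internals:

```javascript
// Hypothetical registry: each model lists the models and inputs it needs.
const models = {
  'models/airlines.yaml': { dependencies: ['models/countries.yaml'], inputs: ['input/airlines.csv'] },
  'models/countries.yaml': { dependencies: [], inputs: ['input/countries.csv'] },
  'models/regions.yaml': { dependencies: [], inputs: ['input/regions.csv'] },
  'models/users.yaml': { dependencies: ['models/regions.yaml'], inputs: [] },
  'models/airline_reviews.yaml': { dependencies: ['models/airlines.yaml', 'models/users.yaml'], inputs: [] },
};

// Depth-first walk that returns every model needed by `target`,
// ordered so that dependencies come before the models that use them.
function resolve(target, registry, ordered = [], visiting = new Set()) {
  if (ordered.includes(target)) return ordered;
  if (visiting.has(target)) throw new Error(`Circular dependency at ${target}`);
  if (!registry[target]) throw new Error(`Unknown model ${target}`);
  visiting.add(target);
  for (const dep of registry[target].dependencies) {
    resolve(dep, registry, ordered, visiting);
  }
  visiting.delete(target);
  ordered.push(target);
  return ordered;
}

// Collect the de-duplicated inputs required by a resolved list of models.
function collectInputs(ordered, registry) {
  return [...new Set(ordered.flatMap((name) => registry[name].inputs))];
}

const order = resolve('models/airline_reviews.yaml', models);
const inputs = collectInputs(order, models);
console.log(order);
console.log(inputs);
```

With something like this in place, the user would only need to name `models/airline_reviews.yaml`; the other four models and three inputs follow from the declarations.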
It would be great to be able to use definitions from within other definitions. I have some extremely complex data models that would benefit from being able to break definitions down more, but it doesn't appear that one can use a definition from inside of another. I attached a simple YAML file that demonstrates this: email can be included from the primary document, but not from within the contacts definition (returns null).
name: Test
type: object
key: _id
data:
  dependencies:
properties:
  _id:
    type: string
    data:
      build: "return 'test-' + chance.guid();"
  emails:
    type: array
    description: An array of emails for the user
    items:
      $ref: '#/definitions/Email'
      data:
        min: 1
        max: 3
  contact:
    type: array
    description: An array of contact info for the user
    items:
      $ref: '#/definitions/Contact'
      data:
        min: 1
        max: 3
  name:
    type: string
    data:
      value: "Some name"
definitions:
  Email:
    type: object
    properties:
      type:
        type: string
        description: The phone type
        data:
          build: "return faker.random.arrayElement(globals.email_types);"
      email_address:
        type: string
        description: The email address
        data:
          build: "return faker.internet.email()"
      primary:
        type: boolean
        description: If the email address is the primary email address or not
        data:
          value: false
  Contact:
    type: object
    properties:
      address:
        type: "string"
        data:
          value: "123 test st"
      emails:
        type: array
        description: An array of emails for the user
        items:
          $ref: '#/definitions/Email'
          data:
            min: 1
            max: 3
The key can only be a reference to a property; it would be nice for it to be definable by a build function.
When defining array items by a referenced definition, the post_build step does not get executed. The following is an example with output:
Array Property Definition:
tags:
  type: array
  description: An array of tags describing the prospect and their interests
  items:
    $ref: '#/definitions/Tags'
    data:
      min: 0
      max: 10
Items Definition:
Tags:
  type: string
  data:
    pre_build: "console.log('pre');"
    build: >
      console.log('build');
      return chance.word();
    post_build: "console.log('post');"
Run command
fakeit -m prospects.yaml -i input/ -n 5 -d output/
Output
Generating 5 documents for Prospects model pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build pre build
Use babel-preset-latest to avoid having to install so many babel plugins.
Add support for globs for models and inputs
Only output the models being generated and their counts if verbose is set
Currently the project has // console.log('some message') throughout most of the functions. This isn't too useful, since you have to go through the code and uncomment these instances to see the messages.
We need to add the debug library to make it easy to debug problems that may come up.
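As a rough illustration of the idea, the sketch below uses Node's built-in `util.debuglog` as a stand-in so that it runs without installing anything; it is not the `debug` package itself, which follows a similar pattern (`require('debug')('fakeit')`, enabled through the `DEBUG` environment variable). The `loadCsonFile` function and the `fakeit` namespace are placeholders chosen for the example:

```javascript
const util = require('util');

// Messages are written to stderr only when NODE_DEBUG includes "fakeit",
// e.g. NODE_DEBUG=fakeit node example.js
const debug = util.debuglog('fakeit');

// Hypothetical loader, used only to show where a commented-out
// console.log call would be replaced by a debug call.
function loadCsonFile(path) {
  debug('input.load_cson_file %s', path);
  return { path };
}

const loaded = loadCsonFile('input/example.cson');
console.log(loaded.path);
```

The messages stay in the code permanently and are switched on from the environment, so nothing has to be uncommented to trace a problem.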
If a referenced definition contains a pre_build/post_build/build block, the property containing the reference will generate no data. This can be seen in the contacts example as the emails array will always generate empty. Removing the data block and any dependent properties will result in data being generated via the properties block.
Need to create documentation on what the generator does and how to use it
This will allow for unit testing on every file in this app.
It will also update the way the data is being generated to make it a little bit easier to track down bugs and add new features by not using so many globals and separating out variables that are specific to the different functionalities of this app.
Allow Elasticsearch to be a destination
When using CSV as an input, it should auto-detect the headers to create an array of objects.
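A minimal sketch of header detection, assuming the first row always holds the column names. The `csvToObjects` helper is hypothetical and deliberately simple: it does not handle quoted fields or embedded commas, which a real CSV parser would.

```javascript
// Treat the first non-empty row as headers and return one object per
// remaining row, keyed by those headers.
function csvToObjects(text) {
  const rows = text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== '')
    .map((line) => line.split(',').map((cell) => cell.trim()));
  const [headers, ...records] = rows;
  if (!headers) return [];
  return records.map((cells) =>
    // Missing trailing cells become empty strings.
    Object.fromEntries(headers.map((header, i) => [header, cells[i] ?? '']))
  );
}

const countries = csvToObjects('code,name\nUS,United States\nCA,Canada\n');
console.log(countries);
```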
The documentation will be done using the docs library.
The following files need to be documented
We should also create a wiki here or, better yet, build a site where people can see the output as they edit a file.
There are several instances of code that could be reduced by using other libraries that already take care of these things. This would make the code more readable and mean we don't have to write as many unit tests.
fs-extra-promisify
This will handle reading and writing files and ensuring directories exist, which would remove the need for several different util functions spread throughout the different files.
async-array-methods
This will remove all instances of code like this
let promises = []
for (let item of items) {
  promises.push(somePromiseFunction(item))
}
promises = await Promise.all(promises)
and change it to be
import { map } from 'async-array-methods'
...
items = await map(items, somePromiseFunction)
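For comparison, the same loop can be collapsed with only built-ins, using `Array.prototype.map` together with `Promise.all`. This is a dependency-free sketch rather than the async-array-methods API; `mapAsync` and `double` are names made up for the example:

```javascript
// Hypothetical async worker standing in for somePromiseFunction.
const double = async (n) => n * 2;

// Start every call, then wait for all of them; results keep input order.
function mapAsync(items, fn) {
  return Promise.all(items.map((item) => fn(item)));
}

const doubled = mapAsync([1, 2, 3], double);
doubled.then((result) => console.log(result));
```

All calls run concurrently here; if any one rejects, the combined promise rejects as well.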
es6-promisify
This will reduce the amount of code like the following, which does nothing but convert a callback-style function into a promise-style function.
import cson from 'cson';
// parses a cson string
function parseCson(content) {
  return new Promise((resolve, reject) => {
    // console.log('input.load_cson_file');
    cson.parse(content, (err, result) => {
      if (err) {
        reject(err);
      } else {
        resolve(result);
      }
    });
  });
}
It would change to be used like this:
import promisify from 'es6-promisify'
import cson from 'cson';
cson.parse = promisify(cson.parse)
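The same conversion can be tried with Node's built-in `util.promisify`, which wraps any function whose last argument is an `(err, result)` callback. The sketch below is an illustration with a made-up callback-style parser (`parseCallback`) in place of `cson.parse`, so that it runs without extra packages:

```javascript
const { promisify } = require('util');

// Hypothetical callback-style parser standing in for cson.parse.
function parseCallback(content, callback) {
  let result;
  try {
    result = JSON.parse(content);
  } catch (err) {
    return callback(err);
  }
  callback(null, result);
}

// The wrapped function returns a promise instead of taking a callback.
const parse = promisify(parseCallback);

const parsed = parse('{"name": "fakeit"}');
parsed.then((result) => console.log(result.name));
```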
to-js
This would provide better type checking and conversion of variables, because typeof [] won't return 'array'; it will return 'object'.
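The `typeof` behavior mentioned above is easy to check. The `typeOf` helper below is a hypothetical stand-in written for this example, not the to-js API:

```javascript
// typeof reports 'object' for both arrays and null, so a small helper
// is needed to tell them apart.
function typeOf(value) {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  return typeof value;
}

console.log(typeof []);  // 'object'
console.log(typeOf([])); // 'array'
```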
lodash
This provides an easy way to create nested objects and get nested items from an object using _.get and _.set, as well as other useful functions.
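To show what path-based access buys, here is a simplified sketch of the idea behind `_.get` and `_.set`. The `getPath` and `setPath` helpers are stand-ins written for this example; they only understand dot-separated paths, whereas lodash accepts richer path syntax:

```javascript
// Read a nested value, returning defaultValue when any step is missing.
function getPath(object, path, defaultValue) {
  let current = object;
  for (const key of path.split('.')) {
    if (current === null || typeof current !== 'object' || !(key in current)) {
      return defaultValue;
    }
    current = current[key];
  }
  return current === undefined ? defaultValue : current;
}

// Write a nested value, creating intermediate objects as needed.
function setPath(object, path, value) {
  const keys = path.split('.');
  const last = keys.pop();
  let current = object;
  for (const key of keys) {
    if (current[key] === null || typeof current[key] !== 'object') {
      current[key] = {};
    }
    current = current[key];
  }
  current[last] = value;
  return object;
}

const doc = setPath({}, 'address.geo.lat', 51.5);
console.log(getPath(doc, 'address.geo.lat'));
console.log(getPath(doc, 'address.zip', 'n/a'));
```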
The -n argument should not apply to models that are being excluded.
fakeit -m models/users.yaml,models/regions.yaml,models/countries.yaml -e Regions,Countries -n 1
This should generate the countries and regions in whatever number their own models specify, and just 1 user document.
This will display the total documents that have been created and how many are left. There are several different ways of going about this so we just need to choose 1.
Introduce a -d [path] option to specify an output destination directory; if it is not specified, the current working directory is used.
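The fallback to the current working directory could look roughly like this. `resolveDestination` is a hypothetical helper, not part of fakeit; it takes the working directory as a parameter so the behavior is easy to check:

```javascript
const path = require('path');

// Resolve the output directory: use the -d value when given,
// otherwise fall back to the current working directory.
function resolveDestination(destination, cwd = process.cwd()) {
  return destination ? path.resolve(cwd, destination) : cwd;
}

console.log(resolveDestination(undefined));
console.log(resolveDestination('output/'));
```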
In many instances there are multiple models, and with the output of CSV each model should be represented by its own CSV file
At least add js as an output type, which will make it easier to write JavaScript functions.
Since this library has grown to accommodate several different functionalities, there need to be unit tests as well as comparison tests. Comparison tests ensure that things are output correctly, but this will be a little tricky since we're dealing with random data, so some sort of testing helper will need to be created for them.
This will be done using ava and ava-spec.
The continuous integration will be done with Travis CI.
The type: value should be optional for properties
It would be extremely valuable to allow generated data to be sent directly to a Couchbase instance, instead of just generating files or archives.
I have several locations, mostly in the pre-build and post-build where it would be great to be able to reference an external js file containing my javascript logic, instead of placing it inline.