
data-point's People

Contributors

abepeterkin, acatl, arik223, csarnataro, dmeowmixer, micheleriva, moioilia, ozipi, paulmolluzzo, raingerber, ramzesthesecond, satboy78, simentesempre, vision87, wendelladriel


data-point's Issues

WIP accumulator.resolve api

SPEC IS WORK IN PROGRESS

This feature would allow a reducer to resolve a transform at different levels. This means a transformation can be prematurely exited with a given value or it can be resolved to a new entity id.

API:

Resolve the current Transform with:

  • given value: accumulator.resolveTransformWith(value:*)
  • given entityId: accumulator.resolveTransformTo(entityId:string)

Implementation examples:

Using accumulator.resolveTransformWith to resolve a transform with a given value.

dataPoint.transform([
    () => 'foo',
    (acc) => acc.resolveTransformWith('bar'),
    (acc) => 'will not execute'
  ])
  .then(acc => console.log(acc.value))
  // should output: 'bar'

Using accumulator.resolveTransformTo to resolve a transform to another transform.

dataPoint.addEntities({
  'transform:foo': () => 'foo',
  'transform:bar': () => 'bar'
})

dataPoint.transform([
    () => 'useFoo',
    (acc) => {
      acc.value === 'useFoo'
        ? acc.resolveTransformTo('transform:foo')
        : acc.resolveTransformTo('transform:bar')
    }
  ])
  .then(acc => console.log(acc.value))
  // should output: 'foo'

Add shorthand for pass through values in reducer objects

Problem description:

Sometimes in reducer objects, we want to pass a value from the input object directly to the output object. Setting the value to true could be a shorthand for doing this:

const reducer = {
  a: {
    x: true, // shorthand for writing '$a.x'
    y: '$a.x',
    z: true, // shorthand for writing '$a.z'
  }
}

const input = {
  a: {
    x: 'X',
    y: 'Y',
    z: 'Z'
  }
}

dataPoint.transform(reducer, input)

// result

{
  a: {
    x: 'X',
    y: 'X',
    z: 'Z',
  }
}

This would be easier to read, especially with deeply nested properties.
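To illustrate the proposal, here is a hypothetical helper (not part of data-point; the name expandShorthand is made up) that shows how `true` keys could be expanded into path reducers before compiling:

```javascript
// Hypothetical pre-processing step: any key set to `true` becomes a
// path reducer pointing at the same location in the input object.
const expandShorthand = (reducer, path = '') =>
  Object.fromEntries(
    Object.entries(reducer).map(([key, value]) => {
      const fullPath = path ? `${path}.${key}` : key
      if (value === true) return [key, `$${fullPath}`]
      if (value && typeof value === 'object') {
        return [key, expandShorthand(value, fullPath)]
      }
      return [key, value]
    })
  )

console.log(expandShorthand({ a: { x: true, y: '$a.x', z: true } }))
// { a: { x: '$a.x', y: '$a.x', z: '$a.z' } }
```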

Improve the regex for detecting entity ids

The regular expression in this file could be improved:

_.isString(source) && source.match(/^[^$]([\w.]+):?([\w.]+)(\[])?/) !== null

It should match the pattern <EntityType>:<EntityId>, but these examples are false positives: aaa, :aaa, and a:. The regex also fails to match some valid patterns, like a:a.

Some entity names in this file also use hyphens, which suggests they should be valid, but the regex does not support them:

'transform:my-test-query': acc => ({

For example, the name transform:my-test-query is valid, but the current regex treats it as transform:my because it stops at the first hyphen.
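A possible stricter pattern (an assumption for discussion, not the final implementation) would require a non-empty type and name, allow hyphens, and anchor the match:

```javascript
// <EntityType>:<EntityId>, optionally followed by [] for collection mapping.
// Type must start with a letter; name allows word chars, dots, and hyphens.
const ENTITY_ID = /^([a-zA-Z][\w-]*):([\w.-]+)(\[\])?$/

const isEntityId = source =>
  typeof source === 'string' && ENTITY_ID.test(source)

console.log(isEntityId('transform:my-test-query')) // true
console.log(isEntityId('a:a')) // true
console.log(isEntityId('hash:Foo[]')) // true
console.log(isEntityId('aaa')) // false
console.log(isEntityId(':aaa')) // false
console.log(isEntityId('a:')) // false
```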

PathReducer mapping

This would be a new feature. The idea is to reuse the pattern used for entity mapping, which is to add [] at the end of an entity id:

hash:Foo[]

This feature would provide the same mapping behavior but using a ReducerPath:

$some.path[]

By adding [] at the end of a ReducerPath, the developer should expect the path to be mapped over each element of the input array.

const input = [
   { a: { b: { c: 'foo' }}},
   { a: { b: { c: 'bar' }}},
]
dataPoint.transform('$a.b.c[]', input)
  .then( acc => {
    // ['foo', 'bar']
    console.log(acc.value) 
  })
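The proposed behavior can be sketched in plain JavaScript (this is an illustration of the semantics, not data-point's implementation; resolvePathMapped is a hypothetical name):

```javascript
// Map a dot-delimited path over each element of an input array.
const resolvePathMapped = (path, input) =>
  input.map(item =>
    path.split('.').reduce((value, key) => value && value[key], item)
  )

const input = [
  { a: { b: { c: 'foo' } } },
  { a: { b: { c: 'bar' } } }
]

console.log(resolvePathMapped('a.b.c', input)) // [ 'foo', 'bar' ]
```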

Make TransformExpressions a type of reducer

Problem description:

TransformExpressions contain reducers, and some reducers contain TransformExpressions (which contain more reducers and so on). The API would be easier to describe if we just have reducers, which may or may not contain other reducers. Then everything is either an entity or a reducer, with no TransformExpression middleman. This would also simplify the code a bit, and entities could use single reducers without needing to wrap them in TransformExpressions.

Suggested solution:

Replace TransformExpressions with ReducerLists.

Do not add empty lifecycle transforms to entities

Problem description:

Entities are given before, value, after, and error transforms even when they're empty. This is unnecessary and adds clutter when trying to inspect the entity definitions.

// example input:
{
  'entry:example': {
    value: '$'
  }
}

// turns into this when passed to createEntity:
{
  id: 'entry:example',
  value: {
    reducers: [
      {
        type: 'ReducerPath',
        name: '',
        asCollection: false
      }
    ],
    typeOf: 'TransformExpression'
  },
  before: {
    reducers: [],
    typeOf: 'TransformExpression'
  },
  error: {
    reducers: [],
    typeOf: 'TransformExpression'
  },
  after: {
    reducers: [],
    typeOf: 'TransformExpression'
  },
  params: {}
}

Suggested solution:

Do not add empty transforms to entity definitions.
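A minimal sketch of such a post-processing step (the helper name and the exact shape of the check are assumptions based on the example entity above):

```javascript
// Drop any lifecycle transform whose reducer list is empty.
const compactEntity = entity =>
  Object.fromEntries(
    Object.entries(entity).filter(
      ([, value]) =>
        !(
          value &&
          value.typeOf === 'TransformExpression' &&
          value.reducers.length === 0
        )
    )
  )

const entity = {
  id: 'entry:example',
  value: { reducers: [{ type: 'ReducerPath' }], typeOf: 'TransformExpression' },
  before: { reducers: [], typeOf: 'TransformExpression' },
  error: { reducers: [], typeOf: 'TransformExpression' },
  after: { reducers: [], typeOf: 'TransformExpression' },
  params: {}
}

console.log(Object.keys(compactEntity(entity))) // [ 'id', 'value', 'params' ]
```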

Replace the TransformObject type with TransformMap

TransformMaps, which are used by hash entities, are objects where each value is a TransformExpression.

TransformObjects, which are used by request entities, are objects where keys that start with $ refer to TransformExpressions, but other keys refer to constants.

This is inconsistent, so we should combine these types to make a single ReducerObject type (this is also related to #53).

In this proposed type, keys that start with $ refer to constants, but other keys refer to TransformExpressions.

{
  'hash:example': {
    mapKeys: {
      a: 'hash:other', // resolves to the 'hash:other' entity
      $a: 'hash:other', // resolves to the string 'hash:other'
    }
  }
}

This type would be used by hash and request entities, and it would mean that addValues could be removed from the hash API, because addKeys would have the same functionality.

// in this example, addKeys and addValues are doing the same thing

{
  'hash:example': {
    addKeys: {
      $a: 'hardcoded string'
    },
    addValues: {
      a: 'hardcoded string'
    }
  }
}

Fix addEntityTypes method

The readme documents addEntityTypes like this:

dataPoint.addEntityTypes(specs:Object)

...but it's actually used like this:

manager.addEntityTypes('transform', EntityTransform)

The readme also incorrectly states that the method returns a promise. Either the readme or the code should be updated for accuracy.

encourage users to list reducers in execution order

To make users more cognizant of reducers' execution order, encourage them to list the reducers in execution order. For example, given this code:

const input = {
  a: {
    text: 'Hello World'
  }
}

dataPoint.addEntities({
  'hash:example': {
    addValues: {
      b: { text: 'See You Later' }
    },
    mapKeys: {
      a: '$a.text',
      b: '$b.text',
    },
  }
})


dataPoint.transform('hash:example', input)

A user might expect { a: 'Hello World', b: 'See You Later' }, but instead they'll get { a: 'Hello World', b: { text: 'See You Later' } }, because the modifiers do not execute in the order they are declared (mapKeys runs before addValues here).

Therefore, a warning should be given when reducers aren't listed in execution order.
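Such a warning could be sketched like this (the EXECUTION_ORDER array here is an assumption for illustration, inferred from the example above, not data-point's documented order):

```javascript
// Assumed execution order of hash modifiers, for illustration only.
const EXECUTION_ORDER = ['omitKeys', 'pickKeys', 'mapKeys', 'addKeys', 'addValues']

// Returns true (and warns) when the declared order differs from execution order.
const warnIfOutOfOrder = spec => {
  const declared = Object.keys(spec).filter(key => EXECUTION_ORDER.includes(key))
  const sorted = [...declared].sort(
    (a, b) => EXECUTION_ORDER.indexOf(a) - EXECUTION_ORDER.indexOf(b)
  )
  const outOfOrder = declared.join(',') !== sorted.join(',')
  if (outOfOrder) {
    console.warn(
      `hash modifiers are declared as [${declared}] but execute as [${sorted}]`
    )
  }
  return outOfOrder
}

warnIfOutOfOrder({ addValues: {}, mapKeys: {} }) // warns: mapKeys executes first
```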

Remove "Chained Reducers" section from readme

Problem description:

The readme has a section on "Chained Reducers," but that's just another term for a Transform Expression, which is described in a different part of the readme (both terms refer to an array of reducers).

Suggested solution:

Remove the "chained reducers" section and let the transform expressions shine.

Support anonymous entities

This feature introduces anonymous entities, which are just entities defined without names. Entities that do have names are referred to as declared entities.

API

DataPoint will expose a factory function for each registered entity type:

const dataPoint = require('data-point')

const factories = dataPoint.entityFactories

// {
//   entry: [Function: create],
//   request: [Function: create],
//   hash: [Function: create],
//   ...
// }

const requestEntity = factories.request({
  url: 'https://api.github.com/orgs/nodejs'
})

dataPoint.transform(requestEntity)

Example

This example shows the same transform with declared entities and anonymous entities:

declared:

dataPoint.addEntities({
  'request:getOrgInfo': {
    url: 'https://api.github.com/orgs/nodejs'
  },
  'hash:OrgInfo': {
    mapKeys: {
      reposUrl: '$repos_url',
      eventsUrl: '$events_url'
    }
  }
})

dataPoint.transform('request:getOrgInfo | hash:OrgInfo')

anonymous:

const { request, hash } = dataPoint.entityFactories

dataPoint.transform([
  request({
    url: 'https://api.github.com/orgs/nodejs'
  }),

  hash({
    mapKeys: {
      reposUrl: '$repos_url',
      eventsUrl: '$events_url'
    }
  })
])

These are the main benefits I see from anonymous entities:

  • No pressure to think of good names for everything 😄
  • Currently, entities can reference any other entity, but anonymous entities are not declared in the "global" scope, so there's no question about where they're being used
  • More readable, because anonymous entities keep more code in the place where it's being used
  • At the same time, entities can still be shared between files. Example:
// data-hash.js

const { hash } = require('data-point').entityFactories

module.exports = hash({
  mapKeys: {
    reposUrl: '$repos_url',
    eventsUrl: '$events_url'
  }
})

// main.js

const { request } = require('data-point').entityFactories
const dataHash = require('./data-hash')

dataPoint.transform([
  request({
    url: 'https://api.github.com/orgs/nodejs'
  }),
  dataHash
])

Any thoughts?

Use ESNext pipeline operator instead of single pipe

Use |> instead of | to pipe reducers in TransformExpressions.

This would align with the TC39 pipeline operator proposal (https://github.com/tc39/proposal-pipeline-operator), which would make the feature's behavior familiar to new users of the library. It also looks beautiful 💅 if you have a code font that uses ligatures.

The feature can be backward compatible: version 1.x would support both | and |>, and the single pipe would eventually be removed in favor of the new operator.

(The original issue included screenshots comparing the operator rendered in Menlo, which has no ligatures, and in Fira Code, where |> renders as a single ligature.)

Store id on reducer entities

Problem description:

We define ReducerEntities with the <entityType>:<entityName> pattern, but we have to reconstruct that ID when fetching entity definitions:

manager.entities.get(`${reducer.entityType}:${reducer.name}`)

Suggested solution:

Store the full <entityType>:<entityName> ID on reducer entities.

Reorganize the code for reducers

Problem description:

Each reducer type has a factory file and a resolve file, but they're in different directories. This makes it a little more difficult to find the code you need when updating a specific reducer.

Suggested solution:

The code for each reducer type should be kept in a single directory.

Warn users if reducer doesn't exist for entity

If a user misspells a reducer or tries to use a reducer that doesn't exist for that entity, give them a warning like:

error: maaapKeys is not a valid reducer for the hash entity, maybe try using: mapKeys, addKeys...

or

error: mapKeys is not a valid reducer for collection entity, maybe try using: map reducer or hash entity  
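A sketch of the proposed check (the modifier names and the helper name are assumptions for illustration):

```javascript
// Throw a descriptive error when a spec uses an unknown modifier key.
const checkModifiers = (entityType, spec, validKeys) => {
  for (const key of Object.keys(spec)) {
    if (!validKeys.includes(key)) {
      throw new Error(
        `${key} is not a valid reducer for the ${entityType} entity, ` +
          `maybe try using: ${validKeys.join(', ')}`
      )
    }
  }
}

// Assumed list of valid hash modifiers, for illustration.
const HASH_MODIFIERS = ['mapKeys', 'addKeys', 'addValues', 'omitKeys', 'pickKeys', 'compose']

try {
  checkModifiers('hash', { maaapKeys: {} }, HASH_MODIFIERS)
} catch (e) {
  console.log(e.message)
  // maaapKeys is not a valid reducer for the hash entity, maybe try using: ...
}
```

A fancier version could use edit distance to suggest the closest valid key, but the simple membership check already catches typos.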

Remove predefined execution order for hashes and collections

Problem description:

Hash and collection entities don't always execute in the order they're defined. @YiningChen explains this problem in more depth here #24, but instead of warning users, I propose that we remove this behavior entirely. This would make it easier to read entity definitions without needing to consult the documentation to know the execution order for each of the keys.

Suggested solution:

If users want a hash or collection to use more than one modifier (e.g. addValues, mapKeys, etc), they should be required to use compose.

For example, this would be valid:

dataPoint.addEntities({
  'hash:example': {
    addValues: {
      b: { text: 'See You Later' }
    }
  }
})

...and this would be valid:

dataPoint.addEntities({
  'hash:example': {
    compose: [
      {
        addValues: {
          b: { text: 'See You Later' }
        }
      },
      {
        mapKeys: {
          a: '$a.text',
          b: '$b.text',
        }
      }
    ]
  }
})

...but this would NOT be valid, because addValues and mapKeys are both defined on the root object instead of using compose:

dataPoint.addEntities({
  'hash:example': {
    addValues: {
      b: { text: 'See You Later' }
    },
    mapKeys: {
      a: '$a.text',
      b: '$b.text',
    }
  }
})

Hash Entity error message is not clear

When a hash entity receives a non-object value (array, number, string), it throws an error that reads:

Error Entity "hash:Foo" received a context.value = [object Object] that does not resolve to a plain object, this entity can only process Objects.

Fix the [object Object] stringified representation and output the actual type so users know what kind of value was received.
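A sketch of a clearer error (the helper names and exact wording are assumptions):

```javascript
// Produce a readable type name instead of the default stringification.
const describeType = value => {
  if (value === null) return 'null'
  if (Array.isArray(value)) return 'Array'
  return typeof value
}

const hashTypeError = (id, value) =>
  new Error(
    `Entity "${id}" received a value of type ${describeType(value)} that ` +
      'does not resolve to a plain object; this entity can only process plain objects.'
  )

console.log(hashTypeError('hash:Foo', [1, 2]).message)
// mentions "Array" instead of "[object Object]"
```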

Create templates for PRs and Issues on GitHub

It would be nice to have a template for both issues and pull requests that facilitates the process of making new issues/PRs with all of the requisite information for review and consideration. This is particularly helpful when bugs are found as it reminds contributors what information is useful for reproducing issues, or when reviewing a pull request to be sure that the steps necessary to merge a pull request have been done.

Use consistent naming for entity functions

Problem description:

The functions for creating different entities are named inconsistently.

Examples

The schema function is EntitySchema:
https://github.com/ViacomInc/data-point/blob/master/packages/data-point/lib/entity-types/entity-schema/factory.js#L10

The collection function is just Collection:
https://github.com/ViacomInc/data-point/blob/master/packages/data-point/lib/entity-types/entity-collection/factory.js#L12

Suggested solution:

Rename these functions in lib/entity-types to use the Entity<EntityType> naming convention that schema uses.

Remove redundant errors when creating reducers

Problem description:

lib/transform-expression/factory.js and lib/reducer/factory.js both validate the input for creating reducers, but that's redundant because input that's validated in the first file will later be validated in the second file. The logic in reducer/factory.js is also better because it uses the isType methods for each reducer type:

https://github.com/ViacomInc/data-point/blob/master/packages/data-point/lib/reducer/factory.js#L18

https://github.com/ViacomInc/data-point/blob/master/packages/data-point/lib/transform-expression/factory.js#L60

Suggested solution:

Remove the validate and isValid functions from lib/transform-expression/factory.js, but add the line "try using an Array, String, Object, or Function" to the error in lib/reducer/factory.js.

Fix execution order of collection modifiers

The readme says that collection entities should go filter -> map -> find, but they currently execute find before map, which isn't useful because map expects an array but find returns an element from an array.

Strongly recommend `yarn install`?

When cloning and setting up this project, running npm install with node v8.7.0 and npm v5.4.0+ will cause a number of issues:

  • A package-lock.json is created, which is an untracked file
  • Running the tests will error due to missing dependencies (most notably bluebird, request-promise, winston, deep-freeze). I believe this is because they are not listed as peerDependencies and the package.json in the various packages is where they are actually listed.

If instead yarn install is run, then neither of the above issues are present. Maybe the installation instructions in the CONTRIBUTING.md can be copied to the main README to make it easier to notice that yarn is strongly preferred?

refactor source entity to request entity

Change Entity Source to Entity Request

This change is long overdue. The refactor is important because it makes the entity more descriptive. It should keep full backward compatibility with the source entity, which we can remove in the next major version.

ability to inspect a request before making the call

This new feature would allow the developer to inspect a request before it is sent:

dataPoint.addEntities({
  'request:<entityId>': {
    params: {
      inspect: Boolean|Function
    }
  }
})

If params.inspect is true it will output the entity's information to the console.

If params.inspect is a function, you may provide custom debugging code to be executed before the actual request is made. The function receives the current accumulator value as its only parameter.
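A minimal sketch of how the resolver could honor this flag (the helper name maybeInspect is hypothetical, and whether the callback receives the full accumulator or just its value is an open question in this spec):

```javascript
// Dispatch on params.inspect right before the request would be executed.
const maybeInspect = (entity, acc) => {
  const inspect = entity.params && entity.params.inspect
  if (inspect === true) {
    // boolean form: dump the entity's information to the console
    console.log(`[inspect] ${entity.id}:`, acc.value)
  } else if (typeof inspect === 'function') {
    // function form: run the user's custom debugging code
    inspect(acc)
  }
  return acc
}

maybeInspect(
  {
    id: 'request:getOrgInfo',
    params: { inspect: acc => console.log('about to request with:', acc.value) }
  },
  { value: { org: 'nodejs' } }
)
```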

normalize transform entity

From the readme:

All entities share a common API (except for Transform)

A Transform entity is meant to be used as a 'snippet' entity that you can re-use in other entities. It does not expose the common before/after/error/params API that other entities have.

Transforms do expose the before, after, and error callbacks though. Either the readme or the code should be updated for accuracy.

point README.md badges to master branch

  • data-point version: *
  • node version: N/A
  • npm version: N/A

Any relevant code:

All README.md files point to branch=ci; they should point to master.

[![Build Status](https://travis-ci.org/ViacomInc/data-point.svg?branch=ci)](https://travis-ci.org/ViacomInc/data-point) [![Coverage Status](https://coveralls.io/repos/github/ViacomInc/data-point/badge.svg?branch=ci)](https://coveralls.io/github/ViacomInc/data-point?branch=ci)

What happened:

At times coverage badge shows 'unknown' since that branch does not exist.


Reproduction:

Go to: https://github.com/ViacomInc/data-point or any README under the packages folder

Problem description:

Points to ci branch but should point to master

Suggested solution:

Update the badge markdown in each README to point to master.

Validate entities during compilation

This proposes a second "compilation" step when defining entities and entity types. DataPoint currently parses top-level entities when they're defined, but after finishing this work, the second step would examine and validate the contents of each entity. Having this step would prevent some runtime errors, and it would also make #40 possible.

In the following example, dataPoint.addEntities does not cause an error, but dataPoint.transform fails at runtime because transform:two is not defined:

dataPoint.addEntities({
  'transform:one': ['transform:two']
})

dataPoint.transform('transform:one', {})

With a second compilation step, the call to addEntities would trigger a warning/error, and calling addEntityTypes would perform similar checks.
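The reference check could be sketched like this (the entity shape here, a map from id to an array of string reducers, is a simplification for illustration):

```javascript
// Throw when any entity references an id that was never defined.
const validateReferences = entities => {
  for (const [id, references] of Object.entries(entities)) {
    for (const ref of references) {
      if (typeof ref === 'string' && !(ref in entities)) {
        throw new Error(`Entity "${id}" references undefined entity "${ref}"`)
      }
    }
  }
}

try {
  validateReferences({ 'transform:one': ['transform:two'] })
} catch (e) {
  console.log(e.message)
  // Entity "transform:one" references undefined entity "transform:two"
}
```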

conditional operator

For conditional execution, we already have the control entity. But there are moments where I have needed to just conditionally execute an entity depending on the value being passed.

Also, one drawback of using the control entity is that the code executes the entire entity lifecycle just to resolve a condition. In many cases that makes sense, but in others it might be too much or too expensive.

Feature proposal #15 is to use the acc.resolve method, but it might not go anywhere soon (see comments on this).

Proposed feature:

spec:

'?<entityId>'

Usage:

// throws error if input is null/undefined
dataPoint.transform('hash:Foo', undefined)
  .catch(e => e)
  .then( acc => {
    expect(acc).toBeInstanceOf(Error)
  })

// with the ? operator, to just return undefined without throwing an error
dataPoint.transform('?hash:Foo', undefined).then( acc => {
  expect(acc.value).toEqual(undefined)
})

Validate ajv schemas

Problem description:

Schema entities do not validate the ajv schemas they're given.

Suggested solution:

When schema entities are created, they should call ajv's validateSchema method and throw with any errors that are generated.

allow reducer functions to be resolved as a promise

Allow reducer functions to be written in sync, async, and node-style formats.

Currently, reducer functions must use the following format:

(acc, next) => {
  next(err, value)
}

The feature would enable developers to write functions such as:

sync resolution

(acc) => {
   return value
}

async resolution

(acc) => {
   return Promise.resolve(value)
}

async await resolution

async (acc) => {
   return await value
}
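A dispatcher supporting all three styles can be sketched by checking the function's arity (the helper name resolveReducerFunction is an assumption, not data-point's internal API):

```javascript
// Resolve a reducer regardless of its style, always returning a Promise.
const resolveReducerFunction = (fn, acc) =>
  new Promise((resolve, reject) => {
    if (fn.length >= 2) {
      // node-style: (acc, next)
      fn(acc, (err, value) => (err ? reject(err) : resolve(value)))
    } else {
      // sync, promise-returning, or async: (acc)
      // Promise.resolve adopts returned promises, so one branch covers all three
      resolve(fn(acc))
    }
  })
```

One caveat of arity-based detection: a node-style reducer whose `next` parameter has a default value would report `fn.length === 1` and be dispatched to the wrong branch, so the detection rule would need to be documented.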

Advantages:

  • Reducers will also be easier to test
  • Easier to reason with
  • Embraces newer js patterns

issues with hash and collection skipping if value is empty

This issue was uncovered through #44

The following lines are responsible:

Hash entity:

// if there is nothing to do, lets just move on
if (typeof acc.value === 'undefined' || acc.value === null) {
  return Promise.resolve(acc)
}

Collection entity:

// if there is nothing to do, lets just move on
if (_.isEmpty(accumulator.value)) {
  return Promise.resolve(accumulator)
}

These checks may cause unexpected behavior because they happen at the entity resolver level, which means the entity's before/after transforms will get executed regardless.

Standardize asCollection behavior

How should asCollection reducers behave when the input is not an array?

Path reducers return null, but entities resolve the value as if asCollection had been false (and arguably, the entity behavior makes it possible to abuse the [] notation, because there's no drawback for putting that on every entity reference).

If a reducer expects an array but doesn't get one, a consistent approach would be logging an error but returning an empty array. We could also introduce a ?[] notation that behaves the way that asCollection entities behave now.
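The consistent fallback described above could look like this (a sketch of the proposed behavior, not the current implementation; the helper name is hypothetical):

```javascript
// Map over the input when it is an array; otherwise log and return [].
const resolveAsCollection = (value, resolveItem) => {
  if (!Array.isArray(value)) {
    console.error(`asCollection reducer expected an array but got ${typeof value}`)
    return []
  }
  return value.map(resolveItem)
}

console.log(resolveAsCollection([1, 2, 3], x => x * 2)) // [ 2, 4, 6 ]
console.log(resolveAsCollection('not an array', x => x)) // [] (plus a logged error)
```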

path reducer code:
https://github.com/ViacomInc/data-point/blob/master/packages/data-point/lib/resolve/reducer-path.js#L31

entity code:
https://github.com/ViacomInc/data-point/blob/master/packages/data-point/lib/entity-types/resolve-entity/resolve-entity.js#L147

Rename ComposeModifier to ComposeReducer

The terms ComposeModifier and ComposeReducer are used inconsistently. For example, the documentation for hash uses both of them when referring to the compose property.

Create a base entity directory

Problem description:

It's hard to trace the functions that create and resolve entities.

The main resolve function lives in lib/entity-types/resolve-entity/resolve-entity.js, but the createEntity function lives in lib/helpers/helpers.js, which is not the first place you might look for such a crucial function:

https://github.com/ViacomInc/data-point/blob/master/packages/data-point/lib/helpers/helpers.js#L53

Suggested solution:

Move this code to a single entity-types/base-entity directory.

Support TransformObject as a reducer type

TransformMap is currently used for hash entities, but it should be supported as a reducer type (meaning it could be part of a TransformExpression). This would make it easier to manipulate objects without creating full-blown hash entities. One main use case would be combining multiple requests into a single object:

{
  'entry:example': [
    {
      a: 'request:a',
      b: 'request:b'
    },
    // this transform could access the data from requests 'a' and 'b'
    'transform:data'
  ]
}

Do not use _.default for input to transform

The value parameter for DataPoint.transform (which becomes acc.value) should be attached to the accumulator without modification. Primitive values are currently wrapped in objects, which might lead to unexpected behavior. For example, this currently throws an error:

dataPoint
  .transform((acc) => {
    console.log(acc.value) // [String: 'test']
    assert(acc.value === 'test') // ERROR
  }, 'test')

The readme example for the control entity currently does not work because of this behavior (it tries to compare a string against a string wrapped as an object).
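The strict comparison fails for the reason shown below: wrapping a primitive (which is presumably what the transform's input handling does, e.g. via something like Object(value)) produces a String object, not a string:

```javascript
// A wrapped primitive is an object, so === against the primitive is false.
const wrapped = Object('test') // logged by node as [String: 'test']
console.log(typeof wrapped) // 'object'
console.log(wrapped === 'test') // false
console.log(String(wrapped) === 'test') // true
```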

Document what's exported on the DataPoint object

Problem description:

It's not clear what methods/properties are available on the DataPoint object.

const DataPoint = require('data-point')

For example, DataPoint.createEntity is used in a readme example, but that method is never documented.

Suggested solution:

  1. Add this info to the readme

  2. Refactor lib/index.js to make it clear what's being exported. Currently it does this:

module.exports = Object.assign({}, require('./core'), require('./helpers'))

...and ./helpers, which it's pointing to, does the same thing...

module.exports = Object.assign({}, require('./helpers'), require('../utils'))

...but listing the exports by name would make these files more understandable:

// rough example...

module.exports = {
  create: require('./core').create,
  inspect: require('./helpers').inspect
  // and so on
}

Create path reducer functions

  • data-point version: 1.x
  • node version: N/A
  • npm version: N/A

Problem description:

To resolve a Path Reducer, some logic must be applied. This logic currently runs during the resolution cycle, which is not the most efficient approach.

Suggested solution:

Move the path reducer logic into the factory, so it runs once when the reducer is created instead of on every resolution.

Deprecate the '$.' path reducer

The $. path reducer is too easily confused with $.., and it's redundant because $ does the same thing. It also follows a different pattern than $ and $..:

  • '$..a' accesses acc['a']
  • '$' accesses acc.value
  • '$.' accesses acc.value
  • '$a' accesses acc.value['a']
  • '$.a' accesses acc.value['.a'] (it includes the . in the path, which is not intuitive because $ and $. are equivalent by themselves)
