
dwm's People

Contributors

ayushkesar, colemanja91, swardlincoln


dwm's Issues

Run UDFs as extension of a class

UDFs should be defined and executed as extensions of a base UDF class:

from dwm import UDF

class MyUDF(UDF):
  udf_name = 'publicly-visible name'
  udf_docs = 'publicly-visible docs'
  def run(self, data, hist):
    ...
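
One way the proposal above could look in practice is sketched below. The base class here is a hypothetical stand-in, not the actual `dwm.UDF`; the example subclass `UppercaseEmail`, the `email` field, and the `(data, hist)` return value are all assumptions made for illustration.

```python
class UDF:
    """Hypothetical base class; the real dwm.UDF may differ."""
    udf_name = None
    udf_docs = None

    def run(self, data, hist):
        # Subclasses are expected to override this method.
        raise NotImplementedError("subclasses must implement run()")


class UppercaseEmail(UDF):
    """Illustrative UDF: upper-cases an assumed 'email' field."""
    udf_name = 'uppercase-email'
    udf_docs = 'Upper-cases the email field, if present.'

    def run(self, data, hist):
        if 'email' in data:
            old = data['email']
            data['email'] = old.upper()
            # Record the change in the history dict (assumed shape).
            hist['email'] = {'from': old, 'to': data['email']}
        return data, hist
```

Putting the name and docs on the class means a registry could list available UDFs without instantiating them.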

Refactor: set top-level config in a DWM class object

Currently, both the DWM config dictionary and the MongoDB config are passed down to each sub-function.
We need to refactor this so that a top-level DWM object is configured once and all subsequent calls reference it.

Example:

from dwm import DwmInstance

DWM = DwmInstance()
DWM.set_mongo('mongo_connection_string')
DWM.set_config({...}) # pass in dictionary with config object

INPUT = {...} # input record for cleaning

OUTPUT, HISTORY = DWM.clean(INPUT)

We should be able to preserve most, if not all, of the low-level functions used in DWM already; this would just replace how we interact with it.
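
A minimal sketch of such a wrapper follows, assuming the method names from the example (`set_mongo`, `set_config`, `clean`). The helper `_clean_record` is a hypothetical placeholder for the existing low-level functions, and this sketch only stores the connection string rather than opening a MongoDB connection.

```python
def _clean_record(record, config, mongo):
    """Hypothetical placeholder for the existing low-level cleaning code.

    Returns an unchanged copy of the record and an empty history dict.
    """
    return dict(record), {}


class DwmInstance:
    """Holds config and MongoDB settings once, so callers stop passing them."""

    def __init__(self):
        self._mongo = None
        self._config = None

    def set_mongo(self, connection_string):
        # Stored only; a real implementation would create a client here.
        self._mongo = connection_string

    def set_config(self, config):
        # Copy so later changes to the caller's dict do not leak in.
        self._config = dict(config)

    def clean(self, record):
        if self._config is None:
            raise RuntimeError("call set_config() before clean()")
        return _clean_record(record, self._config, self._mongo)
```

Because `clean` only forwards the stored settings, the low-level functions can keep their current signatures.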

BUG: Break loop in DeriveDataLookupAll when match found

Currently, the loop through the config.derive settings breaks only when the actual input field value is changed, which leads to inconsistent behavior: a lookup that matches but maps a value to itself does not stop the loop. Instead, the child functions need to return a boolean flag that breaks the loop.
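
The idea can be sketched as follows. The function names and the shape of the derive settings (plain lookup dictionaries) are assumptions for illustration; the real `DeriveDataLookupAll` and its child functions may look different.

```python
def table_lookup(value, table):
    """Hypothetical child function: returns (new_value, matched)."""
    if value in table:
        return table[value], True
    return value, False


def derive_first_match(value, derive_settings, lookup):
    """Try each derive setting in order; stop at the first reported match."""
    for setting in derive_settings:
        new_value, matched = lookup(value, setting)
        if matched:
            # Break on the flag, not on whether the value changed.
            return new_value, True
    return value, False


# The first table matches 'US' but maps it to itself. A value-change check
# would fall through to the second table; the flag stops at the first.
settings = [{'US': 'US'}, {'US': 'United States'}]
result = derive_first_match('US', settings, table_lookup)
```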

Return contactHistory separately from dwmOne

Currently, running dwmOne results in the contactHistory record being auto-added to MongoDB. To make this package more scalable as a microservice, we need to return the history record separately.

Example:

INPUT = {...} # single record to clean
OUTPUT, HISTORY = dwmOne(INPUT, ...)

This update should include removing _current, timestamp, and the key identifier from the history record.
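
The field removal could be sketched as below. The helper name `strip_storage_fields` is hypothetical, and the key identifier is passed in as a parameter because its field name is assumed to be configurable.

```python
def strip_storage_fields(history, key_field):
    """Return a copy of the history record without storage-only fields."""
    drop = {'_current', 'timestamp', key_field}
    return {k: v for k, v in history.items() if k not in drop}


# Illustrative record; the field names other than _current and timestamp
# are made up for this example.
raw_history = {
    'IdentifierKey': '12345',
    '_current': 0,
    'timestamp': 1500000000,
    'email': {'from': 'a@b.c', 'to': 'A@B.C'},
}
history = strip_storage_fields(raw_history, 'IdentifierKey')
```

The caller can then decide where and how to persist the returned history record.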

Add metadata to contactHistory records

When records are added to contactHistory, include custom metadata for better database cleanup/management.

{
    "IdentifierKey": "12345",
    "metadata": {
        "batchId": "abcde",
        "batchTimestamp": timestamp
    },
    ...
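
A small sketch of attaching this metadata before a record is stored is shown below. The helper name `with_batch_metadata` is hypothetical, and using a UTC `datetime` for `batchTimestamp` is an assumption; the issue does not specify the timestamp type.

```python
from datetime import datetime, timezone


def with_batch_metadata(history, batch_id, batch_timestamp=None):
    """Return a copy of the history record with a 'metadata' sub-document."""
    if batch_timestamp is None:
        # Assumed default: the current time in UTC.
        batch_timestamp = datetime.now(timezone.utc)
    record = dict(history)
    record['metadata'] = {
        'batchId': batch_id,
        'batchTimestamp': batch_timestamp,
    }
    return record
```

Passing the same `batch_id` and `batch_timestamp` for every record in a run would let a whole batch be found or removed with a single query on `metadata.batchId`.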

Typo in Read me

In Readme.md there is a small typo. Should it be Usage instead of Useage?
