
pillar's Introduction

Pillar

Pillar is a REST-based web service written in Go. It provides the following services:

  • Imports external data into the Coral data model
  • Allows CRUD operations on the Coral data model
  • Provides simple queries on the Coral data model

All of the Pillar documentation (including installation instructions) can be found in the Coral Project Documentation.

The Pillar documentation lives on GitHub in the coralproject/docs/docs_dir/pillar repository.

pillar's People

Contributors

aiboyles, alexbyrnes, ardan-bkennedy, buth, gabelula, impronunciable, jde, kgardnr, pablocubico, samshub


Forkers

isabella232

pillar's Issues

More unit and integration tests for Pillar

Pillar is a REST-based web service module. This calls for more unit tests and integration tests: not only should we be able to test code changes within the server, we must also be able to test the various endpoints provided by the server.

We do have both in place, but not enough. This issue is to make sure we expand the tests and establish a common framework so anyone can run them properly.

a) Expand tests within the server module
b) Expand tests within the client module

Code reorg (cleanup) and a few model changes

  • The handler methods can all be consolidated into a single Go file
  • A few duplicated pieces of code in service and model can be consolidated into the model for reuse
  • A few model changes to make it simpler
    • user.user_name should be user.name
    • note.target_type should be note.target
    • action.target_type should be action.target

Build Tag APIs

Challenge: The Trust product requires the ability to apply tags to users.

Concept: Taggable ("Users are Taggable")

Spec

  • Create tags collection to hold all possible tags:
{
  _id: type ObjectId()
  name: type string, // required, not empty, unique
  group: type string // default 'users'
}
  • Expose CRUD endpoints for Tags
    • [POST, PUT, GET, DELETE] /tags/
  • Design reference scheme (similar to Issue #8) to associate tags with users
    • We should have the ability to assign tags to any document in any collection
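
A minimal Go sketch of the Tag model described above (the Go names are assumptions; the fields mirror the spec):

// Tag represents a single entry in the tags collection.
type Tag struct {
    ID    bson.ObjectId `json:"id" bson:"_id"`
    Name  string        `json:"name" bson:"name" validate:"required"`   // required, not empty, unique
    Group string        `json:"group,omitempty" bson:"group,omitempty"` // defaults to "users"
}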

New endpoint needed to create 'Action'

There are two new requirements:
a) Make Action a first-class citizen; in other words, have a separate collection for all actions
b) Provide an endpoint to create Action
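
A rough sketch of what a first-class Action document could look like in its own collection (field names are assumptions, not the final model):

// Action lives in its own actions collection rather than inside a comment.
type Action struct {
    ID       bson.ObjectId `json:"id" bson:"_id"`
    UserID   bson.ObjectId `json:"user_id" bson:"user_id"`
    Target   string        `json:"target" bson:"target"` // e.g. "comments", "users"
    TargetID bson.ObjectId `json:"target_id" bson:"target_id"`
    Type     string        `json:"type" bson:"type"` // e.g. "likes", "flags"
    Date     time.Time     `json:"date" bson:"date"`
}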

Create data randomizer

Challenge: Our demo needs a data set. The best dataset we have, currently, is the WaPo data, which is proprietary. Generating random data will lead to nothing but noise. We need to be able to obscure the data such that:

  • it is not possible to trace users in the obscured set back to users in the live site
  • no real numbers can be found
  • patterns that make the data interesting are preserved

Solution: Write a script that crawls all nytimes collections and:
Users:

  • obscures all user names with "user"+randomNumber
  • inserts the record into the target database

Comments:

  • Throws away comments based on a "double random" method: generate a random number between .2 and .3, then another random number between 0 and 1; if the first is lower than the second, throw the data away.
  • Keeps the full comment text
  • Inserts into the target database
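
A sketch of the "double random" throwaway in Go (on average it keeps roughly 20-30% of comments):

package randomizer

import "math/rand"

// keepComment draws a threshold between .2 and .3 and a second number between
// 0 and 1; if the first is lower than the second, the comment is thrown away.
func keepComment() bool {
    threshold := 0.2 + rand.Float64()*0.1 // random number between .2 and .3
    roll := rand.Float64()                // random number between 0 and 1
    return threshold >= roll              // keep only when the first is not lower than the second
}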

Assets:

  • copy in all the assets as they exist in the WaPo database

Actions:

  • copy in all actions for comments that haven't been thrown away by the randomized throwaway method

Make sure all counts are up to date.

Create an endpoint to capture user activity on the front end

We need to be able to capture data about how our users are using our demo. Create an endpoint that adds documents to cay_user_actions.

{
  _id: ObjectId(),
  time: ISOTime that the packet was received,
  data: contents of the POST payload,
  release: "0.1.0"  // we can eventually make this dynamic
}
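
As a Go struct, the document could look roughly like this (the struct name and field types are assumptions based on the shape above):

// CayUserAction is one captured front-end event in cay_user_actions.
type CayUserAction struct {
    ID      bson.ObjectId `json:"id" bson:"_id"`
    Time    time.Time     `json:"time" bson:"time"`       // when the packet was received
    Data    bson.M        `json:"data" bson:"data"`       // contents of the POST payload
    Release string        `json:"release" bson:"release"` // e.g. "0.1.0"
}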

Add unstructured metadata property to all models

Challenge: We need to account for metadata for all of our entities that varies from client to client.

Solution: Add a metadata property to every struct along with the endpoint code to capture it.

Metadata bson.M `json:"metadata,omitempty" bson:"metadata,omitempty"` // bson.M is map[string]interface{}; JSON/BSON map keys must be strings

This should allow us to capture any kind of data sent to pillar under the metadata attribute.

Make all import endpoints "upsert" (insert or update)

Situation: During a large import process, it's often the case that we need to go over already-inserted data in multiple passes in order to add new fields, etc. This creates a problem when keys are already established. For example, if we import users, assets and comments, but then want to re-import users, we need to drop the users collection. At that point the keys in comments will break and we will need to re-insert comments as well.

Also, in some cases, such as importing users from the comment records, we will intentionally (dumbly) be sending duplicate records. We shouldn't see errors for these if they are already in the db.

Solution: Instead of throwing an error when an existing record is posted to an import endpoint (asset/user/comment and action), the data in the database should be updated, keeping all the _id fields unchanged. The endpoint should return a 200 and a message saying whether the record was updated or inserted.

pseudocode:

entityHandler:
    check source id against database
    entity already exists:
        run update command for all non mongo-id values
        respond with 200 && update message
    entity doesn't exist:
        run code that exists now (insert and key translation)
        respond with 200 && insert message
    reply with 500 only if another error occurs
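
A minimal sketch of that flow, assuming mgo and a source.id field on every importable entity (the function and field names are illustrative, not Pillar's actual code):

package service

import (
    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

// upsertBySource matches on the original source id rather than the Mongo _id,
// updates all non mongo-id values if the record exists, and inserts it
// otherwise. The returned string feeds the 200 message; an error means 500.
func upsertBySource(c *mgo.Collection, sourceID string, fields bson.M) (string, error) {
    info, err := c.Upsert(bson.M{"source.id": sourceID}, bson.M{"$set": fields})
    if err != nil {
        return "", err
    }
    if info.UpsertedId != nil {
        return "inserted", nil
    }
    return "updated", nil
}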

Technical discovery: Configurable Model

Our 2nd product, Ask, will revolve around the ability to create a custom form that allows an arbitrary datatype to be handled as an "ask". The Ask will be defined by a front-end tool that stores the 'schema' for that ask as a JSON object. The model that powers the Ask API, therefore, must be as configurable as possible based on that JSON object.

Configurable model elements include:

  • Schema
    • Any number of fields
    • Names, descriptions, other labels for fields
    • Field types
    • Required?
    • Default values.
  • Methods for dealing with files
    • Uploads with validation
    • Triggering workflows (aka, resizing/resampling)

Tech Challenge:
Create a configurable model package and a basic set of APIs that allow CRUD on that model.
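
As a starting point for discussion, the stored 'schema' could be modeled along these lines; every name here is an assumption, not a spec:

// FormField describes one configurable field in an Ask form.
type FormField struct {
    Name        string      `json:"name" bson:"name"`
    Description string      `json:"description,omitempty" bson:"description,omitempty"`
    Type        string      `json:"type" bson:"type"` // e.g. "text", "number", "file"
    Required    bool        `json:"required" bson:"required"`
    Default     interface{} `json:"default,omitempty" bson:"default,omitempty"`
}

// Form is the JSON object a front-end tool would store to define an Ask.
type Form struct {
    ID     bson.ObjectId `json:"id" bson:"_id"`
    Name   string        `json:"name" bson:"name"`
    Fields []FormField   `json:"fields" bson:"fields"`
}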

Adopt web package

Let's start using a web package to handle HTTP requests. Wrapping each of our HTTP handlers in importer.go will allow us to:

  • standardize behavior across all Coral services (features, config, etc. need to be identical)
  • centralize headers
  • apply middleware (such as auth)
  • cut down on repetitive code
  • CORS support
  • JSONP support

Using https://github.com/ardanlabs/kit/tree/master/web will allow us to standardize auth, config and logging with Xenia, which will be essential for consistency across the project.

Implement Count and other statistics

Challenge: A lot of the analytics that we want to provide depend on counts, e.g. how many times a user has recommended an article. Calculating these on the fly is not practical.

Solution: Implement counters and lists in documents that are updated upon creates/updates.

Note: Specific counts to be cached to be defined in the Data Model Wiki page.
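
As a sketch of the write path, a cached count could be bumped atomically with $inc when the related document is created (collection and field names below are assumptions, not the final schema):

package service

import (
    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

// bumpCommentCount increments a cached per-user comment count when a new
// comment is created; "users" and "stats.comments" are illustrative names.
func bumpCommentCount(users *mgo.Collection, userID bson.ObjectId) error {
    return users.UpdateId(userID, bson.M{
        "$inc": bson.M{"stats.comments": 1},
    })
}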

Update and Delete Tag behaviors

We need the ability to update and delete tags. The current upsert and delete functionality will update the master tag list, but does not update or remove the tags that have already been applied to entities.

Functionality to be added:

  • On update scan for all entities that have the tag and update the tag in the subarray
  • On delete scan for all entities that have the tag and remove it from the subarray
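
A rough sketch of both behaviors for a single collection, assuming tags are stored as a plain string array on each entity (names are illustrative):

package service

import (
    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

// renameTag updates the tag in place in every entity's subarray; the
// positional $ operator targets the matched array element.
func renameTag(c *mgo.Collection, oldName, newName string) error {
    _, err := c.UpdateAll(bson.M{"tags": oldName}, bson.M{"$set": bson.M{"tags.$": newName}})
    return err
}

// removeTag pulls the tag out of every entity's subarray.
func removeTag(c *mgo.Collection, name string) error {
    _, err := c.UpdateAll(bson.M{"tags": name}, bson.M{"$pull": bson.M{"tags": name}})
    return err
}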

Implement tracking and metrics

Challenge: we need to know how people are using our products and how well our products are performing (on an opt-in basis).

Evaluate the two leading candidates for monitoring and metrics:

  • prometheus.io
  • ELK stack

Create endpoint to add indexes to mongo

Challenge: With variations in metadata, we cannot predict at the api level which fields will need to be queried.

Solution: Publish an endpoint with Pillar that creates an index on one or more fields. The endpoint should accept 3 params:

collection: required, string, the name of the collection to index
keys: required, object, a json object to be dropped in the first argument of createIndex()*
options: optional, object, a json object to be dropped in the second argument of createIndex()*

Note: passing keys and options directly into the function call will allow us to take advantage of all of mongodb's indexing features, which are substantial: https://docs.mongodb.org/manual/reference/method/db.collection.createIndex/#db.collection.createIndex
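
A sketch of the request payload the endpoint might accept (struct and field names are assumptions, not a settled API):

type CreateIndexRequest struct {
    Collection string `json:"collection" validate:"required"` // name of the collection to index
    Keys       bson.M `json:"keys" validate:"required"`       // dropped into the first argument of createIndex()
    Options    bson.M `json:"options,omitempty"`              // dropped into the second argument of createIndex()
}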

Consumer: coralproject/sponge#21

Dockerize Pillar Server

Various items to be taken care of:

  • Create a Dockerfile
  • Make sure it has everything needed to make a Pillar Server container

Data migration with new ID (bson.ObjectId) and its impacts

So now that we're going to be using bson.ObjectId as the primary key for most of our first-class citizens, there are some side effects I want to bring to your attention. And yes, we should discuss the remedy as well.

Order

Really the discussion boils down to inserting data in order. In other words, start with the least dependent and go all the way to the most dependent one.

User
Asset
Comment (start with the root of the tree, since each child needs to have a ref to its parent)
Notes
Actions

References

Another challenge is keeping fidelity using the original references. For example, a Comment is associated with a User, which means Fiddler must also pass the ObjectId for that User, and similarly the ObjectId for the parent Comment, if any.

We have two choices:

a) Fiddler finds the ObjectId using the original id from the User or parent Comment
b) Fiddler passes all the original ids as a sub-item (field) and lets Pillar take care of it

I'm proposing that we go with option (b): introduce a sub-JSON, say ref (or whatever), and pass all the original ids.

ref: {
  "parent_id": "sndlfkslfjlsd",
  "user_id": "aljdlkfafjsjf",
  ...
}

This ref field will not be serialized to the DB; it will only be used as a way to IMPORT data.
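
A minimal sketch of option (b): the ref block is accepted on import but never written to Mongo (bson:"-"); the names are illustrative only:

// ImportRef carries the original source ids used only during import.
type ImportRef struct {
    UserID   string `json:"user_id"`
    ParentID string `json:"parent_id,omitempty"`
}

type Comment struct {
    ID       bson.ObjectId `json:"id" bson:"_id"`
    UserID   bson.ObjectId `json:"user_id" bson:"user_id"`
    ParentID bson.ObjectId `json:"parent_id,omitempty" bson:"parent_id,omitempty"`
    Body     string        `json:"body" bson:"body"`
    Ref      ImportRef     `json:"ref,omitempty" bson:"-"` // import-only; Pillar resolves these to ObjectIds
}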

refactor model

separate model package with different file names
separate service package

Coral Data Model

coral-schema

comments

  • id
  • userId
  • assetId
  • parentId
  • children (array)
  • body
  • status
  • dateCreated
  • dateUpdated
  • dateApproved
  • Actions
  • Notes
  • Source: original IDs from external source (publisher)

User

  • ID
  • UserName
  • SourceID: original IDs from external source (publisher)

Asset

Content the comments are on

  • ID
  • URL
  • SourceID: original ID from external source (publisher)

Data - Compute "meta stats subdocuments"

Challenge: Provide a 'meta' level of stats on each stats packet calculated.

Each dimensional breakdown offers a consistent set of values. In order to intelligently work with them (and build front ends to do so), we need to know information about what values we can expect, and how they are formed.

Sample "meta stats" packet:

{
  min: // the minimum value in that dimension
  max: // the maximum
  mean: // the mean
  median: // the median
  stdev: // the standard deviation value
  distribution: [ // a breakdown of the distribution of values in the range between min and max
    ##, // number of elements falling between 0% and 5% of range
    ##, // number of elements falling between 5% and 10% of range
    ...
    ## // number of elements falling between 95% and 100% of range
  ]
}

A meta stats packet must be provided for each field in each dimensional breakdown. For example, we might want to architect it this way:

  user_statistics.meta.comments.all.all.count: {
    // an entire meta stats packet for user_statistics.statistics.comments.all.all.count 
  }

This would allow a client who knows they want to work in a certain dimension to request a meta packet to render the interface for that dimension.
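
A sketch of the packet as a Go struct (field names follow the sample above; the types are assumptions):

// MetaStats describes the values of one field in one dimensional breakdown.
type MetaStats struct {
    Min          float64 `json:"min" bson:"min"`
    Max          float64 `json:"max" bson:"max"`
    Mean         float64 `json:"mean" bson:"mean"`
    Median       float64 `json:"median" bson:"median"`
    Stdev        float64 `json:"stdev" bson:"stdev"`
    Distribution []int   `json:"distribution" bson:"distribution"` // 20 buckets, each 5% of the min-max range
}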

How should we go microservice?

The signs and omens are clear: the time has come to adopt a microservice architecture and start developing our messaging protocol.

This is a discussion thread to track the conversation. Go!

Support CORS pre-flight requests

We are using fetch() on the front-end, which uses a "cors" mode and sends a pre-flight OPTIONS request to ask for available methods on the API side.

Using fetch's "no-cors" mode allows for some POST requests, but when using "no-cors" you can't consume the response body, which is quite crucial. So CORS it is.

I think the gorilla handlers do have support for OPTIONS requests; the docs aren't very clear on how to set it up, but you can see some of the options in the code.
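
For reference, a minimal sketch using github.com/gorilla/handlers (the origins, methods and headers below are placeholders, not the final policy):

package main

import (
    "log"
    "net/http"

    "github.com/gorilla/handlers"
)

func main() {
    mux := http.NewServeMux()
    // ... register Pillar's handlers on mux ...

    // Wrap the whole mux so pre-flight OPTIONS requests are answered with the
    // allowed methods, headers and origins.
    cors := handlers.CORS(
        handlers.AllowedOrigins([]string{"*"}),
        handlers.AllowedMethods([]string{"GET", "POST", "PUT", "DELETE", "OPTIONS"}),
        handlers.AllowedHeaders([]string{"Content-Type"}),
    )
    log.Fatal(http.ListenAndServe(":8080", cors(mux)))
}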

See related PR: #38

Create indexes on mongo collections

Pillar needs to create indexes that prevent table scans for all operations it handles.

db.collection.createIndex() will not override existing indexes, so we can call createIndex each time the server starts without incurring cost:

https://docs.mongodb.org/v3.0/reference/method/db.collection.createIndex/#db.collection.createIndex

Note, indexes should not be created in the background, as we want to ensure that they are in place before the server starts accepting requests.
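
A sketch of what the startup call could look like with mgo; the index keys shown are placeholders, not the final list. EnsureIndex is effectively a no-op when an identical index already exists:

package service

import "gopkg.in/mgo.v2"

// ensureIndexes builds the indexes in the foreground before the server starts
// accepting requests; comments/source.id is just an illustrative example.
func ensureIndexes(db *mgo.Database) error {
    return db.C("comments").EnsureIndex(mgo.Index{
        Key:        []string{"source.id"},
        Unique:     true,
        Background: false,
    })
}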

Use ardanlabs/kits/log

Change from log to "github.com/ardanlabs/kit/log" for logging. Look at shelf or sponge for how to initialize and use it.

Create search_history collection

Challenge: As searches change over time, people will want to be able to see what the effects of those changes are. In anticipation of this, we need to create a history of creates and updates that stores each search state along the way.

Solution: Create a search_history collection and update it whenever a search is created or updated. Documents should look like this:

{
  action: "[create|update]",
  when: date,
  search: { full user group record }
}
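
As a Go struct, a history entry could look roughly like this (names and types are assumptions mirroring the shape above):

// SearchHistory records one create or update of a search.
type SearchHistory struct {
    ID     bson.ObjectId `json:"id" bson:"_id"`
    Action string        `json:"action" bson:"action"` // "create" or "update"
    When   time.Time     `json:"when" bson:"when"`
    Search bson.M        `json:"search" bson:"search"` // the full record as it was saved
}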

Merge backend and service mongo code

Reuse the idea of a backend and create a generic package to merge the duplicated code. Currently we have Mongo code in the service package as well as the mongodb package.

Establish variable naming conventions for Coral Schemas

What?

Since Xenia is a pass-through from the Mongo storage of our data, the field names are carried through. JSON is traditionally camelCase, while our field names are currently PascalCase. Fixing this will lead to fewer typos and, more importantly, expected behavior for users of our software.

MongoDB naming conventions say that field names should be lowercase (camelCase or snake_case).
http://stackoverflow.com/questions/9868323/is-there-a-convention-to-name-collection-in-mongodb
This makes sense if you think about how MongoDB speaks JavaScript on the CLI and is basically storing JSON blobs.

The Google JSON Style Guide says that JSON should be camelCased, in the same naming conventions as JavaScript https://google.github.io/styleguide/jsoncstyleguide.xml#Property_Name_Guidelines

JavaScript naming conventions

How to fix?

Change all field names to lowercase (optionally snake_case if you prefer). If we fix this now, it will be less painful than going back later and updating every instance.

Make Source field consistent in all Collections - Simplify Referential Integrity in Import

Since we're creating bson.ObjectId for all ID fields in our collections, we established a standard approach of identifying/looking up records using the original source fields as strings. This was done to resolve references and keep integrity in the system.

For example the Source field in a Comment looks as follows:

// CommentSource encapsulates all original id from the source system
type CommentSource struct {
    ID       string `json:"id" bson:"id" validate:"required"`
    AssetID  string `json:"asset_id" bson:"asset_id" validate:"required"`
    UserID   string `json:"user_id" bson:"user_id" validate:"required"`
    ParentID string `json:"parent_id" bson:"parent_id"`
}

These fields are used to lookup respective items in their own collection and fix the references in a comment.

However, this is not done consistently in other collections. We should make a conscious effort to keep this consistent for all collections such as Asset, User and Action as well.
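
One possible way to keep it consistent is a single shared source struct (this matches the Source ImportSource field used in the Asset proposal below; which fields are required per collection is still open):

// ImportSource holds the original ids from the source system for any entity.
type ImportSource struct {
    ID       string `json:"id" bson:"id" validate:"required"`
    AssetID  string `json:"asset_id,omitempty" bson:"asset_id,omitempty"`
    UserID   string `json:"user_id,omitempty" bson:"user_id,omitempty"`
    ParentID string `json:"parent_id,omitempty" bson:"parent_id,omitempty"`
}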

Allow notes on Comments and Users

Challenge: The Trust product will allow users to leave notes on comments or users. Most of our source data structures only allow notes on comments. Our schema will need to allow notes to be placed on documents in any collection.

Concept: Notable (aka, "Comments and Users are Notable")

Spec

  • Build CRUD apis for notes
    • [POST, GET, PUT, DELETE] /notes/
  • Implement note counts on comments/users
  • Implement strategy to return notes along with comments/users

3 possible solutions depending on how we end up dealing with relations in mongo:

  • Append notes to a subdocument on the document,
  • Create a separate notes collection. Each document has a field indicating which document the note is on, or
  • Create separate notes collections for notes on each document type, i.e. make user_notes and comment_notes collections for notes on users and comments respectively.
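
A minimal sketch of the separate-collection option (the second bullet above); all names are assumptions:

// Note can be attached to a document in any collection.
type Note struct {
    ID          bson.ObjectId `json:"id" bson:"_id"`
    UserID      bson.ObjectId `json:"user_id" bson:"user_id"`     // author of the note
    Target      string        `json:"target" bson:"target"`       // "comments", "users", ...
    TargetID    bson.ObjectId `json:"target_id" bson:"target_id"` // document the note is on
    Body        string        `json:"body" bson:"body"`
    DateCreated time.Time     `json:"date_created" bson:"date_created"`
}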

Add author(s), section, sub-section to an Asset

Proposal below:

type Author struct {
    ID       string        `json:"id" bson:"_id" validate:"required"`
    Name     string        `json:"name" bson:"name" validate:"required"`
    URL      string        `json:"url,omitempty" bson:"url,omitempty"`
    Twitter  string        `json:"twitter,omitempty" bson:"twitter,omitempty"`
    Facebook string        `json:"facebook,omitempty" bson:"facebook,omitempty"`
}

type Asset struct {
    ID         bson.ObjectId `json:"id" bson:"_id"`
    URL        string        `json:"url" bson:"url" validate:"required"`
    Tags       []string      `json:"tags,omitempty" bson:"tags,omitempty"`
    Authors    []Author      `json:"authors,omitempty" bson:"authors,omitempty"`
    Section    string        `json:"section,omitempty" bson:"section,omitempty"`
    Subsection string        `json:"subsection,omitempty" bson:"subsection,omitempty"`
    Source     ImportSource  `json:"source" bson:"source"`
    Metadata   bson.M        `json:"metadata,omitempty" bson:"metadata,omitempty"`
}

Prevent duplicate actions

Currently, pillar allows users to perform the same action on a single target more than once.

When an action is posted, pillar should check to see if there's already an action matching:

  • the user
  • the target
  • the action type

If this already exists, we should not insert another copy and, instead, respond with an appropriate message.
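
A sketch of the check, assuming mgo and the Action fields sketched earlier on this page (user_id, target, target_id, type); all names are assumptions:

package service

import (
    "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

// actionExists reports whether the same user already performed the same type
// of action on the same target, so the handler can skip the insert and
// respond with an appropriate message instead.
func actionExists(actions *mgo.Collection, userID, targetID bson.ObjectId, target, actionType string) (bool, error) {
    n, err := actions.Find(bson.M{
        "user_id":   userID,
        "target":    target,
        "target_id": targetID,
        "type":      actionType,
    }).Count()
    return n > 0, err
}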
