
Chronicle

find everything you've ever found

Build Status: Travis

Installation

Large Tools

Chronicle is built using Node.js, Elasticsearch, PostgreSQL, and Redis, so you'll want to install the current stable version of each.

If you are using Mac OS and have Homebrew installed, this incantation should work:

$ brew install nodejs elasticsearch postgresql redis

Code

The server-side code dependencies are managed with npm and require that Grunt is installed globally (npm install -g grunt-cli). The front-end dependencies are managed with Bower; you can install it via npm install -g bower if you don't have it on your system.

To fetch dependencies and get cooking:

  1. Run npm install, and ensure redis, elasticsearch, and postgres are all running.
  2. As part of the npm install process, the postinstall script will install the Bower dependencies for you.
  3. Copy config/local.json.example to config/local.json, and put your local info in there.
  4. Run ./bin/create_db.sh to create the database.
  • this script currently hard-codes the db user, password, and dbname to 'chronicle' (issue #112)
  5. Run ./bin/migrate.js to run all the migrations that create the database tables and indexes. (This script also reindexes elasticsearch, but on the first pass, you don't have data in postgres to copy over.)
  6. Run ./bin/create_test_data.js to create a test user and test data.
  • the test user is defined in the config file
  • the test data is a set of visits created using the URLs in config/test-urls.js. Over time we'll experiment with different test data sets, and might wind up with a test-urls directory instead.
  7. Run npm start.
  8. You're up and running! Surf to http://localhost:8080 🏄
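The steps above boil down to the following transcript (assuming redis, elasticsearch, and postgres are already running locally):

```shell
npm install                         # postinstall also fetches Bower deps
cp config/local.json.example config/local.json
./bin/create_db.sh                  # db user/password/dbname hard-coded to 'chronicle'
./bin/migrate.js                    # create tables and indexes
./bin/create_test_data.js           # test user + visits from config/test-urls.js
npm start                           # then surf to http://localhost:8080
```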

Tests

Right now the test suite consists entirely of functional tests that require Selenium Server 2.44.0.

Prerequisites

  • Selenium Server 2.44.0 (the standalone jar used below)

Run the tests

Run the following in separate terminal windows/tabs:

  • java -jar path/to/selenium-server-standalone-2.44.0.jar
  • grunt test

Available Grunt Tasks

Name Description
autoprefixer Adds vendor prefixes to CSS files based on http://caniuse.com statistics.
build Build front-end assets and copy them to dist.
changelog Generate a changelog from git metadata.
clean Deletes files and folders.
contributors Generates a list of contributors from your project's git history.
copy Copies files and folders.
copyright Checks for MPL copyright headers in source files.
css Alias for "sass", "autoprefixer" tasks.
hapi Starts the hapi server.
jscs JavaScript Code Style checker.
jshint Validates files with JSHint.
jsonlint Validates JSON files.
lint Alias for "jshint", "jscs", "jsonlint", "copyright" tasks.
sass Compiles Sass files to vanilla CSS.
serve Alias for "hapi", "build", and "watch" tasks.
validate-shrinkwrap Submits your npm-shrinkwrap.json file to https://nodesecurity.io for validation.
watch Runs predefined tasks whenever watched files change.

npm Scripts

Name Description
authors Alias for grunt contributors Grunt task.
lint Alias for grunt lint Grunt task. This task gets run during the precommit Git hook.
outdated Alias for npm outdated --depth 0 to list top-level outdated modules in your package.json file. For more information, see https://docs.npmjs.com/cli/outdated.
postinstall Runs after the package is installed, and automatically installs/updates the Bower dependencies.
shrinkwrap Alias for npm shrinkwrap --dev and npm run validate to generate and validate npm-shrinkwrap.json file (including devDependencies).
start Runs grunt serve.
test Runs unit and functional tests.
validate Alias for grunt validate-shrinkwrap task (ignoring any errors which may be reported).

Creating Dummy Data

If you just want to test something quickly with a small, known test data set:

  1. Run ./bin/create_db.sh to drop and re-create the local Postgres database.
  2. Run ./bin/migrate.js to apply any Postgres migrations in the server/db/migrations/ directory.
  3. To enable test data, ensure the testUser.enabled config option is set in config/local.json.
  • You can use the default id and email, or set your own via config values or env vars; see server/config.js for the defaults and the exact names to use.
  4. Run ./bin/create_test_data.js to create a dummy user and a few dummy visits.
  • The dummy visits that will be created are listed in the config/test-urls.js file.

Learn More


Contributors

jaredhirsch, johngruen, nchapman, pdehaan, vladikoff


chronicle's Issues

add scraper service (extract interesting data from URLs)

Thoughts

  • new URLs should be added to scraper queue
  • scraper output should be structured JSON sent to elasticsearch; it'll also need to be transformed and inserted into MySQL
  • scraper-worker can just poll the scraper's endpoint (or wait for a callback, whatever), then chuck it in elasticsearch when ready
  • scraper might be third party, or we might own it and build it ourselves
  • scraper will need to canonicalize URL, generate summary, find a suitable image/media blob, and possibly also generate keywords
  • let's have a provider-agnostic interface/contract that lets us use the same API whether it's embedly, our own scraper, or some combination of the two
  • also: be sure to insert canonical URL into visits.url and add visits fields for other new fields
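A minimal sketch of the provider-agnostic contract floated above, assuming a hypothetical `normalizeScrape` helper and field names that are illustrations, not the real schema: every provider (embedly, an in-house scraper, or a combination) gets mapped to one common shape before anything downstream sees it.

```javascript
// Hypothetical normalizer: whatever the provider returns, callers only ever
// see this shape. Field names here are assumptions for illustration.
function normalizeScrape(raw) {
  return {
    url: raw.canonicalUrl,      // canonicalized URL, destined for visits.url
    summary: raw.summary,       // generated page summary
    image: raw.image,           // a suitable image/media blob URL
    keywords: raw.keywords || [] // optional; not every provider supplies these
  };
}
```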

Tasks

  • Create schema for user pages to store scraped data
  • Create scraper worker (embedly)
  • Index scraped data
  • Expose scraped data in visits API
  • Expose scraped data in search API
  • Show scraped data on the front-end in the visits index

allow visit creation via PUT /visits/:visitId

Right now, clients can only create visits via POST to /v1/visits, but they can optionally specify the visitId.

RESTfully speaking, the only reason to POST to an endpoint is because you don't know what the representation will be. If the client knows the visitId, then the client could just PUT the visit to its URL, /v1/visits/visitId.

Not a big deal, but would be nice to eventually fix.
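For illustration, a hapi-style route table entry for the proposed endpoint might look like this; it's a hypothetical sketch, not the actual Chronicle code, and the handler body is a stub standing in for the real visits model call.

```javascript
// Hypothetical route entry: PUT a visit directly to its URL when the client
// already knows the visitId (idempotent create-or-replace).
var putVisitRoute = {
  method: 'PUT',
  path: '/v1/visits/{visitId}',
  handler: function (request, reply) {
    // Upsert semantics: create the visit if it's new, replace it otherwise.
    // (A real implementation would call into the visits model here.)
    reply({ id: request.params.visitId });
  }
};
```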

api server: build out visits API

  • provisional API definition: https://etherpad.mozilla.org/chronicle-api
  • since we have no real hapi directory layout standard, maybe models, views, controllers?
    • views transform the model output into JSON (do we need this?)
    • controller includes route handler, coordinates model/view
    • model includes biz logic, touches the sequelize ORM "models" (kinda confusing, hmm), emits via views
  • after writing it out, I'm unconvinced this is a useful abstraction. we'll see how implementation goes.
  • thinking of something like
/server
  /controllers
  /routes
  /views
  /models
  /db
    /models
    /migrations
  • if this turns out to be overkill, just write it out using the hapi route handler and get on with life :-)

Server Error: Unauthorized

I get this error in the log every time I load the root path (http://localhost:8080/). I assume this is a temporary problem until we finish setting up auth.

Debug: auth, unauthenticated, error, session
    Error: Unauthorized
    at Object.exports.create (/Users/nchapman/Code/chronicle/node_modules/hapi-auth-cookie/node_modules/boom/lib/index.js:21:17)
    at Object.exports.unauthorized (/Users/nchapman/Code/chronicle/node_modules/hapi-auth-cookie/node_modules/boom/lib/index.js:85:23)
    at validate (/Users/nchapman/Code/chronicle/node_modules/hapi-auth-cookie/lib/index.js:114:49)
    at Object.scheme.authenticate (/Users/nchapman/Code/chronicle/node_modules/hapi-auth-cookie/lib/index.js:179:13)
    at /Users/nchapman/Code/chronicle/node_modules/hapi/lib/auth.js:214:30
    at internals.Protect.run (/Users/nchapman/Code/chronicle/node_modules/hapi/lib/protect.js:56:5)
    at authenticate (/Users/nchapman/Code/chronicle/node_modules/hapi/lib/auth.js:205:26)
    at internals.Auth._authenticate (/Users/nchapman/Code/chronicle/node_modules/hapi/lib/auth.js:328:5)
    at internals.Auth.authenticate (/Users/nchapman/Code/chronicle/node_modules/hapi/lib/auth.js:164:17)
    at /Users/nchapman/Code/chronicle/node_modules/hapi/lib/request.js:321:13

Get auth sorted out, working with visits api

The current auth code basically duplicates the hapi bell plugin inside the /auth/complete route handler. Decide if it's even worth bothering with bell; if it's simpler to just make it a fully custom auth strategy, do that.

Document visits api

Put the docs inside /docs/API.md, follow fxa formatting conventions where those make sense

convict error when trying to clone and run from GitHub

Steps to repro

$ git clone [email protected]:mozilla/chronicle.git
$ cd chronicle
$ npm install
$ cp config/local.json.example config/local.json
$ npm start

Actual results

> [email protected] start /Users/pdehaan/dev/tmp/chronicle
> node server/index.js


/Users/pdehaan/dev/tmp/chronicle/node_modules/convict/lib/convict.js:393
        throw new Error(errBuf);
              ^
Error: server.session.duration: must be a positive integer: value was "7 days"
    at Object.rv.validate (/Users/pdehaan/dev/tmp/chronicle/node_modules/convict/lib/convict.js:393:15)
    at Object.<anonymous> (/Users/pdehaan/dev/tmp/chronicle/server/config.js:183:6)
    at Module._compile (module.js:456:26)
    at Object.Module._extensions..js (module.js:474:10)
    at Module.load (module.js:356:32)
    at Function.Module._load (module.js:312:12)
    at Module.require (module.js:364:17)
    at require (module.js:380:17)
    at Object.<anonymous> (/Users/pdehaan/dev/tmp/chronicle/server/index.js:8:14)
    at Module._compile (module.js:456:26)

npm ERR! [email protected] start: `node server/index.js`
npm ERR! Exit status 8
npm ERR!
npm ERR! Failed at the [email protected] start script.
npm ERR! This is most likely a problem with the mozilla-chronicle package,
npm ERR! not with npm itself.
npm ERR! Tell the author that this fails on your system:
npm ERR!     node server/index.js
npm ERR! You can get their info via:
npm ERR!     npm owner ls mozilla-chronicle
npm ERR! There is likely additional logging output above.
npm ERR! System Darwin 12.5.0
npm ERR! command "node" "/usr/local/bin/npm" "start"
npm ERR! cwd /Users/pdehaan/dev/tmp/chronicle
npm ERR! node -v v0.10.33
npm ERR! npm -v 1.4.28
npm ERR! code ELIFECYCLE
npm ERR!
npm ERR! Additional logging details can be found in:
npm ERR!     /Users/pdehaan/dev/tmp/chronicle/npm-debug.log
npm ERR! not ok code 0

Expected results

No errors.

Need to dig in a bit more, but I think the problem w/ convict is this in the config/local.json.example file:

    "session": {
      "password": "Wh4t3ver.",
      "isSecure": false,
      "duration": "7 days"
    },

I think convict may want that duration in milliseconds here in the JSON file (but oddly accepts it as a valid default in server/config.js).
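If that hunch is right, a workaround in config/local.json would be to spell the duration out in milliseconds (7 days = 604800000 ms):

```json
    "session": {
      "password": "Wh4t3ver.",
      "isSecure": false,
      "duration": 604800000
    },
```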

'grunt server' should set process timezone to UTC

We want timestamps on the server to always be in UTC.

This won't be a problem in real servers, but when developing locally, there are a couple ways this can fail:

  1. The MySQL server needs to have its timezone set to UTC. Otherwise it'll store the TIMESTAMP fields as UTC, but convert them to the local timezone in query results.
    • This can be easily fixed by sending the query SET time_zone = '+00:00'. I'm adding this to the logic that grabs a connection from the connection pool.
  2. Node needs to have its timezone set to UTC, via process.env.TZ, before starting the server.
    • Trying to set this on a running process produces non-deterministic results; see this bug for details.
    • When node-mysql gets a timestamp from the database, it converts it into a Date object, which is formatted to the system local time.
    • Other workarounds for node-mysql returning results in PST aren't attractive:
      • use the deprecated node-mysql typeCast function to write a custom type mapping for dates (ughhh) (example)
      • manually handle this in the db layer and hope we don't miss a spot

The simplest workaround is to ensure that the env var 'TZ' has been set before the server starts. I'm not sure how to map this request to grunt's declarative syntax. @pdehaan any thoughts on how best to handle this?

Track which device created a visit

In order to give users a more "seamful" experience, we should know which device they were browsing from so that we can offer them better context.

Create base worker code / work queue

This is not really user visible, but does provide a pluggable infrastructure for every additional service to hook into, while keeping them all decoupled.

add workers to send URLs to embed.ly, ES, MySQL

Assume hapi tosses a new URL into the redis work queue, we'll want a few workers doing various things with that data:

  • drop it into mysql (directly or via ORM? feels awkward)
  • send it off to embed.ly
  • after it returns from embed.ly, throw (some subset of) the response into elasticsearch
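The flow above can be sketched end to end; here an in-memory array stands in for the redis queue, and the scraper/store/index steps are injected as callbacks, so none of the real services are assumed. All names are hypothetical.

```javascript
// In-memory stand-in for the redis work queue.
var queue = [];

function enqueue(url) { queue.push(url); }

// Pull one URL off the queue and run it through the three steps:
// store the raw visit, send the URL to the scraper, index the response.
function processNext(scrape, store, index) {
  var url = queue.shift();
  if (!url) { return null; }        // nothing to do
  store(url);                       // drop it into the SQL store
  var data = scrape(url);           // send it off to embed.ly (or similar)
  index(data);                      // throw (a subset of) the response into ES
  return data;
}
```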

A couple of JSHint warnings

STR:

  1. Set up JSHint.
  2. Run it.

Actual results:

$ npm run lint

> [email protected] lint /Users/pdehaan/dev/github/chronicle
> grunt lint

Running "jshint:app" (jshint) task

app/scripts/views/base.js
  line 72  col 31  'text' is defined but never used.

app/scripts/views/visits/index.js
  line 12  col 34  Extra comma. (it breaks older versions of IE)

  ⚠  2 warnings

Warning: Task "jshint:app" failed. Use --force to continue.

Aborted due to warnings.

72:        context.l = function (text) {
73:          return function (text, render) {
74:            return render(self.localize(text));
75:          };
76:        };

Warning 1: app/scripts/views/base.js:72: the text parameter is defined twice (lines 72 and 73); the outer one is shadowed by the inner one, which is what line 74 actually uses.


11:  var VisitsIndexView = BaseView.extend({
12:    template: VisitsIndexTemplate,
13:  });

Warning 2: app/scripts/views/visits/index.js:12: the trailing comma after template breaks older versions of IE.
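One way to silence the first warning is to drop the shadowed outer text parameter so only the inner callback declares it. Sketch below, with self.localize stubbed out for illustration (the stub is not the real view code):

```javascript
// Stub standing in for the view's localize method.
var self = { localize: function (s) { return s; } };
var context = {};

// Only the inner function declares `text` now, so nothing is shadowed.
context.l = function () {
  return function (text, render) {
    return render(self.localize(text));
  };
};
```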

strip out trailing slashes

hapi is fairly stupid about mapping /v1/visits/ to /v1/visits. Build some middleware to handle this before routing.
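A sketch of the path rewrite such middleware would perform; the assumption is that this would hang off a hapi onRequest extension point so it runs before routing, and the function name is made up here.

```javascript
// Collapse any run of trailing slashes, but leave the bare root path alone.
function stripTrailingSlash(path) {
  return path.length > 1 ? (path.replace(/\/+$/, '') || '/') : path;
}
```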

Create /app/images directory and copy over the files into dist/ during grunt build

I think the rest of the pieces are in place, we'll just need to create an app/images/ directory and then modify grunttasks/copy.js to copy everything from "images/*.{gif,jpeg,jpg,png}" into dist/ as well.

THEN we should be able to use the handy image-url('logo.png'); to correctly resolve image paths to /assets/images/logo.png using our Sass helpers.

Set up grunt watch task to restart Hapi server on server changes?

I tried hacking on grunt-hapi task to see if we could come up with a nice system of restarting the server on client/server changes, but it seems to be choking on "Error: listen EADDRINUSE" when the server restarts.

Not sure if there is a mis-config in my minimal Grunt tasks, or something deeper down the rabbit hole.

Rough prototype at https://github.com/pdehaan/grunt-examples/tree/master/grunt-hapi-example

$ grunt server
Running "hapi:async" (hapi) task

Running "watch" task
Waiting...
>> File "server/index.js" changed.
Running "hapi:async" (hapi) task

Done, without errors.

events.js:72
        throw er; // Unhandled 'error' event
              ^
Error: listen EADDRINUSE
    at errnoException (net.js:904:11)
    at Server._listen2 (net.js:1042:14)
    at listen (net.js:1064:10)
    at net.js:1146:9
    at asyncCallback (dns.js:68:16)
    at Object.onanswer [as oncomplete] (dns.js:121:9)
Completed in 0.702s at Thu Dec 18 2014 17:59:28 GMT-0800 (PST) - Waiting...

roll our own image proxy service

  • we want a richer view of history than just URLs, titles
  • we expect the scraper's analysis of the page will return 1 or a few image URLs
  • grab that image, resize/optimize it, host it somewhere (https and http accessible)
  • expose final proxied link, so we can store it in the DB (or something, TBD)
  • do we want this service to check referrers and disallow non-chronicle traffic?
