
Snowplow Mini


An easily-deployable, single instance version of Snowplow that serves three use cases:

  1. Gives a Snowplow consumer (e.g. an analyst / data team / marketing team) a way to quickly understand what Snowplow "does" i.e. what you put in at one end and take out of the other
  2. Gives developers new to Snowplow an easy way to start with Snowplow and understand how the different pieces fit together
  3. Gives people already running Snowplow a quick way to debug tracker updates

Features

  • Data is tracked and processed in real time
  • Includes an Iglu Server so that custom schemas can be uploaded
  • Data is validated during processing
    • Validation uses both our standard Iglu schemas and any custom ones that you have loaded into the Iglu Server
  • Data is loaded into OpenSearch
    • It can be queried directly or through OpenSearch Dashboards
    • Good and bad events are stored in distinct indexes
  • A UI shows what is happening in each of the subsystems (collector, enrich etc.), giving developers an in-depth view of how the different Snowplow subsystems work with one another

Note: Until version 0.15.0, Snowplow Mini loaded data into Elasticsearch 6.x. However, a licensing change in Elasticsearch prevented us from upgrading it to more recent versions. To make sure we stay up to date with important security fixes, we have replaced Elasticsearch with OpenSearch, and Kibana with OpenSearch Dashboards. You may still encounter the terms elasticsearch and kibana in the project.

Documentation

Cloud setup guides for AWS and GCP, in addition to a usage guide, are available at our docs website.

Local Quick Start

To run snowplow-mini on your local machine you will need to install the following prerequisites:

  • Vagrant (2.0.0+)
  • VirtualBox

You should then be able to stand up a snowplow-mini locally by running:

$ git clone https://github.com/snowplow/snowplow-mini.git
  Cloning into 'snowplow-mini'...
$ cd snowplow-mini
$ vagrant up
  Bringing machine 'default' up with 'virtualbox' provider...

This will take a little time to complete, so grab yourself a ☕️ and come back in a few minutes. See the troubleshooting section below if you encounter any errors.

Once complete, a Snowplow Collector will be running on http://localhost:8080 and the Snowplow Mini UI will be on http://localhost:2000/home.

To log in to the Snowplow Mini UI for the first time, follow the First time usage section within the documentation for the version of Snowplow Mini you have just created.
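Once the collector is up, you can verify it end to end by sending a hand-built event. The sketch below (Python, standard library only) assembles a pageview GET request against the collector's /i pixel endpoint using Snowplow tracker-protocol parameters; the app id and page URL are made-up examples, and the tracker-version string is an arbitrary label for a hand-rolled request.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

COLLECTOR = "http://localhost:8080"  # local quick-start collector

def build_pageview_url(page_url, app_id="test-app"):
    """Build a GET request URL for the collector's /i pixel endpoint
    using Snowplow tracker-protocol parameters."""
    params = {
        "e": "pv",           # event type: pageview
        "url": page_url,     # page URL being tracked
        "aid": app_id,       # application id
        "p": "web",          # platform
        "tv": "manual-0.1",  # tracker version (hand-rolled request)
    }
    return COLLECTOR + "/i?" + urlencode(params)

url = build_pageview_url("http://example.com/landing")
print(url)
# Sending it (requires the collector to be running):
#   urlopen(url)  # a 200 response means the event was accepted
```

A few seconds after sending, the event should appear in the good index, which you can confirm through the Snowplow Mini UI.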

Once you are finished with Snowplow Mini locally, it is wise to stop the virtual machine:

$ vagrant halt
  ==> default: Attempting graceful shutdown of VM...

If you wish to tidy up all the resources, including deleting the virtual machine:

$ vagrant destroy
  default: Are you sure you want to destroy the 'default' VM? [y/N] y
  ==> default: Destroying VM and associated drives...

Vagrant Troubleshooting

Some advice on how to handle certain errors if you're trying to build this locally with Vagrant.

The box 'ubuntu/bionic64' could not be found or could not be accessed in the remote catalog.

Your Vagrant version is probably outdated. Use Vagrant 2.0.0+.

npm install results in enoent ENOENT: no such file or directory, open '/package.json'

This is caused by trying to use NFS. Comment out the relevant lines in the Vagrantfile.

Most likely this will happen on TASK [sp_mini_5_build_ui : Install npm packages based on package.json.] but see also: https://discourse.snowplowanalytics.com/t/snowplow-mini-local-vagrant/2930.

Topology

Snowplow Mini runs several distinct applications on the same box which are all linked by NSQ topics. In a production deployment each instance could be an Autoscaling Group and each NSQ topic would be a distinct Kinesis Stream.

  • Stream Collector:
    • Starts a server listening on http://<sp-mini-public-ip>/, to which events can be sent
    • Sends "good" events to the RawEvents NSQ topic
    • Sends "bad" events to the BadEvents NSQ topic
  • Enrich:
    • Reads events in from the RawEvents NSQ topic
    • Sends events which passed the enrichment process to the EnrichedEvents NSQ topic
    • Sends events which failed the enrichment process to the BadEvents NSQ topic
  • Elasticsearch Sink Good:
    • Reads events from the EnrichedEvents NSQ topic
    • Sends those events to the good Elasticsearch index
    • On failure to insert, writes errors to BadElasticsearchEvents NSQ topic
  • Elasticsearch Sink Bad:
    • Reads events from the BadEvents NSQ topic
    • Sends those events to the bad Elasticsearch index
    • On failure to insert, writes errors to BadElasticsearchEvents NSQ topic

These events can then be viewed in Kibana (now OpenSearch Dashboards) at http://<sp-mini-public-ip>/kibana.
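For readers who prefer the query API over the dashboard, the good index can also be searched directly. The sketch below builds such a request; the port, index name ("good") and field names (app_id, collector_tstamp) are assumptions based on a default local setup and Snowplow's canonical event model, so adjust them to your deployment.

```python
import json
from urllib.request import Request, urlopen

# Assumed local endpoint; adjust to your deployment.
OPENSEARCH = "http://localhost:9200"

def build_search_request(index, app_id, size=10):
    """Build a search request for the most recent events of one app
    in the given index (good and bad events live in separate indexes)."""
    body = {
        "size": size,
        "query": {"term": {"app_id": app_id}},
        "sort": [{"collector_tstamp": {"order": "desc"}}],
    }
    data = json.dumps(body).encode("utf-8")
    return Request(
        f"{OPENSEARCH}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
    )

req = build_search_request("good", "test-app")
print(req.full_url)
# With OpenSearch running:
#   print(json.load(urlopen(req))["hits"])
```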

(Topology diagram)

Copyright and license

Snowplow Mini is copyright 2016-present Snowplow Analytics Ltd.

Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Contributors

aldemirenes, benfradet, benjben, dilyand, eldarshamukhamedov, istreeter, jbeemster, jshbrntt, lmath, miike, oguzhanunlu, peel, pondzix, spenes


snowplow-mini's Issues

Add check that tables exist and create them if not

From the structure of the shredded data it is possible to infer the required table name.

Automate current user-data

We should keep user-data to just the stuff that the user literally has to add (e.g. a private superuser API key for Iglu); the current user-data stuff looks like it could live in Upstart or similar.

Add real-time statistics section to UI

This will be a sidebar style box which will refresh every N seconds and will tell us:

  • State of the collector (OK or N/A)
  • State of elasticsearch
  • Count of good and bad events

This will be especially nice when coupled with the Example events page as you will see the counts increment straightaway.

Create AMI

Only do this if it looks like creating the Docker image will be a lot of work.

Remove Dockerfile

The 0.1.0 approach of one Dockerfile bringing up all the apps isn't relevant for 0.2.0.

Extend "Example events" tab

Rename from "Example events" to "Tracking"

Break this into three sections:

Sending events with JavaScript Tracker

  • Existing buttons for some pre-manufactured events
  • Dynamic console-style box where you can type in your own event
  • Link to JS Tracker documentation

Sending events with Objective-C Tracker

  • Just a static code excerpt
  • Link to Obj-C Tracker documentation

Sending events with Android Tracker

  • Just a static code excerpt
  • Link to Android Tracker documentation

Add script to automatically populate Iglu Server from local directory

The basic idea is when initializing Snowplow Mini you specify:

  • A git repo
  • A branch or tag within that repo
  • Git authentication details

And then Snowplow Mini will populate the local Iglu server from that git repo on initialization.

This will be particularly useful for developers wanting to test with schemas which aren't yet published to their main Iglu repo.
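The idea can be sketched as follows: walk a local registry laid out in the standard <vendor>/<name>/<format>/<version> structure and POST each schema to the Iglu Server. The server address, API key and /api/schemas endpoint below are placeholders/assumptions for illustration, not a confirmed implementation.

```python
import os
from urllib.request import Request, urlopen

IGLU_SERVER = "http://localhost:8081"   # assumed local Iglu Server address
API_KEY = "<your-superuser-api-key>"    # the private superuser key from user-data

def find_schemas(root):
    """Yield paths of schema files under root, expected to be laid out
    as <vendor>/<name>/<format>/<version> (standard Iglu registry layout)."""
    for dirpath, _, filenames in os.walk(root):
        for filename in filenames:
            yield os.path.join(dirpath, filename)

def upload(path):
    """POST a single schema file to the Iglu Server's schema API."""
    with open(path, "rb") as f:
        req = Request(
            f"{IGLU_SERVER}/api/schemas",
            data=f.read(),
            headers={"apikey": API_KEY, "Content-Type": "application/json"},
        )
    return urlopen(req).status

# for path in find_schemas("schemas"):
#     upload(path)
```

On initialization, the same loop would run against a fresh clone of the configured git repo and branch.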

Add Caddy proxy to add embedded TLS support

Hi.

This is more a feature request than an issue itself. My website uses SSL (including QA and staging environments), and I would like to use Snowplow Mini as the collector for those internal environments.

Besides setting up SSL termination, is there any plan to add out-of-the-box SSL support for the collector? Something like https://letsencrypt.org/ could be used.

Thanks a lot,
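In the meantime, TLS can be bolted on with a reverse proxy in front of the collector. A minimal Caddy v2 sketch (the hostname is a placeholder; Caddy obtains Let's Encrypt certificates automatically for public hostnames):

```
collector.example.com {
    reverse_proxy localhost:8080
}
```

This terminates TLS at Caddy and forwards plain HTTP to the collector on port 8080, so no collector configuration changes are needed.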

Add management and overview UI for Snowplow Mini

As there are a lot of services running under the hood it would be good to be able to show the flow of information and the state of the services running that do not have an easy to digest UI.

(UI mockup)
