snowplow-golang-analytics-sdk's Introduction

Snowplow Golang Analytics SDK

Snowplow is a scalable open-source platform for rich, high-quality, low-latency data collection. It is designed to collect high-quality, complete behavioural data for enterprise business.

Note

Due to issues in the release process, v0.2.2 should be used in favour of v0.2.0 or v0.2.1.

Snowplow Pipeline Overview

The Snowplow trackers enable highly customisable collection of raw, unopinionated event data. The pipeline validates these events against a JSONSchema - to guarantee a high quality dataset - and adds information via both standard and custom enrichments.

This data is then made available in-stream for real-time processing, and can also be loaded to blob storage and data warehouse for analysis.

The Snowplow atomic data acts as an immutable log of all the actions that occurred across your digital products. The analytics SDKs are libraries in a range of languages which facilitate working with Snowplow Enriched data, by transforming it from its original TSV format to a format more amenable to programmatic interaction - for example JSON.
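To make the TSV-to-JSON idea concrete, here is a minimal sketch using only the Go standard library. It is not the SDK's implementation: `fieldNames` is a shortened, illustrative list (a real enriched event has many more columns), `tsvToJSON` is a hypothetical helper, and skipping empty values is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fieldNames is a shortened, illustrative list of the first enriched event columns.
var fieldNames = []string{"app_id", "platform", "etl_tstamp", "collector_tstamp"}

// tsvToJSON pairs each tab-separated value with its field name.
// Empty values are skipped (an assumption made for this sketch).
func tsvToJSON(line string) ([]byte, error) {
	values := strings.Split(line, "\t")
	if len(values) != len(fieldNames) {
		return nil, fmt.Errorf("expected %d fields, got %d", len(fieldNames), len(values))
	}
	out := make(map[string]string, len(values))
	for i, v := range values {
		if v != "" {
			out[fieldNames[i]] = v
		}
	}
	return json.Marshal(out)
}

func main() {
	b, err := tsvToJSON("test-data1\tpc\t2019-05-10 14:40:37.436\t2019-05-10 14:40:35.972")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(b))
}
```

The SDK itself also handles typed values and the self-describing JSON fields, which this sketch leaves out.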

Quickstart

go get github.com/snowplow/snowplow-golang-analytics-sdk
main.go
package main

import (
    "fmt"

    "github.com/pkg/errors"

    "github.com/snowplow/snowplow-golang-analytics-sdk/analytics"
)

var (
    event      = `test-data1	pc	2019-05-10 14:40:37.436	2019-05-10 14:40:35.972	2019-05-10 14:40:35.551	unstruct	e9234345-f042-46ad-b1aa-424464066a33			py-0.8.2	ssc-0.15.0-googlepubsub	beam-enrich-0.2.0-common-0.36.0	user<built-in function input>	18.194.133.57				d26822f5-52cc-4292-8f77-14ef6b7a27e2																																									{"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.snowplowanalytics.snowplow/add_to_cart/jsonschema/1-0-0","data":{"sku":"item41","quantity":2,"unitPrice":32.4,"currency":"RON"}}}																			python-requests/2.21.0																																										2019-05-10 14:40:35.000			{"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:nl.basjes/yauaasd_context/jsonschema/1-0-0","data":{"deviceBrand":"Unknown","deviceName":"Unknown","operatingSystemName":"Unknown","agentVersionMajor":"2","layoutEngineVersionMajor":"??","deviceClass":"Unknown","agentNameVersionMajor":"python-requests 2","operatingSystemClass":"Unknown","layoutEngineName":"Unknown","agentName":"python-requests","agentVersion":"2.21.0","layoutEngineClass":"Unknown","agentNameVersion":"python-requests 2.21.0","operatingSystemVersion":"??","agentClass":"Special","layoutEngineVersion":"??"}},{"schema":"iglu:nl.basjes/yauaa_context/jsonschema/1-0-0","data":{"deviceBrand":"Unknown","deviceName":"Unknown","operatingSystemName":"Unknown","agentVersionMajor":"2","layoutEngineVersionMajor":"??","deviceClass":"Unknown","agentNameVersionMajor":"python-requests 2","operatingSystemClass":"Unknown","layoutEngineName":"Unknown","agentName":"python-requests","agentVersion":"2.21.0","layoutEngineClass":"Unknown","agentNameVersion":"python-requests 2.21.0","operatingSystemVersion":"??","agentClass":"Special","layoutEngineVersion":"??"}}, 
{"schema":"iglu:nl.basjes/yauaa_context/jsonschema/1-0-0","data":{"deviceBrand":"Unknown","deviceName":"Unknown","operatingSystemName":"Unknown","agentVersionMajor":"2","layoutEngineVersionMajor":"??","deviceClass":"Unknown","agentNameVersionMajor":"python-requests 2","operatingSystemClass":"Unknown","layoutEngineName":"Unknown","agentName":"python-requests","agentVersion":"2.21.0","layoutEngineClass":"Unknown","agentNameVersion":"python-requests 2.21.0","operatingSystemVersion":"??","agentClass":"Special","layoutEngineVersion":"??"}}]}		2019-05-10 14:40:35.972	com.snowplowanalytics.snowplow	add_to_cart	jsonschema	1-0-0		`
    valueToGet = `platform`
)

func main() {
    // parse the enriched event string
    parsedEvent, err := analytics.ParseEvent(event)
    if err != nil {
        fmt.Println(errors.Errorf(`error parsing event: %v`, err))
        return  
    }

    // Get specific value from event
    _, err = parsedEvent.GetValue(valueToGet)
    if err != nil {
        fmt.Println(errors.Errorf(`error getting value %s from event: %v`, valueToGet, err))
        return
    }
    
    // Get object in JSON format
    _, err = parsedEvent.ToJson()
    if err != nil {
        fmt.Println(errors.Errorf(`error converting parsed event to JSON: %v`, err))
        return
    }
    
    // Get object in map format
    _, err = parsedEvent.ToMap()
    if err != nil {
        fmt.Println(errors.Errorf(`error converting parsed event to map: %v`, err))
        return
    }
    
    // Get a JSON of values for a set of canonical fields
    _, err = parsedEvent.GetSubsetJson("page_url", "unstruct_event")
    if err != nil {
        fmt.Println(errors.Errorf(`error getting subset JSON: %v`, err))
        return
    }
    
    // Get a map of values for a set of canonical fields
    _, err = parsedEvent.GetSubsetMap("page_url", "domain_userid", "contexts", "derived_contexts")
    if err != nil {
        fmt.Println(errors.Errorf(`error getting subset map: %v`, err))
        return
    }
    
    // Get a value from all contexts using its path
    _, err = parsedEvent.GetContextValue(`fieldToRetrieve`, `subfieldToRetrieve`, 1) // context.fieldToRetrieve.subfieldToRetrieve[1]
    if err != nil {
        fmt.Println(errors.Errorf(`error getting context value: %v`, err))
        return
    }
    
    // Get a value from the unstruct_event field using its path
    _, err = parsedEvent.GetUnstructEventValue(`snowplow_add_to_cart_1`, `currency`, 0) // unstruct_event.snowplow_add_to_cart_1.currency[0]
    if err != nil {
        fmt.Println(errors.Errorf(`error getting unstruct_event value: %v`, err))
        return
    }
}

API

func ParseEvent(event string) (ParsedEvent, error)

ParseEvent takes a Snowplow Enriched event TSV string as input, and returns a 'ParsedEvent' typed slice of strings. Methods may then be called on the resulting ParsedEvent to transform the event, or a subset of the event, to a map or JSON.

func (event ParsedEvent) ToJson() ([]byte, error)

ToJson transforms a valid Snowplow ParsedEvent to a JSON object.

func (event ParsedEvent) ToMap() (map[string]interface{}, error)

ToMap transforms a valid Snowplow ParsedEvent to a Go map.

func (event ParsedEvent) GetSubsetJson(fields ...string) ([]byte, error)

GetSubsetJson returns a JSON object containing only the atomic fields provided, without processing the rest of the event. For custom events and contexts, only "unstruct_event", "contexts", or "derived_contexts" may be provided, which will produce the entire data object for that field. For contexts, the resultant JSON will contain all occurrences of all contexts within the provided field.

func (event ParsedEvent) GetSubsetMap(fields ...string) (map[string]interface{}, error)

GetSubsetMap returns a map containing only the atomic fields provided, without processing the rest of the event. For custom events and contexts, only "unstruct_event", "contexts", or "derived_contexts" may be provided, which will produce the entire data object for that field. For contexts, the resultant map will contain all occurrences of all contexts within the provided field.

func (event ParsedEvent) GetValue(field string) (interface{}, error)

GetValue returns the value for a provided atomic field, without processing the rest of the event. For unstruct_event, it returns a map of only the data for the unstruct event.

func (event ParsedEvent) ToJsonWithGeo() ([]byte, error)

ToJsonWithGeo adds the geo_location field, and transforms a valid Snowplow ParsedEvent to a JSON object.

func (event ParsedEvent) ToMapWithGeo() (map[string]interface{}, error)

ToMapWithGeo adds the geo_location field, and transforms a valid Snowplow ParsedEvent to a Go map.

func (event ParsedEvent) GetUnstructEventValue(path ...interface{}) (interface{}, error)

GetUnstructEventValue gets a value from a parsed event's unstruct_event using its path (example1[0].example2).

func (event ParsedEvent) GetContextValue(contextName string, path ...interface{}) (interface{}, error)

GetContextValue gets a value from a parsed event's contexts using its path (contexts_example_1.example[0]).
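A `path ...interface{}` argument mixes string keys and integer indices. The sketch below shows one way such a path could be resolved against already-decoded JSON. It is not the SDK's implementation; `walkPath` is a hypothetical helper written with the standard library only, and returning an error for a missing key is a choice made for this sketch.

```go
package main

import "fmt"

// walkPath follows string keys into maps and int indices into slices.
func walkPath(data interface{}, path ...interface{}) (interface{}, error) {
	current := data
	for _, step := range path {
		switch key := step.(type) {
		case string:
			m, ok := current.(map[string]interface{})
			if !ok {
				return nil, fmt.Errorf("cannot use key %q on %T", key, current)
			}
			next, found := m[key]
			if !found {
				return nil, fmt.Errorf("key %q not found", key)
			}
			current = next
		case int:
			s, ok := current.([]interface{})
			if !ok {
				return nil, fmt.Errorf("cannot use index %d on %T", key, current)
			}
			if key < 0 || key >= len(s) {
				return nil, fmt.Errorf("index %d out of range", key)
			}
			current = s[key]
		default:
			return nil, fmt.Errorf("unsupported path element %v", step)
		}
	}
	return current, nil
}

func main() {
	data := map[string]interface{}{
		"example1": []interface{}{
			map[string]interface{}{"example2": "value"},
		},
	}
	// Equivalent to the path example1[0].example2
	v, err := walkPath(data, "example1", 0, "example2")
	fmt.Println(v, err)
}
```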

Copyright and license

Snowplow Golang Analytics SDK is copyright 2021 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

snowplow-golang-analytics-sdk's People

Contributors

colmsnowplow, tiganetearobert

snowplow-golang-analytics-sdk's Issues

Explore API change/addition to improve Getter performance

Currently the Get Methods require one to parse the entire json string before calling a method on the resultant object.

There may be ways to facilitate a more performant means of achieving the use case where one cares about operating on only the subset of data that Get* is concerned with.

One option is to explore whether it's faster to find an index in the tsv string and extract only the value from the tsv string directly. If so, an additional API method to do so might be a nice addition to the project.
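As a rough illustration of the index-based idea, the sketch below pulls a single field out of a TSV line by scanning for tab characters, without splitting the whole string. `nthField` is a hypothetical helper, not part of the SDK, and whether this is actually faster for real events would still need benchmarking.

```go
package main

import (
	"fmt"
	"strings"
)

// nthField returns the tab-separated field at index n (0-based)
// without building a slice of every field.
func nthField(line string, n int) (string, bool) {
	if n < 0 {
		return "", false
	}
	rest := line
	for i := 0; i < n; i++ {
		idx := strings.IndexByte(rest, '\t')
		if idx < 0 {
			return "", false // fewer than n+1 fields
		}
		rest = rest[idx+1:]
	}
	if end := strings.IndexByte(rest, '\t'); end >= 0 {
		return rest[:end], true
	}
	return rest, true
}

func main() {
	v, ok := nthField("test-data1\tpc\t2019-05-10 14:40:37.436", 1)
	fmt.Println(v, ok)
}
```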

Extend Get* methods to allow getting fields from self-describing data

Currently we can only get the entire object for unstruct_event, contexts or derived_contexts values in the Getter functions.

We could potentially leverage jsoniter's Get function to allow one to extract fields from the self-describing JSON without needing to process the entire object.

We might consider specific methods for this kind of Get if it becomes too complicated/confusing an interface. Especially considering that contexts are arrays of JSON.

Update json-iterator

There's a new patch version for json-iterator. Since we're almost ready to move from beta to prod, might as well bump it before release.

Consider using gjson

We chose json-iterator to parse JSON in this project because it's much faster than encoding/json for our purposes, and it offers a means to grab data directly from JSON without needing to unmarshal the whole thing - which is key to performance for unstruct and context fields.

The API for the latter of these, however, is awkward - it expects a list of interface arguments. So when we're using it in customer-facing applications, we must either have an unintuitive configuration, or parse the configuration in a way that's not easily and reliably done.

GJson seems to offer a solution to this latter problem, by allowing jsonpath dot notation syntax (or something very similar at least). This could make the Get functions much more straightforward to work with: https://github.com/tidwall/gjson#path-syntax

It also claims better benchmarked performance than json-iterator - however, I'm hesitant to accept at face value that this means it would be more performant for our purposes. The characteristics of the job we need it to do may well differ from what those benchmarks are based on.

I suggest that we:

  1. Explore the utility of gjson for the Get methods, and benchmark the performance vs. current implementation. If it offers a better API without a big performance penalty, we should use it for that purpose.

  2. Benchmark gjson vs. current implementation across the other tasks in this project, and make a decision as to what we should use moving forward based on those benchmarks.

Ensure unicode characters are interpreted as desired

In #4 we made a change which we believed would result in unicode characters like < being escaped. It turns out this was incorrect, because of a testing error.

We should return to this issue, but I now have new questions about it:

  • Should we keep escaping html characters, or revert that config?
  • Should we care that unicode characters are represented as unicode, if that's how go natively treats them?

Fix error in tests

We enabled this configuration, thinking that it would make unicode characters more consistent with the other analytics SDKs.

This turned out not to be the case, a fact which we missed because we're using the default JSON library in our tests, which also leaves chevrons as unicode.

We should address this, and use comparisons with hardcoded values to ensure that we don't make similar mistakes in future.

Explore efficiency improvements to GetContextValue

In GetContextValue(), we begin with two JSON strings, parse them into a slice of maps, then marshal the maps back to JSON and get the values we're looking for using jsoniter.

I suspect we might be able to find an efficiency improvement along the following lines.

We start with two strings which look like this:

{
  "schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
  "data": [
    {
      "schema": "iglu:org.schema/WebPage/jsonschema/1-0-0",
      "data": {
        "genre": "blog",
        "inLanguage": "en-US",
        "datePublished": "2014-11-06T00:00:00Z",
        "author": "Fred Blundun",
        "breadcrumb": [
          "blog",
          "releases"
        ],
        "keywords": [
          "snowplow",
          "javascript",
          "tracker",
          "event"
        ]
      }
    },
    {
      "schema": "iglu:org.w3/PerformanceTiming/jsonschema/1-0-0",
      "data": {
        "navigationStart": 1415358089861,
        "unloadEventStart": 1415358090270,
        "unloadEventEnd": 1415358090287,
        "redirectStart": 0,
        "redirectEnd": 0,
        "fetchStart": 1415358089870,
        "domainLookupStart": 1415358090102,
        "domainLookupEnd": 1415358090102,
        "connectStart": 1415358090103,
        "connectEnd": 1415358090183,
        "requestStart": 1415358090183,
        "responseStart": 1415358090265,
        "responseEnd": 1415358090265,
        "domLoading": 1415358090270,
        "domInteractive": 1415358090886,
        "domContentLoadedEventStart": 1415358090968,
        "domContentLoadedEventEnd": 1415358091309,
        "domComplete": 0,
        "loadEventStart": 0,
        "loadEventEnd": 0
      }
    }
  ]
}

The parse functions currently parse these into these objects:

type SelfDescribingData struct {
	Schema string
	Data   map[string]interface{} // TODO: See if leaving data as a string or byte array would work, and would be faster.
}

type Contexts struct {
	Schema string
	Data   []SelfDescribingData
}

Instead of this, if we leave SelfDescribingData.Data as a string, we can avoid parsing things into a map and back to JSON.

If it makes sense, we could modify the existing SelfDescribingData object (and related code) - and possibly also find efficiency improvements in the toJSON functions too. This may be complicated however, so we could also just create a separate object and a parse function specific to GetContexts for this purpose.
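One way to leave the data unparsed is sketched below with the standard library's `json.RawMessage`, which keeps the original bytes of a field instead of decoding them. The `rawSelfDescribingData` and `rawContexts` types and the `parseRawContexts` function are hypothetical names for illustration; the project parses with jsoniter, so this is only an approximation of the approach, not a proposed patch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rawSelfDescribingData is a hypothetical variant of SelfDescribingData
// that keeps Data as undecoded JSON bytes.
type rawSelfDescribingData struct {
	Schema string          `json:"schema"`
	Data   json.RawMessage `json:"data"`
}

// rawContexts mirrors the Contexts struct, using the raw variant above.
type rawContexts struct {
	Schema string                  `json:"schema"`
	Data   []rawSelfDescribingData `json:"data"`
}

// parseRawContexts decodes only the envelope; each context's data stays as bytes.
func parseRawContexts(s string) (rawContexts, error) {
	var c rawContexts
	err := json.Unmarshal([]byte(s), &c)
	return c, err
}

func main() {
	input := `{"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0","data":[{"schema":"iglu:org.schema/WebPage/jsonschema/1-0-0","data":{"genre":"blog"}}]}`
	c, err := parseRawContexts(input)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, sdd := range c.Data {
		// sdd.Data can be handed to a path-based getter without a map round trip.
		fmt.Println(sdd.Schema, string(sdd.Data))
	}
}
```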

Fix errors made during release 0.2.0 and 0.2.1

A 0.2.0 tag was created in error on a working branch, and the go cache now contains incorrect code for 0.2.0.

To correct this, we should commit a note on the README to specify that 0.2.1 should be used instead of 0.2.0, update the 0.2.0 release notes to mention this, and push a new 0.2.1 release.

Use JSONEq in tests

Could have saved ourselves some work by just reading the docs. We can make our tests better by using the JSONEq method to test equivalence of JSON strings.
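The issue does not name the library; assuming it refers to an assertion such as testify's `JSONEq`, the underlying idea is to compare decoded values rather than raw strings, so that key order and whitespace do not matter. A standard-library approximation, with `jsonEqual` as a hypothetical helper name:

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// jsonEqual reports whether two JSON documents decode to the same value,
// ignoring key order and insignificant whitespace.
func jsonEqual(a, b string) (bool, error) {
	var x, y interface{}
	if err := json.Unmarshal([]byte(a), &x); err != nil {
		return false, err
	}
	if err := json.Unmarshal([]byte(b), &y); err != nil {
		return false, err
	}
	return reflect.DeepEqual(x, y), nil
}

func main() {
	same, err := jsonEqual(`{"a":1,"b":[1,2]}`, `{ "b": [1, 2], "a": 1 }`)
	fmt.Println(same, err)
}
```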

Make `GetContextValue()` and `GetUnstructValue()` behaviour consistent when field does not exist

GetUnstructValue() returns an error when the path provided doesn't exist, whereas GetContextValue() returns nil with no error.

I believe these were simply two API design choices that were made independently, and there's good reason for these in isolation to behave this way - for contexts, there can be several of the same context - some of which may contain an optional field whereas some don't. In that case, we don't want to error if we find one but not the other.

We should reconsider whether or not this is the best design, since it's unintuitive - one would expect these to behave in something of a consistent manner.

Fix parsing error when self describing data is not a map

Currently, the type assumed for Data in the SelfDescribingData struct is a map.
This results in an error (Error unmarshaling context JSON) when trying to parse unstruct events or contexts whose data has a different type (e.g. a JSON array).
The error can be reproduced in tests by replacing the contextsString with something like:

var contextsString = `{"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0","data":[{"schema":"iglu:com.acme.test/testing/jsonschema/1-0-0","data":["aa","bb"]}]}`
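The failure mode can be reproduced in isolation with the standard library, as sketched below. The `mapData` and `anyData` types are hypothetical stand-ins for SelfDescribingData; the project decodes with jsoniter, and typing Data as `interface{}` is only one possible direction for a fix, not necessarily the one the project took.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mapData assumes data is always a JSON object, like the struct described above.
type mapData struct {
	Schema string                 `json:"schema"`
	Data   map[string]interface{} `json:"data"`
}

// anyData accepts any JSON value for data (object, array, string, ...).
type anyData struct {
	Schema string      `json:"schema"`
	Data   interface{} `json:"data"`
}

const arrayContext = `{"schema":"iglu:com.acme.test/testing/jsonschema/1-0-0","data":["aa","bb"]}`

func parseAsMap(s string) (mapData, error) {
	var d mapData
	err := json.Unmarshal([]byte(s), &d)
	return d, err
}

func parseAsAny(s string) (anyData, error) {
	var d anyData
	err := json.Unmarshal([]byte(s), &d)
	return d, err
}

func main() {
	_, err := parseAsMap(arrayContext)
	fmt.Println("map-typed Data:", err) // a non-nil type error

	v, err := parseAsAny(arrayContext)
	fmt.Println("interface-typed Data:", v.Data, err)
}
```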

Disable html escaping for as-is extraction of unicode characters

At least when printing output to console, unicode characters don't seem to be rendered appropriately after transformation.

For example, < appears as \u003c.

Since the output object for JSON is bytes in the first place, this mightn't be an issue. But we should investigate whether or not it is for both JSON and map outputs.
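The behaviour is easy to reproduce with the standard library, which escapes `<`, `>` and `&` by default and lets an encoder turn that off. The sketch below uses `encoding/json` purely for illustration; the project itself uses jsoniter, whose configuration is not shown here, and `marshalNoEscape` is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// marshalNoEscape encodes v as JSON without HTML escaping.
func marshalNoEscape(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(v); err != nil {
		return nil, err
	}
	// Encode appends a trailing newline; trim it to match json.Marshal.
	return bytes.TrimRight(buf.Bytes(), "\n"), nil
}

func main() {
	v := map[string]string{"user_id": "user<built-in function input>"}

	escaped, err := json.Marshal(v)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(escaped)) // chevrons appear as \u003c and \u003e

	raw, err := marshalNoEscape(v)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(string(raw)) // chevrons appear as-is
}
```

Both outputs decode to the same string, so the difference only matters to consumers that compare or display the raw bytes.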

Bump dependencies

A fix in Go 1.18 breaks jsoniter because of its dependencies. They have patched this: json-iterator/go@024077e

If there are no complications and it's easy, we should bump our dependencies in 0.3.0. If that does present challenges we should release 0.3.0 and bump deps in 0.3.1.

Then, we should update tests as per #30

Introduce safer release process

In most of our projects, CI/CD checks that versions and tags are correct before deploying releases. We haven't done so on this repo up to now, because Go automatically picks up the release based on the tag itself, rather than having us deploy ourselves.

This leaves the release process open to errors which can be messy. I think we can address this and introduce a version check with a change to the release process:

  • Add a VERSION file
  • Once PR is merged, developer creates a tag of the format X.X.X (e.g. 0.2.0) - since Go requires the format vX.X.X, this won't get published to the go cache
  • CI/CD runs on master only, checks that tag version == VERSION file
  • If version is correct, CD tags latest commit as vVERSION

This would only run on the master branch. If we wish to publish a pre-release from a branch, we can do so by manually tagging the branch with the format vX.X.X-beta.1.
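The version check in the steps above could look roughly like the following. This is a sketch only: `releaseTag` is a hypothetical function, and how CI would obtain the tag name and the VERSION file contents is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// releaseTag compares the developer's X.X.X tag with the VERSION file contents
// and, if they match, returns the vX.X.X tag that Go tooling would pick up.
func releaseTag(tag, versionFile string) (string, error) {
	version := strings.TrimSpace(versionFile)
	if version == "" {
		return "", fmt.Errorf("VERSION file is empty")
	}
	if tag != version {
		return "", fmt.Errorf("tag %q does not match VERSION %q", tag, version)
	}
	return "v" + version, nil
}

func main() {
	goTag, err := releaseTag("0.2.0", "0.2.0\n")
	fmt.Println(goTag, err)
}
```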
