snowplow-indicative-relay's Introduction

Snowplow Indicative Relay

Snowplow Indicative Relay is an AWS Lambda function that reads Snowplow enriched events from a Kinesis Stream and transfers them to Indicative. It processes events in batches, whose size depends on your AWS Lambda configuration.

Detailed setup instructions, as well as more technical information, are provided on the wiki page.

Copyright and license

Snowplow is copyright 2018-2021 Snowplow Analytics Ltd.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

snowplow-indicative-relay's Issues

Add option to choose which field to use as event name for structured events

At the moment, the Indicative event name is taken from the enriched event's event_name field (see this line). This field corresponds to the name of the event's schema (see this line).

In case of structured events, the value of this field is always "event" (see this line and this line).

We should make it possible to specify in the configuration of the lambda which field to use as event name when the value is just "event" (e.g. se_action or se_category).
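
A minimal sketch of how such an option could behave, assuming the event is available as a flat map of field names to values. The resolveEventName helper and its fallbackField parameter are hypothetical names used for illustration, not part of the relay:

```scala
object EventNameSketch {
  // Hypothetical helper: `fallbackField` stands in for the proposed
  // configuration option (e.g. Some("se_action") or Some("se_category")).
  def resolveEventName(
      event: Map[String, String],
      fallbackField: Option[String]
  ): Option[String] =
    event.get("event_name") match {
      // Structured events always carry the generic name "event".
      case Some("event") =>
        fallbackField
          .flatMap(event.get)
          .filter(_.nonEmpty)
          .orElse(Some("event")) // keep the old behaviour when nothing better exists
      // Every other event keeps its schema name.
      case other => other
    }
}
```

With fallbackField left as None, the function returns the same name as today, so the option would be backwards compatible.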

Limit fields and contexts

One way to improve the relay's performance would be to send only the properties that are meaningful in Indicative:

app_id
platform
event
event_id
page_url
page_urlhost
page_urlpath
refr_medium
refr_term
page_referrer
refr_url
refr_urlhost
refr_urlpath
refr_source
page_title
mkt_medium
mkt_source
mkt_campaign
mkt_term
mkt_content
mkt_network
tr_orderid
tr_affiliation
tr_total
tr_tax
tr_shipping
tr_total_base
tr_tax_base
tr_shipping_base
tr_city
tr_state
tr_country
ti_orderid
ti_sku
ti_name
ti_category
ti_price
ti_price_base
ti_quantity
ti_currency
base_currency
user_id
domain_userid
user_ipaddress
domain_sessionidx
network_userid
os_timezone
geo_country
geo_region
geo_city
geo_zipcode
geo_region_name
geo_timezone
br_family
br_name
br_type
br_lang
br_version
br_renderengine
os_name
os_family
os_manufacturer
useragent
dvce_type
event_vendor
se_action
se_property
se_label
se_category
se_value

As for contexts, org.w3.PerformanceTiming seems to take up a lot of space and is not useful in Indicative.
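
The filtering could be sketched as follows, assuming the event has already been flattened into a map of field names to values. The function names are hypothetical, the allowlist is shortened for brevity, and the "performance_timing_" prefix is only a guess at how flattened context fields might be named:

```scala
object FieldFilterSketch {
  // Shortened allowlist; a real one would hold the full list above.
  val allowedFields: Set[String] =
    Set("app_id", "platform", "event", "event_id", "page_url", "user_id", "domain_userid")

  // Keep only the atomic fields named in the allowlist.
  def keepAllowed(event: Map[String, String], allowed: Set[String]): Map[String, String] =
    event.filter { case (field, _) => allowed.contains(field) }

  // Drop every field that belongs to an unwanted context,
  // identified here by an assumed key prefix.
  def dropByPrefix(event: Map[String, String], prefixes: Set[String]): Map[String, String] =
    event.filter { case (field, _) => !prefixes.exists(p => field.startsWith(p)) }
}
```

Both functions return a new map, so they can be applied one after the other before the payload is built.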

Improve error logging

Errors are not easily searchable in the logs. CloudWatch Logs offers both filtering by type and regex search, but currently the relay makes it very hard to find messages for diagnosis.

It would be best to add log levels and to introduce a searchable format for the messages, e.g. "ERROR: some error description" / "INFO: x events sent".
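
One possible shape for this, as a dependency-free sketch. The Level type and the format and log helpers are hypothetical names, and a real change might use an existing logging library instead:

```scala
object LogSketch {
  // Log levels with the label that is written in front of each message.
  sealed abstract class Level(val label: String)
  case object InfoLevel  extends Level("INFO")
  case object ErrorLevel extends Level("ERROR")

  // Produces e.g. "ERROR: some error description".
  def format(level: Level, message: String): String =
    s"${level.label}: $message"

  // In AWS Lambda, standard output ends up in CloudWatch Logs.
  def log(level: Level, message: String): Unit =
    println(format(level, message))
}
```

A fixed prefix at the start of the line means a plain text filter such as "ERROR:" is enough to isolate failures.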

Refactor unit tests to use mutable Specification

The unit test suite currently uses org.specs2.Specification. However, this style does not support several assertions in sequence within one example: only the value of the last expression is returned as the result. Any test written that way effectively checks only its last assertion. For example:

def e9 = {
  val base = "a" -> Json.fromString(List.fill(20)("a").mkString)
  val js   = List(Json.obj(base))
  val (toSend, tooBig) =
    Transformer.constructBatches(Transformer.getSize _, Transformer.constructJson("a") _, js, 10, 10)
  toSend shouldEqual Nil
  tooBig shouldEqual js
}

will pass if tooBig == js, regardless of whether toSend == Nil is true or false.

Refactor the suite to use org.specs2.mutable.Specification, which allows multiple assertions to be checked in sequence as intended.
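
The underlying problem can be reproduced without specs2. The sketch below uses a toy Check type (a stand-in, not the specs2 API) to show why a block made of two result values only reports the last one, and how explicitly combining the results avoids that:

```scala
object LastResultSketch {
  // Toy stand-in for a test result; this is not the specs2 API.
  final case class Check(ok: Boolean, message: String) {
    // Combine two checks: the first failure wins.
    def and(next: => Check): Check = if (ok) next else this
  }

  def equal[A](actual: A, expected: A): Check =
    Check(actual == expected, s"$actual vs $expected")

  // Only the last expression becomes the block's value,
  // so the failing first check is silently discarded.
  def lastOnly: Check = {
    equal(List(1), Nil)
    equal("js", "js")
  }

  // Chaining keeps both checks in the returned value.
  def combined: Check = equal(List(1), Nil).and(equal("js", "js"))
}
```

Here lastOnly reports success even though its first check fails, while combined reports the failure.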

Fix typo in user ID field name for mobile events

Every event in Indicative must have a user identifier (eventUniqueId).

The eventUniqueId is one of:

  • the user_id field in atomic.events, or if that is missing
  • the user_id field from the client_session context, or if that is missing
  • the domain_userid from atomic.events.

For documentation on user_id and domain_userid, see the canonical event model.

For documentation on client_session_userId, see the schema.

There is a typo that misspells client_session_userId as client_session_user_id, with the upshot that all mobile events sent through the relay are considered not to have an eventUniqueId unless a user_id field is also present.

val userId = extractField(flattenedEvent, "user_id")
  .leftFlatMap(_ => extractField(flattenedEvent, "client_session_user_id"))
  .leftFlatMap(_ => extractField(flattenedEvent, "domain_userid"))
  .toOption

should be

val userId = extractField(flattenedEvent, "user_id")
  .leftFlatMap(_ => extractField(flattenedEvent, "client_session_userId"))
  .leftFlatMap(_ => extractField(flattenedEvent, "domain_userid"))
  .toOption
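
The effect of the typo can be illustrated with a simplified, dependency-free model of the lookup chain. It assumes the flattened event is a plain map and uses a hypothetical userId helper in place of extractField:

```scala
object UserIdSketch {
  // Same fallback order as above; `sessionKey` is the name used
  // to look up the user ID from the client_session context.
  def userId(event: Map[String, String], sessionKey: String): Option[String] =
    event
      .get("user_id")
      .orElse(event.get(sessionKey))
      .orElse(event.get("domain_userid"))
}
```

For a mobile event that only carries client_session_userId, the misspelled key yields no identifier at all, while the corrected key finds it.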

Move to scalaj

hammock adds too much overhead for a Lambda function.

This is already done in #18.

Limit payload size

In addition to a cap on the number of events, there is a cap on the total payload size.

This is done in #18.
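
For illustration only, a batching routine that respects both caps might look like the sketch below. It is not the implementation from #18: batchEvents, maxEvents and maxBytes are hypothetical names, events are modelled as already-serialized strings, and the size of any surrounding JSON envelope is ignored:

```scala
import java.nio.charset.StandardCharsets.UTF_8

object BatchSketch {
  def byteSize(s: String): Int = s.getBytes(UTF_8).length

  // Returns the batches to send, plus the events that exceed
  // the size cap on their own and can never be sent.
  def batchEvents(
      events: List[String],
      maxEvents: Int,
      maxBytes: Int
  ): (List[List[String]], List[String]) = {
    val (fits, tooBig) = events.partition(e => byteSize(e) <= maxBytes)
    val batches = fits.foldLeft(List.empty[List[String]]) {
      // Add to the current batch while both caps still hold.
      case (current :: done, e)
          if current.size < maxEvents &&
            current.map(byteSize).sum + byteSize(e) <= maxBytes =>
        (e :: current) :: done
      // Otherwise start a new batch.
      case (acc, e) => List(e) :: acc
    }
    // Batches and their contents were built in reverse order.
    (batches.reverse.map(_.reverse), tooBig)
  }
}
```

Separating oversized events up front mirrors the toSend and tooBig pair seen in the unit test quoted in the earlier issue, although the real function's signature differs.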

Link_click events are being automatically excluded

Following Snowplow's out-of-the-box configuration documentation, Indicative was not receiving link_click events, but was receiving page_view and impression events. I triggered link_click a few times to be certain it was firing. After I edited the Lambda function to explicitly exclude one of the unused atomic fields, Indicative started receiving link_click events.
link_click events aren't listed among the unused events, but they seem to be excluded by default.
