Giter VIP home page Giter VIP logo

bigquery-analytics's Introduction

Big Query queries

Repo contains all our queries used on BQ.
This repo is essentially for backing up all the Quries used, alongside with base queries and archived queries.
.vscode workspace config file is also saved. Feel free to use it or not.

Repo structure and descriptions

  • archived (Old queries that are no longer used in production, but these are still kept for reference use only)

    • AKM (App Key Metrics)
      • This folder contains all the queries that are used for generating the data used in the app key metrics spread sheet.
    • Base Queries
      • Base Queries that are archived
      • Mainly base queries for the User Database, that was used before the new firestore data import method was used.
      • This queries works with the Firestore database schema created with the use of the old manual data insertion method with the cron job code in "pipeline/"
      • As the data from firestore is now automatically streamed in as firebase data uses BQ as its data warehouse, the schema has drastically changed to a form where the NoSQL is automatically flattened by BQ to work in the RDBMS. More info on the schema is available online.
    • Descrepancy
      • This folder contains Queries that were used to see why is there data descrepancies between Firebase analytics reporting and the data retrieved via BQ Queries.
      • Alot of these have hardcoded date/time values, which makes them unusable now.
    • Weekly Cohort Retention
      • This folder contains the queries used for doing Weekly cohort retention in a now replaced way.
    • Others
      • Other queries here and there that are no longer used.
  • base_queries

    • Contains all the base queries that can be built upon to create more complex queries.
    • Most of these queries can be used for chaining purposes, as they can be built into a "WITH" clause.
  • current

    • Contains all the queries currently used in production.
    • Also contains all the queries that are being worked on right now.
    • Similiar queries in this directory may also be grouped together too.

Calculating cohort using BigQuery

Note on the data for queries like Cohort analysis:
Although we are able to get the same numbers as those shown in firebase analytics and events dashboard, we will not be following their numbers as they differ from our calculated values due the difference in data filters. However, even though the numbers are not the same despite being pretty close as firebase, the month to month difference in terms of percentage and all should still present to us the same set of "growth" numbers. Like if there is a 10% user base growth shown on firebase analytics, it should be around the same too for our values.
All time and date data needed should be extracted from a timestamp value. And all timestamp values should be in UTC format for consistency. Timestamps that are available for use in the events database are:

  • event_timestamp
  • user_first_touch_timestamp
  • (Not recommended) user_properties.value.int_value
    • This holds the value for the first_open timestamp as part of every single events' user property.
  • (Not recommended) event_date
    • "event_date" is not recommended because it is based on the locale of the user's device. Meaning that there is no fix timezone used across all the events, which makes clustering by dates/times difficult and erroneous.

For event time or event date, always use the "event_timestamp" value. To get the event date from the timestamp value, use the query below:

DATE(TIMESTAMP_MICROS(event_timestamp)

To get the user's first_open_time, use the "user_first_touch_timestamp" value. To get the date from the timestamp, use the query below:

DATE(TIMESTAMP_MICROS(user_first_touch_timestamp))

Count the number of users by distince "user_pseudo_id" because for events like "first_open" there will be no real user_id available yet, thus firebase will automatically generate a unique value to use as the "user_pseudo_id" for identification

COUNT(distinct user_pseudo_id)

For most of the analysis, especially cohort retention analysis, we would want to exclude the manual installs, as these are installation / users that are created when we do app testing, and thus should not count towards the actual stats. To Filter out manual installs:

WHERE
  app_info.install_source NOT LIKE "%manual%"

To remove reinstall numbers:

-- @TODO to be implemented
WHERE
  event.params.value > 1

bigquery-analytics's People

Contributors

jaimeloeuf avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.