fireant's Introduction

FireAnt - Analytics and Reporting

fireant is a data analysis tool for quickly building charts, tables, reports, and dashboards. It defines a schema for configuring metrics and dimensions, which removes most of the legwork of writing queries and formatting charts. fireant also works well with Jupyter notebooks and in the Python shell, providing quick and easy access to your data.

Installation

To install fireant, run the following command in the terminal:

pip install fireant

Introduction

fireant arose out of an environment where several different teams, each working with data sets often with crossover, were individually building their own dashboard platforms. fireant was developed as a centralized way of building dashboards without the legwork.

fireant is used to create configurations of data sets using DataSet, which backs a database table containing analytics and defines a set of Field objects. A Field can be used to group data by properties such as a timestamp, an account, or a device type, or to render quantifiers such as clicks, ROI, or conversions into a widget such as a chart or table.

A DataSet exposes a rich builder API that allows a wide range of queries to be constructed and rendered as several kinds of widget. A DataSet can also be used directly in a Jupyter notebook, which removes the need to write repetitive custom queries and hand-code visualizations.

Data Sets

DataSet is the core component of fireant. A DataSet is a representation of a data set and is used to execute queries and transform result sets into widgets such as charts or tables.

A DataSet requires only a few definitions in order to be used: a database connector, a database table, join tables, and dimensions and metrics. Metric and dimension definitions tell fireant how to query and use data in widgets. Once a data set is created, its query API can be used to build queries in just a few lines of code by selecting which dimensions and metrics to use and how to filter the data.

Instantiating a Data Set

from fireant.dataset import *
from fireant.database import VerticaDatabase
from pypika import Tables, functions as fn

vertica_database = VerticaDatabase(user='myuser', password='mypassword')
analytics, customers = Tables('analytics', 'customers')

my_dataset = DataSet(
    database=vertica_database,
    table=analytics,
    fields=[
        Field(
            # Non-aggregate definition
            alias='customer',
            definition=customers.id,
            label='Customer'
        ),
        Field(
            # Date/Time type, also non-aggregate
            alias='date',
            definition=analytics.timestamp,
            type=DataType.date,
            label='Date'
        ),
        Field(
            # Text type, also non-aggregate
            alias='device_type',
            definition=analytics.device_type,
            type=DataType.text,
            label='Device_type'
        ),
        Field(
            # Aggregate definition (The SUM function aggregates a group of values into a single value)
            alias='clicks',
            definition=fn.Sum(analytics.clicks),
            label='Clicks'
        ),
        Field(
            # Aggregate definition (The SUM function aggregates a group of values into a single value)
            alias='customer-spend-per-clicks',
            definition=fn.Sum(analytics.customer_spend / analytics.clicks),
            type=DataType.number,
            label='Spend / Clicks'
        )
    ],
    joins=[
        Join(customers, analytics.customer_id == customers.id),
    ],
)

Building queries with a Data Set

Use the query property of a data set instance to start building a data set query. A data set query allows method calls to be chained together to select what should be included in the result.

This example uses the data set defined above:

from datetime import date

from fireant import Matplotlib, Pandas, WeekOverWeek, day

# Wrapping the chain in parentheses allows comments between the chained calls
matplotlib_chart, pandas_df = (
    my_dataset.query
    .dimension(
        # Select the date dimension with a daily interval to group the data by day.
        # Dimensions are referenced by `dataset.fields.{alias}`
        day(my_dataset.fields.date),

        # Select the device_type dimension to break the data down further by device
        my_dataset.fields.device_type,
    )
    .filter(
        # Restrict the result set to the year 2018
        my_dataset.fields.date.between(date(2018, 1, 1), date(2018, 12, 31))
    )
    # Add a week-over-week reference to compare data to values from the week prior
    .reference(WeekOverWeek(my_dataset.fields.date))
    .widget(
        # Add a matplotlib chart widget
        Matplotlib()
            # Add axes with series to the chart
            .axis(Matplotlib.LineSeries(my_dataset.fields.clicks))

            # Metrics are also referenced by `dataset.fields.{alias}`
            .axis(Matplotlib.ColumnSeries(
                my_dataset.fields['customer-spend-per-clicks']
            ))
    )
    .widget(
        # Add a pandas data frame table widget
        Pandas(
            my_dataset.fields.clicks,
            my_dataset.fields['customer-spend-per-clicks']
        )
    )
    .fetch()
)

# Display the chart
matplotlib_chart.plot()

# Display the data frame
print(pandas_df)

License

Copyright 2020 KAYAK Germany, GmbH

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Crafted with ♥ in Berlin.

fireant's People

Contributors

aweller, azisk, dependabot[bot], gl3nn, mikeengland, robinpapke, twheys, wahwynn, x8lucas8x

fireant's Issues

Add L1 and L2 Loss operations

Create the first post-operations for L1 and L2 Loss. These should replace the TODO in the slicer manager data function.
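
As a rough illustration of what these post-operations might compute, the sketch below assumes the common definitions: L1 loss as the mean absolute error and L2 loss as the mean squared error between two metric series. The function names and the use of plain lists are assumptions made for illustration, not fireant's API.

```python
def l1_loss(actual, expected):
    """Mean absolute error between two equally long, non-empty sequences."""
    if len(actual) != len(expected):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - e) for a, e in zip(actual, expected)) / len(actual)


def l2_loss(actual, expected):
    """Mean squared error between two equally long, non-empty sequences."""
    if len(actual) != len(expected):
        raise ValueError("sequences must have the same length")
    return sum((a - e) ** 2 for a, e in zip(actual, expected)) / len(actual)


# Hypothetical metric values: observed clicks compared with a forecast
clicks = [10, 12, 9]
forecast = [8, 12, 12]
print(l1_loss(clicks, forecast))
print(l2_loss(clicks, forecast))
```

A real implementation would more likely operate on the data frame columns of the two metrics, but the arithmetic would be the same.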

Adding a metric to an operation not specified in the metrics array throws an exception

Example:

my_slicer.highcharts.line_chart(
	metrics=[['metric1']], 
	dimensions=['date'], 
	dimension_filters=[
		RangeFilter('date', start=date.today() - timedelta(days=3), stop=date.today())
	],
	operations=[L1Loss('metric1', 'metric2')],
	)


Exception:
Traceback (most recent call last):
File "<input>", line 7, in <module>
File "/Users/user/.venv/dashmore/lib/python3.4/site-packages/fireant/slicer/managers.py", line 420, in _get_and_transform_data
return tx.transform(dataframe, display_schema)
File "/Users/user/.venv/dashmore/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 78, in transform
series = self._make_series(dataframe, dim_ordinal, display_schema)
File "/Users/user/.venv/dashmore/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 110, in _make_series
for i, (idx, item) in enumerate(dataframe.iteritems())]
File "/Users/user/.venv/dashmore/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 110, in <listcomp>
for i, (idx, item) in enumerate(dataframe.iteritems())]
File "/Users/user/.venv/dashmore/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 115, in _make_series_item
'name': self._format_label(idx, dim_ordinal, display_schema, reference),
File "/Users/user/.venv/dashmore/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 161, in _format_label
metric_label = metric['label']
TypeError: string indices must be integers

Equality Filter on a dimension is not returning any chart data

Example

import json
res = _slicer.highcharts.line_chart(
    metrics=['clicks'],
    dimensions=['date'],
    dimension_filters=[RangeFilter('date', date.today() - timedelta(days=14), date.today()),
                       EqualityFilter('column', EqualityOperator.eq, 'value')],
)
print(res)
print(json.dumps(res))

Result:

{'xAxis': {'type': 'linear'}, 'chart': {'type': 'line'}, 'title': {'text': None}, 'yAxis': [{'title': None}], 'tooltip': {'shared': True}, 'series': [{'yAxis': 0, 'data': [], 'name': 'clicks', 'dashStyle': 'Solid'}]}
{"xAxis": {"type": "linear"}, "chart": {"type": "line"}, "title": {"text": null}, "yAxis": [{"title": null}], "tooltip": {"shared": true}, "series": [{"yAxis": 0, "data": [], "name": "clicks", "dashStyle": "Solid"}]}

Timestamp is not JSON serializable error when rendering a row_index_table

When running code very similar to the example for rendering a row_index_table and attempting to output this as JSON data, an error is raised:

...raise TypeError(repr(o) + " is not JSON serializable")
TypeError: Timestamp('2016-06-12 00:00:00') is not JSON serializable

Code causing the error:
slicer.manager.row_index_table(
    metrics=['clicks', 'revenue'],
    dimensions=['date', 'device_type'],
    dimension_filters=[RangeFilter('date', date.today() - timedelta(days=60), date.today())]
)

Example Python dictionary returned from the above FireAnt statement (Note the Pandas Timestamp data type which cannot be converted to JSON):

{'Clicks': 123, 'DT_RowId': 'row_171', 'Date': Timestamp('2016-08-02 00:00:00'), 'Device Type': 'Tablet', 'Revenue': 123},
{'Clicks': 123, 'DT_RowId': 'row_172', 'Date': Timestamp('2016-08-03 00:00:00'), 'Device Type': 'Tablet', 'Revenue': 123},
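
Until the transformer emits JSON-safe values, one possible caller-side workaround is to pass a `default` hook to `json.dumps`. pandas `Timestamp` is a subclass of `datetime.datetime`, so a hook that handles the standard date types also covers it; the sketch below uses plain `datetime` objects so that it runs without pandas. The helper name `json_default` is hypothetical.

```python
import json
from datetime import date, datetime


def json_default(value):
    """Fallback serializer for objects json.dumps cannot encode natively."""
    # datetime is a subclass of date, so this branch also matches datetime
    # values (and pandas Timestamp, which subclasses datetime).
    if isinstance(value, date):
        return value.isoformat()
    raise TypeError(repr(value) + " is not JSON serializable")


row = {"Clicks": 123, "DT_RowId": "row_171", "Date": datetime(2016, 8, 2), "Device Type": "Tablet"}
print(json.dumps(row, default=json_default))
```

ISO 8601 strings are only one choice of output format; a front end that expects epoch milliseconds would need a different conversion in the hook.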

Rollup function is not working for any transformer type

Using the Rollup operation with any transformer type (e.g. column_index_table or line_chart) results in an error.

Example call:
print(json.dumps(slicer.manager.line_chart(
    metrics=['clicks'],
    dimensions=['date'],
    dimension_filters=[RangeFilter('date', date.today() - timedelta(days=60), date.today())],
    operations=[Rollup(['date'])]
)))

Error message:
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Add request validation to transformer managers

Add validation for requests to the transformer managers so that exceptions are thrown for invalid requests before the query is executed:

Line Charts:
Require 1 continuous dimension as the first dimension

Bar/Column Charts:
Require at most 2 dimensions with 1 metric, or 1 dimension with n metrics

This should apply to both matplotlib and highcharts.
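
The rules above could be checked before any query is built, roughly as sketched here. The exception class, the function names, and the idea of passing the set of continuous dimension keys are all assumptions made for illustration; none of them are part of fireant.

```python
class RequestValidationError(Exception):
    """Raised before any query runs when a chart request is invalid."""


def validate_line_chart(dimensions, continuous_dimensions):
    # Line charts need a continuous dimension (e.g. a date) in first position.
    if not dimensions or dimensions[0] not in continuous_dimensions:
        raise RequestValidationError(
            "Line charts require a continuous dimension as the first dimension.")


def validate_bar_chart(metrics, dimensions):
    # Allowed shapes: up to 2 dimensions with exactly 1 metric,
    # or at most 1 dimension with any number of metrics.
    if len(dimensions) > 2:
        raise RequestValidationError(
            "Bar/column charts allow at most 2 dimensions.")
    if len(dimensions) == 2 and len(metrics) != 1:
        raise RequestValidationError(
            "Bar/column charts with 2 dimensions allow exactly 1 metric.")


# Both calls pass silently because the requests are valid
validate_line_chart(["date", "device_type"], continuous_dimensions={"date"})
validate_bar_chart(["clicks"], ["date", "device_type"])
```

Keeping the checks independent of the rendering backend would let the matplotlib and highcharts managers share them.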

Refactor Reference

Refactor references so they can be extended externally. Use an interval property which is passed to the query builder.

Specifying Totals for a date dimension is raising an exception

Example call:
res = slicer.highcharts.line_chart(
    metrics=['clicks'],
    dimensions=['date'],
    dimension_filters=[RangeFilter('date', date.today() - timedelta(days=14), date.today())],
    operations=[Totals('date')]
)

Traceback (most recent call last):
    operations=[Totals('date')]
  File "/lib/python3.4/site-packages/fireant/slicer/managers.py", line 279, in _get_and_transform_data
    return tx.transform(df, display_schema)
  File "/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 47, in transform
    series = self._make_series(data_frame, dim_ordinal, display_schema)
  File "/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 90, in _make_series
    for idx, item in data_frame.iteritems()]
  File "/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 90, in <listcomp>
    for idx, item in data_frame.iteritems()]
  File "/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 95, in _make_series_item
    'data': self._format_data(item),
  File "/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 168, in _format_data
    for key, value in column.iteritems()
  File "/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 169, in <listcomp>
    if not np.isnan(value)]
  File "/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 173, in _format_point
    return (_format_data_point(x), _format_data_point(y))
  File "/lib/python3.4/site-packages/fireant/slicer/transformers/highcharts.py", line 13, in _format_data_point
    if np.isnan(value):
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Using totals throws exception

All requests using totals result in an exception:

File "/Users/llira/.virtualenvs/dashmore/lib/python3.4/site-packages/fireant/slicer/managers.py", line 395, in _get_and_transform_data
   display_schema = self.manager.display_schema(metrics, dimensions, references, operations)
 File "/Users/llira/.virtualenvs/dashmore/lib/python3.4/site-packages/fireant/slicer/managers.py", line 153, in display_schema
   'metrics': self._display_metrics(metrics, operations),
 File "/Users/llira/.virtualenvs/dashmore/lib/python3.4/site-packages/fireant/slicer/managers.py", line 324, in _display_metrics
   metric_key = getattr(operation, 'metric_key')
AttributeError: 'Totals' object has no attribute 'metric_key'

notebooks.row_index_table can't handle empty dataframes

I have a slicer that returns an empty dataframe (as expected) when called with

my_slicer.manager.data()

but when called with

my_slicer.notebooks.row_index_table()

it throws an Exception as follows:

(some parts of the stacktrace removed)

/Users/aweller/.virtualenvs/antfarm-strategies34/lib/python3.4/site-packages/fireant/slicer/managers.py in _get_and_transform_data(self, tx, metrics, dimensions, metric_filters, dimension_filters, references, operations)
    464         display_schema = self.manager.display_schema(metrics, dimensions, references, operations)
    465 
--> 466         return tx.transform(dataframe, display_schema)

/Users/aweller/.virtualenvs/antfarm-strategies34/lib/python3.4/site-packages/fireant/slicer/transformers/notebooks.py in transform(self, dataframe, display_schema)
     15 class PandasRowIndexTransformer(Transformer):
     16     def transform(self, dataframe, display_schema):
---> 17         dataframe = self._set_display_options(dataframe, display_schema)
     18         if display_schema['dimensions']:
     19             dataframe = self._set_dimension_labels(dataframe, display_schema['dimensions'])

/Users/aweller/.virtualenvs/antfarm-strategies34/lib/python3.4/site-packages/fireant/slicer/transformers/notebooks.py in _set_display_options(self, dataframe, display_schema)
     37 
     38                 if isinstance(dataframe.index, pd.MultiIndex):
---> 39                     dataframe.index.set_levels(display_values, key, inplace=True)
     40 
     41                 else:

/Users/aweller/.virtualenvs/antfarm-strategies34/lib/python3.4/site-packages/pandas/indexes/multi.py in set_levels(self, levels, level, inplace, verify_integrity)
    236             if not is_list_like(levels):
    237                 raise TypeError("Levels must be list-like")
--> 238             if is_list_like(levels[0]):
    239                 raise TypeError("Levels must be list-like")
    240             level = [level]

IndexError: list index out of range

ContainsFilter is raising an exception

The ContainsFilter is raising an exception when it is used.

Example call:
res = searchads_slicer.highcharts.line_chart(
    metrics=['clicks'],
    dimensions=['date'],
    dimension_filters=[RangeFilter('date', date.today() - timedelta(days=14), date.today()),
                       ContainsFilter('device_type', ['c'])],
)

print(res)

print(json.dumps(res))

Error:

Traceback (most recent call last):
  File "", line 205, in <module>
    ContainsFilter('device_type', ['c', 't'])],
  File "/site-packages/fireant/slicer/managers.py", line 280, in _get_and_transform_data
    references=references, operations=operations)
  File "/site-packages/fireant/slicer/managers.py", line 69, in data
    references=references, operations=operations)
  File "site-packages/fireant/slicer/managers.py", line 103, in query_schema
    self._default_dimension_definition),
  File "site-packages/fireant/slicer/managers.py", line 205, in _filters_schema
    filters_schema.append(f.schemas(definition))
  File "site-packages/fireant/slicer/filters.py", line 58, in schemas
    return element.isin(self.values)
AttributeError: 'Coalesce' object has no attribute 'isin'

Column Indexed table contains the wrong number of columns

As per the example at http://fireant.readthedocs.io/en/latest/2_slicer.html, running the following code is returning three columns instead of the expected seven.

A column-indexed table will contain only one index column and display a metrics column for each combination of subsequent dimensions. For example with the same parameters as above, the result will include seven columns: Day, Clicks (Desktop), Clicks (Tablet), Clicks (Mobile) and Conversions (Desktop), Conversions (Tablet), and Conversions (Mobile).

slicer.manager.column_index_table(
    metrics=['clicks', 'conversions'],
    dimensions=['date', 'device_type']
)

This is however returning three columns: Day, Clicks (clicks) and Conversions (conversions).
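
The expected count of seven follows from one index column plus one column for every (metric, device type) pair. The short sketch below works that out with `itertools.product`, using the labels from the documentation excerpt quoted above; it illustrates the arithmetic only and does not call fireant.

```python
from itertools import product

index_column = "Day"
metrics = ["Clicks", "Conversions"]
device_types = ["Desktop", "Tablet", "Mobile"]

# One column per (metric, device type) combination, after the index column
columns = [index_column] + [
    "{} ({})".format(metric, device)
    for metric, device in product(metrics, device_types)
]
print(len(columns))
print(columns)
```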

Having zero dimensions breaks when trying to render a chart

In fireant/slicer/transformers/highcharts.py an exception is thrown on slicer.dimensions[dimensions[0]] when there are no dimensions on a slicer.

        from fireant.slicer import ContinuousDimension
        dimension0 = slicer.dimensions[dimensions[0]] ...
        if not dimensions or not isinstance(dimension0, ContinuousDimension):
            raise TransformationException('Highcharts line charts require a continuous dimension as the first '
                                          'dimension.  Please add a continuous dimension from your Slicer to '
                                          'your request.')
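
In the snippet above, the lookup `dimensions[0]` runs before the `not dimensions` check, so an empty list fails on the lookup and the intended TransformationException is never reached. A possible reordering is sketched below; the two classes are stand-ins so the example runs on its own, and the function name `check_line_chart_dimensions` is hypothetical.

```python
class TransformationException(Exception):
    """Stand-in for fireant's exception so the sketch runs on its own."""


class ContinuousDimension:
    """Stand-in for fireant.slicer.ContinuousDimension."""


def check_line_chart_dimensions(slicer_dimensions, dimensions):
    # Test for an empty list first; `or` short-circuits, so dimensions[0]
    # is only read when at least one dimension was requested.
    if not dimensions or not isinstance(
            slicer_dimensions[dimensions[0]], ContinuousDimension):
        raise TransformationException(
            'Highcharts line charts require a continuous dimension as the first '
            'dimension.  Please add a continuous dimension from your Slicer to '
            'your request.')


# Hypothetical slicer dimensions keyed by name
slicer_dimensions = {"date": ContinuousDimension(), "device_type": object()}
check_line_chart_dimensions(slicer_dimensions, ["date"])  # passes silently
```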

Error installing Fireant from Github

An exception is raised when installing fireant from GitHub with pip:

Collecting fireant[vertica] from git+https://github.com/kayak/fireant.git@master#egg=fireant[vertica] (from -r bandit/requirements.txt (line 51))
  Cloning https://github.com/kayak/fireant.git (to master) to /private/var/folders/k4/3gh_7drn1t78lh8hzlz8w6t08hdy18/T/pip-build-8g2fcnjb/fireant
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/k4/3gh_7drn1t78lh8hzlz8w6t08hdy18/T/pip-build-8g2fcnjb/fireant/setup.py", line 36, in <module>
        long_description=readme(),
      File "/private/var/folders/k4/3gh_7drn1t78lh8hzlz8w6t08hdy18/T/pip-build-8g2fcnjb/fireant/setup.py", line 12, in readme
        return f.read()
      File "/Users/mike/bin/../lib/python3.4/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 7399: ordinal not in range(128)
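
The traceback shows `readme()` in setup.py decoding the file with the ASCII codec, which fails on the first non-ASCII byte. Byte 0xe2 is the lead byte of many three-byte UTF-8 sequences (the ♥ in the license footer is one example), so a likely fix is to state the encoding explicitly. The sketch assumes the README is UTF-8 encoded and is named `README.rst`; neither is confirmed by the report.

```python
import io


def readme(path="README.rst"):
    # An explicit encoding avoids falling back to the locale default,
    # which was ASCII in the environment from the traceback.
    with io.open(path, encoding="utf-8") as f:
        return f.read()
```

`io.open` accepts the `encoding` argument on both Python 2 and Python 3, which matters for a setup.py that has to run on either.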

Spaces in metric labels causes exception when rendering a row or column_index_table in Datatables.net

The JSON returned when using the row_index_table or column_index_table managers returns data in a label: value format.

This causes issues with the datatables.net API as space characters may appear in the column name e.g. 'exchange rate'. When mapping the columns to a table in datatables, having spaces in the column name causes the data table to raise an exception as can be seen from the fizz buzz column below.

Example JSON returned from FireAnt:

{
  "draw": 1,
  "data": [
    {"foo": 12.1111, "bar": "1.1", "fizz buzz": 123, "DT_RowId": "row_0"},
    {"foo": 12.1111, "bar": "1.1", "fizz buzz": 123, "DT_RowId": "row_1"}
  ],
  "recordsFiltered": 2,
  "recordsTotal": 2
}

Example of mapping DataTables columns to the keys found in the data:
$('#test').DataTable({
  displayLength: 25,
  search: false,
  lengthMenu: [[10, 25, 50, 100, -1], [10, 25, 50, 100, 'All']],
  orderCellsTop: true,
  filtering: true,
  destroy: true,
  data: data,
  order: [0, 'desc'],
  columns: [
    {data: 'foo'},
    {data: 'bar'},
    {data: 'fizz buzz'}
  ],
  ...

The exception returned is jquery.js:8264 Uncaught (in promise) TypeError: Cannot convert a Symbol value to a string(…).

There are a few ways we could overcome this:

  1. Ensure each JSON key for a column value has no spaces, e.g. "fizz_buzz": 5. Users will have to work out a human-friendly name for each column's header programmatically.
  2. Ensure each JSON key for a column value has no spaces, e.g. "fizz_buzz": 5, but add a label field in the nested data that a user can use to dynamically create a header column with a friendly name, e.g. "fizz_buzz": {"label": "Fizz Buzz", "value": 5}. Example at: https://datatables.net/examples/ajax/orthogonal-data.html
  3. Return data using an array of arrays instead of an array of objects. The user can then either work out which header maps to which array index, or FireAnt could pass an array that defines a label for each column, which the user could use to set a human-friendly header. Example at: https://datatables.net/examples/ajax/simple.html.
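
Option 2 could look roughly like the sketch below, which nests each value under a space-free key and keeps the original label for building headers. The helper names `safe_key` and `to_orthogonal` are hypothetical, and the normalization rule (lowercase, runs of non-alphanumeric characters replaced by an underscore) is an assumption rather than anything FireAnt does.

```python
import json
import re


def safe_key(label):
    """Turn a display label into a key without spaces or punctuation."""
    return re.sub(r"[^0-9a-zA-Z]+", "_", label).strip("_").lower()


def to_orthogonal(row):
    """Nest each value under a safe key, keeping the label for headers."""
    return {safe_key(label): {"label": label, "value": value}
            for label, value in row.items()}


row = {"foo": 12.1111, "bar": "1.1", "Fizz Buzz": 123}
print(json.dumps(to_orthogonal(row)))
```

Two labels that normalize to the same key would collide under this rule, so a real implementation would need to detect that case, and would probably leave control fields such as DT_RowId unnested.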

Cannot serialize data tables results to JSON

Data tables results are transformed with np.int64 values in the results which cannot be serialized to JSON. Please cast these to python int.

import json

res = my_slicer.manager.column_index_table(
    metrics=['clicks', 'impressions'],
    dimensions=['date', 'device_type'],
    dimension_filters=[RangeFilter('date', date.today() - timedelta(days=14), date.today())])

print(res)
print(json.dumps(
    res
))
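
Until the values are cast inside FireAnt, a caller-side workaround is a `default` hook for `json.dumps`. NumPy scalars expose an `.item()` method that returns the equivalent built-in Python value, so the hook below unwraps anything with that method. To keep the sketch runnable without NumPy, it uses a stand-in class; `FakeInt64` and `unwrap_scalar` are hypothetical names.

```python
import json


class FakeInt64:
    """Stand-in for np.int64 so the sketch runs without NumPy installed."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


def unwrap_scalar(value):
    """json.dumps fallback: convert NumPy-style scalars through .item()."""
    item = getattr(value, "item", None)
    if callable(item):
        return item()
    raise TypeError(repr(value) + " is not JSON serializable")


res = {"data": [{"clicks": FakeInt64(123), "impressions": FakeInt64(4567)}]}
print(json.dumps(res, default=unwrap_scalar))
```

With real results, the same hook would be passed as `json.dumps(res, default=unwrap_scalar)`; casting inside the transformer, as the issue requests, remains the cleaner fix.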

Spaces in key names in the datatables configuration JSON object breaks datatables rendering

If a user creates a unique dimension where the definition and display name both use a non-ID column e.g. a name, this will break the data tables. This is because the name often contains spaces or special characters and this breaks the data tables orthogonal data implementation.

One way around this would be to always hash the values used as keys in the datatables configuration that FireAnt creates. For example, hashing using hashlib and returning the hexdigest of the hashed value. This would ensure that no weird characters or spaces would appear in the keys.
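
A minimal sketch of the hashing idea follows. The choice of SHA-1, the helper name `hashed_key`, and the label/value nesting are assumptions for illustration; the issue only specifies hashlib and a hex digest.

```python
import hashlib


def hashed_key(value):
    """Return a hex digest usable as a key: only the characters 0-9 and a-f."""
    return hashlib.sha1(str(value).encode("utf-8")).hexdigest()


# Hypothetical row keyed by a display name containing spaces and punctuation
row = {"Acme Corp. (EU)": 42}
safe_row = {
    hashed_key(name): {"label": name, "value": clicks}
    for name, clicks in row.items()
}
print(safe_row)
```

Because the digest is deterministic, the same display name always maps to the same key, so column definitions and row data stay aligned as long as both are generated with the same function.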
