
monosi's Introduction

Open Source Data Observability Platform

Newsletter | Docs | Website | Contact us

Join the Data Reliability Engineering Community

Monosi offers data quality monitoring as a service to root cause data issues with end-to-end observability.

🏆 Ensure data quality

🚨 Monitor your data & alert on anomalies

🎛 Analyze root cause of data issues

This project is an OSS alternative to proprietary data quality and observability systems. Get started monitoring your data in less than 10 minutes.

Installation

Note: Monosi runs through Docker; ensure Docker Compose v2 is installed.

Run the following commands:

git clone https://github.com/monosidev/monosi.git
cd monosi
make compose

Navigate to http://localhost:3000 to access the web application once it has started.

For more instructions on getting started, check out our documentation.

Community

Overview

Start the UI through Docker and quickly connect your data sources and alert integrations

[screenshot: web interface]

Get alerts in Slack when Monosi detects anomalies in defined monitors.

[screenshot: Monosi Slack alert]

Own your stack

Avoid integration pitfalls with fragmented, legacy tools by using open source, and prevent vendor lock-in by owning your (meta)data.

Contributing

To start contributing, check out our Contributing Guide and join the Slack.

monosi's People

Contributors

annamarieweber · dependabot[bot] · iporollo · unkrich


monosi's Issues

Timestamp columns inferred incorrectly

Timestamp columns are incorrectly inferred on nested columns

On (at least) a BigQuery dataset, when using tables that have nested/repeated columns containing a timestamp element (i.e. column type array<timestamp, timestamp>), the field is incorrectly inferred as a timestamp field because its type name contains "timestamp".
The first encountered timestamp field is used to calculate freshness here, which errors out and causes the monitor for the table to fail.

Based on the source code, similar behaviour can be expected for columns of type e.g. struct<int, timestamp>, or type struct/array with nested date/datetime-like fields.

Expected behavior

I would expect one of two things to happen:

  1. The nested field is ignored for the freshness calculation and, if no other timestamp-like column is found, no freshness monitor is created for the table (see the sketch below)
  2. The nested field is unnested and the first field within it is used to calculate freshness. This is more complex and therefore more error-prone.
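
A minimal sketch of the first option, assuming the collector sees the warehouse's type string for each column (the function and constant names here are hypothetical, not monosi's actual API):

# Hypothetical helper: a column counts as timestamp-like only when its type
# *is* a temporal type, not when the type name merely *contains* "timestamp"
# (so ARRAY<STRUCT<start TIMESTAMP, end TIMESTAMP>> no longer matches).
TEMPORAL_TYPES = ("timestamp", "datetime", "date")

def is_timestamp_column(type_name: str) -> bool:
    base = type_name.strip().lower()
    # Container types such as array<...> or struct<...> start with the
    # container keyword and are therefore excluded.
    return base.startswith(TEMPORAL_TYPES)

If no column passes this check, the freshness monitor would simply be skipped for that table.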

Steps to reproduce

  1. Create table with a nested timestamp/date-like field
  2. [Optional] create a control table without the nested structure
  3. Add the data source that includes the table with the nested field
    a. Navigate to /sources
    b. Click "Create Data Source"
    c. Fill in credentials for the data source
  4. Wait for a few minutes to allow the monitors to run
  5. Run docker logs <CONTAINER_NAME> and check for progress/errors
  6. See error -- This should look something like:
ERROR:root:(google.cloud.bigquery.dbapi.exceptions.DatabaseError) 400 Invalid cast from ARRAY<STRUCT<start TIMESTAMP, end TIMESTAMP>> to TIMESTAMP at [37:25]
...

[SQL:
            SELECT
                CURRENT_TIMESTAMP() as `WINDOW_START`,
                CURRENT_TIMESTAMP() as `WINDOW_END`,
                COUNT(*) as `ROW_COUNT`,
                'xx' as `TABLE_NAME`,
                'xx' as `DATABASE_NAME`,
                'xx' as `SCHEMA_NAME`,
...
TIMESTAMP_DIFF(MAX(CAST(windows AS TIMESTAMP)), CURRENT_TIMESTAMP(), MINUTE) AS windows___freshness
            FROM xx.xx.xx;
        ]
(Background on this error at: https://sqlalche.me/e/14/4xp6)

Validation on Data Source Form

Description

There's currently no validation or error handling for the data source form.

Expected behavior

The data source form should have all fields required and should prevent submission if any field is empty.

Steps to reproduce

  1. Go to settings/sources
  2. Click on create data source button
  3. See that validation doesn't happen on empty form submission

Additional context

Simple validation is already implemented for BigQuery. All other data sources should have this type of validation as well. Ideally, on form submit, the form shows an error in red.

[screenshot]

If a source is deleted, the monitor page no longer shows any data

Description

If a source is deleted, the monitor page no longer shows any of the data from previous scheduled job runs for the other (non-deleted) sources

Expected behavior

When a source is deleted, the historical data in the monitor page should still be available for non-deleted sources

Steps to reproduce

  1. Add a data source
  2. Wait until an execution has successfully completed
  3. Navigate to the monitor page and view the results of the job
  4. Add another, new data source
  5. Delete the new data source just added
  6. Navigate to the monitor page
  7. See error, i.e. no data is shown for the source that still exists

Fix icons changing on Profile and Sources pages

Problem

Currently, the icons on the Profile and Sources pages are not the same.

Solution

Standardize the icons used across all settings pages and update all pages accordingly

Requirements

Update the icons on the data sources and integrations page to match those on the profiles page

Additional Context

[screenshot]

[screenshot]


Integrations Table Empty State

Description

The Integrations table in the UI on the /settings/integrations page is missing an empty table state.

Expected behavior

There should be an empty table state that is shown on the Integrations page. This empty state should be similar to the monitors or jobs/executions table empty states.

Steps to reproduce

  1. Navigate to /settings/integrations
  2. Click on Integrations
  3. Observe that when there are no integrations connected, there is no empty table state showing

Additional context

[screenshot]

TypeError: float() argument must be a string or a number, not 'NoneType'

The following error occurs when the values collected come out to be NULL, for example if we collect AVG or MIN over a column whose values are all NULL.

The short-term solution seems to be to simply make the float optional, but then everywhere it is used in calculations we would need to check that there is actually a value associated.

The alternative is not to save the data point when it's NULL, but then we may later think it wasn't calculated and try to recalculate it.

I would tend towards the former solution for now.
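
A minimal sketch of that first option, assuming rows are dict-like and keyed by metric alias as the traceback below suggests (names are illustrative, not monosi's actual code):

# Skip NULL aggregates instead of calling float() on None. Downstream
# consumers must then tolerate missing values for a given alias.
def pivot_value(row, alias):
    raw = row[alias]
    if raw is None:
        return None  # e.g. AVG/MIN over an all-NULL column
    return float(raw)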

(.venv) ➜  monosi-example git:(master) ✗ monosi -c run
Traceback (most recent call last):
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/bin/monosi", line 8, in <module>
    sys.exit(main())
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/__main__.py", line 6, in main
    parser.parse(sys.argv)
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/cli.py", line 28, in parse
    getattr(self, args.command)()
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/cli.py", line 52, in run
    job.run()
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/jobs/base.py", line 50, in run
    return self._process_jobs()
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/jobs/base.py", line 45, in _process_jobs
    results = [job.run() for job in self.job_queue]
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/jobs/base.py", line 45, in <listcomp>
    results = [job.run() for job in self.job_queue]
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/jobs/run.py", line 25, in run
    stats = self.compile_and_execute()
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/jobs/run.py", line 12, in compile_and_execute
    return self.monitor.execute(driver_config)
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/monitors/table_metrics.py", line 45, in execute
    stats = self.interpret_results(results)
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/monitors/table_metrics.py", line 85, in interpret_results
    stats = self._pivot_results(results)
  File "/Users/kevinunkrich/Developer/monosidev/monosi-example/.venv/lib/python3.9/site-packages/monosi/monitors/table_metrics.py", line 77, in _pivot_results
    value=float(row[alias]),
TypeError: float() argument must be a string or a number, not 'NoneType'

Add support for MS SQL database engine

Problem

Monosi does not currently support Microsoft's SQL Server database engine. It is very commonly used, and supporting it would be a great addition.

Solution

Add MS SQL Server engine as a supported database

Requirements

Base the MS SQL Server implementation on the method used to implement PostgreSQL, with appropriate drivers

SQL compilation error: syntax error line 9 at position 4 unexpected 'FROM'.

version: 0.0.2

When running monosi -c run on the sample Snowflake data, after completing and returning 0 metrics, 0 failures, the following trace is returned:

Traceback (most recent call last):
  File "/Users/annaweber/Code/monosi_dev/.venv/bin/monosi", line 33, in <module>
    sys.exit(load_entry_point('monosi==0.0.1.post1', 'console_scripts', 'monosi')())
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/monosi-0.0.1.post1-py3.9.egg/monosi/__main__.py", line 6, in main
    parser.parse(sys.argv)
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/monosi-0.0.1.post1-py3.9.egg/monosi/cli.py", line 29, in parse
    getattr(self, args.command)()
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/monosi-0.0.1.post1-py3.9.egg/monosi/cli.py", line 53, in run
    task.run()
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/monosi-0.0.1.post1-py3.9.egg/monosi/tasks/base.py", line 53, in run
    return self._process_tasks()
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/monosi-0.0.1.post1-py3.9.egg/monosi/tasks/run.py", line 21, in _process_tasks
    results = [task.run() for task in self.task_queue]
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/monosi-0.0.1.post1-py3.9.egg/monosi/tasks/run.py", line 21, in <listcomp>
    results = [task.run() for task in self.task_queue]
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/monosi-0.0.1.post1-py3.9.egg/monosi/tasks/run.py", line 11, in run
    runner.run()
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/monosi-0.0.1.post1-py3.9.egg/monosi/runner.py", line 46, in run
    results = self.execute(sql_stmt)
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/monosi-0.0.1.post1-py3.9.egg/monosi/runner.py", line 32, in execute
    results = self.driver.execute_sql(sql)
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/monosi-0.0.1.post1-py3.9.egg/monosi_drivers/snowflake/configuration.py", line 142, in execute_sql
    cs.execute(sql, params)
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/snowflake/connector/cursor.py", line 790, in execute
    Error.errorhandler_wrapper(
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/snowflake/connector/errors.py", line 272, in errorhandler_wrapper
    handed_over = Error.hand_to_other_handler(
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/snowflake/connector/errors.py", line 327, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/Users/annaweber/Code/monosi_dev/.venv/lib/python3.9/site-packages/snowflake/connector/errors.py", line 206, in default_errorhandler
    raise error_class(
snowflake.connector.errors.ProgrammingError: 001003 (42000): SQL compilation error:
syntax error line 9 at position 4 unexpected 'FROM'.

Read configuration details from dbt profiles.yml

Problem

I want monosi to read my dbt profiles.yml if I already have a connection to a data store set up there.

Solution

If I have a profiles.yml file, I want to specify that monosi should look at that file for the connection strings and accordingly use them for the creation of a connection to the data store.

Requirements

dbt dependency

Additional Context

The easiest way to approach this would be to

  1. Set a flag in the configuration file indicating that we should read from the dbt profiles.yml file
  2. Read the flag in; if it is set to True, parse the dbt file and convert it to a monosi configuration file (a sketch follows)
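
A minimal sketch of step 2, assuming PyYAML and the standard dbt profiles.yml layout (profile -> outputs -> target); the key mapping is an assumption, not monosi's actual configuration schema:

import yaml  # assumes PyYAML is installed

def dbt_output_to_monosi_source(path, profile, target):
    # dbt stores each connection under <profile>.outputs.<target>.
    with open(path) as f:
        outputs = yaml.safe_load(f)[profile]["outputs"]
    out = outputs[target]
    return {
        "type": out["type"],
        "user": out.get("user"),
        "password": out.get("password"),
        "account": out.get("account"),      # Snowflake-specific
        "warehouse": out.get("warehouse"),  # Snowflake-specific
        "database": out.get("database") or out.get("dbname"),
        "schema": out.get("schema"),
    }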

Give ability to send test message when creating integration

Problem

Today there's no good way to check that an integration is working as expected until an anomaly occurs and sends a message through.

Solution

Implement a test button, similar to data sources, that sends a test message to the integration to confirm it is working. Today, these integrations are Slack and webhooks.

Additional Context

  1. First, add a test button / call to the frontend.
    Refer to data sources for how to do this easily:
    https://github.com/monosidev/monosi/blob/master/src/ui/src/pages/settings/Sources/components/SourcesTable.tsx#L26-L42
    https://github.com/monosidev/monosi/blob/master/src/ui/src/pages/settings/Sources/components/SourcesTable.tsx#L95-L101

    Implement similar in integrations:
    https://github.com/monosidev/monosi/blob/master/src/ui/src/pages/settings/Integrations

  2. Add an endpoint in integrations to handle the test; again, refer to data sources:
    https://github.com/monosidev/monosi/blob/master/src/server/handlers/datasources.py#L76-L85
    https://github.com/monosidev/monosi/blob/master/src/server/handlers/__init__.py#L30

    Do the same in:
    https://github.com/monosidev/monosi/blob/master/src/server/handlers/integrations.py
    https://github.com/monosidev/monosi/blob/master/src/server/handlers/__init__.py#L30

  3. Add a method to integrations which is responsible for creating a test message and sending it.

Unexpected Behavior when overriding default workspace and source

Description

Attempting to override the default workspace and source in monosi_project.yml is not successful. When monosi profile is run, the overrides are removed from the monosi_project.yml file and the default workspace and source are used.

Expected behavior

When monosi profile is run on a project with overrides for workspace and source, the specified workspace and source should be used. The monosi_project.yml file should still contain the specified workspace and source after running monosi profile.


Steps to reproduce

  1. Add an additional workspace to /.monosi/workspaces.yml as shown below:

default:
    sources:
        default:
            type: snowflake
            user: <snowflake_username>
            password: <snowflake_password>
            account: <snowflake_acct>
            warehouse: COMPUTE_WH
            database: SNOWFLAKE_SAMPLE_DATA
testworkspace:
    sources:
        testsource:
            type: postgres
            user: <postgres_username>
            host: <postgres_host>
            port: 5432
            database: <postgres_database_name>
            password: <postgres_password>

  2. Navigate to your monosi project directory and open monosi_project.yml
  3. Add override configurations to the file as shown below:

name: monosi-repo
version: 0.0.3
monitor-paths:
- ./monitors
- ./bootstrapped-monitors
workspace: testworkspace
source: testsource

  4. Run monosi profile
  5. When it completes, navigate to the bootstrapped-monitors directory and run ls; you will see that the default Snowflake data was used
  6. Navigate back to the root directory and open the monosi_project.yml file; you will see that the workspace and source overrides are no longer there

Additional context

Before running monosi profile:
[screenshot]

After running monosi profile:
[screenshot]

Add email as an Integration

Description

Currently we support alerts via webhook and Slack (webhooks). Both are relatively early, but we are looking to support email notifications as well.

Expected behavior

Expected behavior is to be able to create an email alert notification.

This would require adding an integration option on the frontend: https://github.com/monosidev/monosi/blob/master/src/ui/src/components/forms/IntegrationForm/index.tsx

As well as an integration implementation on the backend
https://github.com/monosidev/monosi/tree/master/src/server/integrations

We should also consider better subclassing and defining the alert that goes out to each of these integrations.

Additional context

The implementation likely uses https://docs.python.org/3/library/smtplib.html to send email and exposes configuration variables to define the SMTP server/auth information.
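
A minimal sketch using the standard library, assuming the SMTP connection details come from user configuration (all names here are illustrative):

import smtplib
from email.message import EmailMessage

def send_alert_email(host, port, user, password, to_addr, body):
    msg = EmailMessage()
    msg["Subject"] = "Monosi anomaly detected"
    msg["From"] = user
    msg["To"] = to_addr
    msg.set_content(body)
    with smtplib.SMTP(host, port) as server:
        server.starttls()            # most SMTP providers require TLS
        server.login(user, password)
        server.send_message(msg)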

Better Visualization of Plotly Data Points

Problem

The current way we visualize data points on our Plotly graph is not pretty: (Month, Day, Year, Timestamp, Value).

Solution

Show the data point in a better format as described below.

Requirements

The data point should show (Month, Day, Timestamp) on one line and Value on a new line.

Additional Context

[screenshot]


Create Table Component Wrapper

Problem

Currently, the tables implementation in the code is dirty and not componentized.

Solution

We need to componentize the table implementation for reuse across the application

Requirements

Create a component named <TableComponentWrapper> that implements a table which can receive data from react props and be reused across pages

Additional Context

Find the use cases of the table components in Monosi. Implement a generic componentized version. Replace all places in the code where tables are used with <TableComponentWrapper>


Replace window.alert on testing connection with toast

Problem

When we test the database connection, we trigger a window.alert() when the API call returns a response.

Solution

A toast should be shown that replaces the testing connection toast.

Requirements

Either show a new toast, or replace the existing toast's text with success or failure and a respective color (green check or red X)

Additional Context

[screenshot]


Execution interval increases after an execution is run

Description

The execution interval of a job seems to increase by an hour between every execution.

Expected behavior

The execution interval should stay consistent at one hour.

Steps to reproduce

  1. Start Monosi, connect a data source
  2. Wait for multiple executions to run
  3. Notice that the execution timestamp interval is increasing

Additional context

[screenshot]

Data Observability for the AWS Glue ETL Pipeline

Problem

To begin with, not all companies have a fully grown data warehouse; some may use the data lake itself as the single place to start. Our use case is similar: S3 is our data lake, Glue jobs are our transform step, and Athena is our query engine. AWS Glue provides its own monitoring dashboard, but only at the job level, e.g. how many jobs ran, how many succeeded, and how many failed.

Solution

It would be great to have not only the job-level metrics but also data-level metrics, such as row counts for each table corresponding to a particular Glue job (if a table is exposed), and whether there was any anomaly in the counts for the regular jobs. All of these can easily be pulled from the Glue Catalog metadata. Some of the metrics can be exposed from your current solution, e.g. when the job was last run/updated and when table counts were updated.
A Glue job also carries some metadata of its own, such as the number of workers used for a particular job, the timeout associated with it, and the Python version. Any way to observe that would also be a great addition.



Test database connection - CLI

Description

There needs to be a command to test the connection to a database for ensuring that the setup was correct.

Other notes

This should be a CLI command, e.g. monosi test-connection or monosi ping or something similar.
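
A minimal sketch of such a command handler, assuming the driver exposes execute_sql() as the tracebacks elsewhere on this page suggest (the function itself is hypothetical):

import sys

def test_connection(driver):
    try:
        driver.execute_sql("SELECT 1")  # cheapest possible round trip
        print("Connection OK")
        return 0
    except Exception as exc:
        print(f"Connection failed: {exc}", file=sys.stderr)
        return 1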

Test DataSource call: Fix indexing

Description

Currently we provide functionality to test whether we are correctly connected to a data source. In this method we rely on an exception happening rather than truly checking whether the response is correct, because we index into a theoretically empty array.

See: https://github.com/monosidev/monosi/blob/master/src/ingestion/sources/base.py#L169

The shortest solution might be something like:

return len(rows) > 1 and len(rows[0]) > 1 and rows[0][columns[0]] == 1

Expected behavior

It's expected that we do not throw an error here even when we are not truly connected, though one is still likely despite fixing the indexing. Ideally we would fix the indexing and catch the specific errors related to failing to connect.
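
A minimal sketch of the shape-checked version, assuming rows are dict-like and keyed by column name as the original snippet implies:

def connection_is_healthy(rows, columns):
    # Verify the result shape before indexing instead of relying on an
    # IndexError/KeyError to signal a bad connection.
    if not rows or not columns:
        return False
    first = rows[0]
    return columns[0] in first and first[columns[0]] == 1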

Datasources Table Empty State

Description

The Data Sources table in the UI on the /settings/sources page is missing an empty table state.

Expected behavior

There should be an empty table state that is shown on the Data Sources page. This empty state should be similar to the monitors or jobs/executions table empty states.

Steps to reproduce

  1. Navigate to /settings/sources
  2. Click on Data Sources
  3. Observe that when there are no sources connected, there is no empty table state showing

Additional context

[screenshot]

Ability to edit data source

Problem

Allow the user to update their data source instead of deleting and recreating it.

Solution

Create an edit button in the data sources table which links to the data source form and updates the information via PUT using the already existing DatasourceService

Coordinate Metrics Table Values with Plotly Transforms

Problem

The Plotly chart can be transformed (e.g. zoomed in / out) to analyze data. When a user interacts with the chart, we don't edit the table associated with the chart in the user interface.

Solution

When a certain section of the Plotly chart is selected, only the data points in that section should show in the table below.

Requirements

Determine the callback that Plotly sends when a user selects a certain part of the Plotly graph. Filter the data displayed in the chart accordingly based on what is received from the callback.

Additional Context

[screenshot]


Make Home page cards fully clickable

Problem

The information cards on the home page are currently not clickable.

Solution

Create a link that wraps the full card entity such that the whole card is clickable

Requirements

Fully wrap the cards in their respective <a> tags

Additional Context

[screenshot]


Add support for SqlServer

Problem

I want to use monosi with my SqlServer data store

Solution

Monosi needs to interpret SqlServer configuration details and run on SqlServer accordingly

Requirements

  1. SqlServer instance
  2. Refactor to support multiple formats for configurations
  3. Add SqlServer driver & dialect

Update monitors on refresh

Problem

Monitors are created on the initial run via a SchemaCollectorJob https://github.com/monosidev/monosi/blob/master/src/server/handlers/datasources.py#L55

After this, if new tables, columns, or other entities are introduced, we do not automatically monitor them and we should consider doing so.

Solution

The simplest solution may be to keep this job running and not remove it, as is done in this line: https://github.com/monosidev/monosi/blob/master/src/server/handlers/datasources.py#L69

This is shortsighted, but a possible temporary solution is to simply delete that line (made workable mostly by the DB constraints that ensure duplicate monitors aren't created).

Execution timestamp stays the same as the first execution run

Description

The execution's run-at timestamp stays the same after the first execution run.

Expected behavior

The execution's run-at timestamp should reflect the actual time the execution was run.

Steps to reproduce

  1. Add a data source
  2. Navigate to the executions page
  3. Wait until 2 or more executions are run
  4. See error

Additional context

[screenshot]

snowflake.connector.errors.ProgrammingError when running monitors

Description

When running a monitor on Snowflake sample data, this error is encountered:

Table Health Monitor: SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.lineitem

Finished in 0.52 seconds (SQL results took 0.52 seconds to load)
0 metrics, 0 failures

Traceback (most recent call last):
  File "/usr/local/bin/monosi", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/monosi/__main__.py", line 6, in main
    parser.parse(sys.argv)
  File "/usr/local/lib/python3.9/site-packages/monosi/cli.py", line 32, in parse
    getattr(self, args.command)(argv[1:])
  File "/usr/local/lib/python3.9/site-packages/monosi/cli.py", line 59, in run
    task.run()
  File "/usr/local/lib/python3.9/site-packages/monosi/tasks/base.py", line 53, in run
    return self._process_tasks()
  File "/usr/local/lib/python3.9/site-packages/monosi/tasks/run.py", line 21, in _process_tasks
    results = [task.run() for task in self.task_queue]
  File "/usr/local/lib/python3.9/site-packages/monosi/tasks/run.py", line 21, in <listcomp>
    results = [task.run() for task in self.task_queue]
  File "/usr/local/lib/python3.9/site-packages/monosi/tasks/run.py", line 11, in run
    runner.run()
  File "/usr/local/lib/python3.9/site-packages/monosi/runner.py", line 49, in run
    results = self.execute(sql_stmt)
  File "/usr/local/lib/python3.9/site-packages/monosi/runner.py", line 34, in execute
    results = self.driver.execute_sql(sql)
  File "/usr/local/lib/python3.9/site-packages/monosi/drivers/snowflake/configuration.py", line 145, in execute_sql
    cs.execute(sql, params)
  File "/usr/local/lib/python3.9/site-packages/snowflake/connector/cursor.py", line 790, in execute
    Error.errorhandler_wrapper(
  File "/usr/local/lib/python3.9/site-packages/snowflake/connector/errors.py", line 272, in errorhandler_wrapper
    handed_over = Error.hand_to_other_handler(
  File "/usr/local/lib/python3.9/site-packages/snowflake/connector/errors.py", line 327, in hand_to_other_handler
    cursor.errorhandler(connection, cursor, error_class, error_value)
  File "/usr/local/lib/python3.9/site-packages/snowflake/connector/errors.py", line 206, in default_errorhandler
    raise error_class(
snowflake.connector.errors.ProgrammingError: 000904 (42000): SQL compilation error: error line 7 at position 23
invalid identifier 'I_CLASS'

Steps to reproduce

  1. Set up a monosi project with the source database as SNOWFLAKE_SAMPLE_DATA
  2. Run monosi profile to bootstrap the sample data monitors
  3. Create the following monitor:

monosi:
  monitors:
  - table: SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.lineitem
    timestamp_field: l_commitdate
    type: table

  4. Run monosi run to run the monitors
  5. Notice that the error appears

Other notes

I believe this is happening with the SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.item table

This is happening with the SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.lineitem table

Lookback Days

Description

Users are seeing a "Pending" status even after the first run and fetch for a defined monitor have completed, and no metrics are showing.

This normally occurs when there is no data in the table OR when the data is older than 100 days. Unfortunately, a change that increased granularity from days to minutes also changed the default from 100 days to 10,000 minutes, so the lookback is now only about 7 days (10,000 minutes ≈ 6.9 days).

Expected behavior

Ideally the default lookback is 100 days, and the user can specify the number of lookback days when the data they would like to monitor is newer or older than that.
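
A minimal sketch of that default, keeping minute-level granularity internally (the config key is hypothetical):

# 100 days expressed in minutes, the granularity the scheduler now uses.
DEFAULT_LOOKBACK_MINUTES = 100 * 24 * 60  # 144,000

def lookback_minutes(config):
    # `lookback_days` is an assumed user-facing option, converted internally.
    if "lookback_days" in config:
        return int(config["lookback_days"]) * 24 * 60
    return DEFAULT_LOOKBACK_MINUTES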

Steps to reproduce

  1. Start Monosi
  2. Get a dataset where all timestamp values are older than the current lookback window (about 7 days)
  3. Create a monitor on that dataset
  4. No data / metrics will show

Originally discovered and brought to our attention by @tawfiq9009, many thanks; a fix is coming shortly.

Inconsistent usage of UTC & local timezones for timestamps

Description

Inconsistent timezones are used across the various data models throughout the application: some are UTC while others are local time.

Expected behavior

Timezones should be consistent for all data models (Monitors, Jobs, Executions, Datasources, Integrations) throughout the entirety of the application.

Steps to reproduce

  1. Add a data source
  2. Wait until monitors are created
  3. Navigate to executions page
  4. Notice timestamp
  5. Navigate to jobs tab on the executions page
  6. Notice different timestamp
  7. Navigate to the monitors page
  8. Notice different timestamp

Send monitoring results to data store table

Problem

I want my monosi monitoring results to be sent to a table in a database for analysis, visualization, etc.

Solution

Provide ability to pipe monosi monitor results to a database table.

Requirements

Some sort of flag / indicator that I want to send results to my data store.
Format the results in a way that can be saved in a data store.

Ability to edit integration

Problem

Allow the user to update their integration instead of deleting and recreating it.

Solution

Create an edit button in the integrations table which links to the integration form and updates the information via PUT using the already existing IntegrationService

Add support for Postgres

Problem

I want to use monosi with my Postgres data store

Solution

Monosi needs to interpret Postgres configuration details and run on Postgres accordingly

Requirements

  1. Postgres instance
  2. Refactor to support multiple formats for configurations
  3. Add Postgres driver & dialect

Show pagination only when multiple pages exist in table

Description

Every table view in the UI renders pagination even when there is only a single page, showing a lone page number.

Expected behavior

Pagination should only be rendered when there are multiple table pages. This can be done by checking whether there are more than a page's worth of elements and displaying pagination accordingly.

Steps to reproduce

  1. Go to any page that has a table view (data sources, integrations, monitors, etc.)
  2. Notice that the pagination is showing but the table only has one page

Additional context

[screenshot]

Populate profile page with email if email is given

Description

Currently, on application start, an email is collected to create a user for Monosi. When navigating to the Settings > Profile page, there is a form for editing this email.

It does not currently show the existing email or allow the user to update it.

Expected behavior

Expected behavior is that the user would be able to see the email they are using and update it here.

Steps to reproduce

  1. Start application
  2. Go to Settings > Profile

Potential Fix

  1. Extract email form from GettingStarted into ProfileForm
    https://github.com/monosidev/monosi/blob/master/src/ui/src/pages/app/onboarding/GettingStarted/index.tsx#L35-L49
    https://github.com/monosidev/monosi/blob/master/src/ui/src/pages/settings/Profile/index.tsx#L53
    https://github.com/monosidev/monosi/blob/master/src/ui/src/components/forms/ProfileForm/index.tsx

  2. Add an extra fetch on load to fill the form's value/placeholder with the existing email, if one exists.
    e.g.
    e.g.

  // Load the existing email on mount and pre-fill the form state.
  const [email, setEmail] = useState<string>('');
  useEffect(() => {
    async function loadEmail() {
      let res = await UserService.getAll();
      if (res !== null && res.user && res.user.email) {
        setEmail(res.user.email);
      }
    }

    loadEmail();
  }, []);

Remove EUI Dependency

Description

We were originally using the EUI user-interface library to build some of our components. We have since switched to the React Bootstrap library but have not fully removed the EUI dependency, since we are still using some of its components for forms. The EUI dependency pulls in unnecessary second-order dependencies, such as moment.js, that we don't use but that make the build heavier.

Expected behavior

We need to refactor the forms that use EUI components into Bootstrap forms to fully remove the EUI dependency. Then, we need to delete EUI from the package.json.

Steps to reproduce

  1. Go to Sources or Integrations page
  2. Click on Create Data Source or Create Integration
  3. Notice that the drawer is an EUI component

Additional context

[screenshot]

[screenshot]

Automate code reviews on commits and pull requests

Problem

I would like to automate code reviews on my commits and pull requests

Solution

Implement an integration with an automated code review system such as Codacy (www.codacy.com)

Requirements

Needs to:

  • integrate at the repo level, i.e. with GitHub
  • be automated
  • support the Monosi technologies, i.e. JavaScript, React, etc.

Can't run development environment; server crashes immediately

Description

Starting the development environment as described in https://docs.monosi.dev/docs/contributing/local-development/#running-with-docker causes the server to crash.

Expected behavior

Server does not crash.

Steps to reproduce

  1. make compose-build
  2. make compose-up

Additional context

Docker container logs from docker-monosi-server

[2022-04-13 23:52:38 +0000] [1] [INFO] Starting gunicorn 20.1.0
[2022-04-13 23:52:38 +0000] [1] [INFO] Listening at: http://0.0.0.0:5000 (1)
[2022-04-13 23:52:38 +0000] [1] [INFO] Using worker: sync
[2022-04-13 23:52:38 +0000] [9] [INFO] Booting worker with pid: 9
INFO:apscheduler.scheduler:Scheduler started
DEBUG:apscheduler.scheduler:Looking for jobs to run
DEBUG:apscheduler.scheduler:Next wakeup is due at 2022-04-14 00:48:00.229931+00:00 (in 3321.534835 seconds)
[2022-04-13 23:52:38 +0000] [9] [ERROR] Exception in worker process
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 719, in do_execute
    cursor.execute(statement, parameters)
psycopg2.errors.UndefinedColumn: column msi_users.anonymize_usage_data does not exist
LINE 1: SELECT msi_users.anonymize_usage_data AS msi_users_anonymize...
               ^


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
    worker.init_process()
  File "/usr/local/lib/python3.9/site-packages/gunicorn/workers/base.py", line 134, in init_process
    self.load_wsgi()
  File "/usr/local/lib/python3.9/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
    self.wsgi = self.app.wsgi()
  File "/usr/local/lib/python3.9/site-packages/gunicorn/app/base.py", line 67, in wsgi
    self.callable = self.load()
  File "/usr/local/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
    return self.load_wsgiapp()
  File "/usr/local/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
    return util.import_app(self.app_uri)
  File "/usr/local/lib/python3.9/site-packages/gunicorn/util.py", line 359, in import_app
    mod = importlib.import_module(module)
  File "/usr/local/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/app/server/wsgi.py", line 9, in <module>
    app = create_app()
  File "/app/server/__init__.py", line 28, in create_app
    user = User.create_or_load()
  File "/app/server/models.py", line 164, in create_or_load
    return db.session.query(cls).one()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2856, in one
    return self._iter().one()
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/query.py", line 2894, in _iter
    result = self.session.execute(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 1689, in execute
    result = conn._execute_20(statement, params or {}, execution_options)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1614, in _execute_20
    return meth(self, args_10style, kwargs_10style, execution_options)
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
    return connection._execute_clauseelement(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1481, in _execute_clauseelement
    ret = self._execute_context(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1845, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2026, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 719, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedColumn) column msi_users.anonymize_usage_data does not exist
LINE 1: SELECT msi_users.anonymize_usage_data AS msi_users_anonymize...
               ^

[SQL: SELECT msi_users.anonymize_usage_data AS msi_users_anonymize_usage_data, msi_users.receive_updates AS msi_users_receive_updates, msi_users.setup_completed AS msi_users_setup_completed, msi_users.email AS msi_users_email, msi_users.id AS msi_users_id 
FROM msi_users]
(Background on this error at: https://sqlalche.me/e/14/f405)
[2022-04-13 23:52:38 +0000] [9] [INFO] Worker exiting (pid: 9)
[2022-04-13 23:52:38 +0000] [1] [INFO] Shutting down: Master
[2022-04-13 23:52:38 +0000] [1] [INFO] Reason: Worker failed to boot.

Validation on Integration Form

Description

There's currently no validation or error handling for the integration form.

Expected behavior

The integration form should have both fields required and should prevent submission if either field is empty.

Steps to reproduce

  1. Go to settings/integrations
  2. Click on create integration button
  3. See that validation doesn't happen on empty form submission

Additional context

[screenshot]

Better Error Handling for ValueError: invalid literal for int() with base 10: 'None'

Description

Seeing an error when running the application locally with flask run

 flask run
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
Traceback (most recent call last):
  File "/Users/ivanporollo/Desktop/monosi/.venv/bin/flask", line 33, in <module>
    sys.exit(load_entry_point('Flask==2.0.2', 'console_scripts', 'flask')())
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/Flask-2.0.2-py3.9.egg/flask/cli.py", line 994, in main
    cli.main(args=sys.argv[1:])
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/Flask-2.0.2-py3.9.egg/flask/cli.py", line 600, in main
    return super().main(*args, **kwargs)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/click-8.0.4-py3.9.egg/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/click-8.0.4-py3.9.egg/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/click-8.0.4-py3.9.egg/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/click-8.0.4-py3.9.egg/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/click-8.0.4-py3.9.egg/click/decorators.py", line 84, in new_func
    return ctx.invoke(f, obj, *args, **kwargs)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/click-8.0.4-py3.9.egg/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/Flask-2.0.2-py3.9.egg/flask/cli.py", line 849, in run_command
    app = DispatchingApp(info.load_app, use_eager_loading=eager_loading)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/Flask-2.0.2-py3.9.egg/flask/cli.py", line 324, in __init__
    self._load_unlocked()
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/Flask-2.0.2-py3.9.egg/flask/cli.py", line 350, in _load_unlocked
    self._app = rv = self.loader()
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/Flask-2.0.2-py3.9.egg/flask/cli.py", line 410, in load_app
    app = locate_app(self, import_name, None, raise_if_not_found=False)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/Flask-2.0.2-py3.9.egg/flask/cli.py", line 260, in locate_app
    __import__(module_name)
  File "/Users/ivanporollo/Desktop/monosi/src/server/__init__.py", line 10, in <module>
    from server.middleware import middleware
  File "/Users/ivanporollo/Desktop/monosi/src/server/middleware/__init__.py", line 3, in <module>
    from .scheduler import init_scheduler
  File "/Users/ivanporollo/Desktop/monosi/src/server/middleware/scheduler.py", line 4, in <module>
    from server.config import Config
  File "/Users/ivanporollo/Desktop/monosi/src/server/config.py", line 22, in <module>
    class BaseConfig:
  File "/Users/ivanporollo/Desktop/monosi/src/server/config.py", line 36, in BaseConfig
    SCHEDULER_JOBSTORES = {"default": SQLAlchemyJobStore(url=SQLALCHEMY_DATABASE_URI, tablename="msi_jobs")}
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/APScheduler-3.9.1-py3.9.egg/apscheduler/jobstores/sqlalchemy.py", line 52, in __init__
    self.engine = create_engine(url, **(engine_options or {}))
  File "<string>", line 2, in create_engine
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/sqlalchemy/util/deprecations.py", line 309, in warned
    return fn(*args, **kwargs)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/sqlalchemy/engine/create.py", line 530, in create_engine
    u = _url.make_url(url)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/sqlalchemy/engine/url.py", line 715, in make_url
    return _parse_rfc1738_args(name_or_url)
  File "/Users/ivanporollo/Desktop/monosi/.venv/lib/python3.9/site-packages/sqlalchemy/engine/url.py", line 771, in _parse_rfc1738_args
    components["port"] = int(components["port"])
ValueError: invalid literal for int() with base 10: 'None'

Expected behavior

This error occurs because local environment variables have not been set for the Monosi database. To resolve it, set the following environment variables in your shell:

export DB_USER=<USER>
export DB_PASSWORD=<PASSWORD>
export DB_HOST=<HOST>
export DB_PORT=<PORT>
export DB_DATABASE=<DATABASE>
export DB_SCHEMA=<SCHEMA>

There should be better error handling to notify the user that their environment variables have not been set.
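
A minimal sketch of such a pre-flight check, run before building the SQLAlchemy URL (the variable list mirrors the exports above; the function name is hypothetical):

import os
import sys

REQUIRED_VARS = ("DB_USER", "DB_PASSWORD", "DB_HOST", "DB_PORT", "DB_DATABASE", "DB_SCHEMA")

def check_database_env():
    # Fail fast with a readable message instead of letting SQLAlchemy
    # choke on a literal 'None' port.
    missing = [v for v in REQUIRED_VARS if not os.getenv(v)]
    if missing:
        sys.exit(f"Missing required environment variables: {', '.join(missing)}")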

Steps to reproduce

  1. Clone the monosi repository
  2. Follow the monosi setup instructions for local environment
  3. Run the server through flask run
  4. Observe error

Fix standard deviation metrics

Description

Numeric standard deviation and length standard deviation are not currently being calculated

https://github.com/monosidev/monosi/blob/master/src/ingestion/sources/base.py#L265-L275

Expected behavior

By uncommenting these lines, we would expect the metrics to be made available to users. However, the SQL being run in this case does not appear to correctly compute the standard deviation of lengths and values.

The current SQL is here:

Steps to reproduce

  1. Uncomment https://github.com/monosidev/monosi/blob/master/src/ingestion/sources/base.py#L265-L275 to enable the metrics.
  2. Try to run the application.
  3. You'll notice that queries fail due to the metric being calculated incorrectly. (There is also the potential that it works and this was an issue only in some specific niche case, so this should be verified.)

Additional context

To fix this issue, one simply needs to figure out the SQL required to calculate the numeric standard deviation and length standard deviation, replace the fields, and then uncomment the metric lines.
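
A minimal sketch of what the replacement expressions might look like; STDDEV is widely supported, but the exact function (e.g. STDDEV_SAMP vs STDDEV_POP) should be verified per dialect, and the alias format below merely mirrors the windows___freshness style seen in other queries on this page:

def std_dev_metrics(column):
    return {
        f"{column}___numeric_std_dev": f"STDDEV({column})",
        f"{column}___length_std_dev": f"STDDEV(LENGTH(CAST({column} AS VARCHAR)))",
    }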

Generate FQTN in profiler if possible

Description

When running the command monosi profile, the table string in each monitor is generated in an incorrect format: it currently contains just the table name. Instead, for Snowflake, it should be <database_name>.<schema_name>.<table_name>.
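
A minimal sketch of the expected output, assuming the profiler has the source's database and schema in scope (the helper name is hypothetical):

def fully_qualified_table_name(database, schema, table):
    # e.g. fully_qualified_table_name("SNOWFLAKE_SAMPLE_DATA", "TPCH_SF100", "lineitem")
    #      -> "SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.lineitem"
    return f"{database}.{schema}.{table}"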

Steps to reproduce

  1. Set up a monosi project with a connected source
  2. Run monosi profile in the monosi project directory
  3. Navigate to a monitor in the generated bootstrapped-monitors folder
  4. Notice that the monitor's table key only has the table name as the value; it should instead be <database_name>.<schema_name>.<table_name>

Add support for Redshift

Problem

I want to use monosi with my Redshift data store

Solution

Monosi needs to interpret Redshift configuration details and run on Redshift accordingly

Requirements

  1. Redshift instance
  2. Refactor to support multiple formats for configurations
  3. Add Redshift driver & dialect

Send anomaly alerts to slack

Problem

I need to be alerted in slack when monosi detects anomalies.

Solution

Set up monosi to take a slack connection for sending alerts and information.

Requirements

Slack workspace exists and one can retrieve the webhook for it to send events to.

Additional Context

The easiest way here would be to simply add a way to input a Slack webhook URL and then send a payload to that URL.
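
A minimal sketch using only the standard library; the payload shape is the basic Slack incoming-webhook format, and the webhook URL is user-provided configuration:

import json
import urllib.request

def send_slack_alert(webhook_url, text):
    payload = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # Slack responds with "ok" on success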

File validation for all monosi setup files

Description

There should be validation for all monosi-related files (collections, monosi_project, monitors). Currently, we do not validate any of the files related to the monosi project. We need to check each file to ensure it is defined correctly; if there's an issue with the format or syntax, throw an error.

Other notes

This should be done in the validate function, probably using some form of schema validation (maybe https://python-jsonschema.readthedocs.io/en/stable/).
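
A minimal sketch using python-jsonschema, with a hypothetical schema whose required keys mirror the monosi_project.yml example shown elsewhere on this page:

import yaml  # assumes PyYAML
from jsonschema import validate, ValidationError

PROJECT_SCHEMA = {
    "type": "object",
    "required": ["name", "version", "monitor-paths"],
    "properties": {
        "name": {"type": "string"},
        "version": {"type": "string"},
        "monitor-paths": {"type": "array", "items": {"type": "string"}},
    },
}

def validate_project_file(path):
    with open(path) as f:
        data = yaml.safe_load(f)
    try:
        validate(instance=data, schema=PROJECT_SCHEMA)
    except ValidationError as exc:
        raise SystemExit(f"Invalid {path}: {exc.message}")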

Add support for Druid

Problem

I want to use monosi with my Druid data store

Solution

Monosi needs to interpret Druid configuration details and run on Druid accordingly

Requirements

  1. Druid instance
  2. Add Druid driver & dialect
  3. Tests to ensure connection & monitors work

Add support for BigQuery

Problem

I want to use monosi with my BigQuery data store

Solution

Monosi needs to interpret BigQuery configuration details and run on BigQuery accordingly

Requirements

  1. BigQuery instance
  2. Refactor to support multiple formats for configurations
  3. Add BigQuery driver & dialect

On source deletion, execution status stays running

Description

If a source is deleted during an execution, the status of that execution stays running.

Expected behavior

When a source is deleted, the execution should be stopped (and probably removed).

Steps to reproduce

  1. Add a data source
  2. Wait until an execution is running
  3. Delete the data source while an execution is running
  4. Navigate to executions page
  5. See error
