raybeam / rb_status_plugin
rb_status_plugin: Data confidence tool for Airflow
License: MIT License
I was messing around with different types of quality checks and I noticed that the Lumen report DAGs actually have 2 different types of failures.
So, the second option is basically a status of the status tests ... kinda meta, but important. We should think of a different way for the Lumen report DAGs to report success or failure.
Adding tests for the new attributes added to Report class
We probably want the value displayed to be the name of the dag and not the number of the run. The number doesn't have much semantic value alone. Example:
We need to populate the existing report configuration into the UI when a user edits an existing report.
Test configs will need to create DAGs
Showing report's status in the status page
cron schedule, day of week, and time appear as options in the UI, regardless of which schedule is chosen.
We need to load the list of all available tests for the user when they are building a report.
Currently the UI is populated with sample tests, but we need to extract the list of tests as dag_name.task_name from Airflow's Postgres DB.
If a report doesn't exist, visiting the edit page for that report should raise an error and redirect to /reports.
Need to add tests for saving an edited report.
Change LumenBaseBuilderView -> LumenStatusView to build email link
Traceback (most recent call last):
File "/usr/lib/python3.7/site-packages/flask/app.py", line 2447, in wsgi_app
response = self.full_dispatch_request()
File "/usr/lib/python3.7/site-packages/flask/app.py", line 1952, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/lib/python3.7/site-packages/flask/app.py", line 1821, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python3.7/site-packages/flask/_compat.py", line 39, in reraise
raise value
File "/usr/lib/python3.7/site-packages/flask/app.py", line 1950, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/lib/python3.7/site-packages/flask/app.py", line 1936, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/usr/lib/python3.7/site-packages/flask_appbuilder/security/decorators.py", line 109, in wraps
return f(self, *args, **kwargs)
File "/usr/local/airflow/plugins/lumen_plugin/views.py", line 176, in form_post
extract_report_data_into_airflow(form)
File "/usr/local/airflow/plugins/lumen_plugin/helpers/report_save_helpers.py", line 29, in extract_report_data_into_airflow
convert_schedule_to_cron_expression(report_dict, form)
File "/usr/local/airflow/plugins/lumen_plugin/helpers/report_save_helpers.py", line 79, in convert_schedule_to_cron_expression
time_of_day = form.schedule_time.data.strftime("%H:%M")
AttributeError: 'NoneType' object has no attribute 'strftime'
Add logic to handle manual schedule (aka setting schedule_interval=None)
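The traceback above comes from calling strftime on a missing schedule time. A minimal sketch of a conversion that tolerates a manual schedule might look like the following. The function name, the schedule type strings ("manual", "daily", "weekly", "custom"), and the integer day-of-week argument are assumptions for illustration, not the plugin's actual form fields.

```python
from datetime import time
from typing import Optional


def schedule_to_cron(
    schedule_type: str,
    schedule_time: Optional[time] = None,
    day_of_week: Optional[int] = None,
    custom_cron: Optional[str] = None,
) -> Optional[str]:
    """Return a cron expression, or None for a manually triggered report.

    All argument names and schedule types here are hypothetical.
    """
    if schedule_type == "manual":
        # Maps to schedule_interval=None: the DAG only runs when triggered.
        return None
    if schedule_type == "custom":
        if not custom_cron:
            raise ValueError("A cron expression is required for a custom schedule.")
        return custom_cron
    if schedule_time is None:
        # Validate up front instead of failing later with an AttributeError.
        raise ValueError("A time of day is required for daily and weekly schedules.")
    minute_hour = f"{schedule_time.minute} {schedule_time.hour}"
    if schedule_type == "daily":
        return f"{minute_hour} * * *"
    if schedule_type == "weekly":
        if day_of_week is None:
            raise ValueError("A day of week is required for a weekly schedule.")
        return f"{minute_hour} * * {day_of_week}"
    raise ValueError(f"Unknown schedule type: {schedule_type!r}")
```

Raising a ValueError gives the view something it can catch and show on the form, rather than a 500 page.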
Most likely a front-end change
The UI is already in place, but we need to create a hook that will call the underlying dag to run in real-time when a user clicks the trigger report button.
The UI is already in place, but we need to create a hook that will change the state of the report's underlying dag to ON or OFF.
The UI is already in place, but we need to create a hook that will first delete the airflow variable for the report and then the underlying dag for the report, when the user clicks delete report button.
Lumen DAGs should have catchup set to False so they don't try to backfill.
Configs looks like this:
{
"tests": [
"dag_name.operator_1",
"dag_name.operator_3"
],
"emails": ["[email protected]", "[email protected]"],
"schedule": "0 7 * * *"
}
Lumen needs to be able to read all variables and parse the report configs.
Lumen test config variables are prefixed with lumen_test_
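As a rough sketch of the parsing step, assuming the variables have already been fetched into a plain dict of name to JSON string (how they are read from Airflow is not shown here), the prefix filter could look like this. The function name and the sample addresses are placeholders.

```python
import json

PREFIX = "lumen_test_"


def parse_report_configs(variables):
    """Return {report_name: config} for every variable carrying the prefix.

    `variables` is a stand-in for whatever the variable store returns.
    """
    configs = {}
    for name, raw in variables.items():
        if not name.startswith(PREFIX):
            continue  # not a Lumen report config
        try:
            config = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed values instead of failing the whole parse
        if isinstance(config, dict):
            configs[name[len(PREFIX):]] = config
    return configs


example = {
    "lumen_test_daily": json.dumps(
        {
            "tests": ["dag_name.operator_1", "dag_name.operator_3"],
            "emails": ["a@example.com", "b@example.com"],  # placeholder addresses
            "schedule": "0 7 * * *",
        }
    ),
    "unrelated_variable": "42",
    "lumen_test_broken": "{not json",
}
parsed = parse_report_configs(example)
```

Skipping malformed values silently is one choice; logging them would make broken configs easier to find.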
Right now we're passing a link to the error log. We want to pass a link to the status page
We need to move the DAG part of Airflow into its own repo and have the plugin stand alone. The aim is to have the plugin in its own repo but still deploy it into an Airflow instance for CI and demo.
Here's an example of an Airflow plugin repository: https://github.com/airflow-plugins/salesforce_plugin
It should not assume Astronomer but should accommodate it.
When editing an existing report, the schedule should show up in the same format as it was during last save.
I.e., we don't want to display a cron expression when a user goes back to edit a report if the user originally saved it using the non-cron options.
Create Lumen page for reports management.
Spec https://docs.google.com/document/d/1mUuVnksqLfKGnVXMUHh8UA7vWynSLgf1WD7JLyI8j94/edit#heading=h.3131wg1wryom
We need to embed into the report management workflow a pattern such that:
Right now, if you delete a Lumen report DAG in the Airflow UI, it just comes back later. You can only truly remove it if you remove the Variable that creates it. When we change to a DB backed management system, it will only be removed when you remove it from the Lumen database tables.
Should removing that DAG in the UI remove it in Lumen?
Here we are passing emails into owner_email, which results in this bug:
<a href="mailto:['[email protected]', '[email protected]']">['[email protected]', '[email protected]']</a>
Is it our intention to have multiple owners?
To my understanding:
emails - a list of subscribers
owner_name and owner_email - creator/owner of the report, one person in the current design
Test choices population using the task instance object.
Do we want to populate tasks that haven't run? Is this part of the MVP?
We need to find a new model to use that isn't taskinstance. I'm not sure where to find operator data though.
How to handle errors when a user submits the form (e.g. a bad email).
Currently, it will throw an Airflow error on a new page.
To fix this issue, we should show the error(s) at the top of the same page when the user clicks the Save button.
The reports_data method (and other methods) on Lumen views should handle errors gracefully. Probably logging them and maybe showing an error in the page. It should not throw a 500 and fail like it does now.
Example
When the ReportInstance tries to get_latest and there is no available run, it throws an exception. This exception results in a 500 in the report view.
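One way to keep a single failing report from taking down the page is to catch the exception per report, log it, and return the message for display. This is only a sketch: collect_report_rows and FakeReport are hypothetical names, and FakeReport merely stands in for ReportInstance so the example runs on its own.

```python
import logging

log = logging.getLogger(__name__)


def collect_report_rows(reports):
    """Return (rows, errors) without letting one bad report break the page."""
    rows, errors = [], []
    for report in reports:
        try:
            rows.append(report.get_latest())
        except Exception as exc:  # broad on purpose: the view should not return a 500
            log.exception("Could not load the latest run for %r", report)
            errors.append(f"{getattr(report, 'name', 'unknown report')}: {exc}")
    return rows, errors


class FakeReport:
    """Stand-in for ReportInstance, used only to exercise the sketch."""

    def __init__(self, name, latest=None):
        self.name = name
        self.latest = latest

    def get_latest(self):
        if self.latest is None:
            raise LookupError("no available run")
        return self.latest


rows, errors = collect_report_rows([FakeReport("daily", "success"), FakeReport("new")])
```

The view could then render rows as usual and show errors in a banner at the top of the page.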
In order to easily run Lumen, we'll need a script to set up the environment.
Lumen Back End should be able to extract the data from the Report Management UI into an airflow variable when the user clicks the "Save" button on a new report.
We need unit tests for email.
Adding all attributes about a report, so that a Report object has all required information about itself.
I think this needs to be changed to only take into consideration test tasks. Example:
I have helper functions in email_helpers.py that determine test tasks, and I think they should be moved to the ReportInstance class.
We need to have the following attributes in the value of the airflow variable:
report_title
description
owner_name
owner_email
subscribers
tests
schedule
Note that emails has been renamed to subscribers and also that subscribers is a union of owner_email and subscribers.
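A small sketch of how that value could be assembled, with subscribers stored as the union of owner_email and the submitted subscribers. The function name and the ordering (owner first, duplicates dropped) are assumptions; only the attribute names come from the list above.

```python
def build_report_value(report_title, description, owner_name, owner_email,
                       subscribers, tests, schedule):
    """Build the dict stored as the report's variable value (hypothetical helper)."""
    # dict.fromkeys keeps first-seen order while removing duplicates.
    merged_subscribers = list(dict.fromkeys([owner_email, *subscribers]))
    return {
        "report_title": report_title,
        "description": description,
        "owner_name": owner_name,
        "owner_email": owner_email,
        "subscribers": merged_subscribers,
        "tests": list(tests),
        "schedule": schedule,
    }


value = build_report_value(
    report_title="Daily checks",
    description="Example report",
    owner_name="Owner",
    owner_email="owner@example.com",  # placeholder addresses
    subscribers=["team@example.com", "owner@example.com"],
    tests=["dag_name.operator_1"],
    schedule="0 7 * * *",
)
```

Keeping owner_email as a single string also avoids rendering a list inside a mailto link.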
Queries postgres for status of task_instance (aka a test) and then fails or succeeds based on task_instance's status.
Moving helper functions into classes (based on their functionality) in order to give a more holistic view of how each function ties together.
Create email template using Foundation and test it in litmus
======================================================================
FAIL: test_invalid_email (lumen_report_save_test.ReportSaveTest)
Test that errors are thrown with an invalid email.
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/circleci/lumen-test-airflow/plugins/lumen_plugin/tests/lumen_report_save_test.py", line 206, in test_invalid_email
in str(context.exception)
AssertionError: False is not true
----------------------------------------------------------------------
Currently, /lumen/reports/ is being populated by dummy_reports.
The page should be populated with real report data.