dbnd's People

Contributors

andreyz4k, asafmaor, dariusz-jania, desher123, devops-ghe, dudi-databand, evgenyshulman, fhoffmanncode, gleb-britecore, ilantc1, ilantc42, jonathanshir, jonnybarda, kalebinn, makeyko, moshe, pod666, randomclicker, rozhok, staaam, talrumer-databand, turbaszek, vashanin, yk1711, yoav-benizri, yuval7

dbnd's Issues

Wrong import in dbnd_execute.py

Cannot find reference 'dbnd_operator__get_task_retry_delay' in 'dbnd_execute.py'

from dbnd_airflow.dbnd_task_executor.dbnd_execute import (
    dbnd_operator__get_task_retry_delay,
)
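A report like this can be checked without an IDE by asking Python whether a module actually defines a name. The following is a minimal sketch; the helper `module_exports` is hypothetical and is shown against the standard-library `json` module so it runs without dbnd installed.

```python
import importlib


def module_exports(module_name: str, attr_name: str) -> bool:
    """Return True if `module_name` imports cleanly and defines `attr_name`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The module itself is missing or fails to import.
        return False
    return hasattr(module, attr_name)


if __name__ == "__main__":
    # Stand-in module; replace with the module and name from the report.
    print(module_exports("json", "loads"))
    print(module_exports("json", "dbnd_operator__get_task_retry_delay"))
```

With dbnd-airflow installed, the same call could be pointed at "dbnd_airflow.dbnd_task_executor.dbnd_execute" and the name from the import above.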

Not found (404) when calling GET http://<databand-server>/api/v1/integrations/config?type=***

Describe
The Airflow Monitor DAG gets a Not Found error (HTTP 404) when it calls the Databand API at "http://<databand-server>/api/v1/integrations/config?type=***".
This is not a communication issue between Airflow and Databand, because earlier calls to other API endpoints succeed.

To reproduce

  1. Create a requirements.txt file with the following content:
dbnd==1.0.14.1
dbnd-spark==1.0.14.1
dbnd-airflow==1.0.14.1
dbnd-airflow-auto-tracking==1.0.14.1
  2. Create a Dockerfile to build the Airflow image:
FROM apache/airflow:latest
ADD requirements.txt .
RUN pip install apache-airflow==${AIRFLOW_VERSION} -r requirements.txt
  3. Create the directories:
mkdir -p ./dags ./logs ./plugins ./config
  4. Get the official docker-compose file: https://airflow.apache.org/docs/apache-airflow/2.8.1/docker-compose.yaml

  5. Edit docker-compose.yaml: comment out line 52 and uncomment line 53:

...
  # image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.8.1}
  build: .
...
  6. Create the containers:
docker compose up airflow-init
docker compose up -d
  7. In Databand, create an Airflow Syncer (Settings > Airflow Syncers > Add new Syncer):
    Connection URL: http://<docker_host>:8080
    Syncer name: Any_syncer
    Step 3 of the syncer setup gives you a JSON document to use in the next step.

  8. In Airflow, under Admin > Connections, create a new connection:
    Connection id: dbnd_config
    Connection type: HTTP
    Extra:

{
  "airflow_monitor": {
    "dag_ids": "",
    "is_sync_enabled": true,
    "syncer_name": "Any_syncer"
  },
  "core": {
    "databand_url": "http://<databand-server>",
    "databand_access_token": "eyJ0eXAiOiJKV...."
  },
  "log": {
    "preview_head_bytes": 8192,
    "preview_tail_bytes": 8192
  },
  "tracking": {
    "track_source_code": false
  }
}
  9. Create a DAG file "databand_airflow_monitor.py" in the dags folder:
from airflow_monitor.monitor_as_dag import get_monitor_dag
## This DAG is used by Databand to monitor your Airflow installation.
dag = get_monitor_dag()
  10. In Airflow, unpause the DAG "databand_airflow_monitor" and check the logs:
ec2f5435d5c4
*** Found local files:
***   * /opt/airflow/logs/dag_id=databand_airflow_monitor/run_id=scheduled__2024-02-01T17:21:00+00:00/task_id=monitor/attempt=3.log
*** Found logs served from host http://ec2f5435d5c4:8793/log/dag_id=databand_airflow_monitor/run_id=scheduled__2024-02-01T17:21:00+00:00/task_id=monitor/attempt=3.log
[2024-02-07, 18:43:24 UTC] {dbnd_airflow_handler.py:117} INFO - Databand Tracking Started 1.0.14.1
[2024-02-07, 18:43:24 UTC] {base.py:83} INFO - Using connection ID 'dbnd_config' for task execution.
[2024-02-07, 18:43:24 UTC] {base.py:83} INFO - Using connection ID 'dbnd_config' for task execution.
[2024-02-07, 18:43:24 UTC] {taskinstance.py:1956} INFO - Dependencies all met for dep_context=non-requeueable deps ti=<TaskInstance: databand_airflow_monitor.monitor scheduled__2024-02-01T17:21:00+00:00 [queued]>
[2024-02-07, 18:43:24 UTC] {taskinstance.py:1956} INFO - Dependencies all met for dep_context=requeueable deps ti=<TaskInstance: databand_airflow_monitor.monitor scheduled__2024-02-01T17:21:00+00:00 [queued]>
[2024-02-07, 18:43:24 UTC] {taskinstance.py:2170} INFO - Starting attempt 3 of 13
[2024-02-07, 18:43:24 UTC] {taskinstance.py:2191} INFO - Executing <Task(MonitorOperator): monitor> on 2024-02-01 17:21:00+00:00
[2024-02-07, 18:43:24 UTC] {standard_task_runner.py:60} INFO - Started process 358 to run task
[2024-02-07, 18:43:24 UTC] {standard_task_runner.py:87} INFO - Running: ['***', 'tasks', 'run', 'databand_***_monitor', 'monitor', 'scheduled__2024-02-01T17:21:00+00:00', '--job-id', '24596', '--raw', '--subdir', 'DAGS_FOLDER/databand_***_monitor.py', '--cfg-path', '/tmp/tmpyjarxxti']
[2024-02-07, 18:43:24 UTC] {standard_task_runner.py:88} INFO - Job 24596: Subtask monitor
[2024-02-07, 18:43:24 UTC] {base.py:83} INFO - Using connection ID 'dbnd_config' for task execution.
[2024-02-07, 18:43:25 UTC] {task_command.py:423} INFO - Running <TaskInstance: databand_airflow_monitor.monitor scheduled__2024-02-01T17:21:00+00:00 [running]> on host ec2f5435d5c4
[2024-02-07, 18:43:25 UTC] {taskinstance.py:2480} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='Databand' AIRFLOW_CTX_DAG_ID='databand_***_monitor' AIRFLOW_CTX_TASK_ID='monitor' AIRFLOW_CTX_EXECUTION_DATE='2024-02-01T17:21:00+00:00' AIRFLOW_CTX_TRY_NUMBER='3' AIRFLOW_CTX_DAG_RUN_ID='scheduled__2024-02-01T17:21:00+00:00' AIRFLOW_CTX_UID='ff2088ba-35ab-5eba-870b-c5b30b238094'
[2024-02-07, 18:43:25 UTC] {base.py:83} INFO - Using connection ID 'dbnd_config' for task execution.
[2024-02-07, 18:43:25 UTC] {base.py:83} INFO - Using connection ID 'dbnd_config' for task execution.
[2024-02-07, 18:43:25 UTC] {tracking_store_console.py:98} INFO - Tracking monitor task at http://192.168.0.195:8080/app/jobs/databand_***_monitor/29c4661a-0e21-5fb3-a5ba-94735dbf3cc1/084620bd-cb4a-5287-a09a-eb3495a26f3f
[2024-02-07, 18:43:25 UTC] {monitor_as_dag.py:194} INFO - Running memory guard with the limit=8589934592, checking every 10 seconds
[2024-02-07, 18:43:25 UTC] {subprocess.py:63} INFO - Tmp dir root location: /tmp
[2024-02-07, 18:43:25 UTC] {subprocess.py:75} INFO - Running command: ['/usr/bin/bash', '-c', '/usr/local/bin/python -m dbnd ***-monitor-v2  --interval 10  --stop-after 10800 ']
[2024-02-07, 18:43:25 UTC] {subprocess.py:86} INFO - Output:
[2024-02-07, 18:43:25 UTC] {monitor_as_dag.py:209} INFO - Memory usage changed from: 0 mb to 208 mb
[2024-02-07, 18:43:26 UTC] {subprocess.py:93} INFO - [2024-02-07 18:43:26,904] INFO - Starting Databand 1.0.14.1!
[2024-02-07, 18:43:26 UTC] {subprocess.py:93} INFO - 	DBND_HOME=/root
[2024-02-07, 18:43:26 UTC] {subprocess.py:93} INFO - 	DBND_SYSTEM=/root/.dbnd
[2024-02-07, 18:43:26 UTC] {subprocess.py:93} INFO - [2024-02-07 18:43:26,904] INFO - Reading configuration from:
[2024-02-07, 18:43:26 UTC] {subprocess.py:93} INFO - 	/home/***/.local/lib/python3.8/site-packages/dbnd/conf/databand-core.cfg
[2024-02-07, 18:43:26 UTC] {subprocess.py:93} INFO - 
[2024-02-07, 18:43:26 UTC] {subprocess.py:93} INFO - [2024-02-07 18:43:26,907] INFO - Running validations
[2024-02-07, 18:43:26 UTC] {subprocess.py:93} INFO - [2024-02-07 18:43:26,908] INFO - All required configurations exist
[2024-02-07, 18:43:26 UTC] {subprocess.py:93} INFO - [2024-02-07 18:43:26,915] INFO - All dbnd packages required for monitor exist
[2024-02-07, 18:43:26 UTC] {subprocess.py:93} INFO - [2024-02-07 18:43:26,915] INFO - All dbnd packages required for tracking exist
[2024-02-07, 18:43:27 UTC] {subprocess.py:93} INFO - [2024-02-07 18:43:27,188] INFO - Reading the config from /opt/***/***.cfg
[2024-02-07, 18:43:27 UTC] {subprocess.py:93} INFO - [2024-02-07 18:43:27,364] INFO - Configured default timezone UTC
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - [2024-02-07T18:43:28.315+0000] {validations.py:98} INFO - Airflow 2.0 support is set
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - [2024-02-07 18:43:28,586] ERROR ***_monitor.shared.multiserver 360 MainThread : Unknown exception during iteration
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - Traceback (most recent call last):
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/***_monitor/shared/multiserver.py", line 167, in run
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     self.run_once()
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/***_monitor/shared/multiserver.py", line 183, in run_once
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     ] = self.integration_management_service.get_all_servers_configuration(
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/_vendor/tenacity/__init__.py", line 241, in wrapped_f
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     return self.call(f, *args, **kw)
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/_vendor/tenacity/__init__.py", line 329, in call
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     do = self.iter(result=result, exc_info=exc_info,
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/_vendor/tenacity/__init__.py", line 297, in iter
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     raise retry_exc.reraise()
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/_vendor/tenacity/__init__.py", line 136, in reraise
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     raise self.last_attempt.result()
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     return self.__get_result()
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     raise self._exception
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/_vendor/tenacity/__init__.py", line 333, in call
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     result = fn(*args, **kwargs)
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/***_monitor/common/metric_reporter.py", line 77, in wrapped
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     return f(*args, **kwargs)
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/_vendor/tenacity/__init__.py", line 241, in wrapped_f
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     return self.call(f, *args, **kw)
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/_vendor/tenacity/__init__.py", line 329, in call
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     do = self.iter(result=result, exc_info=exc_info,
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/_vendor/tenacity/__init__.py", line 279, in iter
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     return fut.result()
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 437, in result
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     return self.__get_result()
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     raise self._exception
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/_vendor/tenacity/__init__.py", line 333, in call
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     result = fn(*args, **kwargs)
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/***_monitor/shared/integration_management_service.py", line 68, in get_all_servers_configuration
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     response = self._api_client.api_request(
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/utils/api_client.py", line 254, in api_request
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     resp = self._request(
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -   File "/home/***/.local/lib/python3.8/site-packages/dbnd/utils/api_client.py", line 152, in _request
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO -     raise DatabandApiError(
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - dbnd._core.errors.base.DatabandApiError: Call failed to endpoint GET http://<databand-server>/api/v1/integrations/config?type=***
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - Response code: 404
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - Server error: <!doctype html>
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - 	<html lang=en>
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - 	<title>404 Not Found</title>
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - 	<h1>Not Found</h1>
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - 	<p>The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.</p>
[2024-02-07, 18:43:28 UTC] {subprocess.py:93} INFO - 

Expected behavior
The endpoint call succeeds and execution metrics appear in the Databand UI.
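To check the failing request by hand, it can help to derive the exact URL from the connection Extra. The sketch below assumes the monitor joins `core.databand_url` with the path seen in the traceback; the helper `integration_config_url` is hypothetical, and the `type` value is left as a parameter because it is masked in the log.

```python
import json
from urllib.parse import urlencode


def integration_config_url(extra_json: str, integration_type: str) -> str:
    """Build the config endpoint URL from an Airflow connection Extra document."""
    extra = json.loads(extra_json)
    # Drop a trailing slash so the joined URL never contains "//api".
    base = extra["core"]["databand_url"].rstrip("/")
    query = urlencode({"type": integration_type})
    return f"{base}/api/v1/integrations/config?{query}"


if __name__ == "__main__":
    # Placeholder host and type; substitute the real values from dbnd_config.
    sample_extra = '{"core": {"databand_url": "http://example.invalid/"}}'
    print(integration_config_url(sample_extra, "some-type"))
```

Requesting the printed URL directly (with the access token from the same Extra) shows whether the 404 comes from the server itself or from how the monitor builds the request.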

Typo in _DecoratedTask._invoke_func

# this function is called from run/banc -> # this function is called from run/band

# that's mean we just need to execute user code to calculate staff -> # that's mean we just need to execute user code to calculate stuff

Loggers should not write ascii codes to files

Currently, when dbnd saves task logs, those files include ANSI escape codes:

[2020-12-01 15:03:09,771] \x1b[32mINFO \x1b[0m - Starting Databand 0.30.0!

This is unexpected behaviour for file log handlers.
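One possible approach, sketched with the standard `logging` module only: wrap the formatter used for file handlers so that escape sequences are removed from the final formatted line. `StripAnsiFormatter` and `demo` are hypothetical names, not part of dbnd, and the regular expression covers only common CSI sequences such as colour codes.

```python
import io
import logging
import re

# Matches CSI sequences such as "\x1b[32m" (colour) and "\x1b[0m" (reset).
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


class StripAnsiFormatter(logging.Formatter):
    """Formatter that removes ANSI escape sequences from the formatted record."""

    def format(self, record):
        return ANSI_ESCAPE.sub("", super().format(record))


def demo() -> str:
    """Log one coloured message through the formatter and return what was written."""
    stream = io.StringIO()  # stands in for a log file
    handler = logging.StreamHandler(stream)
    handler.setFormatter(StripAnsiFormatter("%(levelname)s - %(message)s"))
    logger = logging.getLogger("strip_ansi_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        logger.info("\x1b[32mStarting\x1b[0m Databand")
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


if __name__ == "__main__":
    print(repr(demo()))
```

Console handlers can keep a colouring formatter while file handlers use the stripping one, so terminal output is unchanged.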

Pass task type in explicit way

Hello, I think it would be better to do:

def task(*args, **kwargs):
    return _task_decorator(
        *args,
        task_type=DecoratedPythonTask,
        task_default_result=_default_output,
        **kwargs
    )

instead of

def task(*args, **kwargs):
    kwargs.setdefault("_task_type", DecoratedPythonTask)
    kwargs.setdefault("_task_default_result", _default_output)
    return _task_decorator(*args, **kwargs)

Why? The first thing _task_decorator does is pop those arguments, so passing them explicitly would be more Pythonic, IMHO. I'm happy to open a PR :)
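The difference between the two styles can be seen in a self-contained toy. Everything below is a hypothetical stand-in (plain strings instead of the task class and default output, and simplified decorators), not dbnd's implementation.

```python
def _decorator_with_pop(*args, **kwargs):
    # Current style: configuration is hidden inside **kwargs and popped out.
    task_type = kwargs.pop("_task_type")
    default_result = kwargs.pop("_task_default_result")
    return task_type, default_result, args, kwargs


def _decorator_explicit(*args, task_type, task_default_result, **kwargs):
    # Proposed style: configuration is declared as keyword-only parameters.
    return task_type, task_default_result, args, kwargs


def task_current(*args, **kwargs):
    kwargs.setdefault("_task_type", "python-task")
    kwargs.setdefault("_task_default_result", "default-output")
    return _decorator_with_pop(*args, **kwargs)


def task_proposed(*args, **kwargs):
    return _decorator_explicit(
        *args,
        task_type="python-task",
        task_default_result="default-output",
        **kwargs
    )


if __name__ == "__main__":
    print(task_current(1, x=2))
    print(task_proposed(1, x=2))
```

One behavioural difference is worth weighing: with `setdefault`, a caller can override `_task_type` by passing it, whereas in the explicit version `task_proposed(task_type="other")` raises a `TypeError` because the keyword is supplied twice.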
