server's Introduction

Please note: If you are new to Prefect, we strongly recommend starting with Prefect 2 and Prefect Cloud 2, as they are in General Availability.

Prefect 1 Core, Server, and Cloud are our first-generation workflow and orchestration tools. You can continue to use them and we'll continue to support them while migrating users to Prefect 2. Prefect 2 can also be self-hosted and does not depend on this repository in any way.

If you're ready to start migrating your workflows to Prefect 2, see our migration guide.

Prefect Server

Please note: this repo is for Prefect Server development. If you want to run Prefect Server, the best first step is to install Prefect and run prefect server start.

If you want to install Prefect Server on Kubernetes, take a look at the Server Helm Chart.

If you would like to work on the Prefect UI or open a UI-specific issue, please visit the Prefect UI repository.

Overview

Prefect Server is an open source backend that makes it easy to monitor and execute your Prefect flows.

Prefect Server consists of a number of related services including:

  • postgres: the database persistence layer
  • hasura: a GraphQL API for Postgres (http://hasura.io)
  • graphql: a Python-based GraphQL server that exposes mutations (actions) representing Prefect Server's logic
  • apollo: an Apollo Server that serves as the main user interaction endpoint, and stitches together the hasura and graphql APIs
  • towel: a variety of utility services that provide maintenance routines, because a towel is just about the most massively useful thing an interstellar hitchhiker can carry
    • scheduler: a service that searches for flows that need scheduling and creates new flow runs
    • lazarus: a service that detects when flow runs ended abnormally and should be restarted
    • zombie_killer: a service that detects when task runs ended abnormally and should be failed

These services are intended to be run within Docker, and some CLI commands require docker-compose, which helps orchestrate running multiple Docker containers simultaneously.
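A quick way to see which of these services are reachable is to probe their TCP ports. The following is a minimal sketch, not part of Prefect Server: the host ports are the ones shown in the `docker ps` listing in one of the issues further down (5432, 3000, 4201, 4200) and are assumptions for any other deployment, and the helper names `port_open` and `check_services` are hypothetical.

```python
import socket

# Host ports as they appear in the `docker ps` output quoted in the issues
# below; treat them as assumptions for your own setup.
SERVICE_PORTS = {
    "postgres": 5432,
    "hasura": 3000,
    "graphql": 4201,
    "apollo": 4200,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICE_PORTS.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:10s} {'up' if up else 'DOWN'}")
```

An open port only shows that something is listening; it does not prove the service behind it is healthy.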

Installation

  1. Don't Panic.

  2. Make sure you have Python 3.7+ and Prefect installed:

    pip install prefect
    
  3. Clone this repo, then install Prefect Server and its dependencies by running:

    pip install -e .
    cd services/apollo && npm install
    

Note: if installing for local development, it is important to install using the -e flag with [dev] extras: pip install -e ".[dev]"

Running the system as a developer

Note: for deploying Prefect Server, please use the prefect server start CLI command in Prefect Core 0.13.0+.

If you are doing local development on Prefect Server, it is best to run most services as local processes. This allows for hot-reloading as code changes, setting debugging breakpoints, and generally speeds up the pace of iteration.

In order to run the system:

  1. Start the database and Hasura in Docker:

    prefect-server dev infrastructure

    If you receive an error message stating infrastructure_hasura_1 exited with code 137 when starting the infrastructure, Docker is likely running out of memory. Increasing Docker's memory limit to 8GB should solve this.

  2. Run the database migrations and apply Hasura metadata:

    prefect-server database upgrade

  3. In a new terminal, start the services locally:

    prefect-server dev services

You can use the -i (include) or -e (exclude) flags to choose specific services:

# run only apollo and graphql
prefect-server dev services -i apollo,graphql

# run all except graphql
prefect-server dev services -e graphql
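To make the include/exclude semantics concrete, here is a small sketch of how such flags could be resolved into a list of services. It is an illustration only, not the CLI's actual implementation: the function names are hypothetical, the service names are taken from the overview list, and treating -i and -e as mutually exclusive is an assumption of this sketch.

```python
from typing import List, Optional, Sequence

# Assumed service names, taken from the overview list; the set the real CLI
# accepts may differ.
ALL_SERVICES = ("graphql", "apollo", "scheduler", "lazarus", "zombie_killer")


def _parse(csv: str, known: Sequence[str]) -> List[str]:
    """Split a comma-separated flag value and reject unknown names."""
    names = [n.strip() for n in csv.split(",") if n.strip()]
    unknown = [n for n in names if n not in known]
    if unknown:
        raise ValueError(f"unknown services: {unknown}")
    return names


def select_services(
    include: Optional[str] = None,
    exclude: Optional[str] = None,
    known: Sequence[str] = ALL_SERVICES,
) -> List[str]:
    """Return the services to start, preserving the order of `known`."""
    if include and exclude:
        # Sketch-level choice: treat the two flags as mutually exclusive.
        raise ValueError("pass either include or exclude, not both")
    if include:
        wanted = set(_parse(include, known))
        return [s for s in known if s in wanted]
    if exclude:
        unwanted = set(_parse(exclude, known))
        return [s for s in known if s not in unwanted]
    return list(known)


if __name__ == "__main__":
    print(select_services(include="apollo,graphql"))
    print(select_services(exclude="graphql"))
```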

Running tests

Prefect Server has three types of tests:

  • unit tests: used to validate individual functions
  • service tests: used to verify functionality throughout Prefect Server
  • integration tests: used to verify functionality between Prefect Core and Server

Prefect Server uses pytest for testing. Tests are organized in a way that generally mimics the src directory. For example, in order to run all unit tests for the API and the GraphQL server, run:

pytest tests/api tests/graphql

Unit tests can be run with only prefect-server dev infrastructure running. Service and integration tests require Prefect Server's services to be running as well.

Filing an issue

Whether you'd like a feature or you're seeing a bug, we welcome users filing issues. Helpful bug issues include:

  • the circumstances surrounding the bug
  • the desired behavior
  • a minimum reproducible example

Helpful feature requests include:

  • a description of the feature
  • how the feature could be helpful
  • if applicable, initial thoughts about feature implementation

Please be aware that Prefect Server feature requests that might compete with proprietary Prefect Cloud features will be rejected.

License

Prefect Server is lovingly made by the team at Prefect and licensed under the Prefect Community License. For information on how you can use, extend, and depend on Prefect Server to automate your data, take a look at our license or contact us.

server's Issues

PayloadTooLargeError and graphql foreign key violation on server start

Description

I am seeing these errors when I run prefect server start on 0.13.8 on a particular machine.
Link to error log gist

I was running an older version of prefect, either 0.13.4 or 0.13.6, ran a workflow over dask executor, then upgraded to 0.13.8 and restarted it. I've tried clearing all docker images and volumes, and ~/.prefect/flows, and I still see the same error.

Expected Behavior

No error.

Reproduction

I've tried reproducing this on 2 other machines but haven't been able to.

Environment

Ubuntu 18.04

This is my config.toml

backend = "server"

[server]
host = "http://172.18.1.3"

    [server.ui]
    apollo_url = "http://172.18.1.3:4200/graphql"

[logging]
# The logging level: NOTSET, DEBUG, INFO, WARNING, ERROR, or CRITICAL
level = "WARNING"

[context.secrets]
SLACK_WEBHOOK_URL = "<redacted>"

BUG: No Notification raised for a task run that needs approval

Description

When a task run has a manual_only trigger, it can be approved in the UI. In Cloud we raise a notification about the task run that needs approval. We have some of the code for this in server, but it's not fully working: no notification is given.

Expected Behavior

When I run a flow that has a task with manual approval, server creates a notification to let me know it needs approving.

Reproduction

Create a flow with a manual_only trigger, then register and run it in server. You should see a notification that it needs approval, but none appears.

Environment

Graphql Apollo service failure under Ubuntu 20.04 LTS

Description

The initial problem was detected when following the Prefect installation instructions using pip under Python 3.7.6. After a successful installation and starting the local server using 'prefect server start', the service stack comes up, but connecting to the graphql server fails. Running 'prefect backend server' made no difference. Editing both the local .prefect/config.toml and the one installed under site-packages/prefect/config.toml to turn on debugging and force different IP address assignments didn't change the behavior either.

2020-06-22T21:26:17.180Z Error fetching GraphQL health: FetchError: request to http://graphql:4201/health failed, reason: connect ECONNREFUSED 172.28.0.4:4201
apollo_1 | 2020-06-22T21:26:17.181Z Error: Could not safely build a schema!
apollo_1 | at safelyBuildSchema (/apollo/dist/index.js:129:11)
apollo_1 | at process._tickCallback (internal/process/next_tick.js:68:7) Could not safely build a schema! Error: Could not safely build a schema!
apollo_1 | at safelyBuildSchema (/apollo/dist/index.js:129:11)
apollo_1 | at process._tickCallback (internal/process/next_tick.js:68:7)
apollo_1 | 2020-06-22T21:26:17.182Z
apollo_1 | Trying again in 3 seconds...

Corresponding to this error was an API status problem in the web UI:

image

Several different system configurations (conda with Python 3.7, Python 3.8 pip install without conda) and modifications of the configuration file were tried. The developer instructions at

https://github.com/PrefectHQ/prefect/tree/master/server

were then followed to download and build sub-components of the service stack. In particular, the apollo service was tested independently:

prefect/server/services/apollo$ npm install

[email protected] postinstall /home/av_developer/Downloads/prefect/server/services/apollo/node_modules/@babel/node/node_modules/core-js

node -e "try{require('./postinstall')}catch(e){}"

Thank you for using core-js ( https://github.com/zloirock/core-js ) for polyfilling JavaScript standard library!

The project needs your help! Please consider supporting of core-js on Open Collective or Patreon:

https://opencollective.com/core-js
https://www.patreon.com/zloirock

Also, the author of core-js ( https://github.com/zloirock ) is looking for a good job -)

[email protected] postinstall /home/av_developer/Downloads/prefect/server/services/apollo/node_modules/apollo-env/node_modules/core-js
node -e "try{require('./postinstall')}catch(e){}"

[email protected] postinstall /home/av_developer/Downloads/prefect/server/services/apollo/node_modules/core-js
node -e "try{require('./postinstall')}catch(e){}"

@apollo/[email protected] postinstall /home/av_developer/Downloads/prefect/server/services/apollo/node_modules/@apollo/protobufjs
node scripts/postinstall

[email protected] postinstall /home/av_developer/Downloads/prefect/server/services/apollo/node_modules/nodemon
node bin/postinstall || exit 0

Love nodemon? You can now support the project via the open collective:

https://opencollective.com/nodemon/donate

npm WARN apollo No repository field.
npm WARN optional SKIPPING OPTIONAL DEPENDENCY: [email protected] (node_modules/fsevents):
npm WARN notsup SKIPPING OPTIONAL DEPENDENCY: Unsupported platform for [email protected]: wanted {"os":"darwin","arch":"any"} (current: {"os":"linux","arch":"x64"})

added 1090 packages from 610 contributors and audited 1165 packages in 16.751s

and then tested:

prefect/server/services/apollo

nodemon --exec babel-node src/index.js

[nodemon] 1.19.4
[nodemon] to restart at any time, enter rs
[nodemon] watching dir(s): src/**/* ../../src/**/*
[nodemon] watching extensions: js,mjs,coffee,litcoffee,json,py
[nodemon] starting babel-node src/index.js
2020-06-22T21:30:34.188Z Error fetching GraphQL health: FetchError: request to http://localhost:4201/health failed, reason: connect ECONNREFUSED 127.0.0.1:4201
2020-06-22T21:30:34.197Z Error: Could not safely build a schema!
at safelyBuildSchema (/home/av_developer/Downloads/prefect/server/services/apollo/src/index.js:126:11)
at process._tickCallback (internal/process/next_tick.js:68:7) Could not safely build a schema! Error: Could not safely build a schema!
at safelyBuildSchema (/home/av_developer/Downloads/prefect/server/services/apollo/src/index.js:126:11)
at process._tickCallback (internal/process/next_tick.js:68:7)
2020-06-22T21:30:34.198Z
Trying again in 3 seconds...

producing the same error as in the prior installation attempts.
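The logs show apollo polling the graphql service's /health endpoint and retrying every 3 seconds. To check that endpoint independently of apollo, something like the following sketch can help. It uses only the standard library; the URL http://localhost:4201/health comes from the log above, while the function names, the retry count, and the decision to bypass any configured HTTP proxy are assumptions of this sketch.

```python
import http.client
import time
import urllib.error
import urllib.request

# Bypass any proxy from the environment, since the target is a local service.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def graphql_healthy(url: str = "http://localhost:4201/health", timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with a 2xx status."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers connection refused (as in the log), timeouts, and HTTP errors.
        return False


def wait_for_graphql(
    url: str = "http://localhost:4201/health",
    attempts: int = 10,
    delay: float = 3.0,
) -> bool:
    """Poll the endpoint, sleeping `delay` seconds between failed attempts."""
    for attempt in range(1, attempts + 1):
        if graphql_healthy(url):
            return True
        if attempt < attempts:
            time.sleep(delay)
    return False


if __name__ == "__main__":
    print("graphql reachable:", wait_for_graphql(attempts=3))
```

If this probe also fails with a refused connection, the problem likely lies with the graphql service itself rather than with apollo.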

Expected Behavior

I expected the installation guide's instructions to result in a working test platform, with the graphql service starting up and accepting connections.

Reproduction

This problem occurred consistently for me for all install attempts using the Ubuntu 20.04 LTS installation under VMware Fusion virtualization. The method of reproduction involving the least amount of software was to follow the developer instructions and then build and test only the apollo service sub-component under prefect/server/services/apollo.

Environment

Ubuntu 20.04 LTS platform with all updates running under VMware Fusion 11.5.5 under Mac OS X 10.15.4. Anaconda 4.8.3 was used for many of the installation attempts with python 3.7.6. npm version was 6.14.4.

Example Python 3.8 env:

prefect diagnostics
{
  "config_overrides": {
    "server": {
      "ui": {
        "graphql_url": true
      }
    }
  },
  "env_vars": [],
  "system_information": {
    "platform": "Linux-5.4.0-37-generic-x86_64-with-glibc2.29",
    "prefect_version": "0.12.0",
    "python_version": "3.8.2"
  }
}

Example Anaconda env:

prefect diagnostics
{
  "config_overrides": {
    "server": {
      "ui": {
        "graphql_url": true
      }
    }
  },
  "env_vars": [],
  "system_information": {
    "platform": "Linux-5.4.0-37-generic-x86_64-with-debian-bullseye-sid",
    "prefect_version": "0.12.0",
    "python_version": "3.7.6"
  }
}

Flow Group Schedule and Parameters

Description

If I turn a flow group schedule on when a flow is missing required parameters, I get an error message and no runs are scheduled. However, flow.is_schedule_active is still updated to true.

Expected Behavior

flow.is_schedule_active would stay as false because we weren't able to set the schedule to running. (This is important because in the UI we use flow.is_schedule_active to decide what the schedule toggle should show.)
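The expected behavior amounts to validating before mutating: the flag should only flip once the required parameters are known to be satisfied. A minimal sketch of that ordering follows. The `Flow` dataclass and `set_schedule_active` function here are simplified, hypothetical stand-ins and not the Prefect Server model or mutation.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Flow:
    """Simplified stand-in for a flow record; not the Prefect Server model."""

    required_parameters: Set[str]
    default_parameters: Dict[str, object] = field(default_factory=dict)
    is_schedule_active: bool = False


def set_schedule_active(flow: Flow) -> None:
    """Activate the schedule only if every required parameter has a value."""
    missing = flow.required_parameters - set(flow.default_parameters)
    if missing:
        # Raise before touching the flag, so a failed attempt leaves
        # is_schedule_active unchanged.
        raise ValueError(f"required parameters missing: {sorted(missing)}")
    flow.is_schedule_active = True


if __name__ == "__main__":
    flow = Flow(required_parameters={"x"})
    try:
        set_schedule_active(flow)
    except ValueError as exc:
        print("rejected:", exc)
    print("is_schedule_active:", flow.is_schedule_active)
```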

Reproduction

Register a flow with a required parameter that is not yet set (or with a required parameter set, and then unset it using flow group parameters). Then set a flow group schedule. Try to turn on the schedule (e.g. using the schedule toggle in the UI or with the set_schedule_active mutation).

Environment

This was in dev

Reschedule scheduled flow runs whenever a relevant change to a flow group's settings occurs

Opened from the Prefect Public Slack Community

michael.hadorn: Hi there 🙂
Is it by design that a scheduled flow will not use updated flow default parameters?
e.g.
• schedule a flow
• let them run
• update the default params
• for the next run the old/wrong default values are used
On Prefect local server 0.14.12

Btw: There is no way, to specify params for a specific schedule, right? That would be awesome and necessary in a way.

anze.kravanja: https://prefect-community.slack.com/archives/CL09KU1K7/p1611295611083600?thread_ts=1611295611.083600&cid=CL09KU1K7

jim: > There is no way, to specify params for a specific schedule, right? That would be awesome and necessary in a way
This is supported, see: https://docs.prefect.io/core/concepts/schedules.html#varying-parameter-values

jim: As for your original issue, I think this is due to scheduled runs being created in batches. So your later executing runs are using the old values (since the actual DB record was created before your updates). I can see this being confusing, I'll open an issue.

jim: <@ULVA73B9P> open "Reschedule scheduled flow runs whenever a relevant change to a flow group's settings occurs" in server

Original thread can be found here.
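The explanation in the thread (runs are created in batches, so each record captures the defaults that existed at creation time) can be illustrated with a toy model. This is only a sketch of the idea under simplified assumptions: `ScheduledRun`, `schedule_batch`, and `reschedule` are hypothetical names, and integer ticks stand in for real start times.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScheduledRun:
    start: int                      # hypothetical tick number
    parameters: Dict[str, object]   # snapshot taken when the record is created


def schedule_batch(defaults: Dict[str, object], starts: List[int]) -> List[ScheduledRun]:
    """Create run records ahead of time, copying the current defaults."""
    return [ScheduledRun(start, dict(defaults)) for start in starts]


def reschedule(runs: List[ScheduledRun], now: int, defaults: Dict[str, object]) -> List[ScheduledRun]:
    """Recreate runs that have not started yet so they pick up new defaults."""
    return [
        run if run.start <= now else ScheduledRun(run.start, dict(defaults))
        for run in runs
    ]


if __name__ == "__main__":
    defaults = {"x": 1}
    runs = schedule_batch(defaults, starts=[1, 2, 3])

    defaults["x"] = 2                    # user updates the default after run 1
    print(runs[2].parameters)            # still the old snapshot

    runs = reschedule(runs, now=1, defaults=defaults)
    print(runs[2].parameters)            # refreshed for runs not yet started
```

The second function corresponds to the behavior the issue title proposes: recreating pending runs whenever a relevant setting changes.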

Enabling scheduling on a flow, before creating a schedule for that flow, does not create future job runs

Description

If you register a flow with Prefect Server, then create a schedule for that flow, and then enable scheduling with the toggle at the top-right, Prefect Server creates future job runs according to the schedule. However, if you first enable scheduling, then subsequently create a schedule, Prefect Server does not create any future job runs.

Expected Behavior

I expect that Prefect Server would create future job runs according to the just-created schedule.

Reproduction

  1. Register a flow with Prefect Server.
  2. On the flow's page, click the "Schedule" toggle at the top-right of the screen to enable the schedule.
  3. Under "Settings" -> "Schedules", create any schedule for the flow (I went with the default "hourly" one).
  4. Go back to the flow's page. No job runs are scheduled for the future.

Environment

I'm experiencing this with 0.14.2 but also noticed the same behaviour in 0.13.x.

$ sudo docker ps
CONTAINER ID        IMAGE                          COMMAND                  CREATED             STATUS              PORTS                            NAMES
456ee1654b22        prefecthq/ui:core-0.14.2       "/docker-entrypoint.…"   5 days ago          Up 5 days           80/tcp, 0.0.0.0:8080->8080/tcp   tmp_ui_1
6e299060f287        prefecthq/server:core-0.14.2   "python src/prefect_…"   5 days ago          Up 5 days                                            tmp_towel_1
dbfa22a77f9f        prefecthq/apollo:core-0.14.2   "docker-entrypoint.s…"   5 days ago          Up 5 days           0.0.0.0:4200->4200/tcp           tmp_apollo_1
0704793c681e        prefecthq/server:core-0.14.2   "bash -c 'prefect-se…"   5 days ago          Up 5 days           0.0.0.0:4201->4201/tcp           tmp_graphql_1
5620b36799e8        hasura/graphql-engine:v1.3.0   "graphql-engine serve"   5 days ago          Up 5 days           0.0.0.0:3000->3000/tcp           tmp_hasura_1
c442f4b1fc4d        postgres:11                    "docker-entrypoint.s…"   5 days ago          Up 5 days           0.0.0.0:5432->5432/tcp           tmp_postgres_1

Foreign key violation error when starting flow with de-registered but active agent

Description

When trying to submit a flow run to an agent that is no longer registered (I removed the agent via the UI, but the agent process was still running), Prefect Server fails with a cryptic exception shown to the user:

[{'message': 'Foreign key violation.', 'locations': [{'line': 2, 'column': 5}], 'path': ['set_flow_run_states'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Foreign key violation.'}}}]

Expected Behavior

Do not schedule flow runs onto de-registered agents.

Reproduction

  1. Create a minimal flow
  2. Spin up prefect server setup
  3. Start any agent (I was using a Docker agent)
  4. Wait until agent shows up, and then remove it from UI
  5. Start a flow
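The failure mode can be reproduced in miniature: updating flow_run.agent_id to an id that no longer exists in the agent table violates the foreign key. The sketch below uses an in-memory SQLite database instead of Postgres and a two-column schema, both simplifications. The function name `update_flow_run_agent` mirrors the one in the traceback below but is not the real implementation, and the checked variant is a hypothetical way to surface a clearer error.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a tiny agent/flow_run schema with an enforced foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.execute("CREATE TABLE agent (id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE flow_run ("
        " id TEXT PRIMARY KEY,"
        " agent_id TEXT REFERENCES agent(id))"
    )
    return conn


def update_flow_run_agent(conn: sqlite3.Connection, flow_run_id: str, agent_id: str) -> None:
    """Unguarded update: fails with IntegrityError if the agent row is gone."""
    conn.execute(
        "UPDATE flow_run SET agent_id = ? WHERE id = ?", (agent_id, flow_run_id)
    )


def update_flow_run_agent_checked(conn: sqlite3.Connection, flow_run_id: str, agent_id: str) -> None:
    """Guarded update: report a descriptive error for unknown agents."""
    row = conn.execute("SELECT 1 FROM agent WHERE id = ?", (agent_id,)).fetchone()
    if row is None:
        raise LookupError(
            f"agent {agent_id} is not registered; flow run {flow_run_id} was not assigned"
        )
    update_flow_run_agent(conn, flow_run_id, agent_id)


if __name__ == "__main__":
    conn = make_db()
    conn.execute("INSERT INTO agent VALUES ('agent-1')")
    conn.execute("INSERT INTO flow_run (id) VALUES ('run-1')")
    conn.execute("DELETE FROM agent WHERE id = 'agent-1'")  # "removed via the UI"
    try:
        update_flow_run_agent(conn, "run-1", "agent-1")
    except sqlite3.IntegrityError as exc:
        print("unguarded:", exc)
    try:
        update_flow_run_agent_checked(conn, "run-1", "agent-1")
    except LookupError as exc:
        print("guarded:", exc)
```

A check-then-update like this is still racy without a transaction; it is meant only to show where a clearer message could come from.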

Environment

Some docker-compose logs

hs-databricks-agent_1  | [2020-10-23 08:21:00,939] DEBUG - hs-databricks agent | Found flow runs ['b13ab3e8-b781-429c-a65b-96a551683dda']
hs-databricks-agent_1  | [2020-10-23 08:21:00,940] DEBUG - hs-databricks agent | Querying flow run metadata
hs-databricks-agent_1  | [2020-10-23 08:21:00,958] INFO - hs-databricks agent | Found 1 flow run(s) to submit for execution.
hs-databricks-agent_1  | [2020-10-23 08:21:00,959] DEBUG - hs-databricks agent | Next query for flow runs in 0.25 seconds
hs-databricks-agent_1  | [2020-10-23 08:21:00,959] DEBUG - hs-databricks agent | Updating states for flow run b13ab3e8-b781-429c-a65b-96a551683dda
hs-databricks-agent_1  | [2020-10-23 08:21:00,961] DEBUG - hs-databricks agent | Flow run b13ab3e8-b781-429c-a65b-96a551683dda is in a Scheduled state, updating to Submitted
postgres_1             | 2020-10-23 08:21:00.998 UTC [4573] ERROR:  insert or update on table "flow_run" violates foreign key constraint "flow_run_agent_id_fkey"
postgres_1             | 2020-10-23 08:21:00.998 UTC [4573] DETAIL:  Key (agent_id)=(c42a93cb-8e89-4d16-b6e3-01e8c08e0c01) is not present in table "agent".
postgres_1             | 2020-10-23 08:21:00.998 UTC [4573] STATEMENT:  WITH "flow_run__mutation_result_alias" AS (UPDATE "public"."flow_run" SET "agent_id" = ($1)::uuid  WHERE (('true') AND ((((("public"."flow_run"."id") = (($2)::uuid)) AND ('true')) AND ('true')) AND ('true'))) RETURNING * , CASE WHEN 'true' THEN NULL ELSE "hdb_catalog"."check_violation"('update check constraint failed')  END ), "flow_run__all_columns_alias" AS (SELECT  "id" , "tenant_id" , "created" , "flow_id" , "parameters" , "scheduled_start_time" , "auto_scheduled" , "heartbeat" , "start_time" , "end_time" , "version" , "state" , "state_timestamp" , "state_message" , "state_result" , "state_start_time" , "serialized_state" , "name" , "context" , "times_resurrected" , "updated" , "idempotency_key" , "agent_id" , "labels"  FROM "flow_run__mutation_result_alias"      ) SELECT  json_build_object('affected_rows', (SELECT  COUNT(*)  FROM "flow_run__all_columns_alias"      ) )        
hasura_1               | {"type":"http-log","timestamp":"2020-10-23T08:21:00.929+0000","level":"error","detail":{"operation":{"user_vars":{"x-hasura-role":"admin"},"error":{"path":"$","error":"Foreign key violation. insert or update on table \"flow_run\" violates foreign key constraint \"flow_run_agent_id_fkey\"","code":"constraint-violation"},"request_id":"16fac4d9-b87e-41a2-ae98-f48b0e6c615c","response_size":173,"query":{"variables":{"update_set":{"agent_id":"c42a93cb-8e89-4d16-b6e3-01e8c08e0c01"},"update_where":{"id":{"_eq":"b13ab3e8-b781-429c-a65b-96a551683dda"}}},"query":"mutation($update_where: flow_run_bool_exp!, $update_set: flow_run_set_input) {\n    update: update_flow_run(where: $update_where, _set: $update_set) {\n        affected_rows\n    }\n}"}},"http_info":{"status":400,"http_version":"HTTP/1.1","url":"/v1alpha1/graphql","ip":"172.18.0.4","method":"POST","content_encoding":null}}}
graphql_1              | Foreign key violation.
graphql_1              | 
graphql_1              | GraphQL request:2:3
graphql_1              | 1 | mutation ($input: set_flow_run_states_input!) {
graphql_1              | 2 |   set_flow_run_states(input: $input) {
graphql_1              |   |   ^
graphql_1              | 3 |     states {
graphql_1              | Traceback (most recent call last):
graphql_1              |   File "/prefect-server/src/prefect_server/database/hasura.py", line 85, in execute
graphql_1              |     as_box=as_box,
graphql_1              |   File "/prefect-server/src/prefect_server/utilities/graphql.py", line 80, in execute
graphql_1              |     raise ValueError(result["errors"])
graphql_1              | ValueError: [{'extensions': {'path': '$', 'code': 'constraint-violation'}, 'message': 'Foreign key violation. insert or update on table "flow_run" violates foreign key constraint "flow_run_agent_id_fkey"'}]
graphql_1              | 
graphql_1              | During handling of the above exception, another exception occurred:
graphql_1              | 
graphql_1              | Traceback (most recent call last):
graphql_1              |   File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 628, in await_result
graphql_1              |     return await result
graphql_1              |   File "/prefect-server/src/prefect_server/graphql/extensions.py", line 52, in resolve
graphql_1              |     result = await result
graphql_1              |   File "/prefect-server/src/prefect_server/graphql/states.py", line 45, in resolve_set_flow_run_states
graphql_1              |     *[check_size_and_set_state(state_input) for state_input in input["states"]]
graphql_1              |   File "/prefect-server/src/prefect_server/graphql/states.py", line 39, in check_size_and_set_state
graphql_1              |     agent_id=agent_id,
graphql_1              |   File "/prefect-server/src/prefect_server/api/states.py", line 133, in set_flow_run_state
graphql_1              |     await api.runs.update_flow_run_agent(flow_run_id=flow_run_id, agent_id=agent_id)
graphql_1              |   File "/prefect-server/src/prefect_server/api/runs.py", line 410, in update_flow_run_agent
graphql_1              |     set={"agent_id": agent_id}
graphql_1              |   File "/prefect-server/src/prefect_server/database/orm.py", line 406, in update
graphql_1              |     run_mutation=run_mutation,
graphql_1              |   File "/prefect-server/src/prefect_server/database/hasura.py", line 391, in update
graphql_1              |     result = await self.execute_mutations_in_transaction(mutations=[graphql])
graphql_1              |   File "/prefect-server/src/prefect_server/database/hasura.py", line 165, in execute_mutations_in_transaction
graphql_1              |     as_box=as_box,
graphql_1              |   File "/prefect-server/src/prefect_server/database/hasura.py", line 91, in execute
graphql_1              |     raise ValueError("Foreign key violation.")
graphql_1              | ValueError: Foreign key violation.
graphql_1              | 
graphql_1              | The above exception was the direct cause of the following exception:
graphql_1              | 
graphql_1              | Traceback (most recent call last):
graphql_1              |   File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 674, in await_completed
graphql_1              |     return await completed
graphql_1              |   File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 659, in await_result
graphql_1              |     return_type, field_nodes, info, path, await result
graphql_1              |   File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 733, in complete_value
graphql_1              |     raise result
graphql_1              |   File "/usr/local/lib/python3.7/site-packages/graphql/execution/execute.py", line 628, in await_result
graphql_1              |     return await result
graphql_1              |   File "/prefect-server/src/prefect_server/graphql/extensions.py", line 52, in resolve
graphql_1              |     result = await result
graphql_1              |   File "/prefect-server/src/prefect_server/graphql/states.py", line 45, in resolve_set_flow_run_states
graphql_1              |     *[check_size_and_set_state(state_input) for state_input in input["states"]]
graphql_1              |   File "/prefect-server/src/prefect_server/graphql/states.py", line 39, in check_size_and_set_state
graphql_1              |     agent_id=agent_id,
graphql_1              |   File "/prefect-server/src/prefect_server/api/states.py", line 133, in set_flow_run_state
graphql_1              |     await api.runs.update_flow_run_agent(flow_run_id=flow_run_id, agent_id=agent_id)
graphql_1              |   File "/prefect-server/src/prefect_server/api/runs.py", line 410, in update_flow_run_agent
graphql_1              |     set={"agent_id": agent_id}
graphql_1              |   File "/prefect-server/src/prefect_server/database/orm.py", line 406, in update
graphql_1              |     run_mutation=run_mutation,
graphql_1              |   File "/prefect-server/src/prefect_server/database/hasura.py", line 391, in update
graphql_1              |     result = await self.execute_mutations_in_transaction(mutations=[graphql])
graphql_1              |   File "/prefect-server/src/prefect_server/database/hasura.py", line 165, in execute_mutations_in_transaction
graphql_1              |     as_box=as_box,
graphql_1              |   File "/prefect-server/src/prefect_server/database/hasura.py", line 91, in execute
graphql_1              |     raise ValueError("Foreign key violation.")
graphql_1              | graphql.error.graphql_error.GraphQLError: Foreign key violation.
graphql_1              | 
graphql_1              | GraphQL request:2:3
graphql_1              | 1 | mutation ($input: set_flow_run_states_input!) {
graphql_1              | 2 |   set_flow_run_states(input: $input) {
graphql_1              |   |   ^
graphql_1              | 3 |     states {
hs-databricks-agent_1  | [2020-10-23 08:21:01,013] ERROR - hs-databricks agent | Logging platform error for flow run b13ab3e8-b781-429c-a65b-96a551683dda
graphql_1              | INFO:     172.18.0.7:39012 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1              | INFO:     172.18.0.7:39016 - "POST /graphql/ HTTP/1.1" 200 OK
graphql_1              | INFO:     172.18.0.7:39020 - "POST /graphql/ HTTP/1.1" 200 OK
hs-databricks-agent_1  | [2020-10-23 08:21:01,066] ERROR - hs-databricks agent | Error while deploying flow: ClientError([{'message': 'Foreign key violation.', 'locations': [{'line': 2, 'column': 5}], 'path': ['set_flow_run_states'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'Foreign key violation.'}}}])
hs-databricks-agent_1  | [2020-10-23 08:21:01,067] DEBUG - hs-databricks agent | Completed flow run submission (id: b13ab3e8-b781-429c-a65b-96a551683dda)
  • postgres:11
  • hasura/graphql-engine:v1.3.0
  • prefecthq/server:2020-10-08
  • prefecthq/apollo:2020-10-07
  • prefecthq/ui:2020-10-05
  • prefecthq/prefect:0.13.10-python3.7 as agent

Modify role so prefect-server helm chart works with dask-kubernetes.KubeConfig

Current behavior

I have prefect-server set up via minikube and the official helm charts (prefect-server-2021.03.10) including a Kubernetes Agent set up via helm chart. I'm using Docker Storage and a DaskExecutor with dask-kubernetes.KubeConfig to run a simple flow. I'm able to register and run the flow, but was getting kubernetes permission errors causing the flow to fail. I had to add the following permission to the prefect-server-agent-role in order to get the flow to run successfully.

  - verbs:
      - '*'
    apiGroups:
      - ''
    resources:
      - events
      - pods
      - pods/log
      - services
  - verbs:
      - '*'
    apiGroups:
      - policy
    resources:
      - poddisruptionbudgets

Proposed behavior

Add the above permissions to the role in the helm chart. Admittedly, I am new to prefect and kubernetes so I'm not sure if there is a better way or not.

Example

Easier use with dask-kubernetes without custom configuration

Add a "debug mode" to the Hasura client

Current behavior

We have limited functionality to log bad GraphQL queries from the hasura client, but no way to watch queries as they happen.

Proposed behavior

A new config (prefect_server.hasura.debug_mode?) which prints all query details whenever a query is issued: Query, variables, headers, etc.

Example

This would be useful when trying to understand how Python code is being translated into GQL
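One possible shape for such a debug mode is a thin wrapper that logs each request before handing it to the transport. The sketch below is purely illustrative: `DebugHasuraClient` and its `transport` callable are hypothetical and do not reflect the existing Hasura client in this repository. Redacting the x-hasura-admin-secret header is an extra precaution added here, not part of the proposal.

```python
import json
import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("hasura_debug_sketch")

# Header names whose values should never appear in logs (assumed list).
SENSITIVE_HEADERS = {"x-hasura-admin-secret", "authorization"}


class DebugHasuraClient:
    """Hypothetical wrapper; `transport` stands in for whatever sends the request."""

    def __init__(
        self,
        transport: Callable[[str, Dict[str, Any], Dict[str, str]], Any],
        debug_mode: bool = False,
        headers: Optional[Dict[str, str]] = None,
    ) -> None:
        self.transport = transport
        self.debug_mode = debug_mode
        self.headers = dict(headers or {})

    def _safe_headers(self) -> Dict[str, str]:
        return {
            k: ("<redacted>" if k.lower() in SENSITIVE_HEADERS else v)
            for k, v in self.headers.items()
        }

    def execute(self, query: str, variables: Optional[Dict[str, Any]] = None) -> Any:
        variables = variables or {}
        if self.debug_mode:
            # Log the full request before it is issued.
            logger.info(
                "Hasura query:\n%s\nvariables: %s\nheaders: %s",
                query,
                json.dumps(variables, default=str),
                json.dumps(self._safe_headers()),
            )
        return self.transport(query, variables, self.headers)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    client = DebugHasuraClient(lambda q, v, h: {"data": {}}, debug_mode=True)
    client.execute("query { flow { id } }", {"limit": 1})
```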

Unable to Run Flow Through Local UI

Description

Any flow that I register and try to run through the UI can get Submitted for Execution but never Executed.

Expected Behavior

In the best case, I would hope the flow to run without an issue. At the minimum, I would hope that I could get a more informative error message of what I am missing. To my eyes, I have followed the documentation and tutorials exactly.

Reproduction

  1. I set up an account with Prefect Cloud so that I could create TENANT and RUNNER tokens.
  2. I added the TENANT token to my local ~/.prefect/config.toml.
  3. I downloaded the newest version of prefect from PyPI: pip3 install prefect
  4. I started the server: prefect server start
  5. I started an agent: prefect agent start -t {{RUNNER_TOKEN}}
  6. I registered the sample flow provided in a walkthrough: python3 hello_flow.py
import prefect
from prefect import task, Flow

@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("Hello, Cloud!")

flow = Flow("hello-flow", tasks=[hello_task])
flow.register()
  7. I went to my local UI on localhost:8080 and clicked "Run flow".
  8. The flow transitions through the states Flow Run Scheduled and Submitted for Execution, then just hangs. The flow is never executed.
  9. Here are the logs.
June 7th 2020,1:06:37pm 	agent	INFO	Submitted for execution: PID: 24731
June 7th 2020,1:06:38pm 	prefect.CloudFlowRunner	INFO	Beginning Flow run for 'hello-flow'
June 7th 2020,1:06:38pm 	prefect.CloudFlowRunner	DEBUG	Failed to retrieve flow state with error: BoxKeyError("'serialized_state'",)
  10. There is nothing on Google or in the documentation about that error message.
  11. Note: I also tried setting up a local Dask cluster and specifying that as my DaskExecutor address in config.toml, but that didn't work either.

Environment

{
  "config_overrides": {
    "backend": true,
    "cloud": {
      "auth_token": true
    },
    "engine": {
      "executor": {
        "dask": {
          "address": true,
          "cluster_class": true
        },
        "default_class": true
      },
      "flow_runner": {
        "default_class": true
      },
      "task_runner": {
        "default_class": true
      }
    },
    "server": {
      "telemetry": {
        "enabled": true
      }
    }
  },
  "env_vars": [],
  "system_information": {
    "platform": "Linux-4.15.0-101-generic-x86_64-with-Ubuntu-18.04-bionic",
    "prefect_version": "0.11.5",
    "python_version": "3.6.9"
  }
}

Docker container having issue communicating to local server API

Description

Attempting to use a Flow within Docker from

https://docs.prefect.io/orchestration/tutorial/docker.html#persisting-your-flow-with-docker-storage

results in Flow going into Submitted state and appearing to stay there.

Expected Behavior

Flow to complete run normally

Reproduction

prefect backend server
prefect server start
prefect agent start docker --show-flow-logs

execute the hello-docker workflow (example given in tutorial - linked above)
Outputs:
https://gist.github.com/philipmac/3780609c3a69ca77881e9a9eef7fefdc

Use localhost:8080 to run workflow.

Examine outputs:
https://gist.github.com/philipmac/895d14feceda868e63be34f5953dd325

Environment

$ prefect diagnostics
{
  "config_overrides": {
    "PREFECT__CONTEXT__SECRETS__AWS_CREDENTIALS": true,
    "PREFECT__FLOWS__CHECKPOINTING": true
  },
  "env_vars": [],
  "system_information": {
    "platform": "Linux-5.4.0-40-generic-x86_64-with-glibc2.29",
    "prefect_version": "0.11.5",
    "python_version": "3.8.2"
  }
}

Explore removing the use of a version_group_id

Current behavior

Currently, when creating a flow, you can provide a version_group_id that is used to version the flow (it defaults to a slugified flow name and project name). Seeing as a version group isn't necessarily a thing, this deserves better naming and documentation.

Proposed behavior

Rename version_group_id to something like flow_group_name. However, we should still maintain version_group_id at the API level for backwards compatibility, while removing references to it from the documentation.

cc @cicdw @znicholasbrown from discussion

config.toml file in the server does not reflect to UI

Description

Hello!
Please look at the attached screenshot. I am deploying Prefect Server on an EC2 instance. I changed the address for server.ui.graphql_url in config.toml to reflect this. However, the change isn't reflected in the UI, which keeps referring to localhost.

image

Result

The localhost is changed to the IP address

Deployment steps I followed

  1. Used the following config.toml file:
[server]
host = "http://34.209.184.154"
port = "4200"
host_port = "4200"
endpoint = "${server.host}:${server.port}"

        [server.graphql]
        host = "34.209.184.154"

        [server.ui]
        host = "http://34.209.184.154"
        port = "8080"
        host_port = "8080"
        apollo_url = "http://34.209.184.154:4200/graphql"
  2. docker system prune -a ( optional )
  3. prefect server start
  4. http://34.209.184.154:8080/

Could you please let me know what I'm missing?
Looking forward to hearing from you!
Thanks.

Originally posted by @aj-super in PrefectHQ/prefect#3580

Resolve security vulnerabilities in Docker containers of Prefect server

Hello,
We are evaluating Prefect Core to replace our workflow management. As part of our security protocol, all container images are scanned, including all 5 Docker containers of Prefect Core (server, Apollo, UI, Hasura/GraphQL and Postgres:11). All of these containers seem to have vulnerabilities, most of them medium to low.
However, the critical and high ones need to be resolved before we can use the images in production. Based on our observation, the affected packages are system libraries (such as the Linux kernel, shadow and glibc, which Prefect doesn't use directly). Is it possible to update the base image to, say, Ubuntu 20.04? In our tests comparing Debian and Ubuntu, the latter had only about 16 vulnerabilities (1 medium, the rest low).

Screen shots -

Screen Shot 2021-02-26 at 17 01 29
Screen Shot 2021-02-26 at 17 01 44
Screen Shot 2021-02-26 at 16 15 05

PS - Scans have been made using GCP’s vulnerability scan service in GCR

Any other recommended approach we could follow to overcome this issue would be greatly appreciated. 🙂

Slack reference

Originally posted by @niakki in PrefectHQ/prefect#4180

Running Prefect Server without Docker ?

Hi Team,

My company is looking at automation tools and, as part of that, we're reviewing Prefect. One restriction we have is that we can't use Docker in our systems, and as you know, Prefect Server depends on it.

Hoping you can help clarify if it would be possible to run without Docker and if so how?

Many Thanks,
Kaashif Khawaja

Create a set_flow_schedule graphql mutation

Current behavior

Currently, there is no GraphQL mutation for creating a schedule for a given flow at run time (after the flow has been created). Instead, we need to either

  1. use the graphql update_flow end point
  2. use the graphql set_flow_group_schedule

Proposed behavior

Introduce a set_flow_schedule mutation in the GraphQL endpoint.

Example

This enhancement would be very useful because it would let us save all schedules in the flow information, which is more convenient.

Don't rely on cascading deletes when deleting flow runs

Current behavior

Right now, there are many instances where we delete flow runs; the most important for the purposes of this issue is when toggling a schedule (we delete all auto-scheduled runs in the future), but other situations are archiving, manual deletion, etc.

As users deploy larger and larger Flows, there are situations in which the delete cascades actually time out (in most circumstances, user-facing calls should have low timeout thresholds). When this occurs, important behavior is not executed, which can lead to confusion and bugs.

Proposed behavior

We should introduce a delete flow run API function that performs the logic of a cascading delete manually (it should accept a list of flow run IDs); first we delete task run states, then task runs, then the flow run. Additionally we should audit run deletes and replace them with ID lookups that then are passed to this new API function.

Example

Reduce timeout scenarios and make deletion more robust.
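
The ordering described above (task run states, then task runs, then flow runs) can be sketched as follows. This is a minimal illustration that uses an in-memory SQLite database as a stand-in for the real Postgres/Hasura stack. The table names, column names, and the `delete_flow_runs` and `make_demo_db` helpers are assumptions made for the example, not the server's actual schema or API.

```python
import sqlite3


def delete_flow_runs(conn, flow_run_ids):
    """Delete flow runs and their dependents explicitly, children first.

    Returns the number of flow run rows removed.
    """
    ids = list(flow_run_ids)
    if not ids:
        return 0
    marks = ",".join("?" for _ in ids)
    with conn:  # one transaction: commit on success, roll back on error
        # 1. task run states that belong to task runs of these flow runs
        conn.execute(
            "DELETE FROM task_run_state WHERE task_run_id IN "
            f"(SELECT id FROM task_run WHERE flow_run_id IN ({marks}))",
            ids,
        )
        # 2. the task runs themselves
        conn.execute(f"DELETE FROM task_run WHERE flow_run_id IN ({marks})", ids)
        # 3. finally the flow runs
        cur = conn.execute(f"DELETE FROM flow_run WHERE id IN ({marks})", ids)
    return cur.rowcount


def make_demo_db():
    """Build a tiny hypothetical schema with two flow runs for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE flow_run (id TEXT PRIMARY KEY);
        CREATE TABLE task_run (id TEXT PRIMARY KEY, flow_run_id TEXT);
        CREATE TABLE task_run_state (id INTEGER PRIMARY KEY, task_run_id TEXT);
        INSERT INTO flow_run VALUES ('fr1'), ('fr2');
        INSERT INTO task_run VALUES ('tr1', 'fr1'), ('tr2', 'fr2');
        INSERT INTO task_run_state (task_run_id) VALUES ('tr1'), ('tr1'), ('tr2');
        """
    )
    return conn
```

For very large flows, a real implementation would probably also split the ID list into batches so that no single statement runs long enough to hit the timeout.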

Add Helm chart instructions for deployment per cloud provider

Current behavior

Right now, the chart will run provided a K8s cluster is already stood up in an environment. However, instructions for setting up an external Postgres database (our recommended method) are not provided. Additionally, there are no instructions or recommendations for setting up the cluster itself.

Proposed behavior

  • Add external database integration instructions for AWS, GCloud, Azure
  • Add cluster setup recommendations
  • Potentially add cluster IAC examples

Add submitted state locking to Prefect Server

If a user has multiple agents with the same set of labels, multiple agents can pick up and start the same flow run. When running with Prefect Cloud, this will be caught via submitted state locking, so only one flow run can progress (and the others will abort). This feature was never ported over to Prefect Server, so currently users can end up with multiple executions of the same flow run.
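
One common way to get this kind of locking is a compare-and-set on a version counter: a transition to Submitted succeeds only if the run is still in the state and version the caller last saw. The sketch below shows the idea with an in-memory store. It is not how Prefect Cloud implements the feature, and `FlowRunStore` and `try_submit` are hypothetical names.

```python
import threading


class FlowRunStore:
    """In-memory stand-in for the flow run table (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._runs = {}  # flow_run_id -> (state, version)

    def add(self, flow_run_id):
        """Register a new flow run in the Scheduled state at version 0."""
        self._runs[flow_run_id] = ("Scheduled", 0)

    def try_submit(self, flow_run_id, expected_version):
        """Move Scheduled -> Submitted only if the version is unchanged.

        Returns True for the single caller that wins, False for everyone else.
        """
        with self._lock:
            state, version = self._runs[flow_run_id]
            if state != "Scheduled" or version != expected_version:
                return False
            self._runs[flow_run_id] = ("Submitted", version + 1)
            return True

    def get(self, flow_run_id):
        """Return the (state, version) pair for a flow run."""
        return self._runs[flow_run_id]
```

Two agents that both read version 0 and call `try_submit` would see one `True` and one `False`; the losing agent would then abort instead of starting a second execution.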

Plugins/contrib harness for server

Use Case

We want to be able to support a contrib structure for server, INCLUDING changes that may require database migrations/hasura metadata/API additions, but that are OPT-IN and separate from the mainline development of Core. This way we can allow for experimental contrib features to exist alongside Core.

With the current stack we perceive the following overarching problems that need to be solved:

  • keeping our alembic migration chain as clean as possible
  • being able to compose the hasura metadata, because it is not an additive experience like alembic migrations
  • deciding when to step in to compose/resolve alembic migrations/hasura metadata (ex. prefect-server dev migrate)

Solution

Right now this is a collaborative effort between @lauralorenz , @alexisprince1994 and @zdhughes though other thoughts are welcome!

Current idea:

  1. Contrib changes that touch the database must be denormalized away from the Core tables and additive only
  2. Contrib changes will have their own alembic branch (they may use depends_on to 'reach over' to the other branch, see alembic docs on branch dependencies). This way we can use alembic's existing infrastructure to support migration lineages outside of the mainline Core development lineage
  3. Contrib devs should be able to use Hasura to implement hasura queries, which should be versioned away from the mainline metadata
  4. We must provide some homegrown way for the hasura version to be merged with the official Core metadata because there isn't a first class Hasura way to do so
  5. There should be some explanation of how a 'promotion' would take place, which probably includes a merge migration between the plugin's branch and Core branch on alembic (which is supported by alembic), and committing a merged metadata into the mainline Hasura metadata file.

Experiments still to do:

  • define where in the config plugins will be listed/turned on for later hasura metadata merge detection
  • alembic branch with Alex's PR and/or a hello world type plugin
    • make a fake dependent migration on top just to see how that works
  • document CLI to produce the versioned metadata.yml-{pluginname} for above migrations ^
    • make a thing that makes the diff of this to the Core metadata for committing to the plugin
  • update our hasura steps in the cli to recognize plugins and concatenate the additive files and puts it in a custom metadata file (maybe ~/.prefect_server/metadata.yml)
    • update the part in the server CLI that reads from ~/.prefect_server/metadata.yml or wherever the user configured it INSTEAD OF FROM THE PACKAGE

Alternatives

Below are a few ideas we've dismissed while discussing this:

Idea 1: Runtime linearization of migrations
whenever you have a plugin, leave its 'revises' field blank
- run all your core migrations
- detect the latest core migration at runtime
- fill in the blank there for the revises field for your plugin
what about when people upgrade server and server has a new migration after they have installed plugins? The 'revises' field for server is wrong now
- Can we do alembic merge on the fly for them?
- Is this too artisanal in terms of people having custom head migration numbers?

Idea 2: Don't let contrib devs use Hasura
- Hasura really only gives you 'free' GraphQL Queries, we could potentially force plugin authors to write their own in the .graphql files instead so we don't have to deal with resolving hasura metadata

Idea 3: Two databases that can speak to each other very well
- safe core database that is linear in terms of alembic and hasura metadata
- plugins sandbox that is linear in terms of alembic -- how/should we merge hasura?

Add healthcheck for apollo server

Current behavior

At the moment, the apollo server has no endpoint that indicates it is alive. This creates problems in automated deployment of Kubernetes agents. Specifically, if the agent starts before apollo is ready, it will fail to register itself on startup and never recover unless restarted.

Proposed behavior

Add an endpoint indicating that apollo is ready to receive requests.

Example

It will allow using a reliable probe for liveness and readiness in the context of a Kubernetes deployment.
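
On the client side, such an endpoint could be used to delay startup until apollo answers. The following is a minimal sketch of that polling loop, assuming a hypothetical health URL. `http_probe` and `wait_until_ready` are illustrative helpers, not part of the Prefect agent.

```python
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=2.0):
    """Return True if a GET request to url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx response
        return False


def wait_until_ready(probe, attempts=10, delay=1.0, sleep=time.sleep):
    """Call probe() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

An agent could call something like `wait_until_ready(lambda: http_probe(health_url))` before registering, where `health_url` points at whatever path the new endpoint ends up using.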

Flow Run Page -> Task Run Name Search Fails with `task_run_name` argument

Description

From the Task Run table on the Flow Run page, it's possible to search for the name of a particular task run and get a list. However, the search fails to return results. The significant difference that I can see between this Flow Run and other Flow Runs is that I use the task_run_name feature to define my Task Run names using input arguments. No searches work for the Flow Runs in question on this page.

There's no error in chrome devtools that I can see.

I didn't open this ticket on the UI because I looked at the UI code and it seems to be doing everything correctly. Searching for dynamically generated names works at the flow level and I wonder if it's something about the nature of this query specifically.

Expected Behavior

Return a subset of the Task Runs with the search results.

Reproduction

https://cloud.prefect.io/prefect/flow-run/7fd9ed02-f19a-4aeb-a32a-ed7fa4b99908
Screen Shot 2021-02-24 at 12 09 35 PM
Screen Shot 2021-02-24 at 12 09 18 PM

Environment

Helm: Document support for NodePort access to services

Is there a way to run using a NodePort configuration instead of ClusterIP or LoadBalancer, so that we can have internal access to the Prefect UI without having to port forward or make use of expensive AWS load balancers?

Originally posted by @tonycpsu in #123 (comment)

We should document this pattern and perhaps change the default.

CPU is busy even when idle

Description

Running 0.14.6 for a few weeks with no issues, I decided to upgrade to 0.14.12. All looks good except CPU is busy even when there is no flow running. This was not the case with 0.14.6. Here is the output from docker stats --format "table {{.Name}}\t{{.CPUPerc}}"

NAME                CPU %
tmp_ui_1            0.00%
tmp_towel_1         0.00%
tmp_apollo_1        13.08%
tmp_graphql_1       3.56%  
tmp_hasura_1        11.23%
tmp_postgres_1      7.40%

And here is the docker version:

docker version
Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Fri Dec 18 12:21:44 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Thu Dec 10 13:23:49 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.4
  GitCommit:        
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:        
 docker-init:
  Version:          0.18.0
  GitCommit:        

And docker-compose version:

docker-compose version 1.28.5, build c4eb3a1f

Then I upgraded my docker to latest version:

Client: Docker Engine - Community
 Version:           20.10.5
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        55c4c88
 Built:             Tue Mar  2 20:18:05 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.5
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       363e9a8
  Built:            Tue Mar  2 20:16:00 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.4
  GitCommit:        05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc:
  Version:          1.0.0-rc93
  GitCommit:        12644e614e25b05da6fd08a38ffa0cfe1903fdec
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

But behavior did not change.

I could not downgrade to previous working version (0.14.6) due to DB migration not allowing downgrade. So I tried on a different machine (Mac laptop) with same docker version (20.10.5).
I upgraded my local Prefect from 0.14.6 to 0.14.12, and high CPU could be observed:

NAME           CPU %
t_ui_1         0.01%
t_towel_1      0.01%
t_apollo_1     3.96%
t_graphql_1    4.62%
t_hasura_1     3.36%
t_postgres_1   7.71%

And then downgraded to 0.14.6 and CPU looks normal:

NAME           CPU %
t_ui_1         0.02%
t_apollo_1     0.00%
t_towel_1      0.00%
t_graphql_1    0.48%
t_hasura_1     0.80%
t_postgres_1   0.53%

Expected Behavior

When Prefect is idle (no flows are running), CPU should not be busy, as shown above.

Reproduction

1- Install Prefect 0.14.6 (and maybe other versions?)
2- Use docker stats to observe CPU usage
3- Install Prefect 0.14.12
4- Use docker stats to observe CPU usage

Environment

Problem observed on:

Ubuntu:

Linux ip-172-31-49-216 5.4.0-1039-aws #41~18.04.1-Ubuntu SMP Fri Feb 26 11:20:14 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Python 3.7.6

And

macos:
Darwin laptop 20.3.0 Darwin Kernel Version 20.3.0: Thu Jan 21 00:07:06 PST 2021; root:xnu-7195.81.3~1/RELEASE_X86_64 x86_64

Python 3.8.5

Explore flow concurrency limits per agent

Some users are encountering issues where an Agent will deploy too many flows at once (often after some downtime) resulting in infrastructure collapse. Cloud flow concurrency limits would fix this by limiting concurrent flows by label (which the agent would have as well), but per #90 implementing label-based concurrency limits in Server is complicated and full of race conditions. An alternative solution is to limit the number of flows an agent can spawn. This will prevent agents from spawning too many flows allowing other agents to process them or waiting until slots are available.

Since the agent_id is affixed to FlowRuns each agent can track how many flow runs it has spawned. The agent could just track this using local in-memory state but this would be hard to persist on agent restarts; additionally, this would be hard to generalize across all agent types. When an agent registers with the API, it will receive the same agent_id if the configuration values are the same making this solution robust. One concern is that users frequently start agents without names which can result in them sharing agent_ids despite being distinct agents. We will have to address this first, perhaps by requiring user-provided agent names.

Interestingly, this also will help with some user concerns about "work-balancing" in which they run many agents and desire for their flows to be equally distributed amongst them. Currently, agents collect work first-come-first-serve; with agent-level concurrency limits, a low limit can be set to encourage each agent to allow work to be collected by other agents. This is not a perfect balancing system, but will allow for rudimentary balancing where needed.

We will likely want to store agent concurrency limits in the AgentConfig (which is currently unused except for some beta Cloud features) as a FLOW_CONCURRENCY_LIMIT: int field. It would make sense to be able to define this from the CLI as well with a --flow-limit <int> argument which would override the AgentConfig value or allow you to set a limit without creating a separate config object. This would lead to some difficulty in display (in the UI) though, so perhaps we will not want to include it in the config and just include it in the Agent object.

This will likely require a lift for the Agent display in the UI to accommodate this additional information.

Inspired by @Samreay and @jacksund at #90 (comment)

This issue will track feasibility of implementation of this feature in Server (and consequently, Cloud).
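
To make the idea concrete, here is a minimal sketch of the check an agent might perform before submitting work, assuming it already knows how many of its flow runs are still active. The function names and the `flow_limit` parameter are hypothetical and only mirror the `--flow-limit` idea discussed above.

```python
def available_slots(flow_limit, active_run_count):
    """How many more flow runs this agent may submit; None means unlimited."""
    if flow_limit is None:
        return None
    return max(flow_limit - active_run_count, 0)


def select_runs_to_submit(ready_run_ids, flow_limit, active_run_count):
    """Pick the flow runs to submit now, leaving the rest for later or for other agents."""
    ready = list(ready_run_ids)
    slots = available_slots(flow_limit, active_run_count)
    if slots is None:
        return ready
    return ready[:slots]
```

With a limit of 2 and one active run, only the first ready run is taken; the others stay available for other agents, which is the rudimentary work balancing described above.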

Flow registration with unaltered scheduled status

Current behavior

When registering a Flow, the set_schedule_active argument of the prefect.core.flow.Flow.register method determines whether the flow's new version is automatically enabled or disabled depending on its value (True or False, respectively).

This effectively overrides any current schedule setup.
For instance, if a flow's scheduled status is "active" and the set_schedule_active argument is set to False when a new version of the flow is registered, the flow's scheduled status will be forced to "inactive".

Proposed behavior

There should be an option to leave the scheduled status unchanged, meaning that if the flow is active/inactive, the new registration should leave the new flow as active/inactive as well.

This will help avoid 2 potentially dangerous situations:

  • Accidentally enabling a flow that should remain disabled
  • Accidentally disabling a flow that should remain enabled

A suggestion is to have this behaviour whenever the set_schedule_active argument is given a value of None. This would need to be explicit though, given that the current default value of the set_schedule_active argument is True, and changing that default value to None will break backwards compatibility.

More context can be found in this Slack thread.

Example

Flow Currently Active

These will leave the status as "active":

flow.register()
flow.register(set_schedule_active=True)
flow.register(set_schedule_active=None)

These will set the status to "inactive":

flow.register(set_schedule_active=False)

Flow Currently Inactive

These will leave the status as "inactive":

flow.register(set_schedule_active=False)
flow.register(set_schedule_active=None)

These will set the status to "active":

flow.register()
flow.register(set_schedule_active=True)
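
The table above boils down to a three-valued argument. A minimal sketch of the resolution logic, where `resolve_schedule_active` is a hypothetical helper and not an existing Prefect function:

```python
def resolve_schedule_active(current_active, set_schedule_active):
    """Return the schedule status a newly registered flow version should get.

    None means "leave the status as it is"; True or False force the status.
    """
    if set_schedule_active is None:
        return current_active
    return bool(set_schedule_active)
```

Calling `flow.register()` with no argument corresponds to passing `True` here, because that is the current default.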

UI cannot connect to Prefect Server hosted on Ubuntu VM

Opened from the Prefect Public Slack Community

info973: Hey guys, one question... I experienced this on multiple systems, usually all ubuntu server lts 20.04, installing with pip.
On every instance it will just not work out of the box. It will fail to connect to the graphql endpoint on localhost:4200. Sometimes it worked to set the docker internal ip of the container, sometimes it only worked to set the public ip (very insecure!), but it just never works out of the box - which is very annoying.
Am I missing something?

Steps I usually do
• new ubuntu server
• install docker (official repo) & docker-compose
• apt install python3 python3-dev python3-pip
• pip install prefect
• prefect backend server
• prefect server start
• prefect agent local start
And then I won't get a connection but it will redirect me to the "Welcome to your prefect ui" screen, where I try out one of the abovementioned IPs and - if I'm lucky - it will work, if not it will not work at all on that machine, even when pruning docker & reinstalling everything

curl localhost:4200, 127.0.0.1:4200, dockerinternalip:4200 all work fine

michael054: Hey <@U01QR9QMS4E>, this is concerning. I'll open an issue to track this in our Github repo as I'll need to spin up new machines to test this. One thing I've had success with in the past is using pip install docker-compose

michael054: <@ULVA73B9P> open "Server consistently fails to start on Ubuntu" in server

Original thread can be found here.

Helm: Add support for ingresses

Yeah, I think having an ingress shouldn't be on by default (or required), but supporting an opt-in option which requires some user configuration makes sense to me. This is what JupyterHub does as well, and it seems to work well. https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/advanced.html#ingress, https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/master/jupyterhub/templates/ingress.yaml.

Originally posted by @jcrist in #123 (comment)

Also see an example at https://devspace.sh/component-chart/docs/configuration/ingress

May be blocked by UI changes -- will update this note once I have more information.

Explore whether the Hasura `.get()` and `.exists()` methods can be safely removed

Current behavior

These methods are cruft that, if used, can be easily replaced with "proper" queries. https://github.com/PrefectHQ/server/blob/master/src/prefect_server/database/hasura.py#L170
https://github.com/PrefectHQ/server/blob/master/src/prefect_server/database/hasura.py#L186

In particular, they rely on a by_pk query that we want to avoid.

Note they are referenced in orm.py and possibly elsewhere in the codebase.

Proposed behavior

Example

TikTok notifications

Current behavior

Please describe how the feature works today

Prefect Server / Cloud do not support TikTok notifications

Proposed behavior

Please describe your proposed change to the current behavior

Prefect Server / Cloud support TikTok notifications

Example

Please give an example of how the enhancement would be useful

Uh

Motivation

Please explain what prompted this enhancement

https://www.youtube.com/watch?v=ZAPifDwKaXk

Add mapped info route to GQL

Current behavior

Proposed behavior

Discussed IRL with @znicholasbrown

Given a task id and flow run id:

  • mapped parent task run id
  • minimum child start time
  • maximum child end time
  • count of child task runs by state
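
The aggregate fields in this list can be sketched as a plain Python summary over the child task runs. The dictionary keys and the `summarize_mapped_children` helper are assumptions for illustration; the real route would compute these in the database, and the lookup of the mapped parent task run id is not shown.

```python
from collections import Counter
from datetime import datetime


def summarize_mapped_children(children):
    """Summarize child task runs.

    children: iterable of dicts with 'state', 'start_time' and 'end_time'
    (either time may be None for runs that have not started or finished).
    """
    children = list(children)
    starts = [c["start_time"] for c in children if c["start_time"] is not None]
    ends = [c["end_time"] for c in children if c["end_time"] is not None]
    return {
        "min_start_time": min(starts) if starts else None,
        "max_end_time": max(ends) if ends else None,
        "state_counts": dict(Counter(c["state"] for c in children)),
    }


# Made-up child task runs, for illustration only.
EXAMPLE_CHILDREN = [
    {"state": "Success", "start_time": datetime(2021, 1, 1, 12, 0), "end_time": datetime(2021, 1, 1, 12, 5)},
    {"state": "Success", "start_time": datetime(2021, 1, 1, 12, 1), "end_time": datetime(2021, 1, 1, 12, 9)},
    {"state": "Running", "start_time": datetime(2021, 1, 1, 12, 2), "end_time": None},
]
```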

Example

Bug: ValidationError ({'_schema': 'Invalid data type: None'})

Description

When upgrading to version 0.13.8 (2020-11-29) and using mapped flows the log contains errors

Failed to retrieve task state with error: ValidationError({'_schema': 'Invalid data type: None'})
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/cloud/task_runner.py", line 190, in initialize_run
    task_run_info = self.client.get_task_run_info(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 1338, in get_task_run_info
    state = prefect.engine.state.State.deserialize(task_run_info.serialized_state)
  File "/usr/local/lib/python3.8/site-packages/prefect/engine/state.py", line 361, in deserialize
    state = StateSchema().load(json_blob)
  File "/usr/local/lib/python3.8/site-packages/marshmallow_oneofschema/one_of_schema.py", line 144, in load
    raise exc
marshmallow.exceptions.ValidationError: {'_schema': 'Invalid data type: None'}

And the tasks are pending without completing

Expected Behavior

Flows written for 0.13.17 should keep working with the new version.

Reproduction

Sorry, I wasn't able to reproduce, as a "straightforward" mapping of a task over a list works well.

Environment

prefecthq/prefect:0.13.18-python3.8
prefecthq/apollo:core-0.13.18
prefecthq/server:core-0.13.18
prefecthq/server:core-0.13.18
prefecthq/server:core-0.13.18
prefecthq/ui:core-0.13.18

Towel container throws ERROR", "name": "prefect-server.Lazarus && "DEBUG", "name": "prefect-server.ZombieKiller

Description

I keep getting the following errors in the #prefect-server #towel container log (see below), and I'm not sure whether they are worrisome. Disclaimer: I am able to access my UI, and the container logs for UI, GraphQL, Apollo and Hasura are all stable and fine. I am also able to access the GraphQL playground, the Hasura Console and the healthcheck endpoint properly.

Expected Behavior

Towel should work in prefect-server and not throw Errors in logs

Reproduction

2021-02-01 01:06:15{"severity": "DEBUG", "name": "prefect-server.ZombieKiller", "message": "Sleeping for 120.0 seconds..."}
2021-02-01 01:04:48{"severity": "ERROR", "name": "prefect-server.Lazarus", "message": "Unexpected error: ConnectError(gaierror(-2, 'Name or service not known'))"}
2021-02-01 01:04:48{"severity": "DEBUG", "name": "prefect-server.Lazarus", "message": "Sleeping for 600.0 seconds..."}
2021-02-01 01:04:25{"severity": "ERROR", "name": "prefect-server.Scheduler", "message": "Unexpected error: ConnectError(gaierror(-2, 'Name or service not known'))"}
2021-02-01 01:04:25{"severity": "DEBUG", "name": "prefect-server.Scheduler", "message": "Sleeping for 300.0 seconds..."}
2021-02-01 01:04:15{"severity": "ERROR", "name": "prefect-server.ZombieKiller", "message": "Unexpected error: ConnectError(gaierror(-2, 'Name or service not known'))"}

Environment

This happens when I host #prefect-server in AWS ECS Fargate as a single task definition, with the configuration converted from docker-compose.yml (https://github.com/PrefectHQ/prefect/blob/master/src/prefect/cli/docker-compose.yml) into a 5-container task definition for AWS Fargate.

I am able to host Prefect Server using the following task definition (see the screenshots below). Note that I used the Prefect Core 0.14.5 tags for my images. All configuration was done in the AWS Console. I used managed Postgres in AWS RDS as Hasura's underlying DB.

Screenshot 2021-02-01 at 1 04 06 AM (1)

Screenshot 2021-02-01 at 1 11 46 AM

Include flow Parameters in flow run parameters

Description

If I query for a flow run and want to see its parameters, flow_run.parameters includes flow group default parameters and parameters set at run time but does not include flow parameters. This can be confusing for users who want to see all parameters that affect their flow run.
See UI#397 for a description of how this was affecting users in the UI.

Expected Behavior

UI#401 should resolve this issue in the UI but I think it would be better if we can update server to include flow parameters as part of a flow run's parameters - for example for users who use the API to check their flow run's parameters and want to see all the relevant parameters.

Reproduction

Simple flow with parameters like this:

from prefect import Flow, Parameter, task
import time


@task()
def sleep(x):
    time.sleep(x)

with Flow(name='simple-sleep') as flow:
    c = Parameter('c', default = 7)
    sleep(c)
    

flow.register('project')

Run the flow (create_flow_run) setting no parameters. Then query for that flow run and its parameters. The parameter will not show. If you set the parameter at flow run time or as part of the flow's flow group default parameters (in the UI or using the set_flow_group_default_parameters mutation) it will be shown.
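
The behavior being asked for amounts to layering three sources of parameters. A minimal sketch follows; the precedence shown (flow defaults, then flow group defaults, then run-time values) is an assumption, and `effective_parameters` is a hypothetical helper rather than server code.

```python
def effective_parameters(flow_defaults, flow_group_defaults=None, run_parameters=None):
    """Merge parameter sources so that later sources override earlier ones."""
    merged = dict(flow_defaults)
    merged.update(flow_group_defaults or {})
    merged.update(run_parameters or {})
    return merged
```

For the flow above, `effective_parameters({"c": 7})` yields `{"c": 7}` even when nothing is set at run time, which is the value the issue says is currently missing from `flow_run.parameters`.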

Environment

Flow Concurrency Limits

Current behavior

Currently, Flow Concurrency Limits are a Cloud-only feature. There was a PR against Core's server while it was still a part of the core package, but it was dropped when Server migrated to its own repo and began sharing the core code with Cloud.

Proposed behavior

The implementation of Flow Concurrency Limits should allow labeling of Flows and limiting concurrent executions of flows.

A few notable questions / changes from the previous PR:

  • The database table requires a tenant_id column foreign keyed to the tenant table.
  • The database table should not include the description column.
  • The publicly available mutations should be update_flow_concurrency_limit (which acts as an upsert) and delete_flow_concurrency_limit

Example

See Cloud's documentation of Flow Concurrency Limits :)

Add support for running `explain` for Hasura queries

Current behavior

Hasura has a metadata API (if enabled) that allows running explains of queries. This can be useful for debugging. Getting it to comply with the existing ORM is a little tricky, but doable.

Proposed behavior

Example

Cannot set Prefect Server logging level

Description

Setting Prefect Server logging level via environment variable doesn't seem to work. Log level stays at "INFO".

Neither PREFECT__LOGGING__LEVEL: WARNING nor PREFECT_SERVER__LOGGING__LEVEL: WARNING seemed to have the desired effect.

Expected Behavior

I expect the Prefect Server to follow the Prefect Logging Configuration documentation.

Reproduction

Add PREFECT__LOGGING__LEVEL: WARNING or PREFECT_SERVER__LOGGING__LEVEL: WARNING to the graphql environment and start the prefect server.

Note the "INFO" logging messages, e.g.:

...
graphql_1   | INFO:     192.168.32.4:40048 - "GET /health HTTP/1.1" 200 OK
graphql_1   | INFO:     192.168.32.4:40052 - "GET /health HTTP/1.1" 200 OK
...

Environment

Server docker image: prefecthq/server:core-0.14.12 with ID f92fdce05361

Provide PREFECT_SERVER_VERSION to Apollo image CI build step

Currently when running the Apollo container you will see the output:

apollo_1    | Server ready at http://0.0.0.0:4200 🚀 (version: UNKNOWN)

This is due to the PREFECT_SERVER_VERSION not being provided to the Docker image build:

server/.circleci/config.yml

Lines 211 to 223 in bddc98b

'Build and publish versioned server artifacts':
  jobs:
    - docker/publish:
        docker-password: DOCKER_HUB_PW
        docker-username: DOCKER_HUB_USER
        image: 'prefecthq/apollo'
        path: 'services/apollo'
        tag: $CIRCLE_TAG,latest
        filters:
          branches:
            ignore: /.*/
          tags:
            only: /^[0-9]+\.[0-9]+\.[0-9]+$/

ENV PREFECT_SERVER_VERSION=${VERSION}

Ensure health check available for all services (where it makes sense)

Opened from the Prefect Public Slack Community

verun.karim: hi, is there a quick way to check the health and status of the lazarus process/service? perhaps a REST endpoint we could query or a command line utility we can invoke?

jim: I assume you're asking about a health check for that service when running your own version of Prefect Server (not one for pinging cloud)?

verun.karim: yes <@U011EKN35PT> that is correct!

jim: Sure, I'll open an issue

verun.karim: excellent thanks 👍

jim: <@ULVA73B9P> open "Ensure health check available for all services (where it makes sense)" in server

Original thread can be found here.
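
One possible shape for such a check, sketched with the standard library only: the service records when its loop last completed, and a small HTTP endpoint reports whether that is recent enough. The `/health` path, the port, the `LAST_LOOP_AT` variable, and the 120-second threshold are all assumptions for illustration, not existing Prefect Server behavior.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Hypothetical: the service loop would update this after each successful cycle.
LAST_LOOP_AT: Optional[float] = None


def health_status(last_loop_at: Optional[float], now: float,
                  max_age_seconds: float = 120.0) -> Tuple[int, dict]:
    """Return an HTTP status code and body describing service liveness."""
    if last_loop_at is None:
        return 503, {"status": "starting"}
    age = now - last_loop_at
    if age > max_age_seconds:
        return 503, {"status": "stale", "seconds_since_last_loop": round(age, 1)}
    return 200, {"status": "ok", "seconds_since_last_loop": round(age, 1)}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_status(LAST_LOOP_AT, time.time())
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8080) -> None:
    # Blocks forever; a real service would run this in a background thread.
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Keeping the decision in a plain function (`health_status`) means the same check could back a command line utility as well as a REST endpoint.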

Expose Prefect Core Version on Server/UI

Current behavior

Currently, there's no easy way to get the version of Prefect Core running in my environment.

Proposed behavior

Expose Prefect Core version so that it can be shown in Prefect UI/Server.

Example

Having Prefect Core version under the Prefect Core logo would be nice!

Add routes for renaming flow runs and tasks

Current behavior

Currently, flow runs can be named at creation only or given random names, and task runs are randomly named.

Proposed behavior

A new GQL mutation allows runs to be renamed. By incorporating this into a Prefect task, users could rename runs based on parameters, schedules, inputs, or anything else. Prefect runs on Prefect™️

Example

QUICK RUN cannot take newer Default Parameter updated from UI

Description

The QUICK RUN button uses the default Parameter value given at registration, and it keeps using that value no matter how I later modify it in the UI (settings->Parameters->Default Parameters).

Expected Behavior

"QUICK RUN" should use the parameter value after the user updates it in the UI. With the code below, when I register and run this flow it returns 3 (2+1), but after I change the parameter from 2 to 50 it still returns 3.

Reproduction

import prefect
from prefect import task, Flow, Parameter

logger = prefect.utilities.logging.get_logger()


@task
def print_plus_one(x):
    # Print and log the parameter value plus one.
    print(x + 1)
    logger.warning(x + 1)


with Flow('Parameterized Flow') as flow:
    # The default of 2 is stored with the flow at registration time.
    x = Parameter('x', default=2)
    print_plus_one(x=x)

flow.register(project_name="Test Project")
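
The precedence I would expect can be sketched as a simple merge. This is only an illustration of the expected outcome, not Server's actual code; the function name and the three-level ordering (registered defaults, then defaults edited in the UI, then values passed for the run) are assumptions.

```python
from typing import Any, Dict, Optional


def resolve_parameters(
    flow_defaults: Dict[str, Any],
    flow_group_defaults: Optional[Dict[str, Any]] = None,
    run_parameters: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge parameter sources; later sources override earlier ones."""
    resolved = dict(flow_defaults)              # defaults from registration
    resolved.update(flow_group_defaults or {})  # defaults edited in the UI
    resolved.update(run_parameters or {})       # values passed for this run
    return resolved


# Registered default x=2, UI default changed to 50, nothing passed at run time.
params = resolve_parameters({"x": 2}, {"x": 50})
print(params["x"] + 1)  # prints 51
```

With this ordering, QUICK RUN after the UI change would log 51 instead of 3.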

Environment

Prefect version: 0.13.2

Reschedule of schedules with docker agent not correctly working (after cancel it)

Description

If I cancel some scheduled flow runs, I'm not able to reschedule them via the scheduler, even if I recreate the schedule or set another execution time.

Expected Behavior

  • a planned flow run (canceled and scheduled again) should run at the time it was originally planned for
  • if the schedule is deleted and recreated, every previously canceled flow run should be ignored. Anything else is very confusing.

Reproduction

Preparation

  • having a listening docker agent running
  • set a daily schedule for a never-before-scheduled flow.
  • activate the schedule -> you will see the planned flow runs in yellow
  • in the screenshot you can see it's planned for today at 6pm.
    image
  • now click on this yellow bar -> the planned flow run opens
  • click cancel in the top right corner -> the state changes to canceled, is displayed as a grey bar, and moves back in the history (to the point where I canceled) / the next flow run is planned for tomorrow
    image

Now I tried two options:

Try 1: set state back to scheduled

  • set the state of this canceled flow back to scheduled
    • [MISSING BEHAVIOR] the flow runs immediately (I guess because it is also displayed in the past) -> it should run at the time it was originally planned for.
    • you can see a lot of states after the run (btw: the duration was clearly too short):
      image

Expected: A planned flow run should be planned for the same date/time after cancel and reschedule, and should not run immediately.

Try 2: recreate the schedule

  • recreate the schedule (even with different time)
    • create a new schedule (daily, 17:59) (created Thursday morning)
      image
    • cancel the next flow run (Thursday 17:59) (is marked grey in the image below) -> the next will be on Friday evening.
      image
    • delete the schedule for this flow
    • create a new schedule (daily, 17:58, [still created Thursday morning]) -> so the next flow run should again be on Thursday / today (but at 17:58)
    • disable/enable the schedules for this flow (I think this only speeds up the preview of scheduled flow runs)
    • [MISSING BEHAVIOR] you can see the next run is planned for Friday (tomorrow) (so the Thursday run is missing!)
      image

Expected: if I create a new schedule, it should behave like a new schedule (independent of deleted schedules with maybe canceled flows.)
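
To make the expectation in Try 2 concrete, here is a minimal sketch of how I would expect the next run of a freshly created daily schedule to be computed: only from the current time and the schedule's own clock time, with no reference to runs of earlier schedules. The function name is hypothetical and naive datetimes are used for simplicity; this is not the Server scheduler's code.

```python
from datetime import datetime, time, timedelta


def next_daily_run(now: datetime, run_at: time) -> datetime:
    """First occurrence of a daily schedule at or after `now`."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate < now:
        # Today's slot has already passed, so the next run is tomorrow.
        candidate += timedelta(days=1)
    return candidate


# Schedule created in the morning for 17:58 -> next run is the same day.
created = datetime(2021, 2, 11, 9, 0)
print(next_daily_run(created, time(17, 58)))  # prints 2021-02-11 17:58:00
```

Under this model, a canceled 17:59 run from a deleted schedule has no influence on the new 17:58 schedule.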

Environment

Prefect Backend (not the cloud)
with a running docker agent
Prefect 0.14.6
Python 3.8.0
Ubuntu 18.04.5 LTS


"system_information": {
    "platform": "Linux-4.15.0-135-generic-x86_64-with-glibc2.10",
    "prefect_backend": "server",
    "prefect_version": "0.14.6",
    "python_version": "3.8.0"
  }

Prefect not picking up my Postgres database.

I added my database properties to both backend.toml and config.toml files.
When I execute 'prefect config', the config json shows my database credentials.
But when I execute 'prefect server start', prefect pulls docker image for postgres and does not use my external database.

Prefect K8S agent error when deploying a flow run

Description

I get an occasional error on my Kubernetes agent when deploying a flow run. I have noticed that the flow doesn't start or will sometimes start much later.

This error is roughly correlated with the time I started trying to deploy to our existing Dask cluster, and to the time when we upgraded our Kubernetes version.

[2021-01-17 00:00:11,063] ERROR - AGENTNAME | Error while managing existing k8s jobs
Traceback (most recent call last):
  File "/usr/local/.venv/lib/python3.8/site-packages/prefect/agent/kubernetes/agent.py", line 362, in heartbeat
    self.manage_jobs()
  File "/usr/local/.venv/lib/python3.8/site-packages/prefect/agent/kubernetes/agent.py", line 219, in manage_jobs
    event.last_timestamp
TypeError: '<' not supported between instances of 'NoneType' and 'datetime.datetime'

The current workaround is to click the run button again.
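
The error message suggests that at least one event has a last_timestamp of None while others carry a datetime, so comparing them fails. That is a guess from the traceback, not a confirmed diagnosis. Assuming it is right, a sort that tolerates missing timestamps could look like the sketch below; the helper and sentinel names are hypothetical, and the events are stand-in objects rather than real Kubernetes client objects.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Sentinel for events without a timestamp; it sorts before any real timestamp.
# Assumes real timestamps are timezone-aware.
MISSING_TIMESTAMP = datetime.min.replace(tzinfo=timezone.utc)


def sort_events(events):
    """Sort events by last_timestamp, treating None as the earliest time."""
    return sorted(events, key=lambda e: e.last_timestamp or MISSING_TIMESTAMP)


events = [
    SimpleNamespace(name="b", last_timestamp=datetime(2021, 1, 17, tzinfo=timezone.utc)),
    SimpleNamespace(name="a", last_timestamp=None),
]
print([e.name for e in sort_events(events)])  # prints ['a', 'b']
```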

Expected Behavior

The flow should be deployed and begin processing at the earliest opportunity.

Reproduction

I don't have a great deal of insight into what the underlying cause is.

Environment

Prefect server stack deployed to Kubernetes with a modified version of the official Helm chart. I have a custom docker image and an ingress.

Our cluster is running in Azure Kubernetes Service, Kubernetes version 1.17.13
