Comments (10)

brndnmtthws commented on July 18, 2024

It looks like it's not handling the :EXIT message here (which is a bug), but I'm not sure why it's terminating there. Do you know why it is terminating? Is the supervisor shutting down for some reason? Can you share the code for Siteguardian.Scheduler.Registry and Siteguardian.Application.start?

from citrine.

amacgregor commented on July 18, 2024

Application start code:

  def start(_type, _args) do
    children = [
      # Start the Ecto repository
      Siteguardian.Repo,
      # Start the Telemetry supervisor
      SiteguardianWeb.Telemetry,
      # Start the PubSub system
      {Phoenix.PubSub, name: Siteguardian.PubSub},
      # Start the Endpoint (http/https)
      SiteguardianWeb.Endpoint,
      # Start the Citrine Scheduler
      Siteguardian.Scheduler,
      # Start a Command Runner
      Siteguardian.CommandRunner,
      # Start a worker by calling: Siteguardian.Worker.start_link(arg)
      # {Siteguardian.Worker, arg}
    ]

    # See https://hexdocs.pm/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Siteguardian.Supervisor]
    start_val = Supervisor.start_link(children, opts)

    for domain <- Siteguardian.Automation.list_active_domains() do
      Siteguardian.Scheduler.put_job(%Citrine.Job{
        id: "job_#{domain.name}",
        schedule: "* * * * *", # Run every minute
        task: fn -> TaskRunner.start(domain) end
      })
    end

    start_val
  end

I've been able to narrow it down somewhat, and it seems to be related to the following snippet:

    case System.cmd("/bin/sh", ["-c", command]) do
      {output, 0} ->
        output
        |> format_output
        |> output_to_map(%{})
        |> IO.inspect()

        {:ok, domain}
      _ ->
        IO.inspect("Nope")
        {:error, "Error: unable to load certificate"}
    end

That snippet is called as part of the task execution; without it, everything seems to work fine.

brndnmtthws commented on July 18, 2024

Ah okay, I see what's happening. Internally, System.cmd uses a Port.

Here's the relevant note from the System.cmd/3 docs:

Internally, this function uses a Port for interacting with the outside world. However, if you plan to run a long-running program, ports guarantee stdin/stdout devices will be closed but it does not automatically terminate the program. The documentation for the Port module describes this problem and possible solutions under the "Zombie processes" section.

And, as per the Port docs:

On its turn, the port will send the connected process the following messages:

  • {port, {:data, data}} - data sent by the port
  • {port, :closed} - reply to the {pid, :close} message
  • {port, :connected} - reply to the {pid, {:connect, new_pid}} message
  • {:EXIT, port, reason} - exit signals in case the port crashes. If reason is not :normal, this message will only be received if the owner process is trapping exits
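To make the message flow above concrete, here is a minimal sketch of running a shell command directly on a Port and collecting its messages, roughly the kind of work System.cmd does for you. The module name PortDemo is hypothetical, and the :exit_status option (not in the list above) is an extra Port.open option that makes the port also send {port, {:exit_status, status}} when the program exits. Because the port is linked to the process that opens it, a caller that traps exits can additionally see the {:EXIT, port, reason} message from the list once the port closes.

```elixir
defmodule PortDemo do
  # Hypothetical helper: runs `command` through /bin/sh on a Port and
  # collects the {port, {:data, data}} messages until the program exits.
  def run(command) do
    port =
      Port.open({:spawn_executable, "/bin/sh"}, [
        :binary,
        # Ask the port to also send {port, {:exit_status, status}}
        :exit_status,
        args: ["-c", command]
      ])

    collect(port, "")
  end

  defp collect(port, acc) do
    receive do
      # A chunk of the program's stdout
      {^port, {:data, data}} -> collect(port, acc <> data)
      # The program exited; return the output and status like System.cmd does
      {^port, {:exit_status, status}} -> {acc, status}
    end
  end
end
```

For example, PortDemo.run("echo hello") should return {"hello\n", 0}, the same shape as System.cmd("/bin/sh", ["-c", "echo hello"]).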

I'm not 100% sure what the idiomatic Elixir way to handle this is. If I had to guess, I'd say your code should wrap the System.cmd call in a Task so the exit doesn't propagate to the executor. I also think the commit I just added (30f4c57) was probably not necessary.

amacgregor commented on July 18, 2024

Thanks for this. I was also looking into using a DynamicSupervisor to hand the command execution off to a worker.

brndnmtthws commented on July 18, 2024

The easiest way may be Task.Supervisor.
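A minimal sketch of the Task.Supervisor approach, assuming the job function runs inside a process that traps exits (as the executor appears to, judging from the log further down). Task.Supervisor.async_nolink/2 starts the task under the supervisor and only monitors it from the caller, so no link exists and no {:EXIT, pid, :normal} reaches the caller. NoLinkDemo and Siteguardian.TaskSupervisor are hypothetical names.

```elixir
defmodule NoLinkDemo do
  # Hypothetical helper: run `fun` under `sup` (a Task.Supervisor) without
  # linking the task to the caller. The caller only monitors the task, and
  # Task.await/2 removes that monitor (flushing the :DOWN message) on success.
  def run(sup, fun, timeout \\ 5_000) do
    sup
    |> Task.Supervisor.async_nolink(fun)
    |> Task.await(timeout)
  end
end

# In the application you would typically add something like
#   {Task.Supervisor, name: Siteguardian.TaskSupervisor}
# to `children` (hypothetical name) and then call, for example:
#   NoLinkDemo.run(Siteguardian.TaskSupervisor, fn ->
#     System.cmd("/bin/sh", ["-c", command])
#   end)
```

Note that Task.await/2 still makes the caller exit if the task crashes; if the command can fail that way, Task.yield/2 returns {:exit, reason} instead, which the job can handle itself.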

amacgregor commented on July 18, 2024

    Task.async(fn -> execute_command(command) end)
    |> Task.await()
    |> IO.inspect()

This still results in:

[error] GenServer {Siteguardian.Scheduler.Registry, "job_coderoncode.com"} terminating
** (FunctionClauseError) no function clause matching in Citrine.JobExecutor.handle_info/2
    (citrine 0.1.11) lib/citrine/job_executor.ex:101: Citrine.JobExecutor.handle_info({:EXIT, #PID<0.622.0>, :normal}, %{cron_expr: ~e[* * * * * *], job: %Citrine.Job{extended_syntax: false, id: "job_coderoncode.com", schedule: "* * * * *", task: #Function<1.74548289/0 in Siteguardian.Application.start/2>}, registry: Siteguardian.Scheduler.Registry, timer: #Reference<0.589530647.1897136130.163894>})
    (stdlib 3.13.2) gen_server.erl:680: :gen_server.try_dispatch/4
    (stdlib 3.13.2) gen_server.erl:756: :gen_server.handle_msg/6
    (stdlib 3.13.2) proc_lib.erl:226: :proc_lib.init_p_do_apply/3
Last message: {:EXIT, #PID<0.622.0>, :normal}
State: %{cron_expr: ~e[* * * * * *], job: %Citrine.Job{extended_syntax: false, id: "job_coderoncode.com", schedule: "* * * * *", task: #Function<1.74548289/0 in Siteguardian.Application.start/2>}, registry: Siteguardian.Scheduler.Registry, timer: #Reference<0.589530647.1897136130.163894>}

brndnmtthws commented on July 18, 2024

You need something to handle that exit event, or else it will bubble up. I'm actually not sure whether the unhandled event is harmful; does it keep running after that?

In any case, you need to wrap your task with something that can trap the exits and handle them accordingly.
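Why the exit shows up at all: a process that traps exits receives exit signals from linked processes as ordinary {:EXIT, pid, reason} messages, even when the reason is :normal. The sketch below reproduces the shape of the message in the log above with a plain spawn_link standing in for a linked Task.async task; it assumes (from the log, not from citrine's source) that the job function runs in the executor process, which traps exits. TrapExitDemo is a hypothetical name.

```elixir
defmodule TrapExitDemo do
  # Traps exits temporarily, links a short-lived process, and reports the
  # {:EXIT, pid, reason} message it receives when that process finishes.
  def run do
    previous = Process.flag(:trap_exit, true)
    # Stand-in for a linked Task.async task that finishes successfully.
    pid = spawn_link(fn -> :ok end)

    result =
      receive do
        {:EXIT, ^pid, reason} -> {:got_exit, reason}
      after
        1_000 -> :no_exit
      end

    # Restore the caller's previous trap_exit setting.
    Process.flag(:trap_exit, previous)
    result
  end
end
```

TrapExitDemo.run() returns {:got_exit, :normal}: the same kind of message that Citrine.JobExecutor.handle_info/2 had no clause for.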

amacgregor commented on July 18, 2024

Yeah, that's the tricky bit. I'm having a hard time finding the idiomatic way to handle it. That said, I did try adding the handle_info/2 clause from your earlier commit, which kind of works: it handles the exit, but it also terminates the job.

  @impl true
  def handle_info({:EXIT, _pid, :normal}, state) do
    {:stop, :normal, state}
  end

[debug] finished job id=job_siteguardian.dev in 0.468161s
[debug] terminating Citrine.JobExecutor with reason: :normal and state=%{cron_expr: ~e[* * * * * *], job: %Citrine.Job{extended_syntax: false, id: "job_siteguardian.dev", schedule: "* * * * *", task: #Function<1.47134453/0 in Siteguardian.Application.start/2>}, registry: Siteguardian.Scheduler.Registry, timer: #Reference<0.3858931454.2454716417.247372>}

In theory I could use my fork and match on {:EXIT, _pid, :normal} to reschedule the job, but that feels kind of wrong.
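The difference comes from the return value: {:stop, :normal, state} tells GenServer to terminate the process, whereas {:noreply, state} just consumes the message and keeps it running. A standalone sketch (TolerantServer and its functions are hypothetical, not citrine code) of a trapping GenServer that ignores normal exits from linked processes but still stops on abnormal ones:

```elixir
defmodule TolerantServer do
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  # Spawns a linked process that exits normally right away.
  def spawn_linked_worker(server), do: GenServer.call(server, :spawn_worker)

  # How many normal exits this server has absorbed so far.
  def normal_exits(server), do: GenServer.call(server, :normal_exits)

  @impl true
  def init(:ok) do
    Process.flag(:trap_exit, true)
    {:ok, %{normal_exits: 0}}
  end

  @impl true
  def handle_call(:spawn_worker, _from, state) do
    pid = spawn_link(fn -> :ok end)
    {:reply, pid, state}
  end

  def handle_call(:normal_exits, _from, state), do: {:reply, state.normal_exits, state}

  # A linked process finished normally: record it and keep running,
  # instead of {:stop, :normal, state}, which would end this server.
  @impl true
  def handle_info({:EXIT, _pid, :normal}, state) do
    {:noreply, %{state | normal_exits: state.normal_exits + 1}}
  end

  # Any other exit reason is still treated as fatal.
  def handle_info({:EXIT, _pid, reason}, state), do: {:stop, reason, state}

  # Ignore unrelated messages.
  def handle_info(_msg, state), do: {:noreply, state}
end
```

Whether the executor itself should behave like this is a design decision for citrine; the sketch only shows that the choice between the two return values determines whether the process survives.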

brndnmtthws commented on July 18, 2024

There's a discussion of a similar issue here: https://elixirforum.com/t/supervising-async-tasks/14412/6

If it were me, I think the easiest thing would be to write a tiny GenServer that is wrapped by a Task; then you can trap the exit and handle it accordingly there. There's an example in the thread I just linked that you could adapt.
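One possible shape for such a wrapper, as a sketch rather than the example from the linked thread: a small GenServer (hypothetically named ShellRunner, started in the application's children) that runs System.cmd inside its own process. The Port, and therefore its exit signal, then belongs to this server instead of the job executor, and the server traps exits and discards the normal ones. The job function would just call ShellRunner.run(command).

```elixir
defmodule ShellRunner do
  use GenServer

  # Registered under the module name so jobs can call it directly.
  def start_link(_opts \\ []), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  # Runs `command` through /bin/sh inside this server and returns
  # System.cmd's {output, exit_status} tuple to the caller.
  def run(command, timeout \\ 30_000) do
    GenServer.call(__MODULE__, {:run, command}, timeout)
  end

  @impl true
  def init(:ok) do
    Process.flag(:trap_exit, true)
    {:ok, %{}}
  end

  @impl true
  def handle_call({:run, command}, _from, state) do
    {:reply, System.cmd("/bin/sh", ["-c", command]), state}
  end

  # The Port used by System.cmd closed normally; nothing to do.
  @impl true
  def handle_info({:EXIT, _port_or_pid, :normal}, state), do: {:noreply, state}

  # Ignore anything else in this sketch (a real version might log it).
  def handle_info(_msg, state), do: {:noreply, state}
end
```

Because every command goes through one process, commands run one at a time; for many domains, a pool of such servers or the Task.Supervisor approach may scale better.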

amacgregor commented on July 18, 2024

@brndnmtthws Thank you, that last thread was exactly what I needed, and this is working for now (I think the GenServer might need some tweaking to scale).

Thanks again, I'm going to close the issue.