MuonTrap

Keep programs, daemons, and applications launched from Erlang and Elixir contained and well-behaved. This lightweight library kills OS processes if the Elixir process running them crashes, and if you're running on Linux, it can use cgroups to prevent many other shenanigans.

Some other features:

  • Attach your OS process to a supervision tree via a convenient child_spec
  • Set cgroup controls like thresholds on memory and CPU utilization
  • Start OS processes as a different user or group
  • Send SIGKILL to processes that aren't responsive to SIGTERM
  • With cgroups, ensure that all children of launched processes have been killed too

TL;DR

Add muontrap to your project's mix.exs dependency list:

def deps do
  [
    {:muontrap, "~> 1.0"}
  ]
end

Run a command similar to System.cmd/3:

iex> MuonTrap.cmd("echo", ["hello"])
{"hello\n", 0}

Attach a long running process to a supervision tree using a child_spec like the following:

{MuonTrap.Daemon, ["long_running_command", ["arg1", "arg2"], options]}

Running on Linux and can use cgroups? Then create a new cgroup:

sudo cgcreate -a $(whoami) -g memory:mycgroup

Then reference the new cgroup in the child_spec:

{MuonTrap.Daemon,
 [
   "long_running_command",
   ["arg1", "arg2"],
   [cgroup_controllers: ["memory"], cgroup_base: "mycgroup"]
 ]}

MuonTrap will create a cgroup under "mycgroup" to run the "long_running_command". If the command fails, it will be restarted. If it should no longer be running (like if something else crashed in Elixir and supervision needs to clean up) then MuonTrap will kill "long_running_command" and all of its children.

Want to know more? Read on...

The problem

The Erlang VM's port interface lets Elixir applications run external programs. This is important since it's not practical to rewrite everything in Elixir. Plus, if the program is long running like a daemon or a server, you can use Elixir to supervise it and restart it on crashes. The catch is that the Erlang VM expects port processes to be well-behaved. As you'd expect, many useful programs don't quite meet the Erlang VM's expectations.

For example, let's say that you want to monitor a network connection and decide that ping is the right tool. Here's how you could start ping in a process.

iex> pid = spawn(fn -> System.cmd("ping", ["-i", "5", "localhost"], into: IO.stream(:stdio, :line)) end)
#PID<0.6116.0>
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.032 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.077 ms

To see that ping is running, call ps to look for it. You can also do this from a separate terminal window outside of IEx:

iex> :os.cmd('ps -ef | grep ping') |> IO.puts
  501 38820 38587   0  9:26PM ??         0:00.01 /sbin/ping -i 5 localhost
  501 38824 38822   0  9:27PM ??         0:00.00 grep ping
:ok

Now exit the Elixir process. Imagine that in a real program something happened in Elixir and the process needs to exit and be restarted by a supervisor.

iex> Process.exit(pid, :oops)
true
iex> :os.cmd('ps -ef | grep ping') |> IO.puts
  501 38820 38587   0  9:26PM ??         0:00.02 /sbin/ping -i 5 localhost
  501 38833 38831   0  9:34PM ??         0:00.00 grep ping

As you can tell, ping is still running after the exit. If you run :observer you'll see that Elixir did indeed terminate both the process and the port, but that didn't stop ping. The reason for this is that ping doesn't pay attention to stdin and doesn't notice the Erlang VM closing it to signal that it should exit.
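To check for leftover processes from Elixir rather than by eye, the ps output above can be filtered programmatically. The sketch below is a hypothetical helper (PsCheck.count_matching/2 is not part of muontrap or Elixir); it assumes the `ps -ef` line format shown above and simply skips any line that mentions grep.

```elixir
defmodule PsCheck do
  @moduledoc """
  Hypothetical helper: counts lines of `ps -ef` output that mention a
  program name, ignoring the `grep` process started by the pipeline itself.
  """

  @spec count_matching(String.t(), String.t()) :: non_neg_integer()
  def count_matching(ps_output, name) when is_binary(ps_output) and is_binary(name) do
    ps_output
    |> String.split("\n", trim: true)
    # Keep lines that mention the program but are not the grep command.
    |> Enum.count(fn line ->
      String.contains?(line, name) and not String.contains?(line, "grep")
    end)
  end
end
```

For example, `:os.cmd('ps -ef | grep ping') |> to_string() |> PsCheck.count_matching("ping")` counts the ping processes still alive; run after the Process.exit/2 call above, a nonzero result means ping was orphaned.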

Imagine now that the process was supervised and it restarts. If this happens regularly, you could be running dozens of ping commands.

This is just one of the problems that muontrap fixes.

Applicability

This is intended for long running processes. It's not great for interactive programs that communicate via the port or send signals. That feature is possible to add, but you'll probably be happier with other solutions like erlexec.

Running commands

The simplest way to use muontrap is as a replacement for System.cmd/3. Here's an example using ping:

iex> pid = spawn(fn -> MuonTrap.cmd("ping", ["-i", "5", "localhost"], into: IO.stream(:stdio, :line)) end)
#PID<0.30860.0>
PING localhost (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.081 ms

Now if you exit that process, ping gets killed as well:

iex> Process.exit(pid, :oops)
true
iex> :os.cmd('ps -ef | grep ping') |> IO.puts
  501 38898 38896   0  9:58PM ??         0:00.00 grep ping
:ok

Containment with cgroups

Even if you don't make use of any cgroup controller features, having your port process contained can be useful just to make sure that everything is cleaned up on exit, including any subprocesses.

To set this up, first create a cgroup with appropriate permissions. Any path will do; muontrap just needs to be able to create a subdirectory underneath it for its use. For example:

sudo cgcreate -a $(whoami) -g memory,cpu:mycgroup

Be sure to create the group for all of the cgroup controllers that you wish to use with muontrap. The above example creates it for the memory and cpu controllers.
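Since the cgroup must exist for every controller passed in :cgroup_controllers, one option is to derive the cgcreate arguments from the same list. This is a minimal sketch; CgroupSetup.cgcreate_args/3 is a hypothetical helper, and it only builds the argument list used in the command above without running anything.

```elixir
defmodule CgroupSetup do
  @moduledoc """
  Hypothetical helper that builds `cgcreate` arguments for a cgroup that
  covers every controller you plan to pass in `:cgroup_controllers`.
  """

  @spec cgcreate_args(String.t(), [String.t()], String.t()) :: [String.t()]
  def cgcreate_args(owner, controllers, group)
      when is_binary(owner) and is_list(controllers) and is_binary(group) do
    # Produces the equivalent of: -a <owner> -g memory,cpu:mycgroup
    ["-a", owner, "-g", Enum.join(controllers, ",") <> ":" <> group]
  end
end
```

You could then run it with something like `System.cmd("sudo", ["cgcreate" | CgroupSetup.cgcreate_args(System.get_env("USER"), ["memory", "cpu"], "mycgroup")])`, assuming $USER matches `whoami` and sudo doesn't prompt for a password.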

In Elixir, call MuonTrap.cmd/3 with the cgroup options now. In this case, we'll use the cpu controller, but this example would work fine with any of the controllers.

iex> MuonTrap.cmd("spawning_program", [], cgroup_controllers: ["cpu"], cgroup_base: "mycgroup")
{"hello\n", 0}

In this example, muontrap runs spawning_program in a sub-cgroup under the cpu/mycgroup group. The cgroup parameters may be modified outside of muontrap using cgset or by accessing the cgroup mountpoint manually.

On any error, or if the Erlang VM closes the port, or if spawning_program exits, muontrap will kill all OS processes in the cgroup. No need to worry about random processes accumulating on your system.

Note that if you use :cgroup_base, a temporary cgroup is created for running the command. If you want muontrap to use a particular cgroup and not create a subgroup for the command, use the :cgroup_path option. If you explicitly specify a cgroup, be careful not to use it for anything else. MuonTrap assumes that it owns the cgroup, and when it needs to kill processes, it kills all of them in the cgroup.

Limit the memory used by a process

Linux's cgroups are very powerful and the examples here only scratch the surface. If you'd like to limit an OS process and all of its child processes to a maximum amount of memory, you can do that with the memory controller:

iex> MuonTrap.cmd("memory_hog", [], cgroup_controllers: ["memory"], cgroup_base: "mycgroup", cgroup_sets: [{"memory", "memory.limit_in_bytes", "268435456"}])

That line restricts the total memory used by memory_hog to 256 MB.
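The value 268435456 is 256 * 1024 * 1024. If you prefer to think in mebibytes, a small hypothetical helper (MemoryLimit.cgroup_set/1, not part of muontrap) can build the :cgroup_sets tuple. It assumes the cgroup v1 memory.limit_in_bytes file used in the example above.

```elixir
defmodule MemoryLimit do
  @moduledoc """
  Hypothetical helper: turns a limit in MiB into a `:cgroup_sets` entry
  for the cgroup v1 `memory.limit_in_bytes` file.
  """

  @bytes_per_mib 1024 * 1024

  @spec cgroup_set(pos_integer()) :: {String.t(), String.t(), String.t()}
  def cgroup_set(mib) when is_integer(mib) and mib > 0 do
    # cgroup files take string values, so convert the byte count.
    {"memory", "memory.limit_in_bytes", Integer.to_string(mib * @bytes_per_mib)}
  end
end
```

For example, `cgroup_sets: [MemoryLimit.cgroup_set(256)]` yields the same setting as the call above.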

Limit CPU usage in a port

Limiting the maximum CPU usage is also possible. Two parameters control that with the cpu controller: cpu.cfs_period_us specifies the number of microseconds in the scheduling period and cpu.cfs_quota_us specifies how many of those microseconds can be used. Here's an example call that prevents a program from using more than 50% of the CPU:

iex> MuonTrap.cmd("cpu_hog", [], cgroup_controllers: ["cpu"], cgroup_base: "mycgroup", cgroup_sets: [{"cpu", "cpu.cfs_period_us", "100000"}, {"cpu", "cpu.cfs_quota_us", "50000"}])
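The quota-to-period ratio is the fraction of one CPU the group may use per period, so 50000 / 100000 gives 50%. Here is a sketch of a hypothetical helper (CpuLimit.cgroup_sets/2, not a muontrap function) that builds the two :cgroup_sets entries from a percentage, assuming the cgroup v1 cpu.cfs_* files shown above.

```elixir
defmodule CpuLimit do
  @moduledoc """
  Hypothetical helper: builds `:cgroup_sets` entries that cap CPU use at
  `percent` of one CPU per CFS scheduling period.
  """

  @spec cgroup_sets(pos_integer(), pos_integer()) :: [{String.t(), String.t(), String.t()}]
  def cgroup_sets(percent, period_us \\ 100_000)
      when is_integer(percent) and percent > 0 and is_integer(period_us) and period_us > 0 do
    # quota / period is the share of one CPU allowed in each period.
    quota_us = div(period_us * percent, 100)

    [
      {"cpu", "cpu.cfs_period_us", Integer.to_string(period_us)},
      {"cpu", "cpu.cfs_quota_us", Integer.to_string(quota_us)}
    ]
  end
end
```

Percentages above 100 produce a quota larger than the period, which under CFS bandwidth control lets the group use more than one CPU's worth of time on a multi-core machine.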

Supervision

For many long running programs, you may want to restart them if they crash. Luckily Erlang already has mechanisms to do this. MuonTrap provides a GenServer called MuonTrap.Daemon that you can hook into one of your supervision trees. For example, you could specify it like this in your application's supervisor:

  def start(_type, _args) do
    children = [
      {MuonTrap.Daemon, ["command", ["arg1", "arg2"], options]}
    ]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end

Supervisors support three restart values for their children: :permanent, :temporary, and :transient. They work as follows:

  • :permanent - Always restart the command if it exits or crashes. Restarts are limited to the Supervisor's restart intensity settings as they would be with normal GenServers. This is the default.
  • :transient - If the exit status of the command is 0 (i.e., success), then don't restart. Any other exit status is considered an error and the command is restarted.
  • :temporary - Don't restart the command.
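The three behaviors above can be restated as a small decision function. This is only a model of the documented rules (RestartPolicy.restart?/2 is hypothetical, not Supervisor or MuonTrap code), and it ignores the supervisor's restart intensity limits.

```elixir
defmodule RestartPolicy do
  @moduledoc """
  Hypothetical model of how each `:restart` value treats a command's
  exit status, as described in the list above.
  """

  @spec restart?(:permanent | :transient | :temporary, non_neg_integer()) :: boolean()
  # :permanent restarts no matter how the command exited.
  def restart?(:permanent, _exit_status), do: true
  # :transient restarts only on a nonzero (error) exit status.
  def restart?(:transient, exit_status), do: exit_status != 0
  # :temporary never restarts.
  def restart?(:temporary, _exit_status), do: false
end
```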

If you're running more than one MuonTrap.Daemon under the same Supervisor, then you'll need to give each one a unique :id. Here's an example child_spec for setting the :id and the :restart parameters:

    Supervisor.child_spec(
      {MuonTrap.Daemon, ["command", ["arg1"], options]},
      id: :my_daemon,
      restart: :transient
    )

stdio flow control

The Erlang port feature does not implement flow control for messages coming from the port process. Since MuonTrap captures stdio from the program being run, the program could send output so fast that it grows the Elixir process's mailbox enough to cause an out-of-memory error.

MuonTrap protects against this by implementing a flow control mechanism. When triggered, the running program's stdout and stderr file handles won't be read and hence it will eventually be blocked from writing to those handles.

The :stdio_window option specifies the maximum number of unacknowledged bytes allowed. The default is 10 KB.
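The idea behind the window can be sketched as a counter of bytes that have been forwarded but not yet acknowledged. The module below is a hypothetical illustration of that credit scheme, not MuonTrap's actual implementation; it assumes 10 KB means 10 * 1024 bytes.

```elixir
defmodule StdioWindow do
  @moduledoc """
  Hypothetical sketch of a byte-credit window in the spirit of
  `:stdio_window`: output is read only while fewer than `window`
  bytes are unacknowledged.
  """

  # Assumption: "10 KB" means 10 * 1024 bytes.
  @default_window 10 * 1024

  defstruct window: @default_window, unacked: 0

  @doc "Creates a window allowing up to `window` unacknowledged bytes."
  def new(window \\ @default_window) when is_integer(window) and window > 0,
    do: %__MODULE__{window: window}

  @doc "Records `n` bytes read from the program and forwarded to Elixir."
  def sent(%__MODULE__{} = w, n) when is_integer(n) and n >= 0,
    do: %__MODULE__{w | unacked: w.unacked + n}

  @doc "Records that the Elixir side has processed `n` bytes."
  def acked(%__MODULE__{} = w, n) when is_integer(n) and n >= 0,
    do: %__MODULE__{w | unacked: max(w.unacked - n, 0)}

  @doc "True while the program's stdout/stderr may still be read."
  def can_read?(%__MODULE__{window: window, unacked: unacked}), do: unacked < window
end
```

In this model, reading stops once the counter reaches the window, so a chatty program eventually blocks on its own writes until acknowledgements bring the counter back down.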

muontrap development

In order to run the tests, some additional tools need to be installed. Specifically, the cgcreate and cgget binaries need to be installed and available on $PATH. The package is typically called cgroup-tools (on Arch Linux, install the libcgroup AUR package).

Then run:

sudo cgcreate -a $(whoami) -g memory,cpu:muontrap_test

License

All original source code in this project is licensed under Apache-2.0.

Additionally, this project follows the REUSE recommendations and labels so that licensing and copyright are clear at the file level.

Exceptions to Apache-2.0 licensing are:

  • Configuration and data files are licensed under CC0-1.0
  • Documentation is CC-BY-4.0

muontrap's Issues

Add test to verify killing chatty processes

This was verified manually in v1.3.3, but it would be nice to automate.

Here's the test from @bjyoungblood:

test.c:

#include <stdio.h>

int main(void)
{
  /* Make standard output unbuffered. */
  setvbuf(stdout, (char *)NULL, _IONBF, 0);

  while(1)
    printf("Hello world!\n");

  return 0;
}

repro.exs:

Logger.configure(level: :info)

Supervisor.start_link(
  [
    {MuonTrap.Daemon,
     [
       "#{File.cwd!()}/a.out",
       [],
       [log_output: :debug, delay_to_sigkill: 50]
     ]}
  ],
  name: Sup,
  strategy: :one_for_one
)

Process.sleep(100)

require Integer

Enum.reduce_while(1..100, 0, fn _x, acc ->
  Process.sleep(:rand.uniform(1000))
  Supervisor.terminate_child(Sup, MuonTrap.Daemon)
  Process.sleep(51)

  {ps_output, _} = System.cmd("ps", ["aux"])

  processes =
    ps_output
    |> String.split("\n")
    |> Enum.filter(fn line -> String.contains?(line, "a.out") end)
    |> length()

  if Integer.is_odd(processes) do
    IO.puts("#{acc}: it happened :D")
    System.halt(1)
    {:halt, acc + 1}
  else
    IO.puts("#{acc}: it didn't happen :'(")
    Supervisor.restart_child(Sup, MuonTrap.Daemon)
    {:cont, acc + 1}
  end
end)

The reduce_while almost always stops on the first iteration when the bug exists.

Delay to sigkill

Hi,
We use Muontrap in our application to monitor some daemon processes, and everything works fine.
However, we are experiencing some issues with how running processes are shut down.

According to the documentation, the :delay_to_sigkill option is:

* `:delay_to_sigkill` - milliseconds before sending a SIGKILL to a child process if it doesn't exit with a SIGTERM

This option is passed on unmodified as a --delay-to-sigkill argument here:

defp muontrap_arg({:delay_to_sigkill, delay}), do: ["--delay-to-sigkill", to_string(delay)]

However, here is the switch case in muontrap/src/muontrap.c (lines 585 to 590 in 38f5b41):

case 'k': // --delay-to-sigkill
    // Specified in microseconds for legacy reasons
    brutal_kill_wait_ms = strtoul(optarg, NULL, 0) / 1000;
    if (brutal_kill_wait_ms > 1000)
        errx(EXIT_FAILURE, "Delay to sending a SIGKILL must be < 1,000,000 (1 second)");
    break;

It shows that --delay-to-sigkill is expected to be specified in microseconds for legacy reasons. So in the end, the option is interpreted as microseconds, not milliseconds as the documentation states.

Apart from this, the switch case also forces the delay to be less than 1 second for no apparent reason. This is a big issue for us, as the processes we are running need more time to close open files and shut down gracefully.

Is there a reason why the delay before sending sigkill cannot be more than 1 second?
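The unit mismatch can be checked by replaying the quoted C arithmetic. This sketch (DelayArithmetic.c_wait_ms/1 is a hypothetical name) mirrors only the two lines shown above: integer division by 1000, then rejection of anything over 1000 ms.

```elixir
defmodule DelayArithmetic do
  @moduledoc """
  Replays the C conversion quoted in the issue: the `--delay-to-sigkill`
  value is integer-divided by 1000 and rejected if the result exceeds 1000 ms.
  """

  @spec c_wait_ms(non_neg_integer()) :: {:ok, non_neg_integer()} | {:error, :too_long}
  def c_wait_ms(arg) when is_integer(arg) and arg >= 0 do
    # Mirrors: brutal_kill_wait_ms = strtoul(optarg, NULL, 0) / 1000;
    wait_ms = div(arg, 1000)

    # Mirrors: if (brutal_kill_wait_ms > 1000) errx(...)
    if wait_ms > 1000, do: {:error, :too_long}, else: {:ok, wait_ms}
  end
end
```

With `delay_to_sigkill: 50` as in the repro script earlier, the C side waits 0 ms, and a 5-second delay written as `delay_to_sigkill: 5000` per the documentation becomes 5 ms.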

Attempt to convert args to binaries

When passing an integer through the MuonTrap.Daemon args, the process fails to start with an argument error. The example I have is passing a port number. Converting it to a string and passing that fixes the issue. I am only opening this issue as a placeholder to discuss whether we should attempt to convert the args before using them.
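Until args are converted automatically, the workaround described in the issue can be applied at the call site. A minimal sketch, with ArgsToStrings.normalize/1 as a hypothetical helper: it converts every argument with to_string/1, so integers and atoms become strings and existing strings pass through unchanged.

```elixir
defmodule ArgsToStrings do
  @moduledoc """
  Hypothetical workaround: convert every command argument to a string
  before handing the list to MuonTrap.Daemon or MuonTrap.cmd/3.
  """

  @spec normalize([String.Chars.t()]) :: [String.t()]
  def normalize(args) when is_list(args), do: Enum.map(args, &to_string/1)
end
```

For example, `{MuonTrap.Daemon, ["my_server", ArgsToStrings.normalize(["--port", 4000]), []]}`, where my_server is a placeholder command.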

Not Working with Nerves

iex(3)> MuonTrap.cmd("echo", ["hello"])
** (ErlangError) Erlang error: :enoent
    (muontrap 1.0.0) lib/muontrap/options.ex:52: MuonTrap.Options.validate("echo", ["hello"], [])
    (muontrap 1.0.0) lib/muontrap.ex:99: MuonTrap.cmd/3
    iex:3: (file)

Fails to compile when there's a space somewhere in the absolute path

Fatal error: can't create it/Projekte/rentory/rentory_hub/_build/nerves_system_docker_rpi4_dev/lib/muontrap/obj/muontrap.o: No such file or directory
make[1]: *** [it/Projekte/rentory/rentory_hub/_build/nerves_system_docker_rpi4_dev/lib/muontrap/obj/muontrap.o] Error 2
make: *** [all] Error 2

The folder at the start was /made it/…

Compiler error on Ubuntu 20.04

Here's the error:

==> muontrap
 CC muontrap.o
muontrap.c: In function 'process_stdio':
muontrap.c:522:5: error: a label can only be part of a statement and a declaration is not a statement
  522 |     ssize_t written = splice(from_fd, NULL, STDOUT_FILENO, NULL, stdio_bytes_avail, SPLICE_F_MOVE);
      |     ^~~~~~~
make[1]: *** [Makefile:41:_build/dev/lib/muontrap/obj/muontrap.o] Error 1
make: *** [Makefile:5: all] Error 2
gcc --version
gcc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Muontrap.Daemon Proposal

Hello,

Would you be interested in a new feature for Muontrap.Daemon? The idea is not only to log the messages generated by the OS process supervised by Muontrap.Daemon, but also to send them to the parent process for further processing.

Greetings.

Establishing/Supervising Communications With a Long-Running, Data Streaming Python Process

Ubuntu 20.04.1 LTS
Erlang/OTP 23 [erts-11.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
Elixir 1.10.3 (compiled with Erlang/OTP 22)

Our Elixir project requires initiating and then supervising communications with a long-running, data streaming Python process. Data will be pushed to its Elixir counterpart once every second.

Is your library ideally suited for this? If so, where can we find recipes for such a use case?

cgroup support no longer working on Linux 5.15

It might be broken on an earlier version of Linux, but this is the first version that I've seen it.

There are a couple of issues. First, the unit tests don't recognize when the group has been created. Once you fix that, there are many more errors, so something is clearly different now.

Escript Troubles

Using Muontrap in :dev and :test - all good.

When running in an Escript, I get an error ArgumentError unknown application :muontrap.

This comes from MuonTrap.Port#muontrap_path/0: Application.app_dir(:muontrap, ["priv", "muontrap"])

When I look in the _build directory, I can see the /priv/muontrap executable...

Does anyone have an idea of why this is failing in an Escript?

Windows support.

Hello!

Would it be possible for MuonTrap to support Windows? I presume that, since we spawn the compiled muontrap C program, which has support for Unix cgroups, it's not as simple as just compiling that program on Windows?
