Comments (19)

tarekziade commented on August 24, 2024
  1. Will not work => Popen.communicate() waits for the process to terminate and sends you the whole buffer. The buffer is loaded in memory.
  2. I don't really understand this one -- does it mean redirecting the streams into a file we can stream back via a poller?
  3. How do you save it?
  4. Will not work; see 1.
  5. Free?

What about using the shell when the command is called with shell == True:

program >> log 2>> log

In any case, it seems that if we want to stream stdout/stderr from any existing program we have to have an FD per stream. I don't think that's really an issue, because it's easy to raise the FD limit to 8096, for instance.
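
A minimal sketch of raising the soft FD limit from Python, assuming a Unix system (the 8192 target is illustrative and must not exceed the hard limit):

import resource  # Unix-only

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
# bump the soft limit; raising the hard limit itself requires root
resource.setrlimit(resource.RLIMIT_NOFILE, (min(8192, hard), hard))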

benoitc commented on August 24, 2024

The problem is not the number of fds but how you collect them in a short time. 2 is about polling these fds and publishing when an event happens; the mapping is there to map them to the watcher and process.

  1. Well, by passing files instead of pipes (see the sketch below).
  2. Having an API to send input and get output at a time is still interesting.
  3. (mixed some ideas)
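
For 1., a minimal sketch of passing open files instead of pipes; the command and log file names are made up:

import subprocess

# append mode, so restarts don't truncate the logs
with open('ok.stdout.log', 'ab') as out, open('ok.stderr.log', 'ab') as err:
    p = subprocess.Popen(['python', 'ok.py'], stdout=out, stderr=err)
    p.wait()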

tarekziade commented on August 24, 2024

Here's what I came up with so far -- you can try it on any program that spits data to stdout/stderr (and flushes it).

That leaves the 2 PIPEs open, but then in publish() you can do whatever you want, like publishing to ZMQ for instance. Then we can provide a file backend.

import subprocess
import threading
import sys

lock = threading.Lock()


def publish(msg):
    with lock:
        # or zmq or a file or...
        sys.stdout.write(msg)
        sys.stdout.flush()


class Stream(threading.Thread):
    def __init__(self, stream, pub, prefix):
        threading.Thread.__init__(self)
        self.daemon = True  # don't keep the interpreter alive
        self.stream = stream
        self.prefix = prefix
        self.pub = pub

    def run(self):
        # readline() returns '' once the child closes its end of the pipe
        for line in iter(self.stream.readline, ''):
            self.pub(self.prefix + line)


p = subprocess.Popen('python ok.py', shell=True,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                     universal_newlines=True)

child_stderr = Stream(p.stderr, publish, 'ok:stderr:')
child_stderr.start()
child_stdout = Stream(p.stdout, publish, 'ok:stdout:')
child_stdout.start()

# wait for the child to exit; both pipes are closed at that point,
# which ends the reader threads
p.wait()
child_stderr.join()
child_stdout.join()

benoitc commented on August 24, 2024

That won't work at all if you have 100/1000 processes. You don't want 1000 threads, nor do you want a loop that goes over 2000 pipes; the responses would come back slowly. Hence poll(), which is made exactly for that. We could eventually have a thread per watcher, but I'm not sure that's good either. Maybe having an external OS process reading results from the filesystem is enough?
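
A rough sketch of that poll() approach (select.poll is Unix-only), assuming a list of Popen objects opened with PIPE streams and a publish() callable like the one above:

import os
import select


def poll_streams(processes, publish):
    poller = select.poll()
    # map each pipe fd back to its owning process and stream name
    fd_map = {}
    for proc in processes:
        for name, stream in (('stdout', proc.stdout), ('stderr', proc.stderr)):
            fd = stream.fileno()
            fd_map[fd] = (proc.pid, name)
            poller.register(fd, select.POLLIN)
    while fd_map:
        for fd, _event in poller.poll(1000):  # timeout in milliseconds
            pid, name = fd_map[fd]
            data = os.read(fd, 4096)
            if data:
                publish('%s:%s:%s' % (pid, name, data))
            else:
                # an empty read means EOF: the child closed this pipe
                poller.unregister(fd)
                del fd_map[fd]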

wraithan commented on August 24, 2024

From the Popen.communicate docs:

> Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited.

In this case the data size is unlimited. So according to the docs (I've not used Popen.communicate myself) it shouldn't be used here.
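
For reference, a sketch of the pattern being ruled out; communicate() blocks until the child exits and holds both complete outputs in memory:

import subprocess

p = subprocess.Popen('python ok.py', shell=True,
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# blocks until exit; the whole stdout/stderr buffers live in RAM
out, err = p.communicate()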

benoitc commented on August 24, 2024

I guess the easiest way would be providing a way to redirect stdout/stderr to the filesystem. Then we could easily provide a way to listen for file changes on the fs and provide a pub/sub script based on that.

tarekziade commented on August 24, 2024

@benoitc that's just a prototype -- in a real version we could use greenlets instead of threads and just feed the zmq PUB/SUB socket whenever a line comes in (with no lock in the function that feeds the pub/sub). I don't think 1 greenlet per process is a big overhead, but I have not tried yet.

> I guess the easiest way would be providing a way to redirect stdout/stderr to the filesystem

I am not entirely sure I see the difference here, e.g. listening for changes on 1000 PIPEs vs 1000 files. How would you do that differently?

tarekziade commented on August 24, 2024

-- and you can't really share one file between all processes because you would have concurrent access issues. You need serialization in order not to interleave lines emitted by different processes.

benoitc commented on August 24, 2024

The reason why it's easier to have the redirection on the filesystem is mostly that it won't require any extra threads just to listen on stderr/stdout. Once it's on the filesystem you can use any other system process to read the files and expose them over zmq or anything. I think I will code something like this during my flight tomorrow.

I don't want to share a file across processes, though it would be feasible. Here I want 1 file per output, and also the possibility to redirect stderr to stdout, so in that case we could use only one file for a process, but that should be optional (see the sketch below).
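
A minimal sketch of both layouts with subprocess (the file names are made up):

import subprocess

out = open('proc.stdout.log', 'ab')
err = open('proc.stderr.log', 'ab')

# one file per output...
p = subprocess.Popen(['python', 'ok.py'], stdout=out, stderr=err)

# ...or stderr redirected to stdout, so one file is enough
p = subprocess.Popen(['python', 'ok.py'], stdout=out,
                     stderr=subprocess.STDOUT)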

tarekziade commented on August 24, 2024

How would you do it differently than looping over every file? Or having events? I don't understand how it would be different from working with PIPEs, which are like files.

benoitc commented on August 24, 2024

Because to do that with pipes you would have to hand the current watcher information to a thread or a process, reusing the same mechanism we use for the flapping to get changes in watchers. Saving to the filesystem would only require an external process watching a directory for changes, and wouldn't require any polling on some systems (Linux with inotify, BSD with kqueue). Just watching some files rather than subscribing to the pub/sub system may be easier. The other advantage of saving them on the fs is having some kind of persistency: the logs are there.

The other thing is that using a thread for that may be dangerous and will consume a lot of RAM. So using an external process is definitely better.
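
A minimal sketch of that external reader, following a single log file and handing new lines to a publish() callable; a real version would replace the sleep with an inotify/kqueue watcher:

import os
import time


def follow(path, publish, interval=0.5):
    # naive "tail -f" on one log file
    with open(path, 'r') as f:
        f.seek(0, os.SEEK_END)  # start at the current end of the file
        while True:
            line = f.readline()
            if line:
                publish(line)
            else:
                # with inotify/kqueue there would be no sleep loop here
                time.sleep(interval)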

benoitc commented on August 24, 2024

Maybe we can propose both options, if someone doesn't want the persistence on disk?

tarekziade commented on August 24, 2024

> an external process watching a directory for changes ... wouldn't require any polling on some systems (Linux with inotify, BSD with kqueue)

This is still unclear to me. If you use something like inotify you still need to open the file again to get the content of the emitted stdout or stderr, right? So you'd be looping over 1000 files as well, I think.

> The other thing is that using a thread for that may be dangerous and will consume a lot of RAM. So using an external process is definitely better.

What do you mean by "dangerous"? Also, 1000 greenlets consume barely any RAM.

benoitc commented on August 24, 2024

Greenlets of course don't consume a lot of RAM, right, but circus doesn't use gevent by default so this isn't really the question ;)

Of course you will have to open 1000 files, and again this isn't the problem. By saving files on the fs you can let people read only what they want, when they want, rather than listening for every change and sending the data across the wire even if nobody is listening. Also, having them on the fs makes them persistent, so whatever happens you will be able to read the logs. The other advantage of using the fs is that you can delegate the job of reading them, possibly on demand, to an external program.

tarekziade commented on August 24, 2024

> Greenlets of course don't consume a lot of RAM, right, but circus doesn't use gevent by default so this isn't really the question ;)

I think this is the question, in fact. We're talking about scaling this feature. I don't see a problem with using greenlets to scale a feature, if it's an option.

> Of course you will have to open 1000 files, and again this isn't the problem. By saving files on the fs you can let people read only what they want, when they want, rather than listening for every change and sending the data across the wire even if nobody is listening.

That's not the same feature anymore. We were talking about redirecting stderr/stdout into a pub/sub. Read back the initial post you wrote here.

I think we need to step back a bit and agree on the goals.

My goal is to be able to get the stderr and stdout of all processes into a stream, whether it's a file or a zmq pub/sub or whatever.

This can be done simply via the command line if we use bash: "program 1>> file 2>> file"

We want the same feature for all processes even if they are not run in the shell. subprocess already provides it, and I think it's easy to add to circus.

Now your goal is to make it scale across 1000 processes, and I think that's completely feasible with greenlets. Do you disagree with this point?

Now the FS vs zmq pub/sub question is completely orthogonal: where we push the stream is not really an issue but a configuration.

Either we push into a file and get the persistency you are talking about, or we push into a PUB socket (or both!) and get a notification system where people can subscribe.

benoitc commented on August 24, 2024

On Tue, Apr 10, 2012 at 9:41 AM, Tarek Ziade wrote:

> > Greenlets of course don't consume a lot of RAM, right, but circus doesn't use gevent by default so this isn't really the question ;)
>
> I think this is the question, in fact. We're talking about scaling this feature. I don't see a problem with using greenlets to scale a feature, if it's an option.

Maybe give the choice of the backend?

> [...]
>
> Now the FS vs zmq pub/sub question is completely orthogonal: where we push the stream is not really an issue but a configuration.
>
> Either we push into a file and get the persistency you are talking about, or we push into a PUB socket (or both!) and get a notification system where people can subscribe.

Agreed. But we need to find a way not to send useless messages. If we publish all streams to a pub/sub but no one is listening, it costs some CPU and RAM for nothing. It would also slow down the feed for those who are actually listening.

Maybe a solution would be:

  • having a pub/sub/watcher (see the sketch after this list)
  • optionally saving to the fs for persistency. If we already have a
    pub/sub we could reuse it to make that happen.
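
A minimal sketch of the PUB side with pyzmq; the endpoint and the topic layout (watcher.stream) are assumptions:

import zmq

context = zmq.Context()
pub = context.socket(zmq.PUB)
pub.bind('tcp://127.0.0.1:5556')  # endpoint is made up


def publish(watcher, stream, line):
    # topic-prefixed message, so subscribers pay only for what they
    # ask for via setsockopt(zmq.SUBSCRIBE, ...), e.g. 'ok.stdout'
    pub.send_string('%s.%s %s' % (watcher, stream, line))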

tarekziade commented on August 24, 2024

I pushed a first version of a greenlet-based system that redirects the stream to a callable.

I'll now bench the system with 1k processes to see what the overhead is.

tarekziade commented on August 24, 2024

I have now pushed a full example where you can set a stderr_file and/or stdout_file in the ini file:

https://github.com/mozilla-services/circus/blob/issue73/examples/circus2.ini

The overhead is not noticeable, and the stream is happily pushed into the file.

What I want to do next is to push structured logs, where we add the worker id, watcher name, stream name and message (see the sketch below).
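
A minimal sketch of what such a structured record could look like, e.g. serialized as JSON (the field names are made up):

import json

record = json.dumps({'wid': 1, 'watcher': 'ok',
                     'stream': 'stdout', 'msg': 'hello\n'})
# -> {"wid": 1, "watcher": "ok", "stream": "stdout", "msg": "hello\n"}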

And provide pluggable backends:

  • a file backend, configured like in this example
  • a zmq backend, so the data is pushed in the PUB/SUB

The user can then configure one or both.

gevent is optional and only has to be installed if you use this feature.

tarekziade commented on August 24, 2024

see #75
