
Comments (27)

calmofthestorm avatar calmofthestorm commented on July 28, 2024

First, thanks for taking some time to collect some data on this. That's the only way a discussion about performance can start. I agree this is a problem and that it's caused by getcontext issues.

This makes total sense to me. getcontext is really slow, though iirc it's less the shelling out than the xdotool command itself that takes most of the time. Note that we do have some caching of that data, so one easy win here would be making that more aggressive and/or tuning the TTL adaptively.
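
Roughly what I mean by tuning the TTL, as a sketch (hypothetical names -- query_server stands in for the real proxy call, and the 0.25s is arbitrary):

import time

_CONTEXT_TTL = 0.25        # seconds before a cached context counts as stale
_cached_context = None
_cached_at = 0.0

def get_context_cached(query_server):
    # Only pay for the expensive xdotool round trip when the cache is stale.
    global _cached_context, _cached_at
    now = time.time()
    if _cached_context is None or now - _cached_at > _CONTEXT_TTL:
        _cached_context = query_server()
        _cached_at = now
    return _cached_context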

I'm fine with adding this context manager since it's a clear win where it works and doesn't break anything, but I am also somewhat skeptical of how well it will integrate into grammars.

The main reason I didn't build the proxy like this originally was because I did not (and still don't) know how to detect the final action in a chain in a general case. Considering a deeply nested grammar (vim, multiedit), I'm not sure how I'd integrate this design. At a minimum it would require a considerable redesign of the grammar (perhaps top level rules could manage the batch proxy, and then execute once lower rules have finished?)

I also don't want to force too much crap into the grammars; I'd like to have aenea grammars look as much like vanilla dragonfly ones as possible. This may be an impossible goal though.

I don't think a context manager is the right interface here (or rather, it's not general enough) since command executions can span multiple rules for a single chain. It would be of some use for actions that are making multiple calls to execute. I guess it's possible you could pass the context manager to subrules somehow, though it's not clear to me this would work.

Other possibilities I've considered to address this:

  • Doing something akin to what we do for the dictation capture client -- have a thread dedicated to flushing a buffer and use simple heuristics to balance latency and throughput (the capture client just flushes the buffer constantly, which works really well in practice), with added logic to call getcontext only in between batches (there's a rough sketch of this after the list). This could get messy because it would need either cross-grammar communication (ew) or a per-grammar thread (ew), and I have no clue how that'd interact with Natlink.
  • Grammars call start and stop (or whatever) in top-level rules, and we buffer by that.
  • We could also look at tweaking context caching further. The roundtrip to the server is cheap; it's actually running xdotool that's expensive.
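
A very rough sketch of that first idea (hypothetical names throughout -- send_batch and refresh_context stand in for whatever the proxy would actually call; none of this is existing aenea code):

import threading
import Queue  # `queue` on Python 3

_pending = Queue.Queue()

def _flush_loop(send_batch, refresh_context):
    while True:
        batch = [_pending.get()]          # block until at least one action
        while True:
            try:
                batch.append(_pending.get_nowait())
            except Queue.Empty:
                break
        send_batch(batch)                 # one RPC for the whole batch
        refresh_context()                 # getcontext only in between batches

def start_flusher(send_batch, refresh_context):
    worker = threading.Thread(target=_flush_loop,
                              args=(send_batch, refresh_context))
    worker.daemon = True
    worker.start()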

mzizzi avatar mzizzi commented on July 28, 2024

I should note that the context manager above was simply a proof of concept, and it will break things if included in existing grammars (things will happen out of order, or worse).

Thanks for taking the time. I agree that a context manager probably isn't the correct approach here; it is an easy way to test a theory, though. We could implement something like this, but it would require the developer to write aenea-specific code, and it isn't the prettiest design. We shouldn't have to monkey patch our own code :)

e.g.

from dragonfly import CompoundRule, Grammar

class TopLevelRule(CompoundRule):
    spec = 'do something'  # ...
    def _process_recognition(self, node, extras):
        # Everything executed inside this block gets batched into a single
        # round trip to the server by the proof-of-concept manager.
        with SomeActionBatchingManager():
            # anything.execute()
            pass

grammar = Grammar('foo')
grammar.add_rule(TopLevelRule())
grammar.load()
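
For the record, all SomeActionBatchingManager really has to do is collect the proxy commands and flush them as one batch on exit -- something like this sketch (hypothetical names; how the proxy's execute path gets redirected into add(), and how the batch RPC is actually invoked, are hand-waved):

class SomeActionBatchingManager(object):
    def __init__(self, server):
        self.server = server
        self.batch = []

    def add(self, command):
        # The monkey-patched proxy calls this instead of executing right away.
        self.batch.append(command)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None and self.batch:
            # One round trip for everything queued inside the `with` block
            # (execute_batch is a stand-in for whatever the batch RPC is).
            self.server.execute_batch(self.batch)
        self.batch = []
        return False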

I think that smarter caching is probably the most seamless and leads to more sharing with pure dragonfly grammars. However, I am not opposed to providing tighter "action transaction" control to the developer. If/when I get time I'll take a deeper look into the matter!

calmofthestorm avatar calmofthestorm commented on July 28, 2024

I definitely think it's a real problem and you're on the right track. If you want to experiment with the caching behavior, the relevant code is:

  • _refresh_server() (note the comment there: "Hate to do this, but currently we hit the server once per context")
  • the config option aenea.config.STALE_CONTEXT_DELTA

mzizzi avatar mzizzi commented on July 28, 2024

Thanks for the details. That explains why I didn't see a direct 1:1 relationship between actions and get_context calls in my testing. I'll see what can be done with our current caching system and/or explicit batching.

Another solution would be to eliminate the current method of caching altogether. We could potentially subscribe to X11 events in another thread and keep up-to-date context readily available there. Unfortunately my experience with python-xlib is limited, and it's not the nicest system to deal with.
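
Roughly what I have in mind, as a sketch (assumes an EWMH-compliant window manager and only covers the active-window property):

import Xlib.X
import Xlib.display

display = Xlib.display.Display()
root = display.screen().root
NET_ACTIVE_WINDOW = display.intern_atom('_NET_ACTIVE_WINDOW')

# Ask the X server to deliver PropertyNotify events for the root window.
root.change_attributes(event_mask=Xlib.X.PropertyChangeMask)

while True:
    event = display.next_event()  # blocks until something happens
    if event.type == Xlib.X.PropertyNotify and event.atom == NET_ACTIVE_WINDOW:
        # Active window changed: refresh (or invalidate) the cached context here.
        pass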

djr7C4 avatar djr7C4 commented on July 28, 2024

Is this also what causes the connection to occasionally drop? I've also noticed a bit of lag on occassion and would love to see a fix.

Incidentally, for what it's worth, I've also had issues with custom context classes causing lag. These are some mode sensitive contexts for Emacs that I wrote for the Emacs grammars I'm working on. Basically, they allow you to have a different context depending on what major mode and minor modes are active in the current buffer. The solution I found was to cache the current major mode and minor modes so as to prevent ten different grammars from all querying them independently. This eliminated the lag. I don't know if something similar is possible here, but maybe it's worth thinking about.

calmofthestorm avatar calmofthestorm commented on July 28, 2024

@mzizzi I really like the idea of subscribing to events. Doing an expensive poll on a regular basis with caching duct-taped on is what we have now, and it's not a good way to do it. I'd be open to this even if we couldn't get all the information we currently do, provided no one's using it. I'm not worried about the overhead of RPCs to the server, so if the server were subscribed to an event stream, we could get rid of all caching and just do the getcontext there.

@djr7C4 I'd have to dig into the code to remember whether having more contexts is going to cause more calls to get_context. My instinct is yes. I'd also suggest varying aenea.config.STALE_CONTEXT_DELTA to see if it has an effect.

I don't think this would cause the connection to drop. Can you provide more information about that?

djr7C4 avatar djr7C4 commented on July 28, 2024

Subscribing to events sounds good to me (maybe via callbacks).

The issue was that my context classes run the emacsclient program to send code to emacs to get the current mode. I originally had each context do this on demand independently, which resulted in many more calls to emacsclient than were needed, and that caused the connection to drop frequently. I believe it was due to the server being unresponsive because it was stuck in emacsclient calls. I fixed it by calling emacsclient only once each time process_begin was invoked and caching the result. I don't know whether something like this is a potential problem with the aenea contexts or not.
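
The caching itself is tiny -- something like this sketch (the --eval expression and the function name are illustrative, not my actual grammar code; minor modes get cached the same way):

import subprocess

_cached_major_mode = None

def refresh_emacs_modes():
    # Called once from process_begin; every mode-sensitive context then reads
    # _cached_major_mode instead of shelling out to emacsclient itself.
    global _cached_major_mode
    output = subprocess.check_output(
        ['emacsclient', '--eval', '(symbol-name major-mode)'])
    _cached_major_mode = output.decode('utf-8').strip().strip('"')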

mzizzi avatar mzizzi commented on July 28, 2024

I looked into subscribing to events from x11 today and came up with the following:
https://gist.github.com/mzizzi/1c5dddf90811830c3941

The idea is that we could potentially run a script similar to this in another thread on the server to query for context only when we absolutely need it. (Should be more efficient/speedier than polling or querying just in time.)

Good news: using this script on a Linux machine running the Cinnamon desktop, I can capture active-window-changed events.

Bad news: I couldn't capture window-name-changed events. Maybe this is just a Cinnamon thing, but it means that we'd still have to poll/query that bit of context.

Thoughts?

calmofthestorm avatar calmofthestorm commented on July 28, 2024

Thanks for looking into this; it has the potential to be a real quality-of-life improvement. Disclaimer: I haven't had a chance to play with it yet, just going off what you said.

There is a lot we could do with just window changed events, since simple contexts only care which app is in the foreground, but I feel like the more complex mode-sensitive grammars we were discussing above (e.g., your work on an emacs one, and similar ideas I've tossed around for vim) are going to require both the window name and low latency for context checks (since iirc everyone was thinking of putting the current mode in the window name).

I guess one option would be to have emacs make RPCs directly to the server to update it with its current mode (this could work with multiple sessions if the server kept track of which session was which, which should be doable -- say each emacs generates a GUID at startup, puts it in the title bar, and includes it with every mode switch), and the server would then return that as part of the context. This could be done as a plugin to the server as-is, or we could just add an RPC for this case -- I'm fine with either.
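
Server side, that plugin could be as small as something like this sketch (emacs_set_mode and annotate_context are hypothetical names, and I'm assuming get_context's dict keeps exposing the window title, which the proxy contexts already match on):

# GUID (taken from the title bar) -> most recently reported mode.
_emacs_modes = {}

def emacs_set_mode(guid, mode):
    # Hypothetical RPC, called by emacs whenever its mode changes.
    _emacs_modes[guid] = mode

def annotate_context(context):
    # Merge the reported mode into get_context's result when the active
    # window's title contains a GUID we know about.
    title = context.get('title', '')
    for guid, mode in _emacs_modes.items():
        if guid in title:
            context['emacs_mode'] = mode
            break
    return context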

In any case, I like the idea of having a "context light" RPC that only returns bits of context we can get from async polling in another thread, so that grammars that don't care about name changes or whatnot can stop paying for it. What other events can we get besides which window is currently on top?

calmofthestorm avatar calmofthestorm commented on July 28, 2024

Another possibly horrible idea: Suppose we had the events thread poll the current get_context code in a loop, and return possibly slightly stale data in exchange for leaving the tight loop? The main thing that scares me here is different bits of a grammar getting different contexts leading to weird and difficult to reproduce behavior -- as written, I believe this is possible.

mzizzi avatar mzizzi commented on July 28, 2024

I like where this is headed; we're coming up with quite a few possible solutions.

Suppose we had the events thread poll the current get_context code in a loop, and return possibly slightly stale data in exchange for leaving the tight loop? The main thing that scares me here is different bits of a grammar getting different contexts leading to weird and difficult to reproduce behavior -- as written, I believe this is possible.

Don't we already run the risk of this behavior with context caching? Or maybe that's what you meant! We would certainly still run into this issue with polling/subscribing to events in another thread.

In any case, I like the idea of having a "context light" RPC that only returns bits of context we can get from async polling in another thread, so that grammars that don't care about name changes or whatnot can stop paying for it.

IMO, this plus the get_context polling mentioned above could get us pretty far and should be fairly straightforward to implement. I'll continue to experiment as time permits.

calmofthestorm avatar calmofthestorm commented on July 28, 2024

Correct. Context caching can serve stale data as is. However, in most cases with the current setup the first get_context in a command execution will get fresh data, and subsequent ones will tend to use that cached value. I'd never build a legit distributed system this way, but we are talking RPCs at the speed of speech.

I also worry that spamming the X server with get_context in a loop could have unintended effects. I'd like to better understand why get_context takes so long (ballpark 100ms off the top of my head).

Anyway, as you mention, it's easy enough to implement and test out, so maybe the best way forward is to give it a try.

mzizzi avatar mzizzi commented on July 28, 2024

It's slow because of how much we shell out. We might be able to do away with caching completely if we're quick enough. I grabbed a couple of native Python libs for X-related stuff and did some more benchmarks. Turns out we can skip using xdotool altogether :)

Here is a gist containing my benchmarks comparing implementations:
https://gist.github.com/mzizzi/548962302a4f7a694a56

My results:

$ python benchmark_get_context.py
running benchmarks...
new implementation 0.000524
old implementation 0.016536
done!
$

A roughly 30x speedup :) Keep in mind we call get_context a lot (upwards of 20 times per dictation). With the original implementation that's about 0.3 seconds of execution time, versus about 0.01 seconds with the improved version.
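
The gist has the full benchmark, but the comparison boils down to something like this (a sketch -- the "new implementation" here is a python-xlib query for the active window's title, which may differ in detail from what ends up in the branch):

import subprocess
import timeit

import Xlib.X
import Xlib.display

display = Xlib.display.Display()
root = display.screen().root
NET_ACTIVE_WINDOW = display.intern_atom('_NET_ACTIVE_WINDOW')

def old_implementation():
    # One process spawn per query.
    return subprocess.check_output(
        ['xdotool', 'getactivewindow', 'getwindowname'])

def new_implementation():
    # Reuses the existing X connection; no process spawn.
    window_id = root.get_full_property(
        NET_ACTIVE_WINDOW, Xlib.X.AnyPropertyType).value[0]
    window = display.create_resource_object('window', window_id)
    return window.get_wm_name()

print('new implementation %f' % (timeit.timeit(new_implementation, number=20) / 20))
print('old implementation %f' % (timeit.timeit(old_implementation, number=20) / 20))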

I'll create a branch with this implementation of get_context shortly.

djr7C4 avatar djr7C4 commented on July 28, 2024

Nice work. Perhaps this will fix my problems as well. I suspect that they are due to get_context being too slow.

mzizzi avatar mzizzi commented on July 28, 2024

We don't have to stop with get_context. Shelling out to xdotool for input is pretty slow too. We can remove most (all?) shelling out to xdotool using libxdo for direct input.

Moving to direct bindings will undoubtedly be more performant than the current implementation. I'd like to know your opinions on keeping backwards compatibility with the current server. On one hand I'd hate to surprise everyone with new python/system dependencies. On the other hand I can't see anyone wanting to continue on with slower execution!
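
For example (a sketch assuming the python-libxdo bindings, i.e. the xdo package; the method names are that library's wrappers around the libxdo C calls, and Python 3 may want bytes for the key sequence):

from xdo import Xdo

xdo = Xdo()
window = xdo.get_active_window()

# Roughly what `xdotool key --window <id> ctrl+1` does, five times over,
# without spawning a process for each invocation.
for _ in range(5):
    xdo.send_keysequence_window(window, 'ctrl+1')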

calmofthestorm avatar calmofthestorm commented on July 28, 2024

I must admit I'm somewhat surprised shelling out is so slow. I guess when I measured this in the past I measured only the overhead of creating a "null" process, rather than actually running xdotool (which would be difficult to measure without having an alternative implementation). At any rate, it's hard to argue with actual numbers :-)

What sort of backwards incompatibility are we talking about? If it's just dependencies, go for it. I'm even fine with some breaking changes (slight changes to semantics/etc.) that require small adaptations on the grammar/server side, but I would prefer to avoid any regressions in features provided.

Not that I know of any compelling reason to add Yet Another Configuration Option, but it should be easy enough to support in the server by having a methods module/class that does the actual implementation.

calmofthestorm avatar calmofthestorm commented on July 28, 2024

Also, if I'm reading this correctly, we are currently discussing a polling approach using a library, rather than the event-driven approach above. Is this correct? If the calls are as fast as the numbers above, I see little reason to proceed with the complexity of an event-driven implementation.

mzizzi avatar mzizzi commented on July 28, 2024

Also, if I'm reading this correctly, we are currently discussing a polling approach using a library, rather than the event-driven approach above. Is this correct? If the calls are as fast as the numbers above, I see little reason to proceed with the complexity of an event-driven implementation.

Agreed 100%. If we're quick enough to get context between every action then we should absolutely do it. I'd like to stay away from any design that could potentially introduce those "hard to reproduce" bugs. Adding threads is a good way to introduce those sorts of things.

calmofthestorm avatar calmofthestorm commented on July 28, 2024

Still not sure how I feel about making one RPC per grammar context that needs checking (there could be quite a few), nor about serving stale data, but I guess we can revisit that if and when it's an issue.

mzizzi avatar mzizzi commented on July 28, 2024

@calmofthestorm what do you think is the ideal situation? It sounds to me like you're leaning towards removing caching and querying context whenever the client pleases. Provided that the RPCs are quick enough, I'm fine with that.

Regardless, I'll start with the low hanging fruit and replace the parts of the server that make sense. We can look at the client/grammar side of things afterwards.

calmofthestorm avatar calmofthestorm commented on July 28, 2024

I think the first step should be replacing get_context in the server, provided we can do so as a drop-in replacement with no regressions (i.e., we can get all the same data). If there are regressions, we should talk more, look around to see if anyone's using that data, etc.

After that, we should revisit the current client-side caching of the context information, and play around with reducing it or eliminating it entirely.

Once we've done all that we can look into replacing the other uses of xdotool. In my opinion this is much lower on the payoff/effort scale, since get_context is called dramatically more often and is much slower than any of the other calls. If it's a drop-in replacement that's faster I don't object; if it's not drop-in we need to evaluate the backwards compatibility angle. I'm fine with breaking things, but only if the payoff's worth it.

mzizzi avatar mzizzi commented on July 28, 2024

As it turns out, shelling out to xdotool for key_press is a bit slow as well.

A simple key_press(key='1', modifiers=('control',), count=5) takes about 0.09 seconds. This means that a simple action chaining 5 keystrokes takes the server roughly half a second to execute. I've come up with an alternate implementation using the libxdo libraries from #105.

Unfortunately I had to submit a PR to the libxdo bindings to get a bug fixed. Pending rshk/python-libxdo#3 I'll create a new branch for aenea with an alternate implementation.

calmofthestorm avatar calmofthestorm commented on July 28, 2024

Where are you getting half a second? Keep in mind that multiple calls to key_press in a single multiple_actions should generally only result in running xdotool once and never more than a small number of times. If you've observed otherwise that's a bug.

It definitely doesn't run xdotool for every key, and if it does that's something I should fix.

mzizzi avatar mzizzi commented on July 28, 2024

I think multiple_actions covers these cases:

Key('1,2,3,4,5').execute()

However, it's common to construct actions like so:

action = Key('a') + Text('%(something)s') + Key('b')
action.execute()

In the first case the keystrokes should be wrapped up in a single multiple_actions call. In the second case we make three calls to the server and shell out on each. :)

That being said... my numbers were misleading. I'm seeing closer to 0.25 seconds from the client to server. Drop this into MacroSystem and run it to test things:
https://gist.github.com/mzizzi/7208f21481cc7b3c72f8
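
If you'd rather not grab the gist, something along these lines inside a rule gives the same kind of number (a sketch; the chained Key/Text action in the comment mirrors the example above):

import time

def timed_execute(action):
    # Wrap any dragonfly/aenea action to see how long a chain takes end to end.
    start = time.time()
    action.execute()
    print('execute() took %.3f seconds' % (time.time() - start))

# e.g. timed_execute(Key('a') + Text('hello') + Key('b')) from inside a rule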

calmofthestorm avatar calmofthestorm commented on July 28, 2024

I'm traveling at the moment with limited internet. I'll try to take another look at this after the US holidays.

mzizzi avatar mzizzi commented on July 28, 2024

Has anyone had a chance to take a look at this or #105?

calmofthestorm avatar calmofthestorm commented on July 28, 2024

Taking a look now. Sorry for the delay -- I wanted to be sure I could take the time to test this thoroughly, since it's a major change with the potential to break things.
