Looks like in each iteration of the Lambda function, it runs only for one ASG, make th

Make Agent Run for all ASG in each iteration, packaging discussions about autospotting HOT 18 CLOSED

leanercloud commented on May 12, 2024

Make Agent Run for all ASG in each iteration, packaging discussions

from autospotting.

Comments (18)

roeyazroel commented on May 12, 2024

I've update the Event Trigger in CW to run every 1m. If my ASG are very dynamic (lots of scaling actions), any reason why shouldn't I keep it in 1m?

Thanks

from autospotting.

cristim commented on May 12, 2024

It should run in parallel for all ASGs in all regions(anything else is a bug), but it only takes one action per ASG, which is intentional in order to reduce the chance of it replacing too many machines at once which may bring down your environment if your application isn't really ready for too many replacements.

Regarding the increased frequency I don't really mind, feel free if it works for you, although at the moment I'd likely need to pay more pennies for that extra bandwidth due to the self-update checks which happen on each run. I still have enough AWS credits for the next few months but once I start paying with my own money I may do something about the currently wasteful self-update mechanism.

I haven't really tested it in such a high frequency mode, so please report any issues you may find.

from autospotting.

roeyazroel commented on May 12, 2024

I've configured it for 1m to "onboard" faster, and then change it back to 5m. It worked perfectly, we had heavy load during night and Spots instances where all over the place working like magic. 👍

In the long term, how do you plan do implement it without depending on your account?
What exactly run on your credits?

Thanks,
Roey

from autospotting.

cristim commented on May 12, 2024

Thanks, it feels great to see such feedback :-)

The plan is to include more in the Lambda zip file if it gets expensive. At the moment it only includes the Python Lambda shim, while the static binary written in golang is downloaded on demand if the latest version is not available in lambda function's /tmp, and some content is downloaded from the Github CDN on every run, which may be costly for them as well and I may not be able to do it for long term. The benefit is it allows users to get updates automatically without performing any CloudFormation stack changes.

The check for determining the latest version and the download on of the binary both require some bandwidth and requests, which show up on my CloudFront bill, the runtime data likely shows up on Github's bill, but as long as the traffic is small they may not care about it. With the 5 min frequency it costs me about 20c/user/month, and since so far I only have a handful of active users and enough AWS credits, that's not a problem, but if this really takes off I may need to do something about it and Github may block it as well.

An easy option would be to include everything I have at build time and stop downloading anything at runtime, but that will break auto-updates which I'm not sure I'd like to do.

A nicer solution may be creating an S3 bucket in user's account, where the binary and some runtime dependencies are updated once a day. If the user doesn't want auto-updates, it could control them by blocking write access to that S3 bucket using a IAM policy change.

from autospotting.

roeyazroel commented on May 12, 2024

I think that I'll prefer to be out of date, and be responsible to update the stack to the latest update rather than paying the on demand prices. Maybe consider to give this decision to the users.

Anyway, I've some ideas. Hopefully I'll be press the fork button in couple of days/weeks. 👍

Thanks a million!
Roey

from autospotting.

cristim commented on May 12, 2024

I agree, as an Ops guy I totally agree that auto-updates are scary and when unsure they should be left to the user, but as a developer back when I was rolling out fixes pretty much on a hourly basis, it was really convenient to just commit and have it auto-update everywhere out of the box without announcing the users to update their stacks.

But considering that the target audience of this tool is mostly Ops'y people, I think dropping auto-updates would actually be desirable once the tool is more or less stable, which as per your feedback, I think may have already happened.

I'll try to come up with a change that bundles everything and removes auto-updates, stay tuned. This entire discussion should actually be part of #6

from autospotting.

cristim commented on May 12, 2024

I just tested and the replacement process is correctly handling multiple AutoScaling groups in the same region concurrently, although only one action is taken per function run per group - according to the expected behavior - so this bug is essentially invalid.

I recently implemented the self-contained packaging method I mentioned before, removing the self-update mechanism entirely, so from now on users would need to update on their own. I will document the update procedure in the Readme file.

This allows users to set any event frequency without much additional costs: only more Lambda execution time and bandwidth consumed while performing API calls, which is negligible or well within the free tier, and it is no longer going to generate any costs on my infrastructure and will no longer misuse the Github CDN.

In 29fcd3c929605 I parameterized the event frequency to let people set any value they like, but I would keep the default of 5 minutes, which I still think it's a good out-of-the-box compromise between speed and safety.

from autospotting.

cristim commented on May 12, 2024

I thought that maybe I should reopen this until I get confirmation from the user that everything is fine :)

@roeyazroel please give this a try when you have some time, and let me know if you run into any issues.

I hope the documentation for software updates is clear enough.

from autospotting.

roeyazroel commented on May 12, 2024

I'm not sure why, but i think when i ran the upgrade to the CF stack, the lambda CW trigger was added again (the same one). Can you check it? Not sure if it was because my CW trigger was in disable during the update as we had some AWS issues.

from autospotting.

roeyazroel commented on May 12, 2024

Another thing - think about using the Github Tags, as the SHA isn't very "friendly" and not easy to "see" that you aren't up to date.

from autospotting.

cristim commented on May 12, 2024

Thanks for testing it.

In theory CloudFormation shouldn't duplicate such resources, unless you created the more frequent event resource outside the CloudFormation stack, but it should take over the edited resource and change it if you just edited it in place. Can you confirm that wasn't the case? I can have a look into the lambda trigger creation tonight, unless you figure it out first.

Interesting idea to use tags, I'm thinking maybe to use the Travis build number as tag name, what do you think?

from autospotting.

roeyazroel commented on May 12, 2024

Should be good enough, tags are more friendly for "human eyes" and for versioning.

Not sure what want wrong after the update, but now it's running at all. still trying to understand why.

from autospotting.

roeyazroel commented on May 12, 2024

False alarm, working fine.

But it's look like that update of the stack does create another trigger in the lambda function.

from autospotting.

cristim commented on May 12, 2024

Do I understand correctly that every update creates yet another event generator that runs the lambda function? This looks to me like a bug in CloudFormation.

from autospotting.

cristim commented on May 12, 2024

I tested the CloudFormation stack updates and the resource is correctly updated according to the given stack parameters, updating the event generator to the defined frequency. In my case I didn't notice any additional CloudWatch event generators being created.

I also implemented support for using Travis CI build numbers for software updates, and I documented this in the readme.

@roeyazroel Please have a look and let me know if this is good enough.

from autospotting.

cristim commented on May 12, 2024

I now understood what you meant by duplicating the event generator after a stack update, I think I just experienced the same issue. The resource doesn't get duplicated, but the Lambda function gets subscribed to the same event generator twice.

It looks much like a CloudFormation issue to me. If I see it again, I'll report it as an AWS support case.

from autospotting.

cristim commented on May 12, 2024

I deleted one of them and the function wouldn't be executed anymore. It recovered after disabling and re-enabling the remaining event in the Lambda function's list of triggers. So I assume having both of them is harmless(although not so nice to see in the console), since only one of them would actually launch the function. But this is still a bug that should be fixed.

from autospotting.

cristim commented on May 12, 2024

I can't see this happening anymore, likely AWS fixed it, so I would like to finally close this issue.

@roeyazroel please reopen if you are still having issues.

from autospotting.

Make Agent Run for all ASG in each iteration, packaging discussions about autospotting HOT 18 CLOSED

Comments (18)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent