Context At Etsy, we've gone all-in on using this repository and th

Thanks for your insight (and speedy reply) <a class="user-mention notranslate" data-ho

Compound API interactions for cookbooks included by `fb_init` are difficult to implement,about facebook/chef-cookbooks

Comments (25)

joshuamiller01 commented on April 28, 2024 2

Internally, compound API interactions are fairly rare for us. We have a few workarounds we employ, some of which are basically the options you discussed:

for a specific problematic cookbook attribute, teach that cookbook to optionally accept a Proc (i.e. a reimplemented one-off handling for Chef::DelayedEvaluator), and update setters to use a Proc. See the only_if API value for fb_timers
use a recipe which runs early in converge together with a chunk of converge time code (whyrun_safe_ruby_block, custom resource, etc)

My favorite option is what we internally call a "settings" cookbook (e.g. "fb_cron_settings"), which has a recipe which goes immediately before the "implementation" cookbook (e.g. "fb_cron") in the init recipe. The "settings" cookbook is the spot where business logic can live, at the last moment before the implementation occurs. A whyrun_safe_ruby_block in the "settings" cookbook can then read from some other cookbook API (set at compile time), and set "fb_cron", and leave no races for new API values to have creeped in between the start of converge and whenever "fb_cron" runs. We use this pattern for other reasons too (independent of compound API interactions), like as a useful hook to validate or mix-in API values.

This might not be a great answer if the stack_etsyweb cookbook is one of many, each of which would need to have handling in the "fb_cron_settings" recipe. But it might work well enough if there's a small number.

Certainly a more general solution using Chef::DelayedEvaluator would be preferable. I haven't worked a lot with this, but I expect us to look at it much more this year. I think it'll solve some of the compound API problems, but not all (e.g the validate / mix-in use case)

from chef-cookbooks.

joshuamiller01 commented on April 28, 2024

two stack cookbooks with a common dependency included in a single hostgroup

I read this situation as: there are two (or more) recipes which are late in the run_list, and they're both including some third recipe (e.g. "fb_apache"), in a way that makes the order of when fb_apache will be included unclear / hard to reason about. That makes API handling (which depends on the ordering) tough.

The best answer here (though perhaps not the most convenient), is to put that third recipe into etsy_init directly, with an API that allows it to be enabled / disabled, and a default state of disabled. Then the hostgroup recipes can toggle on / off as appropriate, neither influencing the order of when fb_apache is set up.

This reduces the likelihood of compound API complications, and creates the natural hook for a future "fb_apache_settings " (to handle compound APIs interactions if needed). Arguably more importantly, it codifies the order of when fb_apache runs, which means you can explicitly handle dependencies. While dependencies weren't part of your question, it inevitably becomes a big problem in a dynamic and/or sufficiently complicated environment, and the more codified they are, the easier your life will be. Put into more concrete example terms: fb_apache will need to happen after some things are set up already (like configuring the mount / filesystem it will run on), and some things will need to happen after fb_apache has been done.

from chef-cookbooks.

ericnorris commented on April 28, 2024

Thanks for your insight (and speedy reply) @joshuamiller01!

for a specific problematic cookbook attribute, teach that cookbook to optionally accept a Proc (i.e. a reimplemented one-off handling for Chef::DelayedEvaluator), and update setters to use a Proc. See the only_if API value for fb_timers

This makes sense, but I think our biggest reservation is that we'd probably need to make changes in many Facebook cookbooks, and we felt like that might be a lot of effort to upstream, nor would it necessarily be desirable for you all to merge (especially if compound API interactions are rare for you).

My favorite option is what we internally call a "settings" cookbook (e.g. "fb_cron_settings"), which has a recipe which goes immediately before the "implementation" cookbook (e.g. "fb_cron") in the init recipe.

This is pretty interesting! To make sure that I understand this correctly, do you mean something like this?

In cookbooks/fb_init/recipes/fb_cron_settings.rb:

whyrun_safe_ruby_block 'set foo cron' do
  block do
    if node['fb_init']['foo_cron']['enabled']
      # ...
    end
  end
end

I think you're right that we might want to avoid this pattern as-is since we might have a large number of these, considering that compound API interactions may be more common for us.

We use this pattern for other reasons too (independent of compound API interactions), like as a useful hook to validate or mix-in API values.

If you have a chance, I'd be curious to get a more specific example of this! I'm having a hard time imagining a scenario for this at the moment.

Certainly a more general solution using Chef::DelayedEvaluator would be preferable. I haven't worked a lot with this, but I expect us to look at it much more this year.

That's good to know, and if you're interested at all in talking more about this in the future, you can reach me in this issue or at my first initial and last name @ the company I work for!

I read this situation as: there are two (or more) recipes which are late in the run_list, and they're both including some third recipe (e.g. "fb_apache"), in a way that makes the order of when fb_apache will be included unclear / hard to reason about. That makes API handling (which depends on the ordering) tough.

Yeah, and that these also present the same compound API interaction issue as one with a recipe included in fb_init. To be fair, I expect this will be a rare or entirely non-existent occurrence.

Your fb_foo_settings suggestion to solve this and the fb_init issue makes sense to me, but I'm attracted to the idea of keeping fb_init as lean as possible. Additionally, I think that some of the other possible solutions (like using Procs / DelayedEvaluators, or making a new whyrun_safe_ruby_block-esque thing that works regardless of order) have a nice property that other developers can follow a single "rule" - that you need to use X for compound API interactions - instead of multiple, like "use whyrun_safe_ruby_blocks here, unless the ordering looks like this, and then instead do this other thing".

I'm not criticizing the fb_foo_settings approach though, it may just be that the way we're structuring our cookbooks is different enough that we're running into this problem more. Additionally, we're expecting all of our VM-based infrastructure developers to migrate their systems over to this greenfield repository, so we may also be over-indexing on reducing the support burden of helping people when they run into an issue like this.

As I write this and think about our own patterns, I also wonder if giving up the "last write wins" mindset for stack cookbooks only might be a better overall solution. While we've designed them to be reusable, we don't necessarily need them to be layerable, like we would with a fb cookbook. That is, we want fb_apache to be configured all throughout the compilation phase, but our stack cookbooks are always going to be configured and then immediately included, e.g.:

# in cookbooks/hostgroup_foo_api/recipes/default.rb

node.default['stack_foo']['bar'] = ...

include_recipe 'stack_foo'

...so maybe it's okay for us to read stack_foo attributes non-lazily, since we don't expect anything other than the hostgroup_foo_api cookbook to have an opinion on how the stack_foo cookbook should be configured.

Anyhow, I deeply appreciate you taking the time to answer; I realize I've written a great deal and this isn't exactly an issue with this repository. Feel free to close this issue, and like I said above I'm open to chatting more here, or via email. Thanks again!

from chef-cookbooks.

joshuamiller01 commented on April 28, 2024

do you mean something like this?

Basically, yeah; we make a cookbook named fb_cron_settings, with a recipe like you outlined, and then in fb_init we have:

include_recipe 'fb_cron_settings'
include recipe 'fb_cron'

more specific example of this

Validation can be things like "the API value for some directory said 777 permissions, but here at company X we don't like that cause we care about our security, so we fail if we get asked for that". Mix-in might be something like "for every requested cron job, we'll see if there's a MAILTO, and if not, add one". Or maybe there's some API entry we really want and we don't want to allow somebody to delete it (basically a hook for "the last write").

but our stack cookbooks are always going to be configured and then immediately included,

Yeah - at the end of the day, if you carefully handle the ordering, you don't have to play by the 'write-at-compile, read-at-converge' rules. It really only matters to write before the implementation, and read after all the writes (if you're chef-savvy enough and have a small enough team writing the code and who understand which bits are run when).

from chef-cookbooks.

ericnorris commented on April 28, 2024

basically a hook for "the last write"

That makes a ton of sense, thank you.

if you carefully handle the ordering

Right, we've discussed this a bit internally and will think more about whether we could statically enforce this for our stack and hostgroup cookbooks, so that they have to look something like what I shared above. That said, we're still on the fence about it since it's nice to just uniformly apply the "Facebook philosophy" everywhere, and to know that it will work (ignoring the compound API stuff). Also, it doesn't solve the problem for non-stack cookbooks (like interactions between fb cookbooks), of course.

Back to original suggestions though, I'm curious what you all think of implementing something like how the fb_helpers_reboot resource works in fb_init:

# in cookbooks/fb_init_sample/libraries/node.rb

class Chef
  class Node
    def compound_api_interaction(&block)
      default['fb_init']['compound_api_interaction'] << block
    end
  end
end

# in cookbooks/fb_init_sample/recipes/default.rb
# *before anything else*

whyrun_safe_ruby_block 'process compound API interactions' do
  block do
    node['fb_init']['compound_api_interactions'].each(&:call)
  end
end

# now it is okay to include_recipe 'fb_cron', etc.

...and then when necessary, a recipe could do something like:

# in cookbooks/fb_foo/recipes/default.rb

node.compound_api_interaction do
  node.default['fb_cron']['jobs']['foo_cron'] = {
    # ...
  }
end

There are many different ways this could be implemented, however, so I'm not particularly attached to the literal code above - it could be something like calling FB::Init.compound_api_interaction, which would then append to a class variable that fb_init could then iterate over, etc.

If this idea sounded at all reasonable to you and the team, I'd be happy to submit a PR where we could nitpick the implementation details. We'd prefer to be aligned with this repository and the patterns inside where possible, but if this isn't something you'd like to see upstreamed that's fine too.

from chef-cookbooks.

joshuamiller01 commented on April 28, 2024

def compound_api_interaction(&block)

This is clever and I think it'd be a nice option; I support the endeavor.

But to be clear; it wouldn't solve for the validation / mixin use case, because there's a big gap between the very start of converge phase and the point when the actual implementation recipe runs. But the general case you're pursuing - "drive an API write based on reading another API value" should still work.

Another gotcha while I'm thinking about it; if one of the compound_api_interaction blocks reads from one API, and another of the compound_api_interaction blocks writes to that API, we're back to the original problem. But that's hopefully a corner case, which documentation is enough to cover.

@dafyddcrosby @jaymzh feel free to weigh-in.

from chef-cookbooks.

ericnorris commented on April 28, 2024

This is clever and I think it'd be a nice option; I support the endeavor.

Thanks!

But to be clear; it wouldn't solve for the validation / mixin use case, because there's a big gap between the very start of converge phase and the point when the actual implementation recipe runs.

Ah, yeah, apologies for not being explicit in my intentions but you are entirely correct; I would still expect that use case to need the fb_foo_settings pattern.

Another gotcha while I'm thinking about it; if one of the compound_api_interaction blocks reads from one API, and another of the compound_api_interaction blocks writes to that API, we're back to the original problem.

That sounds right to me, and I agree that it's a corner case that would be worth documenting - if I'm thinking about this correctly, compound compound API interactions are unsafe, or at least depend on careful ordering.

Maybe more explicitly, it's okay for a cookbook to read from its own API, but not necessarily another's (e.g. it can read from node['fb_foo'] but shouldn't read from node['fb_cron']), and it's okay to write to a non-compound API (like fb_cron), but not a compound API. In those cases, the ordering would matter.

@dafyddcrosby @jaymzh feel free to weigh-in.

Thanks for tagging others in! I already have a rough working copy, so I may file a PR before hearing back from them; I'll link back to this issue if I do so.

from chef-cookbooks.

jaymzh commented on April 28, 2024

Assuming i understand this, I'm a 👎 on this, as it currently stands.

In the fb_foo_settings case, the "Chef team" - whoever owns the core APIs - get's to choose: We allow/expect these compound interactions. That's really important because these sorts of things can break the entire last-writer-wins API model.

Consider today this case:

etsy_init - creates node['fb_cron']['jobs']['foo'] which say, checks on some common $badthing that happens, say looks for some tempfile to cleanup
esty_service_bar - overwrites node['fb_cron']['jobs']['foo'] because despite core libraries doing $badthing in $badplace, bar has to move it to $newbadplace
etsy_service_womp - modifies node['fb_cron']['jobs']['foo'] further because it uses bar, but writes more data so wants it to happen more often

Lets assume that womp looks a lot like bar, and so womp includes bar, and then tweaks it. Today, this works.

Now, lets say that we implement this. Now bar can come along, and do a complex interaction which ovewrites node['fb_cron']['jobs']['foo'] completely. Now womp can't tweak foo anymore.

In the event that bar is owned by the core team, this might be OK, because perhaps it's some critical security thing we always need running.

But, adding this API allows anyone anywhere to be the last-writer even after the real-last writer. It's re-creating node.override, which we don't use for this exact reason.

I might be missing something, and if I am, my apologies.

That said, adding more lazy-style attributes to core cookbooks is something I'm all about. I started that trend with fb_crons only_if, and it's been used in tons of great places to enable great things.

from chef-cookbooks.

joshuamiller01 commented on April 28, 2024

I think the intent behind compound_api_interaction is as a generic enablement mechanism, where today there is none (well, none that are both generic and correct). That's pretty useful.

But I'm mostly thinking about this as a a relief valve for the one-offs API complications which come up. If compound API interactions were actually commonplace in your environment, and you were building layers on top of that, I can definitely imagine it turning into a node.override game.

Still - AFAICT, the available general-purpose options are either to make exceptions to last-writer-wins (which is effectively what all these are discussing), or live without compound API interactions.

from chef-cookbooks.

jaymzh commented on April 28, 2024

Still - AFAICT, the available general-purpose options are either to make exceptions to last-writer-wins (which is effectively what all these are discussing), or live without compound API interactions

I don't think it's that black and white. I think complex API interactions need to be handled EITHER BY the underlying API (by making it lazy), which still preserves a last-writer wins, but gives access to "fall thru" OR they need to be explicitly added by the "administrator" (by fb_foo_settings) which again, by adding to fb_init, you're controlling what can come and stomp all over your config.

from chef-cookbooks.

ericnorris commented on April 28, 2024

Thanks for taking a look @jaymzh!

I wonder if your example might be addressed by having etsy_init expose an API for configuring the cron, since it expects to install the cron and it's okay with etsy_service_bar and etsy_service_womp making changes anyways. That is, in your example, I'd expect this to use the fb_cron_settings pattern, and it'd use a whyrun_safe_ruby_block to install the cron before fb_cron converges.

Even if we slightly tweak your example by dropping the etsy_init part though, your point still stands: if etsy_service_womp includes etsy_service_bar, today it could tweak the node['fb_cron']['jobs']['foo'] setting that etsy_service_bar installs.

While that does work today, the tradeoff is that etsy_service_bar couldn't itself expose an API for tweaking node['fb_cron']['jobs']['foo'], which is essentially the problem we're running into. Exposing an API makes sense if etsy_service_bar is abstracting away the "bar" service - maybe the "bar" service can be deployed as an internal API cluster, a public facing API cluster, a public facing web cluster, etc., and the etsy_service_bar cookbook needs to change the foo cron to match.

@joshuamiller01 noted that way to solve this would be to have the Chef team define a etsy_service_bar-specific compound API interaction in etsy_init, but I think that's somewhat undesirable for the reasons I mentioned above. Furthermore, this would also have the same issue that you described - etsy_service_womp couldn't tweak the foo cron because etsy_init would sneak in the last write at the convergence phase, unless etsy_init defined a full API for tweaking the cron at run time.

Which I think is the answer to your point about etsy_service_womp in a hypothetical world with a compound_api_interaction pattern: if you are defining a compound API, you should generally make it so that the compound API allows for tweaking the thing that you're going to write as much as is reasonable.

That said, adding more lazy-style attributes to core cookbooks is something I'm all about. I started that trend with fb_crons only_if, and it's been used in tons of great places to enable great things.

I am for this approach as well, but there's two things worth noting about this:

It may be tricky to write code that can expect Procs everywhere - in our most direct case we'd want the command sub-attribute for node['fb_cron']['jobs'] entries to be a Proc, but theoretically each sub-attribute could be a Proc, or maybe even the entire entry hash could be a Proc... I suppose you could draw the line somewhere, but this would involve making a lot of changes in this repository and deciding what is Proc-worthy and what isn't.

Furthermore, sometimes you'd have other difficult decisions, like what if you wanted to expose a compound API that optionally added a NTP server to the fb_chronyd server list array? If you just append a Proc, then fb_chronyd needs to handle the case when a child Proc returns nil or some other sentinel value to indicate not to add anything at all. If you make the whole thing a Proc, you've broken the API and so now others cant append unless you had some sort of hybrid ProcArray thing.

This would increase the burden of an fb_foo cookbook writer, which is already decently high, and so I think that, to me, it's not a clear-cut solution to the compound API issue. We'd likely need to propose a lot of changes to fb cookbooks in order to support this, and there could be differing ideas on what is Proc-able and what wasn't.
It kinda also has issues of its own, like if etsy_service_bar changed the command to a Proc, then etsy_service_womp can't append to the command string anymore (if that was its goal, like adding an extra flag or something). It could overwrite it entirely, but then it'd be duplicating the logic from etsy_service_bar. etsy_service_bar could expose an attribute to customize the command, but it could also do so in a compound_api_interaction world.

If compound API interactions were actually commonplace in your environment, and you were building layers on top of that, I can definitely imagine it turning into a node.override game.

I agree that if this was becoming any more complex than the examples I've given, we've gone too far.

Still - AFAICT, the available general-purpose options are either to make exceptions to last-writer-wins (which is effectively what all these are discussing), or live without compound API interactions.

I agree, and we really don't want to live without compound API interactions.

I don't think it's that black and white. I think complex API interactions need to be handled EITHER BY the underlying API (by making it lazy), which still preserves a last-writer wins, but gives access to "fall thru" OR they need to be explicitly added by the "administrator" (by fb_foo_settings) which again, by adding to fb_init, you're controlling what can come and stomp all over your config.

I hopefully made a case for why making the underlying API even lazier (by expecting Procs) is difficult, and also that I don't think you'd necessarily be giving up control over what can "stomp all over your config" when you allow for compound APIs (etsy_init could always win with the fb_foo_settings pattern, or you could define your compound API to allow for tweaking), and why fb_foo_settings isn't satisfying for a "bar" service-specific API, but I think it's worth noting that I don't think this is an either / or scenario between Procs and having compound_api_interaction.

I'd argue that we kind-of have a pseudo-compound_api_interaction behavior today, e.g. fb_swap writes to fb_fstab, and that swap can't be tweaked by any other cookbook. This only works because of the ordering in fb_init, and formalizing compound_api_interaction would (I think) make that interaction more clear - in fact it'd now work regardless of fb_fstab ordering - and have no worse effect than it does now. That said, in parallel, things could become more Proc-aware, which would reduce the need for using a compound_api_interaction config. (Though as I write this, I realize that the fb_stab link is another example of needing to append a Proc, which I think is kinda tricky semantically.)

Anyhow, apologies for the wall of text. I think that the facebook/chef-cookbooks philosophy is a powerful and pretty game-changing (for us) pattern. We'd like to try and stay as aligned as possible, so I again appreciate both of your time.

from chef-cookbooks.

jaymzh commented on April 28, 2024

Re: lazy stuff

I think only leaf-node things can/should ever be lazy. Then, much like an only_if, it can be a proc or value. Yes, that does mean that if someone writes a Proc, someone would have to think carefully about what to do, but this is already a requirement we put on a caller. And they have many options to build on that proc, e.g.:

old_oi = node['fb_cron']['jobs']['foo']['only_if']
node['fb_cron']['jobs']['foo']['only_if'] = proc { old_oi.call && some_test }

I don't think making an entire job a proc makes much else, it's like allowing something to be a hash or an array, things go bonkers.

But I think it would be pretty trivial to encapsulate our only_if logic into something like DelayedEvalutor, and just use it a lot, throughout our cookbooks.

fb_init Ordering

We've generally found there's a handful of core APIs we expect other APIs to be able to write to. Cron is one of them.

Swap is a special case, we expect if you're using fb_swap, it owns swap, nothing else, does. It just happens to leverage fb_fstab for that. It would be crazytown to debug if you set things with fb_swap that got modified by a different API. Almost any such interaction is crazytown, and so each one is built with care.

It IS possible to build encapsulations of compound APIs. etsy_custom can internally have etsy_foo and etsy_bar and make sure they're ordered correctly so that they can just be included by a downstream customer. Of course, then they can't really be core cookbooks.

compound_interactions

The problem with this, is it provides a generalized override mechanism. There already is one - it's node.override, and that's why we hard-ban it at Meta (linter's won't let you commit with it, and there are no exceptions). We could create a similar rule and prevent anyone from every calling node.compound_api_interactions, but that feels a bit like chasing the same tail we chased before.

But I do see the problem you're trying to solve, so allow me to spitball a bit.

I wonder if we could have some sort of white-list in the Chef::Config, for each cookbook, which cookbooks were allowed to be compounding APIs. This would of course mean it's a bit less flexible for you, but it would at least add some safety to it. Like maybe:

fb_pattern.compound_apis = {
  'fb_cron' = > [
    # list of cookbooks that can override it
  ],
  ...
}

node.compound_api would then need to look at the call stack. Which is a ugly in its own right.

And yes, I'm aware that Chef::Config can be changed from within a run, but at least at that point you're OBVIOUSLY breaking the rules. I have to noodle on how I'd feel about that, but I'd definitely feel less bad about it.

We could even have an fb_pattern.enable_compound_apis in the config where people could wholesale turn it off if they wanted.

Just thinking outloud here.

One last thought.... in many cases, every place someone uses node.compound_api, they're probably working around the lack of a feature in a core cookbook. While there are cases where lazy, only_if, and other features may not be enough, my instinct is that not only would it cover 75% of cases, but it would also just generally help lots of people doing a variety of things today who don't necessarily need the compound API interactions. So even if we do go down that route, it feels better to me to at least start with the low-hanging fruit of adding lazy support for cronjob attributes, only_if almost everywhere, etc.

And finally, while I stay heavily involved in this as the original author of most of this... I am not at FB/Meta anymore and don't have final say.

from chef-cookbooks.

ericnorris commented on April 28, 2024

I think only leaf-node things can/should ever be lazy.

I am curious about how to handle things like teaching the fb_chronyd server list array to handle Procs - not literally just this of course, because it's maybe weird to want to optionally add an extra server, but the general pattern of "the API is an array and I'd like to optionally append something to it".

Sure, fb_chronyd could expose a separate API with some sort of only_if thing for each server list entry, but I'm nervous about putting a burden on a fb cookbook developer of needing to think of every possible way someone could want to lazily configure the API.

fb_apache might be another example, where every virtual host would need an only_if, and every sub-attribute could be a Proc. Maybe this isn't insurmountable, but it would potentially add a decent amount of noise.

I'm also interested in how this would apply to APIs where they just take a config that is somewhat transparently rendered into a file, like (but not limited to) fb_consul. It's not immediately clear to me where the leaf node is, or how a Proc would allow someone to optionally add a chunk of config.

Swap is a special case, we expect if you're using fb_swap, it owns swap, nothing else, does.

I think there's an interesting thing here. In your first foo cron example - with etsy_init, etsy_service_bar, and etsy_service_womp - I think that etsy_init owns the foo cron, and so it can and should take the initiative to force the foo cron to exist, but expose an API to change it in the ways that it deems reasonable (like making it happen more frequently). This should be true with or without a compound_api_interaction option, but I'll admit it's slightly more work in a world where compound_api_interaction exists.

In my modified example - with just etsy_service_bar, and etsy_service_womp - I think of it the same way as you described for fb_swap. etsy_service_bar owns the foo cron, and nothing else does. If we want the foo cron to be further configurable, it should define an API for it, just like what fb_swap would do if it wanted to e.g. allow for customizing the priority of the swap file.

etsy_custom can internally have etsy_foo and etsy_bar and make sure they're ordered correctly so that they can just be included by a downstream customer.

One concern I have - and it may not apply in your environment - is that I can imagine a situation where it is "impossible" to have a working configuration with just ordering alone. That could be like the situation I mentioned in the original post of having two stack cookbooks that both include fb_apache, each of which exposes a compound API. In that example, if a host wanted to include two "stacks" (which I believe is reasonable), the latter stack would not be able to make changes to fb_apache.

One answer would be to move fb_apache to etsy_init, another may be to remove fb_apache from both and make the caller responsible for including it, but it's somewhat unsatisfying that these can't compose easily.

The problem with this, is it provides a generalized override mechanism.

I think I understand this problem generally, but I'm also having a hard time re-articulating it. Going back to your example again, the concern is that etsy_service_womp can't make changes to the foo cron installed by etsy_init that is modified in a compound way by etsy_service_bar.

On the other hand, sometimes it's okay that something can't be overridden, like with fb_swap, and that might be the case here with etsy_service_bar - if it's defining a compound API that configures the cron, it is taking ownership of the cron. If it's not defining a compound API and just configuring the cron with a fake compound_api_interaction so that no one else can change the cron, that's likely bad. Ignoring that though, there's a world in which etsy_service_womp needs to deal with the fact it can't make a change there isn't an API for.

In that world, one way that etsy_service_womp could get around the lack of API is if it also registered a fake (since it wouldn't depend on an etsy_service_womp API) compound_api_interaction to "override the override", so to speak. That would also be bad; while being able register a later compound API interaction would still follow the letter of the "last-write-wins" rule, it maybe doesn't follow the spirit of it, which was to avoid the need for node.override (and probably all of the other numerous levels).

So I'd summarize the possible bad scenarios as:

A cookbook defines a fake compound API to set attribute X solely to prevent a later cookbook from changing it.
A cookbook defines a compound API for a good reason to set attribute X, but this still prevents a later cookbook from changing it. (This one could also be a good thing, depending on the circumstance.)
A later cookbook defines a fake compound API to modify attribute X solely because an earlier cookbook used a compound API to set attribute X.

But I do see the problem you're trying to solve, so allow me to spitball a bit.

I appreciate that you can understand our situation! I'd like to give just a bit more context around how we got here, if it helps: we've taken an approach of categorizing our cookbook implementations into "Facebook-style, un-opinionated, potentially upstream-able", and then building on those in "Etsy-opinionated wrappers" (which we also write in the Facebook style, but we never expect to upstream them). We find that this categorization is really useful to make sure stuff is fully configurable, rather than being codified to whatever it was in our old Chef repository.

So we would have an fb_memcached, for instance, that we could theoretically upstream, and then we'd have a stack_etsyweb_memcached, which "drives" the fb_memcached cookbook by configuring it in a way that we think is suitable for our monolith, and then including it.

Since stack_etsyweb_memcached is also a cookbook, it's natural for it to have its own API, which may result in different writes to various fb cookbooks, etc., and hence compound API interactions are somewhat common.

I might be repeating myself from the original post, but I thought it was worth emphasizing that we've been approaching development in this way.

I wonder if we could have some sort of white-list in the Chef::Config, for each cookbook

I'm not familiar with using Chef::Config for generic configuration, but I think I understand what you're suggesting.

On the other hand, I'm wondering if there's a more direct way to prevent the "bad" things I tried to summarize above, but it almost seems like a catchable-in-code-review-only thing. This isn't a new problem, either, because - as much as I wish this wasn't true - a lot of writing a "good" Facebook pattern cookbook is not something you can statically analyze.

I'm somewhat partial to the idea that we could come up with a rule for this rather than rely on an allow list, or to just rely on code reviews, but I don't think I feel overly strong about this.

One last thought.... in many cases, every place someone uses node.compound_api, they're probably working around the lack of a feature in a core cookbook.

I agree, and after thinking more about your point, I think you could also express the compound API issue as a result of not having the ability to say "when etsy_service_bar API attribute qux changes, write out node['fb_foo']['xyzzy']". That is, the only reason etsy_service_bar wants to read qux lazily is that something later might want to change it, and so it needs to defer the read, but that either a) makes the node['fb_foo']['xyzzy'] attribute write impossible due to ordering or b) makes it so that no one else can override the node['fb_foo']['xyzzy'] attribute.

If etsy_service_bar could instead re-compute its changes to node['fb_foo']['xyzzy'] attribute when qux changes, I think that'd follow both the letter and the spirit of "last-write-wins" - etsy_service_womp could write to either node['etsy_service_bar']['qux'] or directly to node['fb_foo']['xyzzy'] last, and it would win. If a etsy_service_kind_of_womp cookbook that included etsy_service_womp came long, it could also make changes to either node['etsy_service_bar']['qux'] or node['fb_foo']['xyzzy'] to write over whatever etsy_service_womp (and etsy_service_bar) did.

In my earlier research on this topic I did find https://github.com/Annih/chef-updatable-attributes, which essentially is what I described above, but I'd have to think more about the approach and consider if there was a better way to implement it.

Maybe having a good implementation of something like this makes our discussion so far moot? Does this address a generic compound API interaction pattern and avoid the concerns you brought up?

And finally, while I stay heavily involved in this as the original author of most of this... I am not at FB/Meta anymore and don't have final say.

I appreciate the humility, but I definitely value your opinion! So thanks for giving this your consideration. (Also, apologies for this absolute unit of a comment.)

from chef-cookbooks.

jaymzh commented on April 28, 2024

Sure, fb_chronyd could expose a separate API with some sort of only_if thing for each server list entry, but I'm nervous about putting a burden on a fb cookbook developer of needing to think of every possible way someone could want to lazily configure the API.

Today every fb_cron job and fb_timers job has an only_if.

We could make all the attrs lazy, and I'm cool with that, but also, you could work around the lack of that but having a few entries with the appropriate only_if procs.

putting a burden on a fb cookbook developer of needing to think of every possible way someone could want to lazily configure the API.

This is a good point. One of the impetuses behind my design of the FB pattern was that we didn't want to have to think of all possible combinations. Resources, today, often do. They have awful 1-1 mappings of properties to item-in-config-file which constantly have to be kept in sync with the upstream software, and I just think that's The Wrong Way(TM).

So you're right, we don't want to go too far down that path. That's a super convincing argument.

I'll come back to this.

In my modified example - with just etsy_service_bar, and etsy_service_womp - I think of it the same way as you described for fb_swap. etsy_service_bar owns the foo cron, and nothing else does. If we want the foo cron to be further configurable, it should define an API for it, just like what fb_swap would do if it wanted to e.g. allow for customizing the priority of the swap file.

Welllllllll I don't think these are quite the same.

I'm not saying "swap owns this fstab entry" - I'm saying "swap owns swap, which has nothing to do with fstab, but fbstab happens to be the way it's configured, which is this weird side-effect of unix history"

When I wrote fb_swap originally I contemplated having an init script swapon it, and keeping it out of fb_fstab (and /etc/fstab) entirely, and then people either used fb_swap directly, or used fb_fstab to make swap, but not both. I was then going to have fb_fstab bail out if it had swap entries and it detected you were using fb_swap. This was all designed to keep these separate. I also toyed with fb_swap only creating swapfiles, and it was your job to put it in fb_fstab.

But ultimately since we owned init and could own the order, it made sense to just have one build on the other since I could control the runlist order and be done with it.

In a systemd world, I might use systemd_unit directly and bypass fb_fstab, but this was all written way before that. Or usr the generator thing I mention below.

You're example is different. You're not trying to own, say, swapiness, which by some weird coincidence happens to get configured through fb_sysctl.... you're just making a cron. So make the cron with some defaults, and let people fuck with it later.

Actually I have a better (maybe) solution for that that we have often used:

etsy_bar could have a Esty::Bar.gen_cronjobs() which etsy_womp can call, pass in whatever data it wants, and the crons get jammed into fb_cron. Depending on how you'd like it to work, each call would either overwrite all bar-related crons, or merge the new options into bar-related crons, or add to bar-related crons. This could also be configurable in the parameters to gen_cronjobs. This provides several benefits:

etsy_bar can provide it's own "API" for generating crons still
If there's 5 later cookbooks, they can all still continue to have last-writer-wins semantics either by touching fb_cron directly, or be using gen_cronjobs

We do that a lot, actually, and it works quite well (for many cases).

On the other hand, sometimes it's okay that something can't be overridden, like with fb_swap

The model states this should only ever be true with core cookbooks, and never ever outside of it. At least... that's the theory. :)

Sure, fb_chronyd could expose a separate API with some sort of only_if thing for each server list entry

Almost every time I've EVER made an array in an API I've regretted it. Here's another.

So I'd summarize the possible bad scenarios as:

Precisely. Well summarized! Much better than my summary, thanks.

So we would have an fb_memcached, for instance, that we could theoretically upstream, and then we'd have a stack_etsyweb_memcached, which "drives" the fb_memcached cookbook by configuring it in a way that we think is suitable for our monolith, and then including it.

This is exactly what we use fb_foo_settings for. And I think that's the better approach.

In fact, at one point I thought of making a run-list generator. You'd pass in something like:

runlist = %w{
fb_swap
fb_fstab
fb_cron
})

gen_runlist('etsy', runlist)

And it would do, roughly:

def gen_runlist(prefix, runlist)
  runlist.each do |item|
    _, name = item.split('_', 2)
    wrapper = "#{prefix}_#{name}_wrapper"
    if cookbook_exists?(wrapper)
      include_recipe wrapper
    end
    include_recipe item
  end
end

I actually think that's the right way to do (most of) this.

That solves the etsy_stack_memcached problem, but it doesn't solve the "etsy_unrelated_service wants to own some crons" problem. How much of each do you have?

"when etsy_service_bar API attribute qux changes, write out node['fb_foo']['xyzzy']".

Hehe, you want React, but for Chef. :)

OK coming back to your point about not having to know everything that someone might ever want to do, I agree.

My take is roughly:

Way more stuff should have only_if (like every virtualhost in fb_apache, for example). Where/when should be pretty clear.
Some more stuff should have lazy support, exactly where/how is unclear, but some if them will be obvious
We should make wrapper-cookbooks a 1st-class citizen with something like the gen_runlist above
We should better document/surface the generator-method-API pattern as a way of dealing with intersecting APIs

If we still have holes after that, we should do compound_api, but with handles to turn it off/whitelist it.

from chef-cookbooks.

jaymzh commented on April 28, 2024

Also, apologies for this absolute unit of a comment.)

No apologies necessary, I much prefer it, it helps ensure a productive discussion, rather than just "well no I need it!" - the more detail the easier it is to build solutions that work well in the long run.

from chef-cookbooks.

ericnorris commented on April 28, 2024

So make the cron with some defaults, and let people fuck with it later.

Yeah, I see your point, and that is why this whole pattern is pretty attractive - you can make all sorts of decisions but they don't end up getting set in stone, because you can always change it later without having to jump through hoops (like overrides or rewriting core cookbooks, etc.).

etsy_bar could have a Esty::Bar.gen_cronjobs() which etsy_womp can call, pass in whatever data it wants, and the crons get jammed into fb_cron.

I was this close to suggesting method calls that write to node attributes as a solution in my last message, and now I wish I had, haha.

That solves the etsy_stack_memcached problem, but it doesn't solve the "etsy_unrelated_service wants to own some crons" problem. How much of each do you have?

I think I get the gen_runlist implementation, but I don't follow how it solves the etsy_stack_memcached problem of defining a compound API, apologies. Do you happen to have a slightly more concrete example?

Hehe, you want React, but for Chef. :)

Ha, fair point. There's two reasons why I didn't end up suggesting a method call, which kinda ties together with how I arrived at a React-like suggestion:

A stack may be more than just writing fb attributes - it may include some one-off resources, etc., so it feels like it should be a cookbook with recipes and a library class, but not just a library class. I don't think you're suggesting this, but the point is that it's natural to write a cookbook here. So far, the way to drive a cookbook is to set attributes that it lazily reads, not to call a method it calls internally with different parameters.

More concretely, it feels inconsistent to say "include this stack_etsyweb_memcached cookbook, but instead of using node.default to configure the foo part of it like you would with other cookbooks, call Etsy::Stack::EtsywebMemcached.configure_foo() with different parameters".
A stack could have an attribute that it uses throughout, both in resources but also in compound API writes, and so you might have to both set an attribute and call one or methods - imagine if you had to set node.default['stack_etsyweb_memcached']['foo'] = 'bar' and also needed to call Etsy::Stack::EtsywebMemcached.configure_all('foo').

There's also a property of writing to a node attribute API that I think we could put aside for now but I want to mention anyways - if you write to an fb_foo cookbook that isn't in the run list, nothing happens. I'd guess that most of the time that's a bug, but it is kinda nice for a "cooperating but non-dependent" cookbook. On the other hand, calling a method like Etsy::Stack::EtsywebMemcached.configure_foo() would actually have an effect - it could write out a cron, for example. I think this is kind of niche though, so I'll move on.

Back to the original point: since it feels most natural to write a stack as a cookbook following the Facebook pattern, it sticks out that configuring the stack would require calling a method, instead of (or in addition to) changing an attribute. And so in the spirit of keeping attribute writes as the only API, being able to define some attributes in such a way that they immediately compute downstream attribute changes would allow for both writing to other APIs and to keep the last-write-wins mindset, without needing "overrides".

Entertaining this point just a little bit more, I could imagine something like:

# in cookbooks/stack_foo/attributes/default.rb

default['stack_foo'] = {
  'foo' => 'bar',
  'baz' => CompoundAPI.new(StackFoo::handleBaz, {
   'baz-sub-attribute' => 42,
    # ...
  }
}

...and so setting node.default['stack_foo']['baz']['baz-sub-attribute'] would get intercepted by StackFoo::handleBaz, which would handle writing all of the compound API interactions that depended on it. baz-sub-attribute could also be used lazily for things like resources, etc, and would work as it did before.

I can understand that this is getting into decently complicated territory, but I can't think of any other way to maintain the consistency of a last-write-wins, compile-time node attribute API. Open to other ideas though!

My take is roughly:

I think I agree (putting aside my conviction about React for Chef ™️), and one thing that I'd like to add is: we should have some way of helping devs write correct wrapper cookbooks with guardrails, e.g. static analysis.

We've attempted to do that at Etsy by using Rubocop to enforce that node attribute reads in cookbooks must be lazy, which works for "core" cookbooks when we can re-arrange the ordering, but unfortunately as this issue reveals this doesn't work in "wrapper" cookbooks. Otherwise though it's been great, since it helps to drill in the Facebook pattern to developers who aren't familiar with it, and of course even core maintainers can make mistakes.

So if there was any way to guide people to writing wrapper cookbooks correctly, whether the solution is moving stuff to method calls or using a magic reactive API, I think we'd be set. Maybe this can be done purely with Rubocop, or maybe there can be some sort of on-rails wrapper class / cookbook / whatever that a dev can extend?

If we still have holes after that, we should do compound_api, but with handles to turn it off/whitelist it.

I haven't fully thought this out yet, but if wrapper cookbooks have their own solution, it may be that the only time we need node.compound_api_interaction is for core cookbooks, to allow them to be ordered any way that is appropriate instead of by their compound API dependencies. At that point, you may say it's not necessary and careful ordering of the core cookbooks is the way to go, and I'd probably accept that even if I thought it was nice "sugar".

It is kind of a shame though; node.compound_api_interaction is a pretty simple lift, and it does immediately solve our problems, even if there's a chance it'd introduce new ones. I want to keep thinking on this to see if there's something else that could be done to maintain node.default as one way to configure everything, but for now a reactive API (in any form, not just literally the one I suggested) for compound interactions is the only thing that comes to mind.

from chef-cookbooks.

joshuamiller01 commented on April 28, 2024

What a great discussion!

Some lightly-organized thoughts on my part.

Some of the discussion here is conflating 'settings' cookbooks with 'wrapper' cookbooks. In this context - dealing with compound API problems - 'settings' cookbooks are an opportunity to tactically plug a particular compound API gap. This works for Meta internally - because we don't do 'wrapper' cookbooks. In this discussion, a 'wrapper' cookbook will write to a bunch of APIs, but which exposes its own API. Since this new API will be the thing driving all API settings, it turns every API attribute into a compound interaction. 'settings' cookbooks don't help here (unless you're rewriting every API value, in which case we're back where we started). If you imagine lots of wrapper cookbooks, after a bit you're writing Eric's original question: how do we generically make this work?

One answer is "don't make wrapper cookbooks", because they create this problem. But that will probably create lots of duplication / boilerplate / copy-paste, which also isn't great. Some of the discussion here is about using a library call to initialize node API values vs node.default. I respect that having two ways to do things is undesirable, but this approach is actually pretty great IMO: it leaves last-writer-wins intact, sidesteps the compound API issue (mostly), and gives the cookbook author a convenient hook to do richer validation.

# in a recipe for service foo
  #  setup() runs at compile, sanity checks requested values, writes to dozens of APIs, can set sane defaults
  Company::Service.setup(node, 'foo', 'foo-package', ....)
  # implementing custom resource which reads the actual values (at converge time)
  company_service 'foo' 
  
# in a later recipe, customize the setup via the normal `node.default` at compile time; last-writer-wins still works
  node.default['company_service']['foo'][API] = ...

sidesteps the compound API issue (mostly)

Of course, if any of the values you want to set are based on other APIs, you're back to the original problem. But (in our experience) this is something that can usually be one-off addressed by passing a Proc, or a settings cookbook, or one of the solutions already discussed.

Maybe this might work for you and be a way to try to avoid generally solving compound APIs.

Back on the topic of generally solving it through.

The problem with this, is it provides a generalized override mechanism.

Every solution on the table here involves a block of code which runs at converge time; they all are a generalized override mechanism. We have to assume good intent, and this is chef code; an uncaring / hostile chef author can always clobber their way into getting what they want. This discussion is about building a process end users can use to deal with compound API interactions (rather than a hammer to break last-writer-wins).

That said ... having an allowlist or some element of code review that forces somebody who actually understands chef and the API model to green-light these sounds reasonable to me. With enough users and time, uninformed copy-paste will creep in for no good reason, and if you had a lot of this, last-writer-wins would be severely degraded.

if you write to an fb_foo cookbook that isn't in the run list, nothing happens

This touches on some of the other problems with this API model (unrelated to compound APIs) - and is its own huge discourse, so - not touching that here.

re-compute its changes to node['fb_foo']['xyzzy'] attribute when qux changes

There are technically hooks in the event framework which could allow you to do this, but I advise against, for a couple reasons.

There's a gotcha in here, unrelated to the FB API model, but due to chef and how it does Immutable attributes. In short, any time you write to a node attribute top-level hash (say node.default['josh'] = ...), it drops the deep-merged cache which is what read operations actually read from. Then later, when something comes along and says to read via node['josh'][...], chef runs the deep-merge and rebuilds the cache. Depending on how large the hash is and how many levels of precedence are in play, this is a non-trivial amount of CPU overhead. As a result, any code which intermingles a read and a write into the same hash (technically Mash I think) results in chain drop, re-deep-merge, drop, re-deep-merge. See 45e9eae

Finally, the events hooks are a spot which is brittle and something like the attributes code is called real early, such that there's a very good chance you'll eventually break chef in ways chef can't fix itself.

from chef-cookbooks.

ericnorris commented on April 28, 2024

If you imagine lots of wrapper cookbooks, after a bit you're writing Eric's original question: how do we generically make this work?

I appreciate this framing, thank you!

# in a recipe for service foo

In your example, what if node.default['company_service']['foo'][API] had an attribute that Company::Service.setup needed as well (e.g. it was used for resources and also as a part of a compound API)? Is the idea that you'd need to call Company::Service.setup and write to node.default['company_service']['foo'][API] with consistent values? That's one part of my hesitation to (but not rejection of) going with method calls.

in a later recipe, customize the setup via the normal node.default at compile time; last-writer-wins still works

Somewhat repeating my last question, but I just want to confirm that if this service had a compound API, it would look like this to the caller:

node.default['company_service']['foo'][API] = ...

include 'service_foo'

# either:
Company::Service::Foo.setup('different-parameters')

# ...or:
Company::Service::Foo.set_up_one_part_of_foo('different-parameters')

So the service_foo recipe will call setup with some defaults, but the caller is expected to re-call Company::Service::Foo.setup('different-parameters'), or some other sub-methods, with the parameters they expect?

Again, not rejecting this, I just want to be extra clear about what this will look like for when I explain this to others.

Every solution on the table here involves a block of code which runs at converge time

A thought that keeps sticking out to me is that it's almost like I want a two-pass model for writing node attributes, e.g. the reference I was making to a "post-compile, pre-any-cookbook-convergence" phase in my original post.

That is, I suppose I want to be able to go through all of the attributes/default.rbs, and then all of the node.default writes in all of the recipes/default.rbs - this is a normal compile phase - and then I want to run through all of the node.default writes in all of the recipes/default.rbs one more time.

If that were possible, I believe this would correctly handle one layer of compound API writes, while still allowing for last-write wins, and without a convergence-time "override" - though of course you could still implement one for core cookbooks to enforce invariants.

This is because the underlying problem is that in a compound API I want to read my node attributes lazily so I can pick up on attribute writes that happen later in compile time, and if I move that to the convergence phase we have the problems we've discussed already - that it either doesn't work because it's too late, or it doesn't let someone else change what I write unless they too defer their writes until the convergence phase - but if you were to run through all of the node.default writes twice, you'd read those later compile time writes.

I think the flow would look like this for one of the examples we've discussed:

cookbooks are gathered, all attributes/default.rb values are merged
etsy_init compiles, includes fb_cron
etsy_service_bar compiles, writes to fb_cron at compile time based on the attributes/default.rb value of node['etsy_service_bar']['foo']
etsy_service_womp compiles, writes to node['etsy_service_bar']['foo'] and fb_cron for the foo cron, overwriting the foo cron's frequency
second "compile phase" starts - though note this isn't about the resources anymore
etsy_init compiles, includes fb_cron (nothing changes)
etsy_service_bar compiles, writes to fb_cron at compile time now based on the value of node['etsy_service_bar']['foo'] _that etsy_service_womp has written
etsy_service_womp compiles, writes to node['etsy_service_bar']['foo'] and fb_cron for the foo cron, overwriting the foo cron's frequency

This seems to me like this has all of the properties we want? Though maybe this would suffer from the performance problem you noted later of reading and writing top-level attributes.

That said ... having an allowlist or some element of code review that forces somebody who actually understands chef and the API model to green-light these sounds reasonable to me.

That makes sense to me too; I think ideally we'd be able to enforce "good" usage of a compound API with static analysis, but that may be impossible, and so failing that, requiring a code review would suffice. It'd be important to have good written documentation on when to reach for this though.

I'll note that we have a self-imposed incentive to avoid a centralized allow list since we rebuild disk images for every affected host, so e.g. making a change to etsy_init carries enough weight (visually, via CI checks) that it naturally feels better to keep a change scoped to only the things that truly need it. I'm not voting against a centralized allow list though, just noting how it would look for us.

This touches on some of the other problems with this API model (unrelated to compound APIs) - and is its own huge discourse, so - not touching that here.

Haha, yeah I totally understand, I just didn't want to let it go unstated :)

There's a gotcha in here, unrelated to the FB API model

This is new to me, thank you! I had seen a bit of the deep merge stuff when I was going through Chef internals, but I wasn't aware of the performance implications.

from chef-cookbooks.

jaymzh commented on April 28, 2024

I was this close to suggesting method calls that write to node attributes as a solution in my last message, and now I wish I had, haha.

I wouldn't have them write to the node. I'd have them return the data.

node.default['fb_cron']['jobs']['etsy_job'] = Etsy::Bar.gen_cronjobs()

Keep the write-to-the-node obvious. It's just a helper to generate it the right way.

I think I get the gen_runlist implementation, but I don't follow how it solves the etsy_stack_memcached problem of defining a compound API, apologies. Do you happen to have a slightly more concrete example?

Because it always makes sure the wrapper is before it, so it can define a whyrun_safe_ruby_block, just like any fb_foo_settings cookbook, which then uses the node-API for it (node['etsy_stack_memcached']) to populate memcached (node['fb_memcached']).

This has a few advantages: It's just anyone who can come along and add one of these "overriding" APIs, it has to be something specifically wrapping a core-cookbook, specifically named as such, and specially accepted by the core team. etsy_randomdude can't start overwriting fb_apache. There's a strict (implied) contract here that etsy_stack_memcached is our wrapper API to fb_memcached, and you should use that instead, here at Etsy.

A stack may be more than just writing fb attributes - it may include some one-off resources, etc., so it feels like it should be a cookbook with recipes and a library class, but not just a library class. I don't think you're suggesting this, but the point is that it's natural to write a cookbook here. So far, the way to drive a cookbook is to set attributes that it lazily reads, not to call a method it calls internally with different parameters.

I think you're misunderstanding my suggestion. Maybe.

For a random service, etsy_coolservice, you'd include the recipe as per normal. Perhaps it even populates a default cronjob. If you want to overwrite it, the README.md states:

## Customizing the refresher cron

The default recipe in this cookbook populates a cronjob to refresh metadata every few days. You may modify this by modifying `node['fb_cron']['jobs']['etsy_coolservice_refresher']` to update the `time` attribute.

However, since there are some other refresher options you may want to change, we have provided a method to generate the job for you with different options. You may do:

    node.default['fb_cron']['jobs']['etsy_coolservice_refresher'] = Etsy::CoolServices.gen_cron(options)

Where `options` are:
* `time` which can by `:hourly`, `:daily`, `:default`, `:weekly`, or a time string
* `overwrite` - defaults to true, but set to false if you want to merge new data

I WILL point out that this contrived example is bad because there's a WAY better way to do this.

The CORRECT way to do this is to have a much simpler cron job that reads a CONFIG file, where that config file is a template that is populated by the etsy_coolservice node attributes. In this way, people customize the behavior of the job itself via etsy_coolservice, and they only need to touch the cron itself to change the time, which they can do in a standard last-write fashion after etsy_coolservice::default populates a sane default. And that's how we envision all stackable services touching cron jobs. And how it works at FB (there's a base 'web' and then an 'internal' web and then various subcategories of 'internal' web, and they all do things like modify each other's crons/sysctls/etc.).

But it's a contrived example to show how one would use such a method.

A stack could have an attribute that it uses throughout, both in resources but also in compound API writes, and so you might have to both set an attribute and call one or methods - imagine if you had to set

In this case a compound API is probably the wrong call. It should write this to a config file, using a template, and then reference it there. The config will be correct because it's only written at runtime.

from chef-cookbooks.

ericnorris commented on April 28, 2024

Thank you both for your help so far. I've been thinking about our discussion over the past few days, and I wrote up a rough draft of what I think are the set of "rules" that are necessary to follow when a developer wants to dynamically modify a node attribute, so that I could share them with my colleagues.

It reads somewhat like a flowchart currently, and I may tweak the presentation a bit, but it currently is as follows:

Do you want to dynamically modify one or more cookbook API attributes based on an ohai (or some other static) node attribute?
- If so, this is okay.
- If not, this is what we refer to as a compound API. Continue reading.
Do you want to conditionally write to one or more cookbook API attributes?
- Does the cookbook API have an only_if sub-attribute?
  - If yes, use the only_if and pass in a Proc that checks your condition lazily.
  - If not, consider modifying the API in question to expose an only_if attribute that takes a Proc, and submitting that change internally and/or upstream to the Facebook chef-cookbooks repository. If that is not feasible, continue.
- Are you writing an fb cookbook that is used directly in etsy_init (i.e. a "core" cookbook)?
  - If so, you can use a whyrun_safe_ruby_block to write to the other cookbook API, but you must make sure that your cookbook is included before the cookbook that reads the attribute you are writing. Again, consider adding an only_if attribute to the API.
  - If not, continue.
- You now have two choices:
  - Consider breaking this into a separate cookbook that writes to the node attribute non-conditionally. Callers can then decide to include this cookbook or not, rather than tweaking an attribute.
  - Make this a method call.
Do you want to embed a cookbook API attribute inside of one or more other cookbook API attributes?
- Are you writing an fb cookbook that is used directly in etsy_init (i.e. a "core" cookbook)?
  - If so, you can use a whyrun_safe_ruby_block, but you must make sure that your cookbook is included before the cookbook that reads the attribute you are writing.
  - If not, continue.
- You now have two choices:
  - Consider making this a method call.
  - Move the embedded attribute out into a configuration file that you render separately, and have the thing you were trying to create (e.g. a cron) read that configuration file.

The "make this a method call" suggestion will need some examples fleshed out, like the Customizing the refresher cron example above, to help folks have concrete directions on how to structure their code.

When I discussed these rules internally, one of my colleagues pointed out that the "computed compound API" idea a.k.a. React for Chef, is much more attractive. I hate to keep bringing it up, but after thinking more about the code example I gave, I am not quite sure I've been convinced to reject it outright.

Going back to @joshuamiller01's comments about that:

As a result, any code which intermingles a read and a write into the same hash (technically Mash I think)

I think that, in my example, it would be a read from hash A, and a write to hash B. That is, you'd define your compound API in hash some_cookbook, and it'd intercept writes to node['some_cookbook']['some_compound_api'], and then write to node['fb_foo']['...'].

Finally, the events hooks are a spot which is brittle and something like the attributes code is called real early, such that there's a very good chance you'll eventually break chef in ways chef can't fix itself.

In the example I gave, Chef event hooks weren't needed. It's instead using a custom class as a node attribute, and so it can directly intercept writes via the []= method. I've tested this locally and Chef doesn't immediately explode when you do this, but I would be sympathetic to an argument that this is somewhat off the rails. Even then though, I'd want to discuss whether it was truly untenable, or if there was some other way to achieve it.

...

The rules that I have above, if they're even correct, are arguably pretty complicated for the average developer at Etsy to work with. It may be that this comes from a fundamentally different motivation that we have: our company is smaller and we expect a larger number of employees to write and maintain their own systems in Chef as a part of their responsibilities instead of having a core team that is large enough to handle writing fb cookbooks for everyone when they are needed. You can understand why I'm attracted to seeing if we could "cut through" these rules into something that works with (from our perspective, not necessarily yours) less mental overhead.

That said, it's possible that the answer is that we just have to follow these rules, and that's fine. Moving on to a few points from the last comment.

Keep the write-to-the-node obvious.

Ah, yeah, that makes sense.

There's a strict (implied) contract here that etsy_stack_memcached is our wrapper API to fb_memcached, and you should use that instead, here at Etsy.

I see, that also makes sense to me. That said, I think there's a slight miscommunication about our wrapper cookbooks on my part, and it may be that memcached is a bit of a bad example.

In the case of fb_apache, our plan so far wasn't to necessarily have one fb_apache wrapper, company-wide. Our plan was to have one wrapper cookbook per "thing", where thing might be "our monolith" or "a service". In the case of our monolith, we need to tweak it in a couple different ways depending on where we're running it, and (before we started run into this issue) each tweak felt natural to describe as a compound API interaction.

Furthermore, while we'd typically run only one stack on a host, we've also been thinking about our developer VM setup, where we might run multiple "stacks" on a single host. Since each stack originally was going to be self-contained, some of them may include fb_apache, which means only the first stack's compound writes to fb_apache would work - the latter stack's would be too late, as previously mentioned.

Regardless, we've been discussing your comments a great deal over the past few days and it does seem like it will be possible to do the above without compound APIs, but again, it feels somewhat less straightforward than we feel like it would be with just using compound APIs - if it were possible - in the first place.

The CORRECT way to do this is to have a much simpler cron job that reads a CONFIG file,

That's a good point, and it does address one need for having a compound API, so I've added that to the rules I shared in the beginning of this comment.

...

I suppose I could summarize this comment as: it still seems like it would be nice to tell cookbook API developers that they don't have to avoid compound APIs, and to tell cookbook API consumers that they only ever have to use node.default['some_attribute'] = 'something' to change a cookbook API. Although it hasn't been a problem for you both and your respective teams, maybe you would still agree that this would be more ideal?

Is there some way to think about the attribute read/write dependency graph that makes this possible, and avoids the downsides of just blindly overriding things at runtime?

from chef-cookbooks.

jaymzh commented on April 28, 2024

In the case of fb_apache, our plan so far wasn't to necessarily have one fb_apache wrapper, company-wide. Our plan was to have one wrapper cookbook per "thing", where thing might be "our monolith" or "a service". In the case of our monolith, we need to tweak it in a couple different ways depending on where we're running it, and (before we started run into this issue) each tweak felt natural to describe as a compound API interaction.

We do that a lot. But fb_apache should not be treated as a core cookbook. It's a service cookbook full stop. The way to do that is very different.

etsy_service_foo should include fb_apache itself. It now fully controls ordering and can do:

include_recipe 'fb_apache'

whyrun_safe_ruby_block "etsy_service_foo_fb_apache_wrapper" do
  block do
    ...
  end
end

Once again, there's an implicit, but specific, agreement here: I'm using the etsy_service_foo cookbook. It happens to use fb_apache internally, but that's irrelevant to me, I'm using the etsy_service_foo API. It turns out, as an advanced user, I can probably go touch fb_apache's nodes if it doesn't conflict with the way etsy_service_foo does, but at that point I'm in unsupported territory.

Service cookbooks should not be included in fb_init for precisely this reason, it handicaps people's ability to wrap them.

The more I have this discussion the less I'm seeing a good usecase for compound APIs and the more I'm growing convinced there's some documentation missing on the basic structuring of how stuff should work together.

In the cases I've run into that need this, usually I've found a few things to be true, that once I admitted, made me go another way:

I was avoiding fixing a service or script that needed a config file but didn't have one (like a cron that only accepts commandline args instead of reading a config file) and was trying to be lazy and build an API around it.
Something was in fb_init that shouldn't be, and couldn't be wrapped by folks
Something was not in fb_init that should be, and couldn't be leveraged by the core systems.

we expect a larger number of employees to write and maintain their own systems in Chef

To be clear, this is exactly how FB worked. The whole POINT of jamming the complexity into core cookbooks was because we expected every developer at FB to write their own cookbooks. The core team was 4 people and ONLY wrote core cookbooks, every SWE team wrote their own cookbooks for their own service (or PE/SRE team if they had one, but that was a small percentage).

from chef-cookbooks.

ericnorris commented on April 28, 2024

We do that a lot. But fb_apache should not be treated as a core cookbook. It's a service cookbook full stop. The way to do that is very different.

Ah, I'm somewhat confused here - I don't think I was saying that it should be a core cookbook. Your following etsy_service_foo example is exactly the kind of thing we were doing, but we ran into two problems:

If etsy_service_foo needed to tweak "core" APIs based on its own API, it couldn't (unless it was only_if-able).
If etsy_service_foo needed to tweak a "non-core" API (e.g. fb_apache) based on its own API, it could with careful ordering, but etsy_other_service_bar also wanted to do that, and a host wanted to run etsy_service_foo and etsy_other_service_bar, then only one of them would work.

...and so that leads us to the "rules" I discussed in my post above. That is, if you want to change other APIs, you can't, and you instead need to move the logic elsewhere, like method calls or config files.

It happens to use fb_apache internally, but that's irrelevant to me, I'm using the etsy_service_foo API

This is exactly the kind of mindset we wanted to have, and that's kind of why (2) could be an issue - if I'm running two services on one host, it should (if you agree that running two services on one host is defensible) "just work", but it turns out it wouldn't due to the cookbook's internals, which breaks the abstraction a bit.

more I'm growing convinced there's some documentation missing on the basic structuring of how stuff should work together.

I agree with this, and I'd be happy to help with that!

I also realize that a lot of this discussion probably feels like you're having to help us organize our code, and I want to stress that I appreciate it a great deal. I also think that it may illustrate my point that if we could just write things "naturally", like if we could write a cookbook that has an API and it may change other APIs and it actually worked in all situations, then the existing documentation would suffice; these cookbooks would be just like any other cookbook.

I'm sure if I gave you our web monolith's Chef code you could sort it out, and I'm sure we will be able to as well, but it's likely going to be a somewhat bespoke implementation rather than being just another API-driven cookbook, if that makes sense.

And this is the thing I think I'm having a hard time succinctly describing - that in our monolith we have many different vhosts that we need to turn on or off, or vhosts that we need to flavor slightly differently on some hosts, and many different crons that we may or may not run or will run with slightly different parameters, etc., and figuring out the right way to slice and dice these into config-readers or method calls, while still trying to present a high-level "include this cookbook to get a batteries-included monolith tweaked for you", is more difficult and "free-form" than just writing it as an API-driven cookbook.

from chef-cookbooks.

jaymzh commented on April 28, 2024

If etsy_service_foo needed to tweak "core" APIs based on its own API, it couldn't (unless it was only_if-able).

Do you have an example that's not cron? I think 95% of cases are solved by rendering a template, which the cron reads (and becomes an authoritative source, other things can read it too). Most problems are solved by "let chef resolve the proper config, write it out, and let everything consume it from there."

If etsy_service_foo needed to tweak a "non-core" API (e.g. fb_apache) based on its own API, it could with careful ordering, but etsy_other_service_bar also wanted to do that, and a host wanted to run etsy_service_foo and etsy_other_service_bar, then only one of them would work.

This is an interesting thought. So like, two services under different vhosts? I think this "just works" if they're both well behaved, but you have to pick which one wins. If your runlist is:

etsy_service_foo (includes fb_apache, setups up ruby_block)
etsy_service_bar (includes fb_apache, setups up ruby_block)

And they both only touch their own vhosts, then this all "just works".

If they touch some core configs, bar will win. Which is expected, it's later in the runlist.

That should be fine.

Maybe we should suss out a more specific example?

I also realize that a lot of this discussion probably feels like you're having to help us organize our code,

Not at all!! I'm stoked more people are using this. One problem at FB was a lot of this stuff was in my head for a long time, and people would present these problems and I'd go "ah here make a template" or "oh here, do X"... Naomi and Josh also became experts and together we tried to write it all down, but there's a lot more to document, I'm sure.

from chef-cookbooks.

ericnorris commented on April 28, 2024

If etsy_service_foo needed to tweak "core" APIs based on its own API, it couldn't (unless it was only_if-able).

Do you have an example that's not cron?

Ha, that's a good question, and I had to think about this a bit. I think I've seen this with:

Our own prometheus "shell exporter" cookbook, which sets up a templated systemd unit for invoking shell scripts that get turned into prometheus metrics (I think fb_systemd wasn't suitable because it didn't support template units?). We could probably solve that with only_if and expecting Chef::DelayedEvaluator for some of the attributes.
Our own filebeat cookbook and the fb_logrotate cookbook. I think concretely we ran into it in another "core" cookbook where we had an optional logfile attribute, and if it was specified we wanted to have that core cookbook set up the filebeat and logrotate configs for the file. I think that could be solved with only_if and Chef::DelayedEvaluator; we were able to solve it in this specific case via ordering since it was a core cookbook.
Similar to a cron, we saw something similar with fb_timers and wanting to optionally pass an argument to the timer job. I could see this as being fixed with Chef::DelayedEvaluator or a separate config file, though I'll note that it was a pretty simple job and adding a separate config reading thing to it does add some flavor of "toil" to writing those kinds of things - maybe there could be standard pattern for exposing lazy environment variables to crons and timers?
I wrote an etsy_ca_certificates cookbook and was about to run into an issue with another core cookbook that configured the host to trust a private CA; I could have solved this with ordering but I was somewhat dissatisfied with that as an answer, and that's part of the reason I came here.

While each of the above seems like it could be solved by either (1) adding laziness to core cookbooks, or (2) moving to a config file, or - for "core" cookbooks only - (3) ordering, I don't want to lose the point that this is hard to explain to developers, and it'd be nice if there was a way to have things "just work" with normal node.default writes, or something close to it (not my original node.compound_api proposal, considering your objections, but something else). That is, the "rules" that I shared a bit ago are a lot of overhead for both core and non-core cookbook devs, and I'd like it if we could engineer a solution that worked like normal pattern. The second proposal I had of using objects in the node API seems close to that, but maybe that is ultimately unworkable.

I think this "just works" if they're both well behaved, but you have to pick which one wins.

If I understand correctly, it's not about "which one wins", but which one works. In the following example:

etsy_service_foo (includes fb_apache, sets up ruby_block)
etsy_service_bar (includes fb_apache, sets up ruby_block)

I believe that etsy_service_bar's ruby_block would do nothing at all, because fb_apache already converged when etsy_service_foo converged, which happens before etsy_service_bar converges.

That's essentially the other problem I'm grappling with; stacks are meant to be self-contained, but stacks with compound APIs don't play well with other stacks that use the same underlying API because of this. In the above example, we could solve this by forcing the runlist to look like:

etsy_service_foo (sets up ruby_block)
etsy_service_bar (sets up ruby_block)
fb_apache

and it would work, but we'd be giving up the self-contained nature of the cookbook, and it'd be yet another thing to explain to developers ("don't include the cookbook you're relying on, ask the caller to add it to their run list after your cookbook").

In the "rules" I posted this is solved by never having ruby_blocks, and forcing the developer to expose it as a method. I apologize for the repetition, but the theme I keep coming to in these past couple messages though is: it can be tough to keep these rules in mind and it feels somewhat less "natural" than the original node.default pattern, so can we fully rule-out any sort of technical solution that allows people to do something as close to the original node.default API? That is, is there any way to define a compound API that somehow happens at compile time, in order to preserve the last-write-wins principle while avoiding the downsides of node.override?

If we can confidently rule that out, then I think the approach of adding laziness to a few places in some of the core cookbooks, documenting that core cookbooks should expect laziness in a few "standard" ways (like accepting ~~objects~~ hashes instead of arrays, defaulting to exposing an only_if, etc.), and documenting what you should do if what you're doing with a core cookbook goes beyond those basic laziness options (e.g. make a method call), seems like the right answer and something we could start to help with.

from chef-cookbooks.

jaymzh commented on April 28, 2024

maybe there could be standard pattern for exposing lazy environment variables to crons and timers?

Oh, I like that idea a lot. config_file => {'format' => 'json|shell|yaml', path='...', data = Proc()}`

(shell would fail if data was more than 2 levels).

I believe that etsy_service_bar's ruby_block would do nothing at all, because fb_apache already converged when etsy_service_foo converged, which happens before etsy_service_bar converges.

Ugh, sorry you're right, the right way to do this is for these cookbooks to specify they expect to be included before fb_apache and that the user should include fb_apache after it and any other cookbooks that require it. This is what happens when you don't touch Chef for a few months.

Again, the point here is to put as much onus on the API-cookbook authors and less on the leaf cookbook authors so that $random_developer doesn't need to understand too any of this to setup an instance of service_foo.

and it would work, but we'd be giving up the self-contained nature of the cookbook, and it'd be yet another thing to explain to developers ("don't include the cookbook you're relying on, ask the caller to add it to their run list after your cookbook").

I think that's a pretty straight-forward thing to document, it's not "in this case this, in this case this, in this case this.."

The FB model is fundamentally backwards from the standard model. In the standard model, if you want to wrap a cookbook you set attrs, then you include it, our model is you include it, then you specify attrs. This isn't really different, it's just a different model and needs documentation.

We had a lot of such wrappers, but it was incredibly rare that people wanted to double-wrap a cookbook in a single use-case. In the very, very few cases we did, we'd end up with something like a ::base and a ::default recipe and ::base would be just the API , and ::default would include ::base and fb_apache, so that if people doing weird stuff had a mechanism, and people doing the standard stuff could use the default recipe. Had we had a setup more like yours we might have made a standard practice that "when wrapping an API, you should always split things into ::api ::dependents" and told everyone to do things like:

%w{api dependencies} |recipe|
  %w{etsy_service_foo etsy_service_bar}.do each |svc|
    include_recipe "#{svc}::#{recipe}"
  end
end

And in fact, this coalesces pretty well with the fact that already today cookbooks don't depend on cookbooks in the fb_init runlist - fb_cron, fb_timers, fb_systctl, etc. - even though you use them, you don't declare dependencies on them OR include_recipe them. Same thing here "even though you wrap fb_apache, you don't include it, you document it for your users. It feels like it fits really well with the model, no?

from chef-cookbooks.

Compound API interactions for cookbooks included by `fb_init` are difficult to implement about chef-cookbooks HOT 25 OPEN

Comments (25)

Re: lazy stuff

fb_init Ordering

compound_interactions

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent