Comments (5)

avikam commented on May 29, 2024

I think the problem comes from confirm_spu(). It confirms more SPUs (as reported by the admin interface) than were requested, so the equality check below never succeeds and the function never returns.

if ready_spu /* ~admin.all().filter() */ == spu /* cli request */ as usize {
    ...
    return Ok(()); // never happens 
}

In fact, the bug reproduces whenever start is called with fewer SPUs than a previous start specified, and does not reproduce with an equal or greater number.
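To make the failure mode concrete, here is a minimal sketch of why a strict equality check can never fire after a larger earlier start. It is not the real confirm_spu() code: `listed_spus` and `confirm_returns` are hypothetical helpers, and the model assumes the admin lists every spec file left in the metadata directory, so the listed count is the larger of the previous and the requested SPU counts.

```rust
/// Hypothetical model of what the admin lists: spec files from an earlier
/// start are still on disk, so the count never drops below the previous one.
fn listed_spus(previous: u16, requested: u16) -> usize {
    previous.max(requested) as usize
}

/// Mirrors the strict `==` check quoted above: the wait loop can only
/// return when the listed count equals the requested count exactly.
fn confirm_returns(previous: u16, requested: u16) -> bool {
    listed_spus(previous, requested) == requested as usize
}
```

Under this model, `confirm_returns(3, 1)` is false (the hang), while `confirm_returns(3, 3)` and `confirm_returns(1, 3)` are true, which matches the observed behavior.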

The admin lists SPUs from the metadata directory. The bug also goes away if the spec files are deleted between starts. I think it makes sense for shutdown to delete the SPU spec files, since the SPU processes are killed anyway. wdyt?

Here is an example of the above:
#3901

from fluvio.

digikata commented on May 29, 2024

Hi @avikam, that's a good analysis of what is causing the immediate problem. But fluvio cluster delete is what should delete spec files and delete data. The root issue is that we should really differentiate a couple of different command pairs for a cluster running in local mode.

fluvio cluster start|delete

  • start: creates a new cluster, but should fail if a cluster is already running or in a shutdown state
  • delete: this is what should delete local mode spec files

To align with other parts of the cli, start|delete should maybe actually be create|delete (@sehz @nacardin what do you think?)

fluvio cluster shutdown|resume

  • shutdown: shuts down running processes but should save data and any configuration needed to resume running
  • resume (new command for this issue): restarts with the previous configuration, should not accept configuration flags, but
  • shutdown|resume should only be meaningful for a cluster in 'local' mode. It is pretty much meaningless at the moment for K8s fluvio clusters.
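The two command pairs above can be read as a small state machine. The sketch below is one possible reading, not fluvio's implementation: `ClusterState`, `Command`, and `apply` are hypothetical names, and letting delete succeed from any state is an assumption the proposal does not spell out.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ClusterState {
    Absent,   // no spec files, no data
    Running,  // processes up
    Shutdown, // processes stopped, data and configuration kept
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Command {
    Start,
    Delete,
    Shutdown,
    Resume,
}

/// Returns the next state, or an error when the command is not valid
/// for the current state.
fn apply(state: ClusterState, cmd: Command) -> Result<ClusterState, String> {
    match (state, cmd) {
        // start only creates a brand-new cluster
        (ClusterState::Absent, Command::Start) => Ok(ClusterState::Running),
        // assumption: delete is accepted from any state and removes everything
        (_, Command::Delete) => Ok(ClusterState::Absent),
        (ClusterState::Running, Command::Shutdown) => Ok(ClusterState::Shutdown),
        (ClusterState::Shutdown, Command::Resume) => Ok(ClusterState::Running),
        // everything else, including start on a running or shut-down cluster
        (s, c) => Err(format!("{:?} is not valid while the cluster is {:?}", c, s)),
    }
}
```

With this table, a start against a shut-down cluster is rejected instead of hanging, and resume is the only way back from a shutdown.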

The smallest increment of work to progress this issue might be to add logic that returns an error and refuses to start when fluvio cluster start is attempted while any local SPU spec files exist.
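A rough sketch of such a guard, using only the standard library. The names `StartError` and `ensure_no_spu_specs` are hypothetical, and so is the layout: it assumes the local SPU spec files live in a single directory and that any entry in that directory counts as a leftover spec.

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

#[derive(Debug)]
enum StartError {
    /// Spec files from an earlier start are still present.
    ExistingSpuSpecs(Vec<PathBuf>),
    Io(io::Error),
}

impl fmt::Display for StartError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StartError::ExistingSpuSpecs(paths) => write!(
                f,
                "found {} existing SPU spec file(s); delete the cluster before starting a new one",
                paths.len()
            ),
            StartError::Io(err) => write!(f, "could not inspect SPU spec directory: {}", err),
        }
    }
}

/// Refuses to proceed if the (hypothetical) spec directory has any entries.
fn ensure_no_spu_specs(spec_dir: &Path) -> Result<(), StartError> {
    let entries = match fs::read_dir(spec_dir) {
        Ok(entries) => entries,
        // A missing directory means nothing was started here before.
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(()),
        Err(e) => return Err(StartError::Io(e)),
    };
    let mut found = Vec::new();
    for entry in entries {
        found.push(entry.map_err(StartError::Io)?.path());
    }
    if found.is_empty() {
        Ok(())
    } else {
        found.sort();
        Err(StartError::ExistingSpuSpecs(found))
    }
}
```

Calling this before spawning any SPU process would turn the hang into an immediate, explainable error.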

avikam commented on May 29, 2024

Yeah makes sense!
The reasons I considered this approach are that it looks closer to the local k8s behavior (deleting the CRDs) and it keeps the simple start/shutdown interface. It's also consistent with the rest of the setup types (which don't have 'resume').
But def see the advantage of resuming already defined SPUs.

sehz commented on May 29, 2024

This issue was considered before: #1672.

The complexity is that each infrastructure type requires a different way to save system state. A personal cluster behaves differently from a shared cluster. For a personal cluster, it's OK to save state in a file, but a shared cluster like K8s will need to behave differently.

A cleaner solution is to abstract cluster management further and delegate to implementation-specific behavior.
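One way to picture that abstraction is a trait with one implementation per infrastructure type. This is only a sketch under stated assumptions: `ClusterManager`, `LocalCluster`, `K8sCluster`, and `restart` are hypothetical names, the local state is kept in memory as a stand-in for a file, and having the K8s variant simply refuse is an assumption rather than a decided behavior.

```rust
/// Hypothetical abstraction over per-infrastructure lifecycle handling.
trait ClusterManager {
    fn shutdown(&mut self) -> Result<(), String>;
    fn resume(&mut self) -> Result<(), String>;
    fn running_spus(&self) -> u16;
}

/// Personal (local) cluster: state is small enough to save on shutdown.
struct LocalCluster {
    running: u16,
    saved: Option<u16>, // stand-in for state written to a file
}

impl ClusterManager for LocalCluster {
    fn shutdown(&mut self) -> Result<(), String> {
        self.saved = Some(self.running);
        self.running = 0;
        Ok(())
    }

    fn resume(&mut self) -> Result<(), String> {
        match self.saved.take() {
            Some(count) => {
                self.running = count;
                Ok(())
            }
            None => Err("no saved state to resume from".to_string()),
        }
    }

    fn running_spus(&self) -> u16 {
        self.running
    }
}

/// Shared cluster: in this sketch it declines both operations.
struct K8sCluster {
    running: u16,
}

impl ClusterManager for K8sCluster {
    fn shutdown(&mut self) -> Result<(), String> {
        Err("shutdown is not supported for K8s clusters in this sketch".to_string())
    }

    fn resume(&mut self) -> Result<(), String> {
        Err("resume is not supported for K8s clusters in this sketch".to_string())
    }

    fn running_spus(&self) -> u16 {
        self.running
    }
}

/// CLI-level code only talks to the trait, not to a concrete backend.
fn restart(manager: &mut dyn ClusterManager) -> Result<u16, String> {
    manager.shutdown()?;
    manager.resume()?;
    Ok(manager.running_spus())
}
```

The point of the design is that the CLI command stays the same while each backend decides how, or whether, state is saved.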

digikata commented on May 29, 2024

Fully implemented, thanks to the work of @avikam!
