
Comments (7)

chogata commented on June 1, 2024

MooseFS should be deleting chunks even when in high speed rebalance. Possible reasons for not deleting chunks in the system include:

  • a disconnected chunk server in maintenance mode
  • a connecting/disconnecting chunk server (still registering or de-registering its chunks; be aware that de-registration is done by the master, so even if the chunk server process has already shut down, the master may need a couple of minutes to finish if the chunk server held a lot of chunks)
  • an operation limit has been reached: either the deletion limit is set so low that you don't see the effect of deletions because new chunks ready for deletion keep appearing, or the number of other jobs is so high that there are not enough resources left for deletions (deletions are quite low on the priority scale)
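The last point (a low deletion limit hidden by newly arriving deletable chunks) can be illustrated with a toy model. This is only a sketch of the arithmetic, not MooseFS's actual scheduler; the function name, the per-tick framing, and all numbers are hypothetical.

```python
def backlog_after(ticks, start_backlog, new_per_tick, delete_limit_per_tick):
    """Toy model: chunks waiting for deletion after `ticks` scheduling rounds.

    Hypothetical parameters, not MooseFS settings:
      new_per_tick          - chunks that become deletable each round
      delete_limit_per_tick - most deletions allowed in one round
    """
    backlog = start_backlog
    for _ in range(ticks):
        backlog += new_per_tick                          # new deletable chunks appear
        backlog -= min(backlog, delete_limit_per_tick)   # delete up to the limit
    return backlog


# Limit equal to the arrival rate: deletions happen, but the backlog never shrinks.
print(backlog_after(10, 1000, 50, 50))
# Limit well above the arrival rate: the backlog drains.
print(backlog_after(10, 1000, 50, 150))
```

In the first call the system deletes 50 chunks every round, yet the count of chunks awaiting deletion looks frozen, which matches the "you don't see the effect" case described above.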

Are you sure none of the above applies in your case? If you are, please tell us the version of MooseFS you are using and any config settings that differ from the defaults (you can omit custom instance name and custom pathnames).

from moosefs.

onlyjob commented on June 1, 2024

Yes, absolutely sure. Latest MooseFS release (3.0.117). One disk is marked with < for rebalance, in preparation for its removal (to move its chunks to other disks).


chogata commented on June 1, 2024

Is the one disk with < in the same chunk server that has high speed rebalance on, or in another one? You wrote "A chunk server is not deleting", but just making sure: you have only one chunk server in high speed rebalance? Or more? If more, how many? And how many chunk servers are there in total in this instance?

I want to re-create your setup in our lab and run tests.


onlyjob commented on June 1, 2024

The same chunkserver, obviously. Only one chunkserver is in high-speed rebalance -- the very chunkserver that is not deleting chunks (until rebalance is finished).

A dozen chunkservers in total, but only one is in active high-speed replication mode, because one of its HDDs is marked < to empty it by relocating its chunks to other HDDs, together with HDD_HIGH_SPEED_REBALANCE_LIMIT = 3.

Total number of chunkservers hardly matters, as long as there is more than one...

A particular chunkserver is busy with load highlighted with <N> in "Servers" view, as well as corresponding "Server Chart" indicating internal high-speed rebalance.


inkdot7 commented on June 1, 2024

Hi, just a barely related question:

If one HDD (or more) is marked < to empty it, why does that particular chunkserver have more traffic than the others? I would have assumed that MooseFS would copy the chunks from replicas on other chunkservers, to parallelise the rebalance operation so that it finishes sooner?


chogata commented on June 1, 2024

@inkdot7 I'm not 100% sure I understand your question, but maybe this will explain: the master is not aware of a chunk server's internal goings-on, and that includes internal disk rebalance. For the master it's a normal chunk server. If a chunk server is in high speed rebalance mode and the high speed "tempo" is high, the chunk server may tell the master that it is overloaded, and the master will try not to send it any tasks for a while.

The only thing the master is aware of is disks marked for removal, but that is the * designation in mfshdd.cfg, not < or >. Marked for removal is different, in that the disk is considered damaged in some way, and replications then try not to use it at all, i.e. if a chunk needs to be replicated because it is on an MFR disk, but a copy of this chunk also exists elsewhere, that other copy will be used as the source of replication.

MFR (*) is for replicating data on an endangered disk and the whole instance takes part in that, the disk itself is spared i/o whenever possible. Internal rebalance, whether "organic" or forced using < and/or > is done 100% internally and the rest of the system doesn't know and doesn't care about it.
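To make the distinction between the three markers concrete, here is a minimal Python sketch that classifies mfshdd.cfg entries by the prefixes discussed in this thread. It is not MooseFS code: the parser, the `DiskEntry` type, and the example paths are hypothetical, and it deliberately handles only `*`, `<`, and `>` plus `#` comments, ignoring anything else the real file format may allow.

```python
from dataclasses import dataclass

# Meaning of each prefix as described in the comments above.
MARKERS = {
    "*": "marked for removal (MFR): master-aware, replication avoids reading from it",
    "<": "internal rebalance: move chunks off this disk",
    ">": "internal rebalance: move chunks onto this disk",
}


@dataclass
class DiskEntry:
    path: str
    marker: str         # "" when the disk has no prefix
    master_aware: bool  # only "*" is visible to the master


def parse_mfshdd(text):
    """Classify disk lines; a simplified, assumption-laden reading of mfshdd.cfg."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        marker = line[0] if line[0] in MARKERS else ""
        rest = line[len(marker):].split()
        if not rest:                           # a bare marker with no path
            continue
        entries.append(DiskEntry(rest[0], marker, marker == "*"))
    return entries


# Hypothetical paths, for illustration only.
sample = """
# example disks
/mnt/hdd1
*/mnt/hdd2
</mnt/hdd3
>/mnt/hdd4
"""
for entry in parse_mfshdd(sample):
    print(entry.path, repr(entry.marker), "master-aware" if entry.master_aware else "internal only")
```

Only the `*` entry is flagged as master-aware here, which mirrors the point above: `<` and `>` affect the chunk server's internal rebalance and are invisible to the rest of the instance.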


chogata commented on June 1, 2024

@onlyjob I tried to replicate your issue, but I can't: my instance deleted all the unnecessary chunks while one of the chunk servers was in high speed rebalance mode. I want to test it again on a larger scale (the instance that I used for testing was small and the internal rebalance on the one chunk server completed in minutes), but I need to wait for some other test we are currently running to finish before I can do that.

