Comments (11)

Emmenemoi commented on August 10, 2024

Yes, this is the issue I was talking about, and it puts the VM in a bad state.
Thanks for documenting.
I'll create a xenserver7 branch because (my first impression from yesterday) it seems SR and RBDSR are both adding the paused state, which crashes the snapshot process.
I'm not going to test on xenserver<7; I'll let other people check whether the xs7 branch is compatible with xs<7.

from rbdsr.

rposudnevskiy commented on August 10, 2024

Hi,
This error occurs because we unmap the image before making a snapshot, due to the problem described in issue #4. With only one host it works fine. With more than one host the error occurs because the image is mapped on host2, but the snapshot operation is executed by the pool master, i.e. host1, which tries to unmap an image that is already unmapped on host1.
We could skip unmapping the image before making a snapshot, but if we use nbd and try to make a snapshot, the nbd device hangs.
So we need to find a way to unmap the image on the right host.

As a workaround you can use FUSE mode, as it doesn't use map/unmap.
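
One possible way to run the unmap on the host that actually holds the mapping is XAPI's host plugin mechanism (`host.call_plugin`). This is only a rough sketch, not the RBDSR code: the plugin name `ceph_plugin` is taken from later in this thread, while the plugin function name `unmap` and the argument keys are assumptions made up for illustration.

```python
def host_running_vm(session, vm_ref):
    """Return the reference of the host the VM is currently resident on."""
    return session.xenapi.VM.get_resident_on(vm_ref)


def unmap_on_host(session, host_ref, plugin_args):
    """Ask XAPI to run the unmap on host_ref instead of on the pool master.

    session     -- a logged-in XenAPI session
    host_ref    -- reference of the host where the image is mapped
    plugin_args -- dict of str -> str handed to the plugin function
    """
    # "unmap" is a hypothetical plugin function name used for this sketch.
    return session.xenapi.host.call_plugin(
        host_ref, "ceph_plugin", "unmap", plugin_args
    )
```

The pool master would look up the host with `host_running_vm` and then call `unmap_on_host` for that host rather than unmapping locally.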

mhoffmann75 commented on August 10, 2024

Hi,
I was not aware that the pool master is actually taking the snapshots, but it sounds logical. Do you know if it's necessary to map/unmap on host2 (VM running on host2) when host1 is pool master and creating the snapshot?
If it's not necessary, then it would be enough for RBDSR to check for the existence of the /dev/nbd/xxx device before unmapping.
If it's also hanging the nbd driver on host2, then we need a mechanism for RBDSR on the pool master to contact RBDSR on the pool member(s). That makes it quite complex, I guess.

Same situation for VDI cloning?
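
The existence check suggested above might look roughly like this. It is a minimal sketch with a hypothetical helper name, and it assumes the device path only exists while the image is mapped on this host (plain `/dev/nbdN` nodes exist whenever the nbd module is loaded, so they would not work for this test). It does not address the open question of whether nbd hangs on host2.

```python
import os
import subprocess


def unmap_if_present(device, run=subprocess.check_call):
    """Unmap `device` only if its node exists on this host.

    Returns True if an unmap was attempted, False if it was skipped.
    `run` is injectable so the logic can be exercised without rbd-nbd.
    """
    if not os.path.exists(device):
        # Nothing is mapped here (for example, we are the pool master and
        # the VM runs on another host), so there is nothing to unmap.
        return False
    run(["rbd-nbd", "unmap", device])
    return True
```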

mhoffmann75 commented on August 10, 2024

BTW: FUSE mode is broken in the latest build, too:

['rbd-fuse', '-p', 'RBD_XenStorage-ff12160f-ff09-40bb-a874-1366ad907f44', '/run/sr-mount/ff12160f-ff09-40bb-a874-1366ad907f44', '--name', 'client.admin']
FAILED in util.pread: (rc 1) stdout: '', stderr: 'fuse: invalid argument `client.admin'

It seems rbd-fuse doesn't like: --name client.admin

Pull in from my fork to fix.

Emmenemoi commented on August 10, 2024

I added a fallback to kernel mode if FUSE mode and cephx are both used (commit 6e5ed2d).
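
In spirit, the fallback amounts to something like the following. This is an illustrative sketch only, not the code from the commit: the function name and the mode strings `"fuse"` and `"kernel"` are assumptions.

```python
def effective_mode(requested_mode, cephx_enabled):
    """Pick the mode to actually use for the SR.

    FUSE mode combined with cephx authentication falls back to kernel
    mode; every other combination is left as requested.
    """
    if requested_mode == "fuse" and cephx_enabled:
        return "kernel"
    return requested_mode
```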

Emmenemoi commented on August 10, 2024

By the way: why unmap + remap if blktap is already paused + unpaused?
Isn't that pausing the FS twice?

rposudnevskiy commented on August 10, 2024

Because a mapped nbd device hangs if you try to make a snapshot of the image.
I don't know why it hangs, but if you unmap before the snapshot it doesn't hang.

mhoffmann75 commented on August 10, 2024

Has anyone actually tested whether nbd also hangs on a mapped volume if the snapshot is taken on another host where the volume is NOT mapped?

rposudnevskiy commented on August 10, 2024

The issue has been fixed.
Copy ceph_plugin.py to /etc/xapi.d/plugins on each host in the pool,
rename it to ceph_plugin and make it executable.
It now calls rbd-nbd map/unmap on the right host.

Please test it
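
The install steps above could be scripted roughly as follows and run once on each host. This is a sketch under the assumption that `ceph_plugin.py` sits in the current directory; the helper name `install_plugin` is made up for illustration.

```python
import os
import shutil
import stat


def install_plugin(src, plugins_dir="/etc/xapi.d/plugins"):
    """Copy the plugin into place as `ceph_plugin` and mark it executable."""
    dest = os.path.join(plugins_dir, "ceph_plugin")
    shutil.copyfile(src, dest)
    # Add the execute bits on top of whatever mode the copy ended up with.
    mode = os.stat(dest).st_mode
    os.chmod(dest, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return dest
```

For example, `install_plugin("ceph_plugin.py")` on every host in the pool.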

mhoffmann75 commented on August 10, 2024

Just tested it, and it looks very promising. VM snapshot and snapshot deletion always worked, regardless of which host the VM was running on or moved to. So I guess you really fixed this.
Well done!

rposudnevskiy commented on August 10, 2024

Ok. Thank you Martin.
I'm going to close the issue.
