Comments (34)

nate-byrnes commented on August 10, 2024

Looks like rbd-nbd is being called with incompatible options... Perhaps my installation is messed up? I ran the script from the instructions...

rposudnevskiy commented on August 10, 2024

Hi,
Which Ceph version are you using?

nate-byrnes commented on August 10, 2024

Tried with both Jewel and Luminous after uninstalling and reinstalling... According to the Ceph docs there is no --name argument for rbd-nbd...

rposudnevskiy commented on August 10, 2024

Did you install any Ceph packages on your XS host before RBDSR?
Could you please send the output of this command on your XS host:
yum list all | egrep "ceph|rbd"

nate-byrnes commented on August 10, 2024

Well, uninstall + reinstall appears to have left a mess... I've removed the --name argument from the code and am attempting another migration... It is running for longer than before...

ceph-common.x86_64                  1:12.0.3-0.el7          @ceph               
ceph-fuse.x86_64                    1:10.2.2-0.el7          @ceph-jewel         
libcephfs2.x86_64                   1:12.0.3-0.el7          @ceph               
librados2.x86_64                    1:12.0.3-0.el7          @ceph               
libradosstriper1.x86_64             1:12.0.3-0.el7          @ceph               
librbd1.x86_64                      1:12.0.3-0.el7          @ceph               
librgw2.x86_64                      1:12.0.3-0.el7          @ceph               
python-cephfs.x86_64                1:12.0.3-0.el7          @ceph               
python-rados.x86_64                 1:12.0.3-0.el7          @ceph               
python-rbd.x86_64                   1:12.0.3-0.el7          @ceph               
python-rgw.x86_64                   1:12.0.3-0.el7          @ceph               
rbd-fuse.x86_64                     1:12.0.3-0.el7          @ceph               
rbd-nbd.x86_64                      1:12.0.3-0.el7          @ceph               
ceph.x86_64                         1:12.0.3-0.el7          ceph                
ceph-base.x86_64                    1:12.0.3-0.el7          ceph                
ceph-debuginfo.x86_64               1:12.0.3-0.el7          ceph                
ceph-deploy.noarch                  1.5.37-0                ceph-noarch         
ceph-devel-compat.x86_64            1:10.2.7-0.el7          ceph-jewel          
ceph-fuse.x86_64                    1:12.0.3-0.el7          ceph                
ceph-libs-compat.x86_64             1:10.2.7-0.el7          ceph-jewel          
ceph-mds.x86_64                     1:12.0.3-0.el7          ceph                
ceph-mgr.x86_64                     1:12.0.3-0.el7          ceph                
ceph-mon.x86_64                     1:12.0.3-0.el7          ceph                
ceph-osd.x86_64                     1:12.0.3-0.el7          ceph                
ceph-radosgw.x86_64                 1:12.0.3-0.el7          ceph                
ceph-release.noarch                 1-1.el7                 ceph-noarch         
ceph-resource-agents.x86_64         1:12.0.3-0.el7          ceph                
ceph-selinux.x86_64                 1:12.0.3-0.el7          ceph                
ceph-test.x86_64                    1:12.0.3-0.el7          ceph                
cephfs-java.x86_64                  1:12.0.3-0.el7          ceph                
libcephfs-devel.x86_64              1:12.0.3-0.el7          ceph                
libcephfs1.x86_64                   1:10.2.7-0.el7          ceph-jewel          
libcephfs1-devel.x86_64             1:10.2.7-0.el7          ceph-jewel          
libcephfs_jni-devel.x86_64          1:12.0.3-0.el7          ceph                
libcephfs_jni1.x86_64               1:12.0.3-0.el7          ceph                
libcephfs_jni1-devel.x86_64         1:10.2.7-0.el7          ceph-jewel          
librados-devel.x86_64               1:12.0.3-0.el7          ceph                
librados2-devel.x86_64              1:10.2.7-0.el7          ceph-jewel          
libradosstriper-devel.x86_64        1:12.0.3-0.el7          ceph                
libradosstriper1-devel.x86_64       1:10.2.7-0.el7          ceph-jewel          
librbd-devel.x86_64                 1:12.0.3-0.el7          ceph                
librbd1-devel.x86_64                1:10.2.7-0.el7          ceph-jewel          
librgw-devel.x86_64                 1:12.0.3-0.el7          ceph                
librgw2-devel.x86_64                1:10.2.7-0.el7          ceph-jewel          
python-ceph-compat.x86_64           1:12.0.3-0.el7          ceph                
python34-ceph-argparse.x86_64       1:12.0.3-0.el7          ceph                
python34-cephfs.x86_64              1:12.0.3-0.el7          ceph                
python34-rados.x86_64               1:12.0.3-0.el7          ceph                
python34-rbd.x86_64                 1:12.0.3-0.el7          ceph                
python34-rgw.x86_64                 1:12.0.3-0.el7          ceph                
rados-objclass-devel.x86_64         1:12.0.3-0.el7          ceph                
radosgw-agent.noarch                1.2.7-0.el7             ceph-noarch         
rbd-mirror.x86_64                   1:12.0.3-0.el7          ceph                

nate-byrnes commented on August 10, 2024

And to your first question: I had been using another RBDSR implementation in the past and had the Ceph Jewel ceph-common package installed from that implementation. Now it looks like rbd-nbd is coming from the Luminous packages, which appear not to support the --name argument to rbd-nbd...
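
For reference, here is the failing invocation alongside a possible alternative using the --id form that rbd-nbd's help also lists. This is only a sketch with generic placeholders; whether the installed build accepts it would need testing:

rbd-nbd --nbds_max 64 map <pool>/<image> --name client.admin   # the form RBDSR uses; rejected here
rbd-nbd --nbds_max 64 map <pool>/<image> --id admin            # same credentials via --id; worth testing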

rposudnevskiy commented on August 10, 2024

Please send the output of these commands:
rbd-nbd --version
rbd-nbd --help

nate-byrnes commented on August 10, 2024

Well, this is odd...

[root@xen6 ~]# rbd-nbd --version
rbd-nbd: unknown command: --version


[root@xen6 ~]# rbd-nbd --help
Usage: rbd-nbd [options] map <image-or-snap-spec>  Map an image to nbd device
               unmap <device path>                 Unmap nbd device
               list-mapped                         List mapped nbd devices
Options:
  --device <device path>  Specify nbd device path
  --read-only             Map read-only
  --nbds_max <limit>      Override for module param nbds_max
  --max_part <limit>      Override for module param max_part
  --exclusive             Forbid writes by other clients

  --conf/-c FILE    read configuration from the given configuration file
  --id/-i ID        set ID portion of my name
  --name/-n TYPE.ID set name
  --cluster NAME    set cluster name (default: ceph)
  --setuser USER    set uid to user or uid (and gid to user's gid)
  --setgroup GROUP  set gid to group or gid
  --version         show version and quit

  -d                run in foreground, log to stderr.
  -f                run in foreground, log to usual location.
  --debug_ms N      set message debug level (e.g. 1)
[root@xen6 ~]# 

rposudnevskiy commented on August 10, 2024

Something is wrong with rbd-nbd after version 12.0.0.
Try installing version 12.0.0:
yum list installed | egrep "ceph|rbd" | awk '{print $1}' | xargs -l1 yum erase -y
yum install -x librados2-12.0.3 -x libradosstriper1-12.0.3 -x librados2-12.0.2 -x libradosstriper1-12.0.2 -x librados2-12.0.1 -x libradosstriper1-12.0.1 ceph-common-12.0.0 rbd-nbd-12.0.0 rbd-fuse-12.0.0
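
After the erase and reinstall, a quick sanity check that the pinned 12.0.0 packages actually landed might look like this (a sketch; the exact package set may differ on your host):

rpm -q ceph-common rbd-nbd rbd-fuse librbd1 librados2   # report installed versions
yum list installed | egrep "ceph|rbd"                   # confirm nothing newer slipped back in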

nate-byrnes commented on August 10, 2024

That failed with a segfault in rbd-nbd...

May 19 15:40:06 xen5 SM: [5886] Calling rbd/nbd map/unmap on host OpaqueRef:1d1e830b-2197-f11e-e450-49a9e3fdb9bf
May 19 15:40:06 xen5 SM: [6034] ['rbd-nbd', '--nbds_max', '64', 'map', 'RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d/VHD-7e066ced-6df4-4244-b75c-3bf630aaa33e', '--name', 'client.admin']
May 19 15:40:13 xen5 SM: [6034] FAILED in util.pread: (rc 1) stdout: '', stderr: '*** Caught signal (Segmentation fault) **
May 19 15:40:13 xen5 SM: [6034]  in thread 7f45127fc700 thread_name:tp_librbd
May 19 15:40:13 xen5 SM: [6034]  ceph version 12.0.0 (b7d9d6eb542e2b946ac778bd3a381ce466f60f6a)
May 19 15:40:13 xen5 SM: [6034]  1: (()+0x22aff) [0x7f453766baff]
May 19 15:40:13 xen5 SM: [6034]  2: (()+0xf100) [0x7f452cccf100]
May 19 15:40:13 xen5 SM: [6034]  3: (()+0x103d48) [0x7f4537055d48]
May 19 15:40:13 xen5 SM: [6034]  4: (()+0x104596) [0x7f4537056596]
May 19 15:40:13 xen5 SM: [6034]  5: (()+0x1046db) [0x7f45370566db]
May 19 15:40:13 xen5 SM: [6034]  6: (()+0x6b334) [0x7f4536fbd334]
May 19 15:40:13 xen5 SM: [6034]  7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb59) [0x7f452e3cd669]
May 19 15:40:13 xen5 SM: [6034]  8: (ThreadPool::WorkThread::entry()+0x10) [0x7f452e3ce680]
May 19 15:40:13 xen5 SM: [6034]  9: (()+0x7dc5) [0x7f452ccc7dc5]
May 19 15:40:13 xen5 SM: [6034]  10: (clone()+0x6d) [0x7f452b99928d]
May 19 15:40:13 xen5 SM: [6034] 2017-05-19 15:40:06.735115 7f45127fc700 -1 *** Caught signal (Segmentation fault) **
May 19 15:40:13 xen5 SM: [6034]  in thread 7f45127fc700 thread_name:tp_librbd
May 19 15:40:13 xen5 SM: [6034]
May 19 15:40:13 xen5 SM: [6034]  ceph version 12.0.0 (b7d9d6eb542e2b946ac778bd3a381ce466f60f6a)
May 19 15:40:13 xen5 SM: [6034]  1: (()+0x22aff) [0x7f453766baff]
May 19 15:40:13 xen5 SM: [6034]  2: (()+0xf100) [0x7f452cccf100]
May 19 15:40:13 xen5 SM: [6034]  3: (()+0x103d48) [0x7f4537055d48]
May 19 15:40:13 xen5 SM: [6034]  4: (()+0x104596) [0x7f4537056596]
May 19 15:40:13 xen5 SM: [6034]  5: (()+0x1046db) [0x7f45370566db]
May 19 15:40:13 xen5 SM: [6034]  6: (()+0x6b334) [0x7f4536fbd334]
May 19 15:40:13 xen5 SM: [6034]  7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb59) [0x7f452e3cd669]
May 19 15:40:13 xen5 SM: [6034]  8: (ThreadPool::WorkThread::entry()+0x10) [0x7f452e3ce680]
May 19 15:40:13 xen5 SM: [6034]  9: (()+0x7dc5) [0x7f452ccc7dc5]
May 19 15:40:13 xen5 SM: [6034]  10: (clone()+0x6d) [0x7f452b99928d]
May 19 15:40:13 xen5 SM: [6034]  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 19 15:40:13 xen5 SM: [6034]
May 19 15:40:13 xen5 SM: [6034]      0> 2017-05-19 15:40:06.735115 7f45127fc700 -1 *** Caught signal (Segmentation fault) **
May 19 15:40:13 xen5 SM: [6034]  in thread 7f45127fc700 thread_name:tp_librbd
May 19 15:40:13 xen5 SM: [6034]
May 19 15:40:13 xen5 SM: [6034]  ceph version 12.0.0 (b7d9d6eb542e2b946ac778bd3a381ce466f60f6a)
May 19 15:40:13 xen5 SM: [6034]  1: (()+0x22aff) [0x7f453766baff]
May 19 15:40:13 xen5 SM: [6034]  2: (()+0xf100) [0x7f452cccf100]
May 19 15:40:13 xen5 SM: [6034]  3: (()+0x103d48) [0x7f4537055d48]
May 19 15:40:13 xen5 SM: [6034]  4: (()+0x104596) [0x7f4537056596]
May 19 15:40:13 xen5 SM: [6034]  5: (()+0x1046db) [0x7f45370566db]
May 19 15:40:13 xen5 SM: [6034]  6: (()+0x6b334) [0x7f4536fbd334]
May 19 15:40:13 xen5 SM: [6034]  7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb59) [0x7f452e3cd669]
May 19 15:40:13 xen5 SM: [6034]  8: (ThreadPool::WorkThread::entry()+0x10) [0x7f452e3ce680]
May 19 15:40:13 xen5 SM: [6034]  9: (()+0x7dc5) [0x7f452ccc7dc5]
May 19 15:40:13 xen5 SM: [6034]  10: (clone()+0x6d) [0x7f452b99928d]
May 19 15:40:13 xen5 SM: [6034]  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
May 19 15:40:13 xen5 SM: [6034]
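
To turn the raw addresses in that trace into something readable, one option (a sketch; it assumes gdb and matching ceph debuginfo packages are installed, and uses generic placeholders) is to run the same map command under gdb and take a backtrace at the crash:

gdb --args rbd-nbd --nbds_max 64 map <pool>/<image> --name client.admin
(gdb) run
(gdb) bt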

rposudnevskiy commented on August 10, 2024

Check the /etc/ceph directory on your XS host.
Are the files
ceph.client.admin.keyring
ceph.conf
still present in /etc/ceph?
Uninstalling the Ceph packages removes these files, so you need to copy them from your Ceph node again.
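
Copying them back could look roughly like this (a sketch; 'ceph-node' is a placeholder for one of your Ceph hosts):

scp ceph-node:/etc/ceph/ceph.conf /etc/ceph/
scp ceph-node:/etc/ceph/ceph.client.admin.keyring /etc/ceph/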

nate-byrnes commented on August 10, 2024

Those files were there; I caught that issue earlier... I've since removed all the Ceph files on all 4 of my XenServer nodes and installed Jewel instead of Luminous. I seem to be getting further along now, but it seems there is some confusion within XAPI and this plugin about which files are where. For example, the VDI I am trying to copy copies for about 20-30 minutes, and then the following looks to be the first error on xen3 (the master):

May 19 16:58:24 xen3 SM: [25176] ['dmsetup', 'reload', 'RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d-VHD-20f1d796-2631-49c6-a615-d98a045bb12f', '--table', '0 16777216 snapshot-merge /dev/nbd/RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d/VHD-20f1d796-2631-49c6-
a615-d98a045bb12f /dev/nbd/RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d/VHD-fa77e818-38c6-4125-b115-06ef3427bf21 P 1']
May 19 16:58:24 xen3 SM: [25176] FAILED in util.pread: (rc 1) stdout: '', stderr: 'device-mapper: reload ioctl on RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d-VHD-20f1d796-2631-49c6-a615-d98a045bb12f failed: No such file or directory
May 19 16:58:24 xen3 SM: [25176] Command failed

But, on xen4 (one of the 3 slaves in the pool) I see the symlink in question exists:

[root@xen4 yum.repos.d]# ls -Fal /dev/nbd/RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d/VHD-fa77e818-38c6-4125-b115-06ef3427bf21  
lrwxrwxrwx 1 root root 9 May 19 16:58 /dev/nbd/RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d/VHD-fa77e818-38c6-4125-b115-06ef3427bf21 -> /dev/nbd0
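
One way to confirm which host actually has the device-mapper device is to query dmsetup on both xen3 and xen4 (a sketch; the device name comes from the dmsetup reload line above):

dmsetup ls | grep 20f1d796
dmsetup info RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d-VHD-20f1d796-2631-49c6-a615-d98a045bb12f
dmsetup table RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d-VHD-20f1d796-2631-49c6-a615-d98a045bb12f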

nate-byrnes commented on August 10, 2024

It looks like the RBD symlink is on xen4, but the device-mapper symlink is on xen3... What would cause that?

rposudnevskiy commented on August 10, 2024

Could you send the whole SMlog file from xen3?
You can send it by email.

rposudnevskiy commented on August 10, 2024

And from xen4 too, if possible.

nate-byrnes commented on August 10, 2024

Sent. It will likely be noisy, as this Xen cluster is live.

rposudnevskiy commented on August 10, 2024

Could you please send the output of these commands from both xen3 and xen4:
rbd-nbd list-mapped 2>/dev/null
ls -la /dev/nbd/RBD_XenStorage-*/

rposudnevskiy commented on August 10, 2024

And
rbd ls -l --pool RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d
from any Xen host.

rposudnevskiy commented on August 10, 2024

And one last question :-)
Could you please describe, step by step, how you perform the migration?

rposudnevskiy commented on August 10, 2024

Due to the first error with rbd-nbd, RBDSR could have left some garbage on your Xen hosts, such as:

  • mapped rbd-nbd devices
  • new VDIs created in Ceph during the migration; because we hit the error they will never be used, but RBDSR did not delete them.

So we need to clean up this garbage by hand.

I'm going to add proper error handling to RBDSR, but I have not finished it yet, so in case of errors the garbage has to be cleaned up by hand for now.
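
Concretely, the two kinds of leftovers can be checked like this (a sketch, reusing the pool name from earlier in this thread):

rbd-nbd list-mapped 2>/dev/null                                       # leftover mapped nbd devices
rbd ls -l --pool RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d  # orphaned VHD-* images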

nate-byrnes commented on August 10, 2024

The steps I am taking to perform the migration are:

  1. Open XenCenter in a Windows remote desktop session
  2. Select an inactive VM named 'repo-debian'
  3. Click on its 'Storage' tab
  4. Select its only drive
  5. Click on 'Move'
  6. Select the RBD SR named 'CEPH RBD Storage'
  7. Click 'Move'
  8. Wait 20-30 minutes for the error
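
For reference, the offline move described in these steps can also be approximated from the CLI. This is only a sketch: xe vdi-copy copies rather than moves the disk, and the lookups below assume the VM and SR names from this thread:

xe sr-list name-label="CEPH RBD Storage" params=uuid    # UUID of the target RBD SR
xe vbd-list vm-name-label=repo-debian params=vdi-uuid   # UUID of the VM's only VDI
xe vdi-copy uuid=<vdi-uuid> sr-uuid=<target-sr-uuid>    # copy the VDI into the RBD SR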

xen3:

/dev/nbd0
/dev/nbd1
/dev/nbd2

total 0
drwx------ 2 root root 100 May 19 16:58 .
drwx------ 3 root root  60 May 19 10:15 ..
lrwxrwxrwx 1 root root   9 May 19 16:58 VHD-20f1d796-2631-49c6-a615-d98a045bb12f -> /dev/nbd2
lrwxrwxrwx 1 root root   9 May 19 13:16 VHD-c8e13139-b853-4d10-a671-50969ce667c2 -> /dev/nbd0
lrwxrwxrwx 1 root root   9 May 19 13:57 VHD-e2eff9ef-963d-40d7-a04f-9c51ac05882b -> /dev/nbd1

xen4:

/dev/nbd0
total 0
drwx------ 2 root root 60 May 19 16:58 .
drwx------ 3 root root 60 May 19 10:15 ..
lrwxrwxrwx 1 root root  9 May 19 16:58 VHD-fa77e818-38c6-4125-b115-06ef3427bf21 -> /dev/nbd0
[root@xen3 yum.repos.d]#  rbd ls -l --pool RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d
NAME                                                                                SIZE PARENT FMT PROT LOCK
VHD-20f1d796-2631-49c6-a615-d98a045bb12f                                           8192M          2          
VHD-447dee1a-69ac-49eb-8532-9ad080cb30a6                                           8192M          2          
VHD-447dee1a-69ac-49eb-8532-9ad080cb30a6@SNAP-813c753c-1302-458f-9cab-57528a53677a 8192M          2 yes      
VHD-76293a9c-42b8-472e-bc26-ca27e02e07e7                                           8192M          2          
VHD-76293a9c-42b8-472e-bc26-ca27e02e07e7@SNAP-1bab9367-d282-4d2b-ba88-080caf673484 8192M          2 yes      
VHD-81390dc5-5a43-4185-a1ab-d4fc240d2077                                           8192M          2          
VHD-81390dc5-5a43-4185-a1ab-d4fc240d2077@SNAP-1333eeac-e222-44d9-b48a-2157a1f59cc2 8192M          2 yes      
VHD-8983ad13-79ab-4c34-8bdd-9f5a9a7dd829                                           8192M          2          
VHD-8983ad13-79ab-4c34-8bdd-9f5a9a7dd829@SNAP-f3707bcd-01f6-4323-8433-e8c9df3a7454 8192M          2 yes      
VHD-8ddc3650-6c0e-4aaf-87e4-8b9ae3a8f6f0                                           8192M          2          
VHD-8ddc3650-6c0e-4aaf-87e4-8b9ae3a8f6f0@SNAP-8b673d04-f588-4f9e-a1dc-5cc0224bffc1 8192M          2 yes      
VHD-95cd38f9-1b78-48a5-8fcf-69d5c1a8e8b5                                           8192M          2          
VHD-95cd38f9-1b78-48a5-8fcf-69d5c1a8e8b5@SNAP-36cc884c-4ae0-45dd-90c0-94b85edd5021 8192M          2 yes      
VHD-c8e13139-b853-4d10-a671-50969ce667c2                                           8192M          2          
VHD-cfa8d5ff-8874-4a95-8cae-9f8677e5c07d                                           8192M          2          
VHD-cfa8d5ff-8874-4a95-8cae-9f8677e5c07d@SNAP-a6598219-da3f-4e94-b6b2-754702734ba8 8192M          2 yes      
VHD-dad4d982-a0ab-483a-b541-dd82fc5f0adc                                           8192M          2          
VHD-dad4d982-a0ab-483a-b541-dd82fc5f0adc@SNAP-ab6775ad-b936-4692-9568-9bf30fc7079d 8192M          2 yes      
VHD-e2eff9ef-963d-40d7-a04f-9c51ac05882b                                           8192M          2          
VHD-fa77e818-38c6-4125-b115-06ef3427bf21                                           8192M          2          
VHD-fa77e818-38c6-4125-b115-06ef3427bf21@SNAP-4dd55481-d034-482a-b3bf-9949bad869c3 8192M          2 yes      
VHD-fbf0b88d-a1ad-43c1-a69c-3f15f22e4fa1                                           8192M          2          
VHD-fbf0b88d-a1ad-43c1-a69c-3f15f22e4fa1@SNAP-af9dbbbb-112a-4503-ba61-ade8d5c66a02 8192M          2 yes

rposudnevskiy commented on August 10, 2024

Do I understand correctly that you haven't migrated any VM to RBDSR yet?

nate-byrnes commented on August 10, 2024

That is correct

nate-byrnes commented on August 10, 2024

I have attempted several times. But, none have completed successfully.

rposudnevskiy commented on August 10, 2024

OK. Let's clean it up.
On all your XS hosts, issue these commands:
rm -f /dev/nbd/RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d/VHD-*
rm -f /run/sr-mount/f60dd3ac-50e9-4a27-8465-51374131de5d/*
rbd-nbd list-mapped 2>/dev/null | xargs -l1 rbd-nbd unmap
Now check that all rbd-nbd devices have been unmapped and all links removed:
ls -la /dev/nbd/RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d/VHD-*
ls -la /run/sr-mount/f60dd3ac-50e9-4a27-8465-51374131de5d/*
rbd-nbd list-mapped 2>/dev/null

On the pool master:
xe vdi-list sr-uuid=f60dd3ac-50e9-4a27-8465-51374131de5d | grep "^uuid" | awk '{print $5}' | xargs -I%% xe vdi-forget uuid=%%
rbd ls -l --pool RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d | grep VHD | awk '{print $1}' | grep SNAP | xargs -I%% rbd snap unprotect %% --pool RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d
rbd ls -l --pool RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d | grep VHD | awk '{print $1}' | grep SNAP | xargs -I%% rbd snap rm %% --pool RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d
rbd ls -l --pool RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d | grep VHD | awk '{print $1}' | xargs -I%% rbd rm %% --pool RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d
Now check that all RBDs have been deleted:
rbd ls -l --pool RBD_XenStorage-f60dd3ac-50e9-4a27-8465-51374131de5d

As far as I remember you should have Ceph 12.0.0 installed at the moment, so you can try to migrate the VM again.

nate-byrnes commented on August 10, 2024

Well, the offline migration of the initial guest repo-debian worked. So I then tried to live-migrate 3 VMs... That did not work. I've just finished cleaning up after those failed attempts and have started another offline migration (repo-centos this time). I'll let you know how that turns out some time tomorrow. If it works, I'll try one live migration and report back on that as well.

rposudnevskiy commented on August 10, 2024

OK, thank you.
Also, please send SMlog and xensource.log in case of an error. It is difficult to understand what went wrong without these log files.

nate-byrnes commented on August 10, 2024

I'm working on cleaning up after the last attempt so I can run another test, since I missed the SMlogs on the master; they've already rotated away. I'm encountering a problem unmapping /dev/nbd0 on xen5:

dmesg shows

[2494015.131967] block nbd0: NBD_DISCONNECT
[2494015.132034] block nbd0: Send control failed (result -32)

But there is no other message that I can see. I'll try running the test again, but I do not know if this residual nbd0 will corrupt the test.
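
If the unmap keeps failing, one thing worth checking (a sketch; only the nbd0 name comes from this attempt) is whether a device-mapper device still sits on top of nbd0 and has to be removed first:

lsblk /dev/nbd0                  # shows any dm holders on top of the nbd device
dmsetup ls --tree                # maps dm devices to their underlying block devices
dmsetup remove <dm-device-name>  # remove the holder first, if one exists
rbd-nbd unmap /dev/nbd0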

nate-byrnes commented on August 10, 2024

attempt3.logs.tar.gz

Here are the logs in case the email didn't make it. xen3 is the master.

rposudnevskiy commented on August 10, 2024

Hi,
Please check the latest version, e8f51e9.

nate-byrnes commented on August 10, 2024

I have been able to complete offline migrations. My master is presently stuck in maintenance mode, and I believe that is preventing live migrations from starting. I'll be working on sorting the master out this evening, so I should be able to test live migrations then. Thanks for all the help so far; things look very promising!

rposudnevskiy commented on August 10, 2024

Hi
Did you manage to test live migration?

nate-byrnes commented on August 10, 2024

Live migration did not work. I've attached the logs from the server (the master) that the live migration was attempted on. The UI showed an error message like "VDI mirroring not available". The guest is a PV guest (Debian 8.0) with XenTools installed.
live_migration.zip

nate-byrnes commented on August 10, 2024

It looks like something has messed up my master's database... After the live migration failed, I tried to move a couple of shut-down guests' VDIs (which had worked before) and those also failed. Then I ran xe-toolstack-restart on the master, for a reason I do not fully remember. Then I migrated a VM (on another SR) from one host to the master. The VM looks to be resident on the master, but it is not running in any way I can connect to... So I looked in the logs on the master and saw messages like:

May 29 13:37:18 xen3 xapi: [error|xen3|0 |creating storage D:82f45bcb8a02|xapi] Could not plug in pbd 'ada97dc5-922d-572d-4ea5-bcb59d5cc296': Server_error(SR_BACKEND_FAILURE_47, [ ; The SR is not available [opterr=The SR is not available [opterr=no such volume group: VG_XenStorage-be65790b-b601-6745-1b56-449d2077d301]];  ])
May 29 13:37:19 xen3 xapi: [error|xen3|567 |org.xen.xapi.xenops.classic events D:d4fc65707caf|xenops] xenopsd event: Caught Db_exn.DBCache_NotFound("missing row", "VM_guest_metrics", "OpaqueRef:NULL") while updating VM: has this VM been removed while this host is offline?
May 29 13:37:19 xen3 xapi: [error|xen3|582 |xapi events D:6407aadc70d8|xenops] events_from_xapi: missing from the cache: [ 2294786b-9dbf-4d27-9485-911845911ed9 ]
May 29 13:37:19 xen3 xapi: [error|xen3|563 ||xapi] Unexpected exception in message hook /opt/xensource/libexec/mail-alarm: INTERNAL_ERROR: [ Subprocess exited with unexpected code 1; stdout = [  ]; stderr = [ pool:other-config:mail-destination not specified#012 ] ]
May 29 13:37:19 xen3 xenopsd-xc: [error|xen3|76 |org.xen.xapi.xenops.classic events D:d4fc65707caf|memory] Failed to parse ionice result: unknown: prio 0
May 29 13:37:20 xen3 xenopsd-xc: [error|xen3|78 |org.xen.xapi.xenops.classic events D:d4fc65707caf|memory] Failed to parse ionice result: unknown: prio 0
May 29 13:37:20 xen3 xenopsd-xc: [error|xen3|79 |org.xen.xapi.xenops.classic events D:d4fc65707caf|memory] Failed to parse ionice result: unknown: prio 0
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|SR.add_to_other_config D:3af1c2eaecc2|sql] Duplicate key in set or map: table SR; field other_config; ref OpaqueRef:0049720b-7624-7846-86a0-699e0994e5a4; key dirty
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace] SR.add_to_other_config D:3af1c2eaecc2 failed with exception Db_exn.Duplicate_key("SR", "other_config", "OpaqueRef:0049720b-7624-7846-86a0-699e0994e5a4", "dirty")
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace] Raised Db_exn.Duplicate_key("SR", "other_config", "OpaqueRef:0049720b-7624-7846-86a0-699e0994e5a4", "dirty")
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace] 1/8 xapi @ xen3 Raised at file db_cache_impl.ml, line 265
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace] 2/8 xapi @ xen3 Called from file lib/pervasiveext.ml, line 22
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace] 3/8 xapi @ xen3 Called from file rbac.ml, line 236
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace] 4/8 xapi @ xen3 Called from file server_helpers.ml, line 72
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace] 5/8 xapi @ xen3 Called from file server_helpers.ml, line 90
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace] 6/8 xapi @ xen3 Called from file lib/pervasiveext.ml, line 22
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace] 7/8 xapi @ xen3 Called from file lib/pervasiveext.ml, line 26
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace] 8/8 xapi @ xen3 Called from file lib/backtrace.ml, line 176
May 29 13:37:20 xen3 xapi: [error|xen3|660 UNIX /var/lib/xcp/xapi|dispatch:SR.add_to_other_config D:6919c9f82d96|backtrace]
May 29 13:37:20 xen3 xenopsd-xc: [error|xen3|81 |org.xen.xapi.xenops.classic events D:d4fc65707caf|memory] Failed to parse ionice result: unknown: prio 0
May 29 13:37:20 xen3 xenopsd-xc: [error|xen3|82 |org.xen.xapi.xenops.classic events D:d4fc65707caf|memory] Failed to parse ionice result: unknown: prio 0
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace] sm_exec D:ab19458b3dc3 failed with exception Storage_interface.Backend_error(_)
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace] Raised Storage_interface.Backend_error(_)
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace] 1/8 xapi @ xen3 Raised at file sm_exec.ml, line 215
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace] 2/8 xapi @ xen3 Called from file lib/pervasiveext.ml, line 22
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace] 3/8 xapi @ xen3 Called from file lib/pervasiveext.ml, line 26
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace] 4/8 xapi @ xen3 Called from file server_helpers.ml, line 72
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace] 5/8 xapi @ xen3 Called from file server_helpers.ml, line 90
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace] 6/8 xapi @ xen3 Called from file lib/pervasiveext.ml, line 22
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace] 7/8 xapi @ xen3 Called from file lib/pervasiveext.ml, line 26
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace] 8/8 xapi @ xen3 Called from file lib/backtrace.ml, line 176
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|backtrace]
May 29 13:37:35 xen3 xapi: [error|xen3|613 |SR.attach D:35bfbf947b50|storage_access] SR.attach failed SR:OpaqueRef:0049720b-7624-7846-86a0-699e0994e5a4 error:INTERNAL_ERROR: [ Storage_interface.Backend_error(_) ]
May 29 13:37:35 xen3 xapi: [error|xen3|613 ||backtrace] SR.attach D:35bfbf947b50 failed with exception Storage_interface.Backend_error(_)
May 29 13:37:35 xen3 xapi: [error|xen3|613 ||backtrace] Raised Storage_interface.Backend_error(_)
May 29 13:37:35 xen3 xapi: [error|xen3|613 ||backtrace] 1/1 xapi @ xen3 Raised at file (Thread 613 has no backtrace table. Was with_backtraces called?, line 0
May 29 13:37:35 xen3 xapi: [error|xen3|613 ||backtrace]
May 29 13:37:35 xen3 xapi: [error|xen3|763 UNIX /var/lib/xcp/xapi|SR.add_to_other_config D:af37b7e4bfe9|sql] Duplicate key in set or map: table SR; field other_config; ref OpaqueRef:b15c6403-a1c5-0f86-1a45-e07b5141a25e; key dirty

Full logs attached
dirty.zip
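
A possible first check for the "SR is not available" / PBD plug failures in that log (a sketch; the SR and PBD UUIDs would be taken from the error lines above) is to inspect the PBD state and try re-plugging it:

xe pbd-list sr-uuid=<sr-uuid> params=uuid,host-uuid,currently-attached
xe pbd-plug uuid=<pbd-uuid>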
