Comments (19)

alfredodeza commented on June 15, 2024

@Asher256 ceph-volume will allow you to have as many OSDs as you like within a single device, because it can consume logical volumes. You would need to add your device to a volume group and then create as many data LVs from it as you need.
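
For illustration, a minimal sketch of that flow with stock LVM tools and ceph-volume (the device and VG/LV names are placeholders, not from this thread):

# put the device into a volume group, then carve one LV per OSD
pvcreate /dev/sdX
vgcreate ceph-block /dev/sdX
lvcreate -l 50%VG -n osd-0 ceph-block
lvcreate -l 100%FREE -n osd-1 ceph-block
# hand each LV to ceph-volume as an OSD data device
ceph-volume lvm create --data ceph-block/osd-0
ceph-volume lvm create --data ceph-block/osd-1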

If you have more questions, please use the mailing list

http://lists.ceph.com/listinfo.cgi/ceph-ansible-ceph.com

srgvg commented on June 15, 2024

As long as the user has the ability to parametrize/configure it to their own liking.

leseb commented on June 15, 2024

This is more complex than I thought, for multiple reasons:

  • gdisk on Precise doesn't support partitions on LVM (using the Saucy repo works fine)
  • udev doesn't create any links in /dev/disk/by-partuuid/ when using a partition on an LV

Possible (complex) workaround (a rough shell sketch follows below):

  1. temporarily mount the device and get its OSD number
  2. point the journal symlink at LVM partition 1
  3. set mode 777 on LVM partition 1
  4. ceph-disk activate /dev/sd?1
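
A rough shell sketch of those steps, with placeholder device and LV names:

#!/bin/sh
# 1. temporarily mount the OSD data partition and read its OSD number
mkdir -p /mnt/osd-tmp
mount /dev/sdX1 /mnt/osd-tmp
OSD_ID=$(cat /mnt/osd-tmp/whoami)
# 2. point the journal symlink at the first partition of the journal LV
ln -sf /dev/mapper/vg_journal-lv_journal1p1 /mnt/osd-tmp/journal
# 3. set mode 777 on the LVM partition so the OSD can open it
chmod 777 /dev/mapper/vg_journal-lv_journal1p1
# 4. unmount and activate
umount /mnt/osd-tmp
ceph-disk activate /dev/sdX1
echo "osd.${OSD_ID} activated"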

ivomarino commented on June 15, 2024

I have a similar situation:

  • 1TB disk
    • /dev/sda1 (/dev/md0 == /boot)
    • /dev/sda2
    • /dev/sda3
    • /dev/sda4
    • /dev/sda5
    • /dev/sda6
    • /dev/sda7
    • /dev/sda8 (/dev/md1 == /)
  • 1TB disk
    • /dev/sdb1 (/dev/md0 == /boot)
    • /dev/sdb2
    • /dev/sdb3
    • /dev/sdb4
    • /dev/sdb5
    • /dev/sdb6
    • /dev/sdb7
    • /dev/sdb8 (/dev/md1 == /)

Then I have one SSD for journals. What I would like to do is use /dev/sda2 to /dev/sda7 (and /dev/sdbX respectively) for OSDs. Is this possible without erasing all data on /dev/sda (where the OS also resides) due to ceph zap-disk? If not, what about using LVM partitions built on top of /dev/sda2 to /dev/sda7 for the OSDs? Thanks in advance for the support.

leseb commented on June 15, 2024

AFAIR ceph-disk supports partitions (and so do we), so if /dev/sda2 to /dev/sda7 are partitions this should be fine. zap-disk is disabled by default, so no worries there.
If you give it a partition, though, the journal will end up as a file on the OSD data filesystem.
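
As a sketch with pre-Luminous ceph-disk (the partition name is just an example):

# with no separate journal device given, the journal ends up as a
# file on the OSD data filesystem
ceph-disk prepare /dev/sda2
ceph-disk activate /dev/sda2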

ivomarino commented on June 15, 2024

Are LVM disks not supported, or am I wrong? I tried to use an LVM disk as an OSD and an SSD as the journal, with zap disk on; the SSD was formatted, it seems, but I got an error for the LVM disk due to some overlapping partition table issues.

leseb commented on June 15, 2024

Correct, LVM disks are not supported by ceph-disk; I thought your issue was on the partition side.
I haven't seen any update on this, so you'll have to fall back to my workaround.

ivomarino commented on June 15, 2024

Your workaround means using partitions?

The strange thing when using partitions is that sda2, which is the first one, always throws an error:
Warning! Secondary partition table overlaps the last partition by 33 blocks!

When I drop /dev/sda2 and /dev/sdb2, I strangely get the same error on sda3/sdb3, so the issue basically shifts up the OSD chain. Probably the sda disk needs GPT rather than MBR, or something similar.
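
For reference, that gdisk warning typically means the backup GPT table, which occupies the last 33 sectors of the disk, would collide with the last partition (common after an MBR-to-GPT conversion). A quick, non-destructive way to check, as a sketch:

# verify the partition table layout; sgdisk ships with gdisk
sgdisk --verify /dev/sda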

leseb commented on June 15, 2024

Nope, I meant this fix: #9 (comment)

ivomarino commented on June 15, 2024

All right, thanks. I think I'll stick with the partition solution, hopefully fixing the "Warning! Secondary partition table overlaps the last partition by 33 blocks!" error on /dev/sda2, etc.

alsotoes commented on June 15, 2024

I have to use LVM because the SCSI driver used by my PCIe flash card supports only 15 partitions. My solution, after creating everything with ceph-deploy osd create, was the for loop in the shell script below.
My journal partitions look like:
/dev/mapper/vg_journal-lv_journal1p1
/dev/mapper/vg_journal-lv_journal2p1
etc...

And my OSD partitions look like:
/dev/sdb1
/dev/sdc1
etc...

#!/bin/sh

SERVER=$1
PARTITIONS="b c d e f g h i j k l m n o p q r s t u v w"
PIVOT="/mnt/osd-pivot"
AUX=1

ssh ${SERVER} "mkdir -p ${PIVOT}"
for i in $PARTITIONS; do
    # create the OSD with its journal on the LVM partition
    ceph-deploy --overwrite-conf osd --zap-disk create $SERVER:sd$i:/dev/mapper/vg_journal-lv_journal$AUX
    # temporarily mount the OSD data partition
    ssh ${SERVER} mount /dev/sd${i}1 ${PIVOT}

    # field 11 of `ls -l` is the symlink target; strip the path to keep just the partuuid
    LVM_UUID=$(ssh ${SERVER} ls -l ${PIVOT}/journal | awk '{split($0,array," ")} END{print array[11]}' | sed 's|/dev/disk/by-partuuid/||g')
    OSD_ID=$(ssh ${SERVER} cat ${PIVOT}/whoami)

    ssh ${SERVER} umount ${PIVOT}
    ssh ${SERVER} "mkdir -p /var/lib/ceph/osd/ceph-${OSD_ID}"
    # replace the dangling by-partuuid link with one pointing at the LVM journal partition
    ssh ${SERVER} "rm -f /dev/disk/by-partuuid/${LVM_UUID}"
    ssh ${SERVER} "cd /dev/disk/by-partuuid/ ; ln -s /dev/mapper/vg_journal-lv_journal${AUX}p1 ${LVM_UUID}"
    echo osd.${OSD_ID} ready using journal /dev/disk/by-partuuid/${LVM_UUID}
    AUX=$(( $AUX + 1 ))
done
ssh ${SERVER} "rm -rf ${PIVOT}"

leseb commented on June 15, 2024

@alsotoes Thanks for sharing, I'll see how we can incorporate this.

socketpair commented on June 15, 2024

http://tracker.ceph.com/issues/6042

Asher256 commented on June 15, 2024

@alsotoes: why did you choose LVM instead of mdadm (e.g. software RAID 0 or 1) for your PCIe flash cards? Flexibility? Is it because LVM, unlike mdadm, does not write the complete partition during initialization to make checksums consistent (which can accelerate SSD wear)? Other advantages?

@leseb: about your comment "udev doesn't create any links in /dev/disk/by-partuuid/ when using a partition on an LV": I confirm. That's why issue #2209 happened. I solved it by setting a udev rule that creates a symlink /dev/disk/by-partuuid/XXX for each /dev/md* partition.
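
For illustration, a hedged sketch of such a rule (the file name and the exact rule text are assumptions, not the rule actually used for #2209):

# install a udev rule that adds by-partuuid links for md partitions
cat > /etc/udev/rules.d/61-md-by-partuuid.rules <<'EOF'
# md partitions: probe with blkid, then link by partition-entry UUID
ENV{DEVTYPE}=="partition", KERNEL=="md*", IMPORT{builtin}="blkid"
ENV{DEVTYPE}=="partition", KERNEL=="md*", ENV{ID_PART_ENTRY_UUID}=="?*", SYMLINK+="disk/by-partuuid/$env{ID_PART_ENTRY_UUID}"
EOF
udevadm control --reload-rules && udevadm trigger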

Questions to @leseb:

  1. Do you have an idea why there is no udev rule that creates '/dev/disk/by-partuuid/XXX' by default for '/dev/md*' partitions?
  2. In your experience, what do you think about using software RAID 1 (with mdadm) for Ceph OSD journals (in our configuration: 24 OSDs using 24 HDDs for data + a RAID-1 of 2 PCIe SSDs for OSD journals, with the ceph-ansible scenario 'non-collocated')?

(In my last benchmarks, the only drawback we saw with software RAID 1 was 4x higher latency for OSD journal writes; the write throughput in K/sec was the same.)

leseb commented on June 15, 2024

@Asher256

  1. I think this is a udev/kernel decision.
  2. I wouldn't use RAID 1 for the journal: SSDs in a RAID 1 tend not to fail simultaneously, but the time between the two failures is usually short, so it is more dangerous than two separate journal SSDs.

leseb commented on June 15, 2024

I'm closing this, as ceph-volume, introduced in Luminous, is going to replace ceph-disk and will play nicely with LVM.

Asher256 commented on June 15, 2024

Thanks for the answer.

@leseb: a question related to the previous comments: what would you recommend to work around the limit of 15 partitions per PCIe SSD? (We need to create 48 partitions on each of our PCIe SSDs for the Ceph OSD journals.)

RAID-0 (mdadm) for Ceph OSD journals?
LVM for Ceph OSD journals? (see the sketch below)
Another recommendation?
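
For the LVM option, a minimal sketch (the device name, VG/LV names, and sizes are illustrative); logical volumes are not subject to the driver's 15-partition limit:

# one journal LV per OSD instead of one partition per OSD
pvcreate /dev/nvme0n1
vgcreate vg_journal /dev/nvme0n1
for n in $(seq 1 48); do
    lvcreate -L 10G -n lv_journal$n vg_journal
done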

leseb commented on June 15, 2024

First, why so many partitions per SSD?

Asher256 commented on June 15, 2024

@leseb:

First, why so many partitions per SSD?

Our setup:

  • 6 servers (24 HDDs for OSD data / 2 PCIe SSDs for OSD journals per server)
  • ceph-ansible scenario: non-collocated
  • ceph-ansible filestore: bluestore

Because we use filestore=bluestore, ceph-ansible creates 2 partitions for each OSD on the PCIe SSDs (presumably the block.db and block.wal partitions; 2 x 24 HDDs = 48 partitions across the 2 PCIe SSD disks).

This is my ceph-ansible configuration: #2209 (issue #2209 is still happening, BTW; if you have any suggestion or idea to solve it, don't hesitate to comment!). Thank you in advance, Sébastien.
