shine's Introduction

Shine - Lustre file system administration utility

Requirements

Installation

When possible, please use the RPM distribution for an easy install. You can get it from http://github.com/cea-hpc/shine/releases/latest/.

Shine is based on ClusterShell, a Python library that is easily available, for example from EPEL.

If you want to do it from source, type:

# python setup.py install

On RHEL-6 like systems, you may want to use the provided init script:

# cp /var/share/shine/shine.init.redhat /etc/rc.d/init.d/shine

Quick Start

Make sure Shine is installed on all nodes.

Edit the file /etc/shine/shine.conf and copy it to all nodes.

To create a Lustre file system named myfs, copy the provided file system model file:

# cd /etc/shine/models
# cp example.lmf myfs.lmf

Edit myfs.lmf to match your needs. This file describes the file system to be installed.

Install the file system with:

# shine install -m /etc/shine/models/myfs.lmf

Then format the file system with:

# shine format -f myfs

Start servers with:

# shine start -f myfs

Mount clients with:

# shine mount -f myfs

Testing code

If you modify the Shine source code, do not forget to run the test suite available in the tests/ directory of the source tree.

python-nose is the recommended way to run the test suite. However, the unittest module provided with Python 2.7 and above should also work.

$ export PYTHONPATH=$PWD/lib
$ cd tests
$ nosetests -v <TESTFILE.PY>
$ nosetests -v --all-modules

Some tests expect to be able to ssh into the current hostname without a password. Make sure that ssh $HOSTNAME echo ok works.

Some tests need to launch real Lustre commands and therefore need root permissions. These tests are skipped if you do not have these permissions.

# nosetests -v --all-modules

shine's Issues

Add "-w" as alias to "-n"

Hello,

It would be great to have "-w" as an alias for "-n", for compatibility with clush arguments.

Best regards,
Xavier

Reported by: xdelaruelle
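
As an illustration of the "-w" alias request above, here is a minimal sketch using Python's standard argparse module. Shine's real option parsing is not shown on this page, so argparse is only a stand-in, and the names build_parser and parse_nodes are hypothetical.

```python
import argparse


def build_parser():
    """Build a tiny parser where -n and -w are two spellings of one option."""
    parser = argparse.ArgumentParser(prog="shine-sketch")
    # Both flags store their value in the same destination, so the rest of
    # the program never needs to know which spelling was used.
    parser.add_argument("-n", "-w", dest="nodes", metavar="NODES",
                        help="nodes to operate on (-w kept for clush habits)")
    return parser


def parse_nodes(argv):
    """Return the node string given on the command line, or None."""
    return build_parser().parse_args(argv).nodes
```

With this layout, parse_nodes(["-w", "fortoy[4-6]"]) and parse_nodes(["-n", "fortoy[4-6]"]) return the same string.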

KeyError when doing Format

#
# Shine error report - 2009-08-07 13:23:49
#

Command was: 'shine format -f testloop -R'

Traceback (most recent call last):
  File "/cea/local/shine/trunk/lib/Shine/Controller.py", line 114, in run_command
    return self.cmds.execute(cmd_args)
  File "/cea/local/shine/trunk/lib/Shine/Commands/CommandRegistry.py", line 123, in execute
    rc = command.execute()
  File "/cea/local/shine/trunk/lib/Shine/Commands/Format.py", line 252, in execute
    rc = self.fs_status_to_rc(status)
  File "/cea/local/shine/trunk/lib/Shine/Commands/Format.py", line 192, in fs_status_to_rc
    return self.target_status_rc_map[status]
KeyError: 3

Exception: 3

It seems the INPROGRESS state is not mapped in target_status_rc_map; this needs a closer look.

Reported by: degremont
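
The traceback in the report above shows a plain dictionary lookup failing on status 3. One defensive option, sketched below, is to fall back to a generic error code when a status has no mapping. The function name, the fallback constant and all numeric values are illustrative assumptions, not Shine's actual codes.

```python
# Hypothetical fallback return code; Shine's real values are not shown here.
RC_RUNTIME_ERROR = 128


def status_to_rc(status, rc_map, default=RC_RUNTIME_ERROR):
    """Map a target status to a return code without raising KeyError."""
    # dict.get() returns `default` for any status missing from the map,
    # for example an in-progress state nobody mapped yet.
    return rc_map.get(status, default)
```

Whether an unmapped state should be an error or should get its own entry in the map is a design decision left to the maintainers.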

Manually manage Lustre module loading/unloading

Shine should handle module loading and unloading by itself, rather than just doing modprobe lustre or lustre_rmmod.

-For loading:
maybe here, modprobe lustre is sufficient, but the return code should be tested, and a grep lustre /proc/modules or grep lustre /proc/filesystems could be done to be really sure Lustre is supported. Exceptions need to be properly raised and caught.

-For unloading:
lustre_rmmod is buggy and often misses modules. Unloading could be done by hand, doing the proper rmmod.

This kind of command should be run once per server, so we must implement ServerActions (see ticket #97).

Reported by: degremont
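
The checks suggested in the report above (looking for lustre in /proc/modules and /proc/filesystems) can be sketched as two small parsing helpers. The function names are hypothetical, and the helpers take the file contents as text so that they can be exercised without a Lustre node.

```python
def lustre_supported(proc_filesystems_text):
    """Return True if 'lustre' is listed in /proc/filesystems content."""
    for line in proc_filesystems_text.splitlines():
        fields = line.split()
        # Each line is "[nodev] <fstype>"; the type is the last field.
        if fields and fields[-1] == "lustre":
            return True
    return False


def module_loaded(proc_modules_text, name):
    """Return True if module `name` appears in /proc/modules content."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The module name is the first field of each line.
        if fields and fields[0] == name:
            return True
    return False
```

On a real server the text would come from reading /proc/filesystems and /proc/modules after running modprobe lustre and checking its return code.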

Refactor Shine.Commands code

Classes which inherit from FSLiveCommand could be refactored as they share a lot of similar code.
This includes: Start, Stop, Format, Status.

Moreover, Mount and Umount are nearly the same, except for some 'u' :)
They could also be refactored nicely.

This could reduce the amount of code dramatically.

Reported by: degremont

AssertionError when doing 'shine status'

# shine status
<Shine.Lustre.Client.Client instance at 0x2ad547e69d88> inti104 (172.16.3.105@o2ib0)
Traceback (most recent call last):
 File "/usr/sbin/shine", line 72, in ?
   main()
 File "/usr/sbin/shine", line 62, in main
   rc = controller.run_command(sys.argv[1:])
 File "/usr/lib/python2.4/site-packages/Shine/Controller.py", line 94, in run_command
   return self.cmds.execute(cmd_args)
 File "/usr/lib/python2.4/site-packages/Shine/Commands/CommandRegistry.py", line 123, in execute
   rc = command.execute()
 File "/usr/lib/python2.4/site-packages/Shine/Commands/Status.py", line 160, in execute
   statusdict = fs.status(status_flags)
 File "/usr/lib/python2.4/site-packages/Shine/Lustre/FileSystem.py", line 597, in status
   assert target.state != None
AssertionError

Reported by: lucaspa

Index management in configuration files

If a model defines several targets, some with an index and some without, it fails and assigns wrong indexes.

ost: dev=/dev/sdaa
ost: dev=/dev/sdag 
ost: dev=/dev/sdhj index=0 # I really want this one to be OST0000
ost: dev=/dev/sdag
# others, I do not care

This should be fixed in Shine.Lustre.Configuration.FileSystem._setup_target_devices() using a RangeSet. The whole function should be rewritten to be cleaner.

Reported by: degremont
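
One possible allocation policy for the index problem reported above is sketched here: explicit indexes are reserved first, then targets without an index receive the lowest free values in declaration order. The report suggests ClusterShell's RangeSet for this; the sketch uses a plain Python set instead so that it stays self-contained, and the function name assign_indexes is hypothetical.

```python
def assign_indexes(requested):
    """Assign target indexes.

    `requested` lists, in declaration order, the explicit index of each
    target or None when the model gives no index.
    """
    explicit = [idx for idx in requested if idx is not None]
    used = set(explicit)
    if len(used) != len(explicit):
        raise ValueError("duplicate explicit index in model")

    result = []
    next_free = 0
    for idx in requested:
        if idx is None:
            # Skip every index already reserved or already handed out.
            while next_free in used:
                next_free += 1
            idx = next_free
            used.add(idx)
        result.append(idx)
    return result
```

For the four OST lines of the example (no index, no index, index=0, no index), this policy yields indexes 1, 2, 0 and 3.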

Remove MGS reference in Shine.Lustre.Disk

Shine.Lustre.Disk does not need a reference to any target name.

The MGS check in this class should be removed; the verification should instead be done in the function calling this code.
This means the MGT Target should be careful not to check the device fsname, as it could be different from its own fsname.

Reported by: degremont

External MGS support

Add support for external MGS.

An MGS could be declared 'external':
mode=external

It would then be ignored when doing the filesystem management, because we cannot guarantee that the MGS machine can be reached through SSH.

So Shine simply skips the MGS and considers it OK.

Reported by: degremont

"shine status -L" is not so local

shine status -L reports that it did not check a lot of nodes, but that is expected since we explicitly requested local mode (-L).

Resolution: I think that, in local mode, it should only check targets on this node.

# shine status -L
FILESYSTEM COMPONENTS STATUS (ptmp)
+-----+----+-----------------+-------------------------------+
|type | #  |     nodes       |            status             |
+-----+----+-----------------+-------------------------------+
|CLI  |129 |fortoy[7,32-159] |not checked (128), mounted (1) |
+-----+----+-----------------+-------------------------------+

Reported by: degremont

Don't speak of target "MGT MGS"

Command messages like

Starting of OST lustre-OST0000 (...)
Format of MGT MGS (/tmp/mgs)
...

defined in the Shine.Commands.* event handlers should be simplified to simply use the target label and device name. The target type is already part of the target label. This will avoid strange messages like "MGT MGS".

Reported by: degremont

View value is tested too late

When doing shine status -V something, shine validates the View field '''after''' having run the status everywhere.

The test should be done '''before''' that.

Reported by: degremont

Tuning is not applied to clients

Clients are totally skipped when handling Tuning.
Tuning code only handles target servers.

Shine silently ignores all client tuning configuration.

Reported by: degremont

Fsck support

shine fsck should be added to shine features.

This is not the Lustre lfsck feature, simply the e2fsck on each filesystem target.

Reported by: degremont

Better error handling for Tuning

Tuning code should be slightly rewritten to be closer to what is done in the rest of the Shine code.

Server.tune() should be rewritten to behave like other actions.
A new Tune action should be written and used in Server.tune().
This will help error handling in proxy_error, for example.

When tuning fails, we should have more error details, instead of just this:

$ ./shine tune -f dev -v -n fortoy150
Tuning filesystem dev...
Tuning of filesystem dev failed.

Reported by: degremont

Shine show info doesn't report any information

No code is provided for the Shine show info command.

Add some code to display file system configuration parameters:

  • name : FS name
  • mount path : FS mount location on client nodes
  • device path : device path to use to mount the FS on client
  • mount options : client mount options
  • quotas : FS quota parameters
  • stripping : FS stripping policy
  • tuning : path to the file that contains FS tuning parameters
  • description : FS description

Reported by: jfereyre

Exclude node flag

An exclude node flag (-x) should be added to all commands.
This flag excludes the listed nodes from the filesystem nodes and from the nodes specified with -n.

Reported by: degremont
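
The selection rule described in the exclude-flag request above can be sketched with plain sets: start from the filesystem nodes, restrict to the -n selection when one is given, then remove the -x nodes. Node names are handled as already-expanded strings here; a real implementation would more likely work on ClusterShell node sets. The function name select_nodes is hypothetical.

```python
def select_nodes(fs_nodes, selected=None, excluded=None):
    """Return the sorted list of nodes a command should run on."""
    nodes = set(fs_nodes)
    if selected is not None:
        # -n narrows the filesystem nodes to the requested ones.
        nodes &= set(selected)
    if excluded:
        # -x always wins over both the filesystem nodes and -n.
        nodes -= set(excluded)
    return sorted(nodes)
```

For example, with filesystem nodes n1, n2 and n3, a selection of n2 and n3, and an exclusion of n3, only n2 remains.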

Too short timeout in shine install

I tried to install my model on multiple nodes but I get this output :
shine install -m /etc/shine/models/model.lmf
Using Lustre model file /etc/shine/models/model.lmf
Error: Cannot create file system configuration directories (preinstall)
Hint: Timed out node(s): node[49,76,81,92,160-165,170-171,174,184,189,191] [rc=-1]

another run gives :

shine install -m /etc/shine/models/model.lmf
Using Lustre model file /etc/shine/models/model.lmf
Error: Cannot create file system configuration directories (preinstall)
Hint: Timed out node(s): node[50,64,71,76,81,122,153,160-164,167,171,174,176,184-185,190-191] [rc=-1]

After some investigation, it seems that the faulty code is here :
lib/Shine/Lustre/Actions/Proxies/Preinstall.py:69

To have a correct behaviour, I did this change :

Index: lib/Shine/Lustre/Actions/Proxies/Preinstall.py
===================================================================
--- lib/Shine/Lustre/Actions/Proxies/Preinstall.py      (revision 210)
+++ lib/Shine/Lustre/Actions/Proxies/Preinstall.py      (working copy)
@@ -45,7 +45,7 @@
         command = "%s preinstall -f %s -R" % (self.progpath, self.fs.fs_name)
 
         # Schedule command for execution
-        self.worker = self.task.shell(command, nodes=self.nodes, handler=self, timeout=2)
+        self.worker = self.task.shell(command, nodes=self.nodes, handler=self, timeout=10)
 
     def ev_close(self, worker):
         """

Regards,

Reported by: ac-cea

specific client mount_path doesn't work

Users may want to mount a filesystem under a different mount point on their clients. The following option doesn't work:
client: node=somenode[2-99] mount_path=/some/fs

Code design: there is no ClientModel in Model.py, so client options are handled as target options (which is bad...). This needs to be fixed.

Reported by: thiell

Implement possible value check for command flag, in Shine.Command.Command

It could be interesting to implement generic handling of valid values for command flags, using the attrs passed to add_option().

  attr = { 'optional' : optional,
           'hidden' : False,
           'doc' : "specify view keyword",
           'values': [ 'fs', 'target', 'disk' ] }

*This could also be used for generating docs
*This is useful for: views and targets

Reported by: degremont
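
A minimal sketch of the generic check proposed above: if an option's attributes carry a 'values' list, any other value is rejected before the command runs. The function name check_option_value and the error wording are assumptions; only the shape of the attr dictionary comes from the proposal.

```python
def check_option_value(name, value, attr):
    """Validate `value` against the optional 'values' list in `attr`."""
    allowed = attr.get('values')
    # Options without a 'values' entry accept anything.
    if allowed is not None and value not in allowed:
        raise ValueError("invalid value %r for %s (expected one of: %s)"
                         % (value, name, ", ".join(allowed)))
    return value
```

Running this check while parsing the command line would also cover the case where a view keyword is only validated after the status has been collected everywhere.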

ha_node in storage.conf seems not operational with nodeset

shine --version
0.904 (lib: SVN commit r187, script: SVN commit r167)

cat /etc/shine/storage.conf | grep -P "^(mdt|ost)"
mdt: tag=mdt_loop_toto node=inti4 dev=/tmp/mdt_loop_toto
ost: tag=ost_loop_toto node=inti2 dev=/tmp/ost_loop_toto ha_node=inti[3,6,7]

...
install one fs with those targets
...

and here is the xmf result:

cat /var/cache/shine/conf/example.xmf | grep -P "^(mdt|ost):"
mdt: node=inti4 index=0 tag=mdt_loop_toto dev=/tmp/mdt_loop_toto
ost: node=inti2 index=0 tag=ost_loop_toto dev=/tmp/ost_loop_toto ha_node=inti3
ost: node=inti2 index=1 tag=ost_loop_toto dev=/tmp/ost_loop_toto ha_node=inti6
ost: node=inti2 index=2 tag=ost_loop_toto dev=/tmp/ost_loop_toto ha_node=inti7

... and surprise we get 3 osts !

Reported by: ohargoaa

Traceback when a hostname cannot be resolved

[root@inti0 named]# shine install -m /etc/shine/models/ptmp.lmf -n ucuh7
Using Lustre model file /etc/shine/models/ptmp.lmf
Traceback (most recent call last):
  File "/usr/sbin/shine", line 72, in ?
    main()
  File "/usr/sbin/shine", line 62, in main
    rc = controller.run_command(sys.argv[1:])
  File "/usr/lib/python2.4/site-packages/Shine/Controller.py", line 94, in run_command
    return self.cmds.execute(cmd_args)
  File "/usr/lib/python2.4/site-packages/Shine/Commands/CommandRegistry.py", line 123, in execute
    rc = command.execute()
  File "/usr/lib/python2.4/site-packages/Shine/Commands/Install.py", line 74, in execute
    fs.install(fs_conf.get_cfg_filename(), nodes=install_nodes)
  File "/usr/lib/python2.4/site-packages/Shine/Lustre/FileSystem.py", line 313, in install
    assert len(servers) > 0, "no servers?"
AssertionError: no servers?

It could have just said "host not found".

Reported by: xdelaruelle

Fix Python warnings under Fedora 11

Fix Python 2.6 deprecated warnings.

SVN commit r167 adds a global warnings filter in scripts/shine to avoid warnings in stderr at runtime.

Reported by: thiell

Router management

Shine should support router management as a kind of target.
It should be able to start/stop/status them.

Reported by: degremont

Improve tuning error handling

On shine mount, for now, a failure message is displayed in case of a failed mount and the tuning is skipped.
Improve this and apply tuning on nodes where the mount is successful.

Reported by: thiell

Improve shine help display

Presently, the shine help summary is a bit terse and could be improved.
Currently it is just a list of commands and their options:

Usage: shine <command> [options...]

  show    conf|fs|info|storage|tuning [-f <fsname>] [-v]
  install -m <LMF file path> [-n <nodes>]
  remove  [-n <nodes>] [-y] -f <fsname>
  format  [-n <nodes>] [-f <fsname>] [-t <target>] [-i <index(es)>] [-vqy]
  status  [-n <nodes>] [-f <fsname>] [-t <target>] [-i <index(es)>] [-vq] [-V <view>]
  start   [-n <nodes>] [-f <fsname>] [-t <target>] [-i <index(es)>] [-vq]
  stop    [-n <nodes>] [-f <fsname>] [-t <target>] [-i <index(es)>] [-vq]
  mount   [-n <nodes>] [-f <fsname>] [-n <nodes>] [-vq]
  umount  [-n <nodes>] [-f <fsname>] [-n <nodes>] [-vq]
  tune    [-n <nodes>] [-f <fsname>] [-t <target>] [-i <index(es)>] [-vq]

It could be interesting to:
*Group commands: Common, FS, Target, Clients, ...
*Give more information on each command

Reported by: degremont

ActionFailedError is not caught when an error is raised

When an error occurs in Action.Install, an exception is raised and never caught.

# shine start -f bootfs 
Starting 4 targets of bootfs on fortoy[4-6]
[17:52] In progress for 2 target(s) on fortoy[4-5] ...
[17:52] In progress for 1 target(s) on fortoy6 ...
Start successful.
Traceback (most recent call last):
  File "/cea/local/shine/trunk/scripts/shine", line 75, in ?
    main()
  File "/cea/local/shine/trunk/scripts/shine", line 65, in main
    rc = controller.run_command(sys.argv[1:])
  File "/cea/local/shine/trunk/lib/Shine/Controller.py", line 94, in run_command
    return self.cmds.execute(cmd_args)
  File "/cea/local/shine/trunk/lib/Shine/Commands/CommandRegistry.py", line 123, in execute
    rc = command.execute()
  File "/cea/local/shine/trunk/lib/Shine/Commands/Start.py", line 205, in execute
    status = fs.tune(tuning)
  File "/cea/local/shine/trunk/lib/Shine/Lustre/FileSystem.py", line 920, in tune
    self._distant_action_by_server(Install, tune_all, config_file=Globals().get_tuning_file())
  File "/cea/local/shine/trunk/lib/Shine/Lustre/FileSystem.py", line 294, in _distant_action_by_server
    task.resume()
  File "../lib/ClusterShell/Task.py", line 282, in resume
  File "../lib/ClusterShell/Engine/Engine.py", line 628, in run
  File "../lib/ClusterShell/Engine/Poll.py", line 186, in runloop
  File "../lib/ClusterShell/Engine/Engine.py", line 409, in remove
  File "/cea/local/clustershell/trunk/lib/ClusterShell/Worker/Ssh.py", line 159, in _close
    self.worker._check_fini()
  File "/cea/local/clustershell/trunk/lib/ClusterShell/Worker/Ssh.py", line 300, in _check_fini
    self._invoke("ev_close")
  File "../lib/ClusterShell/Worker/Worker.py", line 93, in _invoke
  File "../lib/ClusterShell/Event.py", line 51, in _invoke
  File "/cea/local/shine/trunk/lib/Shine/Lustre/Actions/Install.py", line 60, in ev_close
    rc)
Shine.Lustre.Actions.Action.ActionFailedError: Fatal: Installation of file system configuration failed on fortoy[4-6] (Operation not permitted)

Reported by: degremont

failover status in shine status output

shine status should tell the failover status of targets.

This could be achieved simply:

  • When doing shine status, also run on failover nodes.
  • For each target, check if status ok on master node => Online
  • if status ok on one of failover node => Migrated (nodeA)
  • if something else: error

This could be done during shine status feature remake with tickets #11, #2, #9 and #8

Reported by: degremont
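
The failover rules listed in the request above translate into a small decision function. This is only a sketch: the function name, its arguments and the exact strings returned are assumptions, and online_nodes stands for the nodes on which the target status came back OK.

```python
def failover_status(master, failover_nodes, online_nodes):
    """Classify a target from the nodes where its status is OK."""
    if master in online_nodes:
        return "Online"
    for node in failover_nodes:
        # The first failover node reporting the target wins.
        if node in online_nodes:
            return "Migrated (%s)" % node
    return "Error"
```

A target that is reported OK on nodeA while its master is down would thus be displayed as "Migrated (nodeA)".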

Stop speaking about MGT, only MGS is meaningful for Lustre

The Shine target MGT should be renamed to MGS or, at least, only used internally, and all external messages should speak about an MGS.

I think it is simpler to rename everything to MGS. Moreover, this will simplify some mapping tables.

Reported by: degremont

Add activity notifications for status command

The status command may perform many checks on different cluster nodes. Adding activity notifications during waiting periods would improve the user experience.

Reported by: thiell

Full HA support

Shine should be able to start and stop target on their different devices.

-Format is already supported.
-Status is handled by ticket #14
-Update is not handled by failover

Reported by: degremont

Tuning is applied twice

When starting targets, tuning is applied twice.
To see it: # shine start -d

On management node, Shine.Command.Start launches:
-shine start -R
-shine tune -R

On distant node:
-When start -R is received, it also applies tuning through Shine.Command.Start
-When tune is received, it applies it again.

Reported by: degremont

Quota support is broken

When activating quota with
quota: on
Formatting failed with the following error:

#
# Shine error report - 2009-08-18 14:30:23
#

Command was: 'shine format -f testext -R -t mdt -i 0'

Traceback (most recent call last):
  File "/cea/local/shine/trunk/lib/Shine/Controller.py", line 114, in run_command
    return self.cmds.execute(cmd_args)
  File "/cea/local/shine/trunk/lib/Shine/Commands/CommandRegistry.py", line 123, in execute
    rc = command.execute()
  File "/cea/local/shine/trunk/lib/Shine/Commands/Format.py", line 251, in execute
    quota_options=fs_conf.get_quota_options())
  File "/cea/local/shine/trunk/lib/Shine/Lustre/FileSystem.py", line 491, in format
    target.format(**kwargs)
  File "/cea/local/shine/trunk/lib/Shine/Lustre/Target.py", line 283, in format
    action.launch()
  File "/cea/local/shine/trunk/lib/Shine/Lustre/Actions/Format.py", line 56, in launch
    self.launch_format()
  File "/cea/local/shine/trunk/lib/Shine/Lustre/Actions/Format.py", line 87, in launch_format
    command.append('"--param=mdt.quota_type=%s"' % \
TypeError: unsubscriptable object

Exception: unsubscriptable object

I think the code in
Shine.Lustre.Actions.Format.launch_format() should be fixed. It handles quota_options strangely.

Reported by: degremont

Display bug with AsciiTable

Output of shine status :

FILESYSTEM COMPONENTS STATUS (ptmp)
+-----+----+------------------------+--------------------------------------------+
|type | #  |         nodes          |                   status                   |
+-----+----+------------------------+--------------------------------------------+
|MGT  |  1 |inti3                   |online (1)                                  |
|MDT  |  1 |inti3                   |online (1)                                  |
|OST  | 20 |inti[1-2]               |online (20)                                 |
|CLI  |137 |inti[0,4-135],uchu[2-5] |offline (2), CHECK FAILURE (43), mounted (9 ||                                 +2)                                          |
+-----+----+------------------------+--------------------------------------------+

What exactly does "9 +2" mean in the CLI mounted status?

Thanks in advance.

Reported by: lucaspa

target types are not checked

When specifying a target type like:
shine start -t mgs
Shine does not check whether this parameter is valid.

Reported by: degremont

Customizable path for external command

We need to be able to set a custom path when running external commands.
This information could optionally be set in shine.conf.

Things may become simpler once the following ticket is closed, but a workaround is possible in the meantime:
https://sourceforge.net/apps/trac/clustershell/ticket/4
using something like "export PATH=/usr/lib/lustre/:${PATH}; cmd..."

task.shell() should be wrapped.

When this is done, changes like those in SVN commits r161, r164 and r176 should be removed.

Reported by: degremont

Where are the logs ?

shine format should show the real mkfs.lustre commands it sends, on stdout or somewhere else. After this command, there is nothing in /var/log/messages or in /var/log/syslog on the IO node.

Reported by: ohargoaa

"Shine start" blocks when launched localy

Hello,
I have set up a Lustre filesystem locally on my system with:

...
# fs_name
# The Lustre filesystem name (8 characters max).
fs_name: test
# Management Target
mgt: node=node dev=/dev/sdb1
# mdt
# MetaData Target
mdt: node=node dev=/dev/sdc1
# ost
# Object Storage Target(s)
ost: node=node dev=/dev/sdb2
ost: node=node dev=/dev/sdc2
ost: node=node dev=/dev/sdd1
ost: node=node dev=/dev/sde1
...

When I try to launch shine start or shine stop, shine hangs :

# shine stop -f test -vd
Stopping 6 targets of test on node
None: Stopping MDT test-MDT0000 (/dev/sdc1)...
POPEN2: [/bin/sh,-c,export PATH=/usr/lib/lustre/:${PATH}; debugfs -c -R 'dump /CONFIGS/mountdata /tmp/shine-debugfs-S1HLrT/mountdata' '/dev/sdc1']
LINE debugfs 1.41.6.sun1 (30-May-2009)
LINE /dev/sdc1: catastrophic mode - not reading inode or group bitmaps
[16:46] In progress for 1 target(s) on node ...

To be able to launch it locally, I must add the -L flag:

# shine stop -f test -vd -L
POPEN2: [/bin/sh,-c,export PATH=/usr/lib/lustre/:${PATH}; debugfs -c -R 'dump /CONFIGS/mountdata /tmp/shine-debugfs-bxNGkg/mountdata' '/dev/sdc1']
LINE debugfs 1.41.6.sun1 (30-May-2009)
LINE /dev/sdc1: catastrophic mode - not reading inode or group bitmaps
POPEN2: [/bin/sh,-c,umount /dev/sdc1]
POPEN2: [/bin/sh,-c,export PATH=/usr/lib/lustre/:${PATH}; debugfs -c -R 'dump /CONFIGS/mountdata /tmp/shine-debugfs-wWt7TR/mountdata' '/dev/sde1']
LINE debugfs 1.41.6.sun1 (30-May-2009)
LINE /dev/sde1: catastrophic mode - not reading inode or group bitmaps
POPEN2: [/bin/sh,-c,umount /dev/sde1]
POPEN2: [/bin/sh,-c,export PATH=/usr/lib/lustre/:${PATH}; debugfs -c -R 'dump /CONFIGS/mountdata /tmp/shine-debugfs-UtJaZH/mountdata' '/dev/sdd1']
LINE debugfs 1.41.6.sun1 (30-May-2009)
LINE /dev/sdd1: catastrophic mode - not reading inode or group bitmaps
POPEN2: [/bin/sh,-c,umount /dev/sdd1]
POPEN2: [/bin/sh,-c,export PATH=/usr/lib/lustre/:${PATH}; debugfs -c -R 'dump /CONFIGS/mountdata /tmp/shine-debugfs-SHM9bo/mountdata' '/dev/sdc2']
LINE debugfs 1.41.6.sun1 (30-May-2009)
LINE /dev/sdc2: catastrophic mode - not reading inode or group bitmaps
POPEN2: [/bin/sh,-c,umount /dev/sdc2]
POPEN2: [/bin/sh,-c,export PATH=/usr/lib/lustre/:${PATH}; debugfs -c -R 'dump /CONFIGS/mountdata /tmp/shine-debugfs-In1BZC/mountdata' '/dev/sdb2']
LINE debugfs 1.41.6.sun1 (30-May-2009)
LINE /dev/sdb2: catastrophic mode - not reading inode or group bitmaps
POPEN2: [/bin/sh,-c,umount /dev/sdb2]
POPEN2: [/bin/sh,-c,export PATH=/usr/lib/lustre/:${PATH}; debugfs -c -R 'dump /CONFIGS/mountdata /tmp/shine-debugfs-NUlGVS/mountdata' '/dev/sdb1']
LINE debugfs 1.41.6.sun1 (30-May-2009)
LINE /dev/sdb1: catastrophic mode - not reading inode or group bitmaps
POPEN2: [/bin/sh,-c,umount /dev/sdb1]
Stop successful.

Shine should detect when I have local OSS, MGS, MDS to start.

Regards,

Aurélien

Reported by: ac-cea

Shine remote errors provoke pickle display

When a remote Shine command returns an error, the local shine displays the pickled output in addition to the output intended for the admin.

# shine format -f testloop -y
fortoy7: Remote action format failed: SHINE:2:ev_format_start:gAJ9c
QBVBnRhcmdldHEBKGNTaGluZS5MdXN0cmUuVGFyZ2V0Ck1HVApxAm9xA31xBShVB3Nl
cnZlcnNxBl1xB2NTaGluZS5MdXN0cmUuU2VydmVyClNlcnZlcgpxCCmBcQl9cQooVQd
fbGVuZ3RocQtLAFUJX3BhdHRlcm5zcQx9cQ1VCGZvcnRveSVzcQ4oY0NsdXN0ZXJTaG
VsbC5Ob2RlU2V0ClJhbmdlU2V0CnEPb3EQfXERKGgLSwFVCV9hdXRvc3RlcHESR1SyS
a0llMN9VQdfcmFuZ2VzcRNdcRQoSwdLB0sBSwB0cRVhdWJzaBJOVQNuaWRxFlUQMTcy
LjE2LjI4LjdAdGNwMHEXdWJhVQN0YWdxGE5VC3N0YXR1c19pbmZvcRlOVQVsYWJlbHE
aVQNNR1NxG1UFaW5kZXhxHEsAVQ5hY3Rpb25fZW5hYmxlZHEdiFUKbGRkX3N2bmFtZX
EeTlUFZ3JvdXBxH05VCl9sZGRfZmxhZ3NxIEsAVQlkZXZfaXNibGtxIYlVA2RldnEiV
QgvdG1wL21nc3EjVQZzZXJ2ZXJxJGgJVQpsZGRfZnNuYW1lcSVOVQVzdGF0ZXEmSwNV
BGpkZXZxJ05VD3NlbGVjdGVkX3NlcnZlcnEoSwBVCGRldl9zaXplcSlLAFUEdHlwZXE
qVQNtZ3RxK3Vicy4=
SHINE:2:ev_format_start:gAJ9cQBVBnRhcmdldHEBKGNTaGluZS5MdXN0cmUuVGF
yZ2V0Ck1EVApxAm9xA31xBShVB3NlcnZlcnNxBl1xB2NTaGluZS5MdXN0cmUuU2Vydm
VyClNlcnZlcgpxCCmBcQl9cQooVQdfbGVuZ3RocQtLAFUJX3BhdHRlcm5zcQx9cQ1VC
GZvcnRveSVzcQ4oY0NsdXN0ZXJTaGVsbC5Ob2RlU2V0ClJhbmdlU2V0CnEPb3EQfXER
KGgLSwFVCV9hdXRvc3RlcHESR1SySa0llMN9VQdfcmFuZ2VzcRNdcRQoSwdLB0sBSwB
0cRVhdWJzaBJOVQNuaWRxFlUQMTcyLjE2LjI4LjdAdGNwMHEXdWJhVQN0YWdxGE5VC3
N0YXR1c19pbmZvcRlOVQVsYWJlbHEaVRB0ZXN0bG9vcC1NRFQwMDAwcRtVBWluZGV4c
RxLAFUOYWN0aW9uX2VuYWJsZWRxHYhVCmxkZF9zdm5hbWVxHk5VBWdyb3VwcR9OVQpf
bGRkX2ZsYWdzcSBLAFUJZGV2X2lzYmxrcSGJVQNkZXZxIlUJL3RtcC9tZHQwcSNVBnN
lcnZlcnEkaAlVCmxkZF9mc25hbWVxJU5VBXN0YXRlcSZLA1UEamRldnEnTlUPc2VsZW
N0ZWRfc2VydmVycShLAFUIZGV2X3NpemVxKUsAVQR0eXBlcSpVA21kdHErdWJzLg==
SHINE:2:ev_format_start:gAJ9cQBVBnRhcmdldHEBKGNTaGluZS5MdXN0cmUuVGF
yZ2V0Ck9TVApxAm9xA31xBShVB3NlcnZlcnNxBl1xB2NTaGluZS5MdXN0cmUuU2Vydm
VyClNlcnZlcgpxCCmBcQl9cQooVQdfbGVuZ3RocQtLAFUJX3BhdHRlcm5zcQx9cQ1VC
GZvcnRveSVzcQ4oY0NsdXN0ZXJTaGVsbC5Ob2RlU2V0ClJhbmdlU2V0CnEPb3EQfXER
KGgLSwFVCV9hdXRvc3RlcHESR1SySa0llMN9VQdfcmFuZ2VzcRNdcRQoSwdLB0sBSwB
0cRVhdWJzaBJOVQNuaWRxFlUQMTcyLjE2LjI4LjdAdGNwMHEXdWJhVQN0YWdxGE5VC3
N0YXR1c19pbmZvcRlOVQVsYWJlbHEaVRB0ZXN0bG9vcC1PU1QwMDAwcRtVBWluZGV4c
RxLAFUOYWN0aW9uX2VuYWJsZWRxHYhVCmxkZF9zdm5hbWVxHk5VBWdyb3VwcR9OVQpf
bGRkX2ZsYWdzcSBLAFUJZGV2X2lzYmxrcSGJVQNkZXZxIlUJL3RtcC9vc3QwcSNVBnN
lcnZlcnEkaAlVCmxkZF9mc25hbWVxJU5VBXN0YXRlcSZLA1UEamRldnEnTlUPc2VsZW
N0ZWRfc2VydmVycShLAFUIZGV2X3NpemVxKUsAVQR0eXBlcSpVA29zdHErdWJzLg==
Unknown error: 3
(details in /tmp/shine-error-2009-08-07_13:23:49)

Reported by: degremont
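
In the output above, protocol lines have the visible shape SHINE:2:ev_format_start:<base64 data>. A sketch of how such lines could be separated from the text meant for the admin follows. It assumes one complete message per line, treats the second field as an opaque version string, and deliberately stops at base64 decoding without unpickling anything. The function name split_output is hypothetical.

```python
import base64

PREFIX = "SHINE:"


def split_output(lines):
    """Split remote output into (protocol messages, human-readable lines)."""
    messages, text = [], []
    for line in lines:
        if line.startswith(PREFIX):
            # Expected fields: prefix, version, event name, payload.
            parts = line.split(":", 3)
            if len(parts) == 4:
                _, version, event, payload = parts
                messages.append((version, event, base64.b64decode(payload)))
                continue
        # Anything that is not a well-formed message is kept for display.
        text.append(line)
    return messages, text
```

Only the second list would then be shown to the user when a remote action fails.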

Automatic synchronization of shine clients *.xmf cache

It could be interesting to add a new operation like sync (shine sync ...) to ensure that the client caches are in sync with the master's. This could be done by computing the md5sum of the master's *.xmf files and comparing the results with those of the clients.

It would be great to ensure that clients are synchronized with their shine master during the different shine operations, like mount, umount...

This would also make it possible to automatically and seamlessly deploy the *.xmf files on clients that have no cache.

Reported by: mh-cea
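
The md5sum comparison proposed above could look like the following sketch. How digests are collected from the clients is left out; client_digests is assumed to map each client name to the hex digest of its cached *.xmf file, or to None when the client has no cache. Both function names are hypothetical.

```python
import hashlib


def md5_hex(data):
    """Return the hexadecimal MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()


def stale_clients(master_digest, client_digests):
    """List clients whose cache is missing or differs from the master's."""
    return sorted(node for node, digest in client_digests.items()
                  if digest != master_digest)
```

A sync operation would then redeploy the master's file to every node returned by stale_clients().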

filesystem name must be 1-8 chars

It would be better to add this check to the shine install command. Otherwise we get an error at shine format, when the mkfs.lustre command runs.
As a consequence, we have to remove the file system.

[root@inti0 ~]# shine format -f hashinefs
Target_types : None
Format hashinefs on inti[2,6-7]: are you sure? (y)es/(N)o: y
Starting format of 3 targets on inti[2,6-7]
inti2: Format of OST ost_inti2.ddn9900.2 (/dev/ldn.ddn9900.2) failed with error 1
mkfs.lustre: filesystem name must be 1-8 chars
mkfs.lustre: exiting with 1 (Operation not permitted)
inti6: Format of MDT mdt_inti6sdc3 (/dev/sdc3) failed with error 1
mkfs.lustre: filesystem name must be 1-8 chars
mkfs.lustre: exiting with 1 (Operation not permitted)
inti7: Format of MGT mgt_inti7sdb1 (/dev/sdb1) failed with error 1
mkfs.lustre: filesystem name must be 1-8 chars
mkfs.lustre: exiting with 1 (Operation not permitted)
Format successful.
FILESYSTEM COMPONENTS STATUS (hashinefs)
+-----+--+------+------------+
|type |# |nodes |   status   |
+-----+--+------+------------+
|MGT  |1 |inti7 |offline (1) |
|MDT  |1 |inti6 |offline (1) |
|OST  |1 |inti2 |offline (1) |
+-----+--+------+------------+

Reported by: ohargoaa
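
An early check for the length limit reported by mkfs.lustre above could be as small as the sketch below. The function name check_fs_name is hypothetical; the 1 to 8 character limit is taken from the error message in the report.

```python
def check_fs_name(fs_name, max_len=8):
    """Reject file system names that mkfs.lustre would refuse later."""
    if not 1 <= len(fs_name) <= max_len:
        raise ValueError("filesystem name must be 1-%d chars: %r"
                         % (max_len, fs_name))
    return fs_name
```

Called at install time, this would refuse "hashinefs" (9 characters) before any configuration is deployed.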

Multiple nid_map definition in model file doesn't work

If I set several nid_map lines in a model file:

nid_map: nodes=fortoy[1-7,10-12] nids=fortoy[1-7,10-12]-ib0@o2ib0
nid_map: nodes=fortoy[32-159] nids=fortoy[32-159]-ib0@o2ib0

The second line is silently ignored (there is no complaint about having 2 lines).
But when I run shine, it says:

Configuration: Cannot get NID for fortoy32, aborting. 
Please verify `nid_map' configuration.

Reported by: degremont

Better output for the status command

The status command (with the default view : fs) can be improved by providing more information when different states are encountered for a target.

For example:

|CLI  |129 |fortoy[7,32-159] |offline (1), CHECK FAILURE (46), mounted (82) |

For this case, we should know directly which node is offline and on which nodeset the check failed.

Reported by: thiell

OST pool support

Add support for OST pool management through Shine.

Reported by: degremont

Missing /etc/shine/tuning.conf on IO node at shine start command

If this file is missing, there is no way to know on which node.

[root@inti0 ~]# shine start -f hashine
Target_types : None
Starting 3 targets on inti[2,6-7]
inti2: Remote action start failed: Configuration.__init__(temporary_xmf_path=None)
SHINE:2:ev_starttarget_start:gAJ9cQBVBnRhcmdldHEBKGNTaGluZS5MdXN0cmUuVGFyZ2V0Ck9TVApxAm9xA31xBShVB3NlcnZlcnNxBl1xB2NTaGluZS5MdXN0cmUuU2VydmVyClNlcnZlcgpxCCmBcQl9cQooVQdfbGVuZ3RocQtLAFUJX3BhdHRlcm5zcQx9cQ1VBmludGklc3EOKGNDbHVzdGVyU2hlbGwuTm9kZVNldApSYW5nZVNldApxD29xEH1xEShoC0sBVQlfYXV0b3N0ZXBxEkdUskmtJZTDfVUHX3Jhbmdlc3ETXXEUKEsCSwJLAUsAdHEVYXVic2gSTlUDbmlkcRZVDmludGkyLWljMEBvMmlicRd1YmFVA3RhZ3EYVRNvc3RfaW50aTIuZGRuOTkwMC4ycRlVC3N0YXR1c19pbmZvcRpOVQVsYWJlbHEbVQ9oYXNoaW5lLU9TVDAwMDBxHFUFaW5kZXhxHUsAVQ5hY3Rpb25fZW5hYmxlZHEeiFUKbGRkX3N2bmFtZXEfTlUFZ3JvdXBxIE5VCl9sZGRfZmxhZ3NxIUsAVQlkZXZfaXNibGtxIolVA2RldnEjVRIvZGV2L2xkbi5kZG45OTAwLjJxJFUGc2VydmVycSVoCVUKbGRkX2ZzbmFtZXEmTlUFc3RhdGVxJ0sDVQRqZGV2cShOVQ9zZWxlY3RlZF9zZXJ2ZXJxKUsAVQhkZXZfc2l6ZXEqSwBVBHR5cGVxK1UDb3N0cSx1YnMu
Disk: tag:ost_inti2.ddn9900.2 - server=inti2 (inti2-ic0@o2ib) - fs:hashine - dev:/dev/ldn.ddn9900.2
None
SHINE:2:ev_starttarget_done:gAJ9cQBVBnRhcmdldHEBKGNTaGluZS5MdXN0cmUuVGFyZ2V0Ck9TVApxAm9xA31xBShVBm1udGRldnEGVRIvZGV2L2xkbi5kZG45OTAwLjJxB1UHc2VydmVyc3EIXXEJY1NoaW5lLkx1c3RyZS5TZXJ2ZXIKU2VydmVyCnEKKYFxC31xDChVB19sZW5ndGhxDUsAVQlfcGF0dGVybnNxDn1xD1UGaW50aSVzcRAoY0NsdXN0ZXJTaGVsbC5Ob2RlU2V0ClJhbmdlU2V0CnERb3ESfXETKGgNSwFVCV9hdXRvc3RlcHEUR1SySa0llMN9VQdfcmFuZ2VzcRVdcRYoSwJLAksBSwB0cRdhdWJzaBROVQNuaWRxGFUOaW50aTItaWMwQG8yaWJxGXViYVUDdGFncRpVE29zdF9pbnRpMi5kZG45OTAwLjJxG1ULc3RhdHVzX2luZm9xHFUiaGFzaGluZS1PU1QwMDAwIGlzIGFscmVhZHkgc3RhcnRlZHEdVQVsYWJlbHEeVQ9oYXNoaW5lLU9TVDAwMDBxH1UFaW5kZXhxIEsAVQ5hY3Rpb25fZW5hYmxlZHEhiFUKbGRkX3N2bmFtZXEiTlUFZ3JvdXBxI05VCl9sZGRfZmxhZ3NxJEsAVQlkZXZfaXNibGtxJYhVA2RldnEmVRIvZGV2L2xkbi5kZG45OTAwLjJxJ1UGc2VydmVycShoC1UKbGRkX2ZzbmFtZXEpTlUFc3RhdGVxKksAVQRqZGV2cStOVQ9zZWxlY3RlZF9zZXJ2ZXJxLEsAVQhkZXZfc2l6ZXEtSwBVBHR5cGVxLlUDb3N0cS91YnMu
Start successful.
Configuration: Failed to open the tuning configuration file : [Errno 2] No such file or directory: '/etc/shine/tuning.conf'
FILESYSTEM COMPONENTS STATUS (hashine)
+-----+--+------+-----------+
|type |# |nodes |  status   |
+-----+--+------+-----------+
|MGT  |1 |inti7 |online (1) |
|OST  |1 |inti2 |online (1) |
|MDT  |1 |inti6 |online (1) |
+-----+--+------+-----------+ 

Reported by: ohargoaa
