
Comments (6)

garvct commented on July 26, 2024

Try to maximize the number of disks working on the I/O operation. beegfs-df can help to see what disks/targets are active.
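For example, a quick way to see the targets and how full each one is (a sketch; exact output columns vary by BeeGFS version):

# Show free and used space per metadata and storage target.
beegfs-df

# Alternative view with per-target space info via beegfs-ctl.
beegfs-ctl --listtargets --nodetype=storage --spaceinfo

Targets that stay empty while the others fill up are not taking part in the I/O.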


garvct commented on July 26, 2024

The following procedure worked for me:

  • Have an existing BeeGFS cluster (4 x L8s_v2), and want to increase the number of storage and metadata servers from 4 to 6:

    azhpc-resize beegfssm 6

[hpcadmin@beegfsm beegfs]$ beegfs-check-servers
Management

beegfsm [ID: 1]: reachable at 10.34.4.14:8008 (protocol: TCP)

Metadata

beegfa57e000000 [ID: 1]: reachable at 10.34.4.4:8005 (protocol: TCP)
beegfa57e000004 [ID: 2]: reachable at 10.34.4.8:8005 (protocol: TCP)
beegfa57e000003 [ID: 3]: reachable at 10.34.4.7:8005 (protocol: TCP)
beegfa57e000001 [ID: 4]: reachable at 10.34.4.5:8005 (protocol: TCP)
beegfa57e000006 [ID: 5]: reachable at 10.34.4.12:8005 (protocol: TCP)
beegfa57e000005 [ID: 6]: reachable at 10.34.4.6:8005 (protocol: TCP)

Storage

beegfa57e000001 [ID: 1]: reachable at 10.34.4.5:8003 (protocol: TCP)
beegfa57e000003 [ID: 2]: reachable at 10.34.4.7:8003 (protocol: TCP)
beegfa57e000004 [ID: 3]: reachable at 10.34.4.8:8003 (protocol: TCP)
beegfa57e000000 [ID: 4]: reachable at 10.34.4.4:8003 (protocol: TCP)
beegfa57e000006 [ID: 5]: reachable at 10.34.4.12:8003 (protocol: TCP)
beegfa57e000005 [ID: 6]: reachable at 10.34.4.6:8003 (protocol: TCP)

We can see that 2 extra storage and metadata servers have been added.
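To double-check that the new servers are actually usable, the target states can also be listed (a sketch; exact flags and state names may differ slightly between BeeGFS versions):

# List storage targets with their reachability and consistency state;
# the two new targets should report Online and Good.
beegfs-ctl --listtargets --nodetype=storage --state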


lmiroslaw commented on July 26, 2024

It worked, thanks. However, it is strange that I don't see a performance improvement when doubling the size of beegfsm. I am testing the performance by copying a 24 GB folder between two locations: time cp -R sim sim3
The folder contains ca. 120 directories with several files in each, in the MB range (2.2 MB, 119 MB, 47 MB).

For the small and the bigger beegfsm I get the same result:

real    2m26.809s
user    0m0.461s
sys     0m29.615s

vs.

real    2m32.859s
user    0m0.440s
sys     0m28.253s

I/O pattern: 55k reads, 50k writes, together accounting for 90% of the execution time.

I also tried changing the chunk size with beegfs-ctl --setpattern --chunksize=1m --numtargets=8 /beegfs/chunksize_1m_4t, testing 1 MB, 64 kB, and 4 MB chunk sizes with 8, 1, and 8 targets, respectively.

This did not affect the results much.
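For reference, each test directory was prepared along these lines (a reconstruction of the steps above, using the 4 MB/8-target case; the directory name is illustrative, and only files created after --setpattern pick up the new pattern):

# Create a test directory and assign it a 4 MB chunk size across 8 targets.
mkdir /beegfs/chunksize_4m_8t
beegfs-ctl --setpattern --chunksize=4m --numtargets=8 /beegfs/chunksize_4m_8t

# Verify the pattern that new files in this directory will inherit.
beegfs-ctl --getentryinfo /beegfs/chunksize_4m_8t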


garvct commented on July 26, 2024

Have you tried multiple cp's, perhaps each to a different target? You may need to determine whether the source data is spread over 4 storage targets or more, and whether it is reading or writing that slows the performance down.
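A quick way to check the striping of the source data (a sketch; $sourcedir stands for the folder being copied, and some_file is a placeholder):

# Show the stripe pattern (chunk size, number of storage targets) of the
# source directory; existing files keep the pattern they were created
# with, even after extra targets are added to the cluster.
beegfs-ctl --getentryinfo $sourcedir

# The same command on an individual file also lists the storage targets
# that hold its chunks.
beegfs-ctl --getentryinfo $sourcedir/some_file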


lmiroslaw commented on July 26, 2024

First feedback: this is my first attempt at parallelizing the cp operation:

N=119                   # index of the last processor directory (ca. 120 dirs)
for i in $(seq 0 $N)    # brace expansion {0..N} does not work with a variable
do
  mkdir -p "$destination/processor$i"
  cp -r "$sourcedir/processor$i/"* "$destination/processor$i" &
done
wait  # wait for all background cp processes to finish

With this code I was able to reduce the copy time from 1m41s to 58s. Now I will test the same code after doubling the size of the cluster.
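An equivalent variant with a bounded number of workers (a sketch, not part of the original test; assumes GNU xargs, the same processor* layout, and an arbitrary choice of 8 parallel jobs):

# Copy the processor directories in parallel, at most 8 cp processes
# at a time, instead of one background job per directory.
find "$sourcedir" -maxdepth 1 -name 'processor*' | \
    xargs -P8 -I{} cp -r {} "$destination/"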


garvct commented on July 26, 2024

Closed

