Comments (20)

markmoe19 commented on September 1, 2024

I can confirm the size_t resolves the --sort name issue for me! Thanks!
A snippet of the output is below. It takes 50-some minutes and a lot of RAM to complete the sort; there are 540M files, and many have really long paths.
snippet.txt

from mpifileutils.

markmoe19 commented on September 1, 2024

Right, I normally run the dwalk that generates the mfu file separately from the dwalk that converts the mfu file into a text file. I keep the mfu files for 7 days and rotate them out after that. They're useful for faster future dwalk and dfind runs, thanks!

adammoody commented on September 1, 2024

Thanks, @markmoe19 . I'd like to find and fix the underlying problem. It's not immediately clear what the cause is.

Does it fail for other sort options like --sort size, or is it unique to --sort name?

I see it's printing a stack trace at the point of the segfault. It would help to also include line numbers. Does it still fail if you build in debug mode -DCMAKE_BUILD_TYPE=Debug?

markmoe19 commented on September 1, 2024

debug.txt
I was able to reproduce the crash with the debug build. The crash seems more likely when using 64 CPUs across 2 nodes than when using 64 CPUs on 1 node. See the attached debug.txt file.

markmoe19 commented on September 1, 2024

Not sure whether this matters, but we have some files with \n and/or \r in the actual file name. dwalk seems to output those fine (with the \n causing a line break, as expected). So that is probably not the issue here, but I wanted to mention the unusual characters that might be in our filenames.

markmoe19 commented on September 1, 2024

@adammoody The crash does not happen with --sort size, only with --sort name, as shown in the debug.txt attachment above.

adammoody commented on September 1, 2024

Thanks, @markmoe19 . The line numbers help clarify the problematic code path. I'll see if that's enough. I may come back to you and request adding some printf statements to get more debug info.

adammoody commented on September 1, 2024

I haven't spotted anything obvious in the code, and I can't get this to segfault in my testing so far.

I'm working up a branch of DTCMP with some printf statements in various spots to get more info. When you have a chance, I'd like to have you run with this debug build. I'll post some instructions on how to build with that next week.

adammoody commented on September 1, 2024

@markmoe19 , I suspect the problematic code is more likely to be in DTCMP. Before we take that step, can you reproduce the segfault after making the changes below, which add a couple of printf statements to sort_files_stat() in src/common/mfu_flist_sort.c of mpiFileUtils?

diff --git a/src/common/mfu_flist_sort.c b/src/common/mfu_flist_sort.c
index effb80a..1de69d2 100644
--- a/src/common/mfu_flist_sort.c
+++ b/src/common/mfu_flist_sort.c
@@ -265,6 +265,11 @@ static mfu_flist sort_files_stat(const char* sortfields, mfu_flist flist)
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     MPI_Comm_size(MPI_COMM_WORLD, &ranks);
 
+    uint64_t global_size = mfu_flist_global_size(flist);
+    printf("%d: local_size=%d global_size=%d chars=%d\n",
+        rank, (int)incount, (int)global_size, (int)chars);
+    fflush(stdout);
+
     /* build type for file path */
     MPI_Datatype dt_filepath, dt_user, dt_group;
     MPI_Type_contiguous((int)chars,       MPI_CHAR, &dt_filepath);
@@ -529,6 +534,10 @@ static mfu_flist sort_files_stat(const char* sortfields, mfu_flist flist)
         idx++;
     }
 
+    printf("%d: key_extent=%d, keysat_extent=%d, bufsize=%d exp=%d\n",
+        rank, (int)key_extent, (int)keysat_extent, (int)(sortptr - (char*)sortbuf), (int)(sortbufsize));
+    fflush(stdout);
+
     /* sort data */
     void* outsortbuf;
     int outsortcount;

With this, each rank should print a couple of messages during a dwalk --sort name. This helps verify that the input buffer is sized correctly based on the list and the MPI derived datatypes.
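The check these printf statements perform can be sketched in isolation. This is a hypothetical helper, not mpiFileUtils code: `expected_bufsize` and `check_sort_buffer` are illustrative names, and the sketch assumes each packed record occupies a fixed extent (the role `keysat_extent` appears to play above). Unlike the debug lines in the diff, it keeps every byte count in a 64-bit integer and prints it with `PRIu64`, so a buffer larger than 2 GiB is reported correctly instead of being truncated to an int.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: bytes a packed sort buffer should hold for `count`
 * fixed-size records of `record_extent` bytes each. The multiplication is
 * done in uint64_t, so large lists do not overflow a 32-bit int. */
static uint64_t expected_bufsize(uint64_t count, uint64_t record_extent)
{
    return count * record_extent;
}

/* Hypothetical helper: compares the bytes actually packed (end - start)
 * against the expected size, prints both with 64-bit format specifiers,
 * and returns 1 if they match, 0 otherwise. */
static int check_sort_buffer(int rank, const char* start, const char* end,
                             uint64_t count, uint64_t record_extent)
{
    assert(end >= start);
    uint64_t packed   = (uint64_t)(end - start);
    uint64_t expected = expected_bufsize(count, record_extent);
    printf("%d: bufsize=%" PRIu64 " exp=%" PRIu64 "\n", rank, packed, expected);
    fflush(stdout);
    return packed == expected;
}
```

A mismatch would point at the packing loop; a match with values above INT_MAX means the buffer itself is sized correctly, but any code that stores those sizes in an int is not safe.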

markmoe19 commented on September 1, 2024

snippet.txt

New crash output is attached. I happened to run with "--sort size" first, and it did not crash (as expected). The attached output, though, is from "--sort name", which did crash, also as expected. Debug mode was enabled and your extra printf statements were added. Thanks.

adammoody commented on September 1, 2024

Ok, thanks. That all looks reasonable, and in fact, I think it provided a great clue.

I noticed that it's printing some negative values for the size of the buffer. That's because I mistakenly used an int datatype in the debug printf statements. However, that also pointed out that you are using some large input buffers and that DTCMP might also have an overflow bug. That indeed looks to be the case:

https://github.com/LLNL/dtcmp/blob/dfd514b04f9b7fd492aea8a2f8db811a4b314f00/src/dtcmp_merge_2way.c#L47-L53

Are you installing DTCMP by hand or using another method like Spack?

If you are installing by hand, can you edit src/dtcmp_merge_2way.c to replace the int in these two int remainder = ... lines with size_t types, rebuild DTCMP, and try the dwalk --sort again with the modified DTCMP library?

If you are not yet installing by hand, I can provide some instructions on how to do that.

BTW, I've optimistically got a PR ready to go: LLNL/dtcmp#17
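The kind of change involved in the int remainder lines can be illustrated with a generic 2-way merge. This is a hypothetical sketch, not the DTCMP source at the link above: `merge_2way` and `cmp_int` are illustrative names, and it assumes fixed-size records compared with a qsort-style callback. What it shows is the final tail copy: the leftover byte count is held in a `size_t`, whereas an `int` would overflow once the tail exceeds INT_MAX bytes, which large input buffers like the ones in this thread can reach.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Comparison callback in the style of qsort: <0, 0, >0. */
typedef int (*record_cmp_fn)(const void* x, const void* y);

/* Example comparator for int records (illustrative only). */
static int cmp_int(const void* x, const void* y)
{
    int vx = *(const int*)x;
    int vy = *(const int*)y;
    return (vx > vy) - (vx < vy);
}

/* Hypothetical sketch of a 2-way merge over fixed-size records, NOT the
 * DTCMP source. Merges na sorted records from a and nb sorted records from
 * b (each record `size` bytes) into out, which must hold na + nb records. */
static void merge_2way(const void* a, size_t na, const void* b, size_t nb,
                       size_t size, record_cmp_fn cmp, void* out)
{
    assert(size > 0);
    const char* pa = a;
    const char* pb = b;
    const char* a_end = pa + na * size;
    const char* b_end = pb + nb * size;
    char* pout = out;

    /* take the smaller head record until one input runs out;
     * ties take from a, which keeps the merge stable */
    while (pa < a_end && pb < b_end) {
        if (cmp(pb, pa) < 0) {
            memcpy(pout, pb, size);
            pb += size;
        } else {
            memcpy(pout, pa, size);
            pa += size;
        }
        pout += size;
    }

    /* copy whatever is left in one shot; this byte count is where a
     * 32-bit int would overflow once the tail exceeds INT_MAX bytes */
    if (pa < a_end) {
        size_t remainder = (size_t)(a_end - pa);
        memcpy(pout, pa, remainder);
    } else if (pb < b_end) {
        size_t remainder = (size_t)(b_end - pb);
        memcpy(pout, pb, remainder);
    }
}
```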

markmoe19 commented on September 1, 2024

I'm using build instructions from https://mpifileutils.readthedocs.io/en/v0.11.1/build.html

dtcmp is included from "wget https://github.com/hpc/mpifileutils/releases/download/v0.11.1/mpifileutils-v0.11.1.tgz" and expands in the folder at mpifileutils-v0.11.1/dtcmp

Just to be sure, are you saying that in the file dtcmp_merge_2way.c, I need to replace "int remainder" with "size_t remainder"? Thanks

adammoody commented on September 1, 2024

Ok, good. That distribution builds DTCMP and mpiFileUtils all in one shot, so that simplifies things.

Yes, you got it. Go ahead and make those two int --> size_t changes in dtcmp_merge_2way.c and rebuild.

In the meantime, since I now have a better idea of the data sizes involved, I'll try again to reproduce the segfault here.

adammoody commented on September 1, 2024

It took some trial and error to find a configuration that used enough memory without using so much as to OOM, but I was able to reproduce the segfault (with int) and then verify that the DTCMP fix (with size_t) resolves it in my case. I went ahead and merged LLNL/dtcmp#17 into DTCMP, which will be packaged with the next mpiFileUtils release.

I'd still like to know whether the fix works for you, especially since you could use it as a workaround until the next release is stamped.

markmoe19 commented on September 1, 2024

It uses 1.8TB of RAM on each of the 2 nodes when I sort the data by name! Each node has 2.0TB of RAM, so it just fits.
When I don't sort the data, these jobs typically use 266GB of RAM on 1 node.

adammoody commented on September 1, 2024

Great! Glad that we figured that out.

I'm sure the sort operation in DTCMP could be optimized further -- DTCMP is not intentionally slow, but it was written more for functionality than performance. For one, I think it's doing a bunch of intermediate string copies using the current algorithm. It would probably help to modify the elements to record the pointer to the string rather than a copy of the string itself. The strings could then be rearranged once at the end after fully sorting.

Having said that, it is using a parallel sort. If you have access to more resources, it should run faster by using more procs/nodes.
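The idea of recording pointers to the strings rather than copies of them can be sketched for the single-process case. This is not DTCMP code and not how DTCMP currently works; `sort_paths_by_pointer` and `cmp_str_ptr` are hypothetical names, and a parallel sort would additionally have to move data between ranks, where local pointers are not meaningful. Within one process, only the array of pointers is permuted during the sort, and each path is copied exactly once at the end into a packed, sorted buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of const char*: compares the strings the
 * elements point to, not the pointer values themselves */
static int cmp_str_ptr(const void* x, const void* y)
{
    const char* const* sx = x;
    const char* const* sy = y;
    return strcmp(*sx, *sy);
}

/* Hypothetical sketch: sort n NUL-terminated paths by name without moving
 * the path bytes during the sort. On return, paths[] is in sorted order and
 * the result is a newly allocated buffer holding the strings back to back
 * in that order (caller frees), or NULL if allocation fails. */
static char* sort_paths_by_pointer(const char** paths, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += strlen(paths[i]) + 1;
    }

    /* each swap moves one pointer, regardless of how long the path is */
    qsort(paths, n, sizeof(paths[0]), cmp_str_ptr);

    /* single rearrangement pass: copy each string into its final slot */
    char* packed = malloc(total > 0 ? total : 1);
    if (packed == NULL) {
        return NULL;
    }
    char* p = packed;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(paths[i]) + 1;
        memcpy(p, paths[i], len);
        p += len;
    }
    assert((size_t)(p - packed) == total);
    return packed;
}
```

The trade-off is an extra pointer per element plus one final copy, in exchange for avoiding repeated string copies during intermediate sort steps.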

You can go ahead and drop those debug printf statements we added. I don't think we need those any longer.

adammoody commented on September 1, 2024

And I think you've already mentioned doing this, but for testing, you can break the walk and sort into two steps:

srun -n64 -N2 dwalk --output unsorted.mfu /path/to/walk
srun -n256 -N8 dwalk --input unsorted.mfu --sort name --output sorted.mfu

This lets you try different sort configurations without having to walk again.

markmoe19 commented on September 1, 2024

It scales well: 4 nodes take about half the time.

2 nodes, 32 procs per node: 540M files walked in 7282s, sorted in 3067s, wrote text output file in 62s
4 nodes, 32 procs per node: 542M files walked in 3755s, sorted in 1246s, wrote text output file in 39s

The different total file count is just yesterday versus today.
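To put "scales well" into numbers, speedup and parallel efficiency can be computed from the two runs above. The helper names `speedup` and `efficiency` are illustrative; efficiency here is the speedup divided by the ratio of node counts, so 1.0 means perfect scaling.

```c
#include <assert.h>

/* Speedup of run B relative to baseline run A, from wall-clock seconds. */
static double speedup(double seconds_a, double seconds_b)
{
    assert(seconds_b > 0.0);
    return seconds_a / seconds_b;
}

/* Parallel efficiency of run B (nodes_b nodes) relative to run A
 * (nodes_a nodes): speedup divided by the increase in node count. */
static double efficiency(double seconds_a, int nodes_a,
                         double seconds_b, int nodes_b)
{
    assert(nodes_a > 0 && nodes_b > 0);
    return speedup(seconds_a, seconds_b) / ((double)nodes_b / nodes_a);
}
```

With the posted timings, going from 2 to 4 nodes speeds up the walk by about 1.94x (efficiency about 0.97), the sort by about 2.46x, and the text output by about 1.59x. Since the file counts differ slightly between the runs (540M vs 542M), these figures are approximate.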

adammoody commented on September 1, 2024

Ok, looks good. Thanks for sharing the performance numbers. That's quite the set of files to be working with.

I'll go ahead and close this issue out as being resolved by LLNL/dtcmp#17, which will be included in the upcoming v0.12 release of mpiFileUtils.

Thanks again, @markmoe19 , for reporting this issue and for taking the time to work through it with me!

markmoe19 commented on September 1, 2024

Thanks for the fixes! mpifileutils really helps us quickly manage very large amounts of data!
