mpich's People

Contributors

abrooks98, alexeymalkhanov, dalcinl, danghvu, goodell, hajimefu, hzhou, jainsura-intel, jayeshkrishna, jczhang07, jdinan, jeffhammond, masamichitakagi, minsii, pavanbalaji, raffenet, rkalidas, roblatham00, sagarth, shawnccx, shintaro-iwasaki, sonjahapp, sssharka, suhuang99, tarudoodi, wesbland, wgropp, wkliao, yfguo, zhenggb72


mpich's Issues

MPICH2 Compile Error

Originally by "McDonald, Sean M CTR USAF AFMC AFRL/RXQO" [email protected] on 2008-07-31 17:30:59 -0500


Hello,

I am trying to compile MPICH2 on a Scientific Linux 5.0 box
(www.scientificlinux.org). The configuration seems to be fine, but I
get compile errors. I looked around and found that Makefile.in has a hardcoded
path on lines 50 and 52: "${srcdir} &&
/sandbox/balaji/trunk/maint/mpich2-1.0.7/maint/simplemake". I would
think it should be something like "${srcdir} && /maint/simplemake/".
This is in version 1.0.7. I modified the Makefile, but this error seems
to occur in other places as well.

I downloaded version 1.0.6p1 and it compiled and ran fine. I would like
to use the newest version for a system I am setting up, so could you
look into this problem and let me know a solution? Thanks.

Sean McDonald
AFRL/RXQ Network Administrator
139 Barnes Dr., Suite 2
Tyndall AFB, FL. 32403
Duty Phone: (850) 283-6407 - DSN: 523-6407
Email: [email protected]

test/mpi/io/resized failing on all platforms in old nightly tests

Originally by goodell on 2008-08-01 07:51:00 -0500


http://www.mcs.anl.gov/research/projects/mpich2/todo/runs/IA32-Linux-GNU-mpd-ch3:nemesis-2008-07-31-22-00-testsumm-mpich2-fail.xml

resized
1 processes
./io
fail

Error: Unsupported datatype passed to ADIOI_Count_contiguous_blocks, combiner = 18
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
rank 0 in job 278 schwinn.mcs.anl.gov_37714 caused collective abort of all ranks
exit status of rank 0: return code 1
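
For context, a minimal sketch (not the actual test/mpi/io/resized source) of the kind of usage that would reach ADIOI_Count_contiguous_blocks with a resized type: a datatype built with MPI_Type_create_resized used as an MPI-IO filetype. The file name and layout below are made up for illustration.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Datatype resized;
    MPI_File fh;
    int buf[4] = { 0, 1, 2, 3 };

    MPI_Init(&argc, &argv);

    /* An int whose extent is padded out to 4 ints. */
    MPI_Type_create_resized(MPI_INT, 0, 4 * (MPI_Aint)sizeof(int), &resized);
    MPI_Type_commit(&resized);

    /* Use the resized type as the filetype of an MPI-IO view and write
       through it; ROMIO has to walk the datatype's combiner tree here. */
    MPI_File_open(MPI_COMM_WORLD, "resized_test.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_INT, resized, "native", MPI_INFO_NULL);
    MPI_File_write(fh, buf, 4, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Type_free(&resized);
    MPI_Finalize();
    return 0;
}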

adio/common/system_hints.c

Originally by William Gropp [email protected] on 2008-08-04 14:12:41 -0500


I'm getting a failure in this file:

gcc -I/Users/gropp/tmp/mpich2-sock/src/mpid/ch3/include -I/Users/
gropp/projects/software/mpich2/src/mpid/ch3/include -I/Users/gropp/
tmp/mpich2-sock/src/mpid/common/datatype -I/Users/gropp/projects/
software/mpich2/src/mpid/common/datatype -I/Users/gropp/tmp/mpich2-
sock/src/mpid/common/locks -I/Users/gropp/projects/software/mpich2/
src/mpid/common/locks -I/Users/gropp/tmp/mpich2-sock/src/mpid/ch3/
channels/sock/include -I/Users/gropp/projects/software/mpich2/src/
mpid/ch3/channels/sock/include -I/Users/gropp/tmp/mpich2-sock/src/
mpid/common/sock -I/Users/gropp/projects/software/mpich2/src/mpid/
common/sock -I/Users/gropp/tmp/mpich2-sock/src/mpid/common/sock/poll -
I/Users/gropp/projects/software/mpich2/src/mpid/common/sock/poll -g -
Wall -O2 -Wstrict-prototypes -Wmissing-prototypes -Wundef -Wpointer-
arith -Wbad-function-cast -ansi -DGCC_WALL -D_POSIX_C_SOURCE=199506L -
std=c89 -DFORTRANUNDERSCORE -DHAVE_ROMIOCONF_H -I. -I/Users/gropp/
projects/software/mpich2/src/mpi/romio/adio/common/../include -I../
include -I../../include -I/Users/gropp/projects/software/mpich2/src/
mpi/romio/adio/common/../../../../../src/include -I../../../../../src/
include -c /Users/gropp/projects/software/mpich2/src/mpi/romio/adio/
common/system_hints.c
/Users/gropp/projects/software/mpich2/src/mpi/romio/adio/common/system_hints.c:82:63: warning: character constant too long for its type
/Users/gropp/projects/software/mpich2/src/mpi/romio/adio/common/system_hints.c: In function 'file_to_info':
/Users/gropp/projects/software/mpich2/src/mpi/romio/adio/common/system_hints.c:82: error: parse error before ':' token
/Users/gropp/projects/software/mpich2/src/mpi/romio/adio/common/system_hints.c:111:16: warning: character constant too long for its type
/Users/gropp/projects/software/mpich2/src/mpi/romio/adio/common/system_hints.c:111: error: parse error before ':' token
make[5]: *** [system_hints.o] Error 1
Make failed in directory adio/common
make[4]: *** [mpiolib] Error 1
make[3]: *** [mpio] Error 2
make[2]: *** [all-redirect] Error 1
make[1]: *** [all-redirect] Error 2
make: *** [all-redirect] Error 2
groppmac:~/tmp/mpich2-sock gropp$

It looks like it is using calloc and free instead of the memory
routines (which will introduce a compile-time error when
--enable-dbg=mem is selected, which I always do). I'll fix this, but this is
a reminder to (a) use the memory routines and (b) configure with
--enable-dbg=mem.

Bill

William Gropp
Paul and Cynthia Saylor Professor of Computer Science
University of Illinois Urbana-Champaign

nemesis ext_procs optimization

Originally by goodell on 2008-08-01 08:42:34 -0500


In [de6e5ee] I committed a rough cut of dynamic processes for nemesis
newtcp. In mpid_nem_inline.h I commented out an optimization that
uses MPID_nem_mem_region.ext_procs because it prevents the proper
operation of dynamic processes. Unfortunately, removing it adds
~100ns to our zero-byte message latencies. So there is a FIXME in
the code that reads like this:

 /* FIXME the ext_procs bit is an optimization for the all-local-procs case.
    This has been commented out for now because it breaks dynamic processes.
    Some other solution should be implemented eventually, possibly using a
    flag that is set whenever a port is opened. [goodell@ 2008-06-18] */

In general, this won't affect real users who run any inter-node jobs,
since they were already polling every time anyway. However, it does
hurt those wonderful microbenchmarks. A hack fix is to leave this in
but also check to see if a port has been opened. A possibly better
fix is to only poll the network every X iterations of "poll
everything", where X is some tunable parameter.

This req is a reminder for this FIXME.

-Dave
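
A minimal, self-contained sketch (not MPICH code; every name here is illustrative) of the second fix suggested above: poll the shared-memory path every iteration, but only touch the network when remote processes are known, a port has been opened, or every X passes through the progress loop.

#include <stdbool.h>
#include <stdio.h>

#define NET_POLL_FREQ 128           /* the tunable "X" from the ticket */

static int  port_open_count = 0;    /* would be bumped when a port is opened */
static long progress_iter   = 0;
static long net_polls       = 0;

static void poll_shared_memory(void) { /* always cheap, polled every time */ }
static void poll_network(void)       { net_polls++; /* the ~100ns-per-call cost */ }

static void progress_poll_once(bool have_remote_procs)
{
    poll_shared_memory();
    /* Touch the network only when remote procs are known, a port is open
       (so dynamic processes are possible), or every NET_POLL_FREQ passes
       through the loop as a safety net. */
    if (have_remote_procs || port_open_count > 0 ||
        (++progress_iter % NET_POLL_FREQ) == 0) {
        poll_network();
    }
}

int main(void)
{
    for (long i = 0; i < 100000; i++)
        progress_poll_once(false);
    printf("network polled %ld times in 100000 iterations\n", net_polls);
    return 0;
}

Compiled standalone, this polls the network 781 times instead of 100000, which is exactly the latency-versus-progress trade-off the ticket describes for the all-local-procs case.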

[MPICH2 Req #4176] Fwd: [mpich2-dev] Apparent bypass of correct macros in collective operation code

Originally by goodell on 2008-08-01 08:39:54 -0500


so that we don't forget...

Begin forwarded message:
> From: Dave Goodell <[email protected]>
> Date: July 11, 2008 Jul 11 8:46:04 AM CDT
> To: [email protected]
> Cc: Dave Goodell <[email protected]>
> Subject: Re: [mpich2-dev] Apparent bypass of correct macros in  
> collective operation code
>
> Good catch, Joe.  These ought to be changed to use the  
> MPIU_THREADPRIV_* macros.  I'll forward this to mpich2-maint@ to  
> make sure we don't forget to fix it.
>
> -Dave
>
> On Jul 10, 2008, at 8:47 PM, Joe Ratterman wrote:
>
>> I was recently doing some thread hacking, and I found that some of  
>> my changes were causing a problem in the collective operation C  
>> files: mpich2/src/mpi/coll/op*.c
>>
>> Specifically, this sort of code was a problem since I got rid of  
>> the op_errno field in the MPICH_PerThread object.
>> https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpi/coll/opbor.c
>>    165      default: {
>>    166          MPICH_PerThread_t *p;
>>    167          MPIR_GetPerThread(&p);
>>    168          p->op_errno = MPIR_Err_create_code( MPI_SUCCESS,  
>> MPIR_ERR_RECOVERABLE, FCNAME, __LINE__, MPI_ERR_OP,  
>> "**opundefined","**opundefined %s", "MPI_BOR" );
>>    169          break;
>>    170      }
>>
>> I think that there are macros to do this, as seen in the  
>> allreduce.c file (extra lines deleted):
>>    117      MPIU_THREADPRIV_DECL;
>>    126      MPIU_THREADPRIV_GET;
>>    158          MPIU_THREADPRIV_FIELD(op_errno) = 0;
>>    473          if (MPIU_THREADPRIV_FIELD(op_errno))
>>    474              mpi_errno = MPIU_THREADPRIV_FIELD(op_errno);
>>
>> With the default macros, that basically does the same thing, but I  
>> didn't have to change the .c file--only the header files.  The  
>> same thing happens in errutil.c
>> https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpi/errhan/ 
>> errutil.c
>>    156  /* These routines export the nest increment and decrement  
>> for use in ROMIO */
>>    157  void MPIR_Nest_incr_export( void )
>>    158  {
>>    159      MPICH_PerThread_t *p;
>>    160      MPIR_GetPerThread(&p);
>>    161      p->nest_count++;
>>    162  }
>>    163  void MPIR_Nest_decr_export( void )
>>    164  {
>>    165      MPICH_PerThread_t *p;
>>    166      MPIR_GetPerThread(&p);
>>    167      p->nest_count--;
>>    168  }
>>
>>
>>
>> I really think that these places should be using the existing  
>> macros to handle the work.
>>
>>
>> Comments?
>> Joe Ratterman
>> [email protected]
>
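
A self-contained illustration (not MPICH code) of the point Joe makes about going through the MPIU_THREADPRIV_* macros: code written against the macros never names the per-thread structure, so changing how a field like op_errno is stored only requires touching the header. The macro definitions below are fakes for the demo; only the macro names are taken from the thread above.

#include <stdio.h>

/* --- "header" layer: the only place that knows the storage --- */
typedef struct { int op_errno; } fake_perthread_t;
static fake_perthread_t fake_perthread;     /* one thread, for the demo */

#define MPIU_THREADPRIV_DECL     fake_perthread_t *threadpriv_p
#define MPIU_THREADPRIV_GET      (threadpriv_p = &fake_perthread)
#define MPIU_THREADPRIV_FIELD(f) (threadpriv_p->f)

/* --- "opbor.c" layer: never mentions fake_perthread_t directly --- */
static void record_op_error(int errcode)
{
    MPIU_THREADPRIV_DECL;
    MPIU_THREADPRIV_GET;
    MPIU_THREADPRIV_FIELD(op_errno) = errcode;
}

int main(void)
{
    record_op_error(42);
    printf("op_errno = %d\n", fake_perthread.op_errno);
    return 0;
}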

RE: [MPICH2 Req #4127] smpd and singleton init

Originally by "Jayesh Krishna" [email protected] on 2008-07-31 16:49:05 -0500


Hi,
As you mentioned, the current code (smpd) in trunk requires PM support
for MPI_Comm_connect()/MPI_Comm_accept() (hence the requirement that
mpiexec should be in the PATH).
We should be able to remove this dependency but I need to run some more
tests before I can confirm that. I am on vacation till Monday so I will
run the tests on Monday and get back to you.
Have a nice weekend,

Regards,
Jayesh


From: Edric Ellis [mailto:[email protected]]
Sent: Monday, July 28, 2008 7:38 AM
To: Jayesh Krishna
Cc: [email protected]
Subject: RE: [MPICH2 Req #4127] smpd and singleton init

Hi Jayesh,

Actually, I now seem to be able to get things communicating once the
processes have mpiexec on their $PATH, and smpd is running. I'm not sure
quite what I fixed though.

In any case, it would be much better for us to be able to restore the
behaviour as per 1.0.3 where the smpd process wasn't needed for
connect/accept.

Cheers,

Edric.


From: Edric Ellis
Sent: Monday, July 28, 2008 11:42 AM
To: 'Jayesh Krishna'
Cc: [email protected]
Subject: RE: [MPICH2 Req #4127] smpd and singleton init

Hi Jayesh,

I've just been looking at the latest MPICH2 from SVN, and what I find is
that I can now get past the call to MPI_Init() without running smpd, but
as soon as I try to perform the MPI_Comm_connect / MPI_Comm_accept phase,
the "connect" process reports an error because it can't execv mpiexec. Is
that expected?

I've tried adding the path to mpiexec to $PATH, and that doesn't help -
even running "smpd -d" shows that the processes are talking to smpd, but
they then both get stuck in a poll().

Cheers,

Edric.


From: Jayesh Krishna [mailto:[email protected]]
Sent: Friday, June 06, 2008 3:14 PM
To: Edric Ellis
Cc: [email protected]
Subject: RE: [MPICH2 Req #4127] smpd and singleton init

Hi,

Just to add to my prev email, currently the singleton init client fails
when smpd is not running on the system (We will be fixing the code so that
the singleton init client fails only when the PM is not running and the
client requires the PM (external PM) services to proceed. We expect this
to be fixed in our next release.)

Regards,

Jayesh


From: Jayesh Krishna [mailto:[email protected]]
Sent: Friday, June 06, 2008 9:08 AM
To: 'Edric Ellis'
Cc: [email protected]
Subject: RE: [MPICH2 Req #4127] smpd and singleton init

Hi,
Yes, we have changed the way the singleton init client works with SMPD. Now
when a singleton init client is run, it tries to connect to the process
manager (without this change the process has no support from the PM, and
calls like MPI_Comm_spawn() won't work).
Is there any reason why you don't want smpd to be running on these
machines?

(PS: We didn't have time to fix the connect/accept problem in smpd for
1.0.7 release. We will be looking into fixing the bug in the next
release.)
Regards,
Jayesh
-----Original Message-----
From: Edric Ellis [mailto:[email protected]]
Sent: Friday, June 06, 2008 8:20 AM
To: [email protected]
Cc: [email protected]
Subject: [MPICH2 Req #4127] smpd and singleton init

Hi mpich2-maint,

I'm looking at upgrading MATLAB from using MPICH2-1.0.3 to MPICH2-1.0.7,
and I notice a change in the way in which singleton init works for the
smpd process manager (we use the smpd build on UNIX and Windows).

In 1.0.3, what we do is the following:

  1. Launch our application under some other services of our own
  2. Inside our application, call "MPI_Init( 0, 0 )"
  3. Use MPI_Comm_connect/accept to glue things together

When I substitute 1.0.7, MPI_Init fails because smpd isn't running.

Is there a way to get 1.0.7 to behave as per 1.0.3 - i.e. without any
reliance on the smpd process?

As a separate question: what is the status of the connect/accept stall
problems under smpd? I believe that this was fixed in 1.0.6 for mpd.

Cheers,

Edric.
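
For reference, a minimal sketch of the connect/accept pattern Edric describes (singleton MPI_Init followed by MPI_Comm_connect/MPI_Comm_accept between independently launched processes). This only shows the MPI calls involved; whether it works without an smpd running is exactly the question in this ticket. Run one copy with "server" as its argument and pass the printed port name to a second copy run with "client <port>".

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm inter;

    MPI_Init(&argc, &argv);

    if (argc > 1 && strcmp(argv[1], "server") == 0) {
        /* Open a port and wait for the other side to connect. */
        MPI_Open_port(MPI_INFO_NULL, port);
        printf("port: %s\n", port);
        fflush(stdout);
        MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
        MPI_Close_port(port);
    } else if (argc > 2 && strcmp(argv[1], "client") == 0) {
        /* Connect using the port name printed by the server. */
        MPI_Comm_connect(argv[2], MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    } else {
        MPI_Finalize();
        return 1;
    }

    MPI_Comm_disconnect(&inter);
    MPI_Finalize();
    return 0;
}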

[MPICH2 Req #3942] ch3:nemesis:newtcp and gforker valgrind errors

Originally by goodell on 2008-08-01 08:42:26 -0500


I see some valgrind errors when I build with ch3:nemesis:newtcp and
gforker. I couldn't figure out the problem in a few minutes of
investigation, so I'm filing this bug report so that we don't lose
track of this. There is likely a simpler configuration and test case
that will elicit these warnings, I just haven't spent any time paring
things down and playing with configure args.

-Dave

Configuration line:
./configure --prefix=/home/goodell/testing/nemesis_gforker/test_1/mpich2-installed --with-pm=gforker --with-device=ch3:nemesis:newtcp --enable-g=dbg,log,meminit --disable-fast --enable-nemesis-dbg-nolocal

Test program:
bblogin% cat test.c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc,char *argv[])
{
    int rank,np;
    int i;
    char buf[100];
    MPI_Status status;

    MPI_Init (&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Comm_size(MPI_COMM_WORLD,&np);

    if (rank == 0) {
        for (i = 1; i < np; i++) {
            MPI_Send(buf, 0, MPI_CHAR, i, 0, MPI_COMM_WORLD);
        }
        for (i = 1; i < np; i++) {
            MPI_Send(buf, 0, MPI_CHAR, i, 0, MPI_COMM_WORLD);
        }
    }
    else {
        MPI_Recv(buf, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Recv(buf, 0, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
    }
    MPI_Finalize();
    return 0;
}

Valgrind output:
bblogin% valgrind ./a.out
==28198== Memcheck, a memory error detector.
==28198== Copyright (C) 2002-2006, and GNU GPL'd, by Julian Seward et al.
==28198== Using LibVEX rev 1658, a library for dynamic binary translation.
==28198== Copyright (C) 2004-2006, and GNU GPL'd, by OpenWorks LLP.
==28198== Using valgrind-3.2.1-Debian, a dynamic binary instrumentation framework.
==28198== Copyright (C) 2000-2006, and GNU GPL'd, by Julian Seward et al.
==28198== For more details, rerun with: -v
==28198==
==28198== Invalid read of size 8
==28198==    at 0x40152A4: (within /lib/ld-2.5.so)
==28198==    by 0x400A7CD: (within /lib/ld-2.5.so)
==28198==    by 0x4006164: (within /lib/ld-2.5.so)
==28198==    by 0x40084AB: (within /lib/ld-2.5.so)
==28198==    by 0x40116EC: (within /lib/ld-2.5.so)
==28198==    by 0x400D725: (within /lib/ld-2.5.so)
==28198==    by 0x401114A: (within /lib/ld-2.5.so)
==28198==    by 0x534BB7F: (within /lib/libc-2.5.so)
==28198==    by 0x400D725: (within /lib/ld-2.5.so)
==28198==    by 0x534BCE6: __libc_dlopen_mode (in /lib/libc-2.5.so)
==28198==    by 0x5327516: __nss_lookup_function (in /lib/libc-2.5.so)
==28198==    by 0x53275C4: (within /lib/libc-2.5.so)
==28198==  Address 0x4032CE0 is 16 bytes inside a block of size 23 alloc'd
==28198==    at 0x4C20A69: malloc (vg_replace_malloc.c:149)
==28198==    by 0x4008999: (within /lib/ld-2.5.so)
==28198==    by 0x40116EC: (within /lib/ld-2.5.so)
==28198==    by 0x400D725: (within /lib/ld-2.5.so)
==28198==    by 0x401114A: (within /lib/ld-2.5.so)
==28198==    by 0x534BB7F: (within /lib/libc-2.5.so)
==28198==    by 0x400D725: (within /lib/ld-2.5.so)
==28198==    by 0x534BCE6: __libc_dlopen_mode (in /lib/libc-2.5.so)
==28198==    by 0x5327516: __nss_lookup_function (in /lib/libc-2.5.so)
==28198==    by 0x53275C4: (within /lib/libc-2.5.so)
==28198==    by 0x532DC0A: gethostbyname_r (in /lib/libc-2.5.so)
==28198==    by 0x532D402: gethostbyname (in /lib/libc-2.5.so)
==28198==
==28198== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 16 from 1)
==28198== malloc/free: in use at exit: 2,826 bytes in 15 blocks.
==28198== malloc/free: 106 allocs, 91 frees, 8,405,715 bytes allocated.
==28198== For counts of detected errors, rerun with: -v
==28198== searching for pointers to 15 not-freed blocks.
==28198== checked 264,992 bytes.
==28198==
==28198== LEAK SUMMARY:
==28198==    definitely lost: 0 bytes in 0 blocks.
==28198==      possibly lost: 0 bytes in 0 blocks.
==28198==    still reachable: 2,826 bytes in 15 blocks.
==28198==         suppressed: 0 bytes in 0 blocks.
==28198== Reachable blocks (those to which a pointer was found) are not shown.
==28198== To see them, rerun with: --show-reachable=yes

datatype bug

Originally by Darius Buntinas [email protected] on 2008-08-01 15:55:57 -0500


Forwarding from David Gingold.
-d

-------- Original Message --------

Darius --

I got the changes integrated into our library this week, and all
seemed well until I tried the failing Intel MPI test again. Now it
falls over with a similar assertion, using a different datatype.

The test case below (which is what I sent before, but this time with
different block lengths and displacements) reproduces the problem in
our library. Can you try this easily with yours?

-dg

....

#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <mpi.h>

#include "mpid_dataloop.h"

int MPID_Segment_init(const DLOOP_Buffer buf,
                      DLOOP_Count count,
                      DLOOP_Handle handle,
                      struct DLOOP_Segment *segp,
                      int hetero);

void MPID_Segment_pack(struct DLOOP_Segment *segp,
                       DLOOP_Offset first,
                       DLOOP_Offset *lastp,
                       void *pack_buffer);

int main(int argc, char *argv[])
{
    int ierr;
    MPID_Segment segment;
    MPI_Aint last;
    int dis[2], blklens[2];
    MPI_Datatype type;
    int send_buffer[60];
    int recv_buffer[60];

    ierr = MPI_Init(&argc, &argv);
    assert(ierr == MPI_SUCCESS);

    dis[0] = 0;
    dis[1] = 15;

    blklens[0] = 0;
    blklens[1] = 10;

    last = 192;

    ierr = MPI_Type_indexed(2, blklens, dis, MPI_INT, &type);
    assert(ierr == MPI_SUCCESS);

    ierr = MPI_Type_commit(&type);
    assert(ierr == MPI_SUCCESS);

    ierr = MPID_Segment_init(send_buffer, 1, type, &segment, 0);
    assert(ierr == MPI_SUCCESS);

    MPID_Segment_pack(&segment, 88, &last, recv_buffer);

    MPI_Finalize();
    return 0;
}

Re: MPE logging API with threadsafety

Originally by "P. Klein" [email protected] on 2008-08-04 08:31:40 -0500


Hi Anthony,

Thanks once again for the beta release. In February, I planned to send you
a response more or less immediately on how things work, but I have
been pretty busy since then. Therefore, I decided to wait until I could
tell you something more exciting than just "it runs".

Please find attached a paper which uses the thread-safe MPE and, for your
amusement, the .slog file mentioned in this paper. I am looking forward
to hearing your opinion about our work.

Kind regards
Peter

Anthony Chan wrote:

Hi Peter,

I have put together a threadsafe version of MPE logging API in
the latest RC tarball which can be downloaded at

ftp://ftp.mcs.anl.gov/pub/mpi/mpe/beta/mpe2-1.0.7rc1.tar.gz

A sample C program that uses the updated API is at
/share/examples_logging/pthread_sendrecv_user.c,
which can be compiled just like pthread_sendrecv as documented
in the Makefile. Documentation of the updated API's manpages can
be found in /man and /www.
Let me know if the updated API has any problems when used in
your multithreaded program.

A.Chan

On Wed, 9 Jan 2008, Anthony Chan wrote:

Dr. rer. nat. Peter Klein
Fraunhofer ITWM
Abteilung: OPT
Fraunhoferplatz 1
D-67663 Kaiserslautern
phone: (+49) (0)631 31600 4591
fax: (+49) (0)631 31600 1099
e-Mail: [email protected]

Build fails

Originally by "Rajeev Thakur" [email protected] on 2008-08-01 14:56:17 -0500


If I do a fresh maint/updatefiles, configure, and make, I get the following
error:

rm -f mpich2-mpdroot.o
copying python files/links into /sandbox/thakur/tmp/bin
rm -f mpich2-mpdroot
make[4]: Leaving directory `/sandbox/thakur/tmp/src/pm/mpd'
make[3]: Leaving directory `/sandbox/thakur/tmp/src/pm'
make[2]: Leaving directory `/sandbox/thakur/tmp/src/pm'
make[1]: Leaving directory `/sandbox/thakur/tmp/src'
make[1]: Entering directory `/sandbox/thakur/tmp/examples'
  CC /homes/thakur/cvs/mpich2/examples/cpi.c
../bin/mpicc -o cpi cpi.o -lm
/sandbox/thakur/tmp/lib/libmpich.a(socksm.o)(.text+0x66): In function `dbg_print_sc_tbl':
: undefined reference to `CONN_STATE_TO_STRING'
collect2: ld returned 1 exit status
make[1]: *** [cpi] Error 1
make[1]: Leaving directory `/sandbox/thakur/tmp/examples'
make: *** [all-redirect] Error 2

Using 32 bit as rank

Originally by wei huang [email protected] on 2008-08-05 18:47:22 -0500


Hi list,

We are trying to run mvapich2, which is based on mpich2-1.0.7, on
more than 32k processes. However, we find that the MPIDI_Message_match
structure uses only int16_t for the rank. This is not enough for jobs larger
than 32k processes. It looks like the following change, which uses int32_t for
the rank, is needed to scale. Would you consider integrating this change in
future mpich2 releases? Thanks.

Index: src/mpid/ch3/include/mpidpre.h
===================================================================
--- src/mpid/ch3/include/mpidpre.h      (revision 2891)
+++ src/mpid/ch3/include/mpidpre.h      (revision 2892)
@@ -65,7 +65,7 @@
 typedef struct MPIDI_Message_match
 {
     int32_t tag;
-    int16_t rank;
+    int32_t rank;
     int16_t context_id;
 }
 MPIDI_Message_match;

Regards,
Wei Huang

774 Dreese Lab, 2015 Neil Ave,
Dept. of Computer Science and Engineering
Ohio State University
OH 43210
Tel: (614)292-8501

PATCH: Fixes for MPI_Comm_dup and MPI_Comm_split (intercommunicator case)

Originally by "Lisandro Dalcin" [email protected] on 2008-08-01 14:39:19 -0500


Hi all,

Some intercommunicator collectives make use of the 'is_low_group' field in
the MPID_Comm structure. This field is not being correctly filled in when
MPI_Comm_dup() or MPI_Comm_split() is called on an intercommunicator,
and then MPI_Barrier(), MPI_Allgather(), MPI_Allgatherv() (and
probably MPI_Reduce_scatter(), I've not tried) deadlock.

I have attached a tentative patch (against SVN trunk) that fixes this issue.

I've tested it for the MPI_Comm_dup() case, but not for the
MPI_Comm_split() case (it seems that the low group flag just needs
to be inherited from the parent intercommunicator, but perhaps I'm
missing something, so please review this case with care).

BTW, could you anticipate in what version (1.1.0, or perhaps 1.0.7p1)
this issue could get fixed?

Regards,

Lisandro Dalcín

Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
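
A sketch of a reproducer for the behaviour described above, assuming the report is accurate: build an intercommunicator from two halves of MPI_COMM_WORLD, duplicate it with MPI_Comm_dup, and call MPI_Barrier on the duplicate. The barrier on the duplicated intercommunicator is the call reported to deadlock before the fix. Needs at least two processes.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, color;
    MPI_Comm local, inter, inter_dup;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split world into two halves and join them with an intercommunicator. */
    color = (rank < size / 2);
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local);
    MPI_Intercomm_create(local, 0, MPI_COMM_WORLD,
                         color ? size / 2 : 0, 12345, &inter);

    MPI_Comm_dup(inter, &inter_dup);

    MPI_Barrier(inter_dup);          /* reported to hang before the fix */
    if (rank == 0)
        printf("barrier on duplicated intercommunicator completed\n");

    MPI_Comm_free(&inter_dup);
    MPI_Comm_free(&inter);
    MPI_Comm_free(&local);
    MPI_Finalize();
    return 0;
}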

Re: [MPICH2 Req #3768] Problem with MPICH2

Originally by Anthony Chan [email protected] on 2008-08-06 16:43:39 -0500


I would think that the default binary on a 64-bit machine is 64-bit, i.e.
you don't need to set any *FLAGS when building mpich2. If that is
not the case and you do need to modify the binary format, you need to
set CFLAGS, CXXFLAGS, FFLAGS and F90FLAGS to -m64 ("./configure --help" will
show all the relevant *FLAGS; be sure not to set CPPFLAGS before configuring
mpich2).

A.Chan
----- "Vijay Mann" [email protected] wrote:

Hi,

I think we are hitting the same problem with the Fortran mpich
libraries. We are using gfortran (which accepts the -m64 flag for 64-bit
compilation).

We tried the following set of flags:
export FCFLAGS="-m64"
export FCFLAGS_f90="-m64"
export FFLAGS="-m64"

and they didn't seem to work.

Can you please help?

Thanks,

Vijay Mann
Technical Staff Member,
IBM India Research Laboratory, New Delhi, India.
Phone: 91-11- 41292168
http://www.research.ibm.com/people/v/vijamann/

Anthony Chan [email protected]

11/16/2007 04:52 AM
To Pradipta De/India/IBM@IBMIN

cc [email protected], Vijay Mann/India/IBM@IBMIN

Subject Re: [MPICH2 Req #3768] Problem with MPICH2

Did you set CFLAGS and CXXFLAGS to the same 64bit flag used by your
C/C++ compiler ?

On Thu, 15 Nov 2007, Pradipta De wrote:

Hi,

We downloaded and compiled MPICH2 on a PowerPC box running FC6.
We are trying to use mpicxx to compile our mpi code in 64-bit mode.
We get the following error: (our mpich2 install directory is
/hpcfs/downloaded_software/mpich2-install/)

/usr/bin/ld: skipping incompatible
/hpcfs/downloaded_software/mpich2-install//lib/libmpichcxx.a when
searching for -lmpichcxx
/usr/bin/ld: cannot find -lmpichcxx
collect2: ld returned 1 exit status

Is there some flag that needs to be specified during configuration
to
allow for 64-bit version ?

thanks, and regards,
-- pradipta

ssm build

Originally by "Rajeev Thakur" [email protected] on 2008-07-30 11:33:25 -0500


All the ssm builds in last night's tests failed to compile. Might be a
simple fix.

Rajeev

Beginning make
Using variables CC='gcc' CFLAGS=' -O2' LDFLAGS='' AR='ar' FC='g77' F90='f95'
FFLAGS=' -O2' F90FLAGS=' -O2' CXX='g++'
In directory: /sandbox/buntinas/cb/mpich2/src/mpid/ch3/util/shm
CC /home/MPI/testing/mpich2/mpich2/src/mpid/ch3/util/shm/ch3u_finalize_sshm.c
/home/MPI/testing/mpich2/mpich2/src/mpid/ch3/util/shm/ch3u_finalize_sshm.c: In function 'MPIDI_CH3U_Finalize_sshm':
/home/MPI/testing/mpich2/mpich2/src/mpid/ch3/util/shm/ch3u_finalize_sshm.c:74: error: too few arguments to function 'MPIDI_PG_Get_next'
/home/MPI/testing/mpich2/mpich2/src/mpid/ch3/util/shm/ch3u_finalize_sshm.c:77: error: too few arguments to function 'MPIDI_PG_Get_next'
/home/MPI/testing/mpich2/mpich2/src/mpid/ch3/util/shm/ch3u_finalize_sshm.c:80: error: too few arguments to function 'MPIDI_PG_Get_next'

[MPICH2 Req #4214] cleanup MPIR_Get_contextid (and callers)

Originally by goodell on 2008-08-01 07:24:49 -0500


(this is a re-send of req#4214 so that trac learns about it)

The MPIR_Get_contextid function needs to be overhauled a bit. It
doesn't use the standard MPICH2 error handling approach, yet it's a
non-trivial function. Specifically, I've run into issues lately
where the comm subsystem is hosed in such a way that the
NMPI_Allreduce call that MPIR_Get_contextid makes fails.
Unfortunately, MPIR_Get_contextid simply returns 0 if there was a
problem, so the stack trace is thrown away and all errors show
up like this:

Fatal error in MPI_Comm_accept: Other MPI error, error stack:
MPI_Comm_accept(117)..: MPI_Comm_accept(port="tag#1$description#intel-loane[1]$port#46959$ifname#140.221.37.57$", MPI_INFO_NULL, root=0, MPI_COMM_WORLD, newcomm=0x7ff0004dc) failed
MPID_Comm_accept(149).:
MPIDI_Comm_accept(915): Too many communicators

In reality, the original error was caused deep down in the nemesis
layer, but you can't see it here.

I'm filing this instead of just fixing it because there are two
versions of this function that need to be fixed and tested on all
platforms. Also, all the call sites need to be updated to check the
mpi_errno and handle it accordingly. This isn't critical for the
release, so it can probably wait a little while.

-Dave
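
A generic sketch (not the MPICH source; all names are illustrative) of the shape of the overhaul being requested: instead of collapsing every failure into a returned context id of 0, return an error code and pass the context id back through an out parameter, so callers can check the error code and the original failure survives to the top of the error stack.

#include <stdio.h>

#define DEMO_SUCCESS  0
#define DEMO_ERR_COMM 17            /* stand-in for a real error class */

/* old style: failure collapses to "0", the reason is gone */
static int get_contextid_old(int fail) { return fail ? 0 : 42; }

/* new style: error code propagates, context id is an out parameter */
static int get_contextid_new(int fail, int *context_id)
{
    if (fail)
        return DEMO_ERR_COMM;       /* deep failure preserved for the caller */
    *context_id = 42;
    return DEMO_SUCCESS;
}

int main(void)
{
    int id, rc;

    id = get_contextid_old(1);
    printf("old style: context id %d (why it failed is lost)\n", id);

    rc = get_contextid_new(1, &id);
    if (rc != DEMO_SUCCESS)
        printf("new style: error %d reaches the caller intact\n", rc);
    return 0;
}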

Re: [MPICH2 Req #3768] Problem with MPICH2

Originally by Vijay Mann [email protected] on 2008-08-06 15:43:46 -0500


Hi,

I think we are hitting the same problem with the Fortran mpich libraries.
We are using gfortran (which accepts the -m64 flag for 64-bit compilation).

We tried the following set of flags:
export FCFLAGS="-m64"
export FCFLAGS_f90="-m64"
export FFLAGS="-m64"

and they didn't seem to work.

Can you please help?

Thanks,

Vijay Mann
Technical Staff Member,
IBM India Research Laboratory, New Delhi, India.
Phone: 91-11- 41292168
http://www.research.ibm.com/people/v/vijamann/

Anthony Chan [email protected]
11/16/2007 04:52 AM

To
Pradipta De/India/IBM@IBMIN
cc
[email protected], Vijay Mann/India/IBM@IBMIN
Subject
Re: [MPICH2 Req #3768] Problem with MPICH2

Did you set CFLAGS and CXXFLAGS to the same 64bit flag used by your
C/C++ compiler ?

On Thu, 15 Nov 2007, Pradipta De wrote:

Hi,

We downloaded and compiled MPICH2 on a PowerPC box running FC6.
We are trying to use mpicxx to compile our mpi code in 64-bit mode.
We get the following error: (our mpich2 install directory is
/hpcfs/downloaded_software/mpich2-install/)

/usr/bin/ld: skipping incompatible
/hpcfs/downloaded_software/mpich2-install//lib/libmpichcxx.a when
searching for -lmpichcxx
/usr/bin/ld: cannot find -lmpichcxx
collect2: ld returned 1 exit status

Is there some flag that needs to be specified during configuration to
allow for 64-bit version ?

thanks, and regards,
-- pradipta

autoconf warnings

Originally by Dave Goodell [email protected] on 2008-07-31 11:26:44 -0500


Whenever I do a ./maint/updatefiles I get these warning messages over
and over. Presumably they're harmless, since everything still builds
and runs just fine, but it would be nice to get rid of them.

-Dave

configure.in:202: warning: AC_CACHE_VAL
(lt_prog_compiler_static_works, ...): suspicious cache-id, must
contain cv to be cached
/sandbox/chan/autoconf/autoconf-2.62/lib/autoconf/general.m4:1973:
AC_CACHE_VAL is expanded from...
/sandbox/chan/autoconf/autoconf-2.62/lib/autoconf/general.m4:1993:
AC_CACHE_CHECK is expanded from...
./libtool.m4:640: AC_LIBTOOL_LINKER_OPTION is expanded from...
./libtool.m4:2551: _LT_AC_LANG_C_CONFIG is expanded from...
./libtool.m4:2550: AC_LIBTOOL_LANG_C_CONFIG is expanded from...
./libtool.m4:80: AC_LIBTOOL_SETUP is expanded from...
./libtool.m4:60: _AC_PROG_LIBTOOL is expanded from...
./libtool.m4:25: AC_PROG_LIBTOOL is expanded from...
configure.in:202: the top level
configure.in:202: warning: AC_CACHE_VAL
(lt_prog_compiler_pic_works, ...): suspicious cache-id, must contain
cv to be cached
./libtool.m4:595: AC_LIBTOOL_COMPILER_OPTION is expanded from...
./libtool.m4:4666: AC_LIBTOOL_PROG_COMPILER_PIC is expanded from...
configure.in:202: warning: AC_CACHE_VAL
(lt_prog_compiler_pic_works_CXX, ...): suspicious cache-id, must
contain cv to be cached
./libtool.m4:2663: _LT_AC_LANG_CXX_CONFIG is expanded from...
./libtool.m4:2662: AC_LIBTOOL_LANG_CXX_CONFIG is expanded from...
./libtool.m4:1701: _LT_AC_TAGCONFIG is expanded from...
configure.in:202: warning: AC_CACHE_VAL
(lt_prog_compiler_pic_works_F77, ...): suspicious cache-id, must
contain cv to be cached
./libtool.m4:3756: _LT_AC_LANG_F77_CONFIG is expanded from...
./libtool.m4:3755: AC_LIBTOOL_LANG_F77_CONFIG is expanded from...
configure.in:202: warning: AC_CACHE_VAL
(lt_prog_compiler_pic_works_GCJ, ...): suspicious cache-id, must
contain cv to be cached
./libtool.m4:3862: _LT_AC_LANG_GCJ_CONFIG is expanded from...
./libtool.m4:3861: AC_LIBTOOL_LANG_GCJ_CONFIG is expanded from...
configure.in:202: warning: AC_CACHE_VAL
(lt_prog_compiler_static_works, ...): suspicious cache-id, must
contain cv to be cached

mpdboot error

Originally by "Osentoski, Sarah" [email protected] on 2008-08-04 16:11:20 -0500


Hi,

I have a question. I tried to set up mpi on a set of 5 computers.
mpdboot works on each machine individually.

However if I run:

-bash-3.00$ mpdboot -n 5 -d --verbose --ncpus=3
debug: starting
running mpdallexit on erl01
LAUNCHED mpd on erl01 via
debug: launch cmd= /home/sosentoski/mpich2-install/bin/mpd.py
--ncpus=3 -e -d
debug: mpd on erl01 on port 35273
RUNNING: mpd on erl01
debug: info for running mpd: {'ncpus': 3, 'list_port': 35273,
'entry_port': *, 'host': 'erl01', 'entry_host': *, 'ifhn': ''}
LAUNCHED mpd on erl04 via erl01
debug: launch cmd= ssh -x -n -q erl04
'/home/sosentoski/mpich2-install/bin/mpd.py -h erl01 -p 35273
--ncpus=1 -e -d'
LAUNCHED mpd on erl07 via erl01
debug: launch cmd= ssh -x -n -q erl07
'/home/sosentoski/mpich2-install/bin/mpd.py -h erl01 -p 35273
--ncpus=1 -e -d'
LAUNCHED mpd on erl06 via erl01
debug: launch cmd= ssh -x -n -q erl06
'/home/sosentoski/mpich2-install/bin/mpd.py -h erl01 -p 35273
--ncpus=1 -e -d'
LAUNCHED mpd on erl05 via erl01
debug: launch cmd= ssh -x -n -q erl05
'/home/sosentoski/mpich2-install/bin/mpd.py -h erl01 -p 35273
--ncpus=1 -e -d'
debug: mpd on erl07 on port no_port
mpdboot_erl01 (handle_mpd_output 406): from mpd on erl07, invalid port
info:
no_port

Do you have any helpful hints about what might be wrong with my set up?

Thanks

Sarah Osentoski

Errors in maint/updatefiles

Originally by "Rajeev Thakur" [email protected] on 2008-08-04 14:23:09 -0500


maint/updatefiles gives the following errors now.

Shortname **mpi_accumulate %p %d %D %d %d %d %D %O %W for specific messages
has no expansion (first seen in file :src/mpi/rma/accumulate.c)
Shortname **mpi_alloc_mem %d %I %p for specific messages has no expansion
(first seen in file :src/mpi/rma/alloc_mem.c)
Shortname **mpi_get %p %d %D %d %d %d %D %W for specific messages has no
expansion (first seen in file :src/mpi/rma/get.c)
Shortname **mpi_pack_external %s %p %d %D %p %d %p for specific messages has
no expansion (first seen in file :src/mpi/datatype/pack_external.c)
Shortname **mpi_put %p %d %D %d %d %d %D %W for specific messages has no
expansion (first seen in file :src/mpi/rma/put.c)
Shortname **mpi_type_create_hvector %d %d %d %D %p for specific messages has
no expansion (first seen in file :src/mpi/datatype/type_create_hvector.c)
Shortname **mpi_type_hvector %d %d %d %D %p for specific messages has no
expansion (first seen in file :src/mpi/datatype/type_hvector.c)
Shortname **mpi_unpack_external %s %p %d %p %p %d %D for specific messages
has no expansion (first seen in file :src/mpi/datatype/unpack_external.c)
Shortname **mpi_win_create %p %d %d %I %C %p for specific messages has no
expansion (first seen in file :src/mpi/rma/win_create.c)
Because of errors in extracting error messages, the file
src/mpi/errhan/defmsg.h was not updated.
There are unused error message texts in src/mpi/errhan/errnames.txt
See the file unusederr.txt for the complete list

.....
Creating src/pm/smpd/smpd_version.h
Updating README's version ID.

Problems encountered while running updatefiles.
These may cause problems when configuring or building MPICH2.
 Error message files in src/mpi/errhan were not updated.

MPICH2 fpi.exe hanging on Windows XP

Originally by "Ayer, Timothy C." [email protected] on 2008-08-04 09:32:36 -0500


I am testing MPICH2-1.0.7 on Windows XP (SP2). I have installed it on 2
hosts (hostA, hostB) and am trying to run the fpi.exe built with fmpich2.lib.
The code hangs in an MPI_Bcast call. The fpi.exe source is attached.

The following tests work fine from hostA; both prompt for a number of
intervals, accept input, and produce an estimate of PI:
mpiexec.exe -hosts 2 hostA hostA \\hostA\temp\fpi.exe

mpiexec.exe -hosts 2 hostB hostB \\hostA\temp\fpi.exe

The following test hangs when submitted from hostA (in MPI_Bcast). It does
prompt for input (number of intervals), but once a value is entered it hangs.
I have launched the smpd process using smpd -d but see no output from smpd
after I enter an interval value:

mpiexec.exe -hosts 2 hostA hostB \\hostA\temp\fpi.exe

Any suggestions would be appreciated. Also let me know if you want me to
send debug output.

Thanks,
Tim


Timothy C. Ayer
High Performance Technical Computing
United Technologies - Pratt & Whitney
[email protected]
(860) 565 - 5268 v
(860) 565 - 2668 f

<<fpi.f>>
