Comments (29)

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-08-04 09:32:36 -0500


This message has 1 attachment(s)

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-08-04 09:32:36 -0500


Attachment added: fpi.f (2.4 KiB)
Added by email2trac

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-04 09:58:49 -0500


Hi,
If you are running your executable from a shared network drive, you need
to map the network drive with mpiexec when launching your job (see the
"-map" option of mpiexec in the Windows developer's guide).
Also make sure that you have turned the Windows firewall (or any other
firewall) off on the machines involved in the job.
Try specifying the IP addresses of the machines instead of the
hostnames.
Let us know the results.

(PS: Instead of the "-hosts" option you could try using the "-machinefile"
option available with mpiexec. See the Windows developer's guide for
details.)

Regards,
Jayesh
-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Monday, August 04, 2008 9:33 AM
To: undisclosed-recipients:
Subject: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Reporter: "Ayer, Timothy C." [email protected]
Type: bug
Status: new
Priority: major
Component: mpich2

I am testing MPICH2 1.0.7 on Windows XP (SP2). I have installed it on 2
hosts (hostA, hostB) and am trying to run fpi.exe built with fmpich2.lib.
The code is hanging in a MPI_Bcast call. The fpi.exe source is attached.

The following tests work fine from hostA; both prompt for a number of
intervals, accept input, and produce an estimate of pi:

mpiexec.exe -hosts 2 hostA hostA \\hostA\temp\fpi.exe

mpiexec.exe -hosts 2 hostB hostB \\hostA\temp\fpi.exe

The following test hangs (in MPI_Bcast) when submitted from hostA. It
does prompt for input (number of intervals), but once a value is entered
it hangs. I have launched the smpd process using smpd -d but see no output
from smpd after I enter an interval value:

mpiexec.exe -hosts 2 hostA hostB \\hostA\temp\fpi.exe

Any suggestions would be appreciated. Also let me know if you want me
to send debug output.

Thanks,
Tim


Timothy C. Ayer
High Performance Technical Computing
United Technologies - Pratt & Whitney
[email protected]
(860) 565 - 5268 v
(860) 565 - 2668 f

<<fpi.f>>

Ticket URL: https://trac.mcs.anl.gov/projects/mpich2/ticket/36

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-04 09:58:49 -0500


Attachment added: part0001.html (4.1 KiB)
Added by email2trac

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-04 13:01:41 -0500


Attachment added: part0001.2.html (10.6 KiB)
Added by email2trac

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-04 13:01:41 -0500


You should try:

mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB y:\fpi.exe

Let us know if it works for you.

(PS: The shared drive is accessible across machines because the drive is
accessible/mapped by the user logged on to the machines. SMPD runs as a
service logged on as "Local System" and does not, and should not, have access
to drives shared by users.)

Regards,
Jayesh
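
The mapping suggested above can be sketched as a small helper that turns a UNC executable path into the "-map" form. This is only an illustration: the function name and parameters are hypothetical, the option syntax is copied from the command in this thread rather than from the MPICH2 documentation, and it assumes the UNC path is written with its leading double backslash.

```python
from pathlib import PureWindowsPath


def mapped_mpiexec_args(unc_exe, drive_letter, hosts):
    """Build an mpiexec argument list that maps a UNC share to a drive letter.

    Hypothetical helper: the -map and -hosts syntax mirrors the commands
    quoted in this thread; nothing here is an official MPICH2 API.
    """
    exe = PureWindowsPath(unc_exe)
    share = exe.drive  # for a UNC path this is \\host\share
    if not share.startswith("\\\\"):
        raise ValueError("expected a UNC path such as \\\\host\\share\\prog.exe")
    relative = exe.relative_to(exe.anchor)  # path below the share, e.g. fpi.exe
    mapped_exe = f"{drive_letter}:\\{relative}"
    return [
        "mpiexec.exe",
        "-map", f"{drive_letter}:{share}",
        "-hosts", str(len(hosts)), *hosts,
        mapped_exe,
    ]


args = mapped_mpiexec_args(r"\\hostA\temp\fpi.exe", "y", ["hostA", "hostB"])
print(" ".join(args))
```

PureWindowsPath is used so the path is parsed with Windows rules even when the script runs on another platform.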


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 12:50 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The exe can be directly accessed from hostB by executing
\\hostA\temp\fpi.exe; that is, you could type it directly into a command
prompt on hostB if you wanted. Note also that the \temp directory is a
shared location. I am not sure how this is physically set up on our
network, but this has worked without any "mapping" for MPICH (MPICH1).

Note: I did try: mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB
\\hostA\temp\fpi.exe but that still hangs in the MPI_Bcast call.

The interesting part is that it gets through the initialization:

  call MPI_INIT( ierr )
  call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
  call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )

All execute.

Thanks,
Tim


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 1:33 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

How (what mechanism) does hostB access data (exe) in hostA ?

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 12:31 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Thanks, Jayesh, for the quick reply. This is a network-available UNC path;
why do I need to map a drive?

I am familiar with the machines file - I was just using the command line
for debugging.


from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-08-13 13:26:47 -0500


Hello,

Was there actually a bug that has been fixed? ...so I should download
1.1a1, the pre-release version?

I had sent some smpd -d output to Jayesh Krishna on 8/5/2008 but did not
hear back.

Thanks for your help.
Tim

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:16 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Reporter: "Ayer, Timothy C." [email protected]
Owner:
Type: bug
Status: closed
Priority: major
Component: mpich2
Resolution: fixed
Keywords:

Changes (by thakur):

  • status: new => closed
  • resolution: => fixed

Ticket URL: https://trac.mcs.anl.gov/projects/mpich2/ticket/36#comment:4

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-08-13 13:32:48 -0500


Sorry, I did read that message; I was just a little surprised. Thank you.

Tim

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-13 13:58:26 -0500


Attachment added: part0001.3.html (31.6 KiB)
Added by email2trac

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-13 13:58:26 -0500


Hi,
The logs you sent show that the communication between the process
managers on the hosts is good. The problem looks to be with the
communication between the MPI processes.

Can you try compiling icpi.c (MPICH2\examples) and running the program in
your setup (to make sure that the problem is not related to the Fortran
bindings)?

I have seen that sometimes the uninstall/install of MPICH2 does
not result in the DLLs being updated correctly. (This has led to some
weird, difficult-to-debug hangs in our tests. It is not usual, but it does
not hurt to check for it.) To make sure that you have the right
DLLs, try listing the MPICH2 DLLs in your Windows system32 directory on
both hosts:

dir c:\windows\system32\mpich2*.dll
dir c:\windows\system32\mpe*.dll

Send us the results for verification (sanity check: they should have the
same datestamp).

Also, when running fpi.exe using your setup, try leaving the job running
for 10 minutes or so (or specify a timeout of about 10 minutes) and see if
it reports any errors. You might want to run netstat (or use "Process
Explorer" from Microsoft and check the TCP/IP tab in the
process->properties) to see what happens to the connections between the MPI
processes from both hosts.

(PS: The MPICH2 1.1.0a1 release
(http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads)
is aimed at MPICH2 devs and not for production machines.)

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Tuesday, August 05, 2008 9:20 AM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Please find attached the output from the smpd -d procs. Also, the output
from the mpiexec just so you can see what I typed.

H:>mpiexec.exe -map v:\\10.30.73.170\temp -hosts 2 10.30.73.170 10.30.73.34 v:\fpi.exe
Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
Process 1 of 2 is alive
Before bcast 1 of 2 is alive
10
Before bcast 0 of 2 is alive
100


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 5:10 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The socket/channel connection between the MPI processes takes place during
MPI_Bcast() (not before that in fpi.f).
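
Since the MPI processes only connect to each other at this point, a hang here is consistent with a blocked TCP path between the hosts even when the smpd process managers can talk. A minimal sketch of a plain TCP reachability check follows. The function name is hypothetical, and because the ports the MPI processes actually use are not stated in this thread, it only tests whether a port you choose accepts connections, not the MPI channel itself.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Local self-check against a throwaway listener; on real hosts you would
# start a listener on one machine and probe it from the other.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
probe_port = listener.getsockname()[1]
print(can_connect("127.0.0.1", probe_port))
listener.close()
```

A False result from one host to the other, with a listener known to be running, would point at a firewall or routing problem rather than at MPICH2.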


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 4:00 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The firewall has been disabled.

The inputs were from me entering values for estimating pi...I wanted to
make sure the program ran through all the logic.

I will send the other debug output a little later.

Also, as an FYI, we have been running MPICH on thousands of PCs for
years now. The other strange part is that over a year ago I did
successfully run MPICH2 on over 30 processors. My first thought was the
firewall as well.


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 4:46 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Do you have Windows Firewall (or any firewall) running on these machines?

Why do I see two inputs (10 & 100) in the mpiexec debug output?

Can you send us the debug output of smpd along with mpiexec?

Can you check the status of the remote smpd from each host?

--- On host A, run "smpd -status IPAddressOf_hostB"
--- On host B, run "smpd -status IPAddressOf_hostA"

(PS: I just tried running fpi.exe from a shared drive across two 32-bit
Windows XP machines in our lab but did not get any errors/hang)

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:11 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

This is the same fpi.f which comes with the installation with the
exception that I have added print statements.

The setup is homogeneous (both 32-bit). The output is attached.

Thanks for your help.

Tim


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:48 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Are you running the fpi.exe (fpi.f) provided with MPICH2? (Have you modified
the program?)

I am assuming that the setup is not heterogeneous. (MPICH2 currently does
not support running jobs across machines with different data models; e.g.,
you cannot run your MPI job across 32-bit and 64-bit machines.)

Please provide us with the debug/verbose output when running fpi.exe.

Start smpd on both machines in debug mode (1. Stop any instances of
smpd running on the system: smpd -stop. 2. Start smpd in debug mode: smpd
-d) and run mpiexec in verbose mode (mpiexec.exe -verbose -map
y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB
y:\fpi.exe)

Regards,
Jayesh
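
One rough way to compare data models across hosts is to run the same small script on each machine and compare what it prints. This is a sketch under a loud assumption: it reports the pointer width of the Python interpreter that runs it, which is only a proxy for how fpi.exe and the MPICH2 DLLs were built (a 32-bit interpreter on a 64-bit OS reports 32). The function name is hypothetical.

```python
import platform
import struct


def data_model_bits():
    """Return the pointer width, in bits, of the running Python interpreter."""
    # struct format "P" is a native void*; its size gives the pointer width.
    return struct.calcsize("P") * 8


# Print hostname, reported machine type and pointer width for comparison.
print(platform.node(), platform.machine(), data_model_bits())
```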


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:21 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Thanks, here is the output (note: I have not included IP address or
actual hostnames in this email but did use them in testing)

mpiexec.exe -map y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB y:\fpi.exe

OUTPUT:
Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
Process 1 of 2 is alive
Before bcast 1 of 2 is alive
10
Before bcast 0 of 2 is alive

mpiexec.exe -map y:\\IPAddressOf_hostA\temp hostname

XXXXXX (hostname of hostA)


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:13 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The command hostname (c:\windows\system32\hostname.exe)

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:11 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

You have "hostname" at the end of the second line...what is that referring
to?


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:47 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

What is the error message (output) that you get when you run mpiexec?
Please provide us with the output of the following commands (make sure that
you specify the IP addresses of the hosts involved):

mpiexec.exe -map y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB y:\fpi.exe

mpiexec.exe -map y:\\IPAddressOf_hostA\temp hostname

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 1:25 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

No, this does not work...the behavior is the same. The UNCs should work, and
have worked, regardless of whether a user is logged in. We have never
relied on network drive mappings since they are intermittently an
"interactive" feature.


from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Rajeev Thakur on 2008-08-13 14:10:11 -0500


Tim,
We have a new bug tracking system (Trac) that I am not fully familiar
with. I was going through the list trying to close ones I thought
(mistakenly or otherwise) needed no further action. I didn't know that it
also sent a note to the sender :-). Jayesh will follow up with you further
on this issue.

Rajeev

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-08-13 14:11:09 -0500


Hi Jayesh,

Great to hear from you. I will try your suggestions (icpi.c and slow
response).

Also here is the output you requested. I have been wondering why the dates
on mpich2sshm.dll and mpich2sshmp.dll seem so old (from 2005)??? ...I should
have mentioned it sooner.

Thanks,
Tim

C:\WINDOWS\system32>dir c:\windows\system32\mpe*.dll
Volume in drive C is System
Volume Serial Number is D8B5-0657

Directory of c:\windows\system32

04/04/2008 05:46 PM 135,168 mpe.dll
1 File(s) 135,168 bytes
0 Dir(s) 4,497,502,208 bytes free

C:\WINDOWS\system32>

C:\WINDOWS\system32>dir dir c:\windows\system32\mpich2*.dll
Volume in drive C is System
Volume Serial Number is D8B5-0657

Directory of C:\WINDOWS\system32

Directory of C:\WINDOWS\system32

04/04/2008 05:28 PM 1,110,016 mpich2.dll
04/04/2008 05:47 PM 151,552 mpich2mpe.dll
04/04/2008 05:23 PM 159,744 mpich2mpi.dll
04/04/2008 06:31 PM 1,159,168 mpich2mt.dll
04/04/2008 06:42 PM 1,351,680 mpich2mtp.dll
04/04/2008 05:43 PM 1,306,624 mpich2p.dll
04/04/2008 05:55 PM 1,093,632 mpich2shm.dll
04/04/2008 06:03 PM 1,294,336 mpich2shmp.dll
11/23/2005 02:33 AM 1,032,192 mpich2sshm.dll <<<<<<<<<<<<<<<<
11/23/2005 02:36 AM 1,294,336 mpich2sshmp.dll <<<<<<<<<<<<<<<<
04/04/2008 06:14 PM 1,122,304 mpich2ssm.dll
04/04/2008 06:22 PM 1,343,488 mpich2ssmp.dll
12 File(s) 12,419,072 bytes
0 Dir(s) 4,497,502,208 bytes free
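
The datestamp sanity check can be automated by parsing the dir output and flagging any DLL whose date differs from the most common one. This is a sketch, not a tool from MPICH2: the function name is hypothetical, and the pattern assumes the US-style MM/DD/YYYY listing format shown above. The sample listing is a subset of the lines above.

```python
import re
from collections import Counter

# Matches lines such as: 04/04/2008 05:28 PM 1,110,016 mpich2.dll
DIR_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{4})\s+\d{2}:\d{2}\s+[AP]M\s+[\d,]+\s+(\S+\.dll)",
    re.IGNORECASE,
)


def stale_dlls(dir_output):
    """Return {filename: date} for DLLs whose date differs from the most common date."""
    entries = {}
    for line in dir_output.splitlines():
        match = DIR_LINE.match(line.strip())
        if match:
            entries[match.group(2)] = match.group(1)
    if not entries:
        return {}
    common_date, _ = Counter(entries.values()).most_common(1)[0]
    return {name: date for name, date in entries.items() if date != common_date}


listing = """\
04/04/2008 05:28 PM 1,110,016 mpich2.dll
04/04/2008 05:55 PM 1,093,632 mpich2shm.dll
11/23/2005 02:33 AM 1,032,192 mpich2sshm.dll
11/23/2005 02:36 AM 1,294,336 mpich2sshmp.dll
04/04/2008 06:14 PM 1,122,304 mpich2ssm.dll
"""
print(stale_dlls(listing))
```

Running the same check on the listing from each host makes it easy to see whether both machines carry the same set of dates.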

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:58 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Reporter: "Ayer, Timothy C." [email protected]
Owner: jayesh
Type: bug
Status: assigned
Priority: major
Component: mpich2
Resolution:
Keywords:

Comment (by Jayesh Krishna):

Hi,
The logs sent by you show that the communication btw the process
managers on the hosts is good. The problem looks to be with the
communication btw the MPI processes.

Can you try compiling icpi.c (MPICH2\examples) and run the program in

your setup (Make sure that the problem is not related to fortran
bindings).

I have seen that some times that the uninstall/install of MPICH2 does

not result in the dlls being updated correctly (This has lead to some
wierd-difficult-to-debug hangs in our tests. This is not usual but it does
not hurt to check for it though). To make sure that you have the right
dlls try listing the MPICH2 dlls in your windows system32 directory on
both the hosts,

dir c:\windows\system32\mpich2_.dll
dir c:\windows\system32\mpe_.dll

Send us the results for verification (Sanity check- they should have the
same datestamp)

Also when running fpi.exe using your setup try leaving the job (or may

be specify a timeout of 10 mins or so) for 10mins or so and see if it
reports any errors. You might want to run netstat (or use "Process
explorer" from microsoft and check the TCP/IP tab in the
process->properties) to see what happens to the connections btw the MPI
processes from both hosts.

(PS: The MPICH2 1.1.0a1 release
(http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=dow
nloads) is aimed at MPICH2 devs and not for production machines. )

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Tuesday, August 05, 2008 9:20 AM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Please find attached the output from the smpd -d procs. Also, the output
from the mpiexec just so you can see what I typed.

H:>mpiexec.exe -map v:\10.30.73.170\temp -hosts 2 10.30.73.170
10.30.73.34 v:\fpi.exe
Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
Process 1 of 2 is alive
Before bcast 1 of 2 is alive
10
Before bcast 0 of 2 is alive
100


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 5:10 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The socket/channel connection between the MPI processes take place during
MPI_Bcast() (not before that in fpi.f).


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 4:00 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The firewall has been disabled.

The inputs were from me entering values for estimating pi...I wanted to
make sure the program ran through all the logic.

I will send the other debug output a little later.

Also, as an fyi, we have been running MPICH on thousands of PC's for
years now. The other strange part is that over a year ago I did
successfully run MPICH2 on over 30 processors. My first thought was the
firewall as well.


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 4:46 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Do you have windows firewall (or any firewall) running on these machines

?

Why do I see two inputs (10 & 100) in the mpiexec debug output ?

Can you send us the debug output of smpd along with mpiexec ?

Can you check the status of the remote smpd from each host ?

--- On host A, run      "smpd -status IPAddressOf_hostB"
--- On host B, run      "smpd -status IPAddressOf_hostA"

(PS: I just tried running fpi.exe in a shared drive across two 32-bit
windows XP machines in our lab but did not get any errors/hang)

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:11 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

This is the same fpi.f which comes with the installation with the
exception that I have added print statements.

The setup is homogenous (both 32-bit). The output is attached.

Thanks for your help.

Tim


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:48 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Are you running fpi.exe (fpi.f) provided with MPICH2 (Have you modified

the program ?)?

I am assuming that the setup is not heterogeneous (MPICH2 currently does

not support running jobs across machines with different data models eg:
You cannot run your MPI job across 32-bit and 64-bit machines)

Please provide us with the debug/verbose output when running fpi.exe.

Start smpd on both the machines in debug mode (1. Stop any instances of
smpd running on the system, smpd -stop 2. Start smpd in debug mode, smpd
-d) and run mpiexec in verbose mode (mpiexec.exe -verbose -map
y:\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB
y:\fpi.exe)

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:21 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Thanks, here is the output (note: I have not included IP address or
actual hostnames in this email but did use them in testing)

mpiexec.exe -map y:\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA

IPAddressOf_hostB y:\fpi.exe

OUTPUT:
Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
Process 1 of 2 is alive
Before bcast 1 of 2 is alive
10
Before bcast 0 of 2 is alive

mpiexec.exe -map y:\IPAddressOf_hostA\temp hostname

XXXXXX (hostname of hostA)


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:13 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The command hostname (c:\windows\system32\hostname.exe)

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:11 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

You have "hostname" at the end of the second line...what is that referring
to?


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:47 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

What is the error message (output) that you get when you run mpiexec ?
Pls provide us with the output of the following commands (Make sure that
you specify ipaddresses of the hosts involved),

mpiexec.exe -map y:\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA

IPAddressOf_hostB y:\fpi.exe

mpiexec.exe -map y:\IPAddressOf_hostA\temp hostname

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 1:25 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

No, this does not work...the behavior is the same. The UNC paths should
work (and have worked) regardless of whether a user is logged in. We have
never relied on network drive mappings since they are intermittently an
"interactive" feature.


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:02 PM
To: 'Ayer, Timothy C.'
Cc: [email protected]
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

You should try,

mpiexec.exe -map y:\hostA\temp -hosts 2 hostA hostB y:\fpi.exe

Let us know if it works for you.

(PS: The shared drive is accessible across machines because the drive is
accessible/mapped by the user logged on to the machines. SMPD runs as a
service logged on as "Local System" and does not - should not- have access
to drives shared by users)

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 12:50 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The exe can be directly accessed from hostB by executing
\hostA\temp\fpi.exe; that is, you could type it directly into a command
prompt on hostB if you wanted. Note also that the \temp directory is a
shared location. I am not sure how this is physically set up on our
network, but this has worked without any "mapping" for MPICH (MPICH1).

Note: I did try: mpiexec.exe -map y:\hostA\temp -hosts 2 hostA hostB
\hostA\temp\fpi.exe but that still hangs in the MPI_Bcast call.

The interesting part is that it gets through the initialization:

   call MPI_INIT( ierr )
   call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
   call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )

All execute.

Thanks,
Tim


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 1:33 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

How (what mechanism) does hostB access data (exe) in hostA ?

Regards,
Jayesh


From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 12:31 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Thanks Jayesh for the quick reply. This is a network-available UNC path -
why do I need to map a drive?

I am familiar with the machines file - I was just using the command line
for debugging.


From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 10:56 AM
To: [email protected]
Cc: [email protected]
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Hi,
If you are running your executable from a shared network drive, you need
to map the network drive with mpiexec when launching your job (see the
"-map" option of mpiexec in the Windows developer's guide).
Also make sure that you have turned off the Windows firewall (or any other
firewall) on the machines involved in the job.
Try specifying the IP addresses of the machines instead of the
hostnames.
Let us know the results.

(PS: Instead of the "-hosts" option you could try using the "-machinefile"
option available with mpiexec. See the Windows developer's guide for
details.)

Regards,
Jayesh
-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Monday, August 04, 2008 9:33 AM
To: undisclosed-recipients:
Subject: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

-----------------------------------------------------------+----------------
Reporter: "Ayer, Timothy C." [email protected] | Type: bug
Status: new | Priority: major
Component: mpich2 |
-----------------------------------------------------------+----------------

I am testing MPICH2 1.0.7 on Windows XP (SP2). I have installed it on 2
hosts (hostA, hostB) and am trying to run fpi.exe built with fmpich2.lib.
The code is hanging in an MPI_Bcast call. The fpi.exe source is attached.

The following tests work fine from hostA; both prompt for a number of
intervals, accept input, and produce an estimate of PI:

mpiexec.exe -hosts 2 hostA hostA \hostA\temp\fpi.exe

mpiexec.exe -hosts 2 hostB hostB \hostA\temp\fpi.exe

The following test hangs when submitted from hostA (in MPI_Bcast). It
does prompt for input (number of intervals) but once entered it hangs. I
have launched the smpd process using smpd -d but see no output from the
smpd after I enter an interval value

mpiexec.exe -hosts 2 hostA hostB \hostA\temp\fpi.exe

Any suggestions would be appreciated. Also let me know if you want me to
send debug output.

Thanks,
Tim


Timothy C. Ayer
High Performance Technical Computing
United Technologies - Pratt & Whitney
[email protected]
(860) 565 - 5268 v
(860) 565 - 2668 f

<<fpi.f>>

--
Ticket URL: https://trac.mcs.anl.gov/projects/mpich2/ticket/36

Ticket URL: https://trac.mcs.anl.gov/projects/mpich2/ticket/36#comment:

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-08-13 14:13:57 -0500


Thanks for letting me know. I knew something was up...this explains it :)
...no worries.

Jayesh and I are currently "discussing" it. ;)

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 3:10 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
Reporter: "Ayer, Timothy C." [email protected] | Owner: jayesh
Type: bug | Status: assigned
Priority: major | Component: mpich2
Resolution: | Keywords:
------------------------------------------------------------+---------------

Comment (by Rajeev Thakur):

Tim,
We have a new bug tracking system (Trac) that I am not fully familiar
with. I was going through the list trying to close ones I thought
(mistakenly or otherwise) needed no further action. I didn't know that it
also sent a note to the sender :-). Jayesh will follow up with you further
on this issue.

Rajeev

Ticket URL: https://trac.mcs.anl.gov/projects/mpich2/ticket/36#comment:

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-13 14:37:57 -0500


Attachment added: part0001.4.html (26.6 KiB)
Added by email2trac

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-13 14:37:57 -0500


Hi,
Hmmm... This looks like the problem that I mentioned in my email. The
sshm DLLs (mpich2sshm.dll, mpich2sshmp.dll) should have the same datestamp
as the other DLLs (they should not be from 2005!).
Please try the following:

1. Uninstall MPICH2 on the hosts involved in your job.
2. Manually delete the MPICH2 DLLs from the windows\system32 directory
   (please be careful! Make sure that you delete only mpich2*.dll and
   mpe*.dll).
3. Re-install MPICH2 1.0.7 (stable version) on the hosts/nodes.
4. Re-compile cpi.c/fpi.f and try running your job.

Let us know the results.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:11 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
Reporter: "Ayer, Timothy C." [email protected] | Owner: jayesh
Type: bug | Status: assigned
Priority: major | Component: mpich2
Resolution: | Keywords:
------------------------------------------------------------+---------------

Comment (by Ayer, Timothy C.):

Hi Jayesh,

Great to hear from you. I will try your suggestions (icpi.c and slow
response).

Also here is the output you requested. I have been wondering why the
dates on mpich2sshm.dll and mpich2sshmp.dll seem so old (from 2005)???
...I should have mentioned it sooner.

Thanks,
Tim

C:\WINDOWS\system32>dir c:\windows\system32\mpe*.dll
Volume in drive C is System
Volume Serial Number is D8B5-0657

Directory of c:\windows\system32

04/04/2008 05:46 PM 135,168 mpe.dll
1 File(s) 135,168 bytes
0 Dir(s) 4,497,502,208 bytes free

C:\WINDOWS\system32>

C:\WINDOWS\system32>dir dir c:\windows\system32\mpich2*.dll
Volume in drive C is System
Volume Serial Number is D8B5-0657

Directory of C:\WINDOWS\system32

Directory of C:\WINDOWS\system32

04/04/2008 05:28 PM 1,110,016 mpich2.dll
04/04/2008 05:47 PM 151,552 mpich2mpe.dll
04/04/2008 05:23 PM 159,744 mpich2mpi.dll
04/04/2008 06:31 PM 1,159,168 mpich2mt.dll
04/04/2008 06:42 PM 1,351,680 mpich2mtp.dll
04/04/2008 05:43 PM 1,306,624 mpich2p.dll
04/04/2008 05:55 PM 1,093,632 mpich2shm.dll
04/04/2008 06:03 PM 1,294,336 mpich2shmp.dll
11/23/2005 02:33 AM 1,032,192 mpich2sshm.dll <<<<<<<<<<<<<<<<
11/23/2005 02:36 AM 1,294,336 mpich2sshmp.dll <<<<<<<<<<<<<<<<
04/04/2008 06:14 PM 1,122,304 mpich2ssm.dll
04/04/2008 06:22 PM 1,343,488 mpich2ssmp.dll
12 File(s) 12,419,072 bytes
0 Dir(s) 4,497,502,208 bytes free
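
The datestamp comparison on a listing like this one can also be done mechanically. The following is a minimal Python sketch, not part of MPICH2. It assumes the `dir` output uses the MM/DD/YYYY, 12-hour layout shown in the listing, and the helper name `find_stale_dlls` is hypothetical. It reports every DLL whose date differs from the most common date.

```python
import re
from collections import Counter

# One file row of cmd.exe `dir` output, e.g.
# "04/04/2008 05:28 PM 1,110,016 mpich2.dll"
# (layout assumed from the listing in this thread).
DIR_ROW = re.compile(
    r"^(\d{2}/\d{2}/\d{4})\s+\d{2}:\d{2}\s+[AP]M\s+[\d,]+\s+(\S+\.dll)\b",
    re.IGNORECASE,
)


def find_stale_dlls(dir_output):
    """Return (most_common_date, {dll_name: date}) for DLLs with a different date."""
    rows = []
    for line in dir_output.splitlines():
        match = DIR_ROW.match(line.strip())
        if match:
            rows.append((match.group(2), match.group(1)))
    if not rows:
        return None, {}
    common_date, _ = Counter(date for _, date in rows).most_common(1)[0]
    stale = {name: date for name, date in rows if date != common_date}
    return common_date, stale


if __name__ == "__main__":
    # A few rows copied from the listing; paste the full `dir` output instead.
    sample = """\
04/04/2008 05:28 PM 1,110,016 mpich2.dll
04/04/2008 05:55 PM 1,093,632 mpich2shm.dll
11/23/2005 02:33 AM 1,032,192 mpich2sshm.dll
04/04/2008 06:14 PM 1,122,304 mpich2ssm.dll
"""
    common, stale = find_stale_dlls(sample)
    print("most common date:", common)
    for name, date in sorted(stale.items()):
        print("differs:", name, date)
```

Running the same check on the output from both hosts makes it easy to see whether the two installations disagree.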

-----Original Message-----
From: [email protected]
[mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:58 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+-------------
Reporter: "Ayer, Timothy C." [email protected] | Owner: jayesh
Type: bug | Status: assigned
Priority: major | Component: mpich2
Resolution: | Keywords:
------------------------------------------------------------+-------------

Comment (by Jayesh Krishna):

Hi,
The logs you sent show that the communication between the process
managers on the hosts is good. The problem looks to be with the
communication between the MPI processes.

Can you try compiling icpi.c (MPICH2\examples) and running the program in
your setup? (This makes sure that the problem is not related to the
Fortran bindings.)

I have seen that sometimes the uninstall/install of MPICH2 does not
result in the DLLs being updated correctly. (This has led to some weird,
difficult-to-debug hangs in our tests. This is not usual, but it does
not hurt to check for it.) To make sure that you have the right
DLLs, try listing the MPICH2 DLLs in your windows system32 directory on
both the hosts:

dir c:\windows\system32\mpich2*.dll
dir c:\windows\system32\mpe*.dll

Send us the results for verification (sanity check: they should have the
same datestamp).

Also, when running fpi.exe using your setup, try leaving the job running
for 10 minutes or so (or specify a timeout of about 10 minutes) and see if
it reports any errors. You might want to run netstat (or use "Process
Explorer" from Microsoft and check the TCP/IP tab in the
process->properties) to see what happens to the connections between the MPI
processes from both hosts.
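
Alongside netstat, a plain TCP probe between the two hosts can show whether something on the network path is blocking connections. The following is a minimal Python sketch under stated assumptions: it is not an MPICH2 tool, the function names are hypothetical, and the port you pass is an arbitrary free port of your choosing (the MPI processes pick their own ports). It only tests whether one machine can open a TCP connection to the other.

```python
import socket


def listen_once(port, bind_addr="0.0.0.0"):
    """Accept a single TCP connection on `port` and return the peer address."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((bind_addr, port))
        srv.listen(1)
        conn, peer = srv.accept()
        conn.close()
        return peer


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

Run `listen_once(port)` on one host and `can_connect(other_host, port)` on the other, then swap roles so both directions are tested. A result of False in only one direction would point at a filtering rule rather than at MPICH2 itself.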

(PS: The MPICH2 1.1.0a1 release
(http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads)
is aimed at MPICH2 devs and not for production machines.)

Regards,
Jayesh

_____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Tuesday, August 05, 2008 9:20 AM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Please find attached the output from the smpd -d procs. Also, the output
from the mpiexec just so you can see what I typed.

H:>mpiexec.exe -map v:\10.30.73.170\temp -hosts 2 10.30.73.170 10.30.73.34 v:\fpi.exe
Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
Process 1 of 2 is alive
Before bcast 1 of 2 is alive
10
Before bcast 0 of 2 is alive
100

_____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 5:10 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The socket/channel connection between the MPI processes takes place during
MPI_Bcast() (not before that in fpi.f).
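
A connection that is opened only on first use behaves the way the report describes: setup calls return, and the first real transfer is where a network problem shows up. The following toy Python analogy uses plain sockets to illustrate that pattern. It is not MPICH2's code, and the class name `LazyChannel` is hypothetical.

```python
import socket


class LazyChannel:
    """Toy channel that connects to its peer only when the first message is sent."""

    def __init__(self, host, port):
        # Constructing the channel does no network I/O, so it always succeeds.
        self.addr = (host, port)
        self.sock = None

    def send(self, data, timeout=5.0):
        # The connection is established here, on first use.
        if self.sock is None:
            self.sock = socket.create_connection(self.addr, timeout=timeout)
        self.sock.sendall(data)

    def close(self):
        if self.sock is not None:
            self.sock.close()
            self.sock = None
```

With an unreachable peer, `LazyChannel(host, port)` still returns normally and the failure appears only in `send`. If packets are silently dropped rather than refused, `send` blocks until the timeout expires, which from the outside looks like a hang.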

_____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 4:00 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The firewall has been disabled.

The inputs were from me entering values for estimating pi...I wanted to
make sure the program ran through all the logic.

I will send the other debug output a little later.

Also, as an FYI, we have been running MPICH on thousands of PCs for
years now. The other strange part is that over a year ago I did
successfully run MPICH2 on over 30 processors. My first thought was the
firewall as well.

_____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 4:46 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Do you have Windows firewall (or any firewall) running on these machines?

Why do I see two inputs (10 & 100) in the mpiexec debug output?

Can you send us the debug output of smpd along with mpiexec?

Can you check the status of the remote smpd from each host?

 --- On host A, run      "smpd -status IPAddressOf_hostB"
 --- On host B, run      "smpd -status IPAddressOf_hostA"

(PS: I just tried running fpi.exe in a shared drive across two 32-bit
windows XP machines in our lab but did not get any errors/hang)

Regards,
Jayesh

_____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:11 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

This is the same fpi.f which comes with the installation with the
exception that I have added print statements.

The setup is homogeneous (both 32-bit). The output is attached.

Thanks for your help.

Tim

_____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:48 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Are you running fpi.exe (fpi.f) provided with MPICH2 (have you modified
the program)?

I am assuming that the setup is not heterogeneous (MPICH2 currently does
not support running jobs across machines with different data models, e.g.
you cannot run your MPI job across 32-bit and 64-bit machines).

Please provide us with the debug/verbose output when running fpi.exe.

Start smpd on both the machines in debug mode (1. Stop any instances of
smpd running on the system: smpd -stop; 2. Start smpd in debug mode:
smpd -d) and run mpiexec in verbose mode (mpiexec.exe -verbose -map
y:\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB
y:\fpi.exe)

Regards,
Jayesh

_____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:21 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Thanks, here is the output (note: I have not included IP address or
actual hostnames in this email but did use them in testing)

mpiexec.exe -map y:\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB y:\fpi.exe

OUTPUT:
Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
Process 1 of 2 is alive
Before bcast 1 of 2 is alive
10
Before bcast 0 of 2 is alive

mpiexec.exe -map y:\IPAddressOf_hostA\temp hostname

XXXXXX (hostname of hostA)

_____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:13 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The command hostname (c:\windows\system32\hostname.exe)

Regards,
Jayesh

_____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:11 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

You have "hostname" at the end of the second line...what is that referring
to?

_____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:47 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

What is the error message (output) that you get when you run mpiexec?
Please provide us with the output of the following commands (make sure that
you specify the IP addresses of the hosts involved):

mpiexec.exe -map y:\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB y:\fpi.exe

mpiexec.exe -map y:\IPAddressOf_hostA\temp hostname

Regards,
Jayesh

_____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 1:25 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

No, this does not work...the behavior is the same. The UNC paths should
work (and have worked) regardless of whether a user is logged in. We have
never relied on network drive mappings since they are intermittently an
"interactive" feature.

_____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:02 PM
To: 'Ayer, Timothy C.'
Cc: [email protected]
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

You should try,

mpiexec.exe -map y:\hostA\temp -hosts 2 hostA hostB y:\fpi.exe

Let us know if it works for you.

(PS: The shared drive is accessible across machines because the drive is
accessible/mapped by the user logged on to the machines. SMPD runs as a
service logged on as "Local System" and does not - should not - have
access to drives shared by users)

Regards,
Jayesh

_____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 12:50 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The exe can be directly accessed from hostB by executing
\hostA\temp\fpi.exe; that is, you could type it directly into a command
prompt on hostB if you wanted. Note also that the \temp directory is a
shared location. I am not sure how this is physically set up on our
network, but this has worked without any "mapping" for MPICH (MPICH1).

Note: I did try: mpiexec.exe -map y:\hostA\temp -hosts 2 hostA hostB
\hostA\temp\fpi.exe but that still hangs in the MPI_Bcast call.

The interesting part is that it gets through the initialization:

    call MPI_INIT( ierr )
    call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
    call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )

All execute.

Thanks,
Tim

_____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 1:33 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

How (what mechanism) does hostB access data (exe) in hostA ?

Regards,
Jayesh

_____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 12:31 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Thanks Jayesh for the quick reply. This is a network-available UNC path -
why do I need to map a drive?

I am familiar with the machines file - I was just using the command line
for debugging.

_____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 10:56 AM
To: [email protected]
Cc: [email protected]
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Hi,
If you are running your executable from a shared network drive, you need
to map the network drive with mpiexec when launching your job (see the
"-map" option of mpiexec in the Windows developer's guide).
Also make sure that you have turned off the Windows firewall (or any other
firewall) on the machines involved in the job.
Try specifying the IP addresses of the machines instead of the
hostnames.
Let us know the results.

(PS: Instead of the "-hosts" option you could try using the "-machinefile"
option available with mpiexec. See the Windows developer's guide for
details.)

Regards,
Jayesh
-----Original Message-----
From: [email protected] [mailto:owner-
[email protected]]
On Behalf Of mpich2
Sent: Monday, August 04, 2008 9:33 AM
To: undisclosed-recipients:
Subject: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

-----------------------------------------------------------+----------------
Reporter: "Ayer, Timothy C." [email protected] | Type: bug
Status: new | Priority: major
Component: mpich2 |
-----------------------------------------------------------+----------------

I am testing MPICH2 1.0.7 on Windows XP (SP2). I have installed it on 2
hosts (hostA, hostB) and am trying to run fpi.exe built with fmpich2.lib.
The code is hanging in an MPI_Bcast call. The fpi.exe source is attached.

The following tests work fine from hostA; both prompt for a number of
intervals, accept input, and produce an estimate of PI:

mpiexec.exe -hosts 2 hostA hostA \hostA\temp\fpi.exe

mpiexec.exe -hosts 2 hostB hostB \hostA\temp\fpi.exe

The following test hangs when submitted from hostA (in MPI_Bcast). It
does prompt for input (number of intervals) but once entered it hangs. I
have launched the smpd process using smpd -d but see no output from the
smpd after I enter an interval value

mpiexec.exe -hosts 2 hostA hostB \hostA\temp\fpi.exe

Any suggestions would be appreciated. Also let me know if you want me to
send debug output.

Thanks,
Tim


Timothy C. Ayer
High Performance Technical Computing
United Technologies - Pratt & Whitney
[email protected]
(860) 565 - 5268 v
(860) 565 - 2668 f

<<fpi.f>>

--
Ticket URL: https://trac.mcs.anl.gov/projects/mpich2/ticket/36

--
Ticket URL: https://trac.mcs.anl.gov/projects/mpich2/ticket/36#comment:

Ticket URL: https://trac.mcs.anl.gov/projects/mpich2/ticket/36#comment:

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-13 14:50:00 -0500


Hi,
I spoke too soon. We have discontinued support for the sshm channel, and
that is why you have an old version of the sshm-related DLLs in your
system32 directory.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:38 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
Reporter: "Ayer, Timothy C." [email protected] | Owner: jayesh
Type: bug | Status: assigned
Priority: major | Component: mpich2
Resolution: | Keywords:
------------------------------------------------------------+---------------

Comment (by Jayesh Krishna):

Hi,
Hmmm... This looks like the problem that I mentioned in my email. The
sshm DLLs (mpich2sshm.dll, mpich2sshmp.dll) should have the same datestamp
as the other DLLs (they should not be from 2005!).
Please try the following:

1. Uninstall MPICH2 on the hosts involved in your job.
2. Manually delete the MPICH2 DLLs from the windows\system32 directory
   (please be careful! Make sure that you delete only mpich2*.dll and
   mpe*.dll).
3. Re-install MPICH2 1.0.7 (stable version) on the hosts/nodes.
4. Re-compile cpi.c/fpi.f and try running your job.

Let us know the results.

Regards,
Jayesh

-----Original Message-----
From: [email protected]
[mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:11 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
Reporter: "Ayer, Timothy C." [email protected] | Owner: jayesh
Type: bug | Status: assigned
Priority: major | Component: mpich2
Resolution: | Keywords:
------------------------------------------------------------+---------------

Comment (by Ayer, Timothy C.):

Hi Jayesh,

Great to hear from you. I will try your suggestions (icpi.c and slow
response).

Also here is the output you requested. I have been wondering why the
dates on mpich2sshm.dll and mpich2sshmp.dll seem so old (from 2005)???
...I should have mentioned it sooner.

Thanks,
Tim

C:\WINDOWS\system32>dir c:\windows\system32\mpe*.dll
Volume in drive C is System
Volume Serial Number is D8B5-0657

Directory of c:\windows\system32

04/04/2008 05:46 PM 135,168 mpe.dll
1 File(s) 135,168 bytes
0 Dir(s) 4,497,502,208 bytes free

C:\WINDOWS\system32>

C:\WINDOWS\system32>dir dir c:\windows\system32\mpich2*.dll
Volume in drive C is System
Volume Serial Number is D8B5-0657

Directory of C:\WINDOWS\system32

Directory of C:\WINDOWS\system32

04/04/2008 05:28 PM 1,110,016 mpich2.dll
04/04/2008 05:47 PM 151,552 mpich2mpe.dll
04/04/2008 05:23 PM 159,744 mpich2mpi.dll
04/04/2008 06:31 PM 1,159,168 mpich2mt.dll
04/04/2008 06:42 PM 1,351,680 mpich2mtp.dll
04/04/2008 05:43 PM 1,306,624 mpich2p.dll
04/04/2008 05:55 PM 1,093,632 mpich2shm.dll
04/04/2008 06:03 PM 1,294,336 mpich2shmp.dll
11/23/2005 02:33 AM 1,032,192 mpich2sshm.dll <<<<<<<<<<<<<<<<
11/23/2005 02:36 AM 1,294,336 mpich2sshmp.dll <<<<<<<<<<<<<<<<
04/04/2008 06:14 PM 1,122,304 mpich2ssm.dll
04/04/2008 06:22 PM 1,343,488 mpich2ssmp.dll
12 File(s) 12,419,072 bytes
0 Dir(s) 4,497,502,208 bytes free

-----Original Message-----
From: [email protected]
[mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:58 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+-------------
Reporter: "Ayer, Timothy C." [email protected] | Owner: jayesh
Type: bug | Status: assigned
Priority: major | Component: mpich2
Resolution: | Keywords:
------------------------------------------------------------+-------------

Comment (by Jayesh Krishna):

Hi,
The logs you sent show that the communication between the process
managers on the hosts is good. The problem looks to be with the
communication between the MPI processes.

Can you try compiling icpi.c (MPICH2\examples) and running the program in
your setup? (This makes sure that the problem is not related to the
Fortran bindings.)

I have seen that sometimes the uninstall/install of MPICH2 does not
result in the DLLs being updated correctly. (This has led to some weird,
difficult-to-debug hangs in our tests. This is not usual, but it does
not hurt to check for it.) To make sure that you have the right
DLLs, try listing the MPICH2 DLLs in your windows system32 directory on
both the hosts:

dir c:\windows\system32\mpich2*.dll
dir c:\windows\system32\mpe*.dll

Send us the results for verification (sanity check: they should have the
same datestamp).

Also, when running fpi.exe using your setup, try leaving the job running
for 10 minutes or so (or specify a timeout of about 10 minutes) and see if
it reports any errors. You might want to run netstat (or use "Process
Explorer" from Microsoft and check the TCP/IP tab in the
process->properties) to see what happens to the connections between the MPI
processes from both hosts.

(PS: The MPICH2 1.1.0a1 release
(http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads)
is aimed at MPICH2 devs and not for production machines.)

Regards,
Jayesh

 _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Tuesday, August 05, 2008 9:20 AM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Please find attached the output from the smpd -d procs. Also, the output
from the mpiexec just so you can see what I typed.

H:>mpiexec.exe -map v:\10.30.73.170\temp -hosts 2 10.30.73.170 10.30.73.34 v:\fpi.exe
Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
Process 1 of 2 is alive
Before bcast 1 of 2 is alive
10
Before bcast 0 of 2 is alive
100

 _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 5:10 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The socket/channel connection between the MPI processes takes place during
MPI_Bcast() (not before that in fpi.f).

 _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 4:00 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The firewall has been disabled.

The inputs were from me entering values for estimating pi...I wanted to
make sure the program ran through all the logic.

I will send the other debug output a little later.

Also, as an FYI, we have been running MPICH on thousands of PCs for
years now. The other strange part is that over a year ago I did
successfully run MPICH2 on over 30 processors. My first thought was the
firewall as well.

 _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 4:46 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Do you have Windows firewall (or any firewall) running on these machines?

Why do I see two inputs (10 & 100) in the mpiexec debug output?

Can you send us the debug output of smpd along with mpiexec?

Can you check the status of the remote smpd from each host?

  --- On host A, run      "smpd -status IPAddressOf_hostB"
  --- On host B, run      "smpd -status IPAddressOf_hostA"

(PS: I just tried running fpi.exe in a shared drive across two 32-bit
windows XP machines in our lab but did not get any errors/hang)

Regards,
Jayesh

 _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:11 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

This is the same fpi.f which comes with the installation with the
exception that I have added print statements.

The setup is homogeneous (both 32-bit). The output is attached.

Thanks for your help.

Tim

 _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:48 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Are you running fpi.exe (fpi.f) provided with MPICH2 (have you modified
the program)?

I am assuming that the setup is not heterogeneous (MPICH2 currently does
not support running jobs across machines with different data models, e.g.
you cannot run your MPI job across 32-bit and 64-bit machines).

Please provide us with the debug/verbose output when running fpi.exe.

Start smpd on both the machines in debug mode (1. Stop any instances of
smpd running on the system: smpd -stop; 2. Start smpd in debug mode:
smpd -d) and run mpiexec in verbose mode (mpiexec.exe -verbose -map
y:\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB
y:\fpi.exe)

Regards,
Jayesh

 _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:21 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Thanks, here is the output (note: I have not included IP address or
actual hostnames in this email but did use them in testing)

mpiexec.exe -map y:\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB y:\fpi.exe

OUTPUT:
Process 0 of 2 is alive
Enter the number of intervals: (0 quits)
Process 1 of 2 is alive
Before bcast 1 of 2 is alive
10
Before bcast 0 of 2 is alive

mpiexec.exe -map y:\IPAddressOf_hostA\temp hostname

XXXXXX (hostname of hostA)

 _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:13 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The command hostname (c:\windows\system32\hostname.exe)

Regards,
Jayesh

 _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:11 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

You have "hostname" at the end of the second line...what is that referring
to?

 _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:47 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

What is the error message (output) that you get when you run mpiexec?
Please provide us with the output of the following commands (make sure that
you specify the IP addresses of the hosts involved):

mpiexec.exe -map y:\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB y:\fpi.exe

mpiexec.exe -map y:\IPAddressOf_hostA\temp hostname

Regards,
Jayesh

 _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 1:25 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

No, this does not work...the behavior is the same. The UNCs should work
(and have worked) regardless of whether a user is logged in. We have never
relied on network drive mappings since they are intermittently an
"interactive" feature.

 _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 2:02 PM
To: 'Ayer, Timothy C.'
Cc: [email protected]
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

You should try,

mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB y:\fpi.exe

Let us know if it works for you.

(PS: The shared drive is accessible across machines because the drive is
accessible/mapped by the user logged on to the machines. SMPD runs as a
service logged on as "Local System" and does not, and should not, have
access to drives shared by users.)

Regards,
Jayesh

 _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 12:50 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

The exe can be directly accessed from hostB by executing
\\hostA\temp\fpi.exe; that is, you could type it directly into a command
prompt on hostB if you wanted. Note also that the \temp directory is a
shared location. I am not sure how this is physically set up on our
network, but it has worked without any "mapping" for MPICH (MPICH1).

Note: I did try: mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB
\\hostA\temp\fpi.exe but that still hangs in the MPI_Bcast call.

The interesting part is that it gets through the initialization:

     call MPI_INIT( ierr )
     call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
     call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )

All execute.

Thanks,
Tim

 _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 1:33 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

How (what mechanism) does hostB access data (exe) in hostA ?

Regards,
Jayesh

 _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 12:31 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Thanks Jayesh for the quick reply. This is a network-available UNC path;
why do I need to map a drive?

I am familiar with the machines file - I was just using the command line
for debugging.

 _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 10:56 AM
To: [email protected]
Cc: [email protected]
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Hi,
If you are running your executable from a shared network drive you need
to map (see the "--map" option of mpiexec in the Windows developer's guide)
the network drive with mpiexec when launching your job.
Also make sure that you have turned the Windows firewall (or any other
firewalls) off on the machines involved in the job.
Try specifying the IP addresses of the machines instead of the
hostnames.
Let us know the results.

(PS: Instead of the "-hosts" option you could try using the "-machinefile"
option available with mpiexec. See the Windows developer's guide for
details.)

Regards,
Jayesh
-----Original Message-----
From: [email protected] [mailto:owner-
[email protected]]
On Behalf Of mpich2
Sent: Monday, August 04, 2008 9:33 AM
To: undisclosed-recipients:
Subject: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

-----------------------------------------------------------+----------------
Reporter: "Ayer, Timothy C." [email protected] | Type: bug
Status: new | Priority: major
Component: mpich2 |
-----------------------------------------------------------+----------------

I am testing MPICH2 1.0.7 on Windows XP (SP2). I have installed it on 2
hosts (hostA, hostB) and am trying to run the fpi.exe built with
fmpich2.lib. The code is hanging in an MPI_Bcast call. The fpi.exe source
is attached.

The following tests work fine from hostA; both prompt for a number of
intervals, accept input, and produce an estimate of PI:

mpiexec.exe -hosts 2 hostA hostA \\hostA\temp\fpi.exe

mpiexec.exe -hosts 2 hostB hostB \\hostA\temp\fpi.exe

The following test hangs when submitted from hostA (in MPI_Bcast). It
does prompt for input (number of intervals) but once entered it hangs. I
have launched the smpd process using smpd -d but see no output from the
smpd after I enter an interval value:

mpiexec.exe -hosts 2 hostA hostB \\hostA\temp\fpi.exe

Any suggestions would be appreciated. Also let me know if you want me to
send debug output.

Thanks,
Tim

_____________________
Timothy C. Ayer
High Performance Technical Computing
United Technologies - Pratt & Whitney
[email protected]
(860) 565 - 5268 v
(860) 565 - 2668 f

 <<fpi.f>>

--
Ticket URL: https://trac.mcs.anl.gov/projects/mpich2/ticket/36


from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-13 14:50:00 -0500


Attachment added: part0001.5.html (30.8 KiB)
Added by email2trac

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-08-13 14:53:03 -0500


That's a bummer I thought for sure that must be it....oh well. I will
pursue the other two options.

Thanks,
Tim

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 3:50 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
Reporter: "Ayer, Timothy C." [email protected] | Owner: jayesh
Type: bug | Status: assigned
Priority: major | Component: mpich2
Resolution: | Keywords:
------------------------------------------------------------+---------------

Comment (by Jayesh Krishna):

Hi,
I spoke too soon. We have discontinued support for the sshm channel, and
that is why you have an old version of the sshm-related dlls in your
system32 directory.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:38 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
Reporter: "Ayer, Timothy C." [email protected] | Owner: jayesh
Type: bug | Status: assigned
Priority: major | Component: mpich2
Resolution: | Keywords:
------------------------------------------------------------+---------------

Comment (by Jayesh Krishna):

Hi,
Hmmm... This looks like the problem that I mentioned in my email.
The mpich2sshm*.dll files should have the same datestamp as the other dlls
(should not be from 2005!).
Please try the following,

# Uninstall MPICH2 on the hosts involved in your job.
# Manually delete the MPICH2 dlls from the windows\system32 directory
(Please be careful! Make sure that you delete only mpich2*.dll & mpe*.dll).
# Re-install MPICH2 1.0.7 (stable version) on the hosts/nodes.
# Re-compile cpi.c/fpi.f and try running your job.

Let us know the results.

Regards,
Jayesh

-----Original Message-----
From: [email protected]
[mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:11 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
Reporter: "Ayer, Timothy C." [email protected] | Owner: jayesh
Type: bug | Status: assigned
Priority: major | Component: mpich2
Resolution: | Keywords:
------------------------------------------------------------+---------------

Comment (by Ayer, Timothy C.):

Hi Jayesh,

Great to hear from you. I will try your suggestions (icpi.c and slow
response).

Also here is the output you requested. I have been wondering why the
dates on mpich2sshm.dll and mpich2sshmp.dll seem so old (from 2005)???
...I should have mentioned it sooner.

Thanks,
Tim

C:\WINDOWS\system32>dir c:\windows\system32\mpe*.dll
 Volume in drive C is System
 Volume Serial Number is D8B5-0657

 Directory of c:\windows\system32

04/04/2008  05:46 PM           135,168 mpe.dll
               1 File(s)        135,168 bytes
               0 Dir(s)   4,497,502,208 bytes free

C:\WINDOWS\system32>

C:\WINDOWS\system32>dir dir c:\windows\system32\mpich2*.dll
 Volume in drive C is System
 Volume Serial Number is D8B5-0657

 Directory of C:\WINDOWS\system32


 Directory of C:\WINDOWS\system32

04/04/2008  05:28 PM         1,110,016 mpich2.dll
04/04/2008  05:47 PM           151,552 mpich2mpe.dll
04/04/2008  05:23 PM           159,744 mpich2mpi.dll
04/04/2008  06:31 PM         1,159,168 mpich2mt.dll
04/04/2008  06:42 PM         1,351,680 mpich2mtp.dll
04/04/2008  05:43 PM         1,306,624 mpich2p.dll
04/04/2008  05:55 PM         1,093,632 mpich2shm.dll
04/04/2008  06:03 PM         1,294,336 mpich2shmp.dll
11/23/2005  02:33 AM         1,032,192 mpich2sshm.dll   <<<<<<<<<<<<<<<<
11/23/2005  02:36 AM         1,294,336 mpich2sshmp.dll  <<<<<<<<<<<<<<<<
04/04/2008  06:14 PM         1,122,304 mpich2ssm.dll
04/04/2008  06:22 PM         1,343,488 mpich2ssmp.dll
              12 File(s)     12,419,072 bytes
               0 Dir(s)   4,497,502,208 bytes free
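
The datestamp comparison on a listing like the one above could be scripted rather than eyeballed. The following is only a minimal sketch, assuming Python is available on the machine and that `dir` prints US-style MM/DD/YYYY dates as it does here; the function names `dlls_by_date` and `odd_ones_out` are hypothetical helpers, not part of MPICH2.

```python
import re
from collections import defaultdict

# Matches lines such as "04/04/2008  05:28 PM  1,110,016 mpich2.dll".
# Assumption: the MM/DD/YYYY and 12-hour clock format seen in this thread.
LINE_RE = re.compile(
    r"^\s*(\d{2}/\d{2}/\d{4})\s+\d{2}:\d{2}\s+[AP]M\s+[\d,]+\s+(\S+\.dll)",
    re.IGNORECASE,
)


def dlls_by_date(dir_output):
    """Group DLL names found in `dir` output by their date stamp."""
    groups = defaultdict(list)
    for line in dir_output.splitlines():
        match = LINE_RE.match(line)
        if match:
            date, name = match.groups()
            groups[date].append(name)
    return dict(groups)


def odd_ones_out(dir_output):
    """Return DLLs whose date differs from the most common date."""
    groups = dlls_by_date(dir_output)
    if not groups:
        return []
    common = max(groups, key=lambda d: len(groups[d]))
    return sorted(
        name for date, names in groups.items() if date != common for name in names
    )


sample = """\
04/04/2008  05:28 PM         1,110,016 mpich2.dll
04/04/2008  05:55 PM         1,093,632 mpich2shm.dll
11/23/2005  02:33 AM         1,032,192 mpich2sshm.dll   <<<<<<<<<<<<<<<<
11/23/2005  02:36 AM         1,294,336 mpich2sshmp.dll  <<<<<<<<<<<<<<<<
04/04/2008  06:14 PM         1,122,304 mpich2ssm.dll
"""
print(odd_ones_out(sample))
```

A differing date is only a hint to look closer; it does not by itself show that a DLL is the cause of a hang.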

-----Original Message-----
From: [email protected]
[mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:58 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+-------------
Reporter: "Ayer, Timothy C." [email protected] | Owner: jayesh
Type: bug | Status: assigned
Priority: major | Component: mpich2
Resolution: | Keywords:
------------------------------------------------------------+-------------

Comment (by Jayesh Krishna):

Hi,
  The logs sent by you show that the communication btw the process
managers on the hosts is good. The problem looks to be with the
communication btw the MPI processes.

# Can you try compiling icpi.c (MPICH2\examples) and running the program in
your setup (to make sure that the problem is not related to the Fortran
bindings)?
# I have seen that sometimes the uninstall/install of MPICH2 does
not result in the dlls being updated correctly (This has led to some
weird, difficult-to-debug hangs in our tests. This is not usual, but it
does not hurt to check for it). To make sure that you have the right
dlls, try listing the MPICH2 dlls in your windows system32 directory on
both the hosts,

>>> dir c:\windows\system32\mpich2*.dll
>>> dir c:\windows\system32\mpe*.dll

  Send us the results for verification (Sanity check: they should have
the same datestamp).

# Also, when running fpi.exe using your setup, try leaving the job running
for 10 mins or so (or specify a timeout of about 10 mins) and see if it
reports any errors. You might want to run netstat (or use "Process
Explorer" from Microsoft and check the TCP/IP tab in the
process->properties) to see what happens to the connections btw the MPI
processes from both hosts.

(PS: The MPICH2 1.1.0a1 release
(http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads)
is aimed at MPICH2 devs and not for production machines.)

Regards,
Jayesh


  _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Tuesday, August 05, 2008 9:20 AM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


Please find attached the output from the smpd -d procs. Also, the output
from the mpiexec just so you can see what I typed.

H:\>mpiexec.exe -map v:\\10.30.73.170\temp -hosts 2 10.30.73.170
10.30.73.34 v:\fpi.exe
 Process            0  of            2  is alive
Enter the number of intervals: (0 quits)
 Process            1  of            2  is alive
 Before bcast            1  of            2  is alive
10
 Before bcast            0  of            2  is alive
100



  _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 5:10 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


The socket/channel connection between the MPI processes takes place during
MPI_Bcast() (not before that in fpi.f).

  _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 4:00 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


The firewall has been disabled.

The inputs were from me entering values for estimating pi...I wanted to
make sure the program ran through all the logic.

I will send the other debug output a little later.

Also, as an FYI, we have been running MPICH on thousands of PCs for
years now. The other strange part is that over a year ago I did
successfully run MPICH2 on over 30 processors. My first thought was the
firewall as well.

  _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 4:46 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


# Do you have Windows firewall (or any firewall) running on these machines?
# Why do I see two inputs (10 & 100) in the mpiexec debug output ?
# Can you send us the debug output of smpd along with mpiexec ?
# Can you check the status of the remote smpd from each host ?
--- On host A, run "smpd -status IPAddressOf_hostB"
--- On host B, run "smpd -status IPAddressOf_hostA"
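
As a rough complement to the "smpd -status" checks above, plain TCP reachability of the remote smpd can be probed from each host. This is only a sketch under assumptions: Python is available on the hosts, and 8676 is taken as the smpd listener port (commonly cited as the default, but it should be verified against the local installation). `can_connect` is a hypothetical helper, not an MPICH2 tool.

```python
import socket

# Assumption: 8676 is the smpd listener port on this installation.
SMPD_PORT = 8676


def can_connect(host, port=SMPD_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the IP address of the other host.
    print(can_connect("127.0.0.1"))
```

A successful connect only shows that the process manager port is reachable; it says nothing about the connections opened later between the MPI processes themselves.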

(PS: I just tried running fpi.exe in a shared drive across two 32-bit
windows XP machines in our lab but did not get any errors/hang)

Regards,
Jayesh

  _____

From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:11 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


This is the same fpi.f which comes with the installation with the
exception that I have added print statements.

The setup is homogenous (both 32-bit).  The output is attached.

Thanks for your help.

Tim

  _____

From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:48 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


# Are you running the fpi.exe (fpi.f) provided with MPICH2 (have you modified
the program)?
# I am assuming that the setup is not heterogeneous (MPICH2 currently does
not support running jobs across machines with different data models, e.g.
you cannot run your MPI job across 32-bit and 64-bit machines).
# Please provide us with the debug/verbose output when running fpi.exe.
Start smpd on both machines in debug mode (1. Stop any instances of
smpd running on the system with "smpd -stop". 2. Start smpd in debug mode
with "smpd -d") and run mpiexec in verbose mode (mpiexec.exe -verbose -map
y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB
y:\fpi.exe)
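
The shape of the verbose command line requested above can be illustrated with a small helper. This is a sketch only: it uses just the options quoted in this thread (-verbose, -map, -hosts), and `build_mpiexec_command` and its parameters are hypothetical names introduced for illustration.

```python
def build_mpiexec_command(drive, share_host, share_name, hosts, exe_name, verbose=False):
    """Assemble an mpiexec argument list from the options quoted in this thread."""
    # The mapping has the form drive:\\host\share, e.g. y:\\hostA\temp
    mapping = f"{drive}:\\\\{share_host}\\{share_name}"
    args = ["mpiexec.exe"]
    if verbose:
        args.append("-verbose")
    # -hosts takes the host count followed by the host names or addresses.
    args += ["-map", mapping, "-hosts", str(len(hosts)), *hosts]
    # The executable is then addressed through the mapped drive letter.
    args.append(f"{drive}:\\{exe_name}")
    return args


cmd = build_mpiexec_command(
    "y",
    "IPAddressOf_hostA",
    "temp",
    ["IPAddressOf_hostA", "IPAddressOf_hostB"],
    "fpi.exe",
    verbose=True,
)
print(" ".join(cmd))
```

Building the list from parts keeps the host count passed to -hosts in step with the number of hosts actually listed.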

Regards,
Jayesh


from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-13 15:01:56 -0500


Attachment added: part0001.6.html (39.5 KiB)
Added by email2trac

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-08-13 15:01:56 -0500


Hi,
I just cross-verified the timestamps of the dlls and they look alright.
Make sure that you have the date/timestamps right on all the hosts
involved.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:53 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+-----------
------------------------------------------------------------+----
Reporter: "Ayer, Timothy C." [email protected] | Owner:
jayesh
Type: bug | Status:
assigned
Priority: major | Component:
mpich2
Resolution: | Keywords:

------------------------------------------------------------+-----------
------------------------------------------------------------+----

Comment (by Ayer, Timothy C.):

That's a bummer I thought for sure that must be it....oh well. I will
pursue the other two options.

Thanks,
Tim

-----Original Message-----
From: [email protected]
[mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 3:50 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+-------------

Reporter: "Ayer, Timothy C." [email protected] |
Owner:
jayesh
Type: bug |
Status:
assigned
Priority: major |
Component:
mpich2
Resolution: |
Keywords:

------------------------------------------------------------+-------------

Comment (by Jayesh Krishna):

Hi,
I spoke too soon. We have discontinued supporting sshm channel and
that
is the reason that you have an old version of sshm related dlls in your
system32 directory.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:owner-
[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:38 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+-----------
------------------------------------------------------------+----
Reporter: "Ayer, Timothy C." [email protected] |
Owner:
jayesh
Type: bug |
Status:
assigned
Priority: major |
Component:
mpich2
Resolution: |
Keywords:

------------------------------------------------------------+-----------
------------------------------------------------------------+----

Comment (by Jayesh Krishna):

Hi,
Hmmm... This looks like the problem that I mentioned in my email.
-sshm*.dll s should have the same datestamp as other dlls (should not
be
from 2005!).
Please try the following,

Uninstall MPICH2 on the hosts involved in your job.

Manually delete the MPICH2 dlls from windows\system32 directory

(Please
be careful! Make sure that you delete only mpich2_.dll & mpe_.dll) #
Re-install MPICH2 1.0.7 (stable version) on the hosts/nodes .

Re-compile cpi.c/fpi.c and try running your job.

 Let us know the results.

Regards,
Jayesh

-----Original Message-----
From: [email protected]
[mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:11 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+-----------
------------------------------------------------------------+----
Reporter: "Ayer, Timothy C." [email protected] |
Owner:
jayesh
Type: bug |
Status:
assigned
Priority: major |
Component:
mpich2
Resolution: |
Keywords:

------------------------------------------------------------+-----------
------------------------------------------------------------+----

Comment (by Ayer, Timothy C.):

Hi Jayesh,

Great to hear from you.  I will try your suggestions (icpi.c and slow

response).

Also here is the output you requested.  I have been wondering why the

dates on mpich2sshm.dll and mpich2sshmp.dll seem so old (from 2005)???
...I should have mentioned it sooner.

Thanks,
Tim


C:\WINDOWS\system32>dir c:\windows\system32\mpe*.dll
 Volume in drive C is System
 Volume Serial Number is D8B5-0657

 Directory of c:\windows\system32

04/04/2008  05:46 PM           135,168 mpe.dll
               1 File(s)        135,168 bytes
               0 Dir(s)   4,497,502,208 bytes free

C:\WINDOWS\system32>


C:\WINDOWS\system32>dir dir c:\windows\system32\mpich2*.dll
 Volume in drive C is System
 Volume Serial Number is D8B5-0657

 Directory of C:\WINDOWS\system32


 Directory of C:\WINDOWS\system32

04/04/2008  05:28 PM         1,110,016 mpich2.dll
04/04/2008  05:47 PM           151,552 mpich2mpe.dll
04/04/2008  05:23 PM           159,744 mpich2mpi.dll
04/04/2008  06:31 PM         1,159,168 mpich2mt.dll
04/04/2008  06:42 PM         1,351,680 mpich2mtp.dll
04/04/2008  05:43 PM         1,306,624 mpich2p.dll
04/04/2008  05:55 PM         1,093,632 mpich2shm.dll
04/04/2008  06:03 PM         1,294,336 mpich2shmp.dll
11/23/2005  02:33 AM         1,032,192 mpich2sshm.dll

<<<<<<<<<<<<<<<<
11/23/2005 02:36 AM 1,294,336 mpich2sshmp.dll
<<<<<<<<<<<<<<<<
04/04/2008 06:14 PM 1,122,304 mpich2ssm.dll
04/04/2008 06:22 PM 1,343,488 mpich2ssmp.dll
12 File(s) 12,419,072 bytes
0 Dir(s) 4,497,502,208 bytes free

-----Original Message-----
From: [email protected]

[mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:58 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+-------------

  Reporter:  "Ayer, Timothy  C." [email protected]  |

Owner:
jayesh
Type: bug |
Status:
assigned
Priority: major |
Component:
mpich2
Resolution: |
Keywords:

------------------------------------------------------------+-------------

Comment (by Jayesh Krishna):

 Hi,
   The logs sent by you show that the communication btw the process
 managers on the hosts is good. The problem looks to be with the
 communication btw the MPI processes.

 # Can you try compiling icpi.c (MPICH2\examples) and run the program

in
your setup (Make sure that the problem is not related to fortran
bindings).
# I have seen that some times that the uninstall/install of MPICH2
does
not result in the dlls being updated correctly (This has lead to some
wierd-difficult-to-debug hangs in our tests. This is not usual but it
does
not hurt to check for it though). To make sure that you have the
right
dlls try listing the MPICH2 dlls in your windows system32 directory
on
both the hosts,

 >>> dir c:\windows\system32\mpich2*.dll
 >>> dir c:\windows\system32\mpe*.dll

   Send us the results for verification (Sanity check- they should

have
the
same datestamp)

 # Also when running fpi.exe using your setup try leaving the job (or

may
be specify a timeout of 10 mins or so) for 10mins or so and see if it
reports any errors. You might want to run netstat (or use "Process
explorer" from microsoft and check the TCP/IP tab in the
process->properties) to see what happens to the connections btw the
MPI
processes from both hosts.

 (PS: The MPICH2 1.1.0a1 release

(http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=dow
nloads) is aimed at MPICH2 devs and not for production machines. )

 Regards,
 Jayesh


   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Tuesday, August 05, 2008 9:20 AM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 Please find attached the output from the smpd -d procs.  Also, the

output
from the mpiexec just so you can see what I typed.

 H:\>mpiexec.exe -map v:\\10.30.73.170\temp -hosts 2 10.30.73.170
 10.30.73.34 v:\fpi.exe
  Process            0  of            2  is alive
 Enter the number of intervals: (0 quits)
  Process            1  of            2  is alive
  Before bcast            1  of            2  is alive
 10
  Before bcast            0  of            2  is alive
 100



   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 5:10 PM
 To: 'Ayer, Timothy C.'
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 The socket/channel connection between the MPI processes take place

during
MPI_Bcast() (not before that in fpi.f).

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 4:00 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 The firewall has been disabled.

 The inputs were from me entering values for estimating pi... I wanted to
 make sure the program ran through all the logic.

 I will send the other debug output a little later.

 Also, as an FYI, we have been running MPICH on thousands of PCs for
 years now. The other strange part is that over a year ago I did
 successfully run MPICH2 on over 30 processors. My first thought was the
 firewall as well.

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 4:46 PM
 To: 'Ayer, Timothy C.'
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 # Do you have Windows firewall (or any firewall) running on these machines?
 # Why do I see two inputs (10 & 100) in the mpiexec debug output?
 # Can you send us the debug output of smpd along with mpiexec?
 # Can you check the status of the remote smpd from each host?
 --- On host A, run "smpd -status IPAddressOf_hostB"
 --- On host B, run "smpd -status IPAddressOf_hostA"

 (PS: I just tried running fpi.exe in a shared drive across two 32-bit
 windows XP machines in our lab but did not get any errors/hang)

 Regards,
 Jayesh

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 3:11 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 This is the same fpi.f which comes with the installation with the
 exception that I have added print statements.

 The setup is homogeneous (both 32-bit).  The output is attached.

 Thanks for your help.

 Tim

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 3:48 PM
 To: 'Ayer, Timothy C.'
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 # Are you running the fpi.exe (fpi.f) provided with MPICH2, or have you
 modified the program?
 # I am assuming that the setup is not heterogeneous (MPICH2 currently does
 not support running jobs across machines with different data models, e.g.
 you cannot run your MPI job across 32-bit and 64-bit machines).
 # Please provide us with the debug/verbose output when running fpi.exe.
 Start smpd on both machines in debug mode (1. Stop any instances of
 smpd running on the system: smpd -stop. 2. Start smpd in debug mode:
 smpd -d) and run mpiexec in verbose mode (mpiexec.exe -verbose -map
 y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB
 y:\fpi.exe)
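
 The debug procedure above can be scripted. A minimal sketch in Python,
 assuming Python is available on the launching host; the helper names
 `smpd_debug_commands` and `mpiexec_verbose_command` are hypothetical, and
 only the options quoted in this thread (smpd -stop, smpd -d, mpiexec
 -verbose, -map, -hosts) are used. The functions only build argument
 lists; nothing is executed.

```python
def smpd_debug_commands():
    """Commands to restart smpd in debug mode, in the order given above."""
    return [
        ["smpd", "-stop"],  # 1. stop any running smpd instance
        ["smpd", "-d"],     # 2. start smpd in debug mode
    ]


def mpiexec_verbose_command(drive, share_host, share, hosts, exe):
    """Build the verbose mpiexec argument list for a mapped network share.

    drive      -- drive letter to map, e.g. "y"
    share_host -- IP address of the machine exporting the share
    share      -- share name, e.g. "temp"
    hosts      -- IP addresses of the hosts to run on
    exe        -- executable name on the mapped drive, e.g. "fpi.exe"
    """
    # Produces the form used in this thread: y:\\<host>\<share>
    mapping = "{0}:\\\\{1}\\{2}".format(drive, share_host, share)
    argv = ["mpiexec.exe", "-verbose", "-map", mapping, "-hosts", str(len(hosts))]
    argv.extend(hosts)
    argv.append("{0}:\\{1}".format(drive, exe))
    return argv
```

 Joining the returned list with spaces gives a command line of the same
 shape as the one quoted above, with the host count derived from the list
 of addresses so the two cannot drift apart.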

 Regards,
 Jayesh

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 2:21 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 Thanks, here is the output (note: I have not included IP addresses or
 actual hostnames in this email but did use them in testing)

 # mpiexec.exe -map y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB y:\fpi.exe

 OUTPUT:
  Process            0  of            2  is alive
 Enter the number of intervals: (0 quits)
  Process            1  of            2  is alive
  Before bcast            1  of            2  is alive
 10
  Before bcast            0  of            2  is alive

 # mpiexec.exe -map y:\\IPAddressOf_hostA\temp hostname

 XXXXXX (hostname of hostA)



   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 3:13 PM
 To: 'Ayer, Timothy C.'
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 The command hostname (c:\windows\system32\hostname.exe)

 Regards,
 Jayesh

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 2:11 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 You have "hostname" at the end of the second line... what is that
 referring to?

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 2:47 PM
 To: 'Ayer, Timothy C.'
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


  What is the error message (output) that you get when you run mpiexec?
 Please provide us with the output of the following commands (make sure that
 you specify the IP addresses of the hosts involved):

 # mpiexec.exe -map y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB y:\fpi.exe
 # mpiexec.exe -map y:\\IPAddressOf_hostA\temp hostname

 Regards,
 Jayesh


   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 1:25 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 No, this does not work... the behavior is the same. The UNC paths
 should work, and have worked, regardless of whether a user is logged in.
 We have never relied on network drive mappings since they are
 intermittently an "interactive" feature.

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 2:02 PM
 To: 'Ayer, Timothy C.'
 Cc: [email protected]
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 You should try,

 mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB y:\fpi.exe

  Let us know if it works for you.

 (PS: The shared drive is accessible across machines because the drive is
 accessible/mapped by the user logged on to the machines. SMPD runs as a
 service logged on as "Local System" and does not, and should not, have
 access to drives shared by users.)

 Regards,
 Jayesh

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 12:50 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 The exe can be directly accessed from hostB by executing
 \\hostA\temp\fpi.exe; that is, you could type it directly into a command
 prompt on hostB if you wanted. Note also that the \temp directory is a
 shared location. I am not sure how this is physically set up on our
 network, but this has worked without any "mapping" for MPICH (MPICH1).

 Note: I did try: mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB
 \\hostA\temp\fpi.exe but that still hangs in the MPI_Bcast call.

 The interesting part is that it gets through the initialization:

       call MPI_INIT( ierr )
       call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
       call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )


 All execute.

 Thanks,
 Tim

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 1:33 PM
 To: 'Ayer, Timothy C.'
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 How (what mechanism) does hostB access data (exe) in hostA ?

 Regards,
 Jayesh

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 12:31 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 Thanks Jayesh for the quick reply. This is a network-available UNC path -
 why do I need to map a drive?

 I am familiar with the machines file - I was just using the command line
 for debugging.

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 10:56 AM
 To: [email protected]
 Cc: [email protected]
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP



  Hi,
   If you are running your executable from a shared network drive you need
 to map (see "--map" option of mpiexec in the Windows developer's guide)
 the network drive with mpiexec when launching your job.
 Also make sure that you have turned the Windows firewall (or any other
 firewalls) off on the machines involved in the job.
 Try specifying the IP addresses of the machines instead of the hostnames.
 Let us know the results.

 (PS: Instead of the "-hosts" option you could try using the "-machinefile"
 option available with mpiexec. See the Windows developer's guide for
 details.)

 Regards,
 Jayesh
 -----Original Message-----
 From: [email protected] [mailto:[email protected]]
 On Behalf Of mpich2
 Sent: Monday, August 04, 2008 9:33 AM
 To: undisclosed-recipients:
 Subject: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 -----------------------------------------------------------+----------------
  Reporter:   "Ayer, Timothy C." [email protected]          |  Type:      bug
  Status:     new                                           |  Priority:  major
  Component:  mpich2                                        |
 -----------------------------------------------------------+----------------

  I am testing MPICH2 1.0.7 on Windows XP (SP2). I have installed it on 2
 hosts (hostA, hostB) and am trying to run the fpi.exe built with fmpich2.lib.
 The code is hanging in an MPI_Bcast call. The fpi.exe source is attached.

  The following tests work fine from hostA; both prompt for a number of
 intervals, accept input, and produce an estimate of PI:

  mpiexec.exe -hosts 2 hostA hostA \\hostA\temp\fpi.exe

  mpiexec.exe -hosts 2 hostB hostB \\hostA\temp\fpi.exe

  The following test hangs when submitted from hostA (in MPI_Bcast). It
 does prompt for input (number of intervals) but once a value is entered it
 hangs. I have launched the smpd process using smpd -d but see no output
 from the smpd after I enter an interval value:

  mpiexec.exe -hosts 2 hostA hostB \\hostA\temp\fpi.exe

  Any suggestions would be appreciated. Also let me know if you want me to
 send debug output.

  Thanks,
  Tim

  _____________________
  Timothy C. Ayer
  High Performance Technical Computing
  United Technologies - Pratt & Whitney
  [email protected]
  (860) 565 - 5268 v
  (860) 565 - 2668 f

   <<fpi.f>>


 --
 Ticket URL: <https://trac.mcs.anl.gov/projects/mpich2/ticket/36>


from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-08-13 15:08:53 -0500


Will do.


From: Jayesh Krishna [mailto:[email protected]]
Sent: Wednesday, August 13, 2008 4:02 PM
To: [email protected]
Cc: [email protected]
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Hi,
I just cross-verified the timestamps of the dlls and they look alright.
Make sure that you have the date/timestamps right on all the hosts involved.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:53 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+----------------
 Reporter:    "Ayer, Timothy C." [email protected]          |  Owner:      jayesh
 Type:        bug                                           |  Status:     assigned
 Priority:    major                                         |  Component:  mpich2
 Resolution:                                                |  Keywords:
------------------------------------------------------------+----------------

Comment (by Ayer, Timothy C.):

That's a bummer; I thought for sure that must be it... oh well. I will
pursue the other two options.

Thanks,
Tim

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 3:50 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+----------------
 Reporter:    "Ayer, Timothy C." [email protected]          |  Owner:      jayesh
 Type:        bug                                           |  Status:     assigned
 Priority:    major                                         |  Component:  mpich2
 Resolution:                                                |  Keywords:
------------------------------------------------------------+----------------

Comment (by Jayesh Krishna):

Hi,
I spoke too soon. We have discontinued support for the sshm channel, which
is why you have an old version of the sshm-related dlls in your
system32 directory.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:38 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+----------------
 Reporter:    "Ayer, Timothy C." [email protected]          |  Owner:      jayesh
 Type:        bug                                           |  Status:     assigned
 Priority:    major                                         |  Component:  mpich2
 Resolution:                                                |  Keywords:
------------------------------------------------------------+----------------

Comment (by Jayesh Krishna):

Hi,
Hmmm... This looks like the problem that I mentioned in my email.
The *sshm*.dll files should have the same datestamp as the other dlls
(should not be from 2005!).
Please try the following:

# Uninstall MPICH2 on the hosts involved in your job.
# Manually delete the MPICH2 dlls from the windows\system32 directory
(Please be careful! Make sure that you delete only mpich2*.dll & mpe*.dll).
# Re-install MPICH2 1.0.7 (stable version) on the hosts/nodes.
# Re-compile cpi.c/fpi.f and try running your job.

 Let us know the results.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:11 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+----------------
 Reporter:    "Ayer, Timothy C." [email protected]          |  Owner:      jayesh
 Type:        bug                                           |  Status:     assigned
 Priority:    major                                         |  Component:  mpich2
 Resolution:                                                |  Keywords:
------------------------------------------------------------+----------------

Comment (by Ayer, Timothy C.):

Hi Jayesh,

Great to hear from you. I will try your suggestions (icpi.c and slow
response).

Also here is the output you requested. I have been wondering why the
dates on mpich2sshm.dll and mpich2sshmp.dll seem so old (from 2005)???
...I should have mentioned it sooner.

Thanks,
Tim


C:\WINDOWS\system32>dir c:\windows\system32\mpe*.dll
 Volume in drive C is System
 Volume Serial Number is D8B5-0657

 Directory of c:\windows\system32

04/04/2008  05:46 PM           135,168 mpe.dll
               1 File(s)        135,168 bytes
               0 Dir(s)   4,497,502,208 bytes free

C:\WINDOWS\system32>


C:\WINDOWS\system32>dir dir c:\windows\system32\mpich2*.dll
 Volume in drive C is System
 Volume Serial Number is D8B5-0657

 Directory of C:\WINDOWS\system32


 Directory of C:\WINDOWS\system32

04/04/2008  05:28 PM         1,110,016 mpich2.dll
04/04/2008  05:47 PM           151,552 mpich2mpe.dll
04/04/2008  05:23 PM           159,744 mpich2mpi.dll
04/04/2008  06:31 PM         1,159,168 mpich2mt.dll
04/04/2008  06:42 PM         1,351,680 mpich2mtp.dll
04/04/2008  05:43 PM         1,306,624 mpich2p.dll
04/04/2008  05:55 PM         1,093,632 mpich2shm.dll
04/04/2008  06:03 PM         1,294,336 mpich2shmp.dll
11/23/2005  02:33 AM         1,032,192 mpich2sshm.dll  <<<<<<<<<<<<<<<<
11/23/2005  02:36 AM         1,294,336 mpich2sshmp.dll <<<<<<<<<<<<<<<<
04/04/2008  06:14 PM         1,122,304 mpich2ssm.dll
04/04/2008  06:22 PM         1,343,488 mpich2ssmp.dll
              12 File(s)     12,419,072 bytes
               0 Dir(s)   4,497,502,208 bytes free
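
The same datestamp comparison can be done mechanically. A minimal sketch in
Python, assuming the `dir` output has been saved as text in the US date
format shown above; the names `parse_dir_listing` and `odd_datestamps` are
hypothetical. It collects the date of each DLL and reports any file whose
date differs from the most common one.

```python
import re
from collections import Counter

# Matches lines such as "04/04/2008  05:28 PM         1,110,016 mpich2.dll"
DIR_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{4})\s+\d{2}:\d{2}\s+[AP]M\s+[\d,]+\s+(\S+\.dll)",
    re.IGNORECASE,
)


def parse_dir_listing(text):
    """Return a {filename: date} dict for every DLL line in `dir` output."""
    dates = {}
    for line in text.splitlines():
        match = DIR_LINE.match(line.strip())
        if match:
            date, name = match.groups()
            dates[name] = date
    return dates


def odd_datestamps(dates):
    """Return the files whose date differs from the most common date."""
    if not dates:
        return {}
    common_date, _ = Counter(dates.values()).most_common(1)[0]
    return {name: date for name, date in dates.items() if date != common_date}
```

A differing date is only a hint that a file was left behind by an earlier
install; whether that matters depends on whether the file is still part of
the release.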

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:58 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+----------------
 Reporter:    "Ayer, Timothy C." [email protected]          |  Owner:      jayesh
 Type:        bug                                           |  Status:     assigned
 Priority:    major                                         |  Component:  mpich2
 Resolution:                                                |  Keywords:
------------------------------------------------------------+----------------

Comment (by Jayesh Krishna):

 Hi,
   The logs you sent show that the communication between the process
 managers on the hosts is good. The problem looks to be with the
 communication between the MPI processes.

 # Can you try compiling icpi.c (MPICH2\examples) and running the program in
 your setup (to make sure that the problem is not related to the Fortran
 bindings)?
 # I have seen that sometimes the uninstall/install of MPICH2 does
 not result in the dlls being updated correctly (this has led to some
 weird, difficult-to-debug hangs in our tests; it is not usual, but it does
 not hurt to check for it). To make sure that you have the right
 dlls, try listing the MPICH2 dlls in your Windows system32 directory on
 both hosts:

 >>> dir c:\windows\system32\mpich2*.dll
 >>> dir c:\windows\system32\mpe*.dll

   Send us the results for verification (sanity check: they should have the
 same datestamp).

 # Also, when running fpi.exe in your setup, try leaving the job running for
 10 minutes or so (or maybe specify a timeout of 10 minutes or so) and see
 if it reports any errors. You might want to run netstat (or use "Process
 Explorer" from Microsoft and check the TCP/IP tab under
 process->properties) to see what happens to the connections between the
 MPI processes on both hosts.
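
 One way to bound the wait without watching the console is to wrap the
 launch in a timeout. A minimal sketch in Python, assuming a hypothetical
 helper `run_with_timeout`; it uses the standard `subprocess` module rather
 than any mpiexec option. Standard input is left attached to the console,
 so the interval prompt can still be answered by hand.

```python
import subprocess


def run_with_timeout(argv, seconds):
    """Run a command, returning (exit_code, output) or None if it times out."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold error messages into the same text
            timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return None
    return result.returncode, result.stdout.decode(errors="replace")
```

 A call such as `run_with_timeout(argv, 600)` with the mpiexec argument
 list would give up after ten minutes. Note that this only stops the local
 launcher process; whether the remote processes are cleaned up as well is
 not something this sketch can guarantee.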

 (PS: The MPICH2 1.1.0a1 release
 (http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads)
 is aimed at MPICH2 developers and not at production machines.)

 Regards,
 Jayesh


from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-08-13 15:08:54 -0500


Attachment added: part0001.7.html (39.7 KiB)
Added by email2trac

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-09-11 12:08:42 -0500


Jayesh,

I apologize for the delay. I hope to get back to this soon but other items
have taken higher priority.

Thanks,
Tim


From: Ayer, Timothy C.
Sent: Wednesday, August 13, 2008 4:10 PM
To: Jayesh Krishna; Ayer, Timothy C.
Cc: [email protected]
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Will do.


From: Jayesh Krishna [mailto:[email protected]]
Sent: Wednesday, August 13, 2008 4:02 PM
To: [email protected]
Cc: [email protected]
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

Hi,
I just cross-verified the timestamps of the dlls and they look alright.
Make sure that you have the date/timestamps right on all the hosts involved.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:[email protected]
mailto:[email protected] ] On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:53 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+-----------
------------------------------------------------------------+----
Reporter: "Ayer, Timothy C." [email protected] | Owner:
jayesh
Type: bug | Status:
assigned
Priority: major | Component:
mpich2
Resolution: | Keywords:

------------------------------------------------------------+-----------
------------------------------------------------------------+----

Comment (by Ayer, Timothy C.):

That's a bummer I thought for sure that must be it....oh well. I will
pursue the other two options.

Thanks,
Tim

-----Original Message-----
From: [email protected] [mailto:[email protected]
mailto:[email protected] ]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 3:50 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
Reporter: "Ayer, Timothy C." [email protected] | Owner:
jayesh
Type: bug | Status:
assigned
Priority: major | Component:
mpich2
Resolution: | Keywords:

------------------------------------------------------------+---------------

Comment (by Jayesh Krishna):

Hi,
I spoke too soon. We have discontinued supporting sshm channel and that
is the reason that you have an old version of sshm related dlls in your
system32 directory.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:owner- mailto:owner-
[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:38 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
  Reporter:  "Ayer, Timothy C." [email protected]           |       Owner:  jayesh
      Type:  bug                                            |      Status:  assigned
  Priority:  major                                          |   Component:  mpich2
Resolution:                                                 |    Keywords:
------------------------------------------------------------+---------------

Comment (by Jayesh Krishna):

Hi,
Hmmm... This looks like the problem that I mentioned in my email.
The mpich2sshm*.dll files should have the same datestamp as the other dlls
(they should not be from 2005!).
Please try the following:

# Uninstall MPICH2 on the hosts involved in your job.
# Manually delete the MPICH2 dlls from the windows\system32 directory
(Please be careful! Make sure that you delete only mpich2*.dll & mpe*.dll).
# Re-install MPICH2 1.0.7 (stable version) on the hosts/nodes.
# Re-compile cpi.c/fpi.f and try running your job.

Let us know the results.

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:11 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
  Reporter:  "Ayer, Timothy C." [email protected]           |       Owner:  jayesh
      Type:  bug                                            |      Status:  assigned
  Priority:  major                                          |   Component:  mpich2
Resolution:                                                 |    Keywords:
------------------------------------------------------------+---------------

Comment (by Ayer, Timothy C.):

Hi Jayesh,

Great to hear from you.  I will try your suggestions (icpi.c and slow
response).

Also here is the output you requested.  I have been wondering why the
dates on mpich2sshm.dll and mpich2sshmp.dll seem so old (from 2005)???
...I should have mentioned it sooner.

Thanks,
Tim


C:\WINDOWS\system32>dir c:\windows\system32\mpe*.dll
 Volume in drive C is System
 Volume Serial Number is D8B5-0657

 Directory of c:\windows\system32

04/04/2008  05:46 PM           135,168 mpe.dll
               1 File(s)        135,168 bytes
               0 Dir(s)   4,497,502,208 bytes free

C:\WINDOWS\system32>


C:\WINDOWS\system32>dir dir c:\windows\system32\mpich2*.dll
 Volume in drive C is System
 Volume Serial Number is D8B5-0657

 Directory of C:\WINDOWS\system32


 Directory of C:\WINDOWS\system32

04/04/2008  05:28 PM         1,110,016 mpich2.dll
04/04/2008  05:47 PM           151,552 mpich2mpe.dll
04/04/2008  05:23 PM           159,744 mpich2mpi.dll
04/04/2008  06:31 PM         1,159,168 mpich2mt.dll
04/04/2008  06:42 PM         1,351,680 mpich2mtp.dll
04/04/2008  05:43 PM         1,306,624 mpich2p.dll
04/04/2008  05:55 PM         1,093,632 mpich2shm.dll
04/04/2008  06:03 PM         1,294,336 mpich2shmp.dll
11/23/2005  02:33 AM         1,032,192 mpich2sshm.dll  <<<<<<<<<<<<<<<<
11/23/2005  02:36 AM         1,294,336 mpich2sshmp.dll <<<<<<<<<<<<<<<<
04/04/2008  06:14 PM         1,122,304 mpich2ssm.dll
04/04/2008  06:22 PM         1,343,488 mpich2ssmp.dll
              12 File(s)     12,419,072 bytes
               0 Dir(s)   4,497,502,208 bytes free

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Wednesday, August 13, 2008 2:58 PM
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
  Reporter:  "Ayer, Timothy C." [email protected]           |       Owner:  jayesh
      Type:  bug                                            |      Status:  assigned
  Priority:  major                                          |   Component:  mpich2
Resolution:                                                 |    Keywords:
------------------------------------------------------------+---------------

Comment (by Jayesh Krishna):

 Hi,
   The logs you sent show that the communication between the process
 managers on the hosts is good. The problem looks to be with the
 communication between the MPI processes.

 # Can you try compiling icpi.c (MPICH2\examples) and running the program
 in your setup (to make sure that the problem is not related to the Fortran
 bindings)?
 # I have seen that sometimes the uninstall/install of MPICH2 does not
 result in the dlls being updated correctly. (This has led to some weird,
 difficult-to-debug hangs in our tests. This is not usual, but it does not
 hurt to check for it.) To make sure that you have the right dlls, try
 listing the MPICH2 dlls in the windows system32 directory on both hosts:

 >>> dir c:\windows\system32\mpich2*.dll
 >>> dir c:\windows\system32\mpe*.dll

   Send us the results for verification (sanity check: they should all have
 the same datestamp).
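
 The datestamp sanity check described above can also be done mechanically.
 The following is only a minimal sketch, not an MPICH2 tool: the function
 name stale_dlls is made up for illustration, and it assumes `dir` output in
 the US date format shown elsewhere in this thread. It reports any dll whose
 date differs from the most common date in the listing.

```python
import re
from collections import Counter

# Matches one file row of Windows `dir` output, e.g.
# "04/04/2008  05:28 PM         1,110,016 mpich2.dll"
# (assumption: US-style MM/DD/YYYY dates and a 12-hour clock).
ROW = re.compile(
    r"^\s*(\d{2}/\d{2}/\d{4})\s+\d{2}:\d{2}\s+[AP]M\s+[\d,]+\s+(\S+\.dll)",
    re.IGNORECASE,
)

def stale_dlls(dir_output):
    """Return {filename: date} for dlls whose date differs from the most common date."""
    rows = [m.groups() for m in map(ROW.match, dir_output.splitlines()) if m]
    if not rows:
        return {}
    common_date, _ = Counter(date for date, _ in rows).most_common(1)[0]
    return {name: date for date, name in rows if date != common_date}

# A few rows in the same layout as the listings posted in this ticket.
sample = """\
04/04/2008  05:28 PM         1,110,016 mpich2.dll
04/04/2008  05:55 PM         1,093,632 mpich2shm.dll
11/23/2005  02:33 AM         1,032,192 mpich2sshm.dll
11/23/2005  02:36 AM         1,294,336 mpich2sshmp.dll
04/04/2008  06:14 PM         1,122,304 mpich2ssm.dll
"""

print(stale_dlls(sample))
```

 A differing date is only a hint that something may be out of sync; as noted
 elsewhere in this ticket, the 2005 sshm dlls were left over from a
 discontinued channel rather than a failed install.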

 # Also, when running fpi.exe in your setup, try leaving the job running for
 10 minutes or so (or specify a timeout of about 10 minutes) and see if it
 reports any errors. You might want to run netstat (or use "Process
 Explorer" from Microsoft and check the TCP/IP tab in the
 process->properties) to see what happens to the connections between the
 MPI processes on both hosts.
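
 Alongside netstat, a plain TCP connection attempt from one host to the other
 can show whether a given port is reachable at all. This is a hedged sketch
 and not part of MPICH2: can_connect is a hypothetical helper, and the host
 and port to probe are whatever netstat reports for the MPI processes on the
 other machine; they are not known in advance.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and applies the timeout to the connect call.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

 For example, can_connect("10.30.73.34", port) run on the other host, with
 port taken from the netstat listing, returns False if the attempt is
 refused or times out, which would point at the network path or a firewall
 rather than at the MPI program itself.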

 (PS: The MPICH2 1.1.0a1 release
 (http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads)
 is aimed at MPICH2 devs and not for production machines.)

 Regards,
 Jayesh


   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Tuesday, August 05, 2008 9:20 AM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 Please find attached the output from the smpd -d procs.  Also, the output
 from the mpiexec, just so you can see what I typed.

 H:\>mpiexec.exe -map v:\\10.30.73.170\temp -hosts 2 10.30.73.170
 10.30.73.34 v:\fpi.exe
  Process            0  of            2  is alive
 Enter the number of intervals: (0 quits)
  Process            1  of            2  is alive
  Before bcast            1  of            2  is alive
 10
  Before bcast            0  of            2  is alive
 100



   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 5:10 PM
 To: 'Ayer, Timothy C.'
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 The socket/channel connection between the MPI processes takes place during
 MPI_Bcast() (not before that in fpi.f).

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 4:00 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 The firewall has been disabled.

 The inputs were from me entering values for estimating pi... I wanted to
 make sure the program ran through all the logic.

 I will send the other debug output a little later.

 Also, as an FYI, we have been running MPICH on thousands of PCs for
 years now.  The other strange part is that over a year ago I did
 successfully run MPICH2 on over 30 processors.  My first thought was the
 firewall as well.

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 4:46 PM
 To: 'Ayer, Timothy C.'
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 # Do you have windows firewall (or any firewall) running on these
 machines?
 # Why do I see two inputs (10 & 100) in the mpiexec debug output?
 # Can you send us the debug output of smpd along with mpiexec?
 # Can you check the status of the remote smpd from each host?
    --- On host A, run "smpd -status IPAddressOf_hostB"
    --- On host B, run "smpd -status IPAddressOf_hostA"

 (PS: I just tried running fpi.exe in a shared drive across two 32-bit
 windows XP machines in our lab but did not get any errors/hang)

 Regards,
 Jayesh

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:11 PM
To: Jayesh Krishna
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 This is the same fpi.f which comes with the installation with the
 exception that I have added print statements.

 The setup is homogeneous (both 32-bit).  The output is attached.

 Thanks for your help.

 Tim

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 3:48 PM
 To: 'Ayer, Timothy C.'
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 # Are you running the fpi.exe (fpi.f) provided with MPICH2? (Have you
 modified the program?)
 # I am assuming that the setup is not heterogeneous. (MPICH2 currently
 does not support running jobs across machines with different data models,
 e.g. you cannot run your MPI job across 32-bit and 64-bit machines.)
 # Please provide us with the debug/verbose output when running fpi.exe.
 Start smpd on both the machines in debug mode (1. Stop any instances of
 smpd running on the system: smpd -stop  2. Start smpd in debug mode:
 smpd -d) and run mpiexec in verbose mode (mpiexec.exe -verbose -map
 y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB
 y:\fpi.exe)
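
 The verbose mpiexec invocation requested above has several parts that are
 easy to mistype (the doubled backslash in the -map argument, the host count
 after -hosts). As an illustration only, the sketch below assembles the same
 command line from its pieces. The function mpiexec_debug_command and its
 parameter names are hypothetical; the only flags used are the ones that
 appear in this thread (-verbose, -map, -hosts).

```python
def mpiexec_debug_command(drive, share_host, share_name, hosts, exe):
    """Build the argument list for a verbose mpiexec run that maps a network share."""
    # -map takes <drive>:\\<host>\<share>, e.g. y:\\hostA\temp
    mapping = f"{drive}:\\\\{share_host}\\{share_name}"
    return [
        "mpiexec.exe", "-verbose",
        "-map", mapping,
        # -hosts is followed by the number of hosts, then the hosts themselves.
        "-hosts", str(len(hosts)), *hosts,
        # The executable is addressed through the mapped drive letter.
        f"{drive}:\\{exe}",
    ]

cmd = mpiexec_debug_command(
    "y", "IPAddressOf_hostA", "temp",
    ["IPAddressOf_hostA", "IPAddressOf_hostB"], "fpi.exe",
)
print(" ".join(cmd))
```

 The placeholders IPAddressOf_hostA and IPAddressOf_hostB stand in for the
 real addresses, as they do in the emails. Run smpd -stop and then smpd -d
 on both machines before launching the command.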

 Regards,
 Jayesh

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 2:21 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 Thanks, here is the output (note:  I have not included IP address or
 actual hostnames in this email but did use them in testing)

 # mpiexec.exe -map y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA
 IPAddressOf_hostB y:\fpi.exe

 OUTPUT:
  Process            0  of            2  is alive
 Enter the number of intervals: (0 quits)
  Process            1  of            2  is alive
  Before bcast            1  of            2  is alive
 10
  Before bcast            0  of            2  is alive

 # mpiexec.exe -map y:\\IPAddressOf_hostA\temp hostname

 XXXXXX (hostname of hostA)



   _____

 From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 3:13 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 The command hostname (c:\windows\system32\hostname.exe)

 Regards,
 Jayesh

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 2:11 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 You have "hostname" at the end of the second line... what is that
 referring to?

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 2:47 PM
 To: 'Ayer, Timothy C.'
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 What is the error message (output) that you get when you run mpiexec?
 Please provide us with the output of the following commands (make sure
 that you specify the IP addresses of the hosts involved):

 # mpiexec.exe -map y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA
 IPAddressOf_hostB y:\fpi.exe
 # mpiexec.exe -map y:\\IPAddressOf_hostA\temp hostname

 Regards,
 Jayesh


   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 1:25 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 No, this does not work... the behavior is the same.  The UNCs should work,
 and have worked, regardless of whether a user is logged in. We have never
 relied on network drive mappings since they are intermittently an
 "interactive" feature.

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 2:02 PM
 To: 'Ayer, Timothy C.'
 Cc: [email protected]
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 You should try,

 mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB y:\fpi.exe

 Let us know if it works for you.

 (PS: The shared drive is accessible across machines because the drive is
 accessible/mapped by the user logged on to the machines. SMPD runs as a
 service logged on as "Local System" and does not, and should not, have
 access to drives shared by users.)

 Regards,
 Jayesh

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 12:50 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 The exe can be directly accessed from hostB by executing
 \\hostA\temp\fpi.exe, that is, you could type it directly into a command
 prompt from hostB if you wanted. Note also that the \temp directory is a
 shared location. I am not sure physically how this is set up on our
 network, but this has worked without any "mapping" for MPICH (MPICH1).

 Note:  I did try:  mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB
 \\hostA\temp\fpi.exe but that still hangs in the MPI_Bcast call.

 The interesting part is that it gets through the initialization:

       call MPI_INIT( ierr )
       call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
       call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )


 All execute.

 Thanks,
 Tim

   _____

 From: Jayesh Krishna [mailto:[email protected]]
Sent: Monday, August 04, 2008 1:33 PM
To: 'Ayer, Timothy C.'
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 How (what mechanism) does hostB access data (exe) in hostA ?

 Regards,
 Jayesh

   _____

 From: Ayer, Timothy C. [mailto:[email protected]]
 Sent: Monday, August 04, 2008 12:31 PM
 To: Jayesh Krishna
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 Thanks, Jayesh, for the quick reply.  This is a network-available UNC
 path; why do I need to map a drive?

 I am familiar with the machines file. I was just using the command line
 for debugging.

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Monday, August 04, 2008 10:56 AM
 To: [email protected]
 Cc: [email protected]
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 Hi,
 If you are running your executable from a shared network drive, you need
 to map (see the "--map" option of mpiexec in the Windows developer's guide)
 the network drive with mpiexec when launching your job.
 Also make sure that you have turned the Windows firewall (or any other
 firewalls) off on the machines involved in the job.
 Try specifying the IP addresses of the machines instead of the hostnames.
 Let us know the results.

 (PS: Instead of the "-hosts" option you could try using the "-machinefile"
 option available with mpiexec. See the Windows developer's guide for
 details.)

 Regards,
 Jayesh
 -----Original Message-----
 From: [email protected] [mailto:owner-[email protected]]
 On Behalf Of mpich2
Sent: Monday, August 04, 2008 9:33 AM
To: undisclosed-recipients:
Subject: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

-----------------------------------------------------------+----------------
 Reporter:  "Ayer, Timothy C." [email protected]           |       Type:  bug
   Status:  new                                            |   Priority:  major
Component:  mpich2                                         |
-----------------------------------------------------------+----------------

  I am testing MPICH2 MPICH2-1.0.7 on Windows XP (sp2).  I have installed it
  on 2 hosts (hostA, hostB) and am trying to run the fpi.exe built with
  fmpich2.lib. The code is hanging in an MPI_Bcast call. The fpi.exe source
  is attached.

  The following tests work fine from hostA; both prompt for a number of
  intervals, accept input, and produce an estimate of PI:

  mpiexec.exe -hosts 2 hostA hostA \\hostA\temp\fpi.exe

  mpiexec.exe -hosts 2 hostB hostB \\hostA\temp\fpi.exe

  The following test hangs when submitted from hostA (in MPI_Bcast). It
  does prompt for input (number of intervals), but once a value is entered
  it hangs. I have launched the smpd process using smpd -d but see no output
  from smpd after I enter an interval value:

  mpiexec.exe -hosts 2 hostA hostB \\hostA\temp\fpi.exe

  Any suggestions would be appreciated.  Also let me know if you want me to
  send debug output.

  Thanks,
  Tim

  _____________________
  Timothy C. Ayer
  High Performance Technical Computing
  United Technologies - Pratt & Whitney
  [email protected]
  (860) 565 - 5268 v
  (860) 565 - 2668 f

   <<fpi.f>>


 --
 Ticket URL: <https://trac.mcs.anl.gov/projects/mpich2/ticket/36>



mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-09-11 12:08:42 -0500


Attachment added: part0001.8.html (40.9 KiB)
Added by email2trac


mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-10-23 16:24:38 -0500


Attachment added: part0001.9.html (47.2 KiB)
Added by email2trac


mpichbot commented on May 19, 2024

Originally by Jayesh Krishna on 2008-10-23 16:24:38 -0500



 Hi,
  Did you get a chance to look at the setup ?

Regards,
Jayesh

-----Original Message-----
From: [email protected] [mailto:[email protected]]
On Behalf Of mpich2
Sent: Thursday, September 11, 2008 12:09 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+---------------
  Reporter:  "Ayer, Timothy  C." <[email protected]>  |       Owner:  jayesh
      Type:  bug                                            |      Status:  assigned
  Priority:  major                                          |   Component:  mpich2
Resolution:                                                 |    Keywords:
------------------------------------------------------------+---------------


Comment (by Ayer, Timothy  C.):

 Jayesh,

 I apologize for the delay.  I hope to get back to this soon but other
items  have taken higher priority.

 Thanks,
 Tim

   _____

 From: Ayer, Timothy C.
 Sent: Wednesday, August 13, 2008 4:10 PM
 To: Jayesh Krishna; Ayer, Timothy C.
 Cc: [email protected]
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 Will do.

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Wednesday, August 13, 2008 4:02 PM
 To: [email protected]
 Cc: [email protected]
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP



  Hi,
   I just cross-verified the timestamps of the dlls and they look alright.
 Make sure that you have the date/timestamps right on all the hosts
involved.

 Regards,
 Jayesh

 -----Original Message-----
 From: [email protected] [mailto:[email protected]
 <mailto:[email protected]> ] On Behalf Of mpich2
 Sent: Wednesday, August 13, 2008 2:53 PM
 To: undisclosed-recipients:
 Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

 ------------------------------------------------------------+---------------
   Reporter:  "Ayer, Timothy  C." <[email protected]>  |       Owner:  jayesh
       Type:  bug                                            |      Status:  assigned
   Priority:  major                                          |   Component:  mpich2
 Resolution:                                                 |    Keywords:
 ------------------------------------------------------------+---------------


 Comment (by Ayer, Timothy  C.):

  That's a bummer I thought for sure that must be it....oh well.  I will
pursue the other two options.

  Thanks,
  Tim


  -----Original Message-----
  From: [email protected]
[mailto:[email protected]
 <mailto:[email protected]> ]
  On Behalf Of mpich2
  Sent: Wednesday, August 13, 2008 3:50 PM
  Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP



  ------------------------------------------------------------+---------------
    Reporter:  "Ayer, Timothy  C." <[email protected]>  |       Owner:  jayesh
        Type:  bug                                            |      Status:  assigned
    Priority:  major                                          |   Component:  mpich2
  Resolution:                                                 |    Keywords:
  ------------------------------------------------------------+---------------


  Comment (by Jayesh Krishna):

   Hi,
     I spoke too soon. We have discontinued supporting sshm channel and
that
   is the reason that you have an old version of sshm related dlls in your
   system32 directory.

   Regards,
   Jayesh

   -----Original Message-----
   From: [email protected] [mailto:owner- <mailto:owner->
[email protected]]
   On Behalf Of mpich2
   Sent: Wednesday, August 13, 2008 2:38 PM
   To: undisclosed-recipients:
   Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


   ------------------------------------------------------------+---------------
     Reporter:  "Ayer, Timothy  C." <[email protected]>  |       Owner:  jayesh
         Type:  bug                                            |      Status:  assigned
     Priority:  major                                          |   Component:  mpich2
   Resolution:                                                 |    Keywords:
   ------------------------------------------------------------+---------------


   Comment (by Jayesh Krishna):

    Hi,
      Hmmm... This looks like the problem that I mentioned in my email.
    The mpich2sshm*.dll files should have the same datestamp as the other dlls
    (they should not be from 2005!).
      Please try the following:

    # Uninstall MPICH2 on the hosts involved in your job.
    # Manually delete the MPICH2 dlls from the windows\system32 directory
    (Please be careful! Make sure that you delete only mpich2*.dll & mpe*.dll).
    # Re-install MPICH2 1.0.7 (stable version) on the hosts/nodes.
    # Re-compile cpi.c/fpi.f and try running your job.

      Let us know the results.

    Regards,
    Jayesh

    -----Original Message-----
    From: [email protected]
   [mailto:[email protected]
 <mailto:[email protected]> ]
    On Behalf Of mpich2
    Sent: Wednesday, August 13, 2008 2:11 PM
    To: undisclosed-recipients:
    Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

    ------------------------------------------------------------+---------------
      Reporter:  "Ayer, Timothy  C." <[email protected]>  |       Owner:  jayesh
          Type:  bug                                            |      Status:  assigned
      Priority:  major                                          |   Component:  mpich2
    Resolution:                                                 |    Keywords:
    ------------------------------------------------------------+---------------


    Comment (by Ayer, Timothy  C.):

     Hi Jayesh,

     Great to hear from you.  I will try your suggestions (icpi.c and slow
   response).

     Also here is the output you requested.  I have been wondering why the
   dates  on mpich2sshm.dll and mpich2sshmp.dll seem so old (from 2005)???
    ...I  should  have mentioned it sooner.

     Thanks,
     Tim


     C:\WINDOWS\system32>dir c:\windows\system32\mpe*.dll
      Volume in drive C is System
      Volume Serial Number is D8B5-0657

      Directory of c:\windows\system32

     04/04/2008  05:46 PM           135,168 mpe.dll
                    1 File(s)        135,168 bytes
                    0 Dir(s)   4,497,502,208 bytes free

     C:\WINDOWS\system32>


     C:\WINDOWS\system32>dir dir c:\windows\system32\mpich2*.dll
      Volume in drive C is System
      Volume Serial Number is D8B5-0657

      Directory of C:\WINDOWS\system32


      Directory of C:\WINDOWS\system32

     04/04/2008  05:28 PM         1,110,016 mpich2.dll
     04/04/2008  05:47 PM           151,552 mpich2mpe.dll
     04/04/2008  05:23 PM           159,744 mpich2mpi.dll
     04/04/2008  06:31 PM         1,159,168 mpich2mt.dll
     04/04/2008  06:42 PM         1,351,680 mpich2mtp.dll
     04/04/2008  05:43 PM         1,306,624 mpich2p.dll
     04/04/2008  05:55 PM         1,093,632 mpich2shm.dll
     04/04/2008  06:03 PM         1,294,336 mpich2shmp.dll
     11/23/2005  02:33 AM         1,032,192 mpich2sshm.dll  <<<<<<<<<<<<<<<<
     11/23/2005  02:36 AM         1,294,336 mpich2sshmp.dll <<<<<<<<<<<<<<<<
     04/04/2008  06:14 PM         1,122,304 mpich2ssm.dll
     04/04/2008  06:22 PM         1,343,488 mpich2ssmp.dll
                   12 File(s)     12,419,072 bytes
                    0 Dir(s)   4,497,502,208 bytes free

     -----Original Message-----
     From: [email protected]
    [mailto:[email protected]
 <mailto:[email protected]> ]
     On Behalf Of mpich2
     Sent: Wednesday, August 13, 2008 2:58 PM
     Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP




     ------------------------------------------------------------+---------------
       Reporter:  "Ayer, Timothy  C." <[email protected]>  |       Owner:  jayesh
           Type:  bug                                            |      Status:  assigned
       Priority:  major                                          |   Component:  mpich2
     Resolution:                                                 |    Keywords:
     ------------------------------------------------------------+---------------


     Comment (by Jayesh Krishna):

      Hi,
        The logs sent by you show that the communication btw the process
      managers on the hosts is good. The problem looks to be with the
      communication btw the MPI processes.

      # Can you try compiling icpi.c (MPICH2\examples) and running the program
      in your setup (to make sure that the problem is not related to the
      Fortran bindings)?
      # I have seen that sometimes the uninstall/install of MPICH2 does not
      result in the dlls being updated correctly. (This has led to some weird,
      difficult-to-debug hangs in our tests. This is not usual, but it does
      not hurt to check for it.) To make sure that you have the right dlls,
      try listing the MPICH2 dlls in the windows system32 directory on both
      hosts:

      >>> dir c:\windows\system32\mpich2*.dll
      >>> dir c:\windows\system32\mpe*.dll

        Send us the results for verification (sanity check: they should all
      have the same datestamp).

      # Also, when running fpi.exe in your setup, try leaving the job running
      for 10 minutes or so (or specify a timeout of about 10 minutes) and see
      if it reports any errors. You might want to run netstat (or use "Process
      Explorer" from Microsoft and check the TCP/IP tab in the
      process->properties) to see what happens to the connections between the
      MPI processes on both hosts.

      (PS: The MPICH2 1.1.0a1 release
      (http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads)
      is aimed at MPICH2 devs and not for production machines.)

      Regards,
      Jayesh


        _____

      From: Ayer, Timothy C. [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Tuesday, August 05, 2008 9:20 AM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      Please find attached the output from the smpd -d procs.  Also, the
   output
      from the mpiexec just so you can see what I typed.

      H:\>mpiexec.exe -map v:\\10.30.73.170\temp -hosts 2 10.30.73.170
      10.30.73.34 v:\fpi.exe
       Process            0  of            2  is alive
      Enter the number of intervals: (0 quits)
       Process            1  of            2  is alive
       Before bcast            1  of            2  is alive
      10
       Before bcast            0  of            2  is alive
      100



        _____

      From: Jayesh Krishna [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 5:10 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      The socket/channel connection between the MPI processes take place
   during
      MPI_Bcast() (not before that in fpi.f).

        _____

      From: Ayer, Timothy C. [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 4:00 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      The firewall has been disabled.

      The inputs were from me entering values for estimating pi...I wanted
to
      make sure the program ran through all the logic.

      I will send the other debug output a little later.

      Also,  as an fyi, we have been running MPICH on thousands of PC's
for
      years now.  The other strange part is that over a year ago I did
      successfully run MPICH2 on over 30 processors.  My first thought was
   the
      firewall as well.

        _____

      From: Jayesh Krishna [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 4:46 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      # Do you have windows firewall (or any firewall) running on these
   machines
      ?
      # Why do I see two inputs (10 & 100) in the mpiexec debug output ?
      # Can you send us the debug output of smpd along with mpiexec ?
      # Can you check the status of the remote smpd from each host ?
         --- On host A, run      "smpd -status IPAddressOf_hostB"
         --- On host B, run      "smpd -status IPAddressOf_hostA"

      (PS: I just tried running fpi.exe in a shared drive across two
32-bit
      windows XP machines in our lab but did not get any errors/hang)

      Regards,
      Jayesh

        _____

      From: Ayer, Timothy C. [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 3:11 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      This is the same fpi.f which comes with the installation with the
      exception that I have added print statements.

      The setup is homogeneous (both 32-bit).  The output is attached.

      Thanks for your help.

      Tim

        _____

      From: Jayesh Krishna [mailto:[email protected]]
      Sent: Monday, August 04, 2008 3:48 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      # Are you running the fpi.exe (fpi.f) provided with MPICH2? Have you
      modified the program?
      # I am assuming that the setup is not heterogeneous (MPICH2 currently does
      not support running jobs across machines with different data models, e.g.
      you cannot run your MPI job across 32-bit and 64-bit machines).
      # Please provide us with the debug/verbose output when running fpi.exe.
      Start smpd on both the machines in debug mode (1. Stop any instances of
      smpd running on the system, smpd -stop   2. Start smpd in debug mode, smpd
      -d) and run mpiexec in verbose mode (mpiexec.exe -verbose -map
      y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB
      y:\fpi.exe)
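
A small sketch of how the verbose mpiexec command line above could be assembled in a script so the same invocation is repeated consistently across test runs. Only the flags quoted in this thread (`-verbose`, `-map`, `-hosts`) are used; `build_mpiexec_args` is a hypothetical helper name, and the host names are placeholders.

```python
from typing import List


def build_mpiexec_args(drive: str, share: str, hosts: List[str], exe: str,
                       verbose: bool = True) -> List[str]:
    """Build the argument list: mpiexec.exe [-verbose] -map d:\\\\host\\share -hosts n h1 .. hn d:\\exe."""
    args = ["mpiexec.exe"]
    if verbose:
        args.append("-verbose")
    # Map the UNC share to a drive letter for the duration of the job.
    args += ["-map", f"{drive}:{share}"]
    # -hosts takes the host count followed by the host names or IP addresses.
    args += ["-hosts", str(len(hosts)), *hosts]
    # The executable is addressed through the mapped drive.
    args.append(f"{drive}:\\{exe}")
    return args


cmd = build_mpiexec_args("y", r"\\IPAddressOf_hostA\temp",
                         ["IPAddressOf_hostA", "IPAddressOf_hostB"], "fpi.exe")
print(" ".join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd)` on a machine where mpiexec.exe is on the PATH.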

      Regards,
      Jayesh

        _____

      From: Ayer, Timothy C. [mailto:[email protected]]
      Sent: Monday, August 04, 2008 2:21 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      Thanks, here is the output (note:  I have not included IP addresses or
      actual hostnames in this email but did use them in testing)

      # mpiexec.exe -map y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB y:\fpi.exe

      OUTPUT:
       Process            0  of            2  is alive
      Enter the number of intervals: (0 quits)
       Process            1  of            2  is alive
       Before bcast            1  of            2  is alive
      10
       Before bcast            0  of            2  is alive

      # mpiexec.exe -map y:\\IPAddressOf_hostA\temp hostname

      XXXXXX (hostname of hostA)



        _____

      From: Jayesh Krishna [mailto:[email protected]]
      Sent: Monday, August 04, 2008 3:13 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      The command hostname (c:\windows\system32\hostname.exe)

      Regards,
      Jayesh

        _____

      From: Ayer, Timothy C. [mailto:[email protected]]
      Sent: Monday, August 04, 2008 2:11 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      You have "hostname" at the end of the second line...what is that
      referring to?

        _____

      From: Jayesh Krishna [mailto:[email protected]]
      Sent: Monday, August 04, 2008 2:47 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


       What is the error message (output) that you get when you run mpiexec?
       Please provide us with the output of the following commands (make sure
      that you specify the IP addresses of the hosts involved):

      # mpiexec.exe -map y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA IPAddressOf_hostB y:\fpi.exe
      # mpiexec.exe -map y:\\IPAddressOf_hostA\temp hostname

      Regards,
      Jayesh


        _____

      From: Ayer, Timothy C. [mailto:[email protected]]
      Sent: Monday, August 04, 2008 1:25 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      No, this does not work...the behavior is the same.  The UNCs should work
      (and have worked) regardless of whether a user is logged in.  We have
      never relied on network drive mappings since they are intermittently an
      "interactive" feature.

        _____

      From: Jayesh Krishna [mailto:[email protected]]
      Sent: Monday, August 04, 2008 2:02 PM
      To: 'Ayer, Timothy C.'
      Cc: [email protected]
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      You should try,

      mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB y:\fpi.exe

       Let us know if it works for you.

      (PS: The shared drive is accessible across machines because the drive is
      accessible/mapped by the user logged on to the machines. SMPD runs as a
      service logged on as "Local System" and does not, and should not, have
      access to drives shared by users)

      Regards,
      Jayesh

        _____

      From: Ayer, Timothy C. [mailto:[email protected]]
      Sent: Monday, August 04, 2008 12:50 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      The exe can be directly accessed from hostB by executing
      \\hostA\temp\fpi.exe, that is, you could type it directly into a command
      prompt from hostB if you wanted.  Note also that the \temp directory is a
      shared location.  I am not sure physically how this is set up on our
      network but this has worked without any "mapping" for MPICH (MPICH1).

      Note:  I did try:  mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB
      \\hostA\temp\fpi.exe but that still hangs in the MPI_Bcast call.

      The interesting part is that it gets through the initialization:

            call MPI_INIT( ierr )
            call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
            call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )


      All execute.

      Thanks,
      Tim

        _____

      From: Jayesh Krishna [mailto:[email protected]]
      Sent: Monday, August 04, 2008 1:33 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      How (what mechanism) does hostB access data (exe) in hostA ?

      Regards,
      Jayesh

        _____

      From: Ayer, Timothy C. [mailto:[email protected]]
      Sent: Monday, August 04, 2008 12:31 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      Thanks Jayesh for the quick reply.  This is a network available UNC path -
      why do I need to map a drive?

      I am familiar with the machines file - I was just using the command line
      for debugging.

        _____

      From: Jayesh Krishna [mailto:[email protected]]
      Sent: Monday, August 04, 2008 10:56 AM
      To: [email protected]
      Cc: [email protected]
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP



       Hi,
        If you are running your executable from a shared network drive you need
      to map (see the "-map" option of mpiexec in the Windows developer's guide)
      the network drive with mpiexec when launching your job.
        Also make sure that you have turned the Windows firewall (or any other
      firewalls) off on the machines involved in the job.
        Try specifying the IP addresses of the machines instead of the
      hostnames.
        Let us know the results.

      (PS: Instead of the "-hosts" option you could try using the "-machinefile"
      option available with mpiexec. See the Windows developer's guide for
      details.)

      Regards,
      Jayesh
      -----Original Message-----
      From: [email protected] [mailto:[email protected]]
      On Behalf Of mpich2
      Sent: Monday, August 04, 2008 9:33 AM
      To: undisclosed-recipients:
      Subject: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP



      -----------------------------------------------------------+----------------
       Reporter:  "Ayer, Timothy C." <[email protected]>          |      Type:  bug
         Status:  new                                            |  Priority:  major
      Component:  mpich2                                         |
      -----------------------------------------------------------+----------------


       I am testing MPICH2 1.0.7 on Windows XP (SP2).  I have installed it on 2
       hosts (hostA, hostB) and am trying to run the fpi.exe built with
       fmpich2.lib.  The code is hanging in an MPI_Bcast call.  The fpi.exe
       source is attached.

       The following tests work fine from hostA; both prompt for a number of
       intervals, accept input, and produce an estimate of PI:

       mpiexec.exe -hosts 2 hostA hostA \\hostA\temp\fpi.exe

       mpiexec.exe -hosts 2 hostB hostB \\hostA\temp\fpi.exe

       The following test hangs (in MPI_Bcast) when submitted from hostA.  It
       does prompt for input (number of intervals) but once entered it hangs.  I
       have launched the smpd process using smpd -d but see no output from
       smpd after I enter an interval value:

       mpiexec.exe -hosts 2 hostA hostB \\hostA\temp\fpi.exe

       Any suggestions would be appreciated.  Also let me know if you want me to
       send debug output.

       Thanks,
       Tim

       _____________________
       Timothy C. Ayer
       High Performance Technical Computing
       United Technologies - Pratt & Whitney
       [email protected]
       (860) 565 - 5268 v
       (860) 565 - 2668 f

        <<fpi.f>>


      --
      Ticket URL: <https://trac.mcs.anl.gov/projects/mpich2/ticket/36>


from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-10-27 11:49:11 -0500



Hi Jayesh,

Thanks for checking.  Unfortunately I have not.  If you folks prefer to
close the ticket we can reopen once I get more information.

Tim


________________________________

From: Jayesh Krishna [mailto:[email protected]]
Sent: Thursday, October 23, 2008 5:24 PM
To: Ayer, Timothy C.
Cc: [email protected]
Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP



 Hi,
  Did you get a chance to look at the setup ?

Regards,
Jayesh

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of mpich2
Sent: Thursday, September 11, 2008 12:09 PM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP

------------------------------------------------------------+----------------
  Reporter:  "Ayer, Timothy C." <[email protected]>          |      Owner:  jayesh
      Type:  bug                                            |     Status:  assigned
  Priority:  major                                          |  Component:  mpich2
Resolution:                                                 |   Keywords:
------------------------------------------------------------+----------------


Comment (by Ayer, Timothy  C.):

 Jayesh,

 I apologize for the delay.  I hope to get back to this soon but other
 items have taken higher priority.

 Thanks,
 Tim

   _____

 From: Ayer, Timothy C.
 Sent: Wednesday, August 13, 2008 4:10 PM
 To: Jayesh Krishna; Ayer, Timothy C.
 Cc: [email protected]
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


 Will do.

   _____

 From: Jayesh Krishna [mailto:[email protected]]
 Sent: Wednesday, August 13, 2008 4:02 PM
 To: [email protected]
 Cc: [email protected]
 Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP



  Hi,
   I just cross-verified the timestamps of the dlls and they look alright.
 Make sure that you have the date/timestamps right on all the hosts involved.

 Regards,
 Jayesh

 -----Original Message-----
 From: [email protected] [mailto:[email protected]] On Behalf Of mpich2
 Sent: Wednesday, August 13, 2008 2:53 PM
 To: undisclosed-recipients:
 Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP




 Comment (by Ayer, Timothy  C.):

  That's a bummer. I thought for sure that must be it... oh well.  I will
  pursue the other two options.

  Thanks,
  Tim


  -----Original Message-----
  From: [email protected] [mailto:[email protected]]
  On Behalf Of mpich2
  Sent: Wednesday, August 13, 2008 3:50 PM
  Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP





  Comment (by Jayesh Krishna):

   Hi,
     I spoke too soon. We have discontinued support for the sshm channel, and
   that is the reason you have an old version of the sshm-related dlls in
   your system32 directory.

   Regards,
   Jayesh

   -----Original Message-----
   From: [email protected] [mailto:[email protected]]
   On Behalf Of mpich2
   Sent: Wednesday, August 13, 2008 2:38 PM
   To: undisclosed-recipients:
   Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP




   Comment (by Jayesh Krishna):

    Hi,
      Hmmm... This looks like the problem that I mentioned in my email.
    The mpich2sshm*.dll files should have the same datestamp as the other dlls
    (they should not be from 2005!).
      Please try the following,

    # Uninstall MPICH2 on the hosts involved in your job.
    # Manually delete the MPICH2 dlls from the windows\system32 directory
    (Please be careful! Make sure that you delete only mpich2*.dll & mpe*.dll)
    # Re-install MPICH2 1.0.7 (stable version) on the hosts/nodes.
    # Re-compile cpi.c/fpi.f and try running your job.

      Let us know the results.

    Regards,
    Jayesh

    -----Original Message-----
    From: [email protected] [mailto:[email protected]]
    On Behalf Of mpich2
    Sent: Wednesday, August 13, 2008 2:11 PM
    To: undisclosed-recipients:
    Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP




    Comment (by Ayer, Timothy  C.):

     Hi Jayesh,

     Great to hear from you.  I will try your suggestions (icpi.c and slow
     response).

     Also here is the output you requested.  I have been wondering why the
     dates on mpich2sshm.dll and mpich2sshmp.dll seem so old (from 2005)...
     I should have mentioned it sooner.

     Thanks,
     Tim


     C:\WINDOWS\system32>dir c:\windows\system32\mpe*.dll
      Volume in drive C is System
      Volume Serial Number is D8B5-0657

      Directory of c:\windows\system32

     04/04/2008  05:46 PM           135,168 mpe.dll
                    1 File(s)        135,168 bytes
                    0 Dir(s)   4,497,502,208 bytes free

     C:\WINDOWS\system32>


     C:\WINDOWS\system32>dir dir c:\windows\system32\mpich2*.dll
      Volume in drive C is System
      Volume Serial Number is D8B5-0657

      Directory of C:\WINDOWS\system32


      Directory of C:\WINDOWS\system32

     04/04/2008  05:28 PM         1,110,016 mpich2.dll
     04/04/2008  05:47 PM           151,552 mpich2mpe.dll
     04/04/2008  05:23 PM           159,744 mpich2mpi.dll
     04/04/2008  06:31 PM         1,159,168 mpich2mt.dll
     04/04/2008  06:42 PM         1,351,680 mpich2mtp.dll
     04/04/2008  05:43 PM         1,306,624 mpich2p.dll
     04/04/2008  05:55 PM         1,093,632 mpich2shm.dll
     04/04/2008  06:03 PM         1,294,336 mpich2shmp.dll
     11/23/2005  02:33 AM         1,032,192 mpich2sshm.dll    <<<<<<<<<<<<<<<<
     11/23/2005  02:36 AM         1,294,336 mpich2sshmp.dll   <<<<<<<<<<<<<<<<
     04/04/2008  06:14 PM         1,122,304 mpich2ssm.dll
     04/04/2008  06:22 PM         1,343,488 mpich2ssmp.dll
                   12 File(s)     12,419,072 bytes
                    0 Dir(s)   4,497,502,208 bytes free
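
The datestamp comparison on a listing like the one above can also be done mechanically. The following is a minimal sketch, assuming the `dir` output uses the US date format shown here; `odd_dated_dlls` is a hypothetical helper that reports every DLL whose date differs from the most common one.

```python
import re
from collections import Counter
from typing import List

# Matches lines such as "04/04/2008  05:28 PM         1,110,016 mpich2.dll"
DIR_LINE = re.compile(
    r"^\s*(\d{2}/\d{2}/\d{4})\s+\d{2}:\d{2}\s+[AP]M\s+[\d,]+\s+(\S+\.dll)\s*$",
    re.IGNORECASE,
)


def odd_dated_dlls(dir_output: str) -> List[str]:
    """Return the DLL names whose date differs from the most common date."""
    entries = []
    for line in dir_output.splitlines():
        match = DIR_LINE.match(line)
        if match:
            entries.append((match.group(2), match.group(1)))
    if not entries:
        return []
    common_date, _ = Counter(date for _, date in entries).most_common(1)[0]
    return [name for name, date in entries if date != common_date]


# File entries copied from the listing in this message.
LISTING = """
04/04/2008  05:28 PM         1,110,016 mpich2.dll
04/04/2008  05:47 PM           151,552 mpich2mpe.dll
04/04/2008  05:23 PM           159,744 mpich2mpi.dll
04/04/2008  06:31 PM         1,159,168 mpich2mt.dll
04/04/2008  06:42 PM         1,351,680 mpich2mtp.dll
04/04/2008  05:43 PM         1,306,624 mpich2p.dll
04/04/2008  05:55 PM         1,093,632 mpich2shm.dll
04/04/2008  06:03 PM         1,294,336 mpich2shmp.dll
11/23/2005  02:33 AM         1,032,192 mpich2sshm.dll
11/23/2005  02:36 AM         1,294,336 mpich2sshmp.dll
04/04/2008  06:14 PM         1,122,304 mpich2ssm.dll
04/04/2008  06:22 PM         1,343,488 mpich2ssmp.dll
"""

print(odd_dated_dlls(LISTING))
```

Running the same function on the listing captured from each host makes it easy to compare the results between machines.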

     -----Original Message-----
     From: [email protected] [mailto:[email protected]]
     On Behalf Of mpich2
     Sent: Wednesday, August 13, 2008 2:58 PM
     Subject: Re: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP






     Comment (by Jayesh Krishna):

      Hi,
        The logs you sent show that the communication between the process
      managers on the hosts is good. The problem looks to be with the
      communication between the MPI processes.

      # Can you try compiling icpi.c (MPICH2\examples) and running the program
      in your setup (to make sure that the problem is not related to the
      Fortran bindings)?
      # I have seen that sometimes the uninstall/install of MPICH2 does not
      result in the dlls being updated correctly (this has led to some weird,
      difficult-to-debug hangs in our tests; it is not usual, but it does not
      hurt to check for it). To make sure that you have the right dlls, try
      listing the MPICH2 dlls in your windows system32 directory on both the
      hosts,

      >>> dir c:\windows\system32\mpich2*.dll
      >>> dir c:\windows\system32\mpe*.dll

        Send us the results for verification (sanity check: they should all
      have the same datestamp)

      # Also, when running fpi.exe using your setup, try leaving the job running
      for 10 mins or so (or maybe specify a timeout of about 10 mins) and see if
      it reports any errors. You might want to run netstat (or use "Process
      Explorer" from Microsoft and check the TCP/IP tab in
      process->properties) to see what happens to the connections between the MPI
      processes from both hosts.
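
One way to leave the job running for a bounded time is to wrap the launch in a script. This is only a sketch: the timeout is enforced on the Python side rather than through any mpiexec option, and `run_with_timeout` is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and report whether it exited or was still running after timeout_s seconds."""
    try:
        # subprocess.run kills the child process if the timeout expires.
        done = subprocess.run(cmd, timeout=timeout_s)
        return f"exited with code {done.returncode}"
    except subprocess.TimeoutExpired:
        return "timed out"


# Stand-in child process so the sketch runs anywhere; in practice cmd would be
# the mpiexec.exe command line used in this thread, with timeout_s=600.
print(run_with_timeout([sys.executable, "-c", "pass"], 30))
```

Note that stopping the local mpiexec this way may leave processes running on the remote host, so checking with netstat before the timeout fires is still worthwhile.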

      (PS: The MPICH2 1.1.0a1 release
      (http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads)
      is aimed at MPICH2 devs and not for production machines.)

      Regards,
      Jayesh


        _____

      From: Ayer, Timothy C. [mailto:[email protected]]
      Sent: Tuesday, August 05, 2008 9:20 AM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      Please find attached the output from the smpd -d procs.  Also, the output
      from the mpiexec just so you can see what I typed.

      H:\>mpiexec.exe -map v:\\10.30.73.170\temp -hosts 2 10.30.73.170
      10.30.73.34 v:\fpi.exe
       Process            0  of            2  is alive
      Enter the number of intervals: (0 quits)
       Process            1  of            2  is alive
       Before bcast            1  of            2  is alive
      10
       Before bcast            0  of            2  is alive
      100



        _____

      From: Jayesh Krishna [mailto:[email protected]]
      Sent: Monday, August 04, 2008 5:10 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      The socket/channel connection between the MPI processes takes place during
      MPI_Bcast() (not before that in fpi.f).
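
The effect described above, where initialization succeeds and the hang only appears at the first real communication, can be illustrated with a toy model. This is not MPICH2 code; `LazyChannel` is a hypothetical class that opens its socket on the first send instead of at construction time.

```python
import socket


class LazyChannel:
    """Toy peer channel that connects on first use rather than at setup."""

    def __init__(self, host: str, port: int):
        # Nothing touches the network here, so setup always "succeeds".
        self.host = host
        self.port = port
        self.sock = None

    @property
    def connected(self) -> bool:
        return self.sock is not None

    def send(self, payload: bytes, timeout: float = 5.0) -> None:
        # The TCP connection is established only when data must be moved.
        # A blocked or filtered port would first show up here.
        if self.sock is None:
            self.sock = socket.create_connection((self.host, self.port), timeout=timeout)
        self.sock.sendall(payload)

    def close(self) -> None:
        if self.sock is not None:
            self.sock.close()
            self.sock = None
```

In this model, constructing the channel corresponds to the calls that complete in fpi.f, and `send` corresponds to the first collective call, which is where a connectivity problem between the two hosts would become visible.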

        _____

      From: Ayer, Timothy C. [mailto:[email protected]]
      Sent: Monday, August 04, 2008 4:00 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP


      The firewall has been disabled.

      The inputs were from me entering values for estimating pi...I wanted to
      make sure the program ran through all the logic.

      I will send the other debug output a little later.

      Also, as an FYI, we have been running MPICH on thousands of PCs for
      years now.  The other strange part is that over a year ago I did
      successfully run MPICH2 on over 30 processors.  My first thought was the
      firewall as well.

        _____

      From: Jayesh Krishna [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 4:46 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      # Do you have windows firewall (or any firewall) running on these
   machines
      ?
      # Why do I see two inputs (10 & 100) in the mpiexec debug output ?
      # Can you send us the debug output of smpd along with mpiexec ?
      # Can you check the status of the remote smpd from each host ?
         --- On host A, run      "smpd -status IPAddressOf_hostB"
         --- On host B, run      "smpd -status IPAddressOf_hostA"

      (PS: I just tried running fpi.exe in a shared drive across two
32-bit
      windows XP machines in our lab but did not get any errors/hang)

      Regards,
      Jayesh

        _____

      From: Ayer, Timothy C. [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 3:11 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      This is the same fpi.f which comes with the installation with the
      exception that I have added print statements.

      The setup is homogenous (both 32-bit).  The output is attached.

      Thanks for your help.

      Tim

        _____

      From: Jayesh Krishna [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 3:48 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      # Are you running fpi.exe (fpi.f) provided with MPICH2 (Have you
   modified
      the program ?)?
      # I am assuming that the setup is not heterogeneous (MPICH2
currently
   does
      not support running jobs across machines with different data
models
  eg:
      You cannot run your MPI job across 32-bit and 64-bit machines)
      # Please provide us with the debug/verbose output when running
fpi.exe.
      Start smpd on both the machines in debug mode (1. Stop any
instances  of
      smpd running on the system, smpd -stop   2. Start smpd in debug
mode,
     smpd
      -d) and run mpiexec in verbose mode (mpiexec.exe -verbose -map
      y:\\IPAddressOf_hostA\temp -hosts 2 IPAddressOf_hostA
IPAddressOf_hostB
      y:\fpi.exe)

      Regards,
      Jayesh

        _____

      From: Ayer, Timothy C. [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 2:21 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      Thanks, here is the output (note:  I have not included IP address
or
      actual hostnames in this email but did use them in testing)

      # mpiexec.exe -map y:\\IPAddressOf_hostA\temp -hosts 2
   IPAddressOf_hostA
      IPAddressOf_hostB y:\fpi.exe

      OUTPUT:
       Process            0  of            2  is alive
      Enter the number of intervals: (0 quits)
       Process            1  of            2  is alive
       Before bcast            1  of            2  is alive
      10
       Before bcast            0  of            2  is alive

      # mpiexec.exe -map y:\\IPAddressOf_hostA\temp hostname

      XXXXXX (hostname of hostA)



        _____

      From: Jayesh Krishna [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 3:13 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      The command hostname (c:\windows\system32\hostname.exe)

      Regards,
      Jayesh

        _____

      From: Ayer, Timothy C. [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 2:11 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      You have "hostname" at the end of the second line...what is that
   referring
      to?

        _____

      From: Jayesh Krishna [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 2:47 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


       What is the error message (output) that you get when you run
mpiexec  ?
       Pls provide us with the output of the following commands (Make
sure
   that
      you specify ipaddresses of the hosts involved),

      # mpiexec.exe -map y:\\IPAddressOf_hostA\temp -hosts 2
   IPAddressOf_hostA
      IPAddressOf_hostB y:\fpi.exe
      # mpiexec.exe -map y:\\IPAddressOf_hostA\temp hostname

      Regards,
      Jayesh


        _____

      From: Ayer, Timothy C. [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 1:25 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      No this does not work...the behavior is the same.  The UNC's
   should/have
      worked regardless of whether a user a user is logged in.  We have
never
      relied on drive network drive mappings since they are
intermittently  an
      "interactive" feature.

        _____

      From: Jayesh Krishna [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 2:02 PM
      To: 'Ayer, Timothy C.'
      Cc: [email protected]
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      You should try,

      mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA hostB y:\fpi.exe
      <file://hosta/temp/fpi.exe > >

       Let us know if it works for you.

      (PS: The shared drive is accessible across machines because the
drive
   is
      accessible/mapped by the user logged on to the machines. SMPD runs
as  a
      service logged on as "Local System" and does not - should not-
have
   access
      to drives shared by users)

      Regards,
      Jayesh

        _____

      From: Ayer, Timothy C. [ <file://hosta/temp/fpi.exe
<file://hosta/temp/fpi.exe> mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 12:50 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      The exe can be directly accessed from hostB by executing
      \\hostA\temp\fpi.exe, that is, you could type it directly into a
   command
      prompt from hostB if you wanted.  Note also that  \temp directory
is  a
      shared location.  I am not sure physically how this is setup on
our
      network but this has worked with out any "mapping" for MPICH
(MPICH1).

      Note:  I did try:  mpiexec.exe -map y:\\hostA\temp -hosts 2 hostA
hostB
      \\hostA\temp\fpi.exe but that still hangs in the MPI_Bcast call.

      The interesting part is that it gets through the initialization:

            call MPI_INIT( ierr )
            call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
            call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )


      All execute.

      Thanks,
      Tim

        _____

      From: Jayesh Krishna [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 1:33 PM
      To: 'Ayer, Timothy C.'
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      How (what mechanism) does hostB access data (exe) in hostA ?

      Regards,
      Jayesh

        _____

      From: Ayer, Timothy C. [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 12:31 PM
      To: Jayesh Krishna
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP


      Thanks Jayesh for the quick reply.  This is a network availabe UNC
path
     -
      why do I need to map a drive?

      I am familiar with the machines file - I was just using the
command
   line
      for debugging.

        _____

      From: Jayesh Krishna [mailto:[email protected]
<mailto:[email protected]> ]
      Sent: Monday, August 04, 2008 10:56 AM
      To: [email protected]
      Cc: [email protected]
      Subject: RE: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows
XP



       Hi,
        If you are running your executable from a shared network drive
you
   need
      to map (see "--map" option of mpiexec in the window's developer's
   guide)
      the network drive with mpiexec when launching your job.
        Also make sure that you have turned the windows firewall (or any
   other
      firewalls) off on the machines involved in the job.
        Try specifying the ip addresses of the machines instead of the
      hostnames.
        Let us know the results.

      (PS: Instead of the "-hosts" option you could try using the
   "-machinefile"
      option available with mpiexec. See the window's developer's guide
for
      details.)

      Regards,
      Jayesh
      -----Original Message-----
      From: [email protected] [mailto:owner- <mailto:owner->
   [email protected]]
      On Behalf Of mpich2
      Sent: Monday, August 04, 2008 9:33 AM
      To: undisclosed-recipients:
      Subject: [mpich2-maint] #36: MPICH2 fpi.exe hanging on Windows XP



      -----------------------------------------------------------+----------------
       Reporter:  "Ayer, Timothy  C." <[email protected]>          |      Type:  bug
         Status:  new                                            |  Priority:  major
      Component:  mpich2                                         |
      -----------------------------------------------------------+----------------


       I am testing MPICH2 (MPICH2-1.0.7) on Windows XP (SP2). I have
       installed it on 2 hosts (hostA, hostB) and am trying to run the fpi.exe
       built with fmpich2.lib. The code is hanging in an MPI_Bcast call. The
       fpi.exe source is attached.

       The following tests work fine from hostA; both prompt for a number of
       intervals, accept input, and produce an estimate of PI:

       mpiexec.exe -hosts 2 hostA hostA \\hostA\temp\fpi.exe

       mpiexec.exe -hosts 2 hostB hostB \\hostA\temp\fpi.exe



       The following test hangs (in MPI_Bcast) when submitted from hostA. It
       does prompt for input (number of intervals), but once a value is
       entered it hangs. I have launched the smpd process using smpd -d but
       see no output from smpd after I enter an interval value.

       mpiexec.exe -hosts 2 hostA hostB \\hostA\temp\fpi.exe


       Any suggestions would be appreciated. Also let me know if you want me
       to send debug output.

       Thanks,
       Tim

       _____________________
       Timothy C. Ayer
       High Performance Technical Computing
       United Technologies - Pratt & Whitney
       [email protected]
       (860) 565 - 5268 v
       (860) 565 - 2668 f

        <<fpi.f>>


      --
      Ticket URL: <https://trac.mcs.anl.gov/projects/mpich2/ticket/36>

     --
     Ticket URL:
   <https://trac.mcs.anl.gov/projects/mpich2/ticket/36#comment:>



from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by Ayer, Timothy C. on 2008-10-27 11:49:11 -0500


Attachment added: part0001.10.html (48.2 KiB)
Added by email2trac

from mpich.

mpichbot avatar mpichbot commented on May 19, 2024

Originally by jayesh on 2008-10-27 12:08:43 -0500


Closing the ticket for now - reopen when user provides more information

-Jayesh

from mpich.
