
[Condor-users] Problems running Fortran program in Vanilla Universe on Windows machine



Hi,

I am a new Condor user trying to get a Fortran program running. I'm running Personal Condor, and my own computer is the only machine in the pool. Below are the submission file and some typical, repeating excerpts from the different log files. The program runs perfectly when I run it directly. I have even removed all output to the screen to see whether that would help. Can anybody help me?

 

Thank you!

Steen Chirstesen, CAPEC, Institute of Chemical Engineering, DTU, Denmark

 

Submission file:

#

# Run fortran analysis program using condor

#

universe = vanilla

executable = COM2RDF_v1_5Condor.exe

#nice_user = True

#input files

transfer_input_files = General.inf,md3.xst,B-MA_1-7_1K_3.com,B-MA_1-7_1K.typ

#output files

transfer_output_files = rdf_B-MA_1-7_1K_3.out,rdf_B-MA_1-7_1K_3V.out

error = rdftest.err

log = rdftest.log

output = rdftest.out

 

queue

 

 

 

 

 

Log file from my submission:

001 (007.000.000) 08/23 12:47:48 Job executing on host: <192.38.89.192:1052>

...

007 (007.000.000) 08/23 12:47:49 Shadow exception!

            Can no longer talk to condor_starter on execute machine (192.38.89.192)

            0  -  Run Bytes Sent By Job

            575326720  -  Run Bytes Received By Job

...
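In case it helps anyone reading the excerpt above: each event in the job log starts with a header line made up of a three-digit event code, the job id in parentheses, a date and a time, followed by indented detail lines. Below is a minimal Python sketch for pulling out just the header lines. The regular expression and the name `parse_events` are my own assumptions for illustration, not anything shipped with Condor, and the pattern is fitted only to the lines shown here.

```python
import re

# Header of a job-log event, as it appears in the excerpt above:
# "<code> (<cluster>.<proc>.<subproc>) <MM/DD> <HH:MM:SS> <text>"
# (pattern is an assumption based only on the lines quoted in this post)
EVENT_RE = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (.*)$"
)


def parse_events(log_text):
    """Return one dict per event header line found in log_text."""
    events = []
    for line in log_text.splitlines():
        m = EVENT_RE.match(line.strip())
        if m:
            code, cluster, proc, _sub, date, time, text = m.groups()
            events.append({
                "code": code,                           # e.g. "007"
                "job": f"{int(cluster)}.{int(proc)}",   # e.g. "7.0"
                "when": f"{date} {time}",
                "text": text,
            })
    return events


sample = """\
001 (007.000.000) 08/23 12:47:48 Job executing on host: <192.38.89.192:1052>
...
007 (007.000.000) 08/23 12:47:49 Shadow exception!
            Can no longer talk to condor_starter on execute machine (192.38.89.192)
...
"""

# Indented detail lines and "..." separators do not match and are skipped.
for ev in parse_events(sample):
    print(ev["code"], ev["job"], ev["when"], ev["text"])
```

Run on the two events above, this makes it easy to see that the shadow exception (code 007) follows the execute event (code 001) by one second.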

ShadowLog:

8/23 12:31:27 ** condor_shadow (CONDOR_SHADOW) STARTING UP

8/23 12:31:27 ** C:\Condor\bin\condor_shadow.exe

8/23 12:31:27 ** $CondorVersion: 6.6.10 Jun 22 2005 $

8/23 12:31:27 ** $CondorPlatform: INTEL-WINNT50 $

8/23 12:31:27 ** PID = 3652

8/23 12:31:27 ******************************************************

8/23 12:31:27 Using config file: C:\Condor\condor_config

8/23 12:31:27 Using local config files: C:\Condor/condor_config.local

8/23 12:31:27 DaemonCore: Command Socket at <192.38.89.192:2062>

8/23 12:31:28 Initializing a VANILLA shadow

8/23 12:31:28 (7.0) (3652): Request to run on <192.38.89.192:1052> was ACCEPTED

8/23 12:32:54 (7.0) (3652): condor_read(): recv() returned -1, errno = 10054, assuming failure.

8/23 12:32:55 (7.0) (3652): DaemonCore: Can't receive command request (perhaps a timeout?)

8/23 12:32:55 (7.0) (3652): condor_read(): recv() returned -1, errno = 10054, assuming failure.

8/23 12:32:55 (7.0) (3652): ERROR "Can no longer talk to condor_starter on execute machine (192.38.89.192)" at line 63 in file ..\src\condor_shadow.V6.1\NTreceivers.C

 

 

 

 

SchedLog example:

8/23 12:32:56 Started shadow for job 7.0 on "<192.38.89.192:1052>", (shadow pid = 3512)

8/23 12:32:57 Sent ad to central manager for sch@xxxxxxxxxxxxxxxxxxxxxxx

8/23 12:34:21 DaemonCore: Command received via UDP from host <192.38.89.192:2102>

8/23 12:34:21 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())

8/23 12:34:21 Shadow pid 3512 for job 7.0 exited with status 4