
Re: [Condor-users] Job disconnected, attempting to reconnect



Hi Simon,

You can try this:

Your current .cmd file:

Executable = fac.py
Universe        = vanilla
Output  = out.$(cluster)
Error  = err.$(cluster)
Log  = log.$(cluster)
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue


New .cmd file:

Executable = fac.sh
Universe        = vanilla
Output  = out.$(cluster)
Error  = err.$(cluster)
Log  = log.$(cluster)
should_transfer_files = YES
transfer_input_files = fac.py
Arguments = fac.py
when_to_transfer_output = ON_EXIT
Queue

fac.sh:

#!/bin/bash

python "$1"

Python must be installed on all nodes.
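
On the question about print: Condor writes whatever the job sends to standard output into the file named by Output, so fac.py does not need to open that file itself. A minimal sketch of fac.py along those lines follows. It is my own variant, not the original script: it uses a loop instead of recursion (an assumption about what is acceptable here), and it calls print with a single parenthesized argument so the same file runs under Python 2 and Python 3.

```python
# fac.py (illustrative variant, not the script from the original post)

def fac(n):
    """Return n! using a loop, so large n does not hit the recursion limit."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


if __name__ == "__main__":
    # Goes to stdout; Condor stores stdout in the file named by Output.
    print(fac(400))
```

Running "python fac.py" by hand on an execute node before submitting is a quick way to confirm that Python is on the path there.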

Best regards


On Mon, 7 May 2007 03:58:22 -0700 (PDT), simon kagwe wrote

> Hi everyone,
> I have just installed Condor 6.8.4 on 2 Windows 2000
> machines. I am submitting a simple python script
> (fac.py that calculates factorials) using the
> following submit description file:
>
> # file : test.condor
> # For testing submission of a python script on Condor
>
> Executable = fac.py
> Universe        = vanilla
> Output  = out.$(cluster)
> Error  = err.$(cluster)
> Log  = log.$(cluster)
> should_transfer_files = YES
> when_to_transfer_output = ON_EXIT
> Queue
>
> I am getting the following messages in my log file:
> 000 (002.000.000) 05/05 13:47:42 Job submitted from
> host: <10.2.28.73:2798>
> ...
> 001 (002.000.000) 05/05 13:48:28 Job executing on
> host: <10.2.28.73:2799>
> ...
> 022 (002.000.000) 05/05 13:48:28 Job disconnected,
> attempting to reconnect
>    Socket between submit and execute hosts closed
> unexpectedly
>    Trying to reconnect to
> lab121machine6.icsdomain.uonbi.ac.ke <10.2.28.73:2799>
> ...
> 024 (002.000.000) 05/05 13:48:54 Job reconnection
> failed
>    Job not found at execution machine
>    Can not reconnect to
> lab121machine6.icsdomain.uonbi.ac.ke, rescheduling job
>
> Please help me understand why the job would fail to
> reconnect when it's being executed on the same machine
> it was submitted from. I also have a Personal Condor
> installation on another machine that gives me similar
> log messages.
>
> By the way, according to the Condor installation
> manual, after specifying that I am installing a new
> pool, I am supposed to be asked the number of machines
> in the pool. That is not happening with any of my
> installations. Is there a problem with the MSI file I
> am using?
>
> The fac.py looks like this:
> def fac(n):
>     if n == 0:
>         return 1
>     if n == 1:
>         return 1
>     else:
>         return fac(n-1)*n
> print fac(400)
>
> Is the output of the 'print' statement going to be
> placed in the designated output file or do I have to
> place it in the file myself within the fac.py code? My
> assumption is that since python is installed on all
> the execute machines and it is added to the system
> path, fac.py will run as an executable. Is my
> assumption correct?
>
> I know it's a lot of questions but I really need your
> help. Thank you.
>