
Re: [Condor-users] condor problem : shadow unable to transmit output file



Hi,

Take out the "transfer_output_files" line -- the file
"true" does not exist on the worker side.

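In general, you do not need to tell Condor to transfer
back the output files, or list the files to be
returned: with should_transfer_files = YES, it returns
the files the job creates or modifies in its scratch
directory on its own.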
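
For reference, a sketch of your submit file with only
that one line dropped and everything else left as you
posted it (I have not run this against your pool):

```
# Test job from the original message, minus transfer_output_files.
executable = /bin/hostname
universe = vanilla
TransferExecutable = true
output=results.output.$(Process)
error=results.error.$(Process)
log=results.log.$(Process)
should_transfer_files = YES
when_to_transfer_output = ON_EXIT_OR_EVICT
queue 5
```

transfer_output_files takes a comma-separated list of
file names, not a boolean, which is why the starter
went looking for a file literally named "true" in the
execute directory.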

OC.

--- USTV_condor_Task_Force USTV_condor_Task_Force
<ustv.condor.task.force@xxxxxxxxx> wrote:

> Hello, we are building a test grid in order to
> harness the processing power of all our lab
> computers, and we ran into a problem we are
> unable to solve.
> 
> Our pool currently consists of:
> licinfo10.uni LINUX       INTEL  Owner      Idle       0.000   502  0+00:10:02 - ubuntu edgy eft
> licinfo11.uni LINUX       INTEL  Owner      Idle       0.000   502  0+00:10:01 - ubuntu edgy eft
> vm1@moua      LINUX       INTEL  Owner      Idle       0.060   504  0+00:08:24 - RH FC 6
> vm2@moua      LINUX       INTEL  Owner      Idle       0.000   504  0+00:08:25
> vm1@nocte     LINUX       INTEL  Owner      Idle       0.270   506  0+00:10:09 - debian sid
> vm2@nocte     LINUX       INTEL  Owner      Idle       0.000   506  0+00:10:10
> vm1@nous      LINUX       INTEL  Owner      Idle       0.070   505  0+00:10:09 - ubuntu feisty fawn
> vm2@nous      LINUX       INTEL  Owner      Idle       0.000   505  0+00:10:10
> 
> I tried a test submit file I got from this mailing list:
> 
> executable = /bin/hostname
> universe = vanilla
> TransferExecutable = true
> transfer_output_files= true
> output=results.output.$(Process)
> error=results.error.$(Process)
> log=results.log.$(Process)
> should_transfer_files = YES
> when_to_transfer_output = ON_EXIT_OR_EVICT
> queue 5
> 
> Our problem is that all our jobs go quickly
> from the idle to the held state,
> and all our job logs say:
> 
> 
> 000 (001.003.000) 06/05 14:07:02 Job submitted from host: <10.9.185.29:38947>
> ...
> 001 (001.003.000) 06/05 14:17:11 Job executing on host: <10.9.185.211:42641>
> ...
> 007 (001.003.000) 06/05 14:17:11 Shadow exception!
>         Error from starter on licinfo11.xxx: STARTER at 10.9.185.211 failed to send file(s) to <10.9.185.29:60059>: error reading from /condor/licinfo11/execute/dir_9027/true: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <10.9.185.211:53966>
>         0  -  Run Bytes Sent By Job
>         8572  -  Run Bytes Received By Job
> ...
> 012 (001.003.000) 06/05 14:17:11 Job was held.
>         Error from starter on licinfo11.xxx: STARTER at 10.9.185.211 failed to send file(s) to <10.9.185.29:60059>: error reading from /condor/licinfo11/execute/dir_9027/true: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <10.9.185.211:53966>
>         Code 13 Subcode 2
> ...
> 
> I have
> LOCAL_DIR        = /condor/$(HOSTNAME)
> and previously had
> #LOCAL_DIR        = $(RELEASE_DIR)/hosts/$(HOSTNAME)
> 
> I changed it so that the local dir is local to
> each node, because I saw on the mailing list
> that remote local dirs can cause problems
> if the machines aren't correctly
> time-synchronised (our /home/condor is
> NFS-shared among all our nodes).
> 
> Additional info: all our UIDs are shared among our
> hosts.
> 
> Apparently Condor doesn't manage to create the
> directories in $(LOCAL_DIR)/execute
> (which I chmod'ed to be world-writable) to send
> them back.
> 
> Hope somebody can help :)
> 
> The USTV Condor Task Force
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to
> condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
> 
> The archives can be found at: 
> https://lists.cs.wisc.edu/archive/condor-users/
> 


