
[Condor-users] grid_resource = pbs problems with 7.4.2



Hi,

I am using this version of Condor:

[skoranda@coma2 ~]$ condor_version
$CondorVersion: 7.4.2 Mar 29 2010 BuildID: 227044 $
$CondorPlatform: X86_64-LINUX_RHEL5 $

When I submit this job using condor_submit

[skoranda@coma2 ~]$ cat test02.sub
universe = grid
grid_resource = pbs
transfer_executable = False
output = test02.$(Process).out
error  = test02.$(Process).err
log    = test02.log
executable = /usr/bin/whoami
queue 10

pbs_submit.sh generates a PBS submit script like the one
attached. There are a few problems with that script:

1) The PBS cluster uses a shared file system, so there is no
need for the staging commands:

#PBS -W stagein=whoami@xxxxxxxxxxx:/usr/bin/whoami
#PBS -W stageout=home_bl_07a94ebd3a6c/test02.7.out@xxxxxxxxxxx:/home/skoranda/test02.7.out,home_bl_07a94ebd3a6c/test02.7.err@xxxxxxxxxxx:/home/skoranda/test02.7.err

I cannot find an option, either in the condor_submit file or
in the GAHP configuration, to turn staging off. Is there some
way to do that?

2) The PBS submit script includes this line:

mv whoami $new_home &>/dev/null

This is a race condition: once the first PBS script in the
job cluster has done the 'mv', the file is gone and no other
job can move it. Because the output of 'mv' is redirected to
/dev/null, the failure is silent.

Is this a bug? Is there some way to turn this off?
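
To illustrate what I mean, here is a minimal sketch that mimics
the 'mv' step outside of PBS. It assumes a single staged copy of
the executable in the shared starting directory. The temp
directory and the home_job_N names are made up for the example,
and the wrappers are run one after another rather than
concurrently (concurrency only makes the winner unpredictable).

```shell
#!/bin/bash
# Sketch: mimic several wrappers running
#   mv whoami $new_home &>/dev/null
# from the same starting directory. Everything lives in a
# throwaway temp directory; the job names are hypothetical.

simulate_wrappers() {
    local workdir moved=0 i new_home
    workdir=$(mktemp -d) || return 1

    # Stand-in for the single staged-in copy of the executable.
    printf '#!/bin/sh\necho hello\n' > "$workdir/whoami"

    for i in 1 2 3; do
        new_home="$workdir/home_job_$i"
        mkdir "$new_home"
        # Same command as in the generated wrapper, run from the
        # shared starting directory. Errors are discarded.
        ( cd "$workdir" && mv whoami "$new_home" &>/dev/null )
        # Count the wrappers that actually got the executable.
        [ -e "$new_home/whoami" ] && moved=$((moved + 1))
    done

    rm -rf "$workdir"
    echo "$moved"
}

simulate_wrappers   # prints 1
```

With three simulated wrappers only one ends up with the
executable, and because of the &>/dev/null the other two give
no hint that the 'mv' failed.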

Thanks,

Scott
#!/bin/bash
# PBS job wrapper generated by pbs_submit.sh
# on Tue May 18 14:46:10 BST 2010
#
# stgcmd = yes
# proxy_string = 
# proxy_local_file = 
#
# PBS directives:
#PBS -S /bin/bash
#PBS -o /dev/null
#PBS -e /dev/null
#PBS -W stagein=whoami@xxxxxxxxxxx:/usr/bin/whoami
#PBS -W stageout=home_bl_07a94ebd3a6c/test02.7.out@xxxxxxxxxxx:/home/skoranda/test02.7.out,home_bl_07a94ebd3a6c/test02.7.err@xxxxxxxxxxx:/home/skoranda/test02.7.err
#PBS -m n
new_home=`pwd`/home_bl_07a94ebd3a6c
mkdir $new_home
mv whoami $new_home &>/dev/null
export HOME=$new_home
cd $new_home

# Command to execute:
if [ ! -x ./whoami ]; then chmod u+x ./whoami; fi
$new_home/whoami  </dev/null >test02.7.out 2>test02.7.err &
job_pid=$!

# Wait for the user job to finish
wait $job_pid
user_retcode=$?


exit $user_retcode
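
For problem 1), the lines I would like to be able to suppress
in the wrapper above are the two '#PBS -W' staging directives.
As a sketch only (strip_staging is a made-up helper, not part
of Condor or the GAHP), filtering them out of a generated
script would look like this:

```shell
#!/bin/bash
# Sketch: remove the PBS staging directives from a generated
# wrapper read on stdin. strip_staging is a hypothetical helper.

strip_staging() {
    # Drop "#PBS -W stagein=..." and "#PBS -W stageout=..."
    # lines and pass every other line through unchanged.
    grep -v -E '^#PBS -W stage(in|out)='
}

# Small demonstration on three lines taken from the wrapper.
printf '%s\n' \
    '#PBS -S /bin/bash' \
    '#PBS -W stagein=whoami@xxxxxxxxxxx:/usr/bin/whoami' \
    '#PBS -m n' | strip_staging
# prints:
#   #PBS -S /bin/bash
#   #PBS -m n
```

Note that the wrapper writes its output into the temporary
home directory and relies on stageout to deliver it, so
dropping the directives alone is not enough; the output paths
would need to change as well.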