
[Condor-users] a job submitted to Condor-G never stops running



Hi,

Has anybody ever had a problem like this? I submit the following submit file:

universe = grid
grid_type = gt2
globusscheduler = zen.citi.umich.edu/jobmanager-condor
executable = sh_loop
arguments = 600
x509userproxy = /tmp/x509_proxy_aglo
should_transfer_files = true
whentotransferoutput = on_exit
MyProxyHost     = yoga.citi.umich.edu:7512
MyProxyPassword = foobar
MyProxyServerDN = /C=US/ST=Michigan/L=Ann Arbor/O=University of Michigan/OU=CITI Production KCA/CN=myproxy/yoga.citi.umich.edu/emailAddress=aglo@xxxxxxxxxxxxxx
MyProxyNewProxyLifetime = 5
MyProxyRefreshThreshold = 180
error = script6.err
output = script6.out
log = script6.log
queue

The job starts and finishes successfully in Condor. The StarterLog shows:

7/11 17:19:33 Starting a VANILLA universe job with ID: 42.0
7/11 17:19:33 IWD: /home/aglo/gram_scratch_UW0W4vAkqk
7/11 17:19:33 Output file: /home/aglo/.globus/job/zen.citi.umich.edu/28687.1152652478/stdout
7/11 17:19:33 Error file: /home/aglo/.globus/job/zen.citi.umich.edu/28687.1152652478/stderr
7/11 17:19:33 Using wrapper /usr/local/bin/spkm3-wrapper to exec condor_exec.exe /home/aglo/.globus/.gass_cache/local/md5/32/0e2d7cf69a8d3a612c2a49374d3087/md5/5a/53a49f318032335ea7bf518346b3f4/data 600
7/11 17:19:33 Create_Process succeeded, pid=28812
7/11 17:29:41 Process exited, pid=28812, status=0
7/11 17:29:42 Got SIGQUIT.  Performing fast shutdown.
7/11 17:29:42 ShutdownFast all jobs.
7/11 17:29:42 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
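To confirm mechanically that the execute side really finished, one can pair each "Create_Process succeeded" line with its "Process exited" line. The helper below is a hypothetical sketch (`exit_statuses` is not part of Condor) and assumes the StarterLog line formats are exactly as in the excerpt above:

```python
import re

# Lines copied from the StarterLog excerpt above (job 42.0).
STARTER_LOG = """\
7/11 17:19:33 Create_Process succeeded, pid=28812
7/11 17:29:41 Process exited, pid=28812, status=0
7/11 17:29:42 Got SIGQUIT.  Performing fast shutdown.
7/11 17:29:42 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
"""

# Assumed line formats, taken from the log excerpt.
STARTED = re.compile(r"Create_Process succeeded, pid=(\d+)")
EXITED = re.compile(r"Process exited, pid=(\d+), status=(-?\d+)")


def exit_statuses(log_text):
    """Return {pid: exit_status} for every process the starter launched.

    A pid mapped to None was started, but no exit line was found for it.
    """
    result = {}
    for line in log_text.splitlines():
        m = STARTED.search(line)
        if m:
            result.setdefault(int(m.group(1)), None)
            continue
        m = EXITED.search(line)
        if m:
            result[int(m.group(1))] = int(m.group(2))
    return result


print(exit_statuses(STARTER_LOG))  # {28812: 0}
```

Here the single launched process exited with status 0, which matches the starter's own "EXITING WITH STATUS 0".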

Yet the job's status in Condor's queue never changes from "running". The GridmanagerLog keeps printing a continuous stream of messages that look like queries for results:

7/11 17:59:30 [28682] Using constraint ((Owner=?="aglo"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
7/11 17:59:30 [28682] Fetched 0 job ads from schedd
7/11 17:59:30 [28682] leaving doContactSchedd()
7/11 17:59:33 [28682] GAHP[28683] <- 'RESULTS'
7/11 17:59:33 [28682] GAHP[28683] -> 'S' '0'
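For reference, the constraint in that log can be read as plain logic. In Condor, JobUniverse 9 is the grid universe and JobStatus 2/3/4/5 mean Running/Removed/Completed/Held; `=?=` and `=!=` are the ClassAd "meta-equals" operators, which never evaluate to UNDEFINED. The sketch below is a hypothetical re-expression of that one constraint (the function names are illustrative, not Condor APIs), using None for UNDEFINED:

```python
GRID_UNIVERSE = 9  # JobUniverse value for the grid universe


def meta_eq(a, b):
    """ClassAd =?= : true only if both sides have the same type and value.

    None stands in for UNDEFINED, so UNDEFINED =?= UNDEFINED is true.
    """
    return type(a) is type(b) and a == b


def gridmanager_wants(ad, owner="aglo"):
    """Mirror of the constraint shown in the GridmanagerLog excerpt."""
    status = ad.get("JobStatus")
    managed = ad.get("Managed")
    return (
        meta_eq(ad.get("Owner"), owner)
        and ad.get("JobUniverse") == GRID_UNIVERSE
        and not meta_eq(managed, "ScheddDone")
        # 3 = Removed, 4 = Completed, 5 = Held (only if externally managed)
        and (status == 3 or status == 4
             or (status == 5 and meta_eq(managed, "External")))
    )


running = {"Owner": "aglo", "JobUniverse": 9, "JobStatus": 2}
completed = {"Owner": "aglo", "JobUniverse": 9, "JobStatus": 4}
print(gridmanager_wants(running))    # False
print(gridmanager_wants(completed))  # True
```

Read this way, this particular query only selects removed, completed, or externally managed held jobs, so a job the schedd still lists as Running would not be among the fetched ads.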

The job never gets done. Any ideas why? Thanks.