
Re: [Condor-users] a job submited to condor-g never stops running




I don't know of a specific cause for this sort of behavior. You may want to compare how the job behaves with and without the grid monitor enabled in your Condor configuration. For example, to disable it:

ENABLE_GRID_MONITOR = False

--Dan

On Jul 11, 2006, at 5:03 PM, Olga Kornievskaia wrote:

Hi,

Has anybody ever had a problem like this? I submit the following script:

universe = grid
grid_type = gt2
globusscheduler = zen.citi.umich.edu/jobmanager-condor
executable = sh_loop
arguments = 600
x509userproxy = /tmp/x509_proxy_aglo
should_transfer_files = true
whentotransferoutput = on_exit
MyProxyHost     = yoga.citi.umich.edu:7512
MyProxyPassword = foobar
MyProxyServerDN = /C=US/ST=Michigan/L=Ann Arbor/O=University of Michigan/OU=CITI Production KCA/CN=myproxy/yoga.citi.umich.edu/emailAddress=aglo@xxxxxxxxxxxxxx
MyProxyNewProxyLifetime = 5
MyProxyRefreshThreshold = 180
error = script6.err
output = script6.out
log = script6.log
queue

The job successfully starts and finishes in Condor. StarterLog has:

7/11 17:19:33 Starting a VANILLA universe job with ID: 42.0
7/11 17:19:33 IWD: /home/aglo/gram_scratch_UW0W4vAkqk
7/11 17:19:33 Output file: /home/aglo/.globus/job/zen.citi.umich.edu/28687.1152652478/stdout
7/11 17:19:33 Error file: /home/aglo/.globus/job/zen.citi.umich.edu/28687.1152652478/stderr
7/11 17:19:33 Using wrapper /usr/local/bin/spkm3-wrapper to exec condor_exec.exe /home/aglo/.globus/.gass_cache/local/md5/32/0e2d7cf69a8d3a612c2a49374d3087/md5/5a/53a49f318032335ea7bf518346b3f4/data 600
7/11 17:19:33 Create_Process succeeded, pid=28812
7/11 17:29:41 Process exited, pid=28812, status=0
7/11 17:29:42 Got SIGQUIT.  Performing fast shutdown.
7/11 17:29:42 ShutdownFast all jobs.
7/11 17:29:42 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
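
A quick way to confirm from a StarterLog that the job process really did exit, and with what status, is to scan for the "Process exited" lines. The following is only a minimal sketch: the function name `find_process_exits` and the regular expression are illustrative assumptions based on the log format shown above, not part of Condor.

```python
import re

# Matches StarterLog lines of the form seen in the excerpt above, e.g.
#   7/11 17:29:41 Process exited, pid=28812, status=0
# (the pattern is an assumption derived from that excerpt only)
EXIT_RE = re.compile(
    r"^(\d+/\d+ \d+:\d+:\d+) Process exited, pid=(\d+), status=(-?\d+)"
)

def find_process_exits(log_text):
    """Return a (timestamp, pid, status) tuple for each 'Process exited' line."""
    exits = []
    for line in log_text.splitlines():
        m = EXIT_RE.match(line.strip())
        if m:
            exits.append((m.group(1), int(m.group(2)), int(m.group(3))))
    return exits

sample = """\
7/11 17:19:33 Create_Process succeeded, pid=28812
7/11 17:29:41 Process exited, pid=28812, status=0
7/11 17:29:42 Got SIGQUIT.  Performing fast shutdown.
"""
print(find_process_exits(sample))
```

For the excerpt above this reports a single exit of pid 28812 with status 0, which matches the observation that the job finished on the execute side.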

Yet the job's status in Condor's queue never changes from "running".
GridmanagerLog happily spews out a continuous stream of messages that contain
what look like queries for results:

7/11 17:59:30 [28682] Using constraint ((Owner=?="aglo"&&JobUniverse==9)) && ((Managed =!= "ScheddDone")) && (JobStatus == 3 || JobStatus == 4 || (JobStatus == 5 && Managed =?= "External"))
7/11 17:59:30 [28682] Fetched 0 job ads from schedd
7/11 17:59:30 [28682] leaving doContactSchedd()
7/11 17:59:33 [28682] GAHP[28683] <- 'RESULTS'
7/11 17:59:33 [28682] GAHP[28683] -> 'S' '0'
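
For reading the constraint in the GridmanagerLog excerpt above: the JobStatus values 3, 4 and 5 are the standard Condor codes for removed, completed and held jobs, while a running job has JobStatus 2. The sketch below mirrors only the JobStatus/Managed part of that constraint (the Owner and JobUniverse clauses are left out), and the helper name `matches_poll_constraint` is a hypothetical one chosen for illustration.

```python
# Standard Condor JobStatus codes
JOB_STATUS = {1: "Idle", 2: "Running", 3: "Removed", 4: "Completed", 5: "Held"}

def matches_poll_constraint(job_status, managed=None):
    """Mirror the JobStatus/Managed clauses of the logged constraint.

    managed=None stands for an undefined Managed attribute.
    The Owner and JobUniverse clauses are deliberately ignored here.
    """
    if managed == "ScheddDone":          # Managed =!= "ScheddDone"
        return False
    return job_status in (3, 4) or (job_status == 5 and managed == "External")

for code, name in sorted(JOB_STATUS.items()):
    print(code, name, matches_poll_constraint(code))
```

A job that is still marked running (JobStatus 2) does not match, so "Fetched 0 job ads from schedd" is consistent with the queue still showing the job as running. It does not by itself explain why the status is never updated.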

The job never gets done. Any ideas why? Thanks.