
[HTCondor-users] Condor/SGE cluster



Hi,

I'm trying to set up Condor so that I can submit jobs to a local SGE cluster. The SGE cluster is already up and running, and I can execute vanilla universe Condor jobs (e.g. "/usr/bin/condor_run -u vanilla -a periodic_remove=JobStatus==5 /bin/hostname &"). But if I try to submit a grid universe job (grid_resource=sge), the job always ends up in the held state.
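
For illustration, a grid universe submission of the kind described above could use a submit description like the one rendered by the sketch below. This is not the actual submit file used here: the helper render_submit, the executable, and the output/error/log file names are all assumptions made for the example. Only universe = grid and grid_resource = sge come from the description above.

```python
# Hypothetical helper (not part of HTCondor): renders a dict of
# attributes as the text of an HTCondor submit description file.
def render_submit(attrs, queue=1):
    lines = [f"{key} = {value}" for key, value in attrs.items()]
    # A bare "queue" statement queues a single job.
    lines.append("queue" if queue == 1 else f"queue {queue}")
    return "\n".join(lines) + "\n"


# Assumed job: run /bin/hostname through the SGE grid resource.
grid_job = {
    "universe": "grid",
    "grid_resource": "sge",
    "executable": "/bin/hostname",
    "output": "hostname.out",  # placeholder file names
    "error": "hostname.err",
    "log": "hostname.log",
}

submit_text = render_submit(grid_job)
print(submit_text)
```

The printed text could be saved to a file and passed to condor_submit.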

condor_q -analyze
Hold reason: Attempts to submit failed:

cat /var/log/condor/GridmanagerLog.xxx
01/06/14 13:20:03 [27666] Found job 77.0 --- inserting
01/06/14 13:20:03 [27666] gahp server not up yet, delaying ping
01/06/14 13:20:03 [27666] (77.0) doEvaluateState called: gmState GM_INIT, remoteState 0
01/06/14 13:20:03 [27666] GAHP server pid = 27673
01/06/14 13:20:08 [27666] resource  is now up
01/06/14 13:20:08 [27666] (77.0) doEvaluateState called: gmState GM_SAVE_SANDBOX_ID, remoteState 0
01/06/14 13:20:08 [27666] (77.0) doEvaluateState called: gmState GM_SUBMIT, remoteState 0
01/06/14 13:20:08 [27666] (77.0) blah_job_submit() failed: submission command failed (exit code = 1) (stdout:) (stderr:)
01/06/14 13:20:13 [27666] (77.0) doEvaluateState called: gmState GM_CLEAR_REQUEST, remoteState 0
01/06/14 13:20:18 [27666] (77.0) doEvaluateState called: gmState GM_SAVE_SANDBOX_ID, remoteState 0
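
To make the sequence in the log excerpt easier to follow, here is a minimal sketch that pulls the per-job gmState transitions and any "failed" messages out of Gridmanager log lines. The regular expressions are an assumption based only on the line layout shown above (timestamp, [pid], optional (cluster.proc), message), and the function name summarize is hypothetical.

```python
import re

# Assumed layout: "MM/DD/YY HH:MM:SS [pid] (cluster.proc) message"
LINE_RE = re.compile(
    r"^(?P<stamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<pid>\d+)\] "
    r"(?:\((?P<job>\d+\.\d+)\) )?"
    r"(?P<msg>.*)$"
)
STATE_RE = re.compile(r"gmState (\w+)")


def summarize(lines):
    """Return (states per job, list of (job, failure message))."""
    states = {}
    failures = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("job") is None:
            continue  # skip lines that are not tied to a specific job
        job, msg = match.group("job"), match.group("msg")
        state = STATE_RE.search(msg)
        if state:
            states.setdefault(job, []).append(state.group(1))
        elif "failed" in msg:
            failures.append((job, msg))
    return states, failures


sample = [
    "01/06/14 13:20:03 [27666] Found job 77.0 --- inserting",
    "01/06/14 13:20:03 [27666] (77.0) doEvaluateState called: gmState GM_INIT, remoteState 0",
    "01/06/14 13:20:08 [27666] (77.0) doEvaluateState called: gmState GM_SUBMIT, remoteState 0",
    "01/06/14 13:20:08 [27666] (77.0) blah_job_submit() failed: submission command failed (exit code = 1) (stdout:) (stderr:)",
    "01/06/14 13:20:13 [27666] (77.0) doEvaluateState called: gmState GM_CLEAR_REQUEST, remoteState 0",
]

states, failures = summarize(sample)
print(states)
print(failures)
```

On the excerpt above, this shows job 77.0 reaching GM_SUBMIT, failing in blah_job_submit() with exit code 1 and empty stdout/stderr, and then going back through GM_CLEAR_REQUEST.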

Best regards,
Lukas