
[Condor-users] How to submit a job via SOAP API



I am new to Condor. I have successfully set up a personal Condor (version 7.0.0) and submitted and run some simple Java and C jobs from the command line. I then tried to submit jobs through a SOAP client written in Java, following the IBM tutorial article. Condor seems to receive the job, but it always leaves the job "idle".

Here is the Java code I used to submit a job:

			String[] files = new String[1];
			files[0] = "/workspace/condor/jobs/submit.java";

			WebServicesHelper.submitJobHelper(schedd, "aa0586", UniverseType.JAVA, "java", "Simple 4 10", null, files);

submit.java is a submit description file that works fine with the command "condor_submit submit.java". Its contents are shown below:

Universe   = java
Executable = Simple.class
Arguments  = Simple 4 10
Log        = simple.log
Output     = simple.out
Error      = simple.error
Queue
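The submit description above is a plain list of `key = value` commands followed by `Queue`, and the SOAP helper needs the same information spelled out as method arguments. A minimal sketch for reading those commands into a map, assuming the file uses only this simple form (no macros, continuation lines or multiple `Queue` statements); `SubmitFileParser` is a hypothetical name, not part of Condor:

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: reads the "key = value" commands of a simple submit
// description file such as submit.java. Assumes no macros, no line
// continuations and a single Queue statement (which is skipped).
final class SubmitFileParser {
    private SubmitFileParser() {}

    static Map<String, String> parse(String text) {
        // LinkedHashMap keeps the commands in file order.
        Map<String, String> commands = new LinkedHashMap<>();
        for (String rawLine : text.split("\\R")) {
            String line = rawLine.trim();
            // Skip blank lines and comments.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                // Lines without '=' (such as "Queue") are not key = value commands.
                continue;
            }
            // Submit command names are case-insensitive, so store them in lower case.
            String key = line.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(eq + 1).trim();
            commands.put(key, value);
        }
        return commands;
    }
}
```

Passing the text of submit.java to `parse` yields entries such as `executable -> Simple.class` and `arguments -> Simple 4 10`, which are the values the SOAP call would have to reproduce.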

Can anyone tell me how I should pass the parameters to WebServicesHelper.submitJobHelper()? I believe this source code is provided by the Condor group, with a method signature like:

	public static void submitJobHelper(CondorScheddPortType schedd,
			String owner, UniverseType type, String cmd, String args,
			String requirements, String[] files) throws JobSubmissionException,
			SendFileException, java.io.IOException, java.rmi.RemoteException {
}
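The signature does not say what `cmd` and `files` mean. The shadow log further down shows the shadow trying to upload `/home/aa0586/pool/spool/cluster9.proc0.subproc0/java`, which suggests that `cmd` is treated as the name of the executable file in the job's spool directory rather than as the program to launch. Under that assumption (not confirmed by any documentation quoted here), a hedged sketch of how the commands in submit.java might map onto the helper's parameters: `cmd` becomes the `Executable` (`Simple.class`), `args` the `Arguments` line, and `files` the local path of `Simple.class` instead of the submit file itself. `HelperArgs` and `fromSubmitCommands` are hypothetical and do not call Condor:

```java
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical value holder for the arguments one might pass to
// WebServicesHelper.submitJobHelper(schedd, owner, type, cmd, args,
// requirements, files). It does not call Condor; it only shows the mapping.
final class HelperArgs {
    final String universe;     // would become a UniverseType, e.g. UniverseType.JAVA
    final String cmd;          // ASSUMPTION: the executable file to transfer, not "java"
    final String args;         // the Arguments line (main class first for the Java universe)
    final String requirements; // null, as in the original call
    final String[] files;      // ASSUMPTION: local files to upload to the job's spool

    private HelperArgs(String universe, String cmd, String args,
                       String requirements, String[] files) {
        this.universe = universe;
        this.cmd = cmd;
        this.args = args;
        this.requirements = requirements;
        this.files = files;
    }

    // Builds the arguments from submit commands (lower-case keys) and the
    // local directory assumed to contain the executable.
    static HelperArgs fromSubmitCommands(Map<String, String> commands, String jobDir) {
        String executable = commands.get("executable");
        if (executable == null) {
            throw new IllegalArgumentException("submit commands have no Executable");
        }
        return new HelperArgs(
                commands.get("universe"),
                executable,                                   // e.g. "Simple.class"
                commands.getOrDefault("arguments", ""),
                null,
                new String[] { Paths.get(jobDir, executable).toString() });
    }
}
```

Assuming Simple.class sits next to submit.java in /workspace/condor/jobs, this gives `cmd = "Simple.class"`, `args = "Simple 4 10"` and `files = {"/workspace/condor/jobs/Simple.class"}`, which would then be passed as `submitJobHelper(schedd, "aa0586", UniverseType.JAVA, a.cmd, a.args, a.requirements, a.files)`.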

I have also included the log files below for analysis.

Thanks and regards,

Zhifeng

condor_q shows the job as idle:
-- Submitter: localhost.localdomain :  : localhost.localdomain
 ID      OWNER/NODENAME   SUBMITTED     RUN_TIME ST PRI SIZE CMD
   9.0   aa0586          3/19 21:58   0+00:00:00 I  0   0.0  java Simple 4 10
1 jobs; 1 idle, 0 running, 0 held

In Negotiator.log, it seems that negotiation is aborted partway through:

3/19 22:05:33 ---------- Started Negotiation Cycle ----------
3/19 22:05:33 Phase 1:  Obtaining ads from collector ...
3/19 22:05:33   Getting all public ads ...
3/19 22:05:33   Sorting 6 ads ...
3/19 22:05:33   Getting startd private ads ...
3/19 22:05:33 Got ads: 6 public and 2 private
3/19 22:05:33 Public ads include 1 submitter, 2 startd
3/19 22:05:33 Phase 2:  Performing accounting ...
3/19 22:05:33 Phase 3:  Sorting submitter ads by priority ...
3/19 22:05:33 Phase 4.1:  Negotiating with schedds ...
3/19 22:05:33   Negotiating with aa0586@localdomain at 
3/19 22:05:33 0 seconds so far
3/19 22:05:33     Request 00009.00000:
3/19 22:05:33       Matched 9.0 aa0586@localdomain  preempting none  slot1@xxxxxxxxxxxxxxxxxxxxx
3/19 22:05:33       Successfully matched with slot1@xxxxxxxxxxxxxxxxxxxxx
3/19 22:05:33     Got NO_MORE_JOBS;  done negotiating
3/19 22:05:33 ---------- Finished Negotiation Cycle ----------

And starter.log indicates a signal error:

3/19 22:05:33 slot1: match_info called
3/19 22:05:33 slot1: Received match #1205981602#4#...
3/19 22:05:33 slot1: State change: match notification protocol successful
3/19 22:05:33 slot1: Changing state: Unclaimed -> Matched
3/19 22:05:33 slot1: Request accepted.
3/19 22:05:33 slot1: Remote owner is aa0586@localdomain
3/19 22:05:33 slot1: State change: claiming protocol successful
3/19 22:05:33 slot1: Changing state: Matched -> Claimed
3/19 22:05:35 slot1: Got activate_claim request from shadow ()
3/19 22:05:36 slot1: Remote job ID is 9.0
3/19 22:05:36 slot1: Got universe "JAVA" (10) from request classad
3/19 22:05:36 slot1: State change: claim-activation protocol successful
3/19 22:05:36 slot1: Changing activity: Idle -> Busy
3/19 22:05:36 slot1: Called deactivate_claim_forcibly()
3/19 22:05:36 attempt to connect to  failed: Connection refused (connect errno = 111).
3/19 22:05:36 Send_Signal: ERROR sending signal 3 (SIGQUIT) to pid 3517 (still alive)
3/19 22:05:36 slot1: Error sending signal to starter, errno = 25 (Inappropriate ioctl for device)
3/19 22:05:37 Starter pid 3517 exited with status 4
3/19 22:05:37 slot1: State change: starter exited
3/19 22:05:37 slot1: Changing activity: Busy -> Idle
3/19 22:05:37 slot1: State change: received RELEASE_CLAIM command
3/19 22:05:37 slot1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
3/19 22:05:37 slot1: State change: No preempting claim, returning to owner
3/19 22:05:37 slot1: Changing state and activity: Preempting/Vacating -> Owner/Idle
3/19 22:05:37 slot1: State change: IS_OWNER is false
3/19 22:05:37 slot1: Changing state: Owner -> Unclaimed


And the shadow log looks like this:
3/19 22:05:35 ******************************************************
3/19 22:05:35 ** condor_shadow (CONDOR_SHADOW) STARTING UP
3/19 22:05:35 ** /usr/local/condor/sbin/condor_shadow
3/19 22:05:35 ** $CondorVersion: 7.0.0 Jan 22 2008 BuildID: 72173 $
3/19 22:05:35 ** $CondorPlatform: I386-LINUX_RHEL3 $
3/19 22:05:35 ** PID = 3516
3/19 22:05:35 ** Log last touched 3/19 21:55:41
3/19 22:05:35 ******************************************************
3/19 22:05:35 Using config source: /usr/local/condor/etc/condor_config
3/19 22:05:35 Using local config sources: 
3/19 22:05:35    /home/aa0586/pool/condor_config.local
3/19 22:05:35 DaemonCore: Command Socket at 
3/19 22:05:35 Initializing a JAVA shadow for job 9.0
3/19 22:05:36 (9.0) (3516): Request to run on  was ACCEPTED
3/19 22:05:36 (9.0) (3516): ReliSock::put_file_with_permissions(): Failed to stat file '/home/aa0586/pool/spool/cluster9.proc0.subproc0/java': No such file or directory (errno: 2, si_error: 1)
3/19 22:05:36 (9.0) (3516): DoUpload: (Condor error code 13, subcode 2) SHADOW at 192.168.0.20 failed to send file(s) to : error reading from /home/aa0586/pool/spool/cluster9.proc0.subproc0/java: (errno 2) No such file or directory; STARTER failed to receive file(s) from 
3/19 22:05:36 (9.0) (3516): Job 9.0 going into Hold state (code 13,2): Error from starter on slot1@xxxxxxxxxxxxxxxxxxxxx: STARTER failed to receive file(s) from 
3/19 22:05:36 (9.0) (3516): ZKM: setting default map to (null)
3/19 22:05:36 (9.0) (3516): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 112
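The decisive line in the shadow log is the `put_file_with_permissions()` failure: the shadow looks for a file named `java` in the job's spool directory (apparently the value passed as `cmd`), and the job then goes on hold with code 13,2. A small hypothetical sketch for pulling such missing paths out of a shadow log, assuming the message format shown above (`ShadowLogScanner` is not a Condor tool):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper: collects the paths the shadow could not stat
// while uploading input files, based on the log message format shown above.
final class ShadowLogScanner {
    // Captures the quoted path in "Failed to stat file '<path>'".
    private static final Pattern STAT_FAILURE =
            Pattern.compile("Failed to stat file '([^']+)'");

    private ShadowLogScanner() {}

    static List<String> missingFiles(String logText) {
        List<String> paths = new ArrayList<>();
        Matcher m = STAT_FAILURE.matcher(logText);
        while (m.find()) {
            paths.add(m.group(1));
        }
        return paths;
    }
}
```

Run over the shadow log above, it returns the single path ending in `/java`, which points at the `cmd` argument rather than at the submit file passed in `files`.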