
Re: [Condor-users] Problem using schedd web service



Have you tried DeclareFile()ing job.err and job.out? This may be an ownership issue if the Schedd is running as root. Does your submitted job contain the attributes StageInStart and StageInFinish?


matt

On Oct 27, 2005, at 5:53 AM, Peter Ledbrook wrote:

Hi there,

I am currently trying to submit jobs via the schedd web service, but I
have run into problems with the "Out" and "Err" JobAd properties. If
neither of these properties is present, then the job runs just fine.
However, as soon as one of them is added to the JobAd, the process
fails. The only unusual bit in the log files that I have found is in
ShadowLog, where it says that it failed to open '/dev/null':

10/27 11:38:55 (?.?) (13074):******* Standard Shadow starting up *******
10/27 11:38:55 (?.?) (13074):** $CondorVersion: 6.7.12 Sep 24 2005 $
10/27 11:38:55 (?.?) (13074):** $CondorPlatform: I386-LINUX_RH9 $
10/27 11:38:55 (?.?) (13074):*******************************************
10/27 11:38:55 (?.?) (13074):uid=0, euid=19419, gid=0, egid=100
10/27 11:38:55 (?.?) (13074):RemoveNewShadowDroppings(): Old shadow
removed new shadow ckpt directory:
/home/condor/spool/cluster125.proc0.subproc0
10/27 11:38:55 (?.?) (13074):RemoveNewShadowDroppings(): Old shadow
removed new shadow ckpt directory:
/home/condor/spool/cluster125.proc0.subproc0.tmp
10/27 11:38:55 (?.?) (13074):Hostname = "<xxx.xxx.xxx.xxx:nnnnn>", Job =
125.0
10/27 11:38:55 (125.0) (13074):Requesting Primary Starter
10/27 11:38:55 (125.0) (13074):Shadow: Request to run a job was ACCEPTED
10/27 11:38:55 (125.0) (13074):Shadow: RSC_SOCK connected, fd = 17
10/27 11:38:55 (125.0) (13074):Shadow: CLIENT_LOG connected, fd = 18
10/27 11:38:55 (125.0) (13074):My_Filesystem_Domain = "ixico.net"
10/27 11:38:55 (125.0) (13074):My_UID_Domain = "ixico.net"
10/27 11:38:55 (125.0) (13074): Entering pseudo_get_file_stream
10/27 11:38:55 (125.0) (13074): file =
"/opt/condor-6.6.10/examples/env.remote"
10/27 11:38:55 (125.0) (13074): Weird 0xc0a8010b
10/27 11:38:55 (125.0) (13074): Weird 0xc0a8010b
10/27 11:38:56 (125.0) (13074):Reaped child status - pid 13076 exited
with status 0
10/27 11:38:56 (125.0) (13074):Read: User Job - $CondorPlatform:
I386-LINUX_RH9 $
10/27 11:38:56 (125.0) (13074):Read: User Job - $CondorVersion: 6.6.10
Jun 13 2005 $
10/27 11:38:56 (125.0) (13074):Read: Checkpoint file name is
"/home/condor/spool/cluster125.proc0.subproc0"
10/27 11:38:56 (125.0) (13074):error: Error: Couldn't open standard file
'/dev/null'
10/27 11:38:56 (125.0) (13074):Shadow: Job 125.0 exited, termsig = 9,
coredump = 0, retcode = 0
10/27 11:38:56 (125.0) (13074):Shadow: Job was kicked off without a
checkpoint
10/27 11:38:56 (125.0) (13074):Shadow: DoCleanup: unlinking TmpCkpt
'/home/condor/spool/cluster125.proc0.subproc0.tmp'
10/27 11:38:56 (125.0) (13074):Trying to unlink
/home/condor/spool/cluster125.proc0.subproc0.tmp
10/27 11:38:56 (125.0) (13074):user_time = 1 ticks
10/27 11:38:56 (125.0) (13074):sys_time = 4 ticks
10/27 11:38:56 (125.0) (13074):********** Shadow Exiting(107) **********


Does anyone have any pointers on how to fix this problem? I have added
my test code to the end of the e-mail. It is currently trying to run the
"env.remote" example. Using "condor_submit" works fine.


Thanks in advance,

Peter

-------------
    public int submitJob(String command, List<String> arguments,
            List<File> inputFiles, List<File> outputFiles) throws IOException{
        try{
            // Create a transaction, a cluster, and a new job.
            CondorScheddPortType stub =
                    this.scheddService.getcondorSchedd(this.wsUrl);
            Transaction txn =
                    stub.beginTransaction(TRANSACTION_DURATION).getTransaction();
            int clusterId = stub.newCluster(txn).getInteger();
            int jobId = stub.newJob(txn, clusterId).getInteger();

            // Convert the arguments into a single string.
            StringBuilder buffer = new StringBuilder();
            for (String arg : arguments){
                buffer.append(arg).append(' ');
            }

            // Send over the input files.
            for (File file : inputFiles){
                Status retval = stub.declareFile(txn, clusterId, jobId,
                        file.getName(), (int) file.length(), HashType.NOHASH, null);
                System.out.println("Declaring file " + file + ": " +
                        retval.getCode());
                sendFile(stub, txn, clusterId, jobId, file);
            }
            stub.commitTransaction(txn);


            // Now submit the job.
            txn = stub.beginTransaction(TRANSACTION_DURATION).getTransaction();
            ClassAdStructAttr[] templ = stub.createJobTemplate(
                    clusterId, jobId, "user", UniverseType.STANDARD,
                    command, buffer.toString(), "").getClassAd();
            Map<String, ClassAdStructAttr> jobAd =
                    new HashMap<String, ClassAdStructAttr>();
            for (ClassAdStructAttr attribute : templ){
                jobAd.put(attribute.getName(), attribute);
            }

            // Customise the template.
            jobAd.put("Iwd", new ClassAdStructAttr("Iwd",
                    ClassAdAttrType.value3, "/tmp/test-submit"));
            jobAd.put("UserLog", new ClassAdStructAttr("UserLog",
                    ClassAdAttrType.value3, "/tmp/test-submit/log"));
            jobAd.put("LeaveJobInQueue", new ClassAdStructAttr(
                    "LeaveJobInQueue", ClassAdAttrType.value5, "FALSE"));
            jobAd.put("WantCheckpoint", new ClassAdStructAttr(
                    "WantCheckpoint", ClassAdAttrType.value5, "TRUE"));
            jobAd.put("WantRemoteSyscalls", new ClassAdStructAttr(
                    "WantRemoteSyscalls", ClassAdAttrType.value5, "TRUE"));
            jobAd.put("Err", new ClassAdStructAttr("Err",
                    ClassAdAttrType.value3, "job.err"));
            jobAd.put("Out", new ClassAdStructAttr("Out",
                    ClassAdAttrType.value3, "job.out"));
//            jobAd.put("ShouldTransferFiles", new ClassAdStructAttr(
//                    "ShouldTransferFiles", ClassAdAttrType.value3, "NO"));
//            jobAd.put("TransferIn", new ClassAdStructAttr(
//                    "TransferIn", ClassAdAttrType.value5, "TRUE"));
//            jobAd.put("In", new ClassAdStructAttr("In",
//                    ClassAdAttrType.value3, "cmd.in"));
//            jobAd.put("TransferFiles", new ClassAdStructAttr(
//                    "TransferFiles", ClassAdAttrType.value3, "NEVER"));
//            jobAd.put("WhenToTransferOutput", new ClassAdStructAttr(
//                    "WhenToTransferOutput", ClassAdAttrType.value3,
//                    "ON_EXIT"));


            RequirementsAndStatus retval = stub.submit(txn, clusterId,
                    jobId, jobAd.values().toArray(new ClassAdStructAttr[0]));
            System.out.println("Submit status: " +
                    retval.getStatus().getCode());
            stub.commitTransaction(txn);

            // Try to get the file.
//            stub.getFile(null, clusterId, jobId,
            return jobId;
        }
        catch (ServiceException ex){
            // TODO Auto-generated catch block
            ex.printStackTrace();
        }
        catch (RemoteException ex){
            // TODO Auto-generated catch block
            ex.printStackTrace();
        }
        return 0;
    }
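
A side note on the argument handling in the code above: the loop appends a space after every argument, so the string passed to createJobTemplate ends with a trailing space whenever the list is non-empty. This is probably unrelated to the Out/Err failure. A minimal sketch of joining without the trailing separator, assuming Java 8 or later for String.join; the ArgumentJoiner class and its join method are hypothetical names, not part of the Condor web service stubs:

```java
import java.util.Arrays;
import java.util.List;

public class ArgumentJoiner {
    // Hypothetical helper: joins the arguments with single spaces and
    // adds no separator after the last one.
    public static String join(List<String> arguments) {
        return String.join(" ", arguments);
    }

    public static void main(String[] args) {
        // Brackets make any leading or trailing whitespace visible.
        System.out.println("[" + join(Arrays.asList("-a", "input.txt")) + "]");
    }
}
```

The result of join(arguments) could then be passed where buffer.toString() is used now.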
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users