Re: [Condor-users] Web Services JDL Parsing
- Date: Tue, 22 Jul 2008 14:04:39 -0700 (PDT)
- From: Sean Manning <seangwm@xxxxxxx>
- Subject: Re: [Condor-users] Web Services JDL Parsing
Matthew Farrellee wrote:
>
>
>Sean Manning wrote:
>> Hi,
>>
>> I appreciate that my last email was somewhat lengthy, and I have made
>> some progress since then. I now have a very specific question about
>> how to stage back output in a grid environment.
>>
>> Again, I am working on Web Services code using the birdbath and
>> condor Java packages. I can submit a job (see the attached JDL) using
>> my Web Services interface from my account, and see it appear in the
>> condor queue of the grid metascheduler. The input files get
>> transferred correctly from my client machine to the metascheduler
>> (they go to the folder
>> /opt/condor/local.babargt4/spool/cluster1234.proc0.subproc0 or
>> similar), but the folder and its contents belong to root (the user who
>> is running Condor) not myself (the user who submitted the job). Unless
>> I change the owner of the files to myself by hand, I get an error
>> HoldReason = "Failed to get expiration time of Proxy" because the job
>> and the proxy certificate must be owned by the same user.
>>
>> When we changed the owner of the spool/cluster folder and its
>> contents to myself, the job can create a gridftp wrapper and start
>> running. We can see it on the head node of one of our clusters, and
>> see it create a scratch folder (in /hepuser/gcprod01/.globus/scratch
>> on our NFS) and store the output and error there. But the output does
>> not get staged back from the head node to the metascheduler to the
>> client, and the job hangs in mode C = Completed. We have tried
>> several variant JDL files without success.
>>
>> In other words, we have two problems:
>>
>> (i) How can we run the jobs as the user who submits them, not the user
>> who owns condor?
>>
>> (ii) How can we get output to stage back from the cluster to the
>> metascheduler and the client machine?
>>
>> Can anyone advise how to solve either of these problems?
>>
>> Thanks,
>>
>> Sean Manning
>
>Is your JDL parser setting StageInStart and StageInFinish?
>
>from src/condor_schedd.V6/soap_scheddStub.C, in createJobTemplate:
> // It is kinda scary but if ATTR_STAGE_IN_START/FINISH are
> // present and non-zero in a Job Ad the Schedd will do the
> // right thing, when run as root, and chown the job's spool
> // directory, thus fixing a long standing permissions problem.
> job->Assign(ATTR_STAGE_IN_START, 1);
> job->Assign(ATTR_STAGE_IN_FINISH, 1);
>
>$ grep STAGE_IN src/condor_c++_util/condor_attributes.C
>const char *ATTR_STAGE_IN_START = "StageInStart";
>const char *ATTR_STAGE_IN_FINISH = "StageInFinish";
>
>Best,
>
>
>matt
>
Hi Matthew,
They are both set to 1, but I'm not sure how, and it isn't helping.
Jobs still halt with HoldReason = "Failed to get expiration time of
proxy" unless I explicitly change the owner of the folder containing
the proxy to myself.
I think the parser is org.glite.jdl.Ad.fromFile() plus some code of
my own that builds a condor.ClassAdStructAttr for each attribute in
the submit description file. These are passed to
birdbath.Transaction.submit() to create the ClassAd that Condor-G
sees.
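For anyone debugging the same thing, here is a minimal, self-contained sketch of the check being discussed: make sure StageInStart/StageInFinish survive into the attribute list before it goes to birdbath.Transaction.submit(). The Attr record below is only a stand-in for condor.ClassAdStructAttr (so the sketch compiles without the birdbath jars), and the "INTEGER" type tag is illustrative; the intent mirrors what createJobTemplate does on the schedd side.

```java
import java.util.ArrayList;
import java.util.List;

public class StageInCheck {
    // Stand-in for condor.ClassAdStructAttr: (name, type, value).
    record Attr(String name, String type, String value) {}

    // Add StageInStart/StageInFinish (set to 1) unless the parser
    // already produced them, mirroring createJobTemplate's behavior.
    static List<Attr> ensureStageInAttrs(List<Attr> attrs) {
        List<Attr> out = new ArrayList<>(attrs);
        for (String name : new String[] {"StageInStart", "StageInFinish"}) {
            boolean present = out.stream().anyMatch(a -> a.name().equals(name));
            if (!present) {
                out.add(new Attr(name, "INTEGER", "1"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Simulate a parsed submit description with no stage-in attributes.
        List<Attr> attrs = new ArrayList<>();
        attrs.add(new Attr("Cmd", "STRING", "/bin/hostname"));

        for (Attr a : ensureStageInAttrs(attrs)) {
            System.out.println(a.name() + " = " + a.value());
        }
    }
}
```

Printing the final list just before submit() is an easy way to confirm whether the attributes are really reaching the schedd with the values you expect.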
Having to change the owner of the spool/cluster folder by hand is
the main problem I have left. I can now get stdout, stderr, and other
output files back from the cluster to the metascheduler by setting
TransferOutput = {"specialOutputFile1.txt", "specialOutputFile2.txt"};
in the JDL, and I can stage them from the metascheduler to the client
with some other code.
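A minimal JDL sketch of those output-transfer settings, for reference (the executable and file names are placeholders; ShouldTransferFiles and WhenToTransferOutput are the standard Condor file-transfer attributes, and it is assumed here that the metascheduler honors them):

```
Executable = "/path/to/myAnalysis";
TransferOutput = {"specialOutputFile1.txt", "specialOutputFile2.txt"};
ShouldTransferFiles = "YES";
WhenToTransferOutput = "ON_EXIT";
```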
Thanks for your help,
Sean