Re: [Condor-users] Web Services JDL Parsing
- Date: Tue, 22 Jul 2008 01:59:32 -0500
- From: Matthew Farrellee <mfarrellee@xxxxxxxxxx>
- Subject: Re: [Condor-users] Web Services JDL Parsing
Sean Manning wrote:
> Hi,
>
> I appreciate that my last email was somewhat lengthy, and I have made
> some progress since then. I now have a very specific question about
> how to stage back output in a grid environment.
>
> Again, I am working on Web Services code using the birdbath and
> condor Java packages. I can submit a job (see the attached JDL) using
> my Web Services interface from my account, and see it appear in the
> condor queue of the grid metascheduler. The input files get
> transferred correctly from my client machine to the metascheduler (they
> go to the folder
> /opt/condor/local.babargt4/spool/cluster1234.proc0.subproc0 or
> similar), but the folder and its contents belong to root (the user who
> is running Condor), not myself (the user who submitted the job). Unless
> I change the owner of the files to myself by hand, I get an error
> HoldReason = "Failed to get expiration time of Proxy" because the job
> and the proxy certificate must be owned by the same user.
>
> Once we change the owner of the spool/cluster folder and its
> contents to myself, the job can create a gridftp wrapper and start
> running. We can see it on the head node of one of our clusters, and
> see it create a scratch folder (in /hepuser/gcprod01/.globus/scratch on
> our NFS) and store the output and error there. But the output does not
> get staged back from the head node to the metascheduler and on to the
> client, and the job stays in state C (Completed). We have tried several
> variant JDL files without success.
>
> In other words, we have two problems:
> (i) How can we run the jobs as the user who submits them, not the user
> who owns condor?
> (ii) How can we get output to stage back from the cluster to the
> metascheduler and the client machine?
>
> Can anyone advise how to solve either of these problems?
>
> Thanks,
> Sean Manning
Is your JDL parser setting StageInStart and StageInFinish?

From src/condor_schedd.V6/soap_scheddStub.C, in createJobTemplate:

  // It is kinda scary but if ATTR_STAGE_IN_START/FINISH are
  // present and non-zero in a Job Ad the Schedd will do the
  // right thing, when run as root, and chown the job's spool
  // directory, thus fixing a long standing permissions problem.
  job->Assign(ATTR_STAGE_IN_START, 1);
  job->Assign(ATTR_STAGE_IN_FINISH, 1);

  $ grep STAGE_IN src/condor_c++_util/condor_attributes.C
  const char *ATTR_STAGE_IN_START = "StageInStart";
  const char *ATTR_STAGE_IN_FINISH = "StageInFinish";

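
To make the condition in that comment concrete, here is a minimal sketch. It is not Condor code: the JobAd type is a hypothetical std::map stand-in for a real job ad, and markStageIn and hasStageInMarkers are illustrative names I made up. It only shows the check described above, that both attributes are present and non-zero, which is what your parser would need to guarantee in the ad it submits.

```cpp
#include <map>
#include <string>

// Attribute names as defined in condor_attributes.C (see the grep output).
const char *const ATTR_STAGE_IN_START = "StageInStart";
const char *const ATTR_STAGE_IN_FINISH = "StageInFinish";

// Hypothetical stand-in for a job ad: attribute name -> integer value.
// A real Condor job ad is a ClassAd, not a std::map.
using JobAd = std::map<std::string, int>;

// Mirrors the two Assign calls in createJobTemplate: set both
// stage-in attributes to a non-zero value.
void markStageIn(JobAd &job) {
    job[ATTR_STAGE_IN_START] = 1;
    job[ATTR_STAGE_IN_FINISH] = 1;
}

// Returns true only if both attributes are present and non-zero,
// the condition the source comment says triggers the chown.
bool hasStageInMarkers(const JobAd &job) {
    auto start = job.find(ATTR_STAGE_IN_START);
    auto finish = job.find(ATTR_STAGE_IN_FINISH);
    return start != job.end() && start->second != 0 &&
           finish != job.end() && finish->second != 0;
}
```

An ad built by your own JDL parser would fail hasStageInMarkers if either attribute is missing or zero. If that is the case for your job, it may explain why the spool directory stays owned by root.
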
Best,
matt