Re: [condor-users] globus universe question



Hi,

Thanks a lot for your replies. Jaime's solution worked for me.

However, I do not have a shared FS, so I believe I do have
Mark's problem. From another pool (at bnl.gov) I submit the following:

spider:/direct/usatlas+u/sgr/> cat tio_globus.sdf
Executable      = tio
Universe        = globus
globusscheduler = machine1.harvard.edu/jobmanager-condor
transfer_files        = ONEXIT
transfer_input        = true
transfer_input_files  = arg_tio.i
transfer_output       = true
transfer_output_files = arg_gtio.o
Arguments = arg_tio.i arg_gtio.o
Queue

If machine1 executes the job itself, it works fine, but if machine1 hands
the job off to machine2 in its pool, machine2's StarterLog shows:

12/9 13:51:27 Submitting machine is "machine1.harvard.edu"
12/9 13:51:27 Starting a VANILLA universe job with ID: 175.0
12/9 13:51:27 IWD: /home/sgr/gram_scratch_XcH1hwkyuH
12/9 13:51:27 Failed to open standard output file '/home/sgr/.globus/.gass_cache/local/md5/3e/6df7ac71c2853102cdde129d4bbc62/md5/b2/084e33b70352d25af0753832650d45/data': No such file or directory (errno 2)
12/9 13:51:27 Output file: /home/sgr/.globus/.gass_cache/local/md5/3e/6df7ac71c2853102cdde129d4bbc62/md5/b2/084e33b70352d25af0753832650d45/data
12/9 13:51:27 Failed to open standard error file '/home/sgr/.globus/.gass_cache/local/md5/3e/6df7ac71c2853102cdde129d4bbc62/md5/3c/78e676262e3765f3c13773d24c9e33/data': No such file or directory (errno 2)
12/9 13:51:27 Error file: /home/sgr/.globus/.gass_cache/local/md5/3e/6df7ac71c2853102cdde129d4bbc62/md5/3c/78e676262e3765f3c13773d24c9e33/data
12/9 13:51:27 Failed to open some/all of the std files...
12/9 13:51:27 Aborting OsProc::StartJob.
12/9 13:51:27 Failed to start job, exiting
12/9 13:51:27 ShutdownFast all jobs.
12/9 13:51:27 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
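One way to check that this is the no-shared-filesystem case is to pull out the paths the starter could not open and test whether they exist on machine2. The GASS cache paths sit under the user's home directory on machine1, so without a shared filesystem they are presumably absent on machine2. A minimal Python sketch, assuming the StarterLog wording shown above (other Condor versions may word it differently); the function names are hypothetical:

```python
import os
import re
from typing import List, Tuple

# Matches starter messages like the ones in the log above, e.g.
#   Failed to open standard output file '/home/.../data': No such file or directory (errno 2)
# The wording is copied from this particular StarterLog; treat it as an
# assumption for other Condor versions.
FAILED_OPEN = re.compile(r"Failed to open standard (output|error) file\s+'([^']+)'")


def failed_std_files(log_text: str) -> List[Tuple[str, str]]:
    """Return (stream, path) pairs for each std file the starter could not open."""
    return [(m.group(1), m.group(2)) for m in FAILED_OPEN.finditer(log_text)]


def missing_here(log_text: str) -> List[str]:
    """Return the failed paths that do not exist on the machine running this check."""
    return [path for _, path in failed_std_files(log_text) if not os.path.exists(path)]
```

Run on machine2 against the contents of its StarterLog: if every path comes back as missing, the files only exist on machine1's local disk, which fits Jaime's explanation.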

Is this the problem you were referring to? If so, I will contact Mark,
who kindly offered to help.
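Mark's workaround, as he describes it in his message quoted below, comes down to a watcher that waits for the Condor job to finish and then pushes each output file back over GridFTP. A rough Python sketch of that control flow, assuming `condor_wait <user-log> <job-id>` and `globus-url-copy <src-url> <dest-url>` are on the PATH, a valid proxy is available, and the output files are already in a local directory on the gatekeeper when the job completes. This is not Mark's actual condor.pm change; the function names, paths, and destination URL are hypothetical:

```python
import subprocess
from typing import Callable, List, Sequence


def wait_command(user_log: str, job_id: str) -> List[str]:
    # condor_wait blocks until the job (e.g. "175.0") recorded in the user log completes.
    return ["condor_wait", user_log, job_id]


def copy_commands(files: Sequence[str], workdir: str, dest_url: str) -> List[List[str]]:
    # One globus-url-copy per output file: local file:// URL -> remote gsiftp:// URL.
    # workdir must be absolute so that "file://" + workdir yields "file:///...".
    base = dest_url.rstrip("/")
    local = workdir.rstrip("/")
    return [["globus-url-copy", f"file://{local}/{name}", f"{base}/{name}"] for name in files]


def _run(cmd: List[str]) -> None:
    # Raises subprocess.CalledProcessError if the command exits non-zero.
    subprocess.run(cmd, check=True)


def monitor(user_log: str, job_id: str, files: Sequence[str], workdir: str,
            dest_url: str, run: Callable[[List[str]], None] = _run) -> None:
    """Wait for the Condor job, then copy each output file back to the submitter."""
    run(wait_command(user_log, job_id))
    for cmd in copy_commands(files, workdir, dest_url):
        run(cmd)
```

The `run` parameter lets you dry-run the command list (for example by passing `print`) before forking the watcher from the jobmanager.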

Thanks again,
Sebastian

On Tue, 9 Dec 2003, Jaime Frey wrote:

> The problem Mark is describing was caused by the fact that the Condor pool
> he was submitting to (via the globus universe) didn't have a shared
> filesystem, a situation that Globus doesn't handle well. If you're not in
> that situation, you shouldn't have to jump through the hoops he's had to.
>
> -- Jaime
>
> On Tue, 9 Dec 2003, Mark Calleja wrote:
>
> > Hi Sebastian,
> > I also came up against the problem of retrieving output files from
> > Condor-G jobs (see the earlier thread on this list last month), and
> > the only way I got round it was to amend the condor.pm file in globus so
> > that a new process is forked by the jobmanager which is then exec'd with
> > a monitoring process. This process waits for the condor job to finish
> > before using gsiftp to return all output files back to the submitting
> > machine. Not pretty, but it works. If you're interested in this route
> > then drop me a line and I'll give you what I've done.
> >
> > Cheers,
> >
> > Mark Calleja
> > --
> > Department of Earth Sciences, University of Cambridge
> > Downing Street, Cambridge CB2 3EQ, UK
> > Tel. (+44/0) 1223 333408, Fax  (+44/0) 1223 333450
> > http://www.esc.cam.ac.uk/~mcal00
> >
> > On Mon, 2003-12-08 at 20:25, Jaime Frey wrote:
> > > On Mon, 8 Dec 2003, Sebastian Grinstein wrote:
> > > > Hello Condor users and experts,
> > > >
> > > > I'm starting to use Condor and have a simple question:
> > > >
> > > > I submit a job in the globus universe, the executable generates
> > > > an output file. How do I retrieve this file (or files)?
