
Re: [Condor-users] Notification of Cluster Complete - not processcomplete






I was under the assumption that if I submitted jobs and my submit machine
died, I lost all connection to the running jobs and jobs yet to be
scheduled.  Is this the case or am I completely misguided?




Erik Paulson <epaulson@xxxxxxxxxxx>
Sent by: condor-users-bounces@cs.wisc.edu
07/29/2004 02:35 PM
Please respond to Condor-Users Mail List

To:      Condor-Users Mail List <condor-users@xxxxxxxxxxx>
cc:
Subject: Re: [Condor-users] Notification of Cluster Complete - not processcomplete




On Thu, Jul 29, 2004 at 07:20:17PM +0100, Matt Hope wrote:
> > -----Original Message-----
> > From: condor-users-bounces@xxxxxxxxxxx
> > [mailto:condor-users-bounces@xxxxxxxxxxx]On Behalf Of Zachary Miller
> > Sent: 29 July 2004 19:09
> > To: Condor-Users Mail List
> > Subject: Re: [Condor-users] Notification of Cluster Complete - not
> > processcomplete
> > >
> > > Disable all email notification (apart from maybe errors)
> > > After submitting the jobs run condor_wait pointing at the resulting
> > > clusterid and have that launch some simple mail script to send the mail.
> >
> > i think it would be slightly better to submit condor_wait as a condor
> > job itself, and then have that job email you when it's finished.  no
> > scripts necessary, and if the machine goes down for some reason, your
> > condor_wait process will come back up when condor does.
>
> Though this would have the side effect of taking up a slot either on
> your pool or on your local machine, which may be problematic if you run
> more concurrent clusters.

No, you would submit the job as a "scheduler universe" job. Scheduler
universe jobs run on the submit machine - they're not matched, so you can
run as many of them as you want. DAGMan is the prime example of a scheduler
universe job - run 'condor_submit_dag -nosubmit' to see what it would tell
the schedd to do.

The reason for running a scheduler universe job is that Condor will
automatically restart it if the machine reboots - you don't have to write
fancy cron scripts to make sure that it is still running for you.
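
For illustration only, here is a rough sketch of what such a watcher job
could look like. It is a small Python helper that writes a submit
description file running condor_wait in the scheduler universe. The
condor_wait path, the log file name, the cluster id and the file names are
all made-up placeholders, and it assumes the real jobs were submitted with a
"log = ..." line so that condor_wait has a user log to watch. Treat it as a
starting point, not as the one right way to do it.

```python
def build_wait_submit(job_log, cluster_id,
                      condor_wait_path="/usr/bin/condor_wait",
                      notify_user=None):
    """Return submit-description text for a condor_wait watcher job.

    job_log          -- user log file written by the cluster being watched
                        (assumes the real jobs set "log = <job_log>")
    cluster_id       -- cluster number reported by condor_submit
    condor_wait_path -- hypothetical install location; adjust for your site
    notify_user      -- optional e-mail address for the completion mail
    """
    lines = [
        # Scheduler universe: runs on the submit machine and is not matched
        # against pool resources.
        "universe     = scheduler",
        f"executable   = {condor_wait_path}",
        # condor_wait takes the log file and, optionally, a job/cluster id.
        f"arguments    = {job_log} {cluster_id}",
        # Mail once, when the watcher (and hence the whole cluster) finishes.
        "notification = Complete",
    ]
    if notify_user:
        lines.append(f"notify_user  = {notify_user}")
    lines += [
        f"log          = wait_{cluster_id}.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values: cluster 42 logging to jobs.log.
    text = build_wait_submit("jobs.log", 42)
    with open("wait_42.sub", "w") as handle:
        handle.write(text)
    print(text)
```

You would then run condor_submit on the generated file, and set
"notification = Never" (or "Error") in the submit file for the real jobs so
that only the watcher sends the completion mail.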

-Erik




