
RE: [Condor-users] I wasted some CPU cycles ;-)



Hi,

the settings I mentioned let you test whether your job runs correctly when it
is not suspended by Condor. If you want your job to be suspended and resumed
later, look at the following expressions in your config files:

WANT_SUSPEND, WANT_VACATE, START, SUSPEND, CONTINUE, PREEMPT

Set PREEMPT = FALSE to prevent your job from being killed after it has been
suspended for a few minutes. I would expect them to look like this:

WANT_SUSPEND = TRUE
WANT_VACATE = FALSE
PREEMPT = FALSE
START = TRUE
SUSPEND = $(TEST_SUSPEND)
CONTINUE = $(UWCS_CONTINUE)

# Suspend jobs if the keyboard has been touched
TEST_SUSPEND = $(KeyboardBusy)

# Continue jobs if:
# 1) the cpu is idle, AND
# 2) we've been suspended more than 10 seconds, AND
# 3) the keyboard hasn't been touched in a while
UWCS_CONTINUE = ( $(CPUIdle) && ($(ActivityTimer) > 10) \
                  && (KeyboardIdle > $(ContinueIdleTime)) )


mit freundlichen Grüßen / with kind regards,

Matthias Röhm

=======================================================
Matthias Röhm, DaimlerChrysler AG, Research Center Ulm,
Department for Data Mining Solutions, RMI/DM
89013 Ulm,  Germany

Phone:               +49 731 505 4864
Email:               mailto:Matthias.M.Roehm@xxxxxxxxxxxxxxxxxxx
=======================================================

condor-users-bounces@xxxxxxxxxxx wrote on 07.07.2005 14:11:28:

> Hi Matthias,

> Thanks a lot for your answer.
> Should I understand that there is no way to suspend when there's user
> activity and continue later on Windows?
> I expected exactly that from the answers I entered in the GUI: no migration,
> but suspend and resume later.
> Regards,

> Miguel

>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of
> matthias.m.roehm@xxxxxxxxxxxxxxxxxxx
> Sent: 07 July 2005 12:49
> To: condor-users@xxxxxxxxxxx
> Subject: Re: [Condor-users] I wasted some CPU cycles ;-)

>
> Hi Miguel,

> Condor can't checkpoint on Windows systems, therefore the job is killed and
> restarted (from the beginning) on another machine. If you want your jobs to
> run all the time (even if the machine is used by a user), use the following
> in your config file:

> WANT_SUSPEND = FALSE
> WANT_VACATE = FALSE
> START = TRUE
> SUSPEND = FALSE
> PREEMPT = FALSE

> mit freundlichen Grüßen / with kind regards,

> Matthias Röhm


> condor-users-bounces@xxxxxxxxxxx wrote on 07.07.2005 13:00:44:

> > Disclaimer: idiot here ;-)

> > I've got a serious problem.

> > I was running my jobs for the last few days, until I accumulated 2 days of
> > run time (the "normal" time for such a task to finish), and today I decided
> > to check the size of the file being generated.
> > This morning, after running overnight, the file was 44 MB... After 2 days of
> > running it should have been close to the final size of 610 MB, so that was
> > my first shock.
> > Just checked again (the machine is currently in use by the Owner, so Condor
> > is not active) and the file is not there anymore.

> > I suspected that this morning when I checked the file size... Instead of
> > being suspended to resume later, my jobs are being killed for some reason.
> > Being new to Condor, I probably missed something.

> > A bit of background: the machines are all Windows (2K and XP), with the
> > central server on 2K. After a little struggling I got the jobs running using
> > this .sub file:

> > #
> > # Submit 4 jobs of rtgen.exe to Condor
> > Universe = vanilla
> > Executable = rtgen.exe
> > Arguments = ntlm alpha 1 7 $(Process) 9000 40000000 ncc
> > Initialdir = E:/
> > Transfer_input_files = libeay32.dll, charset.txt
> > Should_transfer_files = YES
> > When_to_transfer_output = ON_EXIT
> > Nice_user = True
> > Notification = Never
> > Getenv = False
> > Requirements = ( (OpSys == "WINNT50") || (OpSys == "WINNT51") )
> > # later I've to try
> > #Requirements = ( (OpSys == "WINNT50") || (OpSys == "WINNT51") ) && (VirtualMachineID == 1)
> > # and
> > #hold = True
> > Queue 4
> >
> > I'm pretty sure that my problem is not there, but in the condor_config file
> > on each node, most likely under Part 3, which I left exactly as installed by
> > the Windows GUI installer (I only modified bits in Parts 1 and 2 to make it
> > work).

> > During installation using the GUI, I chose to suspend and continue later,
> > no migration.
> > What do I have to modify in condor_config (on the clients only, or also on
> > the central server) to ensure that a job that has to run for 2 days of CPU
> > time, generating a file of 610 MB, is not killed when the owner is using the
> > machine?

> > TIA!
> > Regards,

> > Miguel

> >
> >
> > ****************************************************************************
> > DISCLAIMER:
> > This e-mail contains proprietary information, some or all of which may
> > be legally privileged. It is for the intended recipient only. If an
> > addressing or transmission error has misdirected this e-mail,
> > please notify the author by replying to this e-mail. If you are not
> > the intended recipient you may not use,
> > disclose, distribute, copy, print or rely on this e-mail.
> > ****************************************************************************
>
> >
> > _______________________________________________
> > Condor-users mailing list
> > Condor-users@xxxxxxxxxxx
> > https://lists.cs.wisc.edu/mailman/listinfo/condor-users
